1
Fair B, Buen Abad Najar CF, Zhao J, Lozano S, Reilly A, Mossian G, Staley JP, Wang J, Li YI. Global impact of unproductive splicing on human gene expression. Nat Genet 2024; 56:1851-1861. [PMID: 39223315 PMCID: PMC11387194 DOI: 10.1038/s41588-024-01872-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Accepted: 07/16/2024] [Indexed: 09/04/2024]
Abstract
Alternative splicing (AS) in human genes is widely viewed as a mechanism for enhancing proteomic diversity. AS can also impact gene expression levels without increasing protein diversity by producing 'unproductive' transcripts that are targeted for rapid degradation by nonsense-mediated decay (NMD). However, the relative importance of this regulatory mechanism remains underexplored. To better understand the impact of AS-NMD relative to other regulatory mechanisms, we analyzed population-scale genomic data across eight molecular assays, covering various stages from transcription to cytoplasmic decay. We report threefold more unproductive splicing compared with prior estimates using steady-state RNA. This unproductive splicing compounds across multi-intronic genes, resulting in 15% of transcript molecules from protein-coding genes being unproductive. Leveraging genetic variation across cell lines, we find that GWAS trait-associated loci explained by AS are as often associated with NMD-induced expression level differences as with differences in protein isoform usage. Our findings suggest that much of the impact of AS is mediated by NMD-induced changes in gene expression rather than diversification of the proteome.
Affiliation(s)
- Benjamin Fair
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
- Junxing Zhao
  - Department of Medicinal Chemistry, University of Kansas, Lawrence, KS, USA
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
- Stephanie Lozano
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
  - Center for Neuroscience, University of California Davis, Davis, CA, USA
- Austin Reilly
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
- Gabriela Mossian
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
- Jonathan P Staley
  - Department of Molecular Genetics and Cell Biology, University of Chicago, Chicago, IL, USA
- Jingxin Wang
  - Department of Medicinal Chemistry, University of Kansas, Lawrence, KS, USA
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
- Yang I Li
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL, USA
  - Department of Human Genetics, University of Chicago, Chicago, IL, USA
2
Bénitière F, Duret L, Necsulea A. GTDrift: a resource for exploring the interplay between genetic drift, genomic and transcriptomic characteristics in eukaryotes. NAR Genom Bioinform 2024; 6:lqae064. [PMID: 38867915 PMCID: PMC11167491 DOI: 10.1093/nargab/lqae064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Revised: 04/22/2024] [Accepted: 05/27/2024] [Indexed: 06/14/2024] [Open Access]
Abstract
We present GTDrift, a comprehensive data resource that enables explorations of genomic and transcriptomic characteristics alongside proxies of the intensity of genetic drift in individual species. This resource encompasses data for 1506 eukaryotic species, including 1413 animals and 93 green plants, and is organized in three components. The first two components contain approximations of the effective population size, which serve as indicators of the extent of random genetic drift within each species. In the first component, we meticulously investigated public databases to assemble data on life history traits such as longevity, adult body length and body mass for a set of 979 species. The second component includes estimations of the ratio between the rate of non-synonymous substitutions and the rate of synonymous substitutions (dN/dS) in protein-coding sequences for 1324 species. This ratio provides an estimate of the efficiency of natural selection in purging deleterious substitutions. Additionally, we present polymorphism-derived Ne estimates for 66 species. The third component encompasses various genomic and transcriptomic characteristics. With this component, we aim to facilitate comparative transcriptomics analyses across species, by providing easy-to-use processed data for more than 16 000 RNA-seq samples across 491 species. These data include intron-centered alternative splicing frequencies, gene expression levels and sequencing depth statistics for each species, obtained with a homogeneous analysis protocol. To enable cross-species comparisons, we provide orthology predictions for conserved single-copy genes based on BUSCO gene sets. To illustrate the possible uses of this database, we identify the most frequently used introns for each gene and we assess how the sequencing depth available for each species affects our power to identify major and minor splice variants.
Affiliation(s)
- Florian Bénitière
  - Laboratoire de Biométrie et Biologie Évolutive, Université Lyon 1, UMR CNRS 5558, Villeurbanne, France
  - Laboratoire d’Écologie des Hydrosystèmes Naturels et Anthropisés, Université Lyon 1, UMR CNRS 5023, Villeurbanne, France
- Laurent Duret
  - Laboratoire de Biométrie et Biologie Évolutive, Université Lyon 1, UMR CNRS 5558, Villeurbanne, France
- Anamaria Necsulea
  - Laboratoire de Biométrie et Biologie Évolutive, Université Lyon 1, UMR CNRS 5558, Villeurbanne, France
3
Nanni A, Titus-McQuillan J, Bankole KS, Pardo-Palacios F, Signor S, Vlaho S, Moskalenko O, Morse A, Rogers RL, Conesa A, McIntyre LM. Nucleotide-level distance metrics to quantify alternative splicing implemented in TranD. Nucleic Acids Res 2024; 52:e28. [PMID: 38340337 PMCID: PMC10954468 DOI: 10.1093/nar/gkae056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 11/29/2023] [Accepted: 01/18/2024] [Indexed: 02/12/2024] [Open Access]
Abstract
Advances in affordable transcriptome sequencing combined with better exon and gene prediction have motivated many to compare transcription across the tree of life. We develop a mathematical framework to calculate complexity and compare transcript models. Structural features, i.e. intron retention (IR), donor/acceptor site variation, alternative exon cassettes and alternative 5'/3' UTRs, are compared and the distance between transcript models is calculated with nucleotide-level precision. All metrics are implemented in a PyPI package, TranD, and output can be used to summarize splicing patterns for a transcriptome (1GTF) and between transcriptomes (2GTF). TranD output enables quantitative comparisons between: annotations augmented by empirical RNA-seq data and the original transcript models; transcript model prediction tools for long-read RNA-seq (e.g. FLAIR versus Isoseq3); alternate annotations for a species (e.g. RefSeq versus Ensembl); and closely related species. In C. elegans, Z. mays, D. melanogaster, D. simulans and H. sapiens, alternative exons were observed more frequently in combination with an alternative donor/acceptor than alone. Transcript models in RefSeq and Ensembl are linked and both have unique transcript models with empirical support. D. melanogaster and D. simulans share many transcript models, and long-read RNA-seq data suggest that both species are under-annotated. We recommend combined references.
Affiliation(s)
- Adalena Nanni
  - Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32611, USA
  - University of Florida Genetics Institute, University of Florida, Gainesville, FL 32611, USA
- James Titus-McQuillan
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, USA
- Kinfeosioluwa S Bankole
  - Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32611, USA
  - University of Florida Genetics Institute, University of Florida, Gainesville, FL 32611, USA
- Sarah Signor
  - Department of Biological Sciences, North Dakota State University, Fargo, ND, USA
- Srna Vlaho
  - Department of Biological Sciences, University of Southern California, Los Angeles, CA, USA
- Oleksandr Moskalenko
  - University of Florida Research Computing, University of Florida, Gainesville, FL 32611, USA
- Alison M Morse
  - Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32611, USA
  - University of Florida Genetics Institute, University of Florida, Gainesville, FL 32611, USA
- Rebekah L Rogers
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, USA
- Ana Conesa
  - Institute for Integrative Systems Biology, Spanish National Research Council, Paterna, Spain
- Lauren M McIntyre
  - Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32611, USA
  - University of Florida Genetics Institute, University of Florida, Gainesville, FL 32611, USA
4
Bénitière F, Necsulea A, Duret L. Random genetic drift sets an upper limit on mRNA splicing accuracy in metazoans. eLife 2024; 13:RP93629. [PMID: 38470242 DOI: 10.7554/elife.93629] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/13/2024] [Open Access]
Abstract
Most eukaryotic genes undergo alternative splicing (AS), but the overall functional significance of this process remains a controversial issue. It has been noticed that the complexity of organisms (assayed by the number of distinct cell types) correlates positively with their genome-wide AS rate. This has been interpreted as evidence that AS plays an important role in adaptive evolution by increasing the functional repertoires of genomes. However, this observation also fits with a totally opposite interpretation: given that 'complex' organisms tend to have small effective population sizes (Ne), they are expected to be more affected by genetic drift, and hence more prone to accumulate deleterious mutations that decrease splicing accuracy. Thus, according to this 'drift barrier' theory, the elevated AS rate in complex organisms might simply result from a higher splicing error rate. To test this hypothesis, we analyzed 3496 transcriptome sequencing samples to quantify AS in 53 metazoan species spanning a wide range of Ne values. Our results show a negative correlation between Ne proxies and the genome-wide AS rates among species, consistent with the drift barrier hypothesis. This pattern is dominated by low abundance isoforms, which represent the vast majority of the splice variant repertoire. We show that these low abundance isoforms are depleted in functional AS events, and most likely correspond to errors. Conversely, the AS rate of abundant isoforms, which are relatively enriched in functional AS events, tends to be lower in more complex species. All these observations are consistent with the hypothesis that variation in AS rates across metazoans reflects the limits set by drift on the capacity of selection to prevent gene expression errors.
Affiliation(s)
- Florian Bénitière
  - Laboratoire de Biométrie et Biologie Évolutive, CNRS, Université Lyon 1, Villeurbanne, France
- Anamaria Necsulea
  - Laboratoire de Biométrie et Biologie Évolutive, CNRS, Université Lyon 1, Villeurbanne, France
- Laurent Duret
  - Laboratoire de Biométrie et Biologie Évolutive, CNRS, Université Lyon 1, Villeurbanne, France
5
Fair B, Najar CBA, Zhao J, Lozano S, Reilly A, Mossian G, Staley JP, Wang J, Li YI. Global impact of aberrant splicing on human gene expression levels. bioRxiv 2023:2023.09.13.557588. [PMID: 37745605 PMCID: PMC10515962 DOI: 10.1101/2023.09.13.557588] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
Alternative splicing (AS) is pervasive in human genes, yet the specific function of most AS events remains unknown. It is widely assumed that the primary function of AS is to diversify the proteome; however, AS can also influence gene expression levels by producing transcripts rapidly degraded by nonsense-mediated decay (NMD). Currently, there are no precise estimates for how often the coupling of AS and NMD (AS-NMD) impacts gene expression levels because rapidly degraded NMD transcripts are challenging to capture. To better understand the impact of AS on gene expression levels, we analyzed population-scale genomic data in lymphoblastoid cell lines across eight molecular assays that capture gene regulation before, during, and after transcription and cytoplasmic decay. Sequencing nascent mRNA transcripts revealed frequent aberrant splicing of human introns, which results in remarkably high levels of mRNA transcripts subject to NMD. We estimate that ~15% of all protein-coding transcripts are degraded by NMD, and this estimate increases to nearly half of all transcripts for lowly expressed genes with many introns. Leveraging genetic variation across cell lines, we find that GWAS trait-associated loci explained by AS are similarly likely to associate with NMD-induced expression level differences as with differences in protein isoform usage. Additionally, we used the splice-switching drug risdiplam to perturb AS at hundreds of genes, finding that ~3/4 of the splicing perturbations induce NMD. Thus, we conclude that AS-NMD substantially impacts the expression levels of most human genes. Our work further suggests that much of the molecular impact of AS is mediated by changes in protein expression levels rather than diversification of the proteome.
Affiliation(s)
- Benjamin Fair
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- Carlos Buen Abad Najar
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- Junxing Zhao
  - Department of Medicinal Chemistry, University of Kansas, Lawrence, KS 66047, USA
- Stephanie Lozano
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
  - Present address: Center for Neuroscience, University of California Davis, Davis, CA 95618, USA
- Austin Reilly
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- Gabriela Mossian
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
- Jonathan P Staley
  - Department of Molecular Genetics and Cell Biology, University of Chicago, Chicago, IL 60637, USA
- Jingxin Wang
  - Department of Medicinal Chemistry, University of Kansas, Lawrence, KS 66047, USA
- Yang I Li
  - Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, IL 60637, USA
  - Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
6
Eléouët M, Lu C, Zhou Y, Yang P, Ma J, Xu G. Insights on the biological functions and diverse regulation of RNA-binding protein 39 and their implication in human diseases. Biochim Biophys Acta Gene Regul Mech 2023; 1866:194902. [PMID: 36535628 DOI: 10.1016/j.bbagrm.2022.194902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2022] [Revised: 11/24/2022] [Accepted: 12/12/2022] [Indexed: 12/23/2022]
Abstract
RNA-binding protein 39 (RBM39) is involved in pre-mRNA splicing and transcriptional regulation. RBM39 is dysregulated in many cancers and its upregulation enhances cancer cell proliferation. Recently, it has been discovered that aryl sulfonamides act as molecular glues to recruit RBM39 to the CRL4-DCAF15 E3 ubiquitin ligase complex for its ubiquitination and proteasomal degradation. Therefore, various studies have focused on the degradation of RBM39 by aryl sulfonamides with the aim of finding new cancer therapeutics. These discoveries have also prompted thorough study of the biological functions of RBM39. RBM39 was found to regulate the splicing and transcription of genes mainly involved in pre-mRNA splicing, cell cycle regulation, DNA damage response, and metabolism, but the understanding of these regulations is still in its infancy. This article reviews the advances of the current literature and discusses the remaining key issues on the biological function and dynamic regulation of RBM39 at the post-translational level.
Affiliation(s)
- Morgane Eléouët
  - Jiangsu Key Laboratory of Neuropsychiatric Diseases and College of Pharmaceutical Sciences, Jiangsu Province Engineering Research Center of Precision Diagnostics and Therapeutics Development, Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Suzhou Key Laboratory of Drug Research for Prevention and Treatment of Hyperlipidemic Diseases, Soochow University, 199 Ren'ai Road, Suzhou, Jiangsu 215123, China
  - Synbio Technologies Company, BioBay C20, 218 Xinghu Street, Suzhou, Jiangsu 215123, China
- Chengpiao Lu
  - Jiangsu Key Laboratory of Neuropsychiatric Diseases and College of Pharmaceutical Sciences, Jiangsu Province Engineering Research Center of Precision Diagnostics and Therapeutics Development, Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Suzhou Key Laboratory of Drug Research for Prevention and Treatment of Hyperlipidemic Diseases, Soochow University, 199 Ren'ai Road, Suzhou, Jiangsu 215123, China
- Yijia Zhou
  - Jiangsu Key Laboratory of Neuropsychiatric Diseases and College of Pharmaceutical Sciences, Jiangsu Province Engineering Research Center of Precision Diagnostics and Therapeutics Development, Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Suzhou Key Laboratory of Drug Research for Prevention and Treatment of Hyperlipidemic Diseases, Soochow University, 199 Ren'ai Road, Suzhou, Jiangsu 215123, China
- Ping Yang
  - Synbio Technologies Company, BioBay C20, 218 Xinghu Street, Suzhou, Jiangsu 215123, China
- Jingjing Ma
  - Department of Pharmacy, Medical Center of Soochow University, Dushu Lake Hospital Affiliated to Soochow University, Suzhou, Jiangsu 215123, China
- Guoqiang Xu
  - Jiangsu Key Laboratory of Neuropsychiatric Diseases and College of Pharmaceutical Sciences, Jiangsu Province Engineering Research Center of Precision Diagnostics and Therapeutics Development, Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases, Suzhou Key Laboratory of Drug Research for Prevention and Treatment of Hyperlipidemic Diseases, Soochow University, 199 Ren'ai Road, Suzhou, Jiangsu 215123, China
7
Kwiatkowski M, Hotze M, Schumacher J, Asif AR, Pittol JMR, Brenig B, Ramljak S, Zischler H, Herlyn H. Protein speciation is likely to increase the chance of proteins to be determined in 2‐DE/MS. Electrophoresis 2022; 43:1203-1214. [DOI: 10.1002/elps.202000393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2020] [Revised: 11/30/2021] [Accepted: 02/02/2022] [Indexed: 11/08/2022]
Affiliation(s)
- Marcel Kwiatkowski
  - Department of Biochemistry and Center for Molecular Biosciences Innsbruck, University of Innsbruck, Innsbruck, Austria
- Madlen Hotze
  - Department of Biochemistry and Center for Molecular Biosciences Innsbruck, University of Innsbruck, Innsbruck, Austria
- Abdul R. Asif
  - Department of Clinical Chemistry/UMG‐Laboratories, University Medical Center, Göttingen, Germany
- Jose Miguel Ramos Pittol
  - Department of Biochemistry and Center for Molecular Biosciences Innsbruck, University of Innsbruck, Innsbruck, Austria
- Bertram Brenig
  - Department of Molecular Biology of Livestock, Institute of Veterinary Medicine, University of Göttingen, Göttingen, Germany
- Hans Zischler
  - Institute of Organismic and Molecular Evolution, Anthropology, University of Mainz, Mainz, Germany
- Holger Herlyn
  - Institute of Organismic and Molecular Evolution, Anthropology, University of Mainz, Mainz, Germany
8
Titus MB, Chang AW, Olesnicky EC. Exploring the Diverse Functional and Regulatory Consequences of Alternative Splicing in Development and Disease. Front Genet 2021; 12:775395. [PMID: 34899861 PMCID: PMC8652244 DOI: 10.3389/fgene.2021.775395] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/13/2021] [Accepted: 11/05/2021] [Indexed: 12/17/2022] [Open Access]
Abstract
Alternative splicing is a fundamental mechanism of eukaryotic RNA regulation that increases the transcriptomic and proteomic complexity within an organism. Moreover, alternative splicing provides a framework for generating unique yet complex tissue- and cell type-specific gene expression profiles, despite using a limited number of genes. Recent efforts to understand the negative consequences of aberrant splicing have increased our understanding of developmental and neurodegenerative diseases such as spinal muscular atrophy, frontotemporal dementia and Parkinsonism linked to chromosome 17, myotonic dystrophy, and amyotrophic lateral sclerosis. Moreover, these studies have led to the development of innovative therapeutic treatments for diseases caused by aberrant splicing, also known as spliceopathies. Despite this, a paucity of information exists on the physiological roles and specific functions of distinct transcript spliceforms for a given gene. Here, we will highlight work that has specifically explored the distinct functions of protein-coding spliceforms during development. Moreover, we will discuss the use of alternative splicing of noncoding exons to regulate the stability and localization of RNA transcripts.
Affiliation(s)
- M Brandon Titus
  - University of Colorado Colorado Springs, Colorado Springs, CO, United States
- Adeline W Chang
  - University of Colorado Colorado Springs, Colorado Springs, CO, United States
- Eugenia C Olesnicky
  - University of Colorado Colorado Springs, Colorado Springs, CO, United States
9
Karousis ED, Gypas F, Zavolan M, Mühlemann O. Nanopore sequencing reveals endogenous NMD-targeted isoforms in human cells. Genome Biol 2021; 22:223. [PMID: 34389041 PMCID: PMC8361881 DOI: 10.1186/s13059-021-02439-3] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 04/30/2021] [Accepted: 07/26/2021] [Indexed: 12/13/2022] [Open Access]
Abstract
BACKGROUND: Nonsense-mediated mRNA decay (NMD) is a eukaryotic, translation-dependent degradation pathway that targets mRNAs with premature termination codons and also regulates the expression of some mRNAs that encode full-length proteins. Although many genes express NMD-sensitive transcripts, identifying them based on short-read sequencing data remains a challenge. RESULTS: To identify and analyze endogenous targets of NMD, we apply cDNA Nanopore sequencing and short-read sequencing to human cells with varying expression levels of NMD factors. Our approach detects full-length NMD substrates that are highly unstable and increase in levels or even only appear when NMD is inhibited. Among the many new NMD-targeted isoforms that our analysis identifies, most derive from alternative exon usage. The isoform-aware analysis reveals many genes with significant changes in splicing but no significant changes in overall expression levels upon NMD knockdown. NMD-sensitive mRNAs have more exons in the 3' UTR and, for those mRNAs with a termination codon in the last exon, the length of the 3' UTR per se does not correlate with NMD sensitivity. Analysis of splicing signals reveals isoforms where NMD has been co-opted in the regulation of gene expression, though the main function of NMD seems to be ridding the transcriptome of isoforms resulting from spurious splicing events. CONCLUSIONS: Long-read sequencing enables the identification of many novel NMD-sensitive mRNAs and reveals both known and unexpected features concerning their biogenesis and their biological role. Our data provide a highly valuable resource of human NMD transcript targets for future genomic and transcriptomic applications.
Affiliation(s)
- Evangelos D Karousis
  - Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
- Foivos Gypas
  - Friedrich Miescher Institute for Biomedical Research, Maulbeerstrasse 66, 4058, Basel, Switzerland
- Mihaela Zavolan
  - Biozentrum, University of Basel and Swiss Institute of Bioinformatics, Klingelbergstrasse 50-70, 4056, Basel, Switzerland
- Oliver Mühlemann
  - Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
10
Halstead MM, Islas-Trejo A, Goszczynski DE, Medrano JF, Zhou H, Ross PJ. Large-Scale Multiplexing Permits Full-Length Transcriptome Annotation of 32 Bovine Tissues From a Single Nanopore Flow Cell. Front Genet 2021; 12:664260. [PMID: 34093657 PMCID: PMC8173071 DOI: 10.3389/fgene.2021.664260] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 02/04/2021] [Accepted: 04/06/2021] [Indexed: 12/18/2022] [Open Access]
Abstract
A comprehensive annotation of transcript isoforms in domesticated species is lacking. Especially considering that transcriptome complexity and splicing patterns are not well-conserved between species, this presents a substantial obstacle to genomic selection programs that seek to improve production, disease resistance, and reproduction. Recent advances in long-read sequencing technology have made it possible to directly extrapolate the structure of full-length transcripts without the need for transcript reconstruction. In this study, we demonstrate the power of long-read sequencing for transcriptome annotation by coupling Oxford Nanopore Technology (ONT) with large-scale multiplexing of 93 samples, comprising 32 tissues collected from adult male and female Hereford cattle. More than 30 million uniquely mapping full-length reads were obtained from a single ONT flow cell, and used to identify and characterize the expression dynamics of 99,044 transcript isoforms at 31,824 loci. Of these predicted transcripts, 21% exactly matched a reference transcript, and 61% were novel isoforms of reference genes, substantially increasing the ratio of transcript variants per gene, and suggesting that the complexity of the bovine transcriptome is comparable to that in humans. Over 7,000 transcript isoforms were extremely tissue-specific, and 61% of these were attributed to testis, which exhibited the most complex transcriptome of all interrogated tissues. Despite profiling over 30 tissues, transcription was only detected at about 60% of reference loci. Consequently, additional studies will be necessary to continue characterizing the bovine transcriptome in additional cell types, developmental stages, and physiological conditions. However, by here demonstrating the power of ONT sequencing coupled with large-scale multiplexing, the task of exhaustively annotating the bovine transcriptome - or any mammalian transcriptome - appears significantly more feasible.
Affiliation(s)
- Pablo J. Ross
  - Department of Animal Science, University of California, Davis, Davis, CA, United States
11
Ait-Hamlat A, Zea DJ, Labeeuw A, Polit L, Richard H, Laine E. Transcripts' Evolutionary History and Structural Dynamics Give Mechanistic Insights into the Functional Diversity of the JNK Family. J Mol Biol 2020; 432:2121-2140. [PMID: 32067951 DOI: 10.1016/j.jmb.2020.01.032] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/26/2019] [Revised: 01/03/2020] [Accepted: 01/28/2020] [Indexed: 12/14/2022]
Abstract
Alternative splicing and alternative initiation/termination transcription sites have the potential to greatly expand the proteome in eukaryotes by producing several transcript isoforms from the same gene. Although these mechanisms are well described at the genomic level, little is known about their contribution to protein evolution and their impact at the protein structure level. Here, we address both issues by reconstructing the evolutionary history of transcripts and by modeling the tertiary structures of the corresponding protein isoforms. We reconstruct phylogenetic forests relating 60 protein-coding transcripts from the c-Jun N-terminal kinase (JNK) family observed in seven species. We identify two alternative splicing events of ancient origin and show that they induce subtle changes in the protein's structural dynamics. We highlight a previously uncharacterized transcript whose predicted structure seems stable in solution. We further demonstrate that orphan transcripts, for which no phylogeny could be reconstructed, display peculiar sequence and structural properties. Our approach is implemented in PhyloSofS (Phylogenies of Splicing Isoforms Structures), a fully automated computational tool freely available at https://github.com/PhyloSofS-Team/PhyloSofS.
Affiliation(s)
- Adel Ait-Hamlat
  - Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), Paris, 75005, France
- Diego Javier Zea
  - Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), Paris, 75005, France
- Antoine Labeeuw
  - Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), Paris, 75005, France
- Lélia Polit
  - Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), Paris, 75005, France
- Hugues Richard
  - Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), Paris, 75005, France
- Elodie Laine
  - Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), Paris, 75005, France
12
Amarasinghe SL, Su S, Dong X, Zappia L, Ritchie ME, Gouil Q. Opportunities and challenges in long-read sequencing data analysis. Genome Biol 2020; 21:30. [PMID: 32033565 PMCID: PMC7006217 DOI: 10.1186/s13059-020-1935-5] [Citation(s) in RCA: 781] [Impact Index Per Article: 195.3] [Received: 08/28/2019] [Accepted: 01/15/2020] [Indexed: 12/11/2022] [Open Access]
Abstract
Long-read technologies are overcoming early limitations in accuracy and throughput, broadening their application domains in genomics. Dedicated analysis tools that take into account the characteristics of long-read data are thus required, but the fast pace of development of such tools can be overwhelming. To assist in the design and analysis of long-read sequencing projects, we review the current landscape of available tools and present an online interactive database, long-read-tools.org, to facilitate their browsing. We further focus on the principles of error correction, base modification detection, and long-read transcriptomics analysis and highlight the challenges that remain.
Affiliation(s)
- Shanika L. Amarasinghe
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052 Australia
- Department of Medical Biology, The University of Melbourne, Parkville, 3010 Australia
| | - Shian Su
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052 Australia
- Department of Medical Biology, The University of Melbourne, Parkville, 3010 Australia
| | - Xueyi Dong
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052 Australia
- Department of Medical Biology, The University of Melbourne, Parkville, 3010 Australia
| | - Luke Zappia
- Bioinformatics, Murdoch Children’s Research Institute, Parkville, 3052 Australia
- School of Biosciences, Faculty of Science, The University of Melbourne, Parkville, 3010 Australia
| | - Matthew E. Ritchie
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052 Australia
- Department of Medical Biology, The University of Melbourne, Parkville, 3010 Australia
- School of Mathematics and Statistics, The University of Melbourne, Parkville, 3010 Australia
| | - Quentin Gouil
- Epigenetics and Development Division, The Walter and Eliza Hall Institute of Medical Research, Parkville, 3052 Australia
- Department of Medical Biology, The University of Melbourne, Parkville, 3010 Australia
| |
|
13
|
Li R, Ren X, Ding Q, Bi Y, Xie D, Zhao Z. Direct full-length RNA sequencing reveals unexpected transcriptome complexity during Caenorhabditis elegans development. Genome Res 2020; 30:287-298. [PMID: 32024662 PMCID: PMC7050527 DOI: 10.1101/gr.251512.119] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 04/15/2019] [Accepted: 12/18/2019] [Indexed: 01/08/2023]
Abstract
Massively parallel sequencing of the polyadenylated RNAs has played a key role in delineating transcriptome complexity, including alternative use of an exon, promoter, 5′ or 3′ splice site or polyadenylation site, and RNA modification. However, reads derived from the current RNA-seq technologies are usually short and deprived of information on modification, compromising their potential in defining transcriptome complexity. Here, we applied a direct RNA sequencing method with ultralong reads using Oxford Nanopore Technologies to study the transcriptome complexity in Caenorhabditis elegans. We generated approximately six million reads using native poly(A)-tailed mRNAs from three developmental stages, with average read lengths ranging from 900 to 1100 nt. Around half of the reads represent full-length transcripts. To utilize the full-length transcripts in defining transcriptome complexity, we devised a method to classify the long reads as the same as existing transcripts or as a novel transcript using sequence mapping tracks rather than existing intron/exon structures, which allowed us to identify roughly 57,000 novel isoforms and recover at least 26,000 out of the 33,500 existing isoforms. The sets of genes with differential expression versus differential isoform usage over development are largely different, implying a fine-tuned regulation at isoform level. We also observed an unexpected increase in putative RNA modification in all bases in the coding region relative to the UTR, suggesting their possible roles in translation. The RNA reads and the method for read classification are expected to deliver new insights into RNA processing and modification and their underlying biology in the future.
Affiliation(s)
- Runsheng Li
- Department of Biology, Hong Kong Baptist University, Hong Kong, 999077, China
| | - Xiaoliang Ren
- Department of Biology, Hong Kong Baptist University, Hong Kong, 999077, China
| | - Qiutao Ding
- Department of Biology, Hong Kong Baptist University, Hong Kong, 999077, China
| | - Yu Bi
- Department of Biology, Hong Kong Baptist University, Hong Kong, 999077, China
| | - Dongying Xie
- Department of Biology, Hong Kong Baptist University, Hong Kong, 999077, China
| | - Zhongying Zhao
- Department of Biology, Hong Kong Baptist University, Hong Kong, 999077, China; State Key Laboratory of Environmental and Biological Analysis, Hong Kong Baptist University, Hong Kong, 999077, China
| |
|
14
|
Re-annotation of 191 developmental and epileptic encephalopathy-associated genes unmasks de novo variants in SCN1A. NPJ Genom Med 2019; 4:31. [PMID: 31814998 PMCID: PMC6889285 DOI: 10.1038/s41525-019-0106-7] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 05/31/2019] [Accepted: 11/01/2019] [Indexed: 12/21/2022] Open
Abstract
The developmental and epileptic encephalopathies (DEE) are a group of rare, severe neurodevelopmental disorders, where even the most thorough sequencing studies leave 60-65% of patients without a molecular diagnosis. Here, we explore the incompleteness of transcript models used for exome and genome analysis as one potential explanation for a lack of current diagnoses. Therefore, we have updated the GENCODE gene annotation for 191 epilepsy-associated genes, using human brain-derived transcriptomic libraries and other data to build 3,550 putative transcript models. Our annotations increase the transcriptional 'footprint' of these genes by over 674 kb. Using SCN1A as a case study, due to its close phenotype/genotype correlation with Dravet syndrome, we screened 122 people with Dravet syndrome or a similar phenotype with a panel of exon sequences representing eight established genes and identified two de novo SCN1A variants that now - through improved gene annotation - are ascribed to residing among our exons. These two (from 122 screened people, 1.6%) molecular diagnoses carry significant clinical implications. Furthermore, we identified a previously classified SCN1A intronic Dravet syndrome-associated variant that now lies within a deeply conserved exon. Our findings illustrate the potential gains of thorough gene annotation in improving diagnostic yields for genetic disorders.
|
15
|
Benjelloun B, Boyer F, Streeter I, Zamani W, Engelen S, Alberti A, Alberto FJ, BenBati M, Ibnelbachyr M, Chentouf M, Bechchari A, Rezaei HR, Naderi S, Stella A, Chikhi A, Clarke L, Kijas J, Flicek P, Taberlet P, Pompanon F. An evaluation of sequencing coverage and genotyping strategies to assess neutral and adaptive diversity. Mol Ecol Resour 2019; 19:1497-1515. [PMID: 31359622 PMCID: PMC7115901 DOI: 10.1111/1755-0998.13070] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 07/27/2018] [Revised: 06/30/2019] [Accepted: 07/08/2019] [Indexed: 12/12/2022]
Abstract
Whole genome sequences (WGS) greatly increase our ability to precisely infer population genetic parameters, demographic processes, and selection signatures. However, WGS may still not be affordable for a representative number of individuals/populations. In this context, our goal was to assess the efficiency of several SNP genotyping strategies by testing their ability to accurately estimate parameters describing neutral diversity and to detect signatures of selection. We analysed 110 WGS at 12× coverage for four different species, i.e., sheep, goats and their wild counterparts. From these data we generated 946 data sets corresponding to random panels of 1K to 5M variants, commercial SNP chips and exome capture, for sample sizes of five to 48 individuals. We also extracted low-coverage genome resequencing of 1×, 2× and 5× by randomly subsampling reads from the 12× resequencing data. Globally, 5K to 10K random variants were enough for an accurate estimation of genome diversity. Conversely, commercial panels and exome capture displayed strong ascertainment biases. Besides the characterization of neutral diversity, the detection of signatures of selection and the accurate estimation of linkage disequilibrium (LD) required high-density panels of at least 1M variants. Finally, genotype likelihoods increased the quality of variant calling from low-coverage resequencing, but proportions of incorrect genotypes remained substantial, especially for heterozygous sites. Whole genome resequencing coverage of at least 5× appeared to be necessary for accurate assessment of genomic variations. These results have implications for studies seeking to deploy low-density SNP collections or genome scans across genetically diverse populations/species showing similar genetic characteristics and patterns of LD decay for a wide variety of purposes.
Affiliation(s)
- Badr Benjelloun
- Univ. Grenoble-Alpes, Univ. Savoie Mont Blanc, CNRS, LECA, F-38000 Grenoble, France
- National Institute of Agronomic Research (INRA Maroc), Regional Centre of Agronomic Research, 23000 Beni-Mellal, Morocco
| | - Frédéric Boyer
- Univ. Grenoble-Alpes, Univ. Savoie Mont Blanc, CNRS, LECA, F-38000 Grenoble, France
| | - Ian Streeter
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
| | - Wahid Zamani
- Univ. Grenoble-Alpes, Univ. Savoie Mont Blanc, CNRS, LECA, F-38000 Grenoble, France
- Department of Environmental Sciences, Faculty of Natural Resources and Marine Sciences, Tarbiat Modares University, 46417-76489 Noor, Mazandaran, Iran
| | - Stefan Engelen
- CEA - Institut de biologie François-Jacob, Genoscope, 2 Rue Gaston Cremieux 91057 Evry Cedex, France
| | - Adriana Alberti
- CEA - Institut de biologie François-Jacob, Genoscope, 2 Rue Gaston Cremieux 91057 Evry Cedex, France
| | - Florian J. Alberto
- Univ. Grenoble-Alpes, Univ. Savoie Mont Blanc, CNRS, LECA, F-38000 Grenoble, France
| | - Mohamed BenBati
- National Institute of Agronomic Research (INRA Maroc), Regional Centre of Agronomic Research, 23000 Beni-Mellal, Morocco
| | - Mustapha Ibnelbachyr
- National Institute of Agronomic Research (INRA Maroc), CRRA Errachidia, 52000 Errachidia, Morocco
| | - Mouad Chentouf
- National Institute of Agronomic Research (INRA Maroc), CRRA Tangier, 90010 Tangier, Morocco
| | - Abdelmajid Bechchari
- National Institute of Agronomic Research (INRA Maroc), CRRA Oujda, 60000 Oujda, Morocco
| | - Hamid R. Rezaei
- Department of Environmental Sci, Gorgan University of Agricultural Sciences & Natural Resources, 41996-13776 Gorgan, Iran
| | - Saeid Naderi
- Environmental Sciences Department, Natural Resources Faculty, University of Guilan, 49138-15749 Guilan, Iran
| | - Alessandra Stella
- PTP Science Park, Bioinformatics Unit, Via Einstein-Loc. Cascina Codazza, 26900 Lodi, Italy
| | - Abdelkader Chikhi
- National Institute of Agronomic Research (INRA Maroc), CRRA Errachidia, 52000 Errachidia, Morocco
| | - Laura Clarke
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
| | - James Kijas
- Commonwealth Scientific and Industrial Research Organisation Animal Food and Health Sciences, St Lucia, QLD 4067, Australia
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD UK
| | - Pierre Taberlet
- Univ. Grenoble-Alpes, Univ. Savoie Mont Blanc, CNRS, LECA, F-38000 Grenoble, France
| | - François Pompanon
- Univ. Grenoble-Alpes, Univ. Savoie Mont Blanc, CNRS, LECA, F-38000 Grenoble, France
| |
|
16
|
Pervouchine D, Popov Y, Berry A, Borsari B, Frankish A, Guigó R. Integrative transcriptomic analysis suggests new autoregulatory splicing events coupled with nonsense-mediated mRNA decay. Nucleic Acids Res 2019; 47:5293-5306. [PMID: 30916337 PMCID: PMC6547761 DOI: 10.1093/nar/gkz193] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 02/01/2019] [Accepted: 03/12/2019] [Indexed: 11/12/2022] Open
Abstract
Nonsense-mediated decay (NMD) is a eukaryotic mRNA surveillance system that selectively degrades transcripts with premature termination codons (PTC). Many RNA-binding proteins (RBP) regulate their expression levels by a negative feedback loop, in which RBP binds its own pre-mRNA and causes alternative splicing to introduce a PTC. We present a bioinformatic analysis integrating three data sources, eCLIP assays for a large RBP panel, shRNA inactivation of NMD pathway, and shRNA-depletion of RBPs followed by RNA-seq, to identify novel such autoregulatory feedback loops. We show that RBPs frequently bind their own pre-mRNAs, their exons respond prominently to NMD pathway disruption, and that the responding exons are enriched with nearby eCLIP peaks. We confirm previously proposed models of autoregulation in SRSF7 and U2AF1 genes and present two novel models, in which (i) SFPQ binds its mRNA and promotes switching to an alternative distal 3'-UTR that is targeted by NMD, and (ii) RPS3 binding activates a poison 5'-splice site in its pre-mRNA that leads to a frame shift and degradation by NMD. We also suggest specific splicing events that could be implicated in autoregulatory feedback loops in RBM39, HNRNPM, and U2AF2 genes. The results are available through a UCSC Genome Browser track hub.
Affiliation(s)
- Dmitri Pervouchine
- Skolkovo Institute of Science and Technology, Ulitsa Nobelya 3, Moscow 121205, Russia
- Faculty of Bioengineering and Bioinformatics, Moscow State University, Leninskiye Gory 1-73, 119234 Moscow, Russia
| | - Yaroslav Popov
- Faculty of Bioengineering and Bioinformatics, Moscow State University, Leninskiye Gory 1-73, 119234 Moscow, Russia
| | - Andy Berry
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, CB10 1SA Hinxton, Cambridge, UK
| | - Beatrice Borsari
- Center for Genomic Regulation, The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
| | - Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, CB10 1SA Hinxton, Cambridge, UK
| | - Roderic Guigó
- Center for Genomic Regulation, The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Universitat Pompeu Fabra (UPF), Barcelona 08003, Spain
| |
|
17
|
Lopez-Perolio I, Leman R, Behar R, Lattimore V, Pearson JF, Castéra L, Martins A, Vaur D, Goardon N, Davy G, Garre P, García-Barberán V, Llovet P, Pérez-Segura P, Díaz-Rubio E, Caldés T, Hruska KS, Hsuan V, Wu S, Pesaran T, Karam R, Vallon-Christersson J, Borg A, Valenzuela-Palomo A, Velasco EA, Southey M, Vreeswijk MPG, Devilee P, Kvist A, Spurdle AB, Walker LC, Krieger S, de la Hoya M. Alternative splicing and ACMG-AMP-2015-based classification of PALB2 genetic variants: an ENIGMA report. J Med Genet 2019; 56:453-460. [PMID: 30890586 PMCID: PMC6591742 DOI: 10.1136/jmedgenet-2018-105834] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 10/26/2018] [Revised: 01/10/2019] [Accepted: 02/06/2019] [Indexed: 11/04/2022]
Abstract
BACKGROUND: PALB2 monoallelic loss-of-function germ-line variants confer a breast cancer risk comparable to the average BRCA2 pathogenic variant. Recommendations for risk reduction strategies in carriers are similar. Elaborating robust criteria to identify loss-of-function variants in PALB2, without incurring overprediction, is thus of paramount clinical relevance. Towards this aim, we have performed a comprehensive characterisation of alternative splicing in PALB2, analysing its relevance for the classification of truncating and splice site variants according to the 2015 American College of Medical Genetics and Genomics-Association for Molecular Pathology guidelines. METHODS: Alternative splicing was characterised in RNAs extracted from blood, breast and fimbriae/ovary-related human specimens (n=112). RNAseq, RT-PCR/CE and CloneSeq experiments were performed by five contributing laboratories. Centralised revision/curation was performed to assure high-quality annotations. Additional splicing analyses were performed in PALB2 c.212-1G>A, c.1684+1G>A, c.2748+2T>G, c.3113+5G>A, c.3350+1G>A, c.3350+4A>C and c.3350+5G>A carriers. The impact of the findings on PVS1 status was evaluated for truncating and splice site variants. RESULTS: We identified 88 naturally occurring alternative splicing events (81 newly described), including 4 in-frame events predicted to be relevant for evaluating the PVS1 status of splice site variants. We did not identify tissue-specific alternate gene transcripts in breast or ovarian-related samples, supporting the clinical relevance of blood-based splicing studies. CONCLUSIONS: PVS1 is not necessarily warranted for splice site variants targeting four PALB2 acceptor sites (exons 2, 5, 7 and 10). As a result, rare variants at these splice sites cannot be assumed pathogenic/likely pathogenic without further evidence. Our study raises a warning about up to five PALB2 genetic variants that are currently reported as pathogenic/likely pathogenic in ClinVar.
Affiliation(s)
- Irene Lopez-Perolio
- Molecular Oncology Laboratory CIBERONC, Hospital Clínico San Carlos, IdISSC (Instituto de Investigación Sanitaria del Hospital Clínico San Carlos), Madrid, Spain
| | - Raphaël Leman
- Laboratory of Clinical Biology and Oncology, Centre François Baclesse, Inserm U1245 Genomics and Personalized Medicine in Cancer and Neurological Disorders, Normandy University, Caen, France
| | - Raquel Behar
- Molecular Oncology Laboratory CIBERONC, Hospital Clínico San Carlos, IdISSC (Instituto de Investigación Sanitaria del Hospital Clínico San Carlos), Madrid, Spain
| | - Vanessa Lattimore
- Department of Pathology and Biomedical Science, University of Otago, Christchurch, New Zealand
| | - John F Pearson
- Department of Pathology and Biomedical Science, University of Otago, Christchurch, New Zealand
| | - Laurent Castéra
- Laboratory of Clinical Biology and Oncology, Centre François Baclesse, Inserm U1245 Genomics and Personalized Medicine in Cancer and Neurological Disorders, Normandy University, Caen, France
| | - Alexandra Martins
- Inserm U1245 Genomics and Personalized Medicine in Cancer and Neurological Disorders, UNIROUEN, Normandie Université, Normandy Centre for Genomic and Personalized Medicine, Rouen, France
| | - Dominique Vaur
- Laboratory of Clinical Biology and Oncology, Centre François Baclesse, Inserm U1245 Genomics and Personalized Medicine in Cancer and Neurological Disorders, Normandy University, Caen, France
| | - Nicolas Goardon
- Laboratory of Clinical Biology and Oncology, Centre François Baclesse, Inserm U1245 Genomics and Personalized Medicine in Cancer and Neurological Disorders, Normandy University, Caen, France
| | - Grégoire Davy
- Laboratory of Clinical Biology and Oncology, Centre François Baclesse, Inserm U1245 Genomics and Personalized Medicine in Cancer and Neurological Disorders, Normandy University, Caen, France
| | - Pilar Garre
- Molecular Oncology Laboratory CIBERONC, Hospital Clínico San Carlos, IdISSC (Instituto de Investigación Sanitaria del Hospital Clínico San Carlos), Madrid, Spain
| | - Vanesa García-Barberán
- Molecular Oncology Laboratory CIBERONC, Hospital Clínico San Carlos, IdISSC (Instituto de Investigación Sanitaria del Hospital Clínico San Carlos), Madrid, Spain
| | - Patricia Llovet
- Molecular Oncology Laboratory CIBERONC, Hospital Clínico San Carlos, IdISSC (Instituto de Investigación Sanitaria del Hospital Clínico San Carlos), Madrid, Spain
| | - Pedro Pérez-Segura
- Molecular Oncology Laboratory CIBERONC, Hospital Clínico San Carlos, IdISSC (Instituto de Investigación Sanitaria del Hospital Clínico San Carlos), Madrid, Spain
| | - Eduardo Díaz-Rubio
- Molecular Oncology Laboratory CIBERONC, Hospital Clínico San Carlos, IdISSC (Instituto de Investigación Sanitaria del Hospital Clínico San Carlos), Madrid, Spain
| | - Trinidad Caldés
- Molecular Oncology Laboratory CIBERONC, Hospital Clínico San Carlos, IdISSC (Instituto de Investigación Sanitaria del Hospital Clínico San Carlos), Madrid, Spain
| | | | | | - Sitao Wu
- Ambry Genetics, Aliso Viejo, CA, USA
| | | | | | - Johan Vallon-Christersson
- Division of Oncology and Pathology, Department of Clinical Sciences Lund, Lund University, Lund, Sweden
| | - Ake Borg
- Division of Oncology and Pathology, Department of Clinical Sciences Lund, Lund University, Lund, Sweden
| | -
- Peter MacCallum Cancer Centre, Melbourne, VIC, Australia; The Sir Peter MacCallum Department of Oncology, University of Melbourne, Parkville, Australia
| | - Alberto Valenzuela-Palomo
- Splicing and genetic susceptibility to cancer, Instituto de Biología y Genética Molecular (CSIC-UVa), Valladolid, Spain
| | - Eladio A Velasco
- Splicing and genetic susceptibility to cancer, Instituto de Biología y Genética Molecular (CSIC-UVa), Valladolid, Spain
| | - Melissa Southey
- Genetic Epidemiology Laboratory, Department of Clinical Pathology, The University of Melbourne, Melbourne, VIC, Australia
| | - Maaike P G Vreeswijk
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
| | - Peter Devilee
- Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
| | - Anders Kvist
- Division of Oncology and Pathology, Department of Clinical Sciences Lund, Lund University, Lund, Sweden
| | - Amanda B Spurdle
- Molecular Cancer Epidemiology Laboratory, QIMR Berghofer Medical Research Institute, Brisbane, Australia
| | - Logan C Walker
- Department of Pathology and Biomedical Science, University of Otago, Christchurch, New Zealand
| | - Sophie Krieger
- Laboratory of Clinical Biology and Oncology, Centre François Baclesse, Inserm U1245 Genomics and Personalized Medicine in Cancer and Neurological Disorders, Normandy University, Caen, France
| | - Miguel de la Hoya
- Molecular Oncology Laboratory CIBERONC, Hospital Clínico San Carlos, IdISSC (Instituto de Investigación Sanitaria del Hospital Clínico San Carlos), Madrid, Spain
| |
|
18
|
Liu R, Liu J, Zhao G, Li W, Zheng M, Wang J, Li Q, Cui H, Wen J. Relevance of the intestinal health-related pathways to broiler residual feed intake revealed by duodenal transcriptome profiling. Poult Sci 2019; 98:1102-1110. [DOI: 10.3382/ps/pey506] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 06/23/2018] [Accepted: 10/13/2018] [Indexed: 12/13/2022] Open
|
19
|
Wang G, Yu S, Liao J. Identification and Characterisation of Alternative Splice Variants of Hoxb9 and Their Correlation with Melanogenesis in the Black-Boned Chicken. BRAZILIAN JOURNAL OF POULTRY SCIENCE 2019. [DOI: 10.1590/1806-9061-2018-0904] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Affiliation(s)
- G Wang
- Leshan Normal University, China
| | - S Yu
- Leshan Normal University, China
| | - J Liao
- Leshan Normal University, China
| |
|
20
|
Bhuiyan SA, Ly S, Phan M, Huntington B, Hogan E, Liu CC, Liu J, Pavlidis P. Systematic evaluation of isoform function in literature reports of alternative splicing. BMC Genomics 2018; 19:637. [PMID: 30153812 PMCID: PMC6114036 DOI: 10.1186/s12864-018-5013-2] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 06/06/2018] [Accepted: 08/14/2018] [Indexed: 12/17/2022] Open
Abstract
BACKGROUND: Although most genes in mammalian genomes have multiple isoforms, an ongoing debate is whether these isoforms are all functional as well as the extent to which they increase the functional repertoire of the genome. To ground this debate in data, it would be helpful to have a corpus of experimentally-verified cases of genes which have functionally distinct splice isoforms (FDSIs). RESULTS: We established a curation framework for evaluating experimental evidence of FDSIs, and analyzed over 700 human and mouse genes, strongly biased towards genes that are prominent in the alternative splicing literature. Despite this bias, we found experimental evidence meeting the classical definition for functionally distinct isoforms for ~5% of the curated genes. If we relax our criteria for inclusion to include weaker forms of evidence, the fraction of genes with evidence of FDSIs remains low (~13%). We provide evidence that this picture will not change substantially with further curation and conclude there is a large gap between the presumed impact of splicing on gene function and the experimental evidence. Furthermore, many functionally distinct isoforms were not traceable to a specific isoform in Ensembl, a database that forms the basis for much computational research. CONCLUSIONS: We conclude that the claim that alternative splicing vastly increases the functional repertoire of the genome is an extrapolation from a limited number of empirically supported cases. We also conclude that more work is needed to integrate experimental evidence and genome annotation databases. Our work should help shape research around the role of splicing on gene function from presuming large general effects to acknowledging the need for stronger experimental evidence.
Affiliation(s)
- Shamsuddin A. Bhuiyan
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC V6T 1Z4 Canada
- Department of Psychiatry, University of British Columbia, Vancouver, BC V6T 1Z4 Canada
- Graduate Program in Bioinformatics, University of British Columbia, Vancouver, Canada
| | - Sophia Ly
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC V6T 1Z4 Canada
| | - Minh Phan
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC V6T 1Z4 Canada
| | - Brandon Huntington
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC V6T 1Z4 Canada
| | - Ellie Hogan
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC V6T 1Z4 Canada
| | - Chao Chun Liu
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC V6T 1Z4 Canada
| | - James Liu
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC V6T 1Z4 Canada
| | - Paul Pavlidis
- Michael Smith Laboratories, University of British Columbia, Vancouver, BC V6T 1Z4 Canada
- Department of Psychiatry, University of British Columbia, Vancouver, BC V6T 1Z4 Canada
| |
|
21
|
Genome-wide transcriptome analysis identifies alternative splicing regulatory network and key splicing factors in mouse and human psoriasis. Sci Rep 2018. [PMID: 29515135 PMCID: PMC5841439 DOI: 10.1038/s41598-018-22284-y] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Indexed: 02/08/2023] Open
Abstract
Psoriasis is a chronic inflammatory disease that affects the skin, nails, and joints. For understanding the mechanism of psoriasis, though, alternative splicing analysis has received relatively little attention in the field. Here, we developed and applied several computational analysis methods to study psoriasis. Using psoriasis mouse and human datasets, our differential alternative splicing analyses detected hundreds of differential alternative splicing changes. Our analysis of conservation revealed many exon-skipping events conserved between mice and humans. In addition, our splicing signature comparison analysis using the psoriasis datasets and our curated splicing factor perturbation RNA-Seq database, SFMetaDB, identified nine candidate splicing factors that may be important in regulating splicing in the psoriasis mouse model dataset. Three of the nine splicing factors were confirmed upon analyzing the human data. Our computational methods have generated predictions for the potential role of splicing in psoriasis. Future experiments on the novel candidates predicted by our computational analysis are expected to provide a better understanding of the molecular mechanism of psoriasis and to pave the way for new therapeutic treatments.
|
22
|
Tardaguila M, de la Fuente L, Marti C, Pereira C, Pardo-Palacios FJ, Del Risco H, Ferrell M, Mellado M, Macchietto M, Verheggen K, Edelmann M, Ezkurdia I, Vazquez J, Tress M, Mortazavi A, Martens L, Rodriguez-Navarro S, Moreno-Manzano V, Conesa A. SQANTI: extensive characterization of long-read transcript sequences for quality control in full-length transcriptome identification and quantification. Genome Res 2018; 28:396-411. [PMID: 29440222 PMCID: PMC5848618 DOI: 10.1101/gr.222976.117] [Citation(s) in RCA: 224] [Impact Index Per Article: 37.3] [Received: 03/17/2017] [Accepted: 01/08/2018] [Indexed: 01/15/2023]
Abstract
High-throughput sequencing of full-length transcripts using long reads has paved the way for the discovery of thousands of novel transcripts, even in well-annotated mammalian species. The advances in sequencing technology have created a need for studies and tools that can characterize these novel variants. Here, we present SQANTI, an automated pipeline for the classification of long-read transcripts that can assess the quality of data and the preprocessing pipeline using 47 unique descriptors. We apply SQANTI to a neuronal mouse transcriptome using Pacific Biosciences (PacBio) long reads and illustrate how the tool is effective in characterizing and describing the composition of the full-length transcriptome. We perform extensive evaluation of ToFU PacBio transcripts by PCR to reveal that an important number of the novel transcripts are technical artifacts of the sequencing approach and that SQANTI quality descriptors can be used to engineer a filtering strategy to remove them. Most novel transcripts in this curated transcriptome are novel combinations of existing splice sites, resulting more frequently in novel ORFs than novel UTRs, and are enriched in both general metabolic and neural-specific functions. We show that these new transcripts have a major impact in the correct quantification of transcript levels by state-of-the-art short-read-based quantification algorithms. By comparing our iso-transcriptome with public proteomics databases, we find that alternative isoforms are elusive to proteogenomics detection. SQANTI allows the user to maximize the analytical outcome of long-read technologies by providing the tools to deliver quality-evaluated and curated full-length transcriptomes.
Affiliation(s)
- Manuel Tardaguila
- Department of Microbiology and Cell Science, Institute for Food and Agricultural Sciences, Genetics Institute, University of Florida, Gainesville, Florida 32611, USA
| | - Lorena de la Fuente
- Genomics of Gene Expression Laboratory, Centro de Investigaciones Principe Felipe (CIPF), 46012 Valencia, Spain
| | - Cristina Marti
- Genomics of Gene Expression Laboratory, Centro de Investigaciones Principe Felipe (CIPF), 46012 Valencia, Spain
| | - Cécile Pereira
- Department of Microbiology and Cell Science, Institute for Food and Agricultural Sciences, Genetics Institute, University of Florida, Gainesville, Florida 32611, USA
| | | | - Hector Del Risco
- Department of Microbiology and Cell Science, Institute for Food and Agricultural Sciences, Genetics Institute, University of Florida, Gainesville, Florida 32611, USA
| | - Marc Ferrell
- Department of Microbiology and Cell Science, Institute for Food and Agricultural Sciences, Genetics Institute, University of Florida, Gainesville, Florida 32611, USA
| | | | - Marissa Macchietto
- Department of Developmental and Cell Biology, University of California, Irvine, California 92617, USA
| | - Kenneth Verheggen
- VIB-UGent Center for Medical Biotechnology, VIB, B-9000 Ghent, Belgium
- Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
| | - Mariola Edelmann
- Department of Microbiology and Cell Science, Institute for Food and Agricultural Sciences, Genetics Institute, University of Florida, Gainesville, Florida 32611, USA
| | - Iakes Ezkurdia
- Centro Nacional de Investigaciones Cardiovasculares CNIC, 28029 Madrid, Spain
| | - Jesus Vazquez
- Centro Nacional de Investigaciones Cardiovasculares CNIC, 28029 Madrid, Spain
| | - Michael Tress
- Spanish National Cancer Research Centre (CNIO), 28029 Madrid, Spain
| | - Ali Mortazavi
- Department of Developmental and Cell Biology, University of California, Irvine, California 92617, USA
| | - Lennart Martens
- VIB-UGent Center for Medical Biotechnology, VIB, B-9000 Ghent, Belgium
- Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
| | - Susana Rodriguez-Navarro
- Gene Expression and mRNA Metabolism Laboratory, CSIC, IBV, 46010 Valencia, Spain
- Gene Expression and mRNA Metabolism Laboratory, CIPF, 46012 Valencia, Spain
| | | | - Ana Conesa
- Department of Microbiology and Cell Science, Institute for Food and Agricultural Sciences, Genetics Institute, University of Florida, Gainesville, Florida 32611, USA
- Genomics of Gene Expression Laboratory, Centro de Investigaciones Principe Felipe (CIPF), 46012 Valencia, Spain
|
23
|
Tranchevent LC, Aubé F, Dulaurier L, Benoit-Pilven C, Rey A, Poret A, Chautard E, Mortada H, Desmet FO, Chakrama FZ, Moreno-Garcia MA, Goillot E, Janczarski S, Mortreux F, Bourgeois CF, Auboeuf D. Identification of protein features encoded by alternative exons using Exon Ontology. Genome Res 2017; 27:1087-1097. [PMID: 28420690 PMCID: PMC5453322 DOI: 10.1101/gr.212696.116] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 07/13/2016] [Accepted: 03/28/2017] [Indexed: 12/16/2022]
Abstract
Transcriptomic genome-wide analyses demonstrate massive variation of alternative splicing in many physiological and pathological situations. One major challenge is now to establish the biological contribution of alternative splicing variation in physiological- or pathological-associated cellular phenotypes. Toward this end, we developed a computational approach, named “Exon Ontology,” based on terms corresponding to well-characterized protein features organized in an ontology tree. Exon Ontology is conceptually similar to Gene Ontology-based approaches but focuses on exon-encoded protein features instead of gene level functional annotations. Exon Ontology describes the protein features encoded by a selected list of exons and looks for potential Exon Ontology term enrichment. By applying this strategy to exons that are differentially spliced between epithelial and mesenchymal cells and after extensive experimental validation, we demonstrate that Exon Ontology provides support to discover specific protein features regulated by alternative splicing. We also show that Exon Ontology helps to unravel biological processes that depend on suites of coregulated alternative exons, as we uncovered a role of epithelial cell-enriched splicing factors in the AKT signaling pathway and of mesenchymal cell-enriched splicing factors in driving splicing events impacting on autophagy. Freely available on the web, Exon Ontology is the first computational resource that allows getting a quick insight into the protein features encoded by alternative exons and investigating whether coregulated exons contain the same biological information.
Affiliation(s)
- Léon-Charles Tranchevent
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
| | - Fabien Aubé
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
| | - Louis Dulaurier
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
| | - Clara Benoit-Pilven
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
| | - Amandine Rey
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
| | - Arnaud Poret
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
| | - Emilie Chautard
- Laboratoire de Biométrie et Biologie Évolutive, Université Lyon 1, UMR CNRS 5558, INRIA Erable, Villeurbanne, F-69622, France
| | - Hussein Mortada
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
| | - François-Olivier Desmet
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
| | - Fatima Zahra Chakrama
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
| | - Maira Alejandra Moreno-Garcia
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
| | - Evelyne Goillot
- Institut NeuroMyoGène, CNRS UMR 5310, INSERM U1217, Université Lyon 1, Lyon, F-69007 France
| | - Stéphane Janczarski
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
| | - Franck Mortreux
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
| | - Cyril F Bourgeois
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
| | - Didier Auboeuf
- Université Lyon 1, ENS de Lyon, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, F-69007, Lyon, France
|
24
|
Human NDE1 splicing and mammalian brain development. Sci Rep 2017; 7:43504. [PMID: 28266585 PMCID: PMC5339911 DOI: 10.1038/srep43504] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 07/01/2016] [Accepted: 01/27/2017] [Indexed: 11/09/2022]
Abstract
Exploring genetic and molecular differences between humans and other closely related species may be the key to explaining the uniqueness of our brain and the selective pressures under which it evolves. Recent discoveries unveiled the involvement of Nuclear distribution factor E-homolog 1 (NDE1) in human cerebral cortical neurogenesis and suggested a role in brain evolution; however, the evolutionary changes involved have not been investigated. NDE1 has a different gene structure in human and mouse, resulting in the production of diverse splicing isoforms. In particular, mouse uses the terminal exon 8 T, while human uses terminal exon 9, which is absent in rodents. Using a chimeric minigene splicing assay, we investigated the unique elements regulating NDE1 terminal exon choice. We found that selection of the terminal exon is regulated in a cell-dependent manner and relies on gain/loss of splicing regulatory sequences across the exons. Our results show how evolutionary changes in cis- as well as trans-acting signals have played a fundamental role in determining NDE1 species-specific splicing isoforms, supporting the notion that alternative splicing plays a central role in human genome evolution, and possibly human cognitive predominance.
|
25
|
Abstract
A genome sequence is worthless if it cannot be deciphered; therefore, efforts to describe - or 'annotate' - genes began as soon as DNA sequences became available. Whereas early work focused on individual protein-coding genes, the modern genomic ocean is a complex maelstrom of alternative splicing, non-coding transcription and pseudogenes. Scientists - from clinicians to evolutionary biologists - need to navigate these waters, and this has led to the design of high-throughput, computationally driven annotation projects. The catalogues that are being produced are key resources for genome exploration, especially as they become integrated with expression, epigenomic and variation data sets. Their creation, however, remains challenging.
Affiliation(s)
- Jonathan M Mudge
- Department of Computational Genomics, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK
| | - Jennifer Harrow
- Department of Computational Genomics, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK; Illumina Cambridge Ltd, Chesterford Research Park, Little Chesterford, Saffron Walden CB10 1XL, UK
|
26
|
Tress ML, Abascal F, Valencia A. Alternative Splicing May Not Be the Key to Proteome Complexity. Trends Biochem Sci 2016; 42:98-110. [PMID: 27712956 DOI: 10.1016/j.tibs.2016.08.008] [Citation(s) in RCA: 225] [Impact Index Per Article: 28.1] [Received: 03/16/2016] [Revised: 05/19/2016] [Accepted: 08/15/2016] [Indexed: 12/21/2022]
Abstract
Alternative splicing is commonly believed to be a major source of cellular protein diversity. However, although many thousands of alternatively spliced transcripts are routinely detected in RNA-seq studies, reliable large-scale mass spectrometry-based proteomics analyses identify only a small fraction of annotated alternative isoforms. The clearest finding from proteomics experiments is that most human genes have a single main protein isoform, while those alternative isoforms that are identified tend to be the most biologically plausible: those with the most cross-species conservation and those that do not compromise functional domains. Indeed, most alternative exons do not seem to be under selective pressure, suggesting that a large majority of predicted alternative transcripts may not even be translated into proteins.
Affiliation(s)
- Michael L Tress
- Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Melchor Fernández Almagro, 3, 28029 Madrid, Spain
| | - Federico Abascal
- Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Melchor Fernández Almagro, 3, 28029 Madrid, Spain; Human Genetics Department, Sandhu Group, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridge CB10 1SA, UK
| | - Alfonso Valencia
- Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Melchor Fernández Almagro, 3, 28029 Madrid, Spain; National Bioinformatics Institute (INB), Spanish National Cancer Research Centre (CNIO), Melchor Fernández Almagro, 3, 28029 Madrid, Spain.
|
27
|
Xu Z, Ji C, Zhang Y, Zhang Z, Nie Q, Xu J, Zhang D, Zhang X. Combination analysis of genome-wide association and transcriptome sequencing of residual feed intake in quality chickens. BMC Genomics 2016; 17:594. [PMID: 27506765 PMCID: PMC4979145 DOI: 10.1186/s12864-016-2861-5] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Received: 01/15/2016] [Accepted: 06/29/2016] [Indexed: 01/07/2023]
Abstract
Background: Residual feed intake (RFI) is a powerful indicator for energy utilization efficiency and responds to selection. Low RFI selection enables a reduction in feed intake without affecting growth performance. However, the effective variants or major genes dedicated to phenotypic differences in RFI in quality chickens are unclear. Therefore, a genome-wide association study (GWAS) and RNA sequencing were performed on RFI to identify genetic variants and potential candidate genes associated with energy improvement. Results: A lower average daily feed intake was found in low-RFI birds compared to high-RFI birds. The heritability of RFI measured from 44 to 83 d of age was 0.35. GWAS showed that 32 of the significant single nucleotide polymorphisms (SNPs) associated with the RFI (P < 10⁻⁴) accounted for 53.01% of the additive genetic variance. More than half of the effective SNPs were located in a 1 Mb region (16.3–17.3 Mb) of chicken (Gallus gallus) chromosome (GGA) 12. Thus, focusing on this region should enable a deeper understanding of energy utilization. RNA sequencing was performed to profile the liver transcriptomes of four male chickens selected from the high and low tails of the RFI. One hundred and sixteen unique genes were identified as differentially expressed genes (DEGs). Some of these genes were relevant to appetite, cell activities, and fat metabolism, such as CCKAR, HSP90B1, and PCK1. Some potential genes within the 500 Kb flanking region of the significant RFI-related SNPs detected in GWAS (i.e., MGP, HIST1H110, HIST1H2A4L3, OC3, NR0B2, PER2, ST6GALNAC2, and G0S2) were also identified as DEGs in chickens with divergent RFIs. Conclusions: The GWAS findings showed that the 1 Mb narrow region of GGA12 should be important because it contained genes involved in energy-consuming processes, such as lipogenesis, social behavior, and immunity. Similar results were obtained in the transcriptome sequencing experiments. In general, low-RFI birds seemed to optimize energy employment by reducing energy expenditure in cell activities, immune responses, and physical activity compared to eating.
Affiliation(s)
- Zhenqiang Xu
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding and Key Lab of Chicken Genetics, Breeding and Reproduction, Ministry of Agriculture, Guangzhou, 510642, Guangdong Province, China; Wen's Nanfang Poultry Breeding Co. Ltd, Guangdong Province, Yunfu, 527400, China
| | - Congliang Ji
- Wen's Nanfang Poultry Breeding Co. Ltd, Guangdong Province, Yunfu, 527400, China
| | - Yan Zhang
- Wen's Nanfang Poultry Breeding Co. Ltd, Guangdong Province, Yunfu, 527400, China
| | - Zhe Zhang
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding and Key Lab of Chicken Genetics, Breeding and Reproduction, Ministry of Agriculture, Guangzhou, 510642, Guangdong Province, China
| | - Qinghua Nie
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding and Key Lab of Chicken Genetics, Breeding and Reproduction, Ministry of Agriculture, Guangzhou, 510642, Guangdong Province, China
| | - Jiguo Xu
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding and Key Lab of Chicken Genetics, Breeding and Reproduction, Ministry of Agriculture, Guangzhou, 510642, Guangdong Province, China
| | - Dexiang Zhang
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding and Key Lab of Chicken Genetics, Breeding and Reproduction, Ministry of Agriculture, Guangzhou, 510642, Guangdong Province, China; Wen's Nanfang Poultry Breeding Co. Ltd, Guangdong Province, Yunfu, 527400, China
| | - Xiquan Zhang
- Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding and Key Lab of Chicken Genetics, Breeding and Reproduction, Ministry of Agriculture, Guangzhou, 510642, Guangdong Province, China.
|
28
|
Lin J, Hu Y, Nunez S, Foulkes AS, Cieply B, Xue C, Gerelus M, Li W, Zhang H, Rader DJ, Musunuru K, Li M, Reilly MP. Transcriptome-Wide Analysis Reveals Modulation of Human Macrophage Inflammatory Phenotype Through Alternative Splicing. Arterioscler Thromb Vasc Biol 2016; 36:1434-47. [PMID: 27230130 PMCID: PMC4919157 DOI: 10.1161/atvbaha.116.307573] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 03/17/2016] [Accepted: 05/17/2016] [Indexed: 12/20/2022]
Abstract
OBJECTIVE: Human macrophages can shift phenotype across the inflammatory M1 and reparative M2 spectrum in response to environmental challenges, but the mechanisms promoting inflammatory and cardiometabolic disease-associated M1 phenotypes remain incompletely understood. Alternative splicing (AS) is emerging as an important regulator of cellular function, yet its role in macrophage activation is largely unknown. We investigated the extent to which AS occurs in M1 activation within the cardiometabolic disease context and validated a functional genomic cell model for studying human macrophage-related AS events. APPROACH AND RESULTS: From deep RNA-sequencing of resting, M1, and M2 primary human monocyte-derived macrophages, we found 3860 differentially expressed genes in M1 activation and detected 233 M1-induced AS events; the majority of AS events were cell- and M1-specific with enrichment for pathways relevant to macrophage inflammation. Using genetic variant data for 10 cardiometabolic traits, we identified 28 trait-associated variants within the genomic loci of 21 alternatively spliced genes and 15 variants within 7 differentially expressed regulatory splicing factors in M1 activation. Knockdown of 1 such splicing factor, CELF1, in primary human macrophages led to increased inflammatory response to M1 stimulation, demonstrating CELF1's potential modulation of the M1 phenotype. Finally, we demonstrated that an induced pluripotent stem cell-derived macrophage system recapitulates M1-associated AS events and provides a high-fidelity macrophage AS model. CONCLUSIONS: AS plays a role in defining macrophage phenotype in a cell- and stimulus-specific fashion. Alternatively spliced genes and splicing factors with trait-associated variants may reveal novel pathways and targets in cardiometabolic diseases.
Affiliation(s)
- Jennie Lin, Yu Hu, Sara Nunez, Andrea S Foulkes, Benjamin Cieply, Chenyi Xue, Mark Gerelus, Wenjun Li, Hanrui Zhang, Daniel J Rader, Kiran Musunuru, Mingyao Li, Muredach P Reilly
- From the Renal, Electrolyte, and Hypertension Division, Department of Medicine, Perelman School of Medicine (J.L.), Department of Biostatistics and Epidemiology (Y.H., M.L.), Department of Genetics, Perelman School of Medicine (B.C., K.M., D.J.R.), and Cardiovascular Institute, Department of Medicine, Perelman School of Medicine (M.G., W.L., K.M.), University of Pennsylvania, Philadelphia; Irving Institute for Clinical and Translational Research (M.P.R.) and Division of Cardiology, Department of Medicine (C.X., H.Z., M.P.R.), Columbia University Medical Center, New York, NY; and Department of Mathematics and Statistics, Mount Holyoke College, South Hadley, MA (S.N., A.S.F.).
|
29
|
Shapiro JA. Nothing in Evolution Makes Sense Except in the Light of Genomics: Read-Write Genome Evolution as an Active Biological Process. BIOLOGY 2016; 5:E27. [PMID: 27338490 PMCID: PMC4929541 DOI: 10.3390/biology5020027] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 02/12/2016] [Revised: 05/20/2016] [Accepted: 06/02/2016] [Indexed: 01/15/2023]
Abstract
The 21st century genomics-based analysis of evolutionary variation reveals a number of novel features impossible to predict when Dobzhansky and other evolutionary biologists formulated the neo-Darwinian Modern Synthesis in the middle of the last century. These include three distinct realms of cell evolution; symbiogenetic fusions forming eukaryotic cells with multiple genome compartments; horizontal organelle, virus and DNA transfers; functional organization of proteins as systems of interacting domains subject to rapid evolution by exon shuffling and exonization; distributed genome networks integrated by mobile repetitive regulatory signals; and regulation of multicellular development by non-coding lncRNAs containing repetitive sequence components. Rather than single gene traits, all phenotypes involve coordinated activity by multiple interacting cell molecules. Genomes contain abundant and functional repetitive components in addition to the unique coding sequences envisaged in the early days of molecular biology. Combinatorial coding, plus the biochemical abilities cells possess to rearrange DNA molecules, constitute a powerful toolbox for adaptive genome rewriting. That is, cells possess "Read-Write Genomes" they alter by numerous biochemical processes capable of rapidly restructuring cellular DNA molecules. Rather than viewing genome evolution as a series of accidental modifications, we can now study it as a complex biological process of active self-modification.
Affiliation(s)
- James A Shapiro
- Department of Biochemistry and Molecular Biology, University of Chicago, GCIS W123B, 979 E. 57th Street, Chicago, IL 60637, USA.
|
30
|
Barson G, Griffiths E. SeqTools: visual tools for manual analysis of sequence alignments. BMC Res Notes 2016; 9:39. [PMID: 26801397 PMCID: PMC4724122 DOI: 10.1186/s13104-016-1847-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 01/09/2015] [Accepted: 01/08/2016] [Indexed: 11/23/2022]
Abstract
Background: Manual annotation is essential to create high-quality reference alignments and annotation. Annotators need to be able to view sequence alignments in detail. The SeqTools package provides three tools for viewing different types of sequence alignment: Blixem is a many-to-one browser of pairwise alignments, displaying multiple match sequences aligned against a single reference sequence; Dotter provides a graphical dot-plot view of a single pairwise alignment; and Belvu is a multiple sequence alignment viewer, editor, and phylogenetic tool. These tools were originally part of the AceDB genome database system but have been completely rewritten to make them generally available as a standalone package of greatly improved function. Findings: Blixem is used by annotators to give a detailed view of the evidence for particular gene models. Blixem displays the gene model positions and the match sequences aligned against the genomic reference sequence. Annotators use this for many reasons, including to check the quality of an alignment, to find missing/misaligned sequence and to identify splice sites and polyA sites and signals. Dotter is used to give a dot-plot representation of a particular pairwise alignment. This is used to identify sequence that is not represented (or is misrepresented) and to quickly compare annotated gene models with transcriptional and protein evidence that putatively supports them. Belvu is used to analyse conservation patterns in multiple sequence alignments and to perform a combination of manual and automatic processing of the alignment. High-quality reference alignments are essential if they are to be used as a starting point for further automatic alignment generation. Conclusions: While there are many different alignment tools available, the SeqTools package provides unique functionality that annotators have found to be essential for analysing sequence alignments as part of the manual annotation process.
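As a rough illustration of the dot-plot idea mentioned in this abstract, the sketch below marks every position where a short window of one sequence exactly matches a window of another. It is a simplified toy and not Dotter's actual algorithm; the function names `dot_plot` and `render`, the exact-match rule, and the default window of 3 are all assumptions made for illustration.

```python
def dot_plot(seq_a, seq_b, word=3):
    """Return a boolean matrix: cell (i, j) is True when the word-length
    window of seq_a starting at i equals the window of seq_b starting at j.
    Exact matching is an illustrative simplification (hypothetical helper)."""
    rows = len(seq_a) - word + 1
    cols = len(seq_b) - word + 1
    return [
        [seq_a[i:i + word] == seq_b[j:j + word] for j in range(cols)]
        for i in range(rows)
    ]


def render(matrix):
    """Render the matrix as text, '*' for a match and '.' otherwise."""
    return "\n".join(
        "".join("*" if cell else "." for cell in row) for row in matrix
    )


if __name__ == "__main__":
    # Identical sequences give an unbroken main diagonal.
    print(render(dot_plot("ACGTAC", "ACGTAC")))
```

In a plot like this, an unbroken diagonal indicates a shared segment, while gaps or shifted diagonals suggest missing or rearranged sequence, which is the kind of pattern the abstract says annotators look for.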
Electronic supplementary material The online version of this article (doi:10.1186/s13104-016-1847-3) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Gemma Barson
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
| | - Ed Griffiths
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
| |
|
31
|
Zhou K, Salamov A, Kuo A, Aerts AL, Kong X, Grigoriev IV. Alternative splicing acting as a bridge in evolution. Stem Cell Investig 2015; 2:19. [PMID: 27358887 DOI: 10.3978/j.issn.2306-9759.2015.10.01] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 10/12/2015] [Accepted: 10/15/2015] [Indexed: 12/15/2022]
Abstract
BACKGROUND Alternative splicing (AS) regulates diverse cellular and developmental functions through alternative protein structures of different isoforms. Alternative exons dominate AS in vertebrates; however, very little is known about the extent and function of AS in lower eukaryotes. To understand the role of introns in gene evolution, we examined AS from a green algal and five fungal genomes using a novel EST-based gene-modeling algorithm (COMBEST). METHODS AS from each genome was classified with COMBEST, which maps EST sequences to genomes to build gene models. Various aspects of AS were analyzed through statistical methods. The interplay of intron 3n length, phase, coding property, and intron retention (RI) was examined with Chi-square testing. RESULTS With 3 to 834 times EST coverage, we identified up to 73% of AS in intron-containing genes and found a preponderance of RI among 11 types of AS. The number of exons, expression level, and maximum intron length correlated with number of AS per gene (NAG), and intron-rich genes suppressed AS. Genes with AS were more ancient, and AS was conserved among fungal genomes. Among stopless introns, non-retained introns (NRI) avoided, but major RI preferred, 3n length. In contrast, stop-containing introns showed uniform distribution among 3n, 3n+1, and 3n+2 lengths. We found a clue to the intron phase enigma: it was the coding function of introns involved in AS that dictates the intron phase bias. CONCLUSIONS The majority of AS is non-functional, and the extent of AS is suppressed for intron-rich genes. RI through 3n length, stop codon, and phase bias bridges the transition from functionless to functional alternative isoforms.
Affiliation(s)
- Kemin Zhou
- 1 US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA ; 2 Roche Molecular Diagnostics, 4300 Hacienda Drive, Pleasanton, CA 94588, USA ; 3 Department of Clinical Medicine, Kunming University of Science and Technology, Kunming 650031, China
| | - Asaf Salamov
- 1 US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA ; 2 Roche Molecular Diagnostics, 4300 Hacienda Drive, Pleasanton, CA 94588, USA ; 3 Department of Clinical Medicine, Kunming University of Science and Technology, Kunming 650031, China
| | - Alan Kuo
- 1 US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA ; 2 Roche Molecular Diagnostics, 4300 Hacienda Drive, Pleasanton, CA 94588, USA ; 3 Department of Clinical Medicine, Kunming University of Science and Technology, Kunming 650031, China
| | - Andrea L Aerts
- 1 US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA ; 2 Roche Molecular Diagnostics, 4300 Hacienda Drive, Pleasanton, CA 94588, USA ; 3 Department of Clinical Medicine, Kunming University of Science and Technology, Kunming 650031, China
| | - Xiangyang Kong
- 1 US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA ; 2 Roche Molecular Diagnostics, 4300 Hacienda Drive, Pleasanton, CA 94588, USA ; 3 Department of Clinical Medicine, Kunming University of Science and Technology, Kunming 650031, China
| | - Igor V Grigoriev
- 1 US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA ; 2 Roche Molecular Diagnostics, 4300 Hacienda Drive, Pleasanton, CA 94588, USA ; 3 Department of Clinical Medicine, Kunming University of Science and Technology, Kunming 650031, China
| |
|
32
|
Tovar-Corona JM, Castillo-Morales A, Chen L, Olds BP, Clark JM, Reynolds SE, Pittendrigh BR, Feil EJ, Urrutia AO. Alternative Splice in Alternative Lice. Mol Biol Evol 2015; 32:2749-59. [PMID: 26169943 PMCID: PMC4576711 DOI: 10.1093/molbev/msv151] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Indexed: 02/02/2023]
Abstract
Genomic and transcriptomic analyses have revealed human head and body lice to be almost genetically identical; although con-specific, they nevertheless occupy distinct ecological niches and have differing feeding patterns. Most importantly, while head lice are not known to be vector competent, body lice can transmit three serious bacterial diseases: epidemic typhus, trench fever, and relapsing fever. In order to gain insights into the molecular bases for these differences, we analyzed alternative splicing (AS) using next-generation sequencing data for one strain of head lice and one strain of body lice. We identified a total of 3,598 AS events which were head or body lice specific. Exon skipping AS events were overrepresented among both head and body lice, whereas intron retention events were underrepresented in both. However, both the enrichment of exon skipping and the underrepresentation of intron retention are significantly stronger in body lice compared with head lice. Genes containing body louse-specific AS events were found to be significantly enriched for functions associated with development of the nervous system, salivary gland, trachea, and ovarian follicle cells, as well as regulation of transcription. In contrast, no functional categories were overrepresented among genes with head louse-specific AS events. Together, our results constitute the first evidence for transcript pool differences in head and body lice, providing insights into molecular adaptations that enabled human lice to adapt to clothing, and representing a powerful illustration of the pivotal role AS can play in functional adaptation.
Affiliation(s)
- Jaime M Tovar-Corona
- Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom Milner Centre, University of Bath, Bath, UK
| | - Atahualpa Castillo-Morales
- Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom Milner Centre, University of Bath, Bath, UK
| | - Lu Chen
- Human Genetics, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, United Kingdom
| | - Brett P Olds
- Department of Animal Biology, University of Illinois at Urbana-Champaign Department of Biological Sciences, University of Notre Dame
| | - John M Clark
- Department of Veterinary & Animal Science, University of Massachusetts, Amherst
| | - Stuart E Reynolds
- Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
| | | | - Edward J Feil
- Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom Milner Centre, University of Bath, Bath, UK
| | - Araxi O Urrutia
- Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom Milner Centre, University of Bath, Bath, UK
| |
|
33
|
Mort M, Carlisle FA, Waite AJ, Elliston L, Allen ND, Jones L, Hughes AC. Huntingtin Exists as Multiple Splice Forms in Human Brain. J Huntingtons Dis 2015; 4:161-71. [DOI: 10.3233/jhd-150151] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 01/17/2023]
Affiliation(s)
- Matthew Mort
- Institute of Medical Genetics, Cardiff University, Heath Park, Cardiff, UK
| | - Francesca A. Carlisle
- Institute of Psychological Medicine and Clinical Neuroscience, MRC Centre for Neuropsychiatric Genetics and Genomics, Hadyn Ellis Building, Cardiff University, UK
| | - Adrian J. Waite
- Institute of Psychological Medicine and Clinical Neuroscience, MRC Centre for Neuropsychiatric Genetics and Genomics, Hadyn Ellis Building, Cardiff University, UK
| | - Lyn Elliston
- Institute of Psychological Medicine and Clinical Neuroscience, MRC Centre for Neuropsychiatric Genetics and Genomics, Hadyn Ellis Building, Cardiff University, UK
| | - Nicholas D. Allen
- Cardiff School of Biosciences, Sir Martin Evans Building, Museum Avenue, Cardiff, UK
| | - Lesley Jones
- Institute of Psychological Medicine and Clinical Neuroscience, MRC Centre for Neuropsychiatric Genetics and Genomics, Hadyn Ellis Building, Cardiff University, UK
| | - Alis C. Hughes
- Institute of Psychological Medicine and Clinical Neuroscience, MRC Centre for Neuropsychiatric Genetics and Genomics, Hadyn Ellis Building, Cardiff University, UK
| |
|
34
|
Abascal F, Ezkurdia I, Rodriguez-Rivas J, Rodriguez JM, del Pozo A, Vázquez J, Valencia A, Tress ML. Alternatively Spliced Homologous Exons Have Ancient Origins and Are Highly Expressed at the Protein Level. PLoS Comput Biol 2015; 11:e1004325. [PMID: 26061177 PMCID: PMC4465641 DOI: 10.1371/journal.pcbi.1004325] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.1] [Received: 12/16/2014] [Accepted: 05/08/2015] [Indexed: 11/19/2022]
Abstract
Alternative splicing of messenger RNA can generate a wide variety of mature RNA transcripts, and these transcripts may produce protein isoforms with diverse cellular functions. While there is much supporting evidence for the expression of alternative transcripts, the same is not true for the alternatively spliced protein products. Large-scale mass spectroscopy experiments have identified evidence of alternative splicing at the protein level, but with conflicting results. Here we carried out a rigorous analysis of the peptide evidence from eight large-scale proteomics experiments to assess the scale of alternative splicing that is detectable by high-resolution mass spectroscopy. We find fewer splice events than would be expected: we identified peptides for almost 64% of human protein coding genes, but detected just 282 splice events. This data suggests that most genes have a single dominant isoform at the protein level. Many of the alternative isoforms that we could identify were only subtly different from the main splice isoform. Very few of the splice events identified at the protein level disrupted functional domains, in stark contrast to the two thirds of splice events annotated in the human genome that would lead to the loss or damage of functional domains. The most striking result was that more than 20% of the splice isoforms we identified were generated by substituting one homologous exon for another. This is significantly more than would be expected from the frequency of these events in the genome. These homologous exon substitution events were remarkably conserved—all the homologous exons we identified evolved over 460 million years ago—and eight of the fourteen tissue-specific splice isoforms we identified were generated from homologous exons. The combination of proteomics evidence, ancient origin and tissue-specific splicing indicates that isoforms generated from homologous exons may have important cellular roles. 
Alternative splicing is thought to be one means for generating the protein diversity necessary for the whole range of cellular functions. While the presence of alternatively spliced transcripts in the cell has been amply demonstrated, the same cannot be said for alternatively spliced proteins. The quest for alternative protein isoforms has focused primarily on the analysis of peptides from large-scale mass spectroscopy experiments, but evidence for alternative isoforms has been patchy and contradictory. A careful analysis of the peptide evidence is needed to fully understand the scale of alternative splicing detectable at the protein level. Here we analysed peptides from eight large-scale data sets, identifying just 282 splice events among 12,716 genes. This suggests that most genes have a single dominant isoform. Many of the alternative isoforms that we identified were only subtly different from the main splice variant, and one in five was generated by substitution of homologous exons by swapping one related exon for another. Remarkably, the alternative isoforms generated from homologous exons were highly conserved, first appearing 460 million years ago, and several appear to have tissue-specific roles in the brain and heart. Our results suggest that these particular isoforms are likely to have important cellular roles.
Affiliation(s)
- Federico Abascal
- Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Iakes Ezkurdia
- Unidad de Proteómica, Centro Nacional de Investigaciones Cardiovasculares (CNIC), Madrid, Spain
| | - Juan Rodriguez-Rivas
- Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Jose Manuel Rodriguez
- National Bioinformatics Institute (INB), Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Angela del Pozo
- Instituto de Genetica Medica y Molecular, Hospital Universitario La Paz, Madrid, Spain
| | - Jesús Vázquez
- Laboratorio de Proteómica Cardiovascular, Centro Nacional de Investigaciones Cardiovasculares (CNIC) Madrid, Spain
| | - Alfonso Valencia
- Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- National Bioinformatics Institute (INB), Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Michael L. Tress
- Structural Biology and Bioinformatics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| |
|
35
|
Jing R, Sun J, Wang Y, Li M. Domain position prediction based on sequence information by using fuzzy mean operator. Proteins 2015; 83:1462-9. [PMID: 26009844 DOI: 10.1002/prot.24833] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2015] [Revised: 04/23/2015] [Accepted: 05/17/2015] [Indexed: 11/09/2022]
Abstract
The prediction of protein domain regions is valuable for the study of protein structure and function. In this study, we propose a new method, composed of a fuzzy mean operator and region division, to predict the positions of domains in a target protein based on its sequence. The whole sequence is aligned and scored using the fuzzy mean operator, and the final domain region positions are determined by region division. A published benchmark is used for comparison with previous studies. In addition, we generate two extra datasets to examine the stability of this method. Finally, the prediction accuracy achieved by our method on the independent test dataset was up to 84.13%. We hope that this method will be useful for related research.
Affiliation(s)
- Runyu Jing
- Chemical Information Center (CIC), College of Chemistry, Sichuan University, Chengdu, 610064, China
| | - Jing Sun
- Chemical Information Center (CIC), College of Chemistry, Sichuan University, Chengdu, 610064, China
| | - Yuelong Wang
- Chemical Information Center (CIC), College of Chemistry, Sichuan University, Chengdu, 610064, China
| | - Menglong Li
- Chemical Information Center (CIC), College of Chemistry, Sichuan University, Chengdu, 610064, China
| |
|
36
|
Rodriguez JM, Carro A, Valencia A, Tress ML. APPRIS WebServer and WebServices. Nucleic Acids Res 2015; 43:W455-9. [PMID: 25990727 PMCID: PMC4489225 DOI: 10.1093/nar/gkv512] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 02/27/2015] [Accepted: 05/05/2015] [Indexed: 01/08/2023]
Abstract
This paper introduces the APPRIS WebServer (http://appris.bioinfo.cnio.es) and WebServices (http://apprisws.bioinfo.cnio.es). Both the web server and the web services are based around the APPRIS Database, a database that presently houses annotations of splice isoforms for five different vertebrate genomes. The APPRIS WebServer and WebServices provide access to the computational methods implemented in the APPRIS Database, while the APPRIS WebServices also allow retrieval of the annotations. The APPRIS WebServer and WebServices annotate splice isoforms with protein structural and functional features, and with data from cross-species alignments. In addition, they can use the annotations of structure, function and conservation to select a single reference isoform for each protein-coding gene (the principal protein isoform). APPRIS principal isoforms have been shown to agree overwhelmingly with the main protein isoform detected in proteomics experiments. The APPRIS WebServer allows for the annotation of splice isoforms for individual genes, and provides a range of visual representations and tools to allow researchers to identify the likely effect of splicing events. The APPRIS WebServices permit users to generate annotations automatically in high throughput mode and to interrogate the annotations in the APPRIS Database. The APPRIS WebServices have been implemented using REST architecture to be flexible, modular and automatic.
Affiliation(s)
- Jose Manuel Rodriguez
- Spanish National Bioinformatics Institute (INB), Spanish National Cancer Research Centre (CNIO), Madrid 28029, Spain
| | - Angel Carro
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid 28029, Spain
| | - Alfonso Valencia
- Spanish National Bioinformatics Institute (INB), Spanish National Cancer Research Centre (CNIO), Madrid 28029, Spain Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid 28029, Spain
| | - Michael L Tress
- Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid 28029, Spain
| |
|
37
|
Abascal F, Tress ML, Valencia A. The evolutionary fate of alternatively spliced homologous exons after gene duplication. Genome Biol Evol 2015; 7:1392-403. [PMID: 25931610 PMCID: PMC4494069 DOI: 10.1093/gbe/evv076] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Indexed: 12/25/2022]
Abstract
Alternative splicing and gene duplication are the two main processes responsible for expanding protein functional diversity. Although gene duplication can generate new genes and alternative splicing can introduce variation through alternative gene products, the interplay between the two processes is complex and poorly understood. Here, we have carried out a study of the evolution of alternatively spliced exons after gene duplication to better understand the interaction between the two processes. We created a manually curated set of 97 human genes with mutually exclusively spliced homologous exons and analyzed the evolution of these exons across five distantly related vertebrates (lamprey, spotted gar, zebrafish, fugu, and coelacanth). Most of these exons had an ancient origin (more than 400 Ma). We found examples supporting two extreme evolutionary models for the behaviour of homologous exons after gene duplication. We observed 11 events in which gene duplication was accompanied by splice isoform separation, that is, each paralog specifically conserved just one distinct ancestral homologous exon. At the other extreme, we identified genes in which the homologous exons were always conserved within paralogs, suggesting that the alternative splicing event cannot easily be separated from the function in these genes. That many homologous exons fall in between these two extremes highlights the diversity of biological systems and suggests that the subtle balance between alternative splicing and gene duplication is adjusted to the specific cellular context of each gene.
Affiliation(s)
- Federico Abascal
- Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Michael L Tress
- Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Alfonso Valencia
- Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| |
|
38
|
Thompson B, Martins A, Spurdle A. A review of mismatch repair gene transcripts: issues for interpretation of mRNA splicing assays. Clin Genet 2014; 87:100-8. [DOI: 10.1111/cge.12450] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 03/20/2014] [Revised: 06/17/2014] [Accepted: 06/24/2014] [Indexed: 12/21/2022]
Affiliation(s)
- B.A. Thompson
- Department of Genetics and Computational Biology; QIMR Berghofer Medical Research Institute; Brisbane Australia
- School of Medicine; University of Queensland; Brisbane Australia
| | - A. Martins
- Inserm U1079; University of Rouen, Institute for Research and Innovation in Biomedicine; Rouen France
| | - A.B. Spurdle
- Department of Genetics and Computational Biology; QIMR Berghofer Medical Research Institute; Brisbane Australia
| |
|
39
|
Colombo M, Blok MJ, Whiley P, Santamariña M, Gutiérrez-Enríquez S, Romero A, Garre P, Becker A, Smith LD, De Vecchi G, Brandão RD, Tserpelis D, Brown M, Blanco A, Bonache S, Menéndez M, Houdayer C, Foglia C, Fackenthal JD, Baralle D, Wappenschmidt B, Díaz-Rubio E, Caldés T, Walker L, Díez O, Vega A, Spurdle AB, Radice P, De La Hoya M. Comprehensive annotation of splice junctions supports pervasive alternative splicing at the BRCA1 locus: a report from the ENIGMA consortium. Hum Mol Genet 2014; 23:3666-80. [DOI: 10.1093/hmg/ddu075] [Citation(s) in RCA: 63] [Impact Index Per Article: 6.3] [Indexed: 01/17/2023]
Affiliation(s)
- Mara Colombo
- Department of Preventive and Predictive Medicine, Fondazione IRCCS Istituto Nazionale dei Tumori, Milano, Italy,
| | - Marinus J. Blok
- Department of Clinical Genetics, Maastricht University Medical Center, Maastricht, The Netherlands,
| | - Phillip Whiley
- Molecular Cancer Epidemiology Laboratory, Genetics and Computational Division, QIMR Berghofer Medical Research Institute, Brisbane, Australia,
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia,
| | - Marta Santamariña
- Grupo de Medicina Xenómica-USC, Universidad de Santiago de Compostela, CIBERER, IDIS, Santiago de Compostela, Spain,
| | | | - Atocha Romero
- Laboratorio de Oncología Molecular, Instituto de Investigación Sanitaria San Carlos (IdISSC), Hospital Clínico San Carlos, Madrid, Spain,
| | - Pilar Garre
- Laboratorio de Oncología Molecular, Instituto de Investigación Sanitaria San Carlos (IdISSC), Hospital Clínico San Carlos, Madrid, Spain,
| | - Alexandra Becker
- Center of Familial Breast and Ovarian Cancer, University Hospital Cologne, Cologne, Germany,
- Center for Molecular Medicine Cologne (CMMC), University of Cologne, Cologne, Germany,
| | - Lindsay Denise Smith
- Human Development and Health Academic Unit, Faculty of Medicine, University of Southampton, Southampton General Hospital, Southampton, UK,
| | - Giovanna De Vecchi
- Department of Preventive and Predictive Medicine, Fondazione IRCCS Istituto Nazionale dei Tumori, Milano, Italy,
| | - Rita D. Brandão
- Department of Clinical Genetics, Maastricht University Medical Center, Maastricht, The Netherlands,
| | - Demis Tserpelis
- Department of Clinical Genetics, Maastricht University Medical Center, Maastricht, The Netherlands,
| | - Melissa Brown
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia,
| | - Ana Blanco
- Fundación Pública Galega de Medicina Xenómica-SERGAS, Grupo de Medicina Xenómica-USC, CIBERER, IDIS, Santiago de Compostela, Spain,
| | - Sandra Bonache
- Oncogenetics Group, Vall d'Hebron Institute of Oncology (VHIO) and
- Oncogenetics Group, Vall d'Hebron Research Institute (VHIR), Universitat Autonoma de Barcelona, Barcelona, Spain,
| | - Mireia Menéndez
- Genetic Diagnosis Unit, Hereditary Cancer Program, Institut Català d'Oncologia, Barcelona, Spain,
| | - Claude Houdayer
- Service de Génétique and INSERM U830, Institut Curie and Université Paris Descartes, Sorbonne Paris Cité, Paris, France,
| | - Claudia Foglia
- Department of Preventive and Predictive Medicine, Fondazione IRCCS Istituto Nazionale dei Tumori, Milano, Italy,
| | - James D. Fackenthal
- Department of Medicine, The University of Chicago Medical Center, Chicago, IL, USA,
| | - Diana Baralle
- Human Development and Health Academic Unit, Faculty of Medicine, University of Southampton, Southampton General Hospital, Southampton, UK,
| | - Barbara Wappenschmidt
- Center of Familial Breast and Ovarian Cancer, University Hospital Cologne, Cologne, Germany,
- Center for Molecular Medicine Cologne (CMMC), University of Cologne, Cologne, Germany,
| | - Eduardo Díaz-Rubio
- Laboratorio de Oncología Molecular, Instituto de Investigación Sanitaria San Carlos (IdISSC), Hospital Clínico San Carlos, Madrid, Spain,
- Servicio de Oncología Médica, Hospital Clínico San Carlos, Madrid, Spain,
| | - Trinidad Caldés
- Laboratorio de Oncología Molecular, Instituto de Investigación Sanitaria San Carlos (IdISSC), Hospital Clínico San Carlos, Madrid, Spain,
| | - Logan Walker
- Department of Pathology, University of Otago, Christchurch, New Zealand
| | - Orland Díez
- Oncogenetics Group, Vall d'Hebron Institute of Oncology (VHIO) and
- Oncogenetics Group, Vall d'Hebron Research Institute (VHIR), Universitat Autonoma de Barcelona, Barcelona, Spain,
- Oncogenetics Group, University Hospital of Vall d'Hebron, Barcelona, Spain
| | - Ana Vega
- Fundación Pública Galega de Medicina Xenómica-SERGAS, Grupo de Medicina Xenómica-USC, CIBERER, IDIS, Santiago de Compostela, Spain,
| | - Amanda B. Spurdle
- Molecular Cancer Epidemiology Laboratory, Genetics and Computational Division, QIMR Berghofer Medical Research Institute, Brisbane, Australia,
| | - Paolo Radice
- Department of Preventive and Predictive Medicine, Fondazione IRCCS Istituto Nazionale dei Tumori, Milano, Italy,
| | - Miguel De La Hoya
- Laboratorio de Oncología Molecular, Instituto de Investigación Sanitaria San Carlos (IdISSC), Hospital Clínico San Carlos, Madrid, Spain,
| | | |
|
40
|
Abstract
The last decade has seen tremendous effort committed to the annotation of the human genome sequence, most notably perhaps in the form of the ENCODE project. One of the major findings of ENCODE, and other genome analysis projects, is that the human transcriptome is far larger and more complex than previously thought. This complexity manifests, for example, as alternative splicing within protein-coding genes, as well as in the discovery of thousands of long noncoding RNAs. It is also possible that significant numbers of human transcripts have not yet been described by annotation projects, while existing transcript models are frequently incomplete. The question as to what proportion of this complexity is truly functional remains open, however, and this ambiguity presents a serious challenge to genome scientists. In this article, we will discuss the current state of human transcriptome annotation, drawing on our experience gained in generating the GENCODE gene annotation set. We highlight the gaps in our knowledge of transcript functionality that remain, and consider the potential computational and experimental strategies that can be used to help close them. We propose that an understanding of the true overlap between transcriptional complexity and functionality will not be gained in the short term. However, significant steps toward obtaining this knowledge can now be taken by using an integrated strategy, combining all of the experimental resources at our disposal.
Affiliation(s)
- Jonathan M Mudge
- Department of Informatics, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, United Kingdom
| | | | | |
|
41
|
Wu CS, Yu CY, Chuang CY, Hsiao M, Kao CF, Kuo HC, Chuang TJ. Integrative transcriptome sequencing identifies trans-splicing events with important roles in human embryonic stem cell pluripotency. Genome Res 2013; 24:25-36. [PMID: 24131564 PMCID: PMC3875859 DOI: 10.1101/gr.159483.113] [Citation(s) in RCA: 83] [Impact Index Per Article: 7.5] [Indexed: 01/09/2023]
Abstract
Trans-splicing is a post-transcriptional event that joins exons from separate pre-mRNAs. Detection of trans-splicing is usually severely hampered by experimental artifacts and genetic rearrangements. Here, we develop a new computational pipeline, TSscan, which integrates different types of high-throughput long-/short-read transcriptome sequencing of different human embryonic stem cell (hESC) lines to effectively minimize false positives while detecting trans-splicing. Combining TSscan screening with multiple experimental validation steps revealed that most chimeric RNA products were platform-dependent experimental artifacts of RNA sequencing. We successfully identified and confirmed four trans-spliced RNAs, including the first reported trans-spliced large intergenic noncoding RNA (“tsRMST”). We showed that these trans-spliced RNAs were all highly expressed in human pluripotent stem cells and differentially expressed during hESC differentiation. Our results further indicated that tsRMST can contribute to pluripotency maintenance of hESCs by suppressing lineage-specific gene expression through the recruitment of NANOG and the PRC2 complex factor, SUZ12. Taken together, our findings provide important insights into the role of trans-splicing in pluripotency maintenance of hESCs and help to facilitate future studies into trans-splicing, opening up this important but understudied class of post-transcriptional events for comprehensive characterization.
Affiliation(s)
- Chan-Shuo Wu
- Genomics Research Center, Academia Sinica, Taipei 11529, Taiwan
| | | | | | | | | | | | | |
|
42
|
Morata J, Béjar S, Talavera D, Riera C, Lois S, de Xaxars GM, de la Cruz X. The relationship between gene isoform multiplicity, number of exons and protein divergence. PLoS One 2013; 8:e72742. [PMID: 24023641 PMCID: PMC3758341 DOI: 10.1371/journal.pone.0072742] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 02/27/2013] [Accepted: 07/14/2013] [Indexed: 11/18/2022]
Abstract
At present we know that phenotypic differences between organisms arise from a variety of sources, such as protein sequence divergence, regulatory sequence divergence, and alternative splicing. However, we do not yet have a complete view of how these sources are related. Here we address this problem, studying the relationship between protein divergence and the ability of genes to express multiple isoforms. We used three genome-wide datasets of human-mouse orthologs to study the relationship between isoform multiplicity co-occurrence between orthologs (the fact that two orthologs have more than one isoform) and protein divergence. In all cases our results showed that there was a monotonic dependence between these two properties. We could explain this relationship in terms of a more fundamental one, between exon number of the largest isoform and protein divergence. We found that this last relationship was present, although with variations, in other species (chimpanzee, cow, rat, chicken, zebrafish and fruit fly). In summary, we have identified a relationship between protein divergence and isoform multiplicity co-occurrence and explained its origin in terms of a simple gene-level property. Finally, we discuss the biological implications of these findings for our understanding of inter-species phenotypic differences.
Affiliation(s)
- Jordi Morata
- Department of Structural Biology, Institut de Biologia Molecular de Barcelona (IBMB)-Consejo Superior de Investigaciones Científicas (CSIC), Barcelona, Spain
- Santi Béjar
- Department of Structural Biology, Institut de Biologia Molecular de Barcelona (IBMB)-Consejo Superior de Investigaciones Científicas (CSIC), Barcelona, Spain
- David Talavera
- Faculty of Life Sciences, Manchester University, Manchester, United Kingdom
- Casandra Riera
- Laboratory of Translational Bioinformatics in Neuroscience, Vall d'Hebron Institute of Research (VHIR), Barcelona, Spain
- Sergio Lois
- Laboratory of Translational Bioinformatics in Neuroscience, Vall d'Hebron Institute of Research (VHIR), Barcelona, Spain
- Gemma Mas de Xaxars
- Laboratori de Botànica, Facultat de Farmàcia, Universitat de Barcelona, Barcelona, Spain
- Xavier de la Cruz
- Department of Structural Biology, Institut de Biologia Molecular de Barcelona (IBMB)-Consejo Superior de Investigaciones Científicas (CSIC), Barcelona, Spain
- Laboratory of Translational Bioinformatics in Neuroscience, Vall d'Hebron Institute of Research (VHIR), Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
|
43
|
Koumandou VL, Scorilas A. Evolution of the plasma and tissue kallikreins, and their alternative splicing isoforms. PLoS One 2013; 8:e68074. [PMID: 23874499 PMCID: PMC3707919 DOI: 10.1371/journal.pone.0068074] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 12/11/2012] [Accepted: 05/25/2013] [Indexed: 12/14/2022] Open
Abstract
Kallikreins are secreted serine proteases with important roles in human physiology. Human plasma kallikrein, encoded by the KLKB1 gene on locus 4q34-35, functions in the blood coagulation pathway, and in regulating blood pressure. The human tissue kallikrein and kallikrein-related peptidases (KLKs) have diverse expression patterns and physiological roles, including cancer-related processes such as cell growth regulation, angiogenesis, invasion, and metastasis. Prostate-specific antigen (PSA), the product of the KLK3 gene, is the most widely used biomarker in clinical practice today. A total of 15 KLKs are encoded by the largest contiguous cluster of protease genes in the human genome (19q13.3-13.4), which makes them ideal for evolutionary analysis of gene duplication events. Previous studies on the evolution of KLKs have traced mammalian homologs as well as a probable early origin of the family in aves, amphibia and reptilia. The aim of this study was to address the evolutionary and functional relationships between tissue KLKs and plasma kallikrein, and to examine the evolution of alternative splicing isoforms. Sequences of plasma and tissue kallikreins and their alternative transcripts were collected from the NCBI and Ensembl databases, and comprehensive phylogenetic analysis was performed by Bayesian as well as maximum likelihood methods. Plasma and tissue kallikreins exhibit high sequence similarity in the trypsin domain (>50%). Phylogenetic analysis indicates an early divergence of KLKB1, which groups closely with plasminogen, chymotrypsin, and complement factor D (CFD), in a monophyletic group distinct from trypsin and the tissue KLKs. Reconstruction of the earliest events leading to the diversification of the tissue KLKs is not well resolved, indicating rapid expansion in mammals. 
Alternative transcripts of each KLK gene show species-specific divergence, while examination of sequence conservation indicates that many annotated human KLK isoforms are missing the catalytic triad that is crucial for protease activity.
Affiliation(s)
- Andreas Scorilas
- Department of Biochemistry and Molecular Biology, University of Athens, Athens, Greece
|
44
|
Light S, Elofsson A. The impact of splicing on protein domain architecture. Curr Opin Struct Biol 2013; 23:451-8. [PMID: 23562110 DOI: 10.1016/j.sbi.2013.02.013] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 01/14/2013] [Revised: 02/22/2013] [Accepted: 02/28/2013] [Indexed: 10/27/2022]
Abstract
Many proteins are composed of protein domains, functional units of common descent. Multidomain forms are common in all eukaryotes making up more than half of the proteome and the evolution of novel domain architecture has been accelerated in metazoans. It is also becoming increasingly clear that alternative splicing is prevalent among vertebrates. Given that protein domains are defined as structurally, functionally and evolutionarily distinct units, one may speculate that some alternative splicing events may lead to clean excisions of protein domains, thus generating a number of different domain architectures from one gene template. However, recent findings indicate that smaller alternative splicing events, in particular in disordered regions, might be more prominent than domain architectural changes. The problem of identifying protein isoforms is, however, still not resolved. Clearly, many splice forms identified through detection of mRNA sequences appear to produce 'nonfunctional' proteins, such as proteins with missing internal secondary structure elements. Here, we review the state of the art methods for identification of functional isoforms and present a summary of what is known, thus far, about alternative splicing with regard to protein domain architectures.
Affiliation(s)
- Sara Light
- Science for Life Laboratory, Stockholm University, Box 1031 SE-171 21 Solna, Sweden
|
45
|
Abstract
PURPOSE OF REVIEW: With the advent of whole-transcriptome sequencing, or RNA-seq, we now know that alternative splicing is a generalized phenomenon, with nearly all multiexonic genes subject to alternative splicing. In this review, we highlight recent studies examining alternative splicing as a modulator of cellular cholesterol homeostasis and as an underlying mechanism of dyslipidemia. RECENT FINDINGS: A number of key genes involved in cholesterol metabolism are known to undergo functionally relevant alternative splicing. Recently, we have identified coordinated changes in alternative splicing in multiple genes in response to alterations in cellular sterol content. We and others have implicated several splicing factors as regulators of lipid metabolism. Furthermore, a number of cis-acting human gene variants that modulate alternative splicing have been implicated in a variety of human metabolic diseases. SUMMARY: Alternative splicing is of importance in various types of genetically influenced dyslipidemias and in the regulation of cellular cholesterol metabolism.
Affiliation(s)
- Marisa W Medina
- Children's Hospital Oakland Research Institute, Oakland, CA 94609, USA.
|
46
|
Yap K, Makeyev EV. Regulation of gene expression in mammalian nervous system through alternative pre-mRNA splicing coupled with RNA quality control mechanisms. Mol Cell Neurosci 2013; 56:420-8. [PMID: 23357783 DOI: 10.1016/j.mcn.2013.01.003] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.9] [Received: 10/26/2012] [Revised: 01/15/2013] [Accepted: 01/17/2013] [Indexed: 12/12/2022] Open
Abstract
Eukaryotic gene expression is orchestrated on a genome-wide scale through several post-transcriptional mechanisms. Of these, alternative pre-mRNA splicing expands the proteome diversity and modulates mRNA stability through downstream RNA quality control (QC) pathways including nonsense-mediated decay (NMD) of mRNAs containing premature termination codons and nuclear retention and elimination (NRE) of intron-containing transcripts. Although originally identified as mechanisms for eliminating aberrant transcripts, a growing body of evidence suggests that NMD and NRE coupled with deliberate changes in pre-mRNA splicing patterns are also used in a number of biological contexts for deterministic control of gene expression. Here we review recent studies elucidating molecular mechanisms and biological significance of these gene regulation strategies with a specific focus on their roles in nervous system development and physiology. This article is part of a Special Issue entitled 'RNA and splicing regulation in neurodegeneration'.
Affiliation(s)
- Karen Yap
- School of Biological Sciences, Nanyang Technological University, Singapore 637551, Singapore
|
47
|
Rodriguez JM, Maietta P, Ezkurdia I, Pietrelli A, Wesselink JJ, Lopez G, Valencia A, Tress ML. APPRIS: annotation of principal and alternative splice isoforms. Nucleic Acids Res 2012; 41:D110-7. [PMID: 23161672 PMCID: PMC3531113 DOI: 10.1093/nar/gks1058] [Citation(s) in RCA: 153] [Impact Index Per Article: 12.8] [Indexed: 12/03/2022] Open
Abstract
Here, we present APPRIS (http://appris.bioinfo.cnio.es), a database that houses annotations of human splice isoforms. APPRIS has been designed to provide value to manual annotations of the human genome by adding reliable protein structural and functional data and information from cross-species conservation. The visual representation of the annotations provided by APPRIS for each gene allows annotators and researchers alike to easily identify functional changes brought about by splicing events. In addition to collecting, integrating and analyzing reliable predictions of the effect of splicing events, APPRIS also selects a single reference sequence for each gene, here termed the principal isoform, based on the annotations of structure, function and conservation for each transcript. APPRIS identifies a principal isoform for 85% of the protein-coding genes in the GENCODE 7 release for ENSEMBL. Analysis of the APPRIS data shows that at least 70% of the alternative (non-principal) variants would lose important functional or structural information relative to the principal isoform.
|
48
|
Kim DS, Hahn Y. Human-specific protein isoforms produced by novel splice sites in the human genome after the human-chimpanzee divergence. BMC Bioinformatics 2012; 13:299. [PMID: 23148531 PMCID: PMC3538075 DOI: 10.1186/1471-2105-13-299] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 05/12/2012] [Accepted: 11/09/2012] [Indexed: 11/16/2022] Open
Abstract
Background: Evolution of splice sites is a well-known phenomenon that results in transcript diversity during human evolution. Many novel splice sites are derived from repetitive elements and may not contribute to protein products. Here, we analyzed annotated human protein-coding exons and identified human-specific splice sites that arose after the human-chimpanzee divergence. Results: We analyzed multiple alignments of the annotated human protein-coding exons and their respective orthologous mammalian genome sequences to identify 85 novel splice sites (50 splice acceptors and 35 donors) in the human genome. The novel protein-coding exons, which are expressed either constitutively or alternatively, produce novel protein isoforms by insertion, deletion, or frameshift. We found three cases in which the human-specific isoform conferred novel molecular function in the human cells: the human-specific IMUP protein isoform induces apoptosis of the trophoblast and is implicated in pre-eclampsia; the intronization of a part of SMOX gene exon produces inactive spermine oxidase; the human-specific NUB1 isoform shows reduced interaction with ubiquitin-like proteins, possibly affecting ubiquitin pathways. Conclusions: Although the generation of novel protein isoforms does not equate to adaptive evolution, we propose that these cases are useful candidates for a molecular functional study to identify proteomic changes that might bring about novel phenotypes during human evolution.
Affiliation(s)
- Dong Seon Kim
- Department of Life Science, Research Center for Biomolecules and Biosystems, Chung-Ang University, Seoul 156-756, Korea
|
49
|
Bicknell AA, Cenik C, Chua HN, Roth FP, Moore MJ. Introns in UTRs: why we should stop ignoring them. Bioessays 2012; 34:1025-34. [PMID: 23108796 DOI: 10.1002/bies.201200073] [Citation(s) in RCA: 95] [Impact Index Per Article: 7.9] [Indexed: 11/09/2022]
Abstract
Although introns in 5'- and 3'-untranslated regions (UTRs) are found in many protein coding genes, rarely are they considered distinctive entities with specific functions. Indeed, mammalian transcripts with 3'-UTR introns are often assumed nonfunctional because they are subject to elimination by nonsense-mediated decay (NMD). Nonetheless, recent findings indicate that 5'- and 3'-UTR intron status is of significant functional consequence for the regulation of mammalian genes. Therefore these features should be ignored no longer.
Affiliation(s)
- Alicia A Bicknell
- Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA, USA
|
50
|
Pertea M. The human transcriptome: an unfinished story. Genes (Basel) 2012; 3:344-60. [PMID: 22916334 PMCID: PMC3422666 DOI: 10.3390/genes3030344] [Citation(s) in RCA: 85] [Impact Index Per Article: 7.1] [Received: 05/15/2012] [Revised: 06/14/2012] [Accepted: 06/25/2012] [Indexed: 11/16/2022] Open
Abstract
Despite recent technological advances, the study of the human transcriptome is still in its early stages. Here we provide an overview of the complex human transcriptomic landscape, present the bioinformatics challenges posed by the vast quantities of transcriptomic data, and discuss some of the studies that have tried to determine how much of the human genome is transcribed. Recent evidence has suggested that more than 90% of the human genome is transcribed into RNA. However, this view has been strongly contested by groups of scientists who argued that many of the observed transcripts are simply the result of transcriptional noise. In this review, we conclude that the full extent of transcription remains an open question that will not be fully addressed until we decipher the complete range and biological diversity of the transcribed genomic sequences.
Affiliation(s)
- Mihaela Pertea
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
|