1. Radrizzani S, Kudla G, Izsvák Z, Hurst LD. Selection on synonymous sites: the unwanted transcript hypothesis. Nat Rev Genet 2024; 25:431-448. [PMID: 38297070 DOI: 10.1038/s41576-023-00686-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/04/2023] [Indexed: 02/02/2024]
Abstract
Although translational selection to favour codons that match the most abundant tRNAs is not readily observed in humans, there is nonetheless selection in humans on synonymous mutations. We hypothesize that much of this synonymous site selection can be explained in terms of protection against unwanted RNAs - spurious transcripts, mis-spliced forms or RNAs derived from transposable elements or viruses. We propose not only that selection on synonymous sites functions to reduce the rate of creation of unwanted transcripts (for example, through selection on exonic splice enhancers and cryptic splice sites) but also that high-GC content (but low-CpG content), together with intron presence and position, is both particular to functional native mRNAs and used to recognize transcripts as native. In support of this hypothesis, transcription, nuclear export, liquid phase condensation and RNA degradation have all recently been shown to promote GC-rich transcripts and suppress AU/CpG-rich ones. With such 'traps' being set against AU/CpG-rich transcripts, the codon usage of native genes has, in turn, evolved to avoid such suppression. That parallel filters against AU/CpG-rich transcripts also affect the endosomal import of RNAs further supports the unwanted transcript hypothesis of synonymous site selection and explains the similar design rules that have enabled the successful use of transgenes and RNA vaccines.
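The two sequence properties contrasted in this abstract, GC content and CpG content, can be made concrete with a short sketch. The function name and the choice of the observed/expected CpG ratio are illustrative assumptions; the abstract does not state how either quantity is measured.

```python
def gc_and_cpg(seq):
    """Return (GC fraction, CpG observed/expected ratio) for a nucleotide sequence.

    Hypothetical helper: the observed/expected ratio is one common way to
    express CpG depletion, not necessarily the measure used in the review.
    """
    s = seq.upper().replace("U", "T")  # accept RNA or DNA letters
    n = len(s)
    if n == 0:
        return 0.0, 0.0
    c, g = s.count("C"), s.count("G")
    gc_fraction = (c + g) / n
    cpg = s.count("CG")  # "CG" cannot overlap itself, so count() is exact
    # Expected CpG count if C and G were placed independently is c * g / n
    cpg_oe = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, cpg_oe
```

Under this measure, a transcript matching the "high-GC, low-CpG" profile described above would show a high first value and a second value well below 1.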
Affiliation(s)
- Sofia Radrizzani
  - Milner Centre for Evolution, Department of Life Sciences, University of Bath, Bath, UK
  - Milner Therapeutics Institute, Jeffrey Cheah Biomedical Centre, University of Cambridge, Cambridge, UK
- Grzegorz Kudla
  - MRC Human Genetics Unit, Institute for Genetics and Cancer, The University of Edinburgh, Edinburgh, UK
- Zsuzsanna Izsvák
  - Max-Delbrück-Center for Molecular Medicine in the Helmholtz Society, Berlin, Germany
- Laurence D Hurst
  - Milner Centre for Evolution, Department of Life Sciences, University of Bath, Bath, UK
2. Ferrer J, Dimitrova N. Transcription regulation by long non-coding RNAs: mechanisms and disease relevance. Nat Rev Mol Cell Biol 2024; 25:396-415. [PMID: 38242953 PMCID: PMC11045326 DOI: 10.1038/s41580-023-00694-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/11/2023] [Indexed: 01/21/2024]
Abstract
Long non-coding RNAs (lncRNAs) outnumber protein-coding transcripts, but their functions remain largely unknown. In this Review, we discuss the emerging roles of lncRNAs in the control of gene transcription. Some of the best characterized lncRNAs have essential transcription cis-regulatory functions that cannot be easily accomplished by DNA-interacting transcription factors, such as XIST, which controls X-chromosome inactivation, or imprinted lncRNAs that direct allele-specific repression. A growing number of lncRNA transcription units, including CHASERR, PVT1 and HASTER (also known as HNF1A-AS1) act as transcription-stabilizing elements that fine-tune the activity of dosage-sensitive genes that encode transcription factors. Genetic experiments have shown that defects in such transcription stabilizers often cause severe phenotypes. Other lncRNAs, such as lincRNA-p21 (also known as Trp53cor1) and Maenli (Gm29348) contribute to local activation of gene transcription, whereas distinct lncRNAs influence gene transcription in trans. We discuss findings of lncRNAs that elicit a function through either activation of their transcription, transcript elongation and processing or the lncRNA molecule itself. We also discuss emerging evidence of lncRNA involvement in human diseases, and their potential as therapeutic targets.
Affiliation(s)
- Jorge Ferrer
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
  - Centro de Investigación Biomédica en Red de Diabetes y Enfermedades Metabólicas Asociadas (CIBERDEM), Madrid, Spain
  - Department of Metabolism, Digestion and Reproduction, Imperial College London, London, UK
- Nadya Dimitrova
  - Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT, USA
3. Backofen R, Gorodkin J, Hofacker IL, Stadler PF. Comparative RNA Genomics. Methods Mol Biol 2024; 2802:347-393. [PMID: 38819565 DOI: 10.1007/978-1-0716-3838-5_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Over the last quarter of a century it has become clear that RNA is much more than just a boring intermediate in protein expression. Ancient RNAs still appear in the core information metabolism and comprise a surprisingly large component in bacterial gene regulation. A common theme with these types of mostly small RNAs is their reliance on conserved secondary structures. Large-scale sequencing projects, on the other hand, have profoundly changed our understanding of eukaryotic genomes. Pervasively transcribed, they give rise to a plethora of large and evolutionarily extremely flexible non-coding RNAs that exert a vastly diverse array of molecular functions. In this chapter we provide a necessarily incomplete overview of the current state of comparative analysis of non-coding RNAs, emphasizing computational approaches as a means to gain a global picture of the modern RNA world.
Affiliation(s)
- Rolf Backofen
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
  - Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- Jan Gorodkin
  - Center for Non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Frederiksberg, Denmark
- Ivo L Hofacker
  - Institute for Theoretical Chemistry, University of Vienna, Wien, Austria
  - Bioinformatics and Computational Biology research group, University of Vienna, Vienna, Austria
  - Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- Peter F Stadler
  - Bioinformatics Group, Department of Computer Science, University of Leipzig, Leipzig, Germany
  - Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
  - Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
  - Universidad Nacional de Colombia, Bogotá, Colombia
  - Institute for Theoretical Chemistry, University of Vienna, Wien, Austria
  - Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
  - Santa Fe Institute, Santa Fe, NM, USA
4. Ponting CP, Haerty W. Genome-Wide Analysis of Human Long Noncoding RNAs: A Provocative Review. Annu Rev Genomics Hum Genet 2022; 23:153-172. [PMID: 35395170 DOI: 10.1146/annurev-genom-112921-123710] [Citation(s) in RCA: 37] [Impact Index Per Article: 18.5] [Indexed: 11/09/2022]
Abstract
Do long noncoding RNAs (lncRNAs) contribute little or substantively to human biology? To address how lncRNA loci and their transcripts, structures, interactions, and functions contribute to human traits and disease, we adopt a genome-wide perspective. We intend to provoke alternative interpretation of questionable evidence and thorough inquiry into unsubstantiated claims. We discuss pitfalls of lncRNA experimental and computational methods as well as opposing interpretations of their results. The majority of evidence, we argue, indicates that most lncRNA transcript models reflect transcriptional noise or provide minor regulatory roles, leaving relatively few human lncRNAs that contribute centrally to human development, physiology, or behavior. These important few tend to be spliced and better conserved but lack a simple syntax relating sequence to structure and mechanism, and so resist simple categorization. This genome-wide view should help investigators prioritize individual lncRNAs based on their likely contribution to human biology.
Affiliation(s)
- Chris P Ponting
  - MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, United Kingdom
5. Ross CJ, Ulitsky I. Discovering functional motifs in long noncoding RNAs. Wiley Interdiscip Rev RNA 2022; 13:e1708. [PMID: 34981665 DOI: 10.1002/wrna.1708] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 09/01/2021] [Revised: 11/19/2021] [Accepted: 12/04/2021] [Indexed: 12/27/2022]
Abstract
Long noncoding RNAs (lncRNAs) are products of pervasive transcription that closely resemble messenger RNAs on the molecular level, yet function through largely unknown modes of action. The current model is that the function of lncRNAs often relies on specific, typically short, conserved elements, connected by linkers in which specific sequences and/or structures are less important. This notion has fueled the development of both computational and experimental methods focused on the discovery of functional elements within lncRNA genes, based on diverse signals such as evolutionary conservation, predicted structural elements, or the ability to rescue loss-of-function phenotypes. In this review, we outline the main challenges that the different methods need to overcome, describe the recently developed approaches, and discuss their respective limitations. This article is categorized under: RNA Evolution and Genomics > Computational Analyses of RNA; RNA Interactions with Proteins and Other Molecules > Protein-RNA Interactions: Functional Implications; Regulatory RNAs/RNAi/Riboswitches > Regulatory RNAs.
Affiliation(s)
- Caroline Jane Ross
  - Biological Regulation and Molecular Neuroscience, Weizmann Institute of Science, Rehovot, Israel
- Igor Ulitsky
  - Biological Regulation and Molecular Neuroscience, Weizmann Institute of Science, Rehovot, Israel
6. Tan JY, Marques AC. The activity of human enhancers is modulated by the splicing of their associated lncRNAs. PLoS Comput Biol 2022; 18:e1009722. [PMID: 35015755 PMCID: PMC8803168 DOI: 10.1371/journal.pcbi.1009722] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/14/2020] [Revised: 01/31/2022] [Accepted: 12/05/2021] [Indexed: 11/19/2022] Open
Abstract
Pervasive enhancer transcription is at the origin of more than half of all long noncoding RNAs in humans. Transcription of enhancer-associated long noncoding RNAs (elncRNAs) contributes to their cognate enhancer activity and gene expression regulation in cis. Recently, splicing of elncRNAs was shown to be associated with elevated enhancer activity. However, whether splicing of elncRNA transcripts is a mere consequence of accessibility at highly active enhancers or whether elncRNA splicing directly impacts enhancer function remains unanswered. We analysed genetically driven changes in elncRNA splicing in humans to address this outstanding question. We showed that splicing-related motifs within multi-exonic elncRNAs evolved under selective constraints during human evolution, suggesting that the processing of these transcripts is unlikely to have resulted from transcription across spurious splice sites. Using a genome-wide and unbiased approach, we used nucleotide variants as independent genetic factors to directly assess the causal relationship that underpins elncRNA splicing and cognate enhancer activity. We found that the splicing of most elncRNAs is associated with changes in chromatin signatures at cognate enhancers and target mRNA expression. We provide evidence that efficient and conserved processing of enhancer-associated elncRNAs contributes to enhancer activity.
Most, if not all, active enhancers are transcribed, giving rise to a plethora of transcripts, including enhancer-associated long noncoding RNAs (elncRNAs). Changes in elncRNA levels impact cognate enhancer activity. Recently, splicing of elncRNAs has also been found to associate with enhancer activity. Whether this association reflects a contribution of elncRNA splicing to increased enhancer activity or is simply the consequence of increased chromatin accessibility that promotes transcriptional elongation and allows for spurious splicing events remains unknown. We show that natural selection has acted, at the species and population level, to preserve DNA elements required for frequent and efficient elncRNA splicing. Importantly, using a genome-wide and unbiased statistical population genomics approach, we demonstrate that elncRNA splicing is associated with cognate enhancer function, contributing to chromatin status and enhancer activity. Our results provide strong evidence that efficient elncRNA splicing contributes to enhancer activity genome-wide.
Affiliation(s)
- Jennifer Yihong Tan
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Ana Claudia Marques
  - Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
7. Abrahams L, Savisaar R, Mordstein C, Young B, Kudla G, Hurst LD. Evidence in disease and non-disease contexts that nonsense mutations cause altered splicing via motif disruption. Nucleic Acids Res 2021; 49:9665-9685. [PMID: 34469537 PMCID: PMC8464065 DOI: 10.1093/nar/gkab750] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 02/26/2021] [Revised: 08/17/2021] [Accepted: 08/19/2021] [Indexed: 12/21/2022] Open
Abstract
Transcripts containing premature termination codons (PTCs) can be subject to nonsense-associated alternative splicing (NAS). Two models have been invoked to explain this: scanning and splice motif disruption. The latter postulates that exonic cis motifs, such as exonic splice enhancers (ESEs), are disrupted by nonsense mutations. We employ genome-wide transcriptomic and k-mer enrichment methods to scrutinize this model. First, we show that ESEs are prone to disruptive nonsense mutations owing to their purine richness and paucity of TGA, TAA and TAG. The motif model correctly predicts that NAS rates should be low (we estimate 5–30%) and approximately in line with estimates for the rate at which random point mutations disrupt splicing (8–20%). Further, we find that, as expected, NAS-associated PTCs are predictable from nucleotide-based machine learning approaches to predict splice disruption and, at least for pathogenic variants, are enriched in ESEs. Finally, we find that both in-frame and out-of-frame mutations to TAA, TGA or TAG are associated with exon skipping. While a higher relative frequency of such skip-inducing mutations in-frame than out of frame lends some credence to the scanning model, these results reinforce the importance of considering splice motif modulation to understand the etiology of PTC-associated disease.
Affiliation(s)
- Liam Abrahams
  - The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, UK
- Rosina Savisaar
  - The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, UK
  - Instituto de Medicina Molecular João Lobo Antunes, Faculdade de Medicina, Universidade de Lisboa, 1649-028 Lisboa, Portugal
- Christine Mordstein
  - The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, UK
  - MRC Human Genetics Unit, The University of Edinburgh, Crewe Road, Edinburgh EH4 2XU, UK
  - Aarhus University, Department of Molecular Biology and Genetics, C F Møllers Allé 3, 8000 Aarhus, Denmark
- Bethan Young
  - MRC Human Genetics Unit, The University of Edinburgh, Crewe Road, Edinburgh EH4 2XU, UK
- Grzegorz Kudla
  - MRC Human Genetics Unit, The University of Edinburgh, Crewe Road, Edinburgh EH4 2XU, UK
- Laurence D Hurst
  - The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, UK
8. Comparative genomics in the search for conserved long noncoding RNAs. Essays Biochem 2021; 65:741-749. [PMID: 33885137 PMCID: PMC8564735 DOI: 10.1042/ebc20200069] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/18/2020] [Revised: 02/15/2021] [Accepted: 03/15/2021] [Indexed: 12/23/2022]
Abstract
Long noncoding RNAs (lncRNAs) have emerged as prominent regulators of gene expression in eukaryotes. The identification of lncRNA orthologs is essential in efforts to decipher their roles across model organisms, as homologous genes tend to have similar molecular and biological functions. The relatively high sequence plasticity of lncRNA genes compared with protein-coding genes makes the identification of their orthologs a challenging task. This is why comparative genomics of lncRNAs requires the development of specific and, sometimes, complex approaches. Here, we briefly review current advancements and challenges associated with four levels of lncRNA conservation: genomic sequences, splicing signals, secondary structures and syntenic transcription.
9. Long non-coding RNAs and splicing. Essays Biochem 2021; 65:723-729. [PMID: 33835135 DOI: 10.1042/ebc20200087] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/06/2020] [Revised: 02/05/2021] [Accepted: 03/15/2021] [Indexed: 12/25/2022]
Abstract
In this review I focus on the role of splicing in long non-coding RNA (lncRNA) life. First, I summarize differences between the splicing efficiency of protein-coding genes and lncRNAs and discuss why non-coding RNAs are spliced less efficiently. In the second half of the review, I speculate why splice sites are the most conserved sequences in lncRNAs and what additional roles could splicing play in lncRNA metabolism. I discuss the hypothesis that the splicing machinery can, besides its dominant role in intron removal and exon joining, protect cells from undesired transcripts.
10. A first exon termination checkpoint preferentially suppresses extragenic transcription. Nat Struct Mol Biol 2021; 28:337-346. [PMID: 33767452 PMCID: PMC7610630 DOI: 10.1038/s41594-021-00572-y] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 12/15/2020] [Accepted: 02/12/2021] [Indexed: 01/31/2023]
Abstract
Interactions between the splicing machinery and RNA polymerase II increase protein-coding gene transcription. Similarly, exons and splicing signals of enhancer-generated long noncoding RNAs (elncRNAs) augment enhancer activity. However, elncRNAs are inefficiently spliced, suggesting that, compared with protein-coding genes, they contain qualitatively different exons with a limited ability to drive splicing. We show here that the inefficiently spliced first exons of elncRNAs as well as promoter-antisense long noncoding RNAs (pa-lncRNAs) in human and mouse cells trigger a transcription termination checkpoint that requires WDR82, an RNA polymerase II-binding protein, and its RNA-binding partner of previously unknown function, ZC3H4. We propose that the first exons of elncRNAs and pa-lncRNAs are an intrinsic component of a regulatory mechanism that, on the one hand, maximizes the activity of these cis-regulatory elements by recruiting the splicing machinery and, on the other, contains elements that suppress pervasive extragenic transcription.
11. Bryzghalov O, Makałowska I, Szcześniak MW. lncEvo: automated identification and conservation study of long noncoding RNAs. BMC Bioinformatics 2021; 22:59. [PMID: 33563213 PMCID: PMC7871587 DOI: 10.1186/s12859-021-03991-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/09/2020] [Accepted: 02/01/2021] [Indexed: 12/04/2022] Open
Abstract
BACKGROUND: Long noncoding RNAs represent a large class of transcripts with two common features: they exceed an arbitrary length threshold of 200 nt and are assumed to not encode proteins. Although a growing body of evidence indicates that the vast majority of lncRNAs are potentially nonfunctional, hundreds of them have already been revealed to perform essential gene regulatory functions or to be linked to a number of cellular processes, including those associated with the etiology of human diseases. To better understand the biology of lncRNAs, it is essential to perform a more in-depth study of their evolution. In contrast to protein-encoding transcripts, however, they do not show the strong sequence conservation that usually results from purifying selection; therefore, software that is typically used to resolve the evolutionary relationships of protein-encoding genes and transcripts is not applicable to the study of lncRNAs. RESULTS: To tackle this issue, we developed lncEvo, a computational pipeline that consists of three modules: (1) transcriptome assembly from RNA-Seq data, (2) prediction of lncRNAs, and (3) a conservation study, that is, a genome-wide comparison of lncRNA transcriptomes between two species of interest, including a search for orthologs. Importantly, one can choose to apply lncEvo solely for transcriptome assembly or lncRNA prediction, without calling the conservation-related part. CONCLUSIONS: lncEvo is an all-in-one tool built with the Nextflow framework, utilizing state-of-the-art software and algorithms with customizable trade-offs between speed and sensitivity, ease of use and built-in reporting functionalities. The source code of the pipeline is freely available for academic and nonacademic use under the MIT license at https://gitlab.com/spirit678/lncrna_conservation_nf.
Affiliation(s)
- Oleksii Bryzghalov
  - Institute of Human Biology and Evolution, Faculty of Biology, Adam Mickiewicz University in Poznan, Uniwersytetu Poznanskiego 6, 61-614, Poznan, Poland
- Izabela Makałowska
  - Institute of Human Biology and Evolution, Faculty of Biology, Adam Mickiewicz University in Poznan, Uniwersytetu Poznanskiego 6, 61-614, Poznan, Poland
- Michał Wojciech Szcześniak
  - Institute of Human Biology and Evolution, Faculty of Biology, Adam Mickiewicz University in Poznan, Uniwersytetu Poznanskiego 6, 61-614, Poznan, Poland
12. Palazzo AF, Koonin EV. Functional Long Non-coding RNAs Evolve from Junk Transcripts. Cell 2020; 183:1151-1161. [PMID: 33068526 DOI: 10.1016/j.cell.2020.09.047] [Citation(s) in RCA: 125] [Impact Index Per Article: 31.3] [Received: 05/19/2020] [Revised: 08/20/2020] [Accepted: 09/17/2020] [Indexed: 12/30/2022]
Abstract
Transcriptome studies reveal pervasive transcription of complex genomes, such as those of mammals. Despite popular arguments for functionality of most, if not all, of these transcripts, genome-wide analysis of selective constraints indicates that most of the produced RNAs are junk. However, junk is not garbage. On the contrary, junk transcripts provide the raw material for the evolution of diverse long non-coding (lnc) RNAs by non-adaptive mechanisms, such as constructive neutral evolution. The generation of many novel functional entities, such as lncRNAs, that fuels organismal complexity does not seem to be driven by strong positive selection. Rather, the weak selection regime that dominates the evolution of most multicellular eukaryotes provides ample material for functional innovation with relatively little adaptation involved.
Affiliation(s)
- Alexander F Palazzo
  - Department of Biochemistry, University of Toronto, Toronto, ON M5G 1M1, Canada
- Eugene V Koonin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
13. Schertzer MD, Murvin MM, Calabrese JM. Using RNA Sequencing and Spike-in RNAs to Measure Intracellular Abundance of lncRNAs and mRNAs. Bio Protoc 2020; 10:e3772. [PMID: 33204768 DOI: 10.21769/bioprotoc.3772] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/07/2023] Open
Abstract
Long noncoding RNAs (lncRNAs) play essential roles in normal physiology and in disease but their mechanisms of action can be challenging to identify. For mechanistic studies, it is often useful to know a lncRNA's intracellular abundance, i.e., approximately how many molecules of the lncRNA are present in a typical cell of a cell-type of interest. At least two approaches have been used to approximate lncRNA intracellular abundance: single-molecule sensitivity RNA fluorescence in situ hybridization (smFISH) and single-gene, calibrated reverse-transcription followed by quantitative PCR (RT-qPCR). However, like all experimental approaches, these methods have their limitations. smFISH, when analyzed using diffraction-limited microscopy, can underestimate intracellular abundance, especially for lncRNAs that accumulate in focused subcellular regions. Calibrated RT-qPCR may return inaccurate estimates of abundance because individual PCR amplicons spaced across the length of a transcript can vary in their efficiency of reverse transcription. Here, we describe a sequencing-based approach that is straightforward, orthogonal to smFISH and RT-qPCR, and can be used to approximate the intracellular abundance for most expressed long RNAs (lncRNAs and mRNAs) in a cell type of interest. Firstly, the average weight of total RNA per cell for the cell type of interest is estimated by replicate rounds of RNA purification from a known number of cells. Secondly, an rRNA-depletion RNA-Seq protocol is performed after adding spike-in control RNAs to a known quantity of total cellular RNA. Lastly, by comparing read counts per transcript to a standard curve derived from the spiked-in RNAs, the intracellular abundance for each transcript is estimated. The sequencing-based approach provides a powerful complement to existing methods, particularly in situations where it is desirable to quantify the abundance of multiple lncRNAs and/or mRNAs simultaneously.
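The three steps in this abstract (RNA mass per cell, spike-in addition, standard-curve comparison) can be sketched as follows. This is only an illustration under stated assumptions: the log-log linear fit, the use of length-normalized read counts, and all function and parameter names are hypothetical and are not taken from the protocol itself.

```python
import math

def fit_loglog(reads_per_kb, molecules):
    """Least-squares fit of log10(molecules) = slope * log10(reads_per_kb) + intercept.

    Inputs are the spike-in RNAs: their length-normalized read counts and the
    known number of molecules added to the sample (assumed calibration form).
    """
    lx = [math.log10(x) for x in reads_per_kb]
    ly = [math.log10(y) for y in molecules]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return slope, my - slope * mx

def molecules_per_cell(reads_per_kb, slope, intercept, input_rna_pg, rna_per_cell_pg):
    """Convert one transcript's read density into an estimated copy number per cell."""
    # Molecules of this transcript in the RNA that was sequenced, from the curve
    molecules_in_sample = 10 ** (slope * math.log10(reads_per_kb) + intercept)
    # Number of cells that the sequenced RNA mass corresponds to
    cell_equivalents = input_rna_pg / rna_per_cell_pg
    return molecules_in_sample / cell_equivalents
```

In use, `fit_loglog` would be called once on the spike-in data, and `molecules_per_cell` then applied to each expressed transcript with the measured average RNA mass per cell.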
Affiliation(s)
- Megan D Schertzer
  - Department of Pharmacology and Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, 120 Mason Farm Road, Chapel Hill, NC, 27599, USA
  - Curriculum in Genetics and Molecular Biology, University of North Carolina at Chapel Hill, 120 Mason Farm Road, Chapel Hill, NC, 27599, USA
- McKenzie M Murvin
  - Department of Pharmacology and Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, 120 Mason Farm Road, Chapel Hill, NC, 27599, USA
  - Curriculum in Genetics and Molecular Biology, University of North Carolina at Chapel Hill, 120 Mason Farm Road, Chapel Hill, NC, 27599, USA
- J Mauro Calabrese
  - Department of Pharmacology and Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, 120 Mason Farm Road, Chapel Hill, NC, 27599, USA
14. Ntini E, Marsico A. Functional impacts of non-coding RNA processing on enhancer activity and target gene expression. J Mol Cell Biol 2020; 11:868-879. [PMID: 31169884 PMCID: PMC6884709 DOI: 10.1093/jmcb/mjz047] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 01/14/2019] [Revised: 04/03/2019] [Accepted: 04/04/2019] [Indexed: 01/06/2023] Open
Abstract
Tight regulation of gene expression is orchestrated by enhancers. Through recent research advancements, it is becoming clear that enhancers are not solely distal regulatory elements harboring transcription factor binding sites and decorated with specific histone marks, but they rather display signatures of active transcription, showing distinct degrees of transcription unit organization. Thereby, a substantial fraction of enhancers give rise to different species of non-coding RNA transcripts with an unprecedented range of potential functions. In this review, we bring together data from recent studies indicating that non-coding RNA transcription from active enhancers, as well as enhancer-produced long non-coding RNA transcripts, may modulate or define the functional regulatory potential of the cognate enhancer. In addition, we summarize supporting evidence that RNA processing of the enhancer-associated long non-coding RNA transcripts may constitute an additional layer of regulation of enhancer activity, which contributes to the control and final outcome of enhancer-targeted gene expression.
Affiliation(s)
- Evgenia Ntini
  - Max Planck Institute for Molecular Genetics, Berlin, Germany
  - Free University Berlin, Berlin, Germany
- Annalisa Marsico
  - Max Planck Institute for Molecular Genetics, Berlin, Germany
  - Free University Berlin, Berlin, Germany
  - Institute of Computational Biology, Helmholtz Zentrum München, München, Germany
15. Abou Alezz M, Celli L, Belotti G, Lisa A, Bione S. GC-AG Introns Features in Long Non-coding and Protein-Coding Genes Suggest Their Role in Gene Expression Regulation. Front Genet 2020; 11:488. [PMID: 32499820 PMCID: PMC7242645 DOI: 10.3389/fgene.2020.00488] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/13/2019] [Accepted: 04/20/2020] [Indexed: 12/16/2022] Open
Abstract
Long non-coding RNAs (lncRNAs) are recognized as an important class of regulatory molecules involved in a variety of biological functions. However, the regulatory mechanisms of long non-coding gene expression are still poorly understood. The characterization of the genomic features of lncRNAs is crucial to gain insight into their function. In this study, we exploited recent annotations by GENCODE to characterize the genomic and splicing features of long non-coding genes in comparison with protein-coding ones, both in human and mouse. Our analysis highlighted differences between the two classes of genes in terms of their gene architecture. Significant differences in splice site usage were observed between long non-coding and protein-coding genes (PCGs). While non-canonical GC-AG splice junctions represent about 0.8% of total splice sites in PCGs, we identified a significant enrichment of GC-AG splice sites in long non-coding genes, both in human (3.0%) and mouse (1.9%). In addition, we found a positional bias, with GC-AG splice sites enriched in the first intron in both classes of genes. Moreover, GC-AG introns were found to be significantly shorter and to have weaker donor and acceptor sites than GT-AG introns. Genes containing at least one GC-AG intron were found to be conserved in many species and more prone to alternative splicing, and a functional analysis pointed toward their enrichment in specific biological processes such as DNA repair. Our study shows for the first time that GC-AG introns are mainly associated with lncRNAs and are preferentially located in the first intron. Additionally, we discovered their regulatory potential, indicating the existence of a new mechanism of non-coding gene and PCG expression regulation.
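The splice-junction frequencies quoted in this abstract rest on classifying each intron by its terminal dinucleotides. A minimal sketch of that classification is given below; the function names and the input format (a list of intron sequences, 5' to 3') are assumptions for illustration and do not reproduce the authors' GENCODE-based pipeline.

```python
from collections import Counter

KNOWN_CLASSES = {"GT-AG", "GC-AG", "AT-AC"}

def splice_class(intron):
    """Label an intron by its first and last two nucleotides (donor-acceptor)."""
    seq = intron.upper().replace("U", "T")  # accept RNA or DNA letters
    if len(seq) < 4:
        return "other"
    pair = f"{seq[:2]}-{seq[-2:]}"
    return pair if pair in KNOWN_CLASSES else "other"

def class_fractions(introns):
    """Fraction of introns in each splice-junction class (empty dict if no input)."""
    counts = Counter(splice_class(i) for i in introns)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}
```

Applying `class_fractions` separately to lncRNA and protein-coding introns would give the kind of per-class comparison the abstract reports.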
Affiliation(s)
- Monah Abou Alezz
- Computational Biology Unit, Institute of Molecular Genetics Luigi Luca Cavalli-Sforza, National Research Council, Pavia, Italy
- Ludovica Celli
- Computational Biology Unit, Institute of Molecular Genetics Luigi Luca Cavalli-Sforza, National Research Council, Pavia, Italy
- Giulia Belotti
- Computational Biology Unit, Institute of Molecular Genetics Luigi Luca Cavalli-Sforza, National Research Council, Pavia, Italy
- Antonella Lisa
- Computational Biology Unit, Institute of Molecular Genetics Luigi Luca Cavalli-Sforza, National Research Council, Pavia, Italy
- Silvia Bione
- Computational Biology Unit, Institute of Molecular Genetics Luigi Luca Cavalli-Sforza, National Research Council, Pavia, Italy
|
16
|
Abrahams L, Hurst LD. A Depletion of Stop Codons in lincRNA is Owing to Transfer of Selective Constraint from Coding Sequences. Mol Biol Evol 2020; 37:1148-1164. [PMID: 31841162 PMCID: PMC7086181 DOI: 10.1093/molbev/msz299] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/12/2022] Open
Abstract
Although the constraints on a gene’s sequence are often assumed to reflect the functioning of that gene, here we propose transfer selection, a constraint operating on one class of genes transferred to another, mediated by shared binding factors. We show that such transfer can explain an otherwise paradoxical depletion of stop codons in long intergenic noncoding RNAs (lincRNAs). Serine/arginine-rich proteins direct the splicing machinery by binding exonic splice enhancers (ESEs) in immature mRNA. As coding exons cannot contain stop codons in one reading frame, stop codons should be rare within ESEs. We confirm that the stop codon density (SCD) in ESE motifs is low, even accounting for nucleotide biases. Given that serine/arginine-rich proteins binding ESEs also facilitate lincRNA splicing, a low SCD could transfer to lincRNAs. As predicted, multiexon lincRNA exons are depleted in stop codons, a result not explained by open reading frame (ORF) contamination. Consistent with transfer selection, stop codon depletion in lincRNAs is most acute in exonic regions with the highest ESE density, disappears when ESEs are masked, is consistent with stop codon usage skews in ESEs, and is diminished in both single-exon lincRNAs and introns. Owing to low SCD, the maximum lengths of pseudo-ORFs frequently exceed null expectations. This has implications for ORF annotation and the evolution of de novo protein-coding genes from lincRNAs. We conclude that not all constraints operating on genes need be explained by the functioning of the gene but may instead be transferred owing to shared binding factors.
Affiliation(s)
- Liam Abrahams
- Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
- Laurence D Hurst
- Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
|
17
|
Tan JY, Biasini A, Young RS, Marques AC. Splicing of enhancer-associated lincRNAs contributes to enhancer activity. Life Sci Alliance 2020; 3:3/4/e202000663. [PMID: 32086317 PMCID: PMC7035876 DOI: 10.26508/lsa.202000663] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 01/30/2020] [Revised: 02/12/2020] [Accepted: 02/13/2020] [Indexed: 12/19/2022] Open
Abstract
Transcription is common at active mammalian enhancers, sometimes giving rise to stable enhancer-associated long intergenic noncoding RNAs (elincRNAs). Expression of elincRNA is associated with changes in neighboring gene product abundance and local chromosomal topology, suggesting that transcription at these loci contributes to gene expression regulation in cis. Despite the lack of evidence supporting sequence-dependent functions for most elincRNAs, splicing of these transcripts is unexpectedly common. Whether elincRNA splicing is a mere consequence of cognate enhancer activity or if it directly impacts enhancer function remains unresolved. Here, we investigate the association between elincRNA splicing and enhancer activity in mouse embryonic stem cells. We show that multi-exonic elincRNAs are enriched at conserved enhancers, and the efficient processing of elincRNAs is strongly associated with their cognate enhancer activity. This association is supported by their enrichment in enhancer-specific chromatin signatures; elevated binding of co-transcriptional regulators; increased local intra-chromosomal DNA contacts; and strengthened cis-regulation on target gene expression. Our results support a contribution of efficient RNA processing of enhancer-associated transcripts to cognate enhancer activity.
Affiliation(s)
- Jennifer Y Tan
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Adriano Biasini
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Robert S Young
- Medical Research Council Human Genetics Unit, Medical Research Council Institute of Genetics & Molecular Medicine, University of Edinburgh, Edinburgh, UK
- Ana C Marques
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
|
18
|
Darbellay F, Necsulea A. Comparative Transcriptomics Analyses across Species, Organs, and Developmental Stages Reveal Functionally Constrained lncRNAs. Mol Biol Evol 2020; 37:240-259. [PMID: 31539080 PMCID: PMC6984365 DOI: 10.1093/molbev/msz212] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Indexed: 12/17/2022] Open
Abstract
The functionality of long noncoding RNAs (lncRNAs) is disputed. In general, lncRNAs are under weak selective pressures, suggesting that the majority of lncRNAs may be nonfunctional. However, although some surveys showed negligible phenotypic effects upon lncRNA perturbation, key biological roles were demonstrated for individual lncRNAs. Most lncRNAs with proven functions were implicated in gene expression regulation, in pathways related to cellular pluripotency, differentiation, and organ morphogenesis, suggesting that functional lncRNAs may be more abundant in embryonic development, rather than in adult organs. To test this hypothesis, we perform a multidimensional comparative transcriptomics analysis, across five developmental time points (two embryonic stages, newborn, adult, and aged individuals), four organs (brain, kidney, liver, and testes), and three species (mouse, rat, and chicken). We find that, overwhelmingly, lncRNAs are preferentially expressed in adult and aged testes, consistent with the presence of permissive transcription during spermatogenesis. LncRNAs are often differentially expressed among developmental stages and are less abundant in embryos and newborns compared with adult individuals, in agreement with a requirement for tighter expression control and less tolerance for noisy transcription early in development. For differentially expressed lncRNAs, we find that the patterns of expression variation among developmental stages are generally conserved between mouse and rat. Moreover, lncRNAs expressed above noise levels in somatic organs and during development show higher evolutionary conservation, in particular, at their promoter regions. Thus, we show that functionally constrained lncRNA loci are enriched in developing organs, and we suggest that many of these loci may function in an RNA-independent manner.
Affiliation(s)
- Fabrice Darbellay
- School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Anamaria Necsulea
- School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland; Laboratoire de Biométrie et Biologie Évolutive, CNRS UMR 5558, Université de Lyon, Université Lyon 1, Villeurbanne, France
|
19
|
Gil N, Ulitsky I. Regulation of gene expression by cis-acting long non-coding RNAs. Nat Rev Genet 2019; 21:102-117. [DOI: 10.1038/s41576-019-0184-5] [Citation(s) in RCA: 296] [Impact Index Per Article: 59.2] [Accepted: 10/07/2019] [Indexed: 12/14/2022]
|
20
|
Walter Costa MB, Höner Zu Siederdissen C, Dunjić M, Stadler PF, Nowick K. SSS-test: a novel test for detecting positive selection on RNA secondary structure. BMC Bioinformatics 2019; 20:151. [PMID: 30898084 PMCID: PMC6429701 DOI: 10.1186/s12859-019-2711-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 11/19/2018] [Accepted: 03/03/2019] [Indexed: 12/23/2022] Open
Abstract
Background: Long non-coding RNAs (lncRNAs) play an important role in regulating gene expression and are thus important for determining phenotypes. Most attempts to measure selection in lncRNAs have focused on the primary sequence. The majority of small RNAs and at least some parts of lncRNAs must fold into specific structures to perform their biological function. Comprehensive assessments of selection acting on RNAs therefore must also encompass structure. Selection pressures acting on the structure of non-coding genes can be detected within multiple sequence alignments. Approaches of this type, however, have so far focused on negative selection. Thus, a computational method for identifying ncRNAs under positive selection is needed. Results: We introduce the SSS-test (test for Selection on Secondary Structure) to identify positive selection and thus adaptive evolution. Benchmarks with biological as well as synthetic controls yield coherent signals for both negative and positive selection, demonstrating the functionality of the test. A survey of a lncRNA collection comprising 15,443 families resulted in 110 candidates that appear to be under positive selection in human. In 26 lncRNAs that have been associated with psychiatric disorders we identified local structures that have signs of positive selection in the human lineage. Conclusions: It is feasible to assay positive selection acting on RNA secondary structures on a genome-wide scale. The detection of human-specific positive selection in lncRNAs associated with cognitive disorder provides a set of candidate genes for further experimental testing and may provide insights into the evolution of cognitive abilities in humans. Availability: The SSS-test and related software is available at: https://github.com/waltercostamb/SSS-test. The databases used in this work are available at: http://www.bioinf.uni-leipzig.de/Software/SSS-test/.
Electronic supplementary material: The online version of this article (10.1186/s12859-019-2711-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Maria Beatriz Walter Costa
- Embrapa Agroenergia, Parque Estação Biológica (PqEB), Asa Norte, Brasília, DF, 70770-901, Brazil; Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, Leipzig, 04107, Germany
- Christian Höner Zu Siederdissen
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, Leipzig, 04107, Germany
- Marko Dunjić
- Human Biology Group, Institute for Biology, Department of Biology, Chemistry, Pharmacy, Freie Universitaet Berlin, Königin-Luise-Straße 1-3, Berlin, 14195, Germany; Center for Human Molecular Genetics, Faculty of Biology, University of Belgrade, Studentski trg 16, PO box 43, Belgrade, 11000, Serbia
- Peter F Stadler
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, Leipzig, 04107, Germany; German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig & Competence Center for Scalable Data Services and Solutions Dresden-Leipzig & Leipzig Research Center for Civilization Diseases, University Leipzig, Leipzig, 04107, Germany; Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, Leipzig, 04103, Germany; Department of Theoretical Chemistry, University of Vienna, Währinger Straße 17, Vienna, A-1090, Austria; Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, Frederiksberg C, DK-1870, Denmark; Faculdad de Ciencias, Universidad Nacional de Colombia, Sede Bogotá, Ciudad Universitaria, Bogotá, D.C., COL-111321, Colombia; Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM87501, USA
- Katja Nowick
- Human Biology Group, Institute for Biology, Department of Biology, Chemistry, Pharmacy, Freie Universitaet Berlin, Königin-Luise-Straße 1-3, Berlin, 14195, Germany; TFome Research Group, Bioinformatics Group, Interdisciplinary Center of Bioinformatics, Department of Computer Science, University of Leipzig, Härtelstraße 16-18, Leipzig, 04107, Germany; Paul-Flechsig-Institute for Brain Research, University of Leipzig, Liebigstraße 19. Haus C, Leipzig, 04103, Germany; Bioinformatics, Faculty of Agricultural Sciences, Institute of Animal Science, University of Hohenheim, Garbenstraße 13, Stuttgart, 70593, Germany
|
21
|
Krchňáková Z, Thakur PK, Krausová M, Bieberstein N, Haberman N, Müller-McNicoll M, Staněk D. Splicing of long non-coding RNAs primarily depends on polypyrimidine tract and 5' splice-site sequences due to weak interactions with SR proteins. Nucleic Acids Res 2019; 47:911-928. [PMID: 30445574 PMCID: PMC6344860 DOI: 10.1093/nar/gky1147] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 05/24/2018] [Revised: 10/26/2018] [Accepted: 10/30/2018] [Indexed: 12/20/2022] Open
Abstract
Many nascent long non-coding RNAs (lncRNAs) undergo the same maturation steps as pre-mRNAs of protein-coding genes (PCGs), but they are often poorly spliced. To identify the underlying mechanisms for this phenomenon, we searched for putative splicing inhibitory sequences using the ncRNA-a2 as a model. Genome-wide analyses of intergenic lncRNAs (lincRNAs) revealed that lincRNA splicing efficiency positively correlates with 5'ss strength while no such correlation was identified for PCGs. In addition, efficiently spliced lincRNAs have higher thymidine content in the polypyrimidine tract (PPT) compared to efficiently spliced PCGs. Using model lincRNAs, we provide experimental evidence that strengthening the 5'ss and increasing the T content in the PPT significantly enhances lincRNA splicing. We further showed that lincRNA exons contain fewer putative binding sites for SR proteins. To map binding of SR proteins to lincRNAs, we performed iCLIP with SRSF2, SRSF5 and SRSF6 and analyzed eCLIP data for SRSF1, SRSF7 and SRSF9. All examined SR proteins bind lincRNA exons to a much lower extent than expression-matched PCGs. We propose that lincRNAs lack the cooperative interaction network that enhances splicing, which renders their splicing outcome more dependent on the optimality of splice sites.
Affiliation(s)
- Zuzana Krchňáková
- Institute of Molecular Genetics, Czech Academy of Sciences, Prague, Czech Republic
- Prasoon Kumar Thakur
- Institute of Molecular Genetics, Czech Academy of Sciences, Prague, Czech Republic
- Michaela Krausová
- Institute of Molecular Genetics, Czech Academy of Sciences, Prague, Czech Republic
- Nicole Bieberstein
- Institute of Molecular Genetics, Czech Academy of Sciences, Prague, Czech Republic
- Nejc Haberman
- Computational Regulatory Genomics, MRC London Institute of Medical Sciences, London W12 0NN, UK
- David Staněk
- Institute of Molecular Genetics, Czech Academy of Sciences, Prague, Czech Republic
|
22
|
Production of Spliced Long Noncoding RNAs Specifies Regions with Increased Enhancer Activity. Cell Syst 2018; 7:537-547.e3. [PMID: 30447999 DOI: 10.1016/j.cels.2018.10.009] [Citation(s) in RCA: 49] [Impact Index Per Article: 8.2] [Received: 05/10/2018] [Revised: 09/12/2018] [Accepted: 10/16/2018] [Indexed: 12/28/2022]
Abstract
Active enhancers in mammals produce enhancer RNAs (eRNAs) that are bidirectionally transcribed, unspliced, and unstable. Enhancer regions are also enriched with long noncoding RNA (lncRNA) transcripts, which are typically spliced and substantially more stable. In order to explore the relationship between these two classes of RNAs, we analyzed DNase hypersensitive sites with evidence of bidirectional transcription, which we termed eRNA-producing centers (EPCs). EPCs found very close to transcription start sites of lncRNAs exhibit attributes of both enhancers and promoters, including distinctive DNA motifs and a characteristic chromatin landscape. These EPCs are associated with higher enhancer activity, driven at least in part by the presence of conserved, directional splicing signals that promote lncRNA production, pointing at a causal role of lncRNA processing in enhancer activity. Together, our results suggest that the conserved ability of some enhancers to produce lncRNAs augments their activity in a manner likely mediated through lncRNA maturation.
|
23
|
Le Béguec C, Wucher V, Lagoutte L, Cadieu E, Botherel N, Hédan B, De Brito C, Guillory AS, André C, Derrien T, Hitte C. Characterisation and functional predictions of canine long non-coding RNAs. Sci Rep 2018; 8:13444. [PMID: 30194329 PMCID: PMC6128939 DOI: 10.1038/s41598-018-31770-2] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 04/27/2018] [Accepted: 08/24/2018] [Indexed: 02/07/2023] Open
Abstract
Long non-coding RNAs (lncRNAs) are a family of heterogeneous RNAs that play major roles in multiple biological processes. We recently identified an extended repertoire of more than 10,000 lncRNAs of the domestic dog; however, predicting their biological functionality remains challenging. In this study, we have characterised the expression profiles of 10,444 canine lncRNAs in 26 distinct tissue types, representing various anatomical systems. We showed that lncRNA expressions are mainly clustered by tissue type and we highlighted that 44% of canine lncRNAs are expressed in a tissue-specific manner. We further demonstrated that tissue-specificity correlates with specific families of canine transposable elements. In addition, we identified more than 900 conserved dog-human lncRNAs for which we show overall reproducible expression patterns between dog and human through comparative transcriptomics. Finally, co-expression analyses of lncRNA and neighbouring protein-coding genes identified more than 3,400 canine lncRNAs, suggesting that these lncRNAs function as regulatory elements. Altogether, this genomic and transcriptomic integrative study of lncRNAs constitutes a major resource to investigate genotype to phenotype relationships and biomedical research in the dog species.
Affiliation(s)
- Céline Le Béguec
- Univ Rennes, CNRS, IGDR (Institut de génétique et développement de Rennes) - UMR 6290, F-35000, Rennes, France
- Valentin Wucher
- Univ Rennes, CNRS, IGDR (Institut de génétique et développement de Rennes) - UMR 6290, F-35000, Rennes, France; Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona, 08003, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Lætitia Lagoutte
- Univ Rennes, CNRS, IGDR (Institut de génétique et développement de Rennes) - UMR 6290, F-35000, Rennes, France; UMR PEGASE, Agrocampus Ouest, INRA, 35042, Rennes, France
- Edouard Cadieu
- Univ Rennes, CNRS, IGDR (Institut de génétique et développement de Rennes) - UMR 6290, F-35000, Rennes, France
- Nadine Botherel
- Univ Rennes, CNRS, IGDR (Institut de génétique et développement de Rennes) - UMR 6290, F-35000, Rennes, France
- Benoît Hédan
- Univ Rennes, CNRS, IGDR (Institut de génétique et développement de Rennes) - UMR 6290, F-35000, Rennes, France
- Clotilde De Brito
- Univ Rennes, CNRS, IGDR (Institut de génétique et développement de Rennes) - UMR 6290, F-35000, Rennes, France
- Anne-Sophie Guillory
- Univ Rennes, CNRS, IGDR (Institut de génétique et développement de Rennes) - UMR 6290, F-35000, Rennes, France
- Catherine André
- Univ Rennes, CNRS, IGDR (Institut de génétique et développement de Rennes) - UMR 6290, F-35000, Rennes, France
- Thomas Derrien
- Univ Rennes, CNRS, IGDR (Institut de génétique et développement de Rennes) - UMR 6290, F-35000, Rennes, France
- Christophe Hitte
- Univ Rennes, CNRS, IGDR (Institut de génétique et développement de Rennes) - UMR 6290, F-35000, Rennes, France
|
24
|
Abstract
Over the last two decades it has become clear that RNA is much more than just a boring intermediate in protein expression. Ancient RNAs still appear in the core information metabolism and comprise a surprisingly large component in bacterial gene regulation. A common theme with these types of mostly small RNAs is their reliance on conserved secondary structures. Large scale sequencing projects, on the other hand, have profoundly changed our understanding of eukaryotic genomes. Pervasively transcribed, they give rise to a plethora of large and evolutionarily extremely flexible noncoding RNAs that exert a vastly diverse array of molecule functions. In this chapter we provide a (necessarily incomplete) overview of the current state of comparative analysis of noncoding RNAs, emphasizing computational approaches as a means to gain a global picture of the modern RNA world.
Affiliation(s)
- Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, D-79110 Freiburg, Germany; Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark
- Jan Gorodkin
- Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark
- Ivo L Hofacker
- Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark; Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria; Bioinformatics and Computational Biology Research Group, University of Vienna, Währingerstraße 17, A-1090 Vienna, Austria
- Peter F Stadler
- Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark; Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria; Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany; Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany; Fraunhofer Institute for Cell Therapy and Immunology, Perlickstraße 1, D-04103 Leipzig, Germany; Santa Fe Institute, 1399 Hyde Park Rd, Santa Fe, NM 87501, USA
|
25
|
Bedoya-Reina OC, Ponting CP. Functional RNA classes: a matter of time? Nat Struct Mol Biol 2017; 24:7-8. [PMID: 28054565 DOI: 10.1038/nsmb.3354] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/09/2022]
Affiliation(s)
- Oscar C Bedoya-Reina
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, The University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, UK
- Chris P Ponting
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, The University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, UK
|
26
|
Abstract
Strong DNA conservation among divergent species is an indicator of enduring functionality. With weaker sequence conservation we enter a vast ‘twilight zone’ in which sequence subject to transient or lower constraint cannot be distinguished easily from neutrally evolving, non-functional sequence. Twilight zone functional sequence is illuminated instead by principles of selective constraint and positive selection using genomic data acquired from within a species’ population. Application of these principles reveals that despite being biochemically active, most twilight zone sequence is not functional.
Affiliation(s)
- Chris P Ponting
- MRC Human Genetics Unit, The Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK
|
27
|
Short (16-mer) locked nucleic acid splice-switching oligonucleotides restore dystrophin production in Duchenne Muscular Dystrophy myotubes. PLoS One 2017; 12:e0181065. [PMID: 28742140 PMCID: PMC5524367 DOI: 10.1371/journal.pone.0181065] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 12/16/2016] [Accepted: 06/26/2017] [Indexed: 12/13/2022] Open
Abstract
Splice-switching antisense oligonucleotides (SSOs) offer great potential for RNA-targeting therapies, and two SSO drugs have been recently approved for treating Duchenne Muscular Dystrophy (DMD) and Spinal Muscular Atrophy (SMA). Despite promising results, new developments are still needed for more efficient chemistries and delivery systems. Locked nucleic acid (LNA) is a chemically modified nucleic acid that presents several attractive properties, such as high melting temperature when bound to RNA, potent biological activity, high stability and low toxicity in vivo. Here, we designed a series of LNA-based SSOs complementary to the two most evolutionarily conserved sequences of human dystrophin exon 51 and evaluated their ability to induce exon skipping upon transfection into myoblasts derived from a DMD patient. We show that 16-mers with 60% LNA modification efficiently induce exon skipping and restore synthesis of a truncated dystrophin isoform that localizes to the plasma membrane of patient-derived myotubes differentiated in culture. In sum, this study underscores the value of short LNA-modified SSOs for therapeutic applications.
|
28
|
Savisaar R, Hurst LD. Estimating the prevalence of functional exonic splice regulatory information. Hum Genet 2017; 136:1059-1078. [PMID: 28405812 PMCID: PMC5602102 DOI: 10.1007/s00439-017-1798-3] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 01/26/2017] [Accepted: 04/04/2017] [Indexed: 12/14/2022]
Abstract
In addition to coding information, human exons contain sequences necessary for correct splicing. These elements are known to be under purifying selection and their disruption can cause disease. However, the density of functional exonic splicing information remains profoundly uncertain. Several groups have experimentally investigated how mutations at different exonic positions affect splicing. They have found splice information to be distributed widely in exons, with one estimate putting the proportion of splicing-relevant nucleotides at >90%. These results suggest that splicing could place a major pressure on exon evolution. However, analyses of sequence conservation have concluded that the need to preserve splice regulatory signals only slightly constrains exon evolution, with a resulting decrease in the average human rate of synonymous evolution of only 1–4%. Why do these two lines of research come to such different conclusions? Among other reasons, we suggest that the methods are measuring different things: one assays the density of sites that affect splicing, the other the density of sites whose effects on splicing are visible to selection. In addition, the experimental methods typically consider short exons, thereby enriching for nucleotides close to the splice junction, such sites being enriched for splice-control elements. By contrast, in part owing to correction for nucleotide composition biases and to the assumption that constraint only operates on exon ends, the conservation-based methods can be overly conservative.
Affiliation(s)
- Rosina Savisaar
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
- Laurence D Hurst
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
|
29
|
Melé M, Mattioli K, Mallard W, Shechner DM, Gerhardinger C, Rinn JL. Chromatin environment, transcriptional regulation, and splicing distinguish lincRNAs and mRNAs. Genome Res 2016; 27:27-37. [PMID: 27927715 PMCID: PMC5204342 DOI: 10.1101/gr.214205.116] [Citation(s) in RCA: 157] [Impact Index Per Article: 19.6] [Received: 04/08/2016] [Accepted: 11/09/2016] [Indexed: 12/29/2022]
Abstract
While long intergenic noncoding RNAs (lincRNAs) and mRNAs share similar biogenesis pathways, these transcript classes differ in many regards. LincRNAs are less evolutionarily conserved, less abundant, and more tissue-specific, suggesting that their pre- and post-transcriptional regulation is different from that of mRNAs. Here, we perform an in-depth characterization of the features that contribute to lincRNA regulation in multiple human cell lines. We find that lincRNA promoters are depleted of transcription factor (TF) binding sites, yet enriched for some specific factors such as GATA and FOS relative to mRNA promoters. Surprisingly, we find that H3K9me3—a histone modification typically associated with transcriptional repression—is more enriched at the promoters of active lincRNA loci than at those of active mRNAs. Moreover, H3K9me3-marked lincRNA genes are more tissue-specific. The most discriminant differences between lincRNAs and mRNAs involve splicing. LincRNAs are less efficiently spliced, which cannot be explained by differences in U1 binding or the density of exonic splicing enhancers but may be partially attributed to lower U2AF65 binding and weaker splicing-related motifs. Conversely, the stability of lincRNAs and mRNAs is similar, differing only with regard to the location of stabilizing protein binding sites. Finally, we find that certain transcriptional properties are correlated with higher evolutionary conservation in both DNA and RNA motifs and are enriched in lincRNAs that have been functionally characterized.
Affiliation(s)
- Marta Melé
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts 02138, USA; Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts 02142, USA
- Kaia Mattioli
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts 02138, USA; Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts 02142, USA; Department of Biological and Biomedical Sciences, Harvard University, Boston, Massachusetts 02115, USA
- William Mallard
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts 02138, USA; Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts 02142, USA
- David M Shechner
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts 02138, USA; Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts 02142, USA
- Chiara Gerhardinger
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts 02138, USA; Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts 02142, USA
- John L Rinn
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts 02138, USA; Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts 02142, USA; Department of Pathology, Beth Israel Deaconess Medical Center, Boston, Massachusetts 02215, USA
|
30
|
Integrative classification of human coding and noncoding genes through RNA metabolism profiles. Nat Struct Mol Biol 2016; 24:86-96. [PMID: 27870833 DOI: 10.1038/nsmb.3325] [Citation(s) in RCA: 117] [Impact Index Per Article: 14.6] [Received: 05/17/2016] [Accepted: 10/18/2016] [Indexed: 12/26/2022]
Abstract
Pervasive transcription of the human genome results in a heterogeneous mix of coding RNAs and long noncoding RNAs (lncRNAs). Only a small fraction of lncRNAs have demonstrated regulatory functions, thus making functional lncRNAs difficult to distinguish from nonfunctional transcriptional byproducts. This difficulty has resulted in numerous competing human lncRNA classifications that are complicated by a steady increase in the number of annotated lncRNAs. To address these challenges, we quantitatively examined transcription, splicing, degradation, localization and translation for coding and noncoding human genes. We observed that annotated lncRNAs had lower synthesis and higher degradation rates than mRNAs and discovered mechanistic differences explaining slower lncRNA splicing. We grouped genes into classes with similar RNA metabolism profiles, containing both mRNAs and lncRNAs to varying extents. These classes exhibited distinct RNA metabolism, different evolutionary patterns and differential sensitivity to cellular RNA-regulatory pathways. Our classification provides an alternative to genomic context-driven annotations of lncRNAs.
|
31
|
Secondary structure impacts patterns of selection in human lncRNAs. BMC Biol 2016; 14:60. [PMID: 27457204 PMCID: PMC4960838 DOI: 10.1186/s12915-016-0283-0] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 05/24/2016] [Accepted: 07/04/2016] [Indexed: 02/04/2023] Open
Abstract
Background: Metazoans transcribe many long non-coding RNAs (lncRNAs) that are poorly conserved and whose function remains unknown. This has raised the questions of what fraction of the predicted lncRNAs is actually functional, and whether selection can effectively constrain lncRNAs in species with small effective population sizes such as human populations. Results: Here we evaluate signatures of selection in human lncRNAs using inter-specific data and intra-specific comparisons from five major populations, as well as by assessing relationships between sequence variation and predictions of secondary structure. In all analyses we included a reference of functionally characterized lncRNAs. Altogether, our results show compelling evidence of recent purifying selection acting on both characterized and predicted lncRNAs. We found that RNA secondary structure constrains sequence variation in lncRNAs, so that polymorphisms are depleted in paired regions with low accessibility and tend to be neutral with respect to structural stability. Conclusions: Important implications of our results are that secondary structure plays a role in the functionality of lncRNAs, and that the set of predicted lncRNAs contains a large fraction of functional ones that may play key roles that remain to be discovered.
|
32
|
McLysaght A, Hurst LD. Open questions in the study of de novo genes: what, how and why. Nat Rev Genet 2016; 17:567-78. [PMID: 27452112 DOI: 10.1038/nrg.2016.78] [Citation(s) in RCA: 125] [Impact Index Per Article: 15.6] [Indexed: 12/12/2022]
Abstract
The study of de novo protein-coding genes is maturing from the ad hoc reporting of individual cases to the systematic analysis of extensive genomic data from several species. We identify three key challenges for this emerging field: understanding how best to identify de novo genes, how they arise and why they spread. We highlight the intellectual challenges of understanding how a de novo gene becomes integrated into pre-existing functions and becomes essential. We suggest that, as with protein sequence evolution, antagonistic co-evolution may be key to de novo gene evolution, particularly for new essential genes and new cancer-associated genes.
Affiliation(s)
- Aoife McLysaght
- The Smurfit Institute of Genetics, University of Dublin, Trinity College, Dublin 2, Ireland
- Laurence D Hurst
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, Somerset BA2 7AY, UK
|
33
|
Nitsche A, Stadler PF. Evolutionary clues in lncRNAs. Wiley Interdiscip Rev RNA 2016; 8. [PMID: 27436689 DOI: 10.1002/wrna.1376] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Received: 02/24/2016] [Revised: 06/06/2016] [Accepted: 06/09/2016] [Indexed: 12/13/2022]
Abstract
The diversity of long non-coding RNAs (lncRNAs) in the human transcriptome is in stark contrast to the sparse exploration of their functions concomitant with their conservation and evolution. The pervasive transcription of the largely non-coding human genome makes the evolutionary age and conservation patterns of lncRNAs a topic of interest. Yet it is a fairly unexplored field, and these properties are not as easy to determine as for protein-coding genes. Although there are a few experimentally studied cases, which are conserved at the sequence level, most lncRNAs exhibit weak or untraceable primary sequence conservation. Recent studies shed light on the interspecies conservation of secondary structures among lncRNA homologs by using diverse computational methods. This highlights the importance of structure on functionality of lncRNAs as opposed to the poor impact of primary sequence changes. Further clues in the evolution of lncRNAs are given by selective constraints on non-coding gene structures (e.g., promoters or splice sites) as well as the conservation of prevalent spatio-temporal expression patterns. However, a rapid evolutionary turnover is observable throughout the heterogeneous group of lncRNAs. This still gives rise to questions about its functional meaning. WIREs RNA 2017, 8:e1376. doi: 10.1002/wrna.1376
Affiliation(s)
- Anne Nitsche
- Bioinformatics Group, Department of Computer Science, University Leipzig, Leipzig, Germany; Institute de Biologie Moléculaire et Cellulaire, Université de Strasbourg, Cedex, France
- Peter F Stadler
- Bioinformatics Group, Department of Computer Science, University Leipzig, Leipzig, Germany; Interdisciplinary Center for Bioinformatics, University Leipzig, Leipzig, Germany; Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany; Department of Diagnostics, Fraunhofer Institute for Cell Therapy and Immunology - IZI, Leipzig, Germany; Center for Non-Coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark; Department of Theoretical Chemistry, University of Vienna, Wien, Austria; Santa Fe Institute, Santa Fe, NM, USA
|
34
|
Conservation of the Exon-Intron Structure of Long Intergenic Non-Coding RNA Genes in Eutherian Mammals. Life (Basel) 2016; 6:life6030027. [PMID: 27429005 PMCID: PMC5041003 DOI: 10.3390/life6030027] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 03/18/2016] [Revised: 06/28/2016] [Accepted: 07/12/2016] [Indexed: 11/17/2022] Open
Abstract
The abundance of mammalian long intergenic non-coding RNA (lincRNA) genes is high, yet their functions remain largely unknown. One possible way to study this important question is to use large-scale comparisons of various characteristics of lincRNA with those of protein-coding genes for which a large body of functional information is available. A prominent feature of mammalian protein-coding genes is the high evolutionary conservation of the exon-intron structure. Comparative analysis of putative intron positions in lincRNA genes from various mammalian genomes suggests that some lincRNA introns have been conserved for over 100 million years, thus the primary and/or secondary structure of these molecules is likely to be functionally important.
|
35
|
Nyberg KG, Machado CA. Comparative Expression Dynamics of Intergenic Long Noncoding RNAs in the Genus Drosophila. Genome Biol Evol 2016; 8:1839-58. [PMID: 27189981 PMCID: PMC4943187 DOI: 10.1093/gbe/evw116] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Indexed: 12/16/2022] Open
Abstract
Thousands of long noncoding RNAs (lncRNAs) have been annotated in eukaryotic genomes, but comparative transcriptomic approaches are necessary to understand their biological impact and evolution. To facilitate such comparative studies in Drosophila, we identified and characterized lncRNAs in a second Drosophilid—the evolutionary model Drosophila pseudoobscura. Using RNA-Seq and computational filtering of protein-coding potential, we identified 1,589 intergenic lncRNA loci in D. pseudoobscura. We surveyed multiple sex-specific developmental stages and found, like in Drosophila melanogaster, increasingly prolific lncRNA expression through male development and an overrepresentation of lncRNAs in the testes. Other trends seen in D. melanogaster, like reduced pupal expression, were not observed. Nonrandom distributions of female-biased and non-testis-specific male-biased lncRNAs between the X chromosome and autosomes are consistent with selection-based models of gene trafficking to optimize genomic location of sex-biased genes. The numerous testis-specific lncRNAs, however, are randomly distributed between the X and autosomes, and we cannot reject the hypothesis that many of these are likely to be spurious transcripts. Finally, using annotated lncRNAs in both species, we identified 134 putative lncRNA homologs between D. pseudoobscura and D. melanogaster and find that many have conserved developmental expression dynamics, making them ideal candidates for future functional analyses.
Affiliation(s)
- Kevin G Nyberg
- Department of Biology, University of Maryland, College Park
|
36
|
McLysaght A, Guerzoni D. New genes from non-coding sequence: the role of de novo protein-coding genes in eukaryotic evolutionary innovation. Philos Trans R Soc Lond B Biol Sci 2016; 370:20140332. [PMID: 26323763 PMCID: PMC4571571 DOI: 10.1098/rstb.2014.0332] [Citation(s) in RCA: 100] [Impact Index Per Article: 12.5] [Indexed: 11/16/2022] Open
Abstract
The origin of novel protein-coding genes de novo was once considered so improbable as to be impossible. In less than a decade, and especially in the last five years, this view has been overturned by extensive evidence from diverse eukaryotic lineages. There is now evidence that this mechanism has contributed a significant number of genes to genomes of organisms as diverse as Saccharomyces, Drosophila, Plasmodium, Arabidopsis and human. From simple beginnings, these genes have in some instances acquired complex structure, regulated expression and important functional roles. New genes are often thought of as dispensable late additions; however, some recent de novo genes in human can play a role in disease. Rather than an extremely rare occurrence, it is now evident that there is a relatively constant trickle of proto-genes released into the testing ground of natural selection. It is currently unknown whether de novo genes arise primarily through an ‘RNA-first’ or ‘ORF-first’ pathway. Either way, evolutionary tinkering with this pool of genetic potential may have been a significant player in the origins of lineage-specific traits and adaptations.
Affiliation(s)
- Aoife McLysaght
- Smurfit Institute of Genetics, University of Dublin, Trinity College Dublin, Dublin 2, Republic of Ireland
- Daniele Guerzoni
- Smurfit Institute of Genetics, University of Dublin, Trinity College Dublin, Dublin 2, Republic of Ireland
|
37
|
Abstract
Exonic splice enhancers (ESEs) are short nucleotide motifs, enriched near exon ends, that enhance the recognition of the splice site and thus promote splicing. Are intronless genes under selection to avoid these motifs so as not to attract the splicing machinery to an mRNA that should not be spliced, thereby preventing the production of an aberrant transcript? Consistent with this possibility, we find that ESEs in putative recent retrocopies are at a higher density and evolving faster than those in other intronless genes, suggesting that they are being lost. Moreover, intronless genes are less dense in putative ESEs than intron-containing ones. However, this latter difference is likely due to the skewed base composition of intronless sequences, a skew that is in line with the general GC richness of few exon genes. Indeed, after controlling for such biases, we find that both intronless and intron-containing genes are denser in ESEs than expected by chance. Importantly, nucleotide-controlled analysis of evolutionary rates at synonymous sites in ESEs indicates that the ESEs in intronless genes are under purifying selection in both human and mouse. We conclude that on the loss of introns, some but not all, ESE motifs are lost, the remainder having functions beyond a role in splice promotion. These results have implications for the design of intronless transgenes and for understanding the causes of selection on synonymous sites.
Affiliation(s)
- Rosina Savisaar
- Department of Biology and Biochemistry, The Milner Centre for Evolution, University of Bath, Bath, United Kingdom
- Laurence D Hurst
- Department of Biology and Biochemistry, The Milner Centre for Evolution, University of Bath, Bath, United Kingdom
|
38
|
Kannan S, Chernikova D, Rogozin IB, Poliakov E, Managadze D, Koonin EV, Milanesi L. Transposable Element Insertions in Long Intergenic Non-Coding RNA Genes. Front Bioeng Biotechnol 2015; 3:71. [PMID: 26106594 PMCID: PMC4460805 DOI: 10.3389/fbioe.2015.00071] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.1] [Received: 12/10/2014] [Accepted: 05/06/2015] [Indexed: 11/13/2022] Open
Abstract
Transposable elements (TEs) are abundant in mammalian genomes and appear to have contributed to the evolution of their hosts by providing novel regulatory or coding sequences. We analyzed different regions of long intergenic non-coding RNA (lincRNA) genes in human and mouse genomes to systematically assess the potential contribution of TEs to the evolution of the structure and regulation of expression of lincRNA genes. Introns of lincRNA genes contain the highest percentage of TE-derived sequences (TES), followed by exons and then promoter regions, although the density of TEs is not significantly different between exons and promoters. Higher frequencies of ancient TEs in promoters and exons compared to introns imply that many lincRNA genes emerged before the split of primates and rodents. The content of TES in lincRNA genes is substantially higher than that in protein-coding genes, especially in exons and promoter regions. A significant positive correlation was detected between the content of TEs and evolutionary rate of lincRNAs, indicating that inserted TEs are preferentially fixed in fast-evolving lincRNA genes. These results are consistent with the repeat insertion domains of lncRNAs hypothesis, under which TEs have substantially contributed to the origin, evolution, and, in particular, fast functional diversification, of lincRNA genes.
Affiliation(s)
- Sivakumar Kannan
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Diana Chernikova
- Department of Genetics, Institute for Quantitative Biomedical Sciences, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
- Igor B Rogozin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Eugenia Poliakov
- Laboratory of Retinal Cell and Molecular Biology, National Eye Institute, National Institutes of Health, Bethesda, MD, USA
- David Managadze
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Luciano Milanesi
- Institute for Biomedical Technologies, National Research Council, Segrate, Italy
|
39
|
Hezroni H, Koppstein D, Schwartz MG, Avrutin A, Bartel DP, Ulitsky I. Principles of long noncoding RNA evolution derived from direct comparison of transcriptomes in 17 species. Cell Rep 2015; 11:1110-22. [PMID: 25959816 DOI: 10.1016/j.celrep.2015.04.023] [Citation(s) in RCA: 435] [Impact Index Per Article: 48.3] [Received: 12/11/2014] [Revised: 03/02/2015] [Accepted: 04/09/2015] [Indexed: 12/15/2022] Open
Abstract
The inability to predict long noncoding RNAs from genomic sequence has impeded the use of comparative genomics for studying their biology. Here, we develop methods that use RNA sequencing (RNA-seq) data to annotate the transcriptomes of 16 vertebrates and the echinoid sea urchin, uncovering thousands of previously unannotated genes, most of which produce long intervening noncoding RNAs (lincRNAs). Although in each species, >70% of lincRNAs cannot be traced to homologs in species that diverged >50 million years ago, thousands of human lincRNAs have homologs with similar expression patterns in other species. These homologs share short, 5'-biased patches of sequence conservation nested in exonic architectures that have been extensively rewired, in part by transposable element exonization. Thus, over a thousand human lincRNAs are likely to have conserved functions in mammals, and hundreds beyond mammals, but those functions require only short patches of specific sequences and can tolerate major changes in gene architecture.
Affiliation(s)
- Hadas Hezroni
- Department of Biological Regulation, Weizmann Institute of Science, Rehovot 76100, Israel
- David Koppstein
- Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA; Howard Hughes Medical Institute and Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Alexandra Avrutin
- Department of Biological Regulation, Weizmann Institute of Science, Rehovot 76100, Israel
- David P Bartel
- Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA; Howard Hughes Medical Institute and Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Igor Ulitsky
- Department of Biological Regulation, Weizmann Institute of Science, Rehovot 76100, Israel
|
40
|
Gibb EA, Warren RL, Wilson GW, Brown SD, Robertson GA, Morin GB, Holt RA. Activation of an endogenous retrovirus-associated long non-coding RNA in human adenocarcinoma. Genome Med 2015; 7:22. [PMID: 25821520 PMCID: PMC4375928 DOI: 10.1186/s13073-015-0142-6] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Received: 12/03/2014] [Accepted: 02/12/2015] [Indexed: 11/15/2022] Open
Abstract
Background: Long non-coding RNAs (lncRNAs) are emerging as molecules that significantly impact many cellular processes and have been associated with almost every human cancer. Compared to protein-coding genes, lncRNA genes are often associated with transposable elements, particularly with endogenous retroviral elements (ERVs). ERVs can have potentially deleterious effects on genome structure and function, so these elements are typically silenced in normal somatic tissues, albeit with varying efficiency. The aberrant regulation of ERVs associated with lncRNAs (ERV-lncRNAs), coupled with the diverse range of lncRNA functions, creates significant potential for ERV-lncRNAs to impact cancer biology. Methods: We used RNA-seq analysis to identify and profile the expression of a novel lncRNA in six large cohorts, including over 7,500 samples from The Cancer Genome Atlas (TCGA). Results: We identified the tumor-specific expression of a novel lncRNA that we have named Endogenous retroViral-associated ADenocarcinoma RNA or ‘EVADR’, by analyzing RNA-seq data derived from colorectal tumors and matched normal control tissues. Subsequent analysis of TCGA RNA-seq data revealed the striking association of EVADR with adenocarcinomas, which are tumors of glandular origin. Moderate to high levels of EVADR were detected in 25 to 53% of colon, rectal, lung, pancreas and stomach adenocarcinomas (mean = 30 to 144 FPKM), and EVADR expression correlated with decreased patient survival (Cox regression; hazard ratio = 1.47, 95% confidence interval = 1.06 to 2.04, P = 0.02). In tumor sites of non-glandular origin, EVADR expression was detectable at only very low levels and in less than 10% of patients. For EVADR, a MER48 ERV element provides an active promoter to drive its transcription. Genome-wide, MER48 insertions are associated with nine lncRNAs, but none of the MER48-associated lncRNAs other than EVADR were consistently expressed in adenocarcinomas, demonstrating the specific activation of EVADR. The sequence and structure of the EVADR locus is highly conserved among Old World monkeys and apes but not New World monkeys or prosimians, where the MER48 insertion is absent. Conservation of the EVADR locus suggests a functional role for this novel lncRNA in humans and our closest primate relatives. Conclusions: Our results describe the specific activation of a highly conserved ERV-lncRNA in numerous cancers of glandular origin, a finding with diagnostic, prognostic and therapeutic implications.
Affiliation(s)
- Ewan A Gibb
- Genome Sciences Centre, British Columbia Cancer Agency, 675 West 10th Ave, Vancouver, British Columbia V5Z 1L3 Canada; Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia V6T 1Z4 Canada
- René L Warren
- Genome Sciences Centre, British Columbia Cancer Agency, 675 West 10th Ave, Vancouver, British Columbia V5Z 1L3 Canada
- Gavin W Wilson
- Informatics and Biocomputing Platform, Ontario Institute for Cancer Research, Toronto, Ontario M5G 0A3 Canada; Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 1A8 Canada
- Scott D Brown
- Genome Sciences Centre, British Columbia Cancer Agency, 675 West 10th Ave, Vancouver, British Columbia V5Z 1L3 Canada; Genome Science and Technology Program, University of British Columbia, Vancouver, British Columbia V6T 1Z4 Canada
- Gordon A Robertson
- Genome Sciences Centre, British Columbia Cancer Agency, 675 West 10th Ave, Vancouver, British Columbia V5Z 1L3 Canada
- Gregg B Morin
- Genome Sciences Centre, British Columbia Cancer Agency, 675 West 10th Ave, Vancouver, British Columbia V5Z 1L3 Canada; Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia V6T 1Z4 Canada; Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia V5A 1S6 Canada
- Robert A Holt
- Genome Sciences Centre, British Columbia Cancer Agency, 675 West 10th Ave, Vancouver, British Columbia V5Z 1L3 Canada; Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia V6T 1Z4 Canada; Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, British Columbia V5A 1S6 Canada
|
41
|
Yang JR, Zhang J. Human long noncoding RNAs are substantially less folded than messenger RNAs. Mol Biol Evol 2014; 32:970-7. [PMID: 25540450 DOI: 10.1093/molbev/msu402] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 12/15/2022] Open
Abstract
Long noncoding RNAs (lncRNAs) do not code for proteins but function as RNAs. Because the functions of an RNA rely on either its sequence or secondary structure, lncRNAs should be folded at least as strongly as messenger RNAs (mRNAs), which serve as messengers for translation and are generally thought to lack secondary structure-dependent RNA-level functions. Contrary to this prediction, analysis of genome-wide experimental data of human RNA folding reveals that lncRNAs are substantially less folded than mRNAs even after the control of expression level and GC% (percentage of guanines and cytosines), although both lncRNAs and mRNAs are more strongly folded than expected by chance. In contrast to mRNAs, lncRNAs show neither the positive correlation between folding strength and expression level nor the negative correlation between folding strength and evolutionary rate. These and other results support that although RNA folding undoubtedly plays a role in RNA biology it is also important in translation and/or protein biology.
Affiliation(s)
- Jian-Rong Yang
- Department of Ecology and Evolutionary Biology, University of Michigan
- Jianzhi Zhang
- Department of Ecology and Evolutionary Biology, University of Michigan
|