1
Backofen R, Gorodkin J, Hofacker IL, Stadler PF. Comparative RNA Genomics. Methods Mol Biol 2024; 2802:347-393. [PMID: 38819565 DOI: 10.1007/978-1-0716-3838-5_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Over the last quarter of a century it has become clear that RNA is much more than just a boring intermediate in protein expression. Ancient RNAs still appear in the core information metabolism and comprise a surprisingly large component in bacterial gene regulation. A common theme with these types of mostly small RNAs is their reliance on conserved secondary structures. Large-scale sequencing projects, on the other hand, have profoundly changed our understanding of eukaryotic genomes. Pervasively transcribed, they give rise to a plethora of large and evolutionarily extremely flexible non-coding RNAs that exert a vastly diverse array of molecular functions. In this chapter we provide a necessarily incomplete overview of the current state of comparative analysis of non-coding RNAs, emphasizing computational approaches as a means to gain a global picture of the modern RNA world.
Affiliation(s)
- Rolf Backofen
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
  - Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- Jan Gorodkin
  - Center for Non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Frederiksberg, Denmark
- Ivo L Hofacker
  - Institute for Theoretical Chemistry, University of Vienna, Wien, Austria
  - Bioinformatics and Computational Biology research group, University of Vienna, Vienna, Austria
  - Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- Peter F Stadler
  - Bioinformatics Group, Department of Computer Science, University of Leipzig, Leipzig, Germany
  - Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany
  - Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
  - Universidad Nacional de Colombia, Bogotá, Colombia
  - Institute for Theoretical Chemistry, University of Vienna, Wien, Austria
  - Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
  - Santa Fe Institute, Santa Fe, NM, USA
2
Inverse folding based pre-training for the reliable identification of intrinsic transcription terminators. PLoS Comput Biol 2022; 18:e1010240. [PMID: 35797361 PMCID: PMC9262186 DOI: 10.1371/journal.pcbi.1010240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2021] [Accepted: 05/23/2022] [Indexed: 11/24/2022]
Abstract
It is well established that neural networks can predict or identify structural motifs of non-coding RNAs (ncRNAs). Yet the neural-network-based identification of RNA structural motifs is limited by the availability of training data, which are often insufficient for learning features of specific ncRNA families or structural motifs. Aiming to reliably identify intrinsic transcription terminators in bacteria, we introduce a novel pre-training approach that uses inverse folding to generate training data for predicting or identifying a specific family or structural motif of ncRNA. We assess the ability of neural networks to identify secondary structure by systematic in silico mutagenesis experiments. In a study to identify intrinsic transcription terminators as functionally well-understood RNA structural motifs, our inverse-folding-based pre-training approach significantly boosts the performance of neural network topologies, which outperform previous approaches to identify intrinsic transcription terminators. Inverse-folding-based pre-training provides a simple yet highly effective way to integrate the well-established thermodynamic energy model into deep neural networks for identifying ncRNA families or motifs. The pre-training technique is broadly applicable to a range of network topologies as well as different types of ncRNA families and motifs.
Intrinsic transcriptional terminators are essential regulators in determining the 3’ end of transcripts in bacteria. The underlying mechanism involves RNA secondary structure, where nucleotides fold into a specific hairpin motif. Identifying terminator sequences in bacterial genomes has conventionally been approached with well-established energy models for structural motifs. However, the folding mechanism of transcription terminators is understood only partially, limiting the success of energy-model-based identification. Neural networks have been proposed to overcome these limitations. However, their adoption for predicting and identifying RNA secondary structure has been a double-edged sword: neural networks promise to learn features that are not represented by the energy models, but they are black boxes that lack explicit modeling assumptions and may fail to account for features that are well understood on the basis of decades-old energy models. Here, we introduce a pre-training approach for neural networks that uses energy-model-based inverse folding of structural motifs. As we demonstrate, this approach “brings back the energy model” to identify transcriptional terminators and overcomes the limitations of previous energy-model-based predictions. Our approach works for diverse types of neural networks, and is suitable for the identification of structural motifs of many other RNA molecules beyond transcriptional terminators.
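The systematic in silico mutagenesis mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and the `toy_score` stand-in for a trained classifier are hypothetical, and the abstract does not say which mutation scheme was used, so single-nucleotide substitutions are assumed here.

```python
from typing import Callable, Dict, Iterator, Tuple

NUCLEOTIDES = "ACGU"


def single_point_mutants(seq: str) -> Iterator[Tuple[int, str, str]]:
    """Yield (position, new_base, mutant) for every single substitution."""
    for i, ref in enumerate(seq):
        for alt in NUCLEOTIDES:
            if alt != ref:
                yield i, alt, seq[:i] + alt + seq[i + 1:]


def mutagenesis_scan(
    seq: str, score: Callable[[str], float]
) -> Dict[Tuple[int, str], float]:
    """Map each (position, new_base) to the score change versus the wild type."""
    baseline = score(seq)
    return {
        (i, alt): score(mutant) - baseline
        for i, alt, mutant in single_point_mutants(seq)
    }


def toy_score(seq: str) -> float:
    # Hypothetical stand-in for a model's output: the G/C fraction.
    # A real scan would call the trained network here instead.
    return (seq.count("G") + seq.count("C")) / len(seq)


effects = mutagenesis_scan("GCGA", toy_score)
```

Positions whose substitutions change the score strongly are the ones the scoring function relies on; with a real model, comparing those positions against the known hairpin stem would indicate whether it has picked up the secondary structure.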
3
Fontenla S, Langleib M, de la Torre-Escudero E, Domínguez MF, Robinson MW, Tort J. Role of Fasciola hepatica Small RNAs in the Interaction With the Mammalian Host. Front Cell Infect Microbiol 2022; 11:812141. [PMID: 35155272 PMCID: PMC8824774 DOI: 10.3389/fcimb.2021.812141] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 11/09/2021] [Accepted: 12/29/2021] [Indexed: 12/21/2022]
Abstract
MicroRNAs (miRNAs) are important post-transcriptional regulators of gene expression that are involved in many different biological processes and play a key role in developmental timing. Additionally, recent studies have shown that miRNAs released from parasites are capable of regulating the expression of host genes. In the present work, we studied the expression patterns of ncRNAs of various intra-mammalian life-cycle stages of the liver fluke, Fasciola hepatica, as well as those packaged into extracellular vesicles and shed by the adult fluke. The miRNA expression profile of the intra-mammalian stages shows important variations, despite a set of predominant miRNAs that are highly expressed across all stages. No substantial variations in miRNA expression between dormant and activated metacercariae were detected, suggesting that they might not be central players in regulating fluke gene expression during this crucial step in the invasion of the definitive host. We generated a curated pipeline for the prediction of putative target genes that reports only sites conserved between three different prediction approaches. This pipeline was tested against an iso-seq curated database of the 3’ UTR regions of F. hepatica genes to detect miRNA regulation networks within the liver fluke. Several functions related to the host immune response or modulation were enriched among the targets of the most highly expressed parasite miRNAs, stressing that they might be key players during the establishment and maintenance of infection. Additionally, we detected fragments derived from the processing of tRNAs in all developmental stages analyzed, and documented the presence of novel long tRNA fragments enriched in vesicles. We confirmed the presence of at least five putative vault RNAs (vtRNAs) that are expressed across different stages and enriched in vesicles. The presence of tRNA fragments and vtRNAs in vesicles raises the possibility that they could be involved in the host-parasite interaction.
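The idea of reporting only target sites supported by all three prediction approaches can be sketched in Python as a set intersection. This is an illustrative sketch, not the authors' pipeline: the `Site` record, the predictor outputs, and the identifiers are hypothetical, and it assumes that "conserved" means an identical miRNA, target, and start coordinate (a real pipeline might instead accept overlapping coordinates).

```python
from typing import Iterable, NamedTuple, Set


class Site(NamedTuple):
    """Hypothetical record for one predicted miRNA binding site."""
    mirna: str
    target: str
    start: int  # site start within the 3' UTR


def consensus_sites(*predictions: Iterable[Site]) -> Set[Site]:
    """Keep only the sites reported by every prediction approach."""
    if not predictions:
        return set()
    return set.intersection(*map(set, predictions))


# Made-up outputs of three predictors (names and coordinates are placeholders).
tool_a = [Site("miR-A", "gene1", 10), Site("miR-A", "gene2", 55)]
tool_b = [Site("miR-A", "gene1", 10), Site("miR-B", "gene3", 7)]
tool_c = [Site("miR-A", "gene1", 10), Site("miR-A", "gene2", 55)]

shared = consensus_sites(tool_a, tool_b, tool_c)
```

Requiring agreement of all predictors trades sensitivity for specificity: sites seen by only one or two approaches are discarded.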
Affiliation(s)
- Santiago Fontenla
  - Departamento de Genética, Facultad de Medicina, Universidad de la República (UdelaR), Montevideo, Uruguay
- Mauricio Langleib
  - Departamento de Genética, Facultad de Medicina, Universidad de la República (UdelaR), Montevideo, Uruguay
  - Departamento de Desarrollo Biotecnológico, Instituto de Higiene, Facultad de Medicina, Universidad de la República (UdelaR), Montevideo, Uruguay
- Maria Fernanda Domínguez
  - Departamento de Genética, Facultad de Medicina, Universidad de la República (UdelaR), Montevideo, Uruguay
- Mark W. Robinson
  - School of Biological Sciences, Queen’s University Belfast, Belfast, Northern Ireland
- José Tort
  - Departamento de Genética, Facultad de Medicina, Universidad de la República (UdelaR), Montevideo, Uruguay
- Correspondence: Santiago Fontenla; José Tort
4
Leypold NA, Speicher MR. Evolutionary conservation in noncoding genomic regions. Trends Genet 2021; 37:903-918. [PMID: 34238591 DOI: 10.1016/j.tig.2021.06.007] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 03/02/2021] [Revised: 05/25/2021] [Accepted: 06/07/2021] [Indexed: 12/28/2022]
Abstract
Humans may share more genomic commonalities with other species than previously thought. According to current estimates, ~5% of the human genome is functionally constrained, which is a much larger fraction than the ~1.5% occupied by annotated protein-coding genes. Hence, ~3.5% of the human genome comprises likely functional conserved noncoding elements (CNEs) preserved among organisms, whose common ancestors existed throughout hundreds of millions of years of evolution. As whole-genome sequencing emerges as a standard procedure in genetic analyses, interpretation of variations in CNEs, including the elucidation of mechanistic and functional roles, becomes a necessity. Here, we discuss the phenomenon of noncoding conservation via four dimensions (sequence, regulatory conservation, spatiotemporal expression, and structure) and the potential significance of CNEs in phenotype variation and disease.
Affiliation(s)
- Nicole A Leypold
  - Institute of Human Genetics, Diagnostic and Research Center for Molecular Biomedicine, Medical University of Graz, 8010 Graz, Austria
- Michael R Speicher
  - Institute of Human Genetics, Diagnostic and Research Center for Molecular Biomedicine, Medical University of Graz, 8010 Graz, Austria
  - BioTechMed-Graz, Graz, Austria
5
Kjellin J, Avesson L, Reimegård J, Liao Z, Eichinger L, Noegel A, Glöckner G, Schaap P, Söderbom F. Abundantly expressed class of noncoding RNAs conserved through the multicellular evolution of dictyostelid social amoebas. Genome Res 2021; 31:436-447. [PMID: 33479022 PMCID: PMC7919456 DOI: 10.1101/gr.272856.120] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/08/2020] [Accepted: 01/15/2021] [Indexed: 01/26/2023]
Abstract
Aggregative multicellularity has evolved multiple times in diverse groups of eukaryotes, exemplified by the well-studied development of dictyostelid social amoebas, for example, Dictyostelium discoideum. However, it is still poorly understood why multicellularity emerged in these amoebas while the majority of other members of Amoebozoa are unicellular. Previously, a novel type of noncoding RNA, Class I RNAs, was identified in D. discoideum and shown to be important for normal multicellular development. Here, we investigated Class I RNA evolution and its connection to multicellular development. We identified a large number of new Class I RNA genes by constructing a covariance model combined with a scoring system based on conserved upstream sequences. Multiple genes were predicted in representatives of each major group of Dictyostelia and expression analysis confirmed that our search approach identifies expressed Class I RNA genes with high accuracy and sensitivity and that the RNAs are developmentally regulated. Further studies showed that Class I RNAs are ubiquitous in Dictyostelia and share highly conserved structure and sequence motifs. In addition, Class I RNA genes appear to be unique to dictyostelid social amoebas because they could not be identified in outgroup genomes, including their closest known relatives. Our results show that Class I RNA is an ancient class of ncRNAs, likely to have been present in the last common ancestor of Dictyostelia dating back at least 600 million years. Based on previous functional analyses and the presented evolutionary investigation, we hypothesize that Class I RNAs were involved in evolution of multicellularity in Dictyostelia.
Affiliation(s)
- Jonas Kjellin
  - Department of Cell and Molecular Biology, Uppsala University, Uppsala S-75124, Sweden
- Lotta Avesson
  - Department of Molecular Biology, Biomedical Center, Swedish University of Agricultural Sciences, Uppsala S-75124, Sweden
- Johan Reimegård
  - Department of Cell and Molecular Biology, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Uppsala University, Uppsala S-75124, Sweden
- Zhen Liao
  - Department of Cell and Molecular Biology, Uppsala University, Uppsala S-75124, Sweden
- Ludwig Eichinger
  - Centre for Biochemistry, Institute of Biochemistry I, Medical Faculty, University of Cologne, 50931 Cologne, Germany
- Angelika Noegel
  - Centre for Biochemistry, Institute of Biochemistry I, Medical Faculty, University of Cologne, 50931 Cologne, Germany
- Gernot Glöckner
  - Centre for Biochemistry, Institute of Biochemistry I, Medical Faculty, University of Cologne, 50931 Cologne, Germany
- Pauline Schaap
  - College of Life Sciences, University of Dundee, Dundee DD1 5EH, United Kingdom
- Fredrik Söderbom
  - Department of Cell and Molecular Biology, Uppsala University, Uppsala S-75124, Sweden
6
Abstract
Over the last two decades it has become clear that RNA is much more than just a boring intermediate in protein expression. Ancient RNAs still appear in the core information metabolism and comprise a surprisingly large component in bacterial gene regulation. A common theme with these types of mostly small RNAs is their reliance on conserved secondary structures. Large-scale sequencing projects, on the other hand, have profoundly changed our understanding of eukaryotic genomes. Pervasively transcribed, they give rise to a plethora of large and evolutionarily extremely flexible noncoding RNAs that exert a vastly diverse array of molecular functions. In this chapter we provide a necessarily incomplete overview of the current state of comparative analysis of noncoding RNAs, emphasizing computational approaches as a means to gain a global picture of the modern RNA world.
Affiliation(s)
- Rolf Backofen
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, D-79110 Freiburg, Germany
  - Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark
- Jan Gorodkin
  - Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark
- Ivo L Hofacker
  - Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark
  - Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria
  - Bioinformatics and Computational Biology Research Group, University of Vienna, Währingerstraße 17, A-1090 Vienna, Austria
- Peter F Stadler
  - Center for non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg C, Denmark
  - Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria
  - Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
  - Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany
  - Fraunhofer Institute for Cell Therapy and Immunology, Perlickstraße 1, D-04103 Leipzig, Germany
  - Santa Fe Institute, 1399 Hyde Park Rd, Santa Fe, NM 87501, USA
7
Podlevsky JD, Li Y, Chen JJL. The functional requirement of two structural domains within telomerase RNA emerged early in eukaryotes. Nucleic Acids Res 2016; 44:9891-9901. [PMID: 27378779 PMCID: PMC5175330 DOI: 10.1093/nar/gkw605] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 05/04/2016] [Revised: 06/22/2016] [Accepted: 06/23/2016] [Indexed: 11/30/2022]
Abstract
Telomerase emerged during evolution as a prominent solution to the eukaryotic linear chromosome end-replication problem. Telomerase minimally comprises the catalytic telomerase reverse transcriptase (TERT) and telomerase RNA (TR) that provides the template for telomeric DNA synthesis. While the TERT protein is well-conserved across taxa, TR is highly divergent amongst distinct groups of species. Herein, we have identified the essential functional domains of TR from the basal eukaryotic species Trypanosoma brucei, revealing the ancestry of TR comprising two distinct structural core domains that can assemble in trans with TERT and reconstitute active telomerase enzyme in vitro. The upstream essential domain of T. brucei TR, termed the template core, constitutes three short helices in addition to the 11-nt template. Interestingly, the trypanosome template core domain lacks the ubiquitous pseudoknot found in all known TRs, suggesting later evolution of this critical structural element. The template-distal domain is a short stem-loop, termed equivalent CR4/5 (eCR4/5). While functionally similar to vertebrate and fungal CR4/5, trypanosome eCR4/5 is structurally distinctive, lacking the essential P6.1 stem-loop. Our functional study of trypanosome TR core domains suggests that the functional requirement of two discrete structural domains is a common feature of TRs and emerged early in telomerase evolution.
Affiliation(s)
- Joshua D Podlevsky
  - School of Molecular Sciences, Arizona State University, Tempe, AZ 85287, USA
- Yang Li
  - School of Molecular Sciences, Arizona State University, Tempe, AZ 85287, USA
- Julian J-L Chen
  - School of Molecular Sciences, Arizona State University, Tempe, AZ 85287, USA
8
Podlevsky JD, Li Y, Chen JJL. Structure and function of echinoderm telomerase RNA. RNA 2016; 22:204-215. [PMID: 26598712 PMCID: PMC4712671 DOI: 10.1261/rna.053280.115] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 07/13/2015] [Accepted: 10/23/2015] [Indexed: 06/05/2023]
Abstract
Telomerase is a ribonucleoprotein (RNP) enzyme that requires an integral telomerase RNA (TR) subunit, in addition to the catalytic telomerase reverse transcriptase (TERT), for enzymatic function. The secondary structures of TRs from the three major groups of species, ciliates, fungi, and vertebrates, have been studied extensively and demonstrate dramatic diversity. Herein, we report the first comprehensive secondary structure of TR from echinoderms, marine invertebrates closely related to vertebrates, determined by phylogenetic comparative analysis of 16 TR sequences from three separate echinoderm classes. Similar to vertebrate TR, echinoderm TR contains the highly conserved template/pseudoknot and H/ACA domains. However, echinoderm TR lacks the ancestral CR4/5 structural domain found throughout vertebrate and fungal TRs. Instead, echinoderm TR contains a distinct simple helical region, termed eCR4/5, that is functionally equivalent to the CR4/5 domain. The urchin and brittle star eCR4/5 domains bind specifically to their respective TERT proteins and stimulate telomerase activity. Distinct from vertebrate telomerase, the echinoderm TR template/pseudoknot domain with the TERT protein is sufficient to reconstitute significant telomerase activity. This gain-of-function of the echinoderm template/pseudoknot domain for conferring telomerase activity presumably facilitated the rapid structural evolution of the eCR4/5 domain throughout the echinoderm lineage. Additionally, echinoderm TR utilizes the template-adjacent P1.1 helix as a physical template boundary element to prevent nontelomeric DNA synthesis, a mechanism used by ciliate and fungal TRs. Thus, the chimeric and eccentric structural features of echinoderm TR provide unparalleled insights into the rapid evolution of telomerase RNP structure and function.
Affiliation(s)
- Joshua D Podlevsky
  - School of Molecular Sciences, Arizona State University, Tempe, Arizona 85287, USA
- Yang Li
  - School of Molecular Sciences, Arizona State University, Tempe, Arizona 85287, USA
- Julian J-L Chen
  - School of Molecular Sciences, Arizona State University, Tempe, Arizona 85287, USA
9
Gruber AR. RNA Polymerase III promoter screen uncovers a novel noncoding RNA family conserved in Caenorhabditis and other clade V nematodes. Gene 2014; 544:236-40. [PMID: 24792899 DOI: 10.1016/j.gene.2014.04.068] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 03/14/2014] [Revised: 04/25/2014] [Accepted: 04/28/2014] [Indexed: 10/25/2022]
Abstract
RNA Polymerase III is a highly specialized enzyme complex responsible for the transcription of a very distinct set of housekeeping noncoding RNAs including tRNAs, 7SK snRNA, Y RNAs, U6 snRNA, and the RNA components of RNaseP and RNaseMRP. In this work we have utilized the conserved promoter structure of known RNA Polymerase III transcripts consisting of characteristic sequence elements termed proximal sequence elements (PSE) A and B and a TATA-box to uncover a novel RNA Polymerase III-transcribed, noncoding RNA family found to be conserved in Caenorhabditis as well as other clade V nematode species. Homology search in combination with detailed sequence and secondary structure analysis revealed that members of this novel ncRNA family evolve rapidly, and only maintain a potentially functional small stem structure that links the 5' end to the very 3' end of the transcript and a small hairpin structure at the 3' end. This is most likely required for efficient transcription termination. In addition, our study revealed evidence that canonical C/D box snoRNAs are also transcribed from a PSE A-PSE B-TATA-box promoter in Caenorhabditis elegans.
Affiliation(s)
- Andreas R Gruber
  - Computational and Systems Biology, Biozentrum, University of Basel, Klingelbergstrasse 50-70, 4056 Basel, Switzerland
  - Swiss Institute of Bioinformatics, University of Basel, Klingelbergstrasse 50-70, 4056 Basel, Switzerland
10
Wehner S, Dörrich AK, Ciba P, Wilde A, Marz M. pRNA: NoRC-associated RNA of rRNA operons. RNA Biol 2013; 11:3-9. [PMID: 24440945 DOI: 10.4161/rna.27448] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 11/19/2022]
Abstract
Promoter-associated RNAs (pRNAs) are a family of ~90-100 nt-long divergent RNAs overlapping the promoter of the rRNA (rDNA) operon. pRNA transcripts interact with TIP5, a component of the chromatin remodeling complex NoRC, which recruits enzymes for heterochromatin formation and mediates silencing of rRNA genes. Here we present a comprehensive analysis of pRNA homologs, including different versions per species, as a result of in silico studies in available metazoan genome assemblies. Comparative sequence analysis and secondary structure prediction yielded two possible secondary structures, which leads us to assume a possible dual function of pRNAs in the regulation of rRNA operons. Furthermore, we validated parts of our computational predictions experimentally by RT-PCR and sequencing. A representative seed alignment of the pRNA family, annotated with possible secondary structures, was released to the Rfam database.
Affiliation(s)
- Stefanie Wehner
  - Department for Bioinformatics; Faculty of Mathematics and Computer Science; Friedrich-Schiller-University Jena; Jena, Germany
- Anja K Dörrich
  - Institute for Microbiology and Molecular Biology; Justus-Liebig-University Giessen; Giessen, Germany
- Philipp Ciba
  - Fraunhofer Research Institution for Marine Biotechnology; Lübeck, Germany
- Annegret Wilde
  - Faculty of Biology; University of Freiburg; Freiburg, Germany
- Manja Marz
  - Department for Bioinformatics; Faculty of Mathematics and Computer Science; Friedrich-Schiller-University Jena; Jena, Germany
11
Abstract
Rapid improvements in high-throughput experimental technologies now make it possible to study in detail the expression, as well as changes in expression, of whole transcriptomes under different environmental conditions. We describe current approaches to identify genome-wide functional RNA transcripts (experimentally as well as computationally), and focus on computational methods that may be utilized to disclose their function. While genome databases offer a wealth of information about known and putative functions for protein-coding genes, functional information for novel non-coding RNA genes is almost nonexistent. This is mainly explained by the lack of established software tools to efficiently reveal the function and evolutionary origin of non-coding RNA genes. Here, we describe in detail computational approaches one may follow to annotate and classify an RNA transcript.
Affiliation(s)
- Kristin Reiche
  - Fraunhofer Institute for Cell Therapy and Immunology, Leipzig, Germany
12
Menzel P, Stadler PF, Gorodkin J. maxAlike: maximum likelihood-based sequence reconstruction with application to improved primer design for unknown sequences. Bioinformatics 2010; 27:317-25. [PMID: 21123221 PMCID: PMC3031029 DOI: 10.1093/bioinformatics/btq651] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022]
Abstract
MOTIVATION: The task of reconstructing a genomic sequence from a particular species is gaining importance in light of the rapid development of high-throughput sequencing technologies and their limitations. Applications include not only compensation for missing data in unsequenced genomic regions and the design of oligonucleotide primers for target genes in species lacking sequence information, but also the preparation of customized queries for homology searches.
RESULTS: We introduce the maxAlike algorithm, which reconstructs a genomic sequence for a specific taxon based on sequence homologs in other species. The input is a multiple sequence alignment and a phylogenetic tree that also contains the target species. For this target species, the algorithm computes nucleotide probabilities at each sequence position. Consensus sequences are then reconstructed based on a certain confidence level. For 37 out of 44 target species in a test dataset, we obtain a significant increase of the reconstruction accuracy compared to both the consensus sequence from the alignment and the sequence of the nearest phylogenetic neighbor. When considering only nucleotides above a confidence limit, maxAlike is significantly better (up to 10%) in all 44 species. The improved sequence reconstruction also leads to an increase in the quality of PCR primer design for as yet unsequenced genes: the differences between the expected T(m) and real T(m) of the primer-template duplex can be reduced by ~26% compared with other reconstruction approaches. We also show that the prediction accuracy is robust to common distortions of the input trees. The prediction accuracy drops by only 1% on average across all species for 77% of trees derived from random genomic loci in a test dataset.
AVAILABILITY: maxAlike is available for download and as a web server at: http://rth.dk/resources/maxAlike.
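The final step described in this abstract, turning per-position nucleotide probabilities into a consensus at a chosen confidence level, can be sketched in Python as follows. This is not the maxAlike implementation: the function name, the default threshold of 0.8, the use of `N` for low-confidence positions, and the example probabilities are all assumptions made for illustration, and the likelihood computation on the phylogenetic tree is omitted entirely.

```python
from typing import Dict, List


def call_consensus(
    probabilities: List[Dict[str, float]],
    confidence: float = 0.8,  # assumed threshold, not taken from the paper
    unknown: str = "N",
) -> str:
    """Call the most probable base per position, or `unknown` if below threshold."""
    calls = []
    for column in probabilities:
        base, prob = max(column.items(), key=lambda item: item[1])
        calls.append(base if prob >= confidence else unknown)
    return "".join(calls)


# Made-up per-position probabilities for a three-nucleotide target sequence.
probs = [
    {"A": 0.90, "C": 0.05, "G": 0.03, "T": 0.02},
    {"A": 0.40, "C": 0.40, "G": 0.10, "T": 0.10},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
]

consensus = call_consensus(probs)
```

Raising the threshold leaves more positions undetermined but makes the remaining calls more reliable, which is the trade-off behind evaluating only nucleotides above a confidence limit.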
Affiliation(s)
- Peter Menzel
  - Center for non-coding RNA in Technology and Health, IBHV, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg, Denmark
13
Marz M, Mosig A, Stadler BMR, Stadler PF. U7 snRNAs: a computational survey. Genomics Proteomics Bioinformatics 2008; 5:187-95. [PMID: 18267300 PMCID: PMC5054213 DOI: 10.1016/s1672-0229(08)60006-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 11/17/2022]
Abstract
U7 small nuclear RNA (snRNA) sequences have been described only for a handful of animal species in the past. Here we describe a computational search for functional U7 snRNA genes throughout vertebrates including the upstream sequence elements characteristic for snRNAs transcribed by polymerase II. Based on the results of this search, we discuss the high variability of U7 snRNAs in both sequence and structure, and report on an attempt to find U7 snRNA sequences in basal deuterostomes and non-drosophilid insect genomes based on a combination of sequence, structure, and promoter features. Due to the extremely short sequence and the high variability in both sequence and structure, no unambiguous candidates were found. These results cast doubt on putative U7 homologs in even more distant organisms that are reported in the most recent release of the Rfam database.
Affiliation(s)
- Manja Marz
  - Bioinformatics Group, Department of Computer Science, University of Leipzig, Leipzig D-04107, Germany
14
Xie M, Mosig A, Qi X, Li Y, Stadler PF, Chen JJL. Structure and function of the smallest vertebrate telomerase RNA from teleost fish. J Biol Chem 2007; 283:2049-59. [PMID: 18039659 DOI: 10.1074/jbc.m708032200] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.9] [Indexed: 11/06/2022]
Abstract
Telomerase extends chromosome ends by copying a short template sequence within its intrinsic RNA component. Telomerase RNA (TR) from different groups of species varies dramatically in sequence and size. We report here the bioinformatic identification, secondary structure comparison, and functional analysis of the smallest known vertebrate TRs from five teleost fishes. The teleost TRs (312-348 nucleotides) are significantly smaller than the cartilaginous fish TRs (478-559 nucleotides) and tetrapod TRs. This remarkable length reduction of teleost fish TRs correlates positively with the genome size, reflecting an unusual structural plasticity of TR during evolution. The teleost TR consists of a compact three-domain structure, lacking most of the sequences in regions that are variable in other vertebrate TR structures. The medaka and fugu TRs, when assembled with their telomerase reverse transcriptase (TERT) protein counterparts, reconstituted active and processive telomerase enzymes. Titration analysis of individual RNA domains suggests that the efficient assembly of the telomerase complex is influenced more by the telomerase reverse transcriptase (TERT) binding of the CR4-CR5 domain than the pseudoknot domain of TR. The remarkably small teleost fish TR further expands our understanding about the evolutionary divergence of vertebrate TR.
Affiliation(s)
- Mingyi Xie
- Department of Chemistry & Biochemistry and School of Life Sciences, Arizona State University, Tempe, Arizona 85287, USA
|
15
|
Hertel J, Hofacker IL, Stadler PF. SnoReport: computational identification of snoRNAs with unknown targets. Bioinformatics 2007; 24:158-64. [PMID: 17895272 DOI: 10.1093/bioinformatics/btm464] [Citation(s) in RCA: 102] [Impact Index Per Article: 6.0] [Indexed: 11/13/2022]
Abstract
Unlike tRNAs and microRNAs, both classes of snoRNAs, which direct two distinct types of chemical modifications of uracil residues, have proved to be surprisingly difficult to find in genomic sequences. Most computational approaches so far have explicitly used the fact that snoRNAs predominantly target ribosomal RNAs and spliceosomal RNAs. The target is specified by a short stretch of sequence complementarity between the snoRNA and its target. This sequence complementarity to known targets crucially contributes to sensitivity and specificity of snoRNA gene finding algorithms. The discovery of 'orphan' snoRNAs, which either have no known target, or which target ordinary protein-coding mRNAs, however, begs the question whether this class of 'housekeeping' non-coding RNAs is much more widespread and might have a diverse set of regulatory functions. In order to approach this question, we present here a combination of RNA secondary structure prediction and machine learning that is designed to recognize the two major classes of snoRNAs, box C/D and box H/ACA snoRNAs, among ncRNA candidate sequences. The snoReport approach deliberately avoids any usage of target information. We find that the combination of the conserved sequence boxes and secondary structure constraints as a pre-filter with SVM classifiers based on a small set of structural descriptors is sufficient for a reliable identification of snoRNAs. Tests of snoReport on data from several recent experimental surveys show that the approach is feasible; the application to a dataset from a large-scale comparative genomics survey for ncRNAs suggests that there are likely hundreds of previously undescribed 'orphan' snoRNAs still hidden in the human genome. Availability: The snoReport software is implemented in ANSI C. The source code is available under the GNU Public License at http://www.bioinf.uni-leipzig.de/Software/snoReport.
Affiliation(s)
- Jana Hertel
- Institute for Theoretical Chemistry, University of Vienna, Währingerstrasse 17, A-1090 Wien, Austria.
|
16
|
Homology Search with Fragmented Nucleic Acid Sequence Patterns. Lecture Notes in Computer Science 2007. [DOI: 10.1007/978-3-540-74126-8_31] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 01/07/2023]
|
17
|
Hinas A, Söderbom F. Treasure hunt in an amoeba: non-coding RNAs in Dictyostelium discoideum. Curr Genet 2007; 51:141-59. [PMID: 17171561 DOI: 10.1007/s00294-006-0112-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 10/16/2006] [Revised: 11/22/2006] [Accepted: 11/23/2006] [Indexed: 12/20/2022]
Abstract
The traditional view of RNA being merely an intermediate in the transfer of genetic information, as mRNA, spliceosomal RNA, tRNA, and rRNA, has become outdated. The recent discovery of numerous regulatory RNAs with a plethora of functions in biological processes has truly revolutionized our understanding of gene regulation. Tiny RNAs such as microRNAs and small interfering RNAs play vital roles at different levels of gene control. Small nucleolar RNAs are much more abundant than previously recognized, and new functions beyond processing and modification of rRNA have recently emerged. Longer non-coding RNAs (ncRNAs) can also have important regulatory roles in the cell, e.g., antisense RNAs that control their target mRNAs. The majority of these important findings arose from analyses in various model organisms. In this review, we focus on ncRNAs in the social amoeba Dictyostelium discoideum. This important genetically tractable model organism has recently received renewed attention in terms of discovery, regulation and functional studies of ncRNAs. Old and recent findings are discussed and put in the context of what we know today about ncRNAs in other organisms.
Affiliation(s)
- Andrea Hinas
- Department of Molecular Biology, Biomedical Center, Swedish University of Agricultural Sciences, Box 590, 75124 Uppsala, Sweden
|
18
|
Evolution of the vertebrate Y RNA cluster. Theory Biosci 2007; 126:9-14. [PMID: 18087752 DOI: 10.1007/s12064-007-0003-y] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.9] [Received: 02/07/2007] [Accepted: 02/21/2007] [Indexed: 10/23/2022]
Abstract
Relatively little is known about the evolutionary histories of most classes of non-protein coding RNAs. Here we consider Y RNAs, a relatively rarely studied group of related pol-III transcripts. A single cluster of functional genes is preserved throughout tetrapod evolution, which however exhibits clade-specific tandem duplications, gene-losses, and rearrangements.
|