201
Meyer F, Kurtz S, Backofen R, Will S, Beckstette M. Structator: fast index-based search for RNA sequence-structure patterns. BMC Bioinformatics 2011; 12:214. [PMID: 21619640 PMCID: PMC3154205 DOI: 10.1186/1471-2105-12-214] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Received: 12/10/2010] [Accepted: 05/27/2011] [Indexed: 12/26/2022] Open
Abstract
BACKGROUND The secondary structure of RNA molecules is intimately related to their function and often more conserved than the sequence. Hence, the important task of searching databases for RNAs requires matching sequence-structure patterns. Unfortunately, current tools for this task have, in the best case, a running time that is only linear in the size of sequence databases. Furthermore, established index data structures for fast sequence matching, like suffix trees or arrays, cannot benefit from the complementarity constraints introduced by the secondary structure of RNAs. RESULTS We present a novel method and readily applicable software for time-efficient matching of RNA sequence-structure patterns in sequence databases. Our approach is based on affix arrays, a recently introduced index data structure, preprocessed from the target database. Affix arrays support bidirectional pattern search, which is required for efficiently handling the structural constraints of the pattern. Structural patterns like stem-loops can be matched inside out, such that the loop region is matched first and then the pairing bases on the boundaries are matched consecutively. This allows base pairing information to be exploited for search space reduction and leads to an expected running time that is sublinear in the size of the sequence database. The incorporation of a new chaining approach in the search of RNA sequence-structure patterns enables the description of molecules folding into complex secondary structures with multiple ordered patterns. The chaining approach removes spurious matches from the set of intermediate results, in particular of patterns with little specificity. In benchmark experiments on the Rfam database, our method runs up to two orders of magnitude faster than previous methods. CONCLUSIONS The presented method's sublinear expected running time makes it well suited for RNA sequence-structure pattern matching in large sequence databases. RNA molecules containing several stem-loop substructures can be described by multiple sequence-structure patterns and their matches are efficiently handled by a novel chaining method. Beyond our algorithmic contributions, we provide with Structator a complete and robust open-source software solution for index-based search of RNA sequence-structure patterns. The Structator software is available at http://www.zbh.uni-hamburg.de/Structator.
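The chaining idea in this abstract (keeping only matches that occur in the order of the patterns and discarding isolated, spurious ones) can be illustrated with a small dynamic program. This is a minimal sketch under stated assumptions, not Structator's algorithm: the `Match` tuple and the `best_chain` function are hypothetical names, matches are assumed to be non-empty half-open intervals, and the quadratic-time recurrence simply finds one longest chain.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    """One hit of one pattern (hypothetical record, not Structator's format)."""
    pattern: int  # 0-based index of the pattern in the ordered descriptor
    start: int    # inclusive start position in the sequence
    end: int      # exclusive end position in the sequence


def best_chain(matches: List[Match]) -> List[Match]:
    """Return one longest chain of matches.

    A chain is valid when pattern indices strictly increase and
    consecutive matches do not overlap in the sequence.
    """
    ordered = sorted(matches, key=lambda m: (m.start, m.end))
    n = len(ordered)
    if n == 0:
        return []
    length = [1] * n   # length of the best chain ending at match j
    prev = [-1] * n    # predecessor of match j in that chain
    for j in range(n):
        for i in range(j):
            a, b = ordered[i], ordered[j]
            compatible = a.pattern < b.pattern and a.end <= b.start
            if compatible and length[i] + 1 > length[j]:
                length[j] = length[i] + 1
                prev[j] = i
    # Trace back from the match that ends the longest chain.
    j = max(range(n), key=lambda k: length[k])
    chain = []
    while j != -1:
        chain.append(ordered[j])
        j = prev[j]
    return chain[::-1]


hits = [
    Match(0, 5, 12), Match(1, 20, 30), Match(1, 3, 9),
    Match(2, 40, 50), Match(0, 60, 70),
]
print(best_chain(hits))
```

In this toy input, the hit of pattern 1 at position 3 and the hit of pattern 0 at position 60 are left out of the reported chain because they do not fit the required pattern order.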
Affiliation(s)
- Fernando Meyer
- Center for Bioinformatics, University of Hamburg, Bundesstrasse 43, 20146 Hamburg, Germany

202
Sequence-based identification of 3D structural modules in RNA with RMDetect. Nat Methods 2011; 8:513-21. [PMID: 21552257 DOI: 10.1038/nmeth.1603] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.5] [Received: 11/08/2010] [Accepted: 04/11/2011] [Indexed: 01/24/2023]
Abstract
Structural RNA modules, sets of ordered non-Watson-Crick base pairs embedded between Watson-Crick pairs, have central roles as architectural organizers and sites of ligand binding in RNA molecules, and are recurrently observed in RNA families throughout the phylogeny. Here we describe a computational tool, RNA three-dimensional (3D) modules detection, or RMDetect, for identifying known 3D structural modules in single and multiple RNA sequences in the absence of any other information. Currently, four modules can be searched for: G-bulge loop, kink-turn, C-loop and tandem-GA loop. In control test sequences we found all of the known modules with a false discovery rate of 0.23. Scanning through 1,444 publicly available alignments, we identified 21 yet unreported modules and 141 known modules. RMDetect can be used to refine RNA 2D structure, assemble RNA 3D models, and search and annotate structured RNAs in genomic data.

203
Perreault J, Weinberg Z, Roth A, Popescu O, Chartrand P, Ferbeyre G, Breaker RR. Identification of hammerhead ribozymes in all domains of life reveals novel structural variations. PLoS Comput Biol 2011; 7:e1002031. [PMID: 21573207 PMCID: PMC3088659 DOI: 10.1371/journal.pcbi.1002031] [Citation(s) in RCA: 102] [Impact Index Per Article: 7.8] [Received: 12/29/2010] [Accepted: 02/25/2011] [Indexed: 02/07/2023] Open
Abstract
Hammerhead ribozymes are small self-cleaving RNAs that promote strand scission by internal phosphoester transfer. Comparative sequence analysis was used to identify many more representatives of this ribozyme class than were previously known, including the first representatives in fungi and archaea. Moreover, we have uncovered the first natural examples of "type II" hammerheads, and our findings reveal that this permuted form occurs in bacteria as frequently as type I and III architectures. We also identified a commonly occurring pseudoknot that forms a tertiary interaction critical for high-speed ribozyme activity. Genomic contexts of many hammerhead ribozymes indicate that they perform biological functions different from their known role in generating unit-length RNA transcripts of multimeric viroid and satellite virus genomes. In rare instances, nucleotide variation occurs at positions within the catalytic core that are otherwise strictly conserved, suggesting that core mutations are occasionally tolerated or preferred.
Affiliation(s)
- Jonathan Perreault
  - Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut, United States of America
- Zasha Weinberg
  - Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut, United States of America
  - Howard Hughes Medical Institute, Yale University, New Haven, Connecticut, United States of America
- Adam Roth
  - Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut, United States of America
  - Howard Hughes Medical Institute, Yale University, New Haven, Connecticut, United States of America
- Olivia Popescu
  - Department of Biochemistry, Université de Montréal, Montréal, Québec, Canada
- Pascal Chartrand
  - Department of Biochemistry, Université de Montréal, Montréal, Québec, Canada
- Gerardo Ferbeyre
  - Department of Biochemistry, Université de Montréal, Montréal, Québec, Canada
- Ronald R. Breaker
  - Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, Connecticut, United States of America
  - Howard Hughes Medical Institute, Yale University, New Haven, Connecticut, United States of America
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America

204
Westhof E, Masquida B, Jossinet F. Predicting and modeling RNA architecture. Cold Spring Harb Perspect Biol 2011; 3:cshperspect.a003632. [PMID: 20504963 DOI: 10.1101/cshperspect.a003632] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Indexed: 11/24/2022]
Abstract
A general approach for modeling the architecture of large and structured RNA molecules is described. The method exploits the modularity and the hierarchical folding of RNA architecture that is viewed as the assembly of preformed double-stranded helices defined by Watson-Crick base pairs and RNA modules maintained by non-Watson-Crick base pairs. Despite the extensive molecular neutrality observed in RNA structures, specificity in RNA folding is achieved through global constraints like lengths of helices, coaxiality of helical stacks, and structures adopted at the junctions of helices. The Assemble integrated suite of computer tools allows for sequence and structure analysis as well as interactive modeling by homology or ab initio assembly with possibilities for fitting within electronic density maps. The local key role of non-Watson-Crick pairs guides RNA architecture formation and offers metrics for assessing the accuracy of three-dimensional models in a more useful way than usual root mean square deviation (RMSD) values.
Affiliation(s)
- Eric Westhof
- Architecture et Réactivité de l'ARN, Université de Strasbourg, Institut de Biologie Moléculaire et Cellulaire du CNRS, 67084 Strasbourg, France.

205
Characterization and prediction of mRNA polyadenylation sites in human genes. Med Biol Eng Comput 2011; 49:463-72. [PMID: 21286831 DOI: 10.1007/s11517-011-0732-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 03/28/2009] [Accepted: 01/02/2011] [Indexed: 12/31/2022]
Abstract
The accurate identification of potential poly(A) sites has contributed to many studies of alternative polyadenylation. The aim of this study was the development of a machine-learning methodology that helps to discriminate real polyadenylation signals from randomly occurring signals in genomic sequence. Since previous studies have revealed that RNA secondary structure in certain genes has a significant impact, the authors tried to computationally pinpoint common structural patterns around the poly(A) sites and to investigate how RNA secondary structure may influence polyadenylation. This involved an initial study on the impact of RNA structure, in which motif search tools indicated that hairpin structures might be important. Thus, it was proposed that, in addition to the sequence pattern around poly(A) sites, there exists a widespread structural pattern that is also employed during human mRNA polyadenylation. In this study, the authors present a computational model that uses support vector machines to predict human poly(A) sites. The results show that this predictive model has a comparable performance to the current prediction tool. In addition, common structural patterns associated with polyadenylation were identified using several motif finding programs, which provides new insight into the role RNA secondary structure plays in polyadenylation.

206
Chen Y, Indurthi DC, Jones SW, Papoutsakis ET. Small RNAs in the genus Clostridium. mBio 2011; 2:e00340-10. [PMID: 21264064 PMCID: PMC3025663 DOI: 10.1128/mbio.00340-10] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.9] [Received: 12/22/2010] [Accepted: 01/03/2011] [Indexed: 11/20/2022] Open
Abstract
The genus Clostridium includes major human pathogens and species important to cellulose degradation, the carbon cycle, and biotechnology. Small RNAs (sRNAs) are emerging as crucial regulatory molecules in all organisms, but they have not been investigated in clostridia. Research on sRNAs in clostridia is hindered by the absence of a systematic method to identify sRNA candidates, thus relegating clostridial sRNA research to a hit-and-miss process. We therefore set out to develop a method to identify potential sRNAs in the Clostridium genus to open up the field of sRNA research in clostridia. Using comparative genomics analyses combined with predictions of rho-independent terminators and promoters, we predicted sRNAs in 21 clostridial genomes: Clostridium acetobutylicum, C. beijerinckii, C. botulinum (eight strains), C. cellulolyticum, C. difficile, C. kluyveri (two strains), C. novyi, C. perfringens (three strains), C. phytofermentans, C. tetani, and C. thermocellum. Although more than one-third of predicted sRNAs have Shine-Dalgarno (SD) sequences, only one-sixth have a start codon downstream of SD sequences; thus, most of the predicted sRNAs are noncoding RNAs. Quantitative reverse transcription-PCR (Q-RT-PCR) and Northern analysis were employed to test the presence of a randomly chosen set of sRNAs in C. acetobutylicum and several C. botulinum strains, leading to the confirmation of a large fraction of the tested sRNAs. We identified a conserved, novel sRNA which, together with the downstream gene coding for an ATP-binding cassette (ABC) transporter, responds to the antibiotic clindamycin. The number of predicted sRNAs correlated with the physiological function of the species (high for pathogens, low for cellulolytic species, and intermediate for solventogenic species), but not with 16S rRNA-based phylogeny.
Affiliation(s)
- Yili Chen
  - Delaware Biotechnology Institute, Molecular Biotechnology Laboratory, University of Delaware, Newark, Delaware, USA
  - Department of Chemical Engineering, Colburn Laboratory, University of Delaware, Newark, Delaware, USA
- Dinesh C. Indurthi
  - Delaware Biotechnology Institute, Molecular Biotechnology Laboratory, University of Delaware, Newark, Delaware, USA
  - Department of Chemical Engineering, Colburn Laboratory, University of Delaware, Newark, Delaware, USA
- Shawn W. Jones
  - Delaware Biotechnology Institute, Molecular Biotechnology Laboratory, University of Delaware, Newark, Delaware, USA
  - Department of Chemical Engineering, Colburn Laboratory, University of Delaware, Newark, Delaware, USA
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, Illinois, USA
- Eleftherios T. Papoutsakis
  - Delaware Biotechnology Institute, Molecular Biotechnology Laboratory, University of Delaware, Newark, Delaware, USA
  - Department of Chemical Engineering, Colburn Laboratory, University of Delaware, Newark, Delaware, USA
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, Illinois, USA

207
Abstract
Rapid improvements in high-throughput experimental technologies now make it possible to study in detail the expression, as well as changes in expression, of whole transcriptomes under different environmental conditions. We describe current approaches to identify genome-wide functional RNA transcripts (experimentally as well as computationally), and focus on computational methods that may be utilized to disclose their function. While genome databases offer a wealth of information about known and putative functions for protein-coding genes, functional information for novel non-coding RNA genes is almost nonexistent. This is mainly explained by the lack of established software tools to efficiently reveal the function and evolutionary origin of non-coding RNA genes. Here, we describe in detail computational approaches one may follow to annotate and classify an RNA transcript.
Affiliation(s)
- Kristin Reiche
- Fraunhofer Institute for Cell Therapy and Immunology, Leipzig, Germany

208
Abstract
RNA localisation is an important mode of delivering proteins to their site of function. Cis-acting signals within the RNAs, which can be thought of as zip-codes, determine the site of localisation. There are few examples of fully characterised RNA signals, but the signals are thought to be defined through a combination of primary, secondary, and tertiary structures. In this chapter, we describe a selection of computational methods for predicting RNA secondary structure, identifying localisation signals, and searching for similar localisation signals on a genome-wide scale. The chapter is aimed at the biologist rather than presenting the details of each of the individual methods.

209
Abstract
Like protein-coding sequences, functional motifs in RNA elements are frequently conserved, but this conservation is most often at the structure level rather than sequence based. Proper characterization of these structural RNA motifs is both the key and the limiting step to understanding the nature of RNA-protein interactions. The discovery of elements targeted by RNA-binding proteins and how they function remains one of the most active, yet elusive areas of RNA biology. Only a limited number of these elements have been well characterized, with many of the fundamental rules yet to be discovered. Here we present a comprehensive list of web-based resources that can be used in the study and identification of RNA-based structural and regulatory motifs and provide a survey of the informatic resources that have been developed to facilitate this research.
Affiliation(s)
- Ajish D George
- Department of Biomedical Sciences, School of Public Health, Gen∗NY∗Sis Center for Excellence in Cancer Genomics, University at Albany-SUNY, Rensselaer, NY, USA.

210
De Francisci D, Campanaro S, Kornfeld G, Siddiqui KS, Williams TJ, Ertan H, Treu L, Pilak O, Lauro FM, Harrop SJ, Curmi PMG, Cavicchioli R. The RNA polymerase subunits E/F from the Antarctic archaeon Methanococcoides burtonii bind to specific species of mRNA. Environ Microbiol 2010; 13:2039-55. [DOI: 10.1111/j.1462-2920.2010.02385.x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/15/2022]

211
Campanaro S, Williams TJ, Burg DW, De Francisci D, Treu L, Lauro FM, Cavicchioli R. Temperature-dependent global gene expression in the Antarctic archaeon Methanococcoides burtonii. Environ Microbiol 2010; 13:2018-38. [PMID: 21059163 DOI: 10.1111/j.1462-2920.2010.02367.x] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.9] [Indexed: 01/16/2023]
Abstract
Methanococcoides burtonii is a member of the Archaea that was isolated from Ace Lake in Antarctica and is a valuable model for studying cold adaptation. Low-temperature transcriptional regulation of global gene expression and the arrangement of transcriptional units in cold-adapted archaea have not been studied. We developed a microarray for determining which genes are expressed in operons, and which are differentially expressed at low (4°C) or high (23°C) temperature. Approximately 55% of genes were found to be arranged in operons that range in length from 2 to 23 genes, and mRNA abundance tended to increase with operon length. Analysing microarray data previously obtained by others for Halobacterium salinarum revealed a similar correlation between operon length and mRNA abundance, suggesting that operons may play a similar role more broadly in the Archaea. More than 500 genes were differentially expressed at levels up to ≈ 24-fold. A notable feature was the upregulation of genes involved in maintaining RNA in a state suitable for translation in the cold. Comparison between microarray experiments and results previously obtained using proteomics indicates that transcriptional regulation (rather than translation) is primarily responsible for controlling gene expression in M. burtonii. In addition, certain genes (e.g. those involved in ribosome structure and methanogenesis) appear to be regulated post-transcriptionally. This is one of the few experimental studies describing the genome-wide distribution and regulation of operons in archaea.
Affiliation(s)
- S Campanaro
- CRIBI Biotechnology Centre, Department of Biology, University of Padua, Via U. Bassi 58/B, 35121 Padova, Italy

212
Naville M, Gautheret D. Premature terminator analysis sheds light on a hidden world of bacterial transcriptional attenuation. Genome Biol 2010; 11:R97. [PMID: 20920266 PMCID: PMC2965389 DOI: 10.1186/gb-2010-11-9-r97] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Received: 04/12/2010] [Revised: 08/11/2010] [Accepted: 09/29/2010] [Indexed: 01/10/2023] Open
Abstract
BACKGROUND Bacterial transcription attenuation occurs through a variety of cis-regulatory elements that control gene expression in response to a wide range of signals. The signal-sensing structures in attenuators are so diverse and rapidly evolving that only a small fraction have been properly annotated and characterized to date. Here we apply a broad-spectrum detection tool in order to achieve a more complete view of the transcriptional attenuation complement of key bacterial species. RESULTS Our protocol seeks gene families with an unusual frequency of 5' terminators found across multiple species. Many of the detected attenuators are part of annotated elements, such as riboswitches or T-boxes, which often operate through transcriptional attenuation. However, a significant fraction of candidates were not previously characterized in spite of their unmistakable footprint. We further characterized some of these new elements using sequence and secondary structure analysis. We also present elements that may control the expression of several non-homologous genes, suggesting co-transcription and response to common signals. An important class of such elements, which we called mobile attenuators, is provided by 3' terminators of insertion sequences or prophages that may be exapted as 5' regulators when inserted directly upstream of a cellular gene. CONCLUSIONS We show here that attenuators involve a complex landscape of signal-detection structures spanning the entire bacterial domain. We discuss possible scenarios through which these diverse 5' regulatory structures may arise or evolve.
MESH Headings
- Bacillus subtilis/genetics
- Bacteria/genetics
- Bacteria/metabolism
- Base Sequence
- Codon, Nonsense
- Codon, Terminator
- Escherichia coli/genetics
- Gene Expression Regulation, Bacterial
- Genome, Bacterial
- Interspersed Repetitive Sequences
- RNA, Bacterial/genetics
- RNA, Bacterial/metabolism
- Regulatory Elements, Transcriptional
- Regulatory Sequences, Nucleic Acid
- Riboswitch
- Sequence Analysis, DNA
- Synteny
- T-Box Domain Proteins
- Terminator Regions, Genetic
- Transcription, Genetic
Affiliation(s)
- Magali Naville
  - Université Paris-Sud, CNRS, UMR8621, Institut de Génétique et Microbiologie, Bâtiment 400, F-91405 Orsay Cedex, France
- Daniel Gautheret
  - Université Paris-Sud, CNRS, UMR8621, Institut de Génétique et Microbiologie, Bâtiment 400, F-91405 Orsay Cedex, France

213
Zhong C, Tang H, Zhang S. RNAMotifScan: automatic identification of RNA structural motifs using secondary structural alignment. Nucleic Acids Res 2010; 38:e176. [PMID: 20696653 PMCID: PMC2952876 DOI: 10.1093/nar/gkq672] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Indexed: 11/14/2022] Open
Abstract
Recent studies have shown that RNA structural motifs play essential roles in RNA folding and interaction with other molecules. Computational identification and analysis of RNA structural motifs remains a challenging task. Existing motif identification methods based on 3D structure may not properly compare motifs with high structural variations. Other structural motif identification methods consider only nested canonical base-pairing structures and cannot be used to identify complex RNA structural motifs that often consist of various non-canonical base pairs due to uncommon hydrogen bond interactions. In this article, we present a novel RNA structural alignment method for RNA structural motif identification, RNAMotifScan, which takes into consideration the isosteric (both canonical and non-canonical) base pairs and multi-pairings in RNA structural motifs. The utility and accuracy of RNAMotifScan are demonstrated by searching for kink-turn, C-loop, sarcin-ricin, reverse kink-turn and E-loop motifs in a 23S rRNA (PDB ID: 1S72), which is well characterized for the occurrences of these motifs. Finally, we search for these motifs in the RNA structures of the entire Protein Data Bank and estimate their abundances. RNAMotifScan is freely available at our supplementary website (http://genome.ucf.edu/RNAMotifScan).
Affiliation(s)
- Cuncong Zhong
- School of Electrical Engineering and Computer Science, University of Central Florida, Orlando, FL 32816, USA

214
Mathews DH, Moss WN, Turner DH. Folding and finding RNA secondary structure. Cold Spring Harb Perspect Biol 2010; 2:a003665. [PMID: 20685845 DOI: 10.1101/cshperspect.a003665] [Citation(s) in RCA: 106] [Impact Index Per Article: 7.6] [Indexed: 11/24/2022]
Abstract
Optimal exploitation of the expanding database of sequences requires rapid finding and folding of RNAs. Methods are reviewed that automate folding and discovery of RNAs with algorithms that couple thermodynamics with chemical mapping, NMR, and/or sequence comparison. New functional noncoding RNAs in genome sequences can be found by combining sequence comparison with the assumption that functional noncoding RNAs will have more favorable folding free energies than other RNAs. When a new RNA is discovered, experiments and sequence comparison can restrict folding space so that secondary structure can be rapidly determined with the help of predicted free energies. In turn, secondary structure restricts folding in three dimensions, which allows modeling of three-dimensional structure. An example from a domain of a retrotransposon is described. Discovery of new RNAs and their structures will provide insights into evolution, biology, and design of therapeutics. Applications to studies of evolution are also reviewed.
Affiliation(s)
- David H Mathews
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester School of Medicine and Dentistry, Rochester, New York 14642, USA

215
Yusuf D, Marz M, Stadler PF, Hofacker IL. Bcheck: a wrapper tool for detecting RNase P RNA genes. BMC Genomics 2010; 11:432. [PMID: 20626900 PMCID: PMC2996960 DOI: 10.1186/1471-2164-11-432] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Received: 02/19/2010] [Accepted: 07/13/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Effective bioinformatics solutions are needed to tackle challenges posed by industrial-scale genome annotation. We present Bcheck, a wrapper tool which predicts RNase P RNA genes by combining the speed of pattern matching and the sensitivity of covariance models. The core of Bcheck is a library of subfamily-specific descriptor models and covariance models. RESULTS Scanning all microbial genomes in GenBank identifies RNase P RNA genes in 98% of 1024 microbial chromosomal sequences within just 4 hours on a single CPU. Compared to existing annotations found in 387 of the GenBank files, Bcheck predictions have more intact structure and are automatically classified by subfamily membership. For eukaryotic chromosomes Bcheck could identify the known RNase P RNA genes in 84 out of 85 metazoan genomes and 19 out of 21 fungal genomes. Bcheck predicted 37 novel eukaryotic RNase P RNA genes, 32 of which are from fungi. Gene duplication events are observed in at least 20 metazoan organisms. Scanning of metagenomic data from the Global Ocean Sampling Expedition, comprising over 10 million sample sequences (18 gigabases), predicted 2909 unique genes, 98% of which fall into the ancestral bacterial A type of RNase P RNA and 66% of which have no close homolog among known prokaryotic RNase P RNAs. CONCLUSIONS The combination of efficient filtering by means of a descriptor-based search and subsequent construction of a high-quality gene model by means of a covariance model provides an efficient method for the detection of RNase P RNA genes in large-scale sequencing data. Bcheck is implemented as a web server and can also be downloaded for local use from http://rna.tbi.univie.ac.at/bcheck.
Affiliation(s)
- Dilmurat Yusuf
- Institute for Theoretical Chemistry, University of Vienna, Währingerstrasse 17, A-1090 Wien, Austria

216
Kanhere A, Viiri K, Araújo CC, Rasaiyaah J, Bouwman RD, Whyte WA, Pereira CF, Brookes E, Walker K, Bell GW, Pombo A, Fisher AG, Young RA, Jenner RG. Short RNAs are transcribed from repressed polycomb target genes and interact with polycomb repressive complex-2. Mol Cell 2010; 38:675-88. [PMID: 20542000 DOI: 10.1016/j.molcel.2010.03.019] [Citation(s) in RCA: 299] [Impact Index Per Article: 21.4] [Received: 06/28/2009] [Revised: 01/12/2010] [Accepted: 03/26/2010] [Indexed: 01/13/2023]
Abstract
Polycomb proteins maintain cell identity by repressing the expression of developmental regulators specific for other cell types. Polycomb repressive complex-2 (PRC2) catalyzes trimethylation of histone H3 lysine-27 (H3K27me3). Although repressed, PRC2 targets are generally associated with the transcriptional initiation marker H3K4me3, but the significance of this remains unclear. Here, we identify a class of short RNAs, approximately 50-200 nucleotides in length, transcribed from the 5' end of polycomb target genes in primary T cells and embryonic stem cells. Short RNA transcription is associated with RNA polymerase II and H3K4me3, occurs in the absence of mRNA transcription, and is independent of polycomb activity. Short RNAs form stem-loop structures resembling PRC2 binding sites in Xist, interact with PRC2 through SUZ12, cause gene repression in cis, and are depleted from polycomb target genes activated during cell differentiation. We propose that short RNAs play a role in the association of PRC2 with its target genes.
Affiliation(s)
- Aditi Kanhere
- Division of Infection and Immunity, University College London, London W1T 4JF, UK

217
Beaudoin JD, Perreault JP. 5'-UTR G-quadruplex structures acting as translational repressors. Nucleic Acids Res 2010; 38:7022-36. [PMID: 20571090 PMCID: PMC2978341 DOI: 10.1093/nar/gkq557] [Citation(s) in RCA: 175] [Impact Index Per Article: 12.5] [Indexed: 12/29/2022] Open
Abstract
Given that greater than 90% of the human genome is expressed, it is logical to assume that post-transcriptional regulatory mechanisms must be the primary means of controlling the flow of information from mRNA to protein. This report describes a robust approach that includes in silico, in vitro and in cellulo experiments permitting an in-depth evaluation of the impact of G-quadruplexes as translational repressors. Sequences including potential G-quadruplexes were selected within nine distinct genes encoding proteins involved in various biological processes. Their abilities to fold into G-quadruplex structures in vitro were evaluated using circular dichroism, thermal denaturation and the novel use of in-line probing. Six sequences were observed to fold into G-quadruplex structures in vitro, all of which exhibited translational inhibition in cellulo when linked to a reporter gene. Sequence analysis, direct mutagenesis and subsequent experiments were performed in order to define the rules governing the folding of G-quadruplexes. In addition, the impact of single-nucleotide polymorphism was shown to be important in the formation of G-quadruplexes located within the 5'-untranslated region of an mRNA. In light of these results, clearly the 5'-UTR G-quadruplexes represent a class of translational repressors that is broadly distributed in the cell.
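The in silico step mentioned in this abstract, selecting sequences that contain potential G-quadruplexes, is often approximated with a simple sequence pattern. The sketch below is an illustration only and assumes the widely used motif of four runs of at least three guanines separated by loops of 1 to 7 nucleotides; the abstract does not state which rule the authors applied, and the names `PG4` and `find_pg4` are hypothetical.

```python
import re

# Assumed motif (not taken from the abstract): four G-tracts of >= 3 G's
# separated by three loops of 1-7 nucleotides each.
PG4 = re.compile(
    r"G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}[ACGU]{1,7}G{3,}"
)


def find_pg4(sequence: str):
    """Return (start, end, motif) for each non-overlapping candidate.

    Input may be RNA or DNA in either case; T is converted to U so that
    cDNA-style 5'-UTR sequences can be scanned directly.
    """
    rna = sequence.upper().replace("T", "U")
    return [(m.start(), m.end(), m.group()) for m in PG4.finditer(rna)]


print(find_pg4("AUGGGAGGGCUGGGAAGGGCAU"))
print(find_pg4("AUGGCAUGGCAU"))
```

A pattern hit only marks a candidate. As the abstract stresses, folding into a G-quadruplex still has to be confirmed experimentally, for example by circular dichroism, thermal denaturation or in-line probing.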
Affiliation(s)
- Jean-Denis Beaudoin
- RNA Group/Groupe ARN, Département de biochimie, Faculté de médecine et des sciences de la santé, Université de Sherbrooke, QC, J1H 5N4, Canada
|
218
|
Riccitelli NJ, Lupták A. Computational discovery of folded RNA domains in genomes and in vitro selected libraries. Methods 2010; 52:133-40. [PMID: 20554049 DOI: 10.1016/j.ymeth.2010.06.005] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Received: 06/03/2010] [Accepted: 06/03/2010] [Indexed: 10/19/2022] Open
Abstract
Structured functional RNAs are conserved on the level of secondary and tertiary structure, rather than at sequence level, and so traditional sequence-based searches often fail to identify them. Structure-based searches are increasingly used to discover known RNA motifs in sequence databases. We describe the application of the program RNABOB, which performs such searches by allowing the user to define a desired motif's sequence, paired and spacer elements and then scans a sequence file for regions capable of assuming the prescribed fold. Structure descriptors of stem-loops, internal loops, three-way junctions, kissing loops, and the hammerhead and hepatitis delta virus ribozymes are shown as examples of implementation of structure-based searches.
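The kind of descriptor-driven scan described above can be illustrated with a minimal sketch. This is not RNABOB's descriptor syntax or algorithm; it is a naive Python scan under assumed parameters (`stem_len`, `loop_min`, `loop_max` are hypothetical names) that reports windows able to form a simple stem-loop with Watson-Crick or G-U pairs.

```python
# Allowed base pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def find_stem_loops(rna, stem_len=4, loop_min=3, loop_max=8):
    """Return (start, loop_len) for every window that can form a perfect stem-loop."""
    rna = rna.upper()
    n = len(rna)
    hits = []
    for start in range(n):
        for loop_len in range(loop_min, loop_max + 1):
            end = start + 2 * stem_len + loop_len
            if end > n:
                break  # longer loops cannot fit either
            five = rna[start:start + stem_len]   # 5' strand of the stem
            three = rna[end - stem_len:end]      # 3' strand of the stem
            # The first 5' base pairs with the last 3' base, and so on inward.
            if all((a, b) in PAIRS for a, b in zip(five, reversed(three))):
                hits.append((start, loop_len))
    return hits

# Hypothetical toy sequence: a 4-bp G-C stem around an AAAA loop.
print(find_stem_loops("GGGGAAAACCCC"))
```

The scan checks every window and is therefore linear in sequence length per pattern; it only shows what "regions capable of assuming the prescribed fold" means for the simplest motif.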
|
219
|
Campillos M, Cases I, Hentze MW, Sanchez M. SIREs: searching for iron-responsive elements. Nucleic Acids Res 2010; 38:W360-7. [PMID: 20460462 PMCID: PMC2896125 DOI: 10.1093/nar/gkq371] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.2] [Indexed: 01/10/2023] Open
Abstract
The iron regulatory protein/iron-responsive element regulatory system plays a crucial role in the post-transcriptional regulation of gene expression and its disruption results in human disease. IREs are cis-acting regulatory motifs present in mRNAs that encode proteins involved in iron metabolism. They function as binding sites for two related trans-acting factors, namely the IRP-1 and -2. Among cis-acting RNA regulatory elements, the IRE is one of the best characterized. It is defined by a combination of RNA sequence and structure. However, currently available programs to predict IREs do not show a satisfactory level of sensitivity and fail to detect some of the functional IREs. Here, we report an improved software for the prediction of IREs implemented as a user-friendly web server tool. The SIREs web server uses a simple data input interface and provides structure analysis, predicted RNA folds, folding energy data and an overall quality flag based on properties of well characterized IREs. Results are reported in a tabular format and as a schematic visual representation that highlights important features of the IRE. The SIREs (Search for iron-responsive elements) web server is freely available on the web at http://ccbg.imppc.org/sires/index.html
Affiliation(s)
- Monica Campillos
- European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
|
220
|
Kim N, Izzo JA, Elmetwaly S, Gan HH, Schlick T. Computational generation and screening of RNA motifs in large nucleotide sequence pools. Nucleic Acids Res 2010; 38:e139. [PMID: 20448026 PMCID: PMC2910066 DOI: 10.1093/nar/gkq282] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Indexed: 11/14/2022] Open
Abstract
Although identification of active motifs in large random sequence pools is central to RNA in vitro selection, no systematic computational equivalent of this process has yet been developed. We develop a computational approach that combines target pool generation, motif scanning and motif screening using secondary structure analysis for applications to 10^12-10^14-sequence pools; large pool sizes are made possible using program redesign and supercomputing resources. We use the new protocol to search for aptamer and ribozyme motifs in pools up to experimental pool size (10^14 sequences). We show that motif scanning, structure matching and flanking sequence analysis, respectively, reduce the initial sequence pool by 6-8, 1-2 and 1 orders of magnitude, consistent with the rare occurrence of active motifs in random pools. The final yields match the theoretical yields from probability theory for simple motifs and overestimate experimental yields, which constitute lower bounds, for aptamers because screening analyses beyond secondary structure information are not considered systematically. We also show that designed pools using our nucleotide transition probability matrices can produce higher yields for RNA ligase motifs than random pools. Our methods for generating, analyzing and designing large pools can help improve RNA design via simulation of aspects of in vitro selection.
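As a back-of-the-envelope illustration of the reductions quoted above, the following Python sketch combines the three filtering stages. It assumes the reductions compound multiplicatively and that the quoted ranges apply to the largest pool; the abstract does not state the resulting counts, so the output is an inference, not a reported result. The function name is hypothetical.

```python
def surviving_pool_exponents(start_exp, reductions):
    """Return (low, high) powers of ten left after applying every reduction range.

    start_exp  -- log10 of the starting pool size
    reductions -- list of (min_orders, max_orders) removed by each stage
    """
    low = start_exp - sum(max_orders for _, max_orders in reductions)
    high = start_exp - sum(min_orders for min_orders, _ in reductions)
    return low, high

# Stages from the abstract: motif scanning (6-8), structure matching (1-2),
# flanking sequence analysis (1), applied to a pool of 10^14 sequences.
stages = [(6, 8), (1, 2), (1, 1)]
low, high = surviving_pool_exponents(14, stages)
print(f"roughly 10^{low} to 10^{high} candidate sequences remain")
```

Under these assumptions the three stages together remove 8 to 11 orders of magnitude.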
Affiliation(s)
- Namhee Kim
- Department of Chemistry, New York University, 100 Washington Square East, New York, NY 10003, USA
|
221
|
Schlüter JP, Reinkensmeier J, Daschkey S, Evguenieva-Hackenberg E, Janssen S, Jänicke S, Becker JD, Giegerich R, Becker A. A genome-wide survey of sRNAs in the symbiotic nitrogen-fixing alpha-proteobacterium Sinorhizobium meliloti. BMC Genomics 2010; 11:245. [PMID: 20398411 PMCID: PMC2873474 DOI: 10.1186/1471-2164-11-245] [Citation(s) in RCA: 87] [Impact Index Per Article: 6.2] [Received: 01/06/2010] [Accepted: 04/17/2010] [Indexed: 12/03/2022] Open
Abstract
Background Small untranslated RNAs (sRNAs) are widespread regulators of gene expression in bacteria. This study reports on a comprehensive screen for sRNAs in the symbiotic nitrogen-fixing alpha-proteobacterium Sinorhizobium meliloti applying deep sequencing of cDNAs and microarray hybridizations. Results A total of 1,125 sRNA candidates that were classified as trans-encoded sRNAs (173), cis-encoded antisense sRNAs (117), mRNA leader transcripts (379), and sense sRNAs overlapping coding regions (456) were identified in a size range of 50 to 348 nucleotides. Among these were transcripts corresponding to 82 previously reported sRNA candidates. Enrichment for RNAs with primary 5'-ends prior to sequencing of cDNAs suggested transcriptional start sites corresponding to 466 predicted sRNA regions. The consensus σ70 promoter motif CTTGAC-N17-CTATAT was found upstream of 101 sRNA candidates. Expression patterns derived from microarray hybridizations provided further information on conditions of expression of a number of sRNA candidates. Furthermore, GenBank, EMBL, DDBJ, PDB, and Rfam databases were searched for homologs of the sRNA candidates identified in this study. Searching Rfam family models with over 1,000 sRNA candidates re-discovered only those sequences from S. meliloti already known and stored in Rfam, whereas BLAST searches suggested a number of homologs in related alpha-proteobacteria. Conclusions The screening data suggest that in S. meliloti about 3% of the genes encode trans-encoded sRNAs and about 2% antisense transcripts. Thus, this first comprehensive screen for sRNAs applying deep sequencing in an alpha-proteobacterium shows that sRNAs also occur in high number in this group of bacteria.
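The consensus CTTGAC-N17-CTATAT quoted above maps directly onto a regular expression. The Python sketch below is only an illustration of exact, forward-strand matching; the study's actual search procedure (mismatch tolerance, strand handling, distance to the transcript start) is not described in the abstract, and the function name and toy sequence are hypothetical.

```python
import re

# CTTGAC, a 17-nt spacer of any bases, then CTATAT.
# The lookahead lets overlapping occurrences be reported as well.
PROMOTER = re.compile(r"(?=(CTTGAC[ACGT]{17}CTATAT))")

def find_promoter_sites(dna):
    """Return 0-based start positions of exact consensus matches (forward strand only)."""
    return [m.start() for m in PROMOTER.finditer(dna.upper())]

# Hypothetical toy sequence, not taken from the S. meliloti genome.
toy = "GG" + "CTTGAC" + "A" * 17 + "CTATAT" + "CC"
print(find_promoter_sites(toy))
```

A real screen would likely also scan the reverse complement and tolerate mismatches; both are omitted here for brevity.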
Affiliation(s)
- Jan-Philip Schlüter
- Institute of Biology III, Faculty of Biology, University of Freiburg, Freiburg, Germany
|
222
|
Legionella pneumophila 6S RNA optimizes intracellular multiplication. Proc Natl Acad Sci U S A 2010; 107:7533-8. [PMID: 20368425 DOI: 10.1073/pnas.0911764107] [Citation(s) in RCA: 75] [Impact Index Per Article: 5.4] [Indexed: 11/18/2022] Open
Abstract
Legionella pneumophila is a Gram-negative opportunistic human pathogen that infects and multiplies in a broad range of phagocytic protozoan and mammalian phagocytes. Based on the observation that small regulatory RNAs (sRNAs) play an important role in controlling virulence-related genes in several pathogenic bacteria, we attempted to identify sRNAs expressed by L. pneumophila. We used computational prediction followed by experimental verification to identify and characterize sRNAs encoded in the L. pneumophila genome. A 50-mer probe microarray was constructed to test the expression of predicted sRNAs in bacteria grown under a variety of conditions. This strategy successfully identified 22 expressed RNAs, out of which 6 were confirmed by northern blot and RACE. One of the identified sRNAs is highly expressed in postexponential phase, and computational prediction of its secondary structure reveals a striking similarity to the structure of 6S RNA, a widely distributed prokaryotic sRNA, known to regulate the activity of σ70-containing RNA polymerase. A 70-mer probe microarray was used to identify genes affected by L. pneumophila 6S RNA in stationary phase. The 6S RNA positively regulates expression of genes encoding type IVB secretion system effectors, stress response genes such as groES and recA, as well as many genes involved in acquisition of nutrients and genes with unknown or hypothetical functions. Deletion of 6S RNA significantly reduced L. pneumophila intracellular multiplication in both protist and mammalian host cells, but had no detectable effect on growth in rich media.
|
223
|
Menzel P, Gorodkin J, Stadler PF. The tedious task of finding homologous noncoding RNA genes. RNA (New York, N.Y.) 2009; 15:2075-82. [PMID: 19861422 PMCID: PMC2779685 DOI: 10.1261/rna.1556009] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5] [Indexed: 05/28/2023]
Abstract
User-driven in silico RNA homology search is still a nontrivial task. In part, this is the consequence of a limited precision of the computational tools in spite of recent exciting progress in this area, and to a certain extent, computational costs are still problematic in practice. An important, and as we argue here, dominating issue is the dependence on good curated (secondary) structural alignments of the RNAs. These are often hard to obtain, not so much because of an inherent limitation in the available data, but because they require substantial manual curation, an effort that is rarely acknowledged. Here, we qualitatively describe a realistic scenario for what a "regular user" (i.e., a nonexpert in a particular RNA family) can do in practice, and what kind of results are likely to be achieved. Despite the indisputable advances in computational RNA biology, the conclusion is discouraging: BLAST still works as well as or better than other methods unless extensive expert knowledge on the RNA family is included. However, when good curated data are available, the recent developments yield further improvements in finding remote homologs. Homology search beyond the reach of BLAST hence is not at all a routine task.
Affiliation(s)
- Peter Menzel
- Section for Genetics and Bioinformatics, IBHV, and Center for Applied Bioinformatics, University of Copenhagen, DK-1870 Frederiksberg, Denmark
|
224
|
Mosig A, Zhu L, Stadler PF. Customized strategies for discovering distant ncRNA homologs. Briefings in Functional Genomics and Proteomics 2009; 8:451-60. [PMID: 19779009 DOI: 10.1093/bfgp/elp035] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 12/19/2022]
Abstract
A large fraction of non-coding RNAs is short and/or poorly conserved in sequence. Most of the longer examples, furthermore, consist of a collection of conserved structural motifs rather than a coherent globally conserved secondary structure. As a consequence, the conceptually simple problem of homology search becomes a complex and technically demanding task. Despite the best efforts of databases such as Rfam, the situation is complicated further by the sparsity of information in many (in particular prokaryotic) RNA families. In this contribution, we review recent efforts to customize sequence-based search tools for ncRNA applications. In particular, semi-global alignments and the development of methods for fragmented pattern search have brought significant practical advances. Current developments in this area focus on the integration of fragmented sequence pattern search with search algorithms for secondary structure patterns. We focus here, in particular, on strategies that can be successful in the 'twilight zone' where generic approaches from BLAST to Infernal start to fail.
Affiliation(s)
- Axel Mosig
- Chair of Bioinformatics, Department of Computer Science, University of Leipzig, Härtelstrasse 16-18, D-04107 Leipzig, Germany
|
225
|
Pushechnikov A, Lee MM, Childs-Disney JL, Sobczak K, French JM, Thornton CA, Disney MD. Rational design of ligands targeting triplet repeating transcripts that cause RNA dominant disease: application to myotonic muscular dystrophy type 1 and spinocerebellar ataxia type 3. J Am Chem Soc 2009; 131:9767-79. [PMID: 19552411 PMCID: PMC2731475 DOI: 10.1021/ja9020149] [Citation(s) in RCA: 148] [Impact Index Per Article: 9.9] [Indexed: 12/17/2022]
Abstract
Herein, we describe the design of high affinity ligands that bind expanded rCUG and rCAG repeat RNAs expressed in myotonic dystrophy type 1 (DM1) and spinocerebellar ataxia type 3. These ligands also inhibit, with nanomolar IC50 values, the formation of RNA-protein complexes that are implicated in both disorders. The expanded rCUG and rCAG repeats form stable RNA hairpins with regularly repeating internal loops in the stem and have deleterious effects on cell function. The ligands that bind the repeats display a derivative of the bisbenzimidazole Hoechst 33258, which was identified by searching known RNA-ligand interactions for ligands that bind the internal loop displayed in these hairpins. A series of 13 modularly assembled ligands with defined valencies and distances between ligand modules was synthesized to target multiple motifs in these RNAs simultaneously. The most avid binder, a pentamer, binds the rCUG repeat hairpin with a Kd of 13 nM. When compared to a series of related RNAs, the pentamer binds to rCUG repeats with 4.4- to >200-fold specificity. Furthermore, the affinity of binding to rCUG repeats shows incremental gains with increasing valency, while the background binding to genomic DNA is correspondingly reduced. We then determined whether the modularly assembled ligands inhibit the recognition of RNA repeats by Muscleblind-like 1 (MBNL1) protein, the expanded-rCUG binding protein whose sequestration leads to splicing defects in DM1. Among several compounds with nanomolar IC50 values, the most potent inhibitor is the pentamer, which also inhibits the formation of rCAG repeat-MBNL1 complexes. Comparison of the binding data for the designed synthetic ligands and MBNL1 to repeating RNAs shows that the synthetic ligand has 23-fold higher affinity for, and is more specific to, DM1 RNAs than MBNL1. Further studies show that the designed ligands are cell permeable to mouse myoblasts. Thus, cell permeable ligands that bind repetitive RNAs have been designed that exhibit higher affinity and specificity for binding RNA than natural proteins. These studies suggest a general approach to targeting RNA, including those that cause RNA dominant disease.
Affiliation(s)
- Alexei Pushechnikov
- Department of Chemistry and The Center of Excellence in Bioinformatics and Life Sciences, University at Buffalo, The State University of New York, 657 Natural Sciences Complex, Buffalo, NY 14260
- Melissa M. Lee
- Department of Chemistry and The Center of Excellence in Bioinformatics and Life Sciences, University at Buffalo, The State University of New York, 657 Natural Sciences Complex, Buffalo, NY 14260
- Krzysztof Sobczak
- Department of Neurology, University of Rochester, Rochester, NY, 14620
- Jonathan M. French
- Department of Chemistry and The Center of Excellence in Bioinformatics and Life Sciences, University at Buffalo, The State University of New York, 657 Natural Sciences Complex, Buffalo, NY 14260
- Matthew D. Disney
- Department of Chemistry and The Center of Excellence in Bioinformatics and Life Sciences, University at Buffalo, The State University of New York, 657 Natural Sciences Complex, Buffalo, NY 14260
|
226
|
Lai CE, Tsai MY, Liu YC, Wang CW, Chen KT, Lu CL. FASTR3D: a fast and accurate search tool for similar RNA 3D structures. Nucleic Acids Res 2009; 37:W287-95. [PMID: 19435878 PMCID: PMC2703968 DOI: 10.1093/nar/gkp330] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 12/15/2022] Open
Abstract
FASTR3D is a web-based search tool that allows the user to quickly and accurately search the PDB database for structurally similar RNAs. Currently, it allows the user to input three types of queries: (i) a PDB code of an RNA tertiary structure (default), optionally with a specified residue range, (ii) an RNA secondary structure, optionally with primary sequence, in the dot-bracket notation and (iii) an RNA primary sequence in the FASTA format. In addition, the user can run FASTR3D with additional filtering options: (i) the release date of RNA structures in the PDB database, and (ii) the experimental methods used to determine RNA structures and their minimum resolution. In the output page, FASTR3D will show the user-queried RNA molecule, as well as user-specified options, followed by a detailed list of identified structurally similar RNAs. In particular, when queried with RNA tertiary structures, FASTR3D provides a graphical display to show the structural superposition of the query structure and each of the identified structures. FASTR3D is now available online at http://bioalgorithm.life.nctu.edu.tw/FASTR3D/.
Affiliation(s)
- Chin-En Lai
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu 300, Taiwan
|
227
|
Marz M, Kirsten T, Stadler PF. Evolution of spliceosomal snRNA genes in metazoan animals. J Mol Evol 2009; 67:594-607. [PMID: 19030770 DOI: 10.1007/s00239-008-9149-6] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.8] [Received: 03/21/2008] [Accepted: 07/14/2008] [Indexed: 11/28/2022]
Abstract
While studies of the evolutionary histories of protein families are commonplace, little is known about noncoding RNAs beyond microRNAs and some snoRNAs. Here we investigate in detail the evolutionary history of the nine spliceosomal snRNA families (U1, U2, U4, U5, U6, U11, U12, U4atac, and U6atac) across the completely or partially sequenced genomes of metazoan animals. Representatives of the five major spliceosomal snRNAs were found in all genomes. None of the minor spliceosomal snRNAs were detected in nematodes or in the shotgun traces of Oikopleura dioica, while in all other animal genomes at most one of them is missing. Although snRNAs are present in multiple copies in most genomes, distinguishable paralogue groups are not stable over long evolutionary times, although they appear independently in several clades. In general, animal snRNA secondary structures are highly conserved, albeit, in particular, U11 and U12 in insects exhibit dramatic variations. An analysis of genomic context of snRNAs reveals that they behave like mobile elements, exhibiting very little syntenic conservation.
Affiliation(s)
- Manuela Marz
- Bioinformatics Group, Department of Computer Science, University of Leipzig, Härtelstrasse 16-18, 04107 Leipzig, Germany.
|
228
|
Byrne D, Grzela R, Lartigue A, Audic S, Chenivesse S, Encinas S, Claverie JM, Abergel C. The polyadenylation site of Mimivirus transcripts obeys a stringent 'hairpin rule'. Genome Res 2009; 19:1233-42. [PMID: 19403753 DOI: 10.1101/gr.091561.109] [Citation(s) in RCA: 62] [Impact Index Per Article: 4.1] [Indexed: 12/24/2022]
Abstract
Mimivirus, a giant DNA virus infecting Acanthamoeba, is revealing an increasing list of unique features such as a 1.2-Mb genome with numerous genes not found in other viruses, a uniquely conserved promoter signal, and a particle of unmatched complexity using two distinct portals for genome delivery and packaging. Herein, we contribute a further Mimivirus distinctive feature discovered by sequencing a panel of viral cDNAs produced for probing the structure of Mimivirus transcripts. All Mimivirus mRNAs are polyadenylated at a site coinciding exactly with unrelated, but strongly palindromic, genomic sequences. The analysis of 454 Life Sciences (Roche) FLX cDNA tags (150,651) confirmed this finding for all Mimivirus genes independent of their transcription timings and expression levels. The absence of a suitable palindromic signal between adjacent genes results in transcripts encompassing multiple ORFs in the same or even in opposite orientations. Surprisingly, Mimivirus tRNAs are expressed as polyadenylated messengers, including an ORF/tRNA composite mRNA. To our knowledge, both the nature and the stringency of the "hairpin rule" defining the location of polyadenylation sites are unique, raising once more the question of Mimivirus's evolutionary origin. The precise molecular mechanisms implementing the hairpin rule into the 3'-end processing of Mimivirus pre-mRNAs remain to be elucidated.
Affiliation(s)
- Deborah Byrne
- Structural and Genomic Information Laboratory, CNRS-UPR 2589, IFR-88, Aix-Marseille University, Parc Scientifique de Luminy, Case 934, 13288 Marseille Cedex 9, France
|
229
|
Stabell FB, Tourasse NJ, Kolstø AB. A conserved 3' extension in unusual group II introns is important for efficient second-step splicing. Nucleic Acids Res 2009; 37:3202-14. [PMID: 19304998 PMCID: PMC2691827 DOI: 10.1093/nar/gkp186] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/14/2022] Open
Abstract
The B.c.I4 group II intron from Bacillus cereus ATCC 10987 harbors an unusual 3′ extension. Here, we report the discovery of four additional group II introns with a similar 3′ extension in Bacillus thuringiensis kurstaki 4D1 that splice at analogous positions 53/56 nt downstream of domain VI in vivo. Phylogenetic analyses revealed that the introns are only 47–61% identical to each other. Strikingly, they do not form a single evolutionary lineage even though they belong to the same Bacterial B class. The extension of these introns is predicted to form a conserved two-stem–loop structure. Mutational analysis in vitro showed that the smaller stem S1 is not critical for self-splicing, whereas the larger stem S2 is important for efficient exon ligation and lariat release in presence of the extension. This study clearly demonstrates that previously reported B.c.I4 is not a single example of a specialized intron, but forms a new functional class with an unusual mode that ensures proper positioning of the 3′ splice site.
Affiliation(s)
- Fredrik B Stabell
- Laboratory for Microbial Dynamics (LaMDa), Department of Pharmaceutical Biosciences, University of Oslo, Oslo, Norway
|
230
|
Santhanam AN, Bindewald E, Rajasekhar VK, Larsson O, Sonenberg N, Colburn NH, Shapiro BA. Role of 3'UTRs in the translation of mRNAs regulated by oncogenic eIF4E--a computational inference. PLoS One 2009; 4:e4868. [PMID: 19290046 PMCID: PMC2654073 DOI: 10.1371/journal.pone.0004868] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 11/14/2008] [Accepted: 02/01/2009] [Indexed: 01/07/2023] Open
Abstract
Eukaryotic cap-dependent mRNA translation is mediated by the initiation factor eIF4E, which binds mRNAs and stimulates efficient translation initiation. eIF4E is often overexpressed in human cancers. To elucidate the molecular signature of eIF4E target mRNAs, we analyzed sequence and structural properties of two independently derived polyribosome recruited mRNA datasets. These datasets originate from studies of mRNAs that are actively being translated in response to cells over-expressing eIF4E or cells with an activated oncogenic AKT: eIF4E signaling pathway, respectively. Comparison of eIF4E target mRNAs to mRNAs insensitive to eIF4E-regulation has revealed surprising features in mRNA secondary structure, length and microRNA-binding properties. Fold-changes (the relative change in recruitment of an mRNA to actively translating polyribosomal complexes in response to eIF4E overexpression or AKT upregulation) are positively correlated with mRNA G+C content and negatively correlated with total and 3'UTR length of the mRNAs. A machine learning approach for predicting the fold change was created. Interesting tendencies of secondary structure stability are found near the start codon and at the beginning of the 3'UTR region. Highly upregulated mRNAs show negative selection (site avoidance) for binding sites of several microRNAs. These results are consistent with the emerging model of regulation of mRNA translation through a dynamic balance between translation initiation at the 5'UTR and microRNA binding at the 3'UTR.
Affiliation(s)
- Arti N. Santhanam
- Gene Regulation Section, Laboratory of Cancer Prevention, National Cancer Institute, Frederick, Maryland, United States of America
- Eckart Bindewald
- Basic Research Program, SAIC-Frederick, Inc., National Cancer Institute-Frederick, Frederick, Maryland, United States of America
- Vinagolu K. Rajasekhar
- Developmental Biology Program, Memorial Sloan Kettering Cancer Center, New York, New York, United States of America
- Ola Larsson
- Department of Biochemistry and McGill Cancer Center, McGill University, Montreal, Quebec, Canada
- Nahum Sonenberg
- Department of Biochemistry and McGill Cancer Center, McGill University, Montreal, Quebec, Canada
- Nancy H. Colburn
- Basic Research Program, SAIC-Frederick, Inc., National Cancer Institute-Frederick, Frederick, Maryland, United States of America
- Bruce A. Shapiro
- Center for Cancer Research, Nanobiology Program, National Cancer Institute, Frederick, Maryland, United States of America
|
231
|
Aminova O, Paul DJ, Childs-Disney JL, Disney MD. Two-dimensional combinatorial screening identifies specific 6'-acylated kanamycin A- and 6'-acylated neamine-RNA hairpin interactions. Biochemistry 2009; 47:12670-9. [PMID: 18991404 DOI: 10.1021/bi8012615] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Indexed: 01/17/2023]
Abstract
Herein, we report the RNA hairpin loops from a six-nucleotide hairpin library that bind 6'-acylated kanamycin A (1) and 6'-acylated neamine (2) identified by two-dimensional combinatorial screening (2DCS). Hairpins selected to bind 1 have Kd values ranging from 235 to 1035 nM, with an average Kd of 618 nM. For 2, the selected hairpins bind with Kd values ranging from 135 to 2300 nM, with an average Kd of 1010 nM. The selected RNA hairpin-ligand interactions are also specific for the ligand that they were selected to bind compared with the other arrayed ligand. For example, the mixture of hairpins selected for 1 on average bind 33-fold more tightly to 1 than to 2, while the mixtures of hairpins selected for 2 on average bind 11-fold more tightly to 2 than to 1. Secondary structure prediction of the selected sequences was completed to determine the motifs that each ligand binds, and the hairpin loop preferences for 1 and 2 were computed. For 1, the preferred hairpin loops contain an adenine separated by at least two nucleotides from a cytosine, for example, ANNCNN (two-tailed p-value = 0.0010) and ANNNCN (two-tailed p-value <0.0001). For 2, the preferred hairpin loops contain both 5'GC and 5'CG steps (two-tailed p-value <0.0001). These results expand the information available on the RNA hairpin loops that bind small molecules and could prove useful for targeting RNA.
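The loop preferences named above (ANNCNN and ANNNCN, with N standing for any nucleotide) can be checked against a six-nucleotide loop with a few lines of Python. This is an illustrative sketch only; the helper names are hypothetical, and it says nothing about how the authors computed the reported p-values.

```python
import re

def motif_to_regex(motif):
    """Turn a motif such as 'ANNCNN' into a compiled regex; N matches any RNA base."""
    return re.compile(motif.replace("N", "[ACGU]"))

# The two example motifs given for ligand 1 in the abstract.
MOTIFS = {m: motif_to_regex(m) for m in ("ANNCNN", "ANNNCN")}

def matching_motifs(loop):
    """Return the names of all motifs that the whole loop sequence satisfies."""
    loop = loop.upper()
    return [name for name, rx in MOTIFS.items() if rx.fullmatch(loop)]

# Hypothetical loop sequences, not taken from the selected library.
print(matching_motifs("AGUCAA"))  # adenine at position 1, cytosine at position 4
print(matching_motifs("GGGGGG"))  # no adenine, so neither motif applies
```

Using `fullmatch` ensures that only loops of exactly six nucleotides are accepted.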
Affiliation(s)
- Olga Aminova
- Department of Chemistry, University at Buffalo, The State University of New York, and the NYS Center of Excellence in Bioinformatics & Life Sciences, 657 Natural Sciences Complex, Buffalo, New York 14260, USA
|
232
|
Marchais A, Naville M, Bohn C, Bouloc P, Gautheret D. Single-pass classification of all noncoding sequences in a bacterial genome using phylogenetic profiles. Genome Res 2009; 19:1084-92. [PMID: 19237465 DOI: 10.1101/gr.089714.108] [Citation(s) in RCA: 62] [Impact Index Per Article: 4.1] [Indexed: 01/03/2023]
Abstract
Identification and characterization of functional elements in the noncoding regions of genomes is an elusive and time-consuming activity whose output does not keep up with the pace of genome sequencing. Hundreds of bacterial genomes lie unexploited in terms of noncoding sequence analysis, although they may conceal a wide diversity of novel RNA genes, riboswitches, or other regulatory elements. We describe a strategy that exploits the entirety of available bacterial genomes to classify all noncoding elements of a selected reference species in a single pass. This method clusters noncoding elements based on their profile of presence among species. Most noncoding RNAs (ncRNAs) display specific signatures that enable their grouping in distinct clusters, away from sequence conservation noise and other elements such as promoters. We submitted 24 ncRNA candidates from Staphylococcus aureus to experimental validation and confirmed the presence of seven novel small RNAs or riboswitches. Besides offering a powerful method for de novo ncRNA identification, the analysis of phylogenetic profiles opens a new path toward the identification of functional relationships between co-evolving coding and noncoding elements.
Affiliation(s)
- Antonin Marchais
- Université Paris-Sud 11, CNRS, UMR8621, Institut de Génétique et Microbiologie, F-91405 Orsay Cedex, France
|
233
|
Abstract
As random shotgun metagenomic projects proliferate and become the dominant source of publicly available sequence data, procedures for the best practices in their execution and analysis become increasingly important. Based on our experience at the Joint Genome Institute, we describe the chain of decisions accompanying a metagenomic project from the viewpoint of the bioinformatic analysis step by step. We guide the reader through a standard workflow for a metagenomic project beginning with presequencing considerations such as community composition and sequence data type that will greatly influence downstream analyses. We proceed with recommendations for sampling and data generation including sample and metadata collection, community profiling, construction of shotgun libraries, and sequencing strategies. We then discuss the application of generic sequence processing steps (read preprocessing, assembly, and gene prediction and annotation) to metagenomic data sets in contrast to genome projects. Different types of data analyses particular to metagenomes are then presented, including binning, dominant population analysis, and gene-centric analysis. Finally, data management issues are presented and discussed. We hope that this review will assist bioinformaticians and biologists in making better-informed decisions on their journey during a metagenomic project.
|
234
|
Hamilton RS, Hartswood E, Vendra G, Jones C, Van De Bor V, Finnegan D, Davis I. A bioinformatics search pipeline, RNA2DSearch, identifies RNA localization elements in Drosophila retrotransposons. RNA (New York, N.Y.) 2009; 15:200-7. [PMID: 19144907 PMCID: PMC2648715 DOI: 10.1261/rna.1264109] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 07/10/2008] [Accepted: 11/12/2008] [Indexed: 05/16/2023]
Abstract
mRNA localization is a widespread mode of delivering proteins to their site of function. The embryonic axes in Drosophila are determined in the oocyte, through Dynein-dependent transport of gurken/TGF-alpha mRNA, containing a small localization signal that assigns its destination. A signal with a similar secondary structure, but lacking significant sequence similarity, is present in the I factor retrotransposon mRNA, also transported by Dynein. It is currently unclear whether other mRNAs exist that are localized to the same site using similar signals. Moreover, searches for other genes containing similar elements have not been possible due to a lack of suitable bioinformatics methods for searches of secondary structure elements and the difficulty of experimentally testing all the possible candidates. We have developed a bioinformatics approach for searching across the genome for small RNA elements that are similar to the secondary structures of particular localization signals. We have uncovered 48 candidates, of which we were able to test 22 for their localization potential using injection assays for Dynein-mediated RNA localization. We found that G2 and Jockey transposons each contain a gurken/I factor-like RNA stem-loop required for Dynein-dependent localization to the anterior and dorso-anterior corner of the oocyte. We conclude that I factor, G2, and Jockey are members of a "family" of transposable elements sharing a gurken-like mRNA localization signal and Dynein-dependent mechanism of transport. The bioinformatics pipeline we have developed will have broader utility in fields where small RNA signals play important roles.
|
235
|
|
236
|
Informatic resources for identifying and annotating structural RNA motifs. Mol Biotechnol 2008; 41:180-93. [PMID: 18979204 DOI: 10.1007/s12033-008-9114-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 09/30/2008] [Accepted: 10/01/2008] [Indexed: 10/21/2022]
Abstract
Post-transcriptional regulation of genes and transcripts is a vital aspect of cellular processes, and unlike transcriptional regulation, remains a largely unexplored domain. One of the most obvious and most important questions to explore is the discovery of functional RNA elements. Many RNA elements have been characterized to date ranging from cis-regulatory motifs within mRNAs to large families of non-coding RNAs. Like protein coding genes, the functional motifs of these RNA elements are highly conserved, but unlike protein coding genes, it is most often the structure and not the sequence that is conserved. Proper characterization of these structural RNA motifs is both the key and the limiting step to understanding the post-transcriptional aspects of the genomic world. Here, we focus on the task of structural motif discovery and provide a survey of the informatics resources geared towards this task.
|
237
|
Pichon C, Felden B. Small RNA gene identification and mRNA target predictions in bacteria. Bioinformatics 2008; 24:2807-13. [PMID: 18974076 DOI: 10.1093/bioinformatics/btn560] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.3] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Bacterial small ribonucleic acids (sRNAs) that are not ribosomal, transfer, or messenger RNAs were first identified in the 1960s, yet their molecular functions are still under active investigation. It is now widely accepted that most play central roles in regulating gene expression in response to environmental changes. Interestingly, some are also implicated in bacterial virulence. Functional studies revealed that a large subset of these sRNAs act by an antisense mechanism, pairing with dedicated mRNA targets, usually around their translation start sites, to modulate gene expression at the posttranscriptional level. Some sRNAs modulate protein activity or mimic the structure of other macromolecules. In the last few years, in silico methods have been developed to detect more bacterial sRNAs. Among these, comparative genomic analyses of bacterial genomes have predicted the existence of a plethora of sRNAs, some of which were confirmed to be expressed in vivo. The prediction accuracy of these computational tools is highly variable and can be improved. Here we review the computational studies that have contributed to detecting sRNA genes and their mRNA targets in bacteria, and the methods for testing them experimentally. In addition, the remaining challenges are discussed.
Affiliation(s)
- Christophe Pichon
- Unité Pathogénie Bactérienne des Muqueuses, Institut Pasteur, 25-28 Rue du Docteur Roux, 75724 Paris, France
|
238
|
Shulman-Peleg A, Nussinov R, Wolfson HJ. RsiteDB: a database of protein binding pockets that interact with RNA nucleotide bases. Nucleic Acids Res 2008; 37:D369-73. [PMID: 18953028 PMCID: PMC2686467 DOI: 10.1093/nar/gkn759] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 11/26/2022] Open
Abstract
We present a new database and an on-line search engine, which store and query the protein binding pockets that interact with single-stranded RNA nucleotide bases. The database consists of a classification of binding sites derived from protein–RNA complexes. Each binding site is assigned to a cluster of similar binding sites in other protein–RNA complexes. Cluster members share similar spatial arrangements of physicochemical properties and thus can reveal novel similarity between proteins and RNAs with different sequences and folds. The clusters provide 3D consensus binding patterns important for protein–nucleotide recognition. The database search engine allows two types of useful queries: first, given a PDB code of a protein–RNA complex, RsiteDB can detail and classify the properties of the protein binding pockets accommodating extruded RNA nucleotides not involved in local RNA base pairing. Second, given an unbound protein structure, RsiteDB can perform an on-line structural search against the constructed database of 3D consensus binding patterns. Regions similar to known patterns are predicted to serve as binding sites. Alignment of the query to these patterns with their corresponding RNA nucleotides allows making unique predictions of the protein–RNA interactions at the atomic level of detail. This database is accessible at http://bioinfo3d.cs.tau.ac.il/RsiteDB.
Affiliation(s)
- Alexandra Shulman-Peleg
- School of Computer Science, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel.
|
239
|
Theis C, Reeder J, Giegerich R. KnotInFrame: prediction of -1 ribosomal frameshift events. Nucleic Acids Res 2008; 36:6013-20. [PMID: 18820303 PMCID: PMC2566878 DOI: 10.1093/nar/gkn578] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.9] [Indexed: 11/24/2022] Open
Abstract
Programmed −1 ribosomal frameshift (−1 PRF) allows for alternative reading frames within one mRNA. First found in several viruses, it is now believed to exist in all kingdoms of life. Strong stimulators for −1 PRF are a heptameric slippery site and an RNA pseudoknot. Here, we present a new algorithm, KnotInFrame, for the automatic detection of −1 PRF signals from genomic sequences. It finds the frameshifting stimulators by means of a specialized RNA-pseudoknot folding program, fast enough for genome-wide analyses. Evaluations on known −1 PRF signals demonstrate a high sensitivity.
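As a rough illustration of the first stimulator only, the sketch below scans a sequence for candidate heptameric slippery sites. It assumes the commonly cited X XXY YYZ motif (three identical bases, then AAA or UUU, then any base except G), which the abstract does not spell out. It ignores the reading frame and does not attempt the pseudoknot folding step, so it is not KnotInFrame's algorithm; the function names are hypothetical.

```python
def is_slippery(heptamer):
    """True if the 7-mer matches the assumed X XXY YYZ slippery-site motif."""
    h = heptamer.upper().replace("T", "U")  # accept DNA or RNA alphabet
    if len(h) != 7 or set(h) - set("ACGU"):
        return False
    x_run = h[0] == h[1] == h[2]                    # XXX: three identical bases
    y_run = h[3] == h[4] == h[5] and h[3] in "AU"   # YYY: AAA or UUU
    z_ok = h[6] != "G"                              # Z: any base except G
    return x_run and y_run and z_ok

def find_slippery_sites(seq):
    """Return 0-based start positions of all motif matches (frame ignored)."""
    return [i for i in range(len(seq) - 6) if is_slippery(seq[i:i + 7])]

# Toy sequence containing the heptamer UUUAAAC at position 3.
sites = find_slippery_sites("GGCUUUAAACGGG")
print(sites)
```

A real screen would keep only sites followed, after a short spacer, by a predicted pseudoknot, which is the part the abstract attributes to the specialized folding program.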
Affiliation(s)
- Corinna Theis
- Faculty of Technology, Bielefeld University, 33615 Bielefeld, Germany
|
240
|
Livny J, Teonadi H, Livny M, Waldor MK. High-throughput, kingdom-wide prediction and annotation of bacterial non-coding RNAs. PLoS One 2008; 3:e3197. [PMID: 18787707 PMCID: PMC2527527 DOI: 10.1371/journal.pone.0003197] [Citation(s) in RCA: 160] [Impact Index Per Article: 10.0] [Received: 06/27/2008] [Accepted: 08/25/2008] [Indexed: 01/06/2023] Open
Abstract
BACKGROUND Diverse bacterial genomes encode numerous small non-coding RNAs (sRNAs) that regulate myriad biological processes. While bioinformatic algorithms have proven effective in identifying sRNA-encoding loci, the lack of tools and infrastructure with which to execute these computationally demanding algorithms has limited their utilization. Genome-wide predictions of sRNA-encoding genes have been conducted in less than 3% of all sequenced bacterial strains, leading to critical gaps in current annotations. The relative paucity of genome-wide sRNA prediction represents a critical gap in current annotations of bacterial genomes and has limited examination of larger issues in sRNA biology, such as sRNA evolution. METHODOLOGY/PRINCIPAL FINDINGS We have developed and deployed SIPHT, a high throughput computational tool that utilizes workflow management and distributed computing to effectively conduct kingdom-wide predictions and annotations of intergenic sRNA-encoding genes. Candidate sRNA-encoding loci are identified based on the presence of putative Rho-independent terminators downstream of conserved intergenic sequences, and each locus is annotated for several features, including conservation in other species, association with one of several transcription factor binding sites and homology to any of over 300 previously identified sRNAs and cis-regulatory RNA elements. Using SIPHT, we conducted searches for putative sRNA-encoding genes in all 932 bacterial replicons in the NCBI database. These searches yielded nearly 60% of previously confirmed sRNAs, hundreds of previously annotated cis-encoded regulatory RNA elements such as riboswitches, and over 45,000 novel candidate intergenic loci. CONCLUSIONS/SIGNIFICANCE Candidate loci were identified across all branches of the bacterial evolutionary tree, suggesting a central and ubiquitous role for RNA-mediated regulation among bacterial species. 
Annotation of candidate loci by SIPHT provides clues into the potential biological function of thousands of previously confirmed and candidate regulatory RNAs and affords new insights into the evolution of bacterial riboregulation.
Affiliation(s)
- Jonathan Livny
- Channing Laboratories, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America.
|
241
|
Simon DM, Clarke NAC, McNeil BA, Johnson I, Pantuso D, Dai L, Chai D, Zimmerly S. Group II introns in eubacteria and archaea: ORF-less introns and new varieties. RNA (New York, N.Y.) 2008; 14:1704-13. [PMID: 18676618 PMCID: PMC2525955 DOI: 10.1261/rna.1056108] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.0] [Indexed: 05/09/2023]
Abstract
Group II introns are a major class of ribozymes found in bacteria, mitochondria, and plastids. Many introns contain reverse transcriptase open reading frames (ORFs) that confer mobility to the introns and allow them to persist as selfish DNAs. Here, we report an updated compilation of group II introns in Eubacteria and Archaea comprising 234 introns. One new phylogenetic class is identified, as well as several specialized lineages. In addition, we undertake a detailed search for ORF-less group II introns in bacterial genomes in order to find undiscovered introns that either entirely lack an ORF or encode a novel ORF. Unlike organellar group II introns, we find only a handful of ORF-less introns in bacteria, suggesting that if a substantial number exist, they must be divergent from known introns. Together, these results highlight the retroelement character of bacterial group II introns, and suggest that their long-term survival is dependent upon retromobility.
Affiliation(s)
- Dawn M Simon
- Department of Biological Sciences, University of Calgary, Calgary, Alberta T2N 1N4, Canada
|
242
|
The distributions, mechanisms, and structures of metabolite-binding riboswitches. Genome Biol 2008; 8:R239. [PMID: 17997835 PMCID: PMC2258182 DOI: 10.1186/gb-2007-8-11-r239] [Citation(s) in RCA: 365] [Impact Index Per Article: 22.8] [Received: 07/26/2007] [Revised: 10/01/2007] [Accepted: 11/12/2007] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND Riboswitches are noncoding RNA structures that appropriately regulate genes in response to changing cellular conditions. The expression of many proteins involved in fundamental metabolic processes is controlled by riboswitches that sense relevant small molecule ligands. Metabolite-binding riboswitches that recognize adenosylcobalamin (AdoCbl), thiamin pyrophosphate (TPP), lysine, glycine, flavin mononucleotide (FMN), guanine, adenine, glucosamine-6-phosphate (GlcN6P), 7-aminomethyl-7-deazaguanine (preQ1), and S-adenosylmethionine (SAM) have been reported. RESULTS We have used covariance model searches to identify examples of ten widespread riboswitch classes in the genomes of organisms from all three domains of life. This data set rigorously defines the phylogenetic distributions of these riboswitch classes and reveals how their gene control mechanisms vary across different microbial groups. By examining the expanded aptamer sequence alignments resulting from these searches, we have also re-evaluated and refined their consensus secondary structures. Updated riboswitch structure models highlight additional RNA structure motifs, including an unusual double T-loop arrangement common to AdoCbl and FMN riboswitch aptamers, and incorporate new, sometimes noncanonical, base-base interactions predicted by a mutual information analysis. CONCLUSION Riboswitches are vital components of many genomes. The additional riboswitch variants and updated aptamer structure models reported here will improve future efforts to annotate these widespread regulatory RNAs in genomic sequences and inform ongoing structural biology efforts. There remain significant questions about what physiological and evolutionary forces influence the distributions and mechanisms of riboswitches and about what forms of regulation substitute for riboswitches that appear to be missing in certain lineages.
|
243
|
Sridhar P, Gan HH, Schlick T. A computational screen for C/D box snoRNAs in the human genomic region associated with Prader-Willi and Angelman syndromes. J Biomed Sci 2008; 15:697-705. [PMID: 18661287 DOI: 10.1007/s11373-008-9271-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 03/18/2008] [Accepted: 07/10/2008] [Indexed: 11/29/2022] Open
Abstract
Small nucleolar RNAs (snoRNAs) play a significant role in Prader-Willi Syndrome (PWS) and Angelman Syndrome (AS), which are genomic disorders resulting from deletions in the human chromosomal region 15q11-q13. To identify snoRNAs in the region, our computational study employs key motif features of C/D box snoRNAs and introduces a complementary RNA-RNA hybridization test. We identify three previously unknown methylation guide snoRNAs targeting ribosomal 18S and 28S RNAs, and two snoRNAs targeting serotonin receptor 2C mRNA. We show that the three snoRNA candidates likely possess methylation strands complementary to, and form stable complexes with, human ribosomal RNAs. Our screen also identifies 8 other snoRNA candidates that do not pass the rRNA-complementarity and/or hybridization tests. Two of these candidates have extensive sequence similarity to HBII-52, a snoRNA that regulates the alternative splicing of serotonin receptor 2C mRNA. Six out of our eleven candidate snoRNAs are also predicted by other existing methods.
Affiliation(s)
- Padmavati Sridhar
- Department of Chemistry, New York University, 100 Washington Square East, New York, NY 10003, USA
|
244
|
Disney MD, Labuda LP, Paul DJ, Poplawski SG, Pushechnikov A, Tran T, Velagapudi SP, Wu M, Childs-Disney JL. Two-dimensional combinatorial screening identifies specific aminoglycoside-RNA internal loop partners. J Am Chem Soc 2008; 130:11185-94. [PMID: 18652457 DOI: 10.1021/ja803234t] [Citation(s) in RCA: 111] [Impact Index Per Article: 6.9] [Indexed: 12/18/2022]
Abstract
Herein is described the identification of RNA internal loops that bind to derivatives of neomycin B, neamine, tobramycin, and kanamycin A. RNA loop-ligand partners were identified by a two-dimensional combinatorial screening (2DCS) platform that probes RNA and chemical spaces simultaneously. In 2DCS, an aminoglycoside library immobilized onto an agarose microarray was probed for binding to a 3 x 3 nucleotide RNA internal loop library (81,920 interactions probed in duplicate in a single experiment). RNAs that bound aminoglycosides were harvested from the array via gel excision. RNA internal loop preferences for three aminoglycosides were identified from statistical analysis of selected structures. This provides consensus RNA internal loops that bind these structures and include: loops with potential GA pairs for the neomycin derivative, loops with potential GG pairs for the tobramycin derivative, and pyrimidine-rich loops for the kanamycin A derivative. Results with the neamine derivative show that it binds a variety of loops, including loops that contain potential GA pairs that also recognize the neomycin B derivative. All studied selected internal loops are specific for the aminoglycoside that they were selected to bind. Specificity was quantified for 16 selected internal loops by studying their binding to each of the arrayed aminoglycosides. Specificities ranged from 2- to 80-fold with an average specificity of 20-fold. These studies show that 2DCS is a unique platform to probe RNA and chemical space simultaneously to identify specific RNA motif-ligand interactions.
Affiliation(s)
- Matthew D Disney
- Department of Chemistry, University at Buffalo, The State University of New York, and the New York State Center of Excellence in Bioinformatics and Life Sciences, 657 Natural Sciences Complex, Buffalo, New York 14260, USA.
|
245
|
Belew AT, Hepler NL, Jacobs JL, Dinman JD. PRFdb: a database of computationally predicted eukaryotic programmed -1 ribosomal frameshift signals. BMC Genomics 2008; 9:339. [PMID: 18637175 PMCID: PMC2483730 DOI: 10.1186/1471-2164-9-339] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.9] [Received: 01/24/2008] [Accepted: 07/17/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The Programmed Ribosomal Frameshift Database (PRFdb) provides an interface to help researchers identify potential programmed -1 ribosomal frameshift (-1 PRF) signals in eukaryotic genes or sequences of interest. RESULTS To identify putative -1 PRF signals, sequences are first imported from whole genomes or datasets, e.g. the yeast genome project and mammalian gene collection. They are then filtered through multiple algorithms to identify potential -1 PRF signals as defined by a heptameric slippery site followed by an mRNA pseudoknot. The significance of each candidate -1 PRF signal is evaluated by comparing the predicted thermodynamic stability (ΔG°) of the native mRNA sequence against a distribution of ΔG° values of a pool of randomized sequences derived from the original. The data have been compiled in a user-friendly, easily searchable relational database. CONCLUSION The PRFdb enables members of the research community to determine whether genes that they are investigating contain potential -1 PRF signals, and can be used as a metasource of information for cross referencing with other databases. It is available on the web at http://dinmanlab.umd.edu/prfdb.
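The significance test described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the z-score and the add-one empirical p-value are common choices rather than the statistics PRFdb is documented to use, the function names are hypothetical, and the free energies are made-up inputs that would in practice come from an RNA folding program not shown here.

```python
import random
import statistics

def shuffled_pool(seq, n, seed=0):
    """Return n randomized sequences with the same base composition as seq."""
    rng = random.Random(seed)
    pool = []
    for _ in range(n):
        chars = list(seq)
        rng.shuffle(chars)  # simple mononucleotide shuffle (an assumption)
        pool.append("".join(chars))
    return pool

def dg_significance(native_dg, random_dgs):
    """Compare a native folding energy with energies of randomized sequences.

    Returns (z, p): z is the native energy in standard deviations from the
    randomized mean (negative means more stable); p is the add-one empirical
    fraction of randomized sequences at least as stable as the native one.
    Requires at least two distinct values in random_dgs.
    """
    mean = statistics.mean(random_dgs)
    sd = statistics.stdev(random_dgs)
    z = (native_dg - mean) / sd
    at_least_as_stable = sum(1 for dg in random_dgs if dg <= native_dg)
    p = (at_least_as_stable + 1) / (len(random_dgs) + 1)
    return z, p

# Made-up energies in kcal/mol, for illustration only.
z_score, p_value = dg_significance(-20.0, [-10.0, -12.0, -8.0, -11.0, -9.0])
print(round(z_score, 2), round(p_value, 3))
```

A strongly negative z-score indicates that the native sequence folds more stably than expected from its base composition alone, which is the sense in which the abstract evaluates each candidate signal.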
Affiliation(s)
- Ashton T Belew
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD 20854, USA.
|
246
|
Lee JH, Culver G, Carpenter S, Dobbs D. Analysis of the EIAV Rev-responsive element (RRE) reveals a conserved RNA motif required for high affinity Rev binding in both HIV-1 and EIAV. PLoS One 2008; 3:e2272. [PMID: 18523581 PMCID: PMC2386976 DOI: 10.1371/journal.pone.0002272] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 02/11/2008] [Accepted: 04/15/2008] [Indexed: 11/29/2022] Open
Abstract
A cis-acting RNA regulatory element, the Rev-responsive element (RRE), has essential roles in replication of lentiviruses, including human immunodeficiency virus (HIV-1) and equine infectious anemia virus (EIAV). The RRE binds the viral trans-acting regulatory protein, Rev, to mediate nucleocytoplasmic transport of incompletely spliced mRNAs encoding viral structural genes and genomic RNA. Because of its potential as a clinical target, RRE-Rev interactions have been well studied in HIV-1; however, detailed molecular structures of Rev-RRE complexes in other lentiviruses are still lacking. In this study, we investigate the secondary structure of the EIAV RRE and interrogate regulatory protein-RNA interactions in EIAV Rev-RRE complexes. Computational prediction and detailed chemical probing and footprinting experiments were used to determine the RNA secondary structure of EIAV RRE-1, a 555 nt region that provides RRE function in vivo. Chemical probing experiments confirmed the presence of several predicted loop and stem-loop structures, which are conserved among 140 EIAV sequence variants. Footprinting experiments revealed that Rev binding induces significant structural rearrangement in two conserved domains characterized by stable stem-loop structures. Rev binding region-1 (RBR-1) corresponds to a genetically-defined Rev binding region that overlaps exon 1 of the EIAV rev gene and contains an exonic splicing enhancer (ESE). RBR-2, characterized for the first time in this study, is required for high affinity binding of EIAV Rev to the RRE. RBR-2 contains an RNA structural motif that is also found within the high affinity Rev binding site in HIV-1 (stem-loop IIB), and within or near mapped RRE regions of four additional lentiviruses. 
The powerful integration of computational and experimental approaches in this study has generated a validated RNA secondary structure for the EIAV RRE and provided provocative evidence that high affinity Rev binding sites of HIV-1 and EIAV share a conserved RNA structural motif. The presence of this motif in phylogenetically divergent lentiviruses suggests that it may play a role in highly conserved interactions that could be targeted in novel anti-lentiviral therapies.
Affiliation(s)
- Jae-Hyung Lee
- Bioinformatics and Computational Biology Program, Department of Genetics, Development and Cell Biology, Iowa State University, Ames, Iowa, United States of America.
|
247
|
Shulman-Peleg A, Shatsky M, Nussinov R, Wolfson HJ. Prediction of interacting single-stranded RNA bases by protein-binding patterns. J Mol Biol 2008; 379:299-316. [PMID: 18452949 PMCID: PMC2429989 DOI: 10.1016/j.jmb.2008.03.043] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.1] [Received: 10/17/2007] [Revised: 02/15/2008] [Accepted: 03/17/2008] [Indexed: 11/18/2022]
Abstract
Prediction of protein-RNA interactions at the atomic level of detail is crucial for our ability to understand and interfere with processes such as gene expression and regulation. Here, we investigate protein binding pockets that accommodate extruded nucleotides not involved in RNA base pairing. We observed that most of the protein-interacting nucleotides are part of a consecutive fragment of at least two nucleotides whose rings have significant interactions with the protein. Many of these share the same protein binding cavity and more than 30% of such pairs are pi-stacked. Since these local geometries cannot be inferred from the nucleotide identities, we present a novel framework for their prediction from the properties of protein binding sites. First, we present a classification of known RNA nucleotide and dinucleotide protein binding sites and identify the common types of shared 3-D physicochemical binding patterns. These are recognized by a new classification methodology that is based on spatial multiple alignment. The shared patterns reveal novel similarities between dinucleotide binding sites of proteins with different overall sequences, folds and functions. Given a protein structure, we use these patterns for the prediction of its RNA dinucleotide binding sites. Based on the binding modes of these nucleotides, we further predict an RNA fragment that interacts with those protein binding sites. With these knowledge-based predictions, we construct an RNA fragment that can have a previously unknown sequence and structure. In addition, we provide a drug design application in which the database of all known small-molecule binding sites is searched for regions similar to nucleotide and dinucleotide binding patterns, suggesting new fragments and scaffolds that can target them.
Affiliation(s)
- Alexandra Shulman-Peleg
- School of Computer Science, Beverly and Raymond Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel.
|
248
|
Stormo GD. An overview of RNA structure prediction and applications to RNA gene prediction and RNAi design. Curr Protoc Bioinformatics 2008; Chapter 12:Unit 12.1. [PMID: 18428758 DOI: 10.1002/0471250953.bi1201s13] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/06/2022]
Abstract
This unit briefly describes the two fundamentally different methods for predicting RNA structures. The first is to find that structure with the minimum free energy of folding, as predicted by various thermodynamic parameters related to base-pair stacking, loop lengths, and other features. If one has only a single sequence, this thermodynamic approach is the best available method. The second fundamental approach to RNA structure prediction is to use multiple, homologous sequences for which one can infer a common structure, and then try and predict a structure common to all of the sequences. Such an approach is referred to as a comparative method or phylogenetic method of RNA structure prediction.
Affiliation(s)
- Gary D Stormo
- Washington University School of Medicine, St. Louis, Missouri, USA
|
249
|
Iben JR, Draper DE. Specific interactions of the L10(L12)4 ribosomal protein complex with mRNA, rRNA, and L11. Biochemistry 2008; 47:2721-31. [PMID: 18247578 DOI: 10.1021/bi701838y] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Indexed: 11/29/2022]
Abstract
Large ribosomal subunit proteins L10 and L12 form a pentameric protein complex, L10(L12)4, that is intimately involved in the ribosome elongation cycle. Its contacts with rRNA or other ribosomal proteins have been only partially resolved by crystallography. In Escherichia coli, L10 and L12 are encoded from a single operon for which L10(L12)4 is a translational repressor that recognizes a secondary structure in the mRNA leader. In this study, L10(L12)4 was expressed from the moderate thermophile Bacillus stearothermophilus to quantitatively compare strategies for binding of the complex to mRNA and ribosome targets. The minimal mRNA recognition structure is widely distributed among bacteria and has the potential to form a kink-turn structure similar to one identified in the rRNA as part of the L10(L12)4 binding site. Mutations in equivalent positions between the two sequences have similar effects on L10(L12)4-RNA binding affinity and identify the kink-turn motif and a loop AA sequence as important recognition elements. In contrast to the larger rRNA structure, the mRNA apparently positions the kink-turn motif and loop for protein recognition without the benefit of Mg²⁺-dependent tertiary structure. The mRNA and rRNA fragments bind L10(L12)4 with similar affinity (approximately 10⁸ M⁻¹), but fluorescence binding studies show that a nearby protein in the ribosome, L11, enhances L10(L12)4 binding approximately 100-fold. Thus, mRNA and ribosome targets use similar RNA features, held in different structural contexts, to recognize L10(L12)4, and the ribosome ensures the saturation of its L10(L12)4 binding site by means of an additional protein-protein interaction.
Affiliation(s)
- James R Iben
- Department of Biophysics, Johns Hopkins University, Baltimore, Maryland 21218, USA
|
250
|
Veksler-Lublinsky I, Ziv-Ukelson M, Barash D, Kedem K. A structure-based flexible search method for motifs in RNA. J Comput Biol 2008; 14:908-26. [PMID: 17803370 DOI: 10.1089/cmb.2007.0061] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 01/19/2023] Open
Abstract
The discovery of non-coding RNA (ncRNA) motifs and their role in regulating gene expression has recently attracted considerable attention. The goal is to discover these motifs in a sequence database. Current RNA motif search methods start from the primary sequence and only then take into account secondary structure considerations. One can think of developing a flexible structure-based motif search method that will filter datasets based on secondary structure first, while allowing extensive primary sequence factors and additional factors such as potential pseudoknots as constraints. Since different motifs vary in structure rigidity and in local sequence constraints, there is a need for algorithms and tools that can be fine-tuned according to the searched RNA motif, but differ in their approach from the RNAMotif descriptor language. We present an RNA motif search tool called STRMS (Structural RNA Motif Search), which takes as input the secondary structure of the query, including local sequence and structure constraints, and a target sequence database. It reports all occurrences of the query in the target, ranked by their similarity to the query, and produces an HTML file that displays graphical images of the predicted structures for both the query and the candidate hits. Our tool is flexible and takes into account a large number of sequence options and existence of potential pseudoknots as dictated by specific queries. Our approach combines pre-folding and an O(mn) RNA pattern matching algorithm based on subtree homeomorphism for ordered, rooted trees. An O(n² log n) extension is described that allows the search engine to take into account the pseudoknots typical to riboswitches. We employed STRMS in search for both new and known RNA motifs (riboswitches and tRNAs) in large target databases. 
Our results point to a number of additional purine bacterial riboswitch candidates in newly sequenced bacteria, and demonstrate high sensitivity on known riboswitches and tRNAs. Code and data are available at www.cs.bgu.ac.il/vaksler/STRMS.
|