1
Ledda M, Aviran S. PATTERNA: transcriptome-wide search for functional RNA elements via structural data signatures. Genome Biol 2018; 19:28. [PMID: 29495968 PMCID: PMC5833111 DOI: 10.1186/s13059-018-1399-z] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 09/06/2017] [Accepted: 01/30/2018] [Indexed: 02/08/2023]
Abstract
Establishing a link between RNA structure and function remains a great challenge in RNA biology. The emergence of high-throughput structure profiling experiments is revolutionizing our ability to decipher structure, yet principled approaches for extracting information on structural elements directly from these data sets are lacking. We present PATTERNA, an unsupervised pattern recognition algorithm that rapidly mines RNA structure motifs from profiling data. We demonstrate that PATTERNA detects motifs with an accuracy comparable to commonly used thermodynamic models and highlight its utility in automating data-directed structure modeling from large data sets. PATTERNA is versatile and compatible with diverse profiling techniques and experimental conditions.
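PATTERNA's core idea, scoring how well a structure-profiling signal supports a motif's pairing pattern, can be illustrated with a much simpler model. The sketch below is a minimal illustration that assumes two pairing states (paired, unpaired) with fixed, hypothetical Gaussian emission parameters for normalized reactivities. PATTERNA itself learns its model from the profiling data in an unsupervised way, and the names `EMISSIONS`, `motif_score` and `scan` are illustrative only.

```python
import math

# Hypothetical (mean, std) emission parameters for normalized reactivities.
# Unpaired nucleotides tend to be reactive, paired ones protected. These values
# are illustrative assumptions, not parameters learned by PATTERNA.
EMISSIONS = {
    "unpaired": (1.0, 0.4),
    "paired": (0.1, 0.2),
}


def gauss_logpdf(x, mean, std):
    """Log density of a normal distribution at x."""
    var = std * std
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def state_of(symbol):
    """Map a dot-bracket symbol to a pairing state."""
    return "paired" if symbol in "()" else "unpaired"


def motif_score(window, motif):
    """Log-likelihood ratio of the motif's pairing states versus a
    state-agnostic 50/50 mixture of the two emission densities."""
    score = 0.0
    for x, symbol in zip(window, motif):
        mean, std = EMISSIONS[state_of(symbol)]
        background = 0.5 * sum(
            math.exp(gauss_logpdf(x, m, s)) for m, s in EMISSIONS.values()
        )
        score += gauss_logpdf(x, mean, std) - math.log(background)
    return score


def scan(profile, motif):
    """Slide the motif along a reactivity profile; return (start, score)
    pairs sorted from best to worst."""
    width = len(motif)
    hits = [
        (start, motif_score(profile[start:start + width], motif))
        for start in range(len(profile) - width + 1)
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


profile = [1.0, 1.0, 0.1, 0.1, 1.0, 1.0, 1.0, 0.1, 0.1, 1.0]
best_start, best_score = scan(profile, "((...))")[0]
print(best_start, round(best_score, 2))
```

A positive score means the window's reactivities fit the motif's pairing pattern better than the background mixture. A practical version would also handle missing reactivities and estimate the emissions from the data rather than fixing them.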
Affiliation(s)
- Mirko Ledda
- Department of Biomedical Engineering and Genome Center, UC Davis, 1 Shields Ave, Davis, 95616 USA
- Integrative Genetics and Genomics Graduate Group, UC Davis, 1 Shields Ave, Davis, 95616 USA
- Sharon Aviran
- Department of Biomedical Engineering and Genome Center, UC Davis, 1 Shields Ave, Davis, 95616 USA
2
Drory Retwitzer M, Kifer I, Sengupta S, Yakhini Z, Barash D. An Efficient Minimum Free Energy Structure-Based Search Method for Riboswitch Identification Based on Inverse RNA Folding. PLoS One 2015; 10:e0134262. [PMID: 26230932 PMCID: PMC4521916 DOI: 10.1371/journal.pone.0134262] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 01/16/2015] [Accepted: 07/07/2015] [Indexed: 11/22/2022]
Abstract
Riboswitches are RNA genetic control elements that were originally discovered in bacteria and provide a unique mechanism of gene regulation. They work without the participation of proteins and are believed to represent ancient regulatory systems in the evolutionary timescale. One of the biggest challenges in riboswitch research is to find additional eukaryotic riboswitches, since more than 20 riboswitch classes have been found in prokaryotes but only one class has been found in eukaryotes. Moreover, this single known class of eukaryotic riboswitch, namely the TPP riboswitch class, has been found in bacteria, archaea, fungi and plants but not in animals. The few examples of eukaryotic riboswitches were identified using sequence-based bioinformatics search methods such as a combination of BLAST and pattern matching techniques that incorporate base-pairing considerations. None of these approaches perform energy minimization structure predictions. There is a clear motivation to develop new bioinformatics methods, aside from the ongoing advances in covariance models, that will sample the sequence search space more flexibly using structural guidance while retaining the computational efficiency of sequence-based methods. We present a new energy minimization approach that transforms structure-based search into a sequence-based search, thereby enabling the utilization of well established sequence-based search utilities such as BLAST and FASTA. The transformation to sequence space is obtained by using an extended inverse RNA folding problem solver with sequence and structure constraints, available within RNAfbinv. Examples in applying the new method are presented for the purine and preQ1 riboswitches. The method is described in detail along with its findings in prokaryotes. Potential uses in finding novel eukaryotic riboswitches and optimizing pre-designed synthetic riboswitches based on ligand simulations are discussed. The method components are freely available for use.
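The transformation into sequence space starts from sequences that can at least form the target structure. The sketch below shows only that constraint-satisfaction step, assuming a dot-bracket target and a sequence constraint written in a reduced, IUPAC-style alphabet (a hypothetical subset). It samples a sequence whose paired positions form Watson-Crick or GU pairs. It is not RNAfbinv, which solves the inverse folding problem and therefore also requires the designed sequence to fold into the target under energy minimization; `design` and `pair_table` are illustrative names.

```python
import random

# Allowed base pairs: Watson-Crick plus GU wobble.
PAIRS = [("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")]

# Reduced IUPAC alphabet for sequence constraints (illustrative subset).
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U", "R": "AG", "Y": "CU", "N": "ACGU"}


def pair_table(structure):
    """Map each paired position to its partner for a balanced dot-bracket string."""
    stack, table = [], {}
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            table[i], table[j] = j, i
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return table


def design(structure, constraint, rng):
    """Sample a sequence compatible with both the structure and the constraint."""
    if len(structure) != len(constraint):
        raise ValueError("structure and constraint must have equal length")
    table = pair_table(structure)
    seq = [None] * len(structure)
    for i in range(len(structure)):
        if seq[i] is not None:  # already set as the partner of an earlier '('
            continue
        allowed_i = IUPAC[constraint[i]]
        if i in table:
            j = table[i]
            allowed_j = IUPAC[constraint[j]]
            options = [(a, b) for a, b in PAIRS if a in allowed_i and b in allowed_j]
            if not options:
                raise ValueError(f"no base pair satisfies constraints at {i}-{j}")
            seq[i], seq[j] = rng.choice(options)
        else:
            seq[i] = rng.choice(allowed_i)
    return "".join(seq)


rng = random.Random(0)
print(design("((((....))))", "GNNNGNRANNNN", rng))
```

Each sampled sequence could then serve as a query for a sequence-based tool such as BLAST; an inverse-folding solver would additionally discard candidates whose predicted minimum free energy structure differs from the target.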
Affiliation(s)
- Ilona Kifer
- Agilent Laboratories, Tel Aviv, Israel; Microsoft R&D Center, Herzliya, Israel
- Supratim Sengupta
- Department of Physical Sciences, Indian Institute of Science Education and Research Kolkata, Mohanpur, 741246, India
- Zohar Yakhini
- Agilent Laboratories, Tel Aviv, Israel; Laboratory of Computational Biology, Computer Science Department, Israel Institute of Technology, Haifa, 32000, Israel
- Danny Barash
- Department of Computer Science, Ben-Gurion University, Beer-Sheva, 84105, Israel
3
Gawronski AR, Turcotte M. RiboFSM: frequent subgraph mining for the discovery of RNA structures and interactions. BMC Bioinformatics 2014; 15 Suppl 13:S2. [PMID: 25434643 PMCID: PMC4248650 DOI: 10.1186/1471-2105-15-s13-s2] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 11/18/2022]
Abstract
Frequent subgraph mining is a useful method for extracting meaningful patterns from a set of graphs or a single large graph. Here, the graph represents all possible RNA structures and interactions. Patterns that are significantly more frequent in this graph than in a random graph are extracted. We hypothesize that these patterns are most likely to represent biological mechanisms. The graph representation used is a directed dual graph, extended to handle intermolecular interactions. The graph is sampled for subgraphs, which are labeled using a canonical labeling method and counted. The resulting patterns are compared to those created from a randomized dataset and scored. The algorithm was applied to the mitochondrial genome of the kinetoplastid species Trypanosoma brucei, which has a unique RNA editing mechanism. The most significant patterns contain two stem-loops, indicative of gRNA, and represent interactions of these structures with target mRNA.
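Canonical labeling is what allows sampled subgraphs to be counted: isomorphic subgraphs must receive the same label. The sketch below uses brute force over vertex orderings, which is only practical for the handful of vertices in a small pattern, together with a hypothetical pseudocount-smoothed frequency ratio against a randomized sample. RiboFSM's own labeling and scoring may differ. Graphs are assumed to be given as a list of vertex labels plus a set of directed edges; the labels `gRNA` and `mRNA` are illustrative.

```python
from collections import Counter
from itertools import permutations


def canonical_label(labels, edges):
    """Canonical form of a small directed, vertex-labeled graph.

    Tries every vertex ordering and keeps the lexicographically smallest
    encoding, so isomorphic graphs always get the same label.
    """
    best = None
    for order in permutations(range(len(labels))):
        position = {vertex: k for k, vertex in enumerate(order)}
        code = (
            tuple(labels[v] for v in order),
            tuple(sorted((position[u], position[v]) for u, v in edges)),
        )
        if best is None or code < best:
            best = code
    return best


def enrichment(observed, randomized, pseudocount=1.0):
    """Ratio of smoothed pattern frequencies: observed sample vs. randomized sample."""
    obs = Counter(canonical_label(*g) for g in observed)
    rnd = Counter(canonical_label(*g) for g in randomized)
    scores = {}
    for key, count in obs.items():
        f_obs = (count + pseudocount) / (len(observed) + pseudocount)
        f_rnd = (rnd[key] + pseudocount) / (len(randomized) + pseudocount)
        scores[key] = f_obs / f_rnd
    return scores


# Vertex labels mark which molecule a stem belongs to (illustrative).
# g1 and g2 are the same gRNA -> gRNA -> mRNA chain with vertices listed in
# different orders; g3 has a different edge pattern.
g1 = (["gRNA", "gRNA", "mRNA"], {(0, 1), (1, 2)})
g2 = (["mRNA", "gRNA", "gRNA"], {(2, 1), (1, 0)})
g3 = (["gRNA", "gRNA", "mRNA"], {(1, 0), (1, 2)})
print(canonical_label(*g1) == canonical_label(*g2))
```

Scores above 1 flag patterns that occur more often in the observed sample than in the randomized one; the brute-force labeling grows factorially with vertex count, so larger patterns would need a dedicated canonical-labeling algorithm.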
4
Havill JT, Bhatiya C, Johnson SM, Sheets JD, Thompson JS. A new approach for detecting riboswitches in DNA sequences. Bioinformatics 2014; 30:3012-9. [PMID: 25015992 DOI: 10.1093/bioinformatics/btu479] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Indexed: 01/17/2023]
Abstract
MOTIVATION Riboswitches are short sequences of messenger RNA that can change their structural conformation to regulate the expression of adjacent genes. Computational prediction of putative riboswitches can provide direction to molecular biologists studying riboswitch-mediated gene expression. RESULTS The Denison Riboswitch Detector (DRD) is a new computational tool with a Web interface that can quickly identify putative riboswitches in DNA sequences on the scale of bacterial genomes. Riboswitch descriptions are easily modifiable and new ones are easily created. The underlying algorithm converts the problem to a 'heaviest path' problem on a multipartite graph, which is then solved using efficient dynamic programming. We show that DRD can achieve ∼ 88-99% sensitivity and >99.99% specificity on 13 riboswitch families. AVAILABILITY AND IMPLEMENTATION DRD is available at http://drd.denison.edu.
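The "heaviest path" formulation can be sketched as follows. The sketch assumes each layer of the multipartite graph holds the candidate hits for one element of a riboswitch description, in 5' to 3' order, and that an edge joins two hits in consecutive layers when they do not overlap and are separated by at most `max_gap` nucleotides. The `Hit` type, its scores and the gap rule are hypothetical stand-ins for DRD's own riboswitch descriptions.

```python
from typing import List, NamedTuple, Optional, Tuple


class Hit(NamedTuple):
    start: int    # 0-based start of a candidate element match
    end: int      # 0-based inclusive end
    score: float  # hypothetical match score (higher is better)


def heaviest_path(layers: List[List[Hit]], max_gap: int) -> Optional[Tuple[float, List[Hit]]]:
    """Heaviest path through a multipartite graph whose layers are the
    candidate hits for consecutive elements of a riboswitch description."""
    # best[k][i] = (weight of heaviest path ending at layers[k][i], index of
    # its predecessor in layer k - 1); weight is None when no path reaches it.
    best = [[(hit.score, None) for hit in layers[0]]]
    for k in range(1, len(layers)):
        row = []
        for hit in layers[k]:
            choice = None
            for i, prev in enumerate(layers[k - 1]):
                prev_weight = best[k - 1][i][0]
                if prev_weight is None:
                    continue
                # Edge exists when the hits are ordered, non-overlapping and close enough.
                if prev.end < hit.start and hit.start - prev.end - 1 <= max_gap:
                    if choice is None or prev_weight > choice[0]:
                        choice = (prev_weight, i)
            row.append((choice[0] + hit.score, choice[1]) if choice else (None, None))
        best.append(row)

    ends = [(weight, i) for i, (weight, _) in enumerate(best[-1]) if weight is not None]
    if not ends:
        return None
    weight, i = max(ends)
    path = []
    for k in range(len(layers) - 1, -1, -1):  # follow back-pointers to layer 0
        path.append(layers[k][i])
        i = best[k][i][1]
    return weight, path[::-1]


layers = [
    [Hit(0, 9, 2.0), Hit(50, 59, 5.0)],
    [Hit(15, 24, 3.0), Hit(70, 79, 1.0)],
    [Hit(30, 39, 4.0)],
]
print(heaviest_path(layers, max_gap=20))
```

Because edges only join consecutive layers, each layer is processed once against the previous one, so the running time is proportional to the sum of products of adjacent layer sizes rather than to the number of complete paths.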
Affiliation(s)
- Jessen T Havill, Chinmoy Bhatiya, Steven M Johnson, Joseph D Sheets, Jeffrey S Thompson
- Department of Mathematics and Computer Science, Department of Biology, Denison University, Granville, OH 43023, Capco, New York, NY, 10005, Department of Computer Science, Wake Forest University, Winston-Salem, NC 27109 and College of Human Medicine, Michigan State University, Grand Rapids, MI 49503, USA
5
Abstract
Like protein coding sequences, functional motifs in RNA elements are frequently conserved, but this conservation is most often at the structure level rather than at the sequence level. Proper characterization of these structural RNA motifs is both the key and the limiting step to understanding the nature of RNA-protein interactions. The discovery of elements targeted by RNA-binding proteins and how they function remains one of the most active, yet elusive areas of RNA biology. Only a limited number of these elements have been well characterized, with many of the fundamental rules yet to be discovered. Here we present a comprehensive list of web-based resources that can be used in the study and identification of RNA-based structural and regulatory motifs and provide a survey of the informatic resources that have been developed to facilitate this research.
Affiliation(s)
- Ajish D George
- Department of Biomedical Sciences, School of Public Health, Gen*NY*Sis Center for Excellence in Cancer Genomics, University at Albany-SUNY, Rensselaer, NY, USA.
6
Riccitelli NJ, Lupták A. Computational discovery of folded RNA domains in genomes and in vitro selected libraries. Methods 2010; 52:133-40. [PMID: 20554049 DOI: 10.1016/j.ymeth.2010.06.005] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Received: 06/03/2010] [Accepted: 06/03/2010] [Indexed: 10/19/2022]
Abstract
Structured functional RNAs are conserved on the level of secondary and tertiary structure, rather than at sequence level, and so traditional sequence-based searches often fail to identify them. Structure-based searches are increasingly used to discover known RNA motifs in sequence databases. We describe the application of the program RNABOB, which performs such searches: the user defines a desired motif's sequence, paired and spacer elements, and the program then scans a sequence file for regions capable of assuming the prescribed fold. Structure descriptors of stem-loops, internal loops, three-way junctions, kissing loops, and the hammerhead and hepatitis delta virus ribozymes are shown as examples of implementation of structure-based searches.
7
Ivry T, Michal S, Avihoo A, Sapiro G, Barash D. An image processing approach to computing distances between RNA secondary structures dot plots. Algorithms Mol Biol 2009; 4:4. [PMID: 19203377 PMCID: PMC2677394 DOI: 10.1186/1748-7188-4-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 03/23/2007] [Accepted: 02/09/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND Computing the distance between two RNA secondary structures can contribute to understanding the functional relationship between them. When used repeatedly, such a procedure may lead to finding a query RNA structure of interest in a database of structures. Several methods are available for computing distances between RNAs represented as strings or graphs, but none utilize the RNA representation with dot plots. Since dot plots are essentially digital images, there is a clear motivation to devise an algorithm for computing the distance between dot plots based on image processing methods. RESULTS We have developed a new metric dubbed 'DoPloCompare', which compares two RNA structures. The method is based on comparing dot plot diagrams that represent the secondary structures. When analyzing two diagrams and motivated by image processing, the distance is based on a combination of histogram correlations and a geometrical distance measure. We introduce, describe, and illustrate the procedure by two applications that utilize this metric on RNA sequences. The first application is the RNA design problem, where the goal is to find the nucleotide sequence for a given secondary structure. Examples where our proposed distance measure outperforms others are given. The second application locates peculiar point mutations that induce significant structural alterations relative to the wild type predicted secondary structure. The approach reported in the past to solve this problem was tested on several RNA sequences with known secondary structures to affirm their prediction, as well as on a data set of ribosomal pieces. These pieces were computationally cut from a ribosome for which an experimentally derived secondary structure is available, and on each piece the prediction conveys similarity to the experimental result. 
Our newly proposed distance measure shows benefit in this problem as well when compared to standard methods used for assessing the distance similarity between two RNA secondary structures. CONCLUSION Inspired by image processing and the dot plot representation for RNA secondary structure, we have managed to provide a conceptually new and potentially beneficial metric for comparing two RNA secondary structures. We illustrated our approach on the RNA design problem, as well as on an application that utilizes the distance measure to detect conformational rearranging point mutations in an RNA sequence.
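The abstract describes the distance only in outline (histogram correlations combined with a geometrical distance measure), so the sketch below is a hypothetical stand-in rather than the published DoPloCompare formula. It assumes binary dot plots built from balanced dot-bracket structures of equal length (real dot plots usually hold base-pair probabilities), correlates their histograms of base-pair spans (dots per diagonal), adds a symmetric mean nearest-dot distance, and mixes the two terms with an arbitrary weight `alpha`. All function names are illustrative.

```python
import math


def pairs_from_dot_bracket(db):
    """Dots (i, j) with i < j of the binary dot plot for a balanced structure."""
    stack, pairs = [], []
    for i, symbol in enumerate(db):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            pairs.append((stack.pop(), i))
    return sorted(pairs)


def span_histogram(pairs, n):
    """Number of dots on each diagonal j - i of the upper-triangular plot."""
    hist = [0] * n
    for i, j in pairs:
        hist[j - i] += 1
    return hist


def pearson(x, y):
    """Pearson correlation, falling back to 1/0 for constant inputs."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0 if x == y else 0.0
    return cov / (sx * sy)


def mean_nearest(a, b):
    """Average Euclidean distance from each dot in a to its nearest dot in b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


def dot_plot_distance(db1, db2, alpha=0.5):
    """Hypothetical mix of (1 - histogram correlation) and a geometric term."""
    if len(db1) != len(db2):
        raise ValueError("structures must have equal length")
    p1, p2 = pairs_from_dot_bracket(db1), pairs_from_dot_bracket(db2)
    if not p1 or not p2:
        raise ValueError("both structures need at least one base pair")
    n = len(db1)
    corr = pearson(span_histogram(p1, n), span_histogram(p2, n))
    geo = 0.5 * (mean_nearest(p1, p2) + mean_nearest(p2, p1))
    return alpha * (1 - corr) + (1 - alpha) * geo / n  # geo scaled by length


print(dot_plot_distance("((((....))))", "(((......)))"))
```

Identical structures score (numerically) zero, and the geometric term is averaged in both directions so the distance is symmetric; how the real method weights and normalizes its terms would have to come from the paper itself.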
8
Informatic resources for identifying and annotating structural RNA motifs. Mol Biotechnol 2008; 41:180-93. [PMID: 18979204 DOI: 10.1007/s12033-008-9114-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 09/30/2008] [Accepted: 10/01/2008] [Indexed: 10/21/2022]
Abstract
Post-transcriptional regulation of genes and transcripts is a vital aspect of cellular processes, and unlike transcriptional regulation, remains a largely unexplored domain. One of the most obvious and most important questions to explore is the discovery of functional RNA elements. Many RNA elements have been characterized to date ranging from cis-regulatory motifs within mRNAs to large families of non-coding RNAs. Like protein coding genes, the functional motifs of these RNA elements are highly conserved, but unlike protein coding genes, it is most often the structure and not the sequence that is conserved. Proper characterization of these structural RNA motifs is both the key and the limiting step to understanding the post-transcriptional aspects of the genomic world. Here, we focus on the task of structural motif discovery and provide a survey of the informatics resources geared towards this task.