1
|
Tang M, Hwang K, Kang SH. StemP: A Fast and Deterministic Stem-Graph Approach for RNA Secondary Structure Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:3278-3291. [PMID: 37028040 DOI: 10.1109/tcbb.2023.3253049] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]
Abstract
We propose a new deterministic methodology to predict the secondary structure of RNA sequences. What information of stem is important for structure prediction, and is it enough ? The proposed simple deterministic algorithm uses minimum stem length, Stem-Loop score, and co-existence of stems, to give good structure predictions for short RNA and tRNA sequences. The main idea is to consider all possible stem with certain stem loop energy and strength to predict RNA secondary structure. We use graph notation, where stems are represented as vertexes, and co-existence between stems as edges. This full Stem-graph presents all possible folding structure, and we pick sub-graph(s) which give the best matching energy for structure prediction. Stem-Loop score adds structure information and speeds up the computation. The proposed method can predict secondary structure even with pseudo knots. One of the strengths of this approach is the simplicity and flexibility of the algorithm, and it gives a deterministic answer. Numerical experiments are done on various sequences from Protein Data Bank and the Gutell Lab using a laptop and results take only a few seconds.
Collapse
|
2
|
Chiu JKH, Dillon TS, Chen YPP. Large-scale frequent stem pattern mining in RNA families. J Theor Biol 2018; 455:131-139. [PMID: 30036526 DOI: 10.1016/j.jtbi.2018.07.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/05/2018] [Revised: 07/09/2018] [Accepted: 07/11/2018] [Indexed: 11/19/2022]
Abstract
Functionally similar non-coding RNAs are expected to be similar in certain regions of their secondary structures. These similar regions are called common structure motifs, and are structurally conserved throughout evolution to maintain their functional roles. Common structure motif identification is one of the critical tasks in RNA secondary structure analysis. Nevertheless, current approaches suffer several limitations, and/or do not scale with both structure size and the number of input secondary structures. In this work, we present a method to transform the conserved base pair stems into transaction items and apply frequent itemset mining to identify common structure motifs existing in a majority of input structures. Our experimental results on telomerase and ribosomal RNA secondary structures report frequent stem patterns that are of biological significance. Moreover, the algorithms utilized in our method are scalable and frequent stem patterns can be identified efficiently among many large structures.
Collapse
Affiliation(s)
- Jimmy Ka Ho Chiu
- Department of Computer Science and Information, Technology, La Trobe University, Melbourne VIC 3086, Australia.
| | - Tharam S Dillon
- Department of Computer Science and Information, Technology, La Trobe University, Melbourne VIC 3086, Australia.
| | - Yi-Ping Phoebe Chen
- Department of Computer Science and Information, Technology, La Trobe University, Melbourne VIC 3086, Australia.
| |
Collapse
|
3
|
Chen Q, Lan C, Chen B, Wang L, Li J, Zhang C. Exploring Consensus RNA Substructural Patterns Using Subgraph Mining. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:1134-1146. [PMID: 28026781 DOI: 10.1109/tcbb.2016.2645202] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Frequently recurring RNA structural motifs play important roles in RNA folding process and interaction with other molecules. Traditional index-based and shape-based schemas are useful in modeling RNA secondary structures but ignore the structural discrepancy of individual RNA family member. Further, the in-depth analysis of underlying substructure pattern is insufficient due to varied and unnormalized substructure data. This prevents us from understanding RNAs functions and their inherent synergistic regulation networks. This article thus proposes a novel labeled graph-based algorithm RnaGraph to uncover frequently RNA substructure patterns. Attribute data and graph data are combined to characterize diverse substructures and their correlations, respectively. Further, a top-k graph pattern mining algorithm is developed to extract interesting substructure motifs by integrating frequency and similarity. The experimental results show that our methods assist in not only modelling complex RNA secondary structures but also identifying hidden but interesting RNA substructure patterns.
Collapse
|
4
|
Li J, Xu C, Liang H, Cong W, Wang Y, Luan K, Liu Y. RGRNA: prediction of RNA secondary structure based on replacement and growth of stems. Comput Methods Biomech Biomed Engin 2017; 20:1261-1272. [DOI: 10.1080/10255842.2017.1340460] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022]
Affiliation(s)
- Jin Li
- College of Automation, Harbin Engineering University, Harbin, China
| | - Chengzhen Xu
- College of Automation, Harbin Engineering University, Harbin, China
| | - Hong Liang
- College of Automation, Harbin Engineering University, Harbin, China
| | - Wang Cong
- College of Automation, Harbin Engineering University, Harbin, China
| | - Ying Wang
- College of Automation, Harbin Engineering University, Harbin, China
| | - Kuan Luan
- College of Automation, Harbin Engineering University, Harbin, China
| | - Yunlong Liu
- College of Automation, Harbin Engineering University, Harbin, China
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN, USA
| |
Collapse
|
5
|
Chiu JKH, Chen YPP. A comprehensive study of RNA secondary structure alignment algorithms. Brief Bioinform 2017; 18:291-305. [PMID: 26984617 DOI: 10.1093/bib/bbw009] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2015] [Indexed: 01/04/2023] Open
Abstract
RNA secondary structure alignment has received more attention since the discovery of the structure-function relationships in some non-protein-encoding RNAs. However, unlike the pure sequence alignment problem, which has been solved in polynomial time, secondary structure alignment incorporates the base pairings as another information dimension in addition to the base sequence. This problem therefore becomes more challenging. In this study, we classify the selected approaches, and algorithmically illustrate how these methods address the alignment problems with different structure types. Other features such as the types of base pair edit operations supported and the time complexity are also compared.
Collapse
Affiliation(s)
- Jimmy Ka Ho Chiu
- Department of Computer Science and Information Technology, La Trobe University, Melbourne, Victoria, Australia
| | - Yi-Ping Phoebe Chen
- Department of Computer Science and Information Technology, La Trobe University, Melbourne, Victoria, Australia
| |
Collapse
|
6
|
Pietrosanto M, Mattei E, Helmer-Citterich M, Ferrè F. A novel method for the identification of conserved structural patterns in RNA: From small scale to high-throughput applications. Nucleic Acids Res 2016; 44:8600-8609. [PMID: 27580722 PMCID: PMC5062999 DOI: 10.1093/nar/gkw750] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2016] [Accepted: 08/17/2016] [Indexed: 12/21/2022] Open
Abstract
Functional RNA regions are often related to recurrent secondary structure patterns (or motifs), which can exert their role in several different ways, particularly in dictating the interaction with RNA-binding proteins, and acting in the regulation of a large number of cellular processes. Among the available motif-finding tools, the majority focuses on sequence patterns, sometimes including secondary structure as additional constraints to improve their performance. Nonetheless, secondary structures motifs may be concurrent to their sequence counterparts or even encode a stronger functional signal. Current methods for searching structural motifs generally require long pipelines and/or high computational efforts or previously aligned sequences. Here, we present BEAM (BEAr Motif finder), a novel method for structural motif discovery from a set of unaligned RNAs, taking advantage of a recently developed encoding for RNA secondary structure named BEAR (Brand nEw Alphabet for RNAs) and of evolutionary substitution rates of secondary structure elements. Tested in a varied set of scenarios, from small- to large-scale, BEAM is successful in retrieving structural motifs even in highly noisy data sets, such as those that can arise in CLIP-Seq or other high-throughput experiments.
Collapse
Affiliation(s)
- Marco Pietrosanto
- Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica snc, 00133 Rome, Italy
| | - Eugenio Mattei
- Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica snc, 00133 Rome, Italy
| | - Manuela Helmer-Citterich
- Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica snc, 00133 Rome, Italy
| | - Fabrizio Ferrè
- Department of Pharmacy and Biotechnology (FaBiT), University of Bologna Alma Mater, Via Belmeloro 8/2, 40126 Bologna, Italy
| |
Collapse
|
7
|
Baba N, Elmetwaly S, Kim N, Schlick T. Predicting Large RNA-Like Topologies by a Knowledge-Based Clustering Approach. J Mol Biol 2015; 428:811-821. [PMID: 26478223 DOI: 10.1016/j.jmb.2015.10.009] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2015] [Accepted: 10/06/2015] [Indexed: 11/19/2022]
Abstract
An analysis and expansion of our resource for classifying, predicting, and designing RNA structures, RAG (RNA-As-Graphs), is presented, with the goal of understanding features of RNA-like and non-RNA-like motifs and exploiting this information for RNA design. RAG was first reported in 2004 for cataloging RNA secondary structure motifs using graph representations. In 2011, the RAG resource was updated with the increased availability of RNA structures and was improved by utilities for analyzing RNA structures, including substructuring and search tools. We also classified RNA structures as graphs up to 10 vertices (~200 nucleotides) into three classes: existing, RNA-like, and non-RNA-like using clustering approaches. Here, we focus on the tree graphs and evaluate the newly founded RNAs since 2011, which also support our refined predictions of RNA-like motifs. We expand the RAG resource for large tree graphs up to 13 vertices (~260 nucleotides), thereby cataloging more than 10 times as many secondary structures. We apply clustering algorithms based on features of RNA secondary structures translated from known tertiary structures to suggest which hypothetical large RNA motifs can be considered "RNA-like". The results by the PAM (Partitioning Around Medoids) approach, in particular, reveal good accuracy, with small error for the largest cases. The RAG update here up to 13 vertices offers a useful graph-based tool for exploring RNA motifs and suggesting large RNA motifs for design.
Collapse
Affiliation(s)
- Naoto Baba
- Department of Chemistry and Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012, USA; Department of Chemistry, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Aichi 464-8601, Japan
| | - Shereef Elmetwaly
- Department of Chemistry and Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012, USA
| | - Namhee Kim
- Department of Chemistry and Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012, USA
| | - Tamar Schlick
- Department of Chemistry and Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012, USA; NYU-ECNU Center for Computational Chemistry at NYU Shanghai, 3663 Zhongshan Road North, Shanghai, 200062, China.
| |
Collapse
|
8
|
Abstract
Genomic studies have greatly expanded our knowledge of structural non-coding RNAs (ncRNAs). These RNAs fold into characteristic secondary structures and perform specific-structure dependent biological functions. Hence RNA secondary structure prediction is one of the most well studied problems in computational RNA biology. Comparative sequence analysis is one of the more reliable RNA structure prediction approaches as it exploits information of multiple related sequences to infer the consensus secondary structure. This class of methods essentially learns a global secondary structure from the input sequences. In this paper, we consider the more general problem of unearthing common local secondary structure based patterns from a set of related sequences. The input sequences for example could correspond to 3(') or 5(') untranslated regions of a set of orthologous genes and the unearthed local patterns could correspond to regulatory motifs found in these regions. These sequences could also correspond to in vitro selected RNA, genomic segments housing ncRNA genes from the same family and so on. Here, we give a detailed review of the various computational techniques proposed in literature attempting to solve this general motif discovery problem. We also give empirical comparisons of some of the current state of the art methods and point out future directions of research.
Collapse
Affiliation(s)
- Avinash Achar
- Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway
| | - Pål Sætrom
- Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway.
- Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway.
| |
Collapse
|
9
|
Chiu JKH, Chen YPP. Pairwise RNA secondary structure alignment with conserved stem pattern. Bioinformatics 2015; 31:3914-21. [PMID: 26275897 DOI: 10.1093/bioinformatics/btv471] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2014] [Accepted: 08/07/2015] [Indexed: 12/23/2022] Open
Abstract
MOTIVATION The regulatory functions performed by non-coding RNAs are related to their 3D structures, which are, in turn, determined by their secondary structures. Pairwise secondary structure alignment gives insight into the functional similarity between a pair of RNA sequences. Numerous exact or heuristic approaches have been proposed for computational alignment. However, the alignment becomes intractable when arbitrary pseudoknots are allowed. Also, since non-coding RNAs are, in general, more conserved in structures than sequences, it is more effective to perform alignment based on the common structural motifs discovered. RESULTS We devised a method to approximate the true conserved stem pattern for a secondary structure pair, and constructed the alignment from it. Experimental results suggest that our method identified similar RNA secondary structures better than the existing tools, especially for large structures. It also successfully indicated the conservation of some pseudoknot features with biological significance. More importantly, even for large structures with arbitrary pseudoknots, the alignment can usually be obtained efficiently. AVAILABILITY AND IMPLEMENTATION Our algorithm has been implemented in a tool called PSMAlign. The source code of PSMAlign is freely available at http://homepage.cs.latrobe.edu.au/ypchen/psmalign/.
Collapse
Affiliation(s)
- Jimmy Ka Ho Chiu
- Department of Computer Science and Information Technology, La Trobe University, Melbourne, Victoria 3086, Australia
| | - Yi-Ping Phoebe Chen
- Department of Computer Science and Information Technology, La Trobe University, Melbourne, Victoria 3086, Australia
| |
Collapse
|
10
|
Chiu JKH, Chen YPP. Efficient conversion of RNA pseudoknots to knot-free structures using a graphical model. IEEE Trans Biomed Eng 2014; 62:1265-71. [PMID: 25474805 DOI: 10.1109/tbme.2014.2375360] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
RNA secondary structures are vital in determining the 3-D structures of noncoding RNA molecules, which in turn affect their functions. Computational RNA secondary structure alignment and analysis are biologically significant, because they help identify numerous functionally important motifs. Unfortunately, many analysis methods suffer from computational intractability in the presence of pseudoknots. The conversion of knotted to knot-free secondary structures is an essential preprocessing step, and is regarded as pseudoknot removal. Although exact methods have been proposed for this task, their computational complexities are undetermined, and so their efficiencies in processing complex pseudoknots are currently unknown. We transformed the pseudoknot removal problem into a circle graph maximum weight independent set (MWIS) problem, in which each MWIS represents a unique optimal deknotted structure. An existing circle graph MWIS algorithm was extended to report either single or all solutions. Its time complexity depends on the number of MWISs, and is guaranteed to report one solution in polynomial time. Experimental results suggest that our extended algorithm is much more efficient than the state-of-the-art tool. We also devised a novel concept called the structural scoring function, and investigated its effectiveness in more accurate solution candidate selection for a certain criteria.
Collapse
|
11
|
Gawronski AR, Turcotte M. RiboFSM: frequent subgraph mining for the discovery of RNA structures and interactions. BMC Bioinformatics 2014; 15 Suppl 13:S2. [PMID: 25434643 PMCID: PMC4248650 DOI: 10.1186/1471-2105-15-s13-s2] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
Frequent subgraph mining is a useful method for extracting meaningful patterns from a set of graphs or a single large graph. Here, the graph represents all possible RNA structures and interactions. Patterns that are significantly more frequent in this graph over a random graph are extracted. We hypothesize that these patterns are most likely to represent biological mechanisms. The graph representation used is a directed dual graph, extended to handle intermolecular interactions. The graph is sampled for subgraphs, which are labeled using a canonical labeling method and counted. The resulting patterns are compared to those created from a randomized dataset and scored. The algorithm was applied to the mitochondrial genome of the kinetoplastid species Trypanosoma brucei, which has a unique RNA editing mechanism. The most significant patterns contain two stem-loops, indicative of gRNA, and represent interactions of these structures with target mRNA.
Collapse
|
12
|
Badr G, Al-Turaiki I, Turcotte M, Mathkour H. IncMD: incremental trie-based structural motif discovery algorithm. J Bioinform Comput Biol 2014; 12:1450027. [PMID: 25362841 DOI: 10.1142/s0219720014500279] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
The discovery of common RNA secondary structure motifs is an important problem in bioinformatics. The presence of such motifs is usually associated with key biological functions. However, the identification of structural motifs is far from easy. Unlike motifs in sequences, which have conserved bases, structural motifs have common structure arrangements even if the underlying sequences are different. Over the past few years, hundreds of algorithms have been published for the discovery of sequential motifs, while less work has been done for the structural motifs case. Current structural motif discovery algorithms are limited in terms of accuracy and scalability. In this paper, we present an incremental and scalable algorithm for discovering RNA secondary structure motifs, namely IncMD. We consider the structural motif discovery as a frequent pattern mining problem and tackle it using a modified a priori algorithm. IncMD uses data structures, trie-based linked lists of prefixes (LLP), to accelerate the search and retrieval of patterns, support counting, and candidate generation. We modify the candidate generation step in order to adapt it to the RNA secondary structure representation. IncMD constructs the frequent patterns incrementally from RNA secondary structure basic elements, using nesting and joining operations. The notion of a motif group is introduced in order to simulate an alignment of motifs that only differ in the number of unpaired bases. In addition, we use a cluster beam approach to select motifs that will survive to the next iterations of the search. Results indicate that IncMD can perform better than some of the available structural motif discovery algorithms in terms of sensitivity (Sn), positive predictive value (PPV), and specificity (Sp). The empirical results also show that the algorithm is scalable and runs faster than all of the compared algorithms.
Collapse
Affiliation(s)
- Ghada Badr
- College of Computer and Information Sciences, King Saud University, Riyadh, Kingdom of Saudi Arabia , IRI - The City of Scientific Research and Technological Applications, University and Research District, P. O. 21934, New Borg Alarab, Alexandria, Egypt
| | | | | | | |
Collapse
|
13
|
RNA graph partitioning for the discovery of RNA modularity: a novel application of graph partition algorithm to biology. PLoS One 2014; 9:e106074. [PMID: 25188578 PMCID: PMC4154854 DOI: 10.1371/journal.pone.0106074] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2014] [Accepted: 07/31/2014] [Indexed: 11/19/2022] Open
Abstract
Graph representations have been widely used to analyze and design various economic, social, military, political, and biological networks. In systems biology, networks of cells and organs are useful for understanding disease and medical treatments and, in structural biology, structures of molecules can be described, including RNA structures. In our RNA-As-Graphs (RAG) framework, we represent RNA structures as tree graphs by translating unpaired regions into vertices and helices into edges. Here we explore the modularity of RNA structures by applying graph partitioning known in graph theory to divide an RNA graph into subgraphs. To our knowledge, this is the first application of graph partitioning to biology, and the results suggest a systematic approach for modular design in general. The graph partitioning algorithms utilize mathematical properties of the Laplacian eigenvector (µ2) corresponding to the second eigenvalues (λ2) associated with the topology matrix defining the graph: λ2 describes the overall topology, and the sum of µ2's components is zero. The three types of algorithms, termed median, sign, and gap cuts, divide a graph by determining nodes of cut by median, zero, and largest gap of µ2's components, respectively. We apply these algorithms to 45 graphs corresponding to all solved RNA structures up through 11 vertices (∼ 220 nucleotides). While we observe that the median cut divides a graph into two similar-sized subgraphs, the sign and gap cuts partition a graph into two topologically-distinct subgraphs. We find that the gap cut produces the best biologically-relevant partitioning for RNA because it divides RNAs at less stable connections while maintaining junctions intact. The iterative gap cuts suggest basic modules and assembly protocols to design large RNA structures. Our graph substructuring thus suggests a systematic approach to explore the modularity of biological networks. In our applications to RNA structures, subgraphs also suggest design strategies for novel RNA motifs.
Collapse
|
14
|
|
15
|
Badr G, Al-Turaiki I, Mathkour H. Classification and assessment tools for structural motif discovery algorithms. BMC Bioinformatics 2013; 14 Suppl 9:S4. [PMID: 23902564 PMCID: PMC3698030 DOI: 10.1186/1471-2105-14-s9-s4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND Motif discovery is the problem of finding recurring patterns in biological data. Patterns can be sequential, mainly when discovered in DNA sequences. They can also be structural (e.g. when discovering RNA motifs). Finding common structural patterns helps to gain a better understanding of the mechanism of action (e.g. post-transcriptional regulation). Unlike DNA motifs, which are sequentially conserved, RNA motifs exhibit conservation in structure, which may be common even if the sequences are different. Over the past few years, hundreds of algorithms have been developed to solve the sequential motif discovery problem, while less work has been done for the structural case. METHODS In this paper, we survey, classify, and compare different algorithms that solve the structural motif discovery problem, where the underlying sequences may be different. We highlight their strengths and weaknesses. We start by proposing a benchmark dataset and a measurement tool that can be used to evaluate different motif discovery approaches. Then, we proceed by proposing our experimental setup. Finally, results are obtained using the proposed benchmark to compare available tools. To the best of our knowledge, this is the first attempt to compare tools solely designed for structural motif discovery. RESULTS Results show that the accuracy of discovered motifs is relatively low. The results also suggest a complementary behavior among tools where some tools perform well on simple structures, while other tools are better for complex structures. CONCLUSIONS We have classified and evaluated the performance of available structural motif discovery tools. In addition, we have proposed a benchmark dataset with tools that can be used to evaluate newly developed tools.
Collapse
Affiliation(s)
- Ghada Badr
- King Saud University, College of Computer and Information Sciences, Riyadh, Kingdom of Saudi Arabia
- IRI - The City of Scientific Research and Technological Applications, University and Research District, P.O. 21934 New Borg Alarab, Alexandria, Egypt
| | - Isra Al-Turaiki
- King Saud University, College of Computer and Information Sciences, Riyadh, Kingdom of Saudi Arabia
| | - Hassan Mathkour
- King Saud University, College of Computer and Information Sciences, Riyadh, Kingdom of Saudi Arabia
| |
Collapse
|
16
|
Hamada M. Direct updating of an RNA base-pairing probability matrix with marginal probability constraints. J Comput Biol 2013; 19:1265-76. [PMID: 23210474 DOI: 10.1089/cmb.2012.0215] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022] Open
Abstract
A base-pairing probability matrix (BPPM) stores the probabilities for every possible base pair in an RNA sequence and has been used in many algorithms in RNA informatics (e.g., RNA secondary structure prediction and motif search). In this study, we propose a novel algorithm to perform iterative updates of a given BPPM, satisfying marginal probability constraints that are (approximately) given by recently developed biochemical experiments, such as SHAPE, PAR, and FragSeq. The method is easily implemented and is applicable to common models for RNA secondary structures, such as energy-based or machine-learning-based models. In this article, we focus mainly on the details of the algorithms, although preliminary computational experiments will also be presented.
Collapse
Affiliation(s)
- Michiaki Hamada
- The University of Tokyo, Graduate School of Frontier Science, Kashiwa, Japan.
| |
Collapse
|
17
|
Izzo JA, Kim N, Elmetwaly S, Schlick T. RAG: an update to the RNA-As-Graphs resource. BMC Bioinformatics 2011; 12:219. [PMID: 21627789 PMCID: PMC3123240 DOI: 10.1186/1471-2105-12-219] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2010] [Accepted: 05/31/2011] [Indexed: 02/08/2023] Open
Abstract
Background In 2004, we presented a web resource for stimulating the search for novel RNAs, RNA-As-Graphs (RAG), which classified, catalogued, and predicted RNA secondary structure motifs using clustering and build-up approaches. With the increased availability of secondary structures in recent years, we update the RAG resource and provide various improvements for analyzing RNA structures. Description Our RAG update includes a new supervised clustering algorithm that can suggest RNA motifs that may be "RNA-like". We use this utility to describe RNA motifs as three classes: existing, RNA-like, and non-RNA-like. This produces 126 tree and 16,658 dual graphs as candidate RNA-like topologies using the supervised clustering algorithm with existing RNAs serving as the training data. A comparison of this clustering approach to an earlier method shows considerable improvements. Additional RAG features include greatly expanded search capabilities, an interface to better utilize the benefits of relational database, and improvements to several of the utilities such as directed/labeled graphs and a subgraph search program. Conclusions The RAG updates presented here augment the database's intended function - stimulating the search for novel RNA functionality - by classifying available motifs, suggesting new motifs for design, and allowing for more specific searches for specific topologies. The updated RAG web resource offers users a graph-based tool for exploring available RNA motifs and suggesting new RNAs for design.
Collapse
Affiliation(s)
- Joseph A Izzo
- Department of Chemistry, New York University, New York, NY 10003, USA
| | | | | | | |
Collapse
|
18
|
Abstract
Like protein coding sequences, functional motifs in RNA elements are frequently conserved, but this conservation is most often at the structure level rather than sequence based. Proper characterization of these structural RNA motifs is both the key and the limiting step to understanding the nature of RNA-protein interactions. The discovery of elements targeted by RNA-binding proteins and how they function remains one of the most active, yet elusive areas of RNA biology. Only a limited number of these elements have been well characterized with many of the fundamental rules yet to be discovered. Here we present a comprehensive list of web based resources that can be used in the study and identification of RNA-based structural and regulatory motifs and provide a survey of the informatic resources that can have been developed to facilitate this research.
Collapse
Affiliation(s)
- Ajish D George
- Department of Biomedical Sciences, School of Public Health, Gen∗NY∗Sis Center for Excellence in Cancer Genomics, University at Albany-SUNY, Rensselaer, NY, USA.
| | | |
Collapse
|
19
|
Sperschneider J, Datta A. DotKnot: pseudoknot prediction using the probability dot plot under a refined energy model. Nucleic Acids Res 2010; 38:e103. [PMID: 20123730 PMCID: PMC2853144 DOI: 10.1093/nar/gkq021] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/17/2023] Open
Abstract
RNA pseudoknots are functional structure elements with key roles in viral and cellular processes. Prediction of a pseudoknotted minimum free energy structure is an NP-complete problem. Practical algorithms for RNA structure prediction including restricted classes of pseudoknots suffer from high runtime and poor accuracy for longer sequences. A heuristic approach is to search for promising pseudoknot candidates in a sequence and verify those. Afterwards, the detected pseudoknots can be further analysed using bioinformatics or laboratory techniques. We present a novel pseudoknot detection method called DotKnot that extracts stem regions from the secondary structure probability dot plot and assembles pseudoknot candidates in a constructive fashion. We evaluate pseudoknot free energies using novel parameters, which have recently become available. We show that the conventional probability dot plot makes a wide class of pseudoknots including those with bulged stems manageable in an explicit fashion. The energy parameters now become the limiting factor in pseudoknot prediction. DotKnot is an efficient method for long sequences, which finds pseudoknots with higher accuracy compared to other known prediction algorithms. DotKnot is accessible as a web server at http://dotknot.csse.uwa.edu.au.
Collapse
Affiliation(s)
- Jana Sperschneider
- School of Computer Science and Software Engineering, The University of Western Australia, Perth, WA 6009, Australia.
| | | |
Collapse
|
20
|
Schlick T. Biomolecular Structure and Modeling: Problem and Application Perspective. INTERDISCIPLINARY APPLIED MATHEMATICS 2010. [PMCID: PMC7124132 DOI: 10.1007/978-1-4419-6351-2_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
Abstract
The experimental progress described in the previous chapter has been accompanied by an increasing desire to relate the complex three-dimensional (3D) shapes of biomolecules to their biological functions and interactions with other molecular systems. Structural biology, computational biology, genomics, proteomics,
bioinformatics, chemoinformatics, and others are natural partner disciplines in such endeavors.
Collapse
Affiliation(s)
- Tamar Schlick
- Courant Institute of Mathematical Sciences and Department of Chemistry, New York University, 251 Mercer Street, New York, NY 10012 USA
| |
Collapse
|
21
|
Fan D, Bitterman PB, Larsson O. Regulatory element identification in subsets of transcripts: comparison and integration of current computational methods. RNA (NEW YORK, N.Y.) 2009; 15:1469-82. [PMID: 19553345 PMCID: PMC2714745 DOI: 10.1261/rna.1617009] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/25/2009] [Accepted: 05/20/2009] [Indexed: 05/20/2023]
Abstract
Regulatory elements in mRNA play an often pivotal role in post-transcriptional regulation of gene expression. However, a systematic approach to efficiently identify putative regulatory elements from sets of post-transcriptionally coregulated genes is lacking, hampering studies of coregulation mechanisms. Although there are several analytical methods that can be used to detect conserved mRNA regulatory elements in a set of transcripts, there has been no systematic study of how well any of these methods perform individually or as a group. We therefore compared how well three algorithms, each based on a different principle (enumeration, optimization, or structure/sequence profiles), can identify elements in unaligned untranslated sequence regions. Two algorithms were originally designed to detect transcription factor binding sites, Weeder and BioProspector; and one was designed to detect RNA elements conserved in structure, RNAProfile. Three types of elements were examined: (1) elements conserved in both primary sequence and secondary structure; (2) elements conserved only in primary sequence; and (3) microRNA targets. Our results indicate that all methods can uniquely identify certain known RNA elements, and therefore, integrating the output from all algorithms leads to the most complete identification of elements. We therefore developed an approach to integrate results and guide selection of candidate elements from several algorithms presented as a web service (https://dbw.msi.umn.edu:8443/recit). These findings together with the approach for integration can be used to identify candidate elements from genome-wide post-transcriptional profiling data sets.
Collapse
Affiliation(s)
- Danhua Fan
- Department of Medicine, University of Minnesota, Minneapolis, Minnesota 55455, USA
| | | | | |
Collapse
|
22
|
Hamada M, Sato K, Kiryu H, Mituyama T, Asai K. Predictions of RNA secondary structure by combining homologous sequence information. ACTA ACUST UNITED AC 2009; 25:i330-8. [PMID: 19478007 PMCID: PMC2687982 DOI: 10.1093/bioinformatics/btp228] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]
Abstract
Motivation: Secondary structure prediction of RNA sequences is an important problem. There have been progresses in this area, but the accuracy of prediction from an RNA sequence is still limited. In many cases, however, homologous RNA sequences are available with the target RNA sequence whose secondary structure is to be predicted. Results: In this article, we propose a new method for secondary structure predictions of individual RNA sequences by taking the information of their homologous sequences into account without assuming the common secondary structure of the entire sequences. The proposed method is based on posterior decoding techniques, which consider all the suboptimal secondary structures of the target and homologous sequences and all the suboptimal alignments between the target sequence and each of the homologous sequences. In our computational experiments, the proposed method provides better predictions than those performed only on the basis of the formation of individual RNA sequences and those performed by using methods for predicting the common secondary structure of the homologous sequences. Remarkably, we found that the common secondary predictions sometimes give worse predictions for the secondary structure of a target sequence than the predictions from the individual target sequence, while the proposed method always gives good predictions for the secondary structure of target sequences in all tested cases. Availability: Supporting information and software are available online at: http://www.ncrna.org/software/centroidfold/ismb2009/. Contact:hamada-michiaki@aist.go.jp Supplementary information:Supplementary data are available at Bioinformatics online.
Collapse
|
23
|
|
24
|
Informatic resources for identifying and annotating structural RNA motifs. Mol Biotechnol 2008; 41:180-93. [PMID: 18979204 DOI: 10.1007/s12033-008-9114-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2008] [Accepted: 10/01/2008] [Indexed: 10/21/2022]
Abstract
Post-transcriptional regulation of genes and transcripts is a vital aspect of cellular processes, and unlike transcriptional regulation, remains a largely unexplored domain. One of the most obvious and most important questions to explore is the discovery of functional RNA elements. Many RNA elements have been characterized to date ranging from cis-regulatory motifs within mRNAs to large families of non-coding RNAs. Like protein coding genes, the functional motifs of these RNA elements are highly conserved, but unlike protein coding genes, it is most often the structure and not the sequence that is conserved. Proper characterization of these structural RNA motifs is both the key and the limiting step to understanding the post-transcriptional aspects of the genomic world. Here, we focus on the task of structural motif discovery and provide a survey of the informatics resources geared towards this task.
Collapse
|
25
|
Seemann SE, Gorodkin J, Backofen R. Unifying evolutionary and thermodynamic information for RNA folding of multiple alignments. Nucleic Acids Res 2008; 36:6355-62. [PMID: 18836192 PMCID: PMC2582601 DOI: 10.1093/nar/gkn544] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
Computational methods for determining the secondary structure of RNA sequences from given alignments are currently either based on thermodynamic folding, compensatory base pair substitutions or both. However, there is currently no approach that combines both sources of information in a single optimization problem. Here, we present a model that formally integrates both the energy-based and evolution-based approaches to predict the folding of multiple aligned RNA sequences. We have implemented an extended version of Pfold that identifies base pairs that have high probabilities of being conserved and of being energetically favorable. The consensus structure is predicted using a maximum expected accuracy scoring scheme to smoothen the effect of incorrectly predicted base pairs. Parameter tuning revealed that the probability of base pairing has a higher impact on the RNA structure prediction than the corresponding probability of being single stranded. Furthermore, we found that structurally conserved RNA motifs are mostly supported by folding energies. Other problems (e.g. RNA-folding kinetics) may also benefit from employing the principles of the model we introduce. Our implementation, PETfold, was tested on a set of 46 well-curated Rfam families and its performance compared favorably to that of Pfold and RNAalifold.
Collapse
Affiliation(s)
- Stefan E Seemann
- Division of Genetics and Bioinformatics, IBHV and Center for Applied Bioinformatics, University of Copenhagen, Groennegårdsvej 3, DK-1870 Frederiksberg C, Denmark
| | | | | |
Collapse
|
26
|
Computational prediction of RNA structural motifs involved in posttranscriptional regulatory processes. Proc Natl Acad Sci U S A 2008; 105:14885-90. [PMID: 18815376 DOI: 10.1073/pnas.0803169105] [Citation(s) in RCA: 102] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open
Abstract
Messenger RNA molecules are tightly regulated, mostly through interactions with proteins and other RNAs, but the mechanisms that confer the specificity of such interactions are poorly understood. It is clear, however, that this specificity is determined by both the nucleotide sequence and secondary structure of the mRNA. Here, we develop RNApromo, an efficient computational tool for identifying structural elements within mRNAs that are involved in specifying posttranscriptional regulations. By analyzing experimental data on mRNA decay rates, we identify common structural elements in fast-decaying and slow-decaying mRNAs and link them with binding preferences of several RNA binding proteins. We also predict structural elements in sets of mRNAs with common subcellular localization in mouse neurons and fly embryos. Finally, by analyzing pre-microRNA stem-loops, we identify structural differences between pre-microRNAs of animals and plants, which provide insights into the mechanism of microRNA biogenesis. Together, our results reveal unexplored layers of posttranscriptional regulations in groups of RNAs and are therefore an important step toward a better understanding of the regulatory information conveyed within RNA molecules. Our new RNA motif discovery tool is available online.
Collapse
|
27
|
Asai K, Kiryu H, Hamada M, Tabei Y, Sato K, Matsui H, Sakakibara Y, Terai G, Mituyama T. Software.ncrna.org: web servers for analyses of RNA sequences. Nucleic Acids Res 2008; 36:W75-8. [PMID: 18440970 PMCID: PMC2447773 DOI: 10.1093/nar/gkn222] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022] Open
Abstract
We present web servers for analysis of non-coding RNA sequences on the basis of their secondary structures. Software tools for structural multiple sequence alignments, structural pairwise sequence alignments and structural motif findings are available from the integrated web server and the individual stand-alone web servers. The servers are located at http://software.ncrna.org, along with the information for the evaluation and downloading. This website is freely available to all users and there is no login requirement.
Collapse
Affiliation(s)
- Kiyoshi Asai
- Department of Computational Biology, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwa-no-ha, Chiba 277-8561, Japan
| | | | | | | | | | | | | | | | | |
Collapse
|
28
|
Tabei Y, Kiryu H, Kin T, Asai K. A fast structural multiple alignment method for long RNA sequences. BMC Bioinformatics 2008; 9:33. [PMID: 18215258 PMCID: PMC2375124 DOI: 10.1186/1471-2105-9-33] [Citation(s) in RCA: 132] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2007] [Accepted: 01/23/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Aligning multiple RNA sequences is essential for analyzing non-coding RNAs. Although many alignment methods for non-coding RNAs, including Sankoff's algorithm for strict structural alignments, have been proposed, they are either inaccurate or computationally too expensive. Faster methods with reasonable accuracies are required for genome-scale analyses. RESULTS We propose a fast algorithm for multiple structural alignments of RNA sequences that is an extension of our pairwise structural alignment method (implemented in SCARNA). The accuracies of the implemented software, MXSCARNA, are at least as favorable as those of state-of-art algorithms that are computationally much more expensive in time and memory. CONCLUSION The proposed method for structural alignment of multiple RNA sequences is fast enough for large-scale analyses with accuracies at least comparable to those of existing algorithms. The source code of MXSCARNA and its web server are available at http://mxscarna.ncrna.org.
Collapse
Affiliation(s)
- Yasuo Tabei
- Graduate School of Frontier Science, University of Tokyo, CB04 Kiban-tou 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-8561, Japan.
| | | | | | | |
Collapse
|
29
|
Kiryu H, Kin T, Asai K. Rfold: an exact algorithm for computing local base pairing probabilities. ACTA ACUST UNITED AC 2007; 24:367-73. [PMID: 18056736 DOI: 10.1093/bioinformatics/btm591] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]
Abstract
MOTIVATION Base pairing probability matrices have been frequently used for the analyses of structural RNA sequences. Recently, there has been a growing need for computing these probabilities for long DNA sequences by constraining the maximal span of base pairs to a limited value. However, none of the existing programs can exactly compute the base pairing probabilities associated with the energy model of secondary structures under such a constraint. RESULTS We present an algorithm that exactly computes the base pairing probabilities associated with the energy model under the constraint on the maximal span W of base pairs. The complexity of our algorithm is given by O(NW2) in time and O(N+W2) in memory, where N is the sequence length. We show that our algorithm has a higher sensitivity to the true base pairs as compared to that of RNAplfold. We also present an algorithm that predicts a mutually consistent set of local secondary structures by maximizing the expected accuracy function. The comparison of the local secondary structure predictions with those of RNALfold indicates that our algorithm is more accurate. Our algorithms are implemented in the software named 'Rfold.' AVAILABILITY The C++ source code of the Rfold software and the test dataset used in this study are available at http://www.ncrna.org/software/Rfold/.
Collapse
Affiliation(s)
- Hisanori Kiryu
- Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-42 Aomi, Koto-ku, Tokyo, Japan.
| | | | | |
Collapse
|
30
|
Horesh Y, Doniger T, Michaeli S, Unger R. RNAspa: a shortest path approach for comparative prediction of the secondary structure of ncRNA molecules. BMC Bioinformatics 2007; 8:366. [PMID: 17908318 PMCID: PMC2147038 DOI: 10.1186/1471-2105-8-366] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2007] [Accepted: 10/01/2007] [Indexed: 12/27/2022] Open
Abstract
Background In recent years, RNA molecules that are not translated into proteins (ncRNAs) have drawn a great deal of attention, as they were shown to be involved in many cellular functions. One of the most important computational problems regarding ncRNA is to predict the secondary structure of a molecule from its sequence. In particular, we attempted to predict the secondary structure for a set of unaligned ncRNA molecules that are taken from the same family, and thus presumably have a similar structure. Results We developed the RNAspa program, which comparatively predicts the secondary structure for a set of ncRNA molecules in linear time in the number of molecules. We observed that in a list of several hundred suboptimal minimal free energy (MFE) predictions, as provided by the RNAsubopt program of the Vienna package, it is likely that at least one suggested structure would be similar to the true, correct one. The suboptimal solutions of each molecule are represented as a layer of vertices in a graph. The shortest path in this graph is the basis for structural predictions for the molecule. We also show that RNA secondary structures can be compared very rapidly by a simple string Edit-Distance algorithm with a minimal loss of accuracy. We show that this approach allows us to more deeply explore the suboptimal structure space. Conclusion The algorithm was tested on three datasets which include several ncRNA families taken from the Rfam database. These datasets allowed for comparison of the algorithm with other methods. In these tests, RNAspa performed better than four other programs.
Collapse
Affiliation(s)
- Yair Horesh
- Department of Computer Sciences, Bar-Ilan University, Ramat-Gan 52900, Israel
| | - Tirza Doniger
- The Mina & Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan 52900, Israel
| | - Shulamit Michaeli
- The Mina & Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan 52900, Israel
| | - Ron Unger
- The Mina & Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan 52900, Israel
| |
Collapse
|
31
|
Jossinet F, Ludwig TE, Westhof E. RNA structure: bioinformatic analysis. Curr Opin Microbiol 2007; 10:279-85. [PMID: 17548241 DOI: 10.1016/j.mib.2007.05.010] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2007] [Accepted: 05/23/2007] [Indexed: 01/30/2023]
Abstract
The range of functions ascribed to RNA molecules has grown considerably during recent years. Consequently, the analysis and comparison of RNA sequences have become recurrent tasks in molecular biology. Because the biological function of an RNA is expressed more by its folded architecture than by its sequence, original computational tools adapted to the multifaceted RNA functions have to be developed. Such tools, recently published, enable a user to solve classical problems related to RNA research: constructing 'structural' multiple alignments, inferring complete structures and structural motifs from RNA alignments, or searching structural homology in genomic databases.
Collapse
Affiliation(s)
- Fabrice Jossinet
- Architecture et Réactivité de l'ARN, Université Louis Pasteur, Institut de Biologie Moléculaire et Cellulaire, CNRS, F-67084 Strasbourg, France
| | | | | |
Collapse
|
32
|
Kiryu H, Tabei Y, Kin T, Asai K. Murlet: a practical multiple alignment tool for structural RNA sequences. ACTA ACUST UNITED AC 2007; 23:1588-98. [PMID: 17459961 DOI: 10.1093/bioinformatics/btm146] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]
Abstract
MOTIVATION Structural RNA genes exhibit unique evolutionary patterns that are designed to conserve their secondary structures; these patterns should be taken into account while constructing accurate multiple alignments of RNA genes. The Sankoff algorithm is a natural alignment algorithm that includes the effect of base-pair covariation in the alignment model. However, the extremely high computational cost of the Sankoff algorithm precludes its application to most RNA sequences. RESULTS We propose an efficient algorithm for the multiple alignment of structural RNA sequences. Our algorithm is a variant of the Sankoff algorithm, and it uses an efficient scoring system that reduces the time and space requirements considerably without compromising on the alignment quality. First, our algorithm computes the match probability matrix that measures the alignability of each position pair between sequences as well as the base pairing probability matrix for each sequence. These probabilities are then combined to score the alignment using the Sankoff algorithm. By itself, our algorithm does not predict the consensus secondary structure of the alignment but uses external programs for the prediction. We demonstrate that both the alignment quality and the accuracy of the consensus secondary structure prediction from our alignment are the highest among the other programs examined. We also demonstrate that our algorithm can align relatively long RNA sequences such as the eukaryotic-type signal recognition particle RNA that is approximately 300 nt in length; multiple alignment of such sequences has not been possible by using other Sankoff-based algorithms. The algorithm is implemented in the software named 'Murlet'. AVAILABILITY The C++ source code of the Murlet software and the test dataset used in this study are available at http://www.ncrna.org/papers/Murlet/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Hisanori Kiryu
- Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, 2-42 Aomi, Koto-ku, Tokyo 135-0064, Japan.
| | | | | | | |
Collapse
|