51
|
Dai Q, Wang TM. Use of statistical measures for analyzing RNA secondary structures. J Comput Chem 2008; 29:1292-305. [PMID: 18172840 DOI: 10.1002/jcc.20891] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/07/2022]
Abstract
As RNA secondary structures accumulate, function prediction and evolutionary analysis increasingly require comparing them. Numerous efficient algorithms have been developed for this comparison, but challenges remain. In this article, a new statistical measure, which extends the notion of relative entropy and is based on the proposed stochastic model, is evaluated for RNA secondary structures. Results from several experiments on real datasets show the effectiveness of the proposed approach. Moreover, the time complexity of our method compares favorably with that of existing methods that solve similar problems.
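To make the relative-entropy idea concrete, here is a minimal sketch that computes the ordinary Kullback-Leibler divergence between the symbol frequencies of two structures. It is not the authors' extended measure or their stochastic model, neither of which is specified in the abstract. The dot-bracket encoding, the pseudocount, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter

ALPHABET = "()."  # assumed dot-bracket encoding of a secondary structure


def symbol_distribution(structure, pseudocount=1.0):
    """Smoothed symbol frequencies; the pseudocount avoids log(0)."""
    counts = Counter(structure)
    total = len(structure) + pseudocount * len(ALPHABET)
    return {s: (counts[s] + pseudocount) / total for s in ALPHABET}


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(p[s] * math.log2(p[s] / q[s]) for s in p)


def symmetric_divergence(a, b):
    """Symmetrized divergence between two dot-bracket strings."""
    p, q = symbol_distribution(a), symbol_distribution(b)
    return relative_entropy(p, q) + relative_entropy(q, p)
```

A zeroth-order model like this ignores the order of symbols, so it can only separate structures with different base-pairing content; a measure built on a stochastic model of the sequence would also capture context.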
Affiliation(s)
- Qi Dai
- Department of Applied Mathematics, Dalian University of Technology, Dalian 116024, People's Republic of China.
|
52
|
Liu N, Wang T. A method for rapid similarity analysis of RNA secondary structures. BMC Bioinformatics 2006; 7:493. [PMID: 17090331 PMCID: PMC1637118 DOI: 10.1186/1471-2105-7-493] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Received: 03/24/2006] [Accepted: 11/08/2006] [Indexed: 11/24/2022]
Abstract
Background: Owing to the rapid expansion of RNA structure databases in recent years, efficient methods for structure comparison are in demand for function prediction and evolutionary analysis. Usually, the similarity of RNA secondary structures is evaluated based on tree models and dynamic programming algorithms. We present here a new method for the similarity analysis of RNA secondary structures. Results: Three sets of real data have been used as input for the example applications. Set I includes the structures from 5S rRNAs. Set II includes the secondary structures from RNase P and RNase MRP. Set III includes the structures from 16S rRNAs. Reasonable phylogenetic trees are derived for these three sets of data by using our method. Moreover, our program runs faster than some existing ones. Conclusion: The well-known Lempel-Ziv algorithm can efficiently extract the information on repeated patterns encoded in RNA secondary structures, which makes our method an alternative for analyzing the similarity of RNA secondary structures. This method will also be useful to researchers who are interested in evolutionary analysis.
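The Lempel-Ziv idea mentioned in the conclusion can be sketched as follows. The code counts the components of a Lempel-Ziv (1976) exhaustive parsing and turns the counts into a normalized distance. The abstract gives neither the sequence encoding nor the distance formula, so the use of plain dot-bracket strings, the particular normalization, and the function names are assumptions for illustration, not the authors' program.

```python
def lz_complexity(s):
    """Number of components in the Lempel-Ziv (1976) parsing of s.

    Each component is the shortest substring starting at the current
    position that cannot be copied from the text seen so far.
    """
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the component while it still occurs earlier
        # (overlap with itself is allowed, as in LZ76).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def lz_distance(a, b):
    """Assumed normalized distance: extra components needed to describe
    one string after having seen the other, scaled by the larger complexity."""
    ca, cb = lz_complexity(a), lz_complexity(b)
    extra = max(lz_complexity(a + b) - ca, lz_complexity(b + a) - cb)
    return extra / max(ca, cb)
```

A matrix of such pairwise distances could then be passed to a standard tree-building method such as neighbor joining to obtain a phylogenetic tree.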
Affiliation(s)
- Na Liu
- Department of Applied Mathematics, Dalian University of Technology, Dalian 116024, China
- College of Advanced Science and Technology, Dalian University of Technology, Dalian 116024, China
- Tianming Wang
- College of Advanced Science and Technology, Dalian University of Technology, Dalian 116024, China
- Department of Mathematics, Hainan Normal University, Haikou 571158, China
|
53
|
Rødland EA. Pseudoknots in RNA secondary structures: representation, enumeration, and prevalence. J Comput Biol 2006; 13:1197-213. [PMID: 16901237 DOI: 10.1089/cmb.2006.13.1197] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.0] [Indexed: 11/12/2022]
Abstract
A number of non-coding RNAs are known to contain functionally important or conserved pseudoknots. However, pseudoknotted structures are more complex than orthodox ones, and most methods for analyzing secondary structures do not handle them. I present here a way to decompose and represent general secondary structures which extends the tree representation of the stem-loop structure, and use this to analyze the frequency of pseudoknots in known and in random secondary structures. This comparison shows that, though a number of pseudoknots exist, they are still relatively rare and mostly of the simpler kinds. In contrast, random secondary structures tend to be heavily knotted, and the number of available structures increases dramatically when allowing pseudoknots. Therefore, methods for structure prediction and non-coding RNA identification that allow pseudoknots are likely to be much less powerful than those that do not, unless they penalize pseudoknots appropriately.
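For readers unfamiliar with the term, a structure is pseudoknotted when two base pairs cross, that is, when pairs (i, j) and (k, l) satisfy i < k < j < l. The sketch below checks only this textbook condition on a list of base pairs. It does not implement the decomposition or enumeration described in the abstract, and the function name and input format are assumptions.

```python
def has_pseudoknot(pairs):
    """Return True if any two base pairs cross (i < k < j < l)."""
    # Normalize each pair so the smaller index comes first, then sort
    # so that every later pair starts at or after the current one.
    norm = sorted((min(i, j), max(i, j)) for i, j in pairs)
    for idx, (i, j) in enumerate(norm):
        for k, l in norm[idx + 1:]:
            if i < k < j < l:
                return True
    return False
```

Nested pairs such as (0, 5) and (1, 4) form an ordinary stem, while (0, 4) and (2, 6) cross and therefore cannot be drawn as a tree.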
Affiliation(s)
- Einar Andreas Rødland
- Institute of Medical Microbiology, Centre of Molecular Biology and Neuroscience, University of Oslo, Oslo, Norway.
|
54
|
Jansson J, Hieu NT, Sung WK. Local gapped subforest alignment and its application in finding RNA structural motifs. J Comput Biol 2006; 13:702-18. [PMID: 16706720 DOI: 10.1089/cmb.2006.13.702] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022]
Abstract
RNA molecules whose secondary structures contain similar substructures often have similar functions. Therefore, an important task in the study of RNA is to develop methods for discovering substructures in RNA secondary structures that occur frequently (also referred to as motifs). In this paper, we consider the problem of computing an optimal local alignment of two given labeled ordered forests F1 and F2. This problem asks for a substructure of F1 and a substructure of F2 that exhibit a high similarity. Since an RNA molecule's secondary structure can be represented as a labeled ordered forest, the problem we study has a direct application to finding potential motifs. We generalize the previously studied concept of a closed subforest to a gapped subforest and present the first algorithm for computing the optimal local gapped subforest alignment of F1 and F2. We also show that our technique can improve the time and space complexity of the previously most efficient algorithm for optimal local closed subforest alignment. Furthermore, we prove that a special case of our local gapped subforest alignment problem is equivalent to a problem known in the literature as the local sequence-structure alignment problem (lssa) and modify our main algorithm to obtain a much faster algorithm for lssa than the one previously proposed. An implementation of our algorithm is available at www.comp.nus.edu.sg/~bioinfo/LGSFAligner/. Its running time is significantly faster than the original lssa program.
Affiliation(s)
- Jesper Jansson
- School of Computing, National University of Singapore, 3 Science Drive 2, Singapore 117543
|
55
|
Leontis NB, Lescoute A, Westhof E. The building blocks and motifs of RNA architecture. Curr Opin Struct Biol 2006; 16:279-87. [PMID: 16713707 PMCID: PMC4857889 DOI: 10.1016/j.sbi.2006.05.009] [Citation(s) in RCA: 251] [Impact Index Per Article: 13.9] [Received: 01/23/2006] [Revised: 04/12/2006] [Accepted: 05/10/2006] [Indexed: 10/24/2022]
Abstract
RNA motifs can be defined broadly as recurrent structural elements containing multiple intramolecular RNA-RNA interactions, as observed in atomic-resolution RNA structures. They constitute the modular building blocks of RNA architecture, which is organized hierarchically. Recent work has focused on analyzing RNA backbone conformations to identify, define and search for new instances of recurrent motifs in X-ray structures. One current view asserts that recurrent RNA strand segments with characteristic backbone configurations qualify as independent motifs. Other considerations indicate that, to characterize modular motifs, one must take into account the larger structural context of such strand segments. This follows the biologically relevant motivation, which is to identify RNA structural characteristics that are subject to sequence constraints and that thus relate RNA architectures to sequences.
Affiliation(s)
- Neocles B Leontis
- Department of Chemistry and Center for Biomolecular Sciences, Bowling Green State University, Bowling Green, OH 43402, USA
|
56
|
Churkin A, Barash D. RNAmute: RNA secondary structure mutation analysis tool. BMC Bioinformatics 2006; 7:221. [PMID: 16638137 PMCID: PMC1489952 DOI: 10.1186/1471-2105-7-221] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.8] [Received: 01/29/2006] [Accepted: 04/25/2006] [Indexed: 12/04/2022]
Abstract
Background: RNAMute is an interactive Java application that calculates the secondary structure of all single point mutations, given an RNA sequence, and organizes them into categories according to their similarity with respect to the wild type predicted structure. The secondary structure predictions are performed using the Vienna RNA package. Several alternatives are used for the categorization of single point mutations: Vienna's RNAdistance based on dot-bracket representation, as well as tree edit distance and second eigenvalue of the Laplacian matrix based on Shapiro's coarse grain tree graph representation. Results: Selecting a category in each one of the processed tables lists all single point mutations belonging to that category. Selecting a mutation displays a graphical drawing of the single point mutation and the wild type, and includes basic information such as associated energies, representations and distances. RNAMute can be used successfully with very little previous experience and without choosing any parameter value alongside the initial RNA sequence. The package runs under the Linux operating system. Conclusion: RNAMute is a user-friendly tool that can be used to predict single point mutations leading to conformational rearrangements in the secondary structure of RNAs. In several cases of substantial interest, notably in virology, a point mutation may lead to a loss of important functionality such as the RNA virus replication and translation initiation because of a conformational rearrangement in the secondary structure.
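The first step of such an analysis, enumerating every single point mutation of a sequence, can be sketched in a few lines. This is an illustrative Python sketch, not RNAMute's Java code. The function name and the mutation label format (wild-type base, 1-based position, new base) are assumptions, and the folding and distance steps that rely on the Vienna RNA package are deliberately left out.

```python
def single_point_mutants(seq, alphabet="ACGU"):
    """Yield (label, mutant) for every single-base substitution of seq.

    A sequence of length n over a 4-letter alphabet has 3 * n mutants.
    """
    for pos, wild in enumerate(seq):
        for base in alphabet:
            if base != wild:
                label = f"{wild}{pos + 1}{base}"  # e.g. "G1A"
                yield label, seq[:pos] + base + seq[pos + 1:]
```

Each mutant would then be folded and compared with the predicted wild-type structure using one of the distances listed in the abstract.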
Affiliation(s)
- Alexander Churkin
- Department of Computer Science, Ben-Gurion University, 84105 Beer Sheva, Israel
- Danny Barash
- Department of Computer Science, Ben-Gurion University, 84105 Beer Sheva, Israel
- Genome Diversity Center, Institute of Evolution, University of Haifa, 31905 Haifa, Israel
|
57
|
Haynes T, Knisley D, Seier E, Zou Y. A quantitative analysis of secondary RNA structure using domination based parameters on trees. BMC Bioinformatics 2006; 7:108. [PMID: 16515683 PMCID: PMC1420337 DOI: 10.1186/1471-2105-7-108] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.2] [Received: 10/20/2005] [Accepted: 03/03/2006] [Indexed: 11/30/2022]
Abstract
Background: It has become increasingly apparent that a comprehensive database of RNA motifs is essential in order to achieve new goals in genomic and proteomic research. Secondary RNA structures have frequently been represented by various modeling methods as graph-theoretic trees. Using graph theory as a modeling tool allows the vast resources of graphical invariants to be utilized to numerically identify secondary RNA motifs. The domination number of a graph is a graphical invariant that is sensitive to even a slight change in the structure of a tree. The invariants selected in this study are variations of the domination number of a graph. These graphical invariants are partitioned into two classes, and we define two parameters based on each of these classes. These parameters are calculated for all small order trees and a statistical analysis of the resulting data is conducted to determine if the values of these parameters can be utilized to identify which trees of orders seven and eight are RNA-like in structure. Results: The statistical analysis shows that the domination based parameters correctly distinguish between the trees that represent native structures and those that are not likely candidates to represent RNA. Some of the trees previously identified as candidate structures are found to be "very" RNA-like, while others are not, thereby refining the space of structures likely to be found as representing secondary RNA structure. Conclusion: Search algorithms are available that mine nucleotide sequence databases. However, the number of motifs identified can be quite large, making a further search for similar motifs computationally difficult. Much of the work in the bioinformatics arena is toward the development of better algorithms to address the computational problem. This work, on the other hand, uses mathematical descriptors to more clearly characterize the RNA motifs and thereby reduce the corresponding search space. These preliminary findings demonstrate that graph-theoretic quantifiers utilized in fields such as computer network design hold significant promise as an added tool for genomics and proteomics.
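The domination number itself is easy to compute by exhaustive search for the small trees (orders seven and eight) considered here. The sketch below does this for the plain domination number only. The variations and the two derived parameters used in the study are not defined in the abstract and are not reproduced, and the function name and edge-list input are assumptions.

```python
from itertools import combinations


def domination_number(n, edges):
    """Size of a smallest vertex set S such that every vertex is in S
    or adjacent to a vertex in S. Brute force, fine for small graphs."""
    # closed[v] is the closed neighborhood of v: v plus its neighbors.
    closed = [{v} for v in range(n)]
    for u, v in edges:
        closed[u].add(v)
        closed[v].add(u)
    everything = set(range(n))
    for size in range(1, n + 1):
        for candidate in combinations(range(n), size):
            if set().union(*(closed[v] for v in candidate)) == everything:
                return size
    return 0  # only reached for the empty graph
```

For example, the path on seven vertices has domination number 3, while the star on seven vertices has domination number 1, which illustrates how the invariant reacts to the shape of a tree.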
Affiliation(s)
- Teresa Haynes
- Mathematics and Statistics Department, Box 70663, East Tennessee State University, Johnson City, TN, USA
- Debra Knisley
- Mathematics and Statistics Department, Box 70663, East Tennessee State University, Johnson City, TN, USA
- Edith Seier
- Mathematics and Statistics Department, Box 70663, East Tennessee State University, Johnson City, TN, USA
- Yue Zou
- Department of Biochemistry and Molecular Biology, Quillen College of Medicine, East Tennessee State University, Johnson City, TN, USA
|
58
|
Abstract
In this article, we propose a relative similarity measure to compare RNA secondary structures. We first transform an RNA secondary structure into a special sequence representation. Then, on the basis of symbolic sequence complexity, we obtain the relative distance of RNA secondary structures. The examination of similarities/dissimilarities of a set of RNA secondary structures at the 3'-terminus of different viruses illustrates the utility of the approach.
Affiliation(s)
- Chun Li
- Department of Mathematics, Bohai University, Jinzhou 121000, People's Republic of China.
|
59
|
Liao B, Wang T, Ding K. On a seven-dimensional representation of RNA secondary structures. Molecular Simulation 2005. [DOI: 10.1080/08927020500371332] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 10/25/2022]
|
60
|
|
61
|
Yao YH, Liao B, Wang TM. A 2D graphical representation of RNA secondary structures and the analysis of similarity/dissimilarity based on it. J Mol Struct THEOCHEM 2005. [DOI: 10.1016/j.theochem.2005.08.009] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Indexed: 11/25/2022]
|
62
|
Gevertz J, Gan HH, Schlick T. In vitro RNA random pools are not structurally diverse: a computational analysis. RNA 2005; 11:853-63. [PMID: 15923372 PMCID: PMC1370770 DOI: 10.1261/rna.7271405] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.2] [Indexed: 05/02/2023]
Abstract
In vitro selection of functional RNAs from large random sequence pools has led to the identification of many ligand-binding and catalytic RNAs. However, the structural diversity in random pools is not well understood. Such an understanding is a prerequisite for designing sequence pools to increase the probability of finding complex functional RNA by in vitro selection techniques. Toward this goal, we have generated by computer five random pools of RNA sequences of length up to 100 nt to mimic experiments and characterized the distribution of associated secondary structural motifs using sets of possible RNA tree structures derived from graph theory techniques. Our results show that such random pools heavily favor simple topological structures: for example, linear stem-loop and low-branching motifs are favored rather than complex structures with high-order junctions, as confirmed by known aptamers. Moreover, we quantify the rise of structural complexity with sequence length and report the dominant class of tree motifs (characterized by vertex number) for each pool. These analyses show not only that random pools do not lead to a uniform distribution of possible RNA secondary topologies; they also point to avenues for designing pools with specific simple and complex structures in equal abundance, with the goal of broadening the range of functional RNAs discovered by in vitro selection. Specifically, the optimal RNA sequence pool length to identify a structure with x stems is 20x.
Affiliation(s)
- Jana Gevertz
- Summer Undergraduate Research Program, New York University School of Medicine, New York, 10003, USA
|
63
|
Abstract
Scales in RNA, based on geometrical considerations, can be exploited for the analysis and prediction of RNA structures. By using spectral decomposition, geometric information that relates to a given RNA fold can be reduced to a single positive scalar number, the second eigenvalue of the Laplacian matrix corresponding to the tree-graph representation of the RNA secondary structure. Along with the free energy of the structure, being the most important scalar number in the prediction of RNA folding by energy minimization methods, the second eigenvalue of the Laplacian matrix can be used as an effective signature for locating a target folded structure given a set of RNA folds. Furthermore, the second eigenvector of the Laplacian matrix can be used to partition large RNA structures into smaller fragments. An illustrative example is given for the use of the second eigenvalue to predict mutations that may cause structural rearrangements, thereby disrupting stable motifs.
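The quantity at the center of this abstract, the second-smallest eigenvalue of the Laplacian of a tree graph, can be computed for small trees without any numerical library. The sketch below builds the Laplacian from an edge list and diagonalizes it with cyclic Jacobi rotations. The choice of Jacobi iteration, the function names, and the edge-list input are assumptions for illustration; the abstract does not say how the eigenvalues were obtained, and the mapping from an RNA fold to its tree graph is not shown.

```python
import math


def tree_laplacian(n, edges):
    """Graph Laplacian L = D - A for an undirected graph on n vertices."""
    lap = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1.0
        lap[v][v] += 1.0
        lap[u][v] -= 1.0
        lap[v][u] -= 1.0
    return lap


def symmetric_eigenvalues(matrix, tol=1e-20, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def second_eigenvalue(n, edges):
    """Second-smallest Laplacian eigenvalue; requires n >= 2."""
    return symmetric_eigenvalues(tree_laplacian(n, edges))[1]
```

For a connected graph the smallest Laplacian eigenvalue is zero, so the second one is the first that carries shape information: a linear four-vertex tree gives about 0.586, while a four-vertex star gives 1.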
Affiliation(s)
- Danny Barash
- Genome Diversity Center, Institute of Evolution, University of Haifa, Mount Carmel, Haifa, Israel.
|
64
|
Pasquali S, Gan HH, Schlick T. Modular RNA architecture revealed by computational analysis of existing pseudoknots and ribosomal RNAs. Nucleic Acids Res 2005; 33:1384-98. [PMID: 15745998 PMCID: PMC552955 DOI: 10.1093/nar/gki267] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Indexed: 11/21/2022]
Abstract
Modular architecture is a hallmark of RNA structures, implying structural, and possibly functional, similarity among existing RNAs. To systematically delineate the existence of smaller topologies within larger structures, we develop and apply an efficient RNA secondary structure comparison algorithm using a newly developed two-dimensional RNA graphical representation. Our survey of similarity among 14 pseudoknots and subtopologies within ribosomal RNAs (rRNAs) uncovers eight pairs of structurally related pseudoknots with non-random sequence matches and reveals modular units in rRNAs. Significantly, three structurally related pseudoknot pairs have functional similarities not previously known: one pair involves the 3′ end of brome mosaic virus genomic RNA (PKB134) and the alternative hammerhead ribozyme pseudoknot (PKB173), both of which are replicase templates for viral RNA replication; the second pair involves structural elements for translation initiation and ribosome recruitment found in the viral internal ribosome entry site (PKB223) and the V4 domain of 18S rRNA (PKB205); the third pair involves 18S rRNA (PKB205) and viral tRNA-like pseudoknot (PKB134), which probably recruits ribosomes via structural mimicry and base complementarity. Additionally, we quantify the modularity of 16S and 23S rRNAs by showing that RNA motifs can be constructed from at least 210 building blocks. Interestingly, we find that the 5S rRNA and two tree modules within 16S and 23S rRNAs have similar topologies and tertiary shapes. These modules can be applied to design novel RNA motifs via build-up-like procedures for constructing sequences and folds.
Affiliation(s)
- Hin Hark Gan
- Department of Chemistry, New York University, 251 Mercer Street, New York, NY 10021, USA
- Tamar Schlick
- Department of Chemistry, New York University, 251 Mercer Street, New York, NY 10021, USA
- Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10021, USA
- To whom correspondence should be addressed: Tel: +1 212 998 3116; Fax: +1 212 995 4152;
|
65
|
Abstract
In this paper, we propose a 6-D representation of RNA secondary structures. The use of the 6-D representation is illustrated by constructing structure invariants. To illustrate the use of these structure invariants, which are based on the entries in derived sequence matrices restricted to a selected width of a band along the main diagonal, we consider comparisons with the similarity/dissimilarity results based on the 6-D representation for a set of RNA secondary structures.
Affiliation(s)
- Bo Liao
- Science 100, Graduate School of the Chinese Academy of Sciences, Beijing 100039, China.
|
66
|
Yao YH, Nan XY, Wang TM. A class of 2D graphical representations of RNA secondary structures and the analysis of similarity based on them. J Comput Chem 2005; 26:1339-46. [PMID: 16021599 DOI: 10.1002/jcc.20271] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.7] [Indexed: 11/11/2022]
Abstract
Based on the concepts of cell and system of graphical representation, a class of 2D graphical representations of RNA secondary structures are given in terms of classifications of bases of nucleic acids. The representations can completely avoid loss of information associated with crossing and overlapping of the corresponding curve. As an application, we make quantitative comparisons for a set of RNA secondary structures at the 3'-terminus of different viruses based on the graphical representations. The examination of similarities/dissimilarities illustrates the utility of the approach.
Affiliation(s)
- Yu-Hua Yao
- Department of Applied Mathematics, Dalian University of Technology, Dalian 116024, People's Republic of China
|
67
|
Kim N, Shiffeldrim N, Gan HH, Schlick T. Candidates for novel RNA topologies. J Mol Biol 2004; 341:1129-44. [PMID: 15321711 DOI: 10.1016/j.jmb.2004.06.054] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.8] [Received: 04/20/2004] [Revised: 06/10/2004] [Accepted: 06/21/2004] [Indexed: 10/26/2022]
Abstract
Because the functional repertoire of RNA molecules, like proteins, is closely linked to the diversity of their shapes, uncovering RNA's structural repertoire is vital for identifying novel RNAs, especially in genomic sequences. To help expand the limited number of known RNA families, we use graphical representation and clustering analysis of RNA secondary structures to predict novel RNA topologies and their abundance as a function of size. Representing the essential topological properties of RNA secondary structures as graphs enables enumeration, generation, and prediction of novel RNA motifs. We apply a probabilistic graph-growing method to construct the RNA structure space encompassing the topologies of existing and hypothetical RNAs and cluster all RNA topologies into two groups using topological descriptors and a standard clustering algorithm. Significantly, we find that nearly all existing RNAs fall into one group, which we refer to as "RNA-like"; we consider the other group "non-RNA-like". Our method predicts many candidates for novel RNA secondary topologies, some of which are remarkably similar to existing structures; interestingly, the centroid of the RNA-like group is the tmRNA fold, a pseudoknot having both tRNA-like and mRNA-like functions. Additionally, our approach allows estimation of the relative abundance of pseudoknot and other (e.g. tree) motifs using the "edge-cut" property of RNA graphs. This analysis suggests that pseudoknots dominate the RNA structure universe, representing more than 90% when the sequence length exceeds 120 nt; the predicted trend for <100 nt agrees with data for existing RNAs. Together with our predictions for novel "RNA-like" topologies, our analysis can help direct the design of functional RNAs and identification of novel RNA folds in genomes through an efficient topology-directed search, which grows much more slowly in complexity with RNA size compared to the traditional sequence-based search.
Affiliation(s)
- Namhee Kim
- Department of Chemistry, New York University, 100 Washington Square East, Room 1001, New York, NY 10003, USA
|
68
|
Abstract
In this paper, we propose a 3-D graphical representation of RNA secondary structures. Based on this representation, we outline an approach that constructs a 3-component vector whose components are the normalized leading eigenvalues of the L/L matrices associated with RNA secondary structure. The examination of similarities/dissimilarities among the secondary structures at the 3'-terminus of different viruses illustrates the utility of the approach.
Affiliation(s)
- B Liao
- Department of Applied Mathematics, Dalian University of Technology, Dalian 116024, China.
|
69
|
Abstract
Methods for computationally predicting deleterious mutations have recently been investigated for proteins, mainly by probabilistic estimations in the context of genomic research for identifying single nucleotide polymorphisms that can potentially affect protein function. It has been demonstrated that in cases where a few homologs are available, ab initio predicted structures modeled by the Rosetta method can become useful for including structural information to improve the deleterious mutation prediction methods for proteins. In the field of RNAs where very few homologs are available at present, this analogy can serve as a precursor to investigate a deleterious mutation prediction approach that is based on RNA secondary structure. When attempting to develop models for the prediction of deleterious mutations in RNAs, useful structural information is available from folding algorithms that predict the secondary structure of RNAs, based on energy minimization. Detecting mutations with desired structural effects among all possible point mutations may then be valuable for the prediction of deleterious mutations that can be tested experimentally. Here, a method is introduced for the prediction of deleterious mutations in the secondary structure of RNAs. The mutation prediction method, based on subdivision of the initial structure into smaller substructures and construction of eigenvalue tables, is independent of the folding algorithms but relies on their success to predict the folding of small RNA structures. Application of this method to predict mutations that may cause structural rearrangements, thereby disrupting stable motifs, is given for prokaryotic transcription termination in the thiamin pyrophosphate and S-adenosyl-methionine induced riboswitches. Riboswitches are mRNA structures that have recently been found to regulate transcription termination or translation initiation in bacteria by conformation rearrangement in response to direct metabolite binding. Predicting deleterious mutations in riboswitches may make it possible to intervene systematically in bacterial genetic control.
Affiliation(s)
- Danny Barash
- Genome Diversity Center, Institute of Evolution, University of Haifa, Mount Carmel, Haifa 31905, Israel.
|
70
|
Höchsmann M, Voss B, Giegerich R. Pure multiple RNA secondary structure alignments: a progressive profile approach. IEEE/ACM Trans Comput Biol Bioinform 2004; 1:53-62. [PMID: 17048408 DOI: 10.1109/tcbb.2004.11] [Citation(s) in RCA: 83] [Impact Index Per Article: 4.2] [Indexed: 05/12/2023]
Abstract
In functional, noncoding RNA, structure is often essential to function. While the full 3D structure is very difficult to determine, the 2D structure of an RNA molecule gives good clues to its 3D structure, and for molecules of moderate length, it can be predicted with good reliability. Structure comparison is, in analogy to sequence comparison, the essential technique to infer related function. We provide a method for computing multiple alignments of RNA secondary structures under the tree alignment model, which is suitable to cluster RNA molecules purely on the structural level, i.e., sequence similarity is not required. We give a systematic generalization of the profile alignment method from strings to trees and forests. We introduce a tree profile representation of RNA secondary structure alignments which allows reasonable scoring in structure comparison. Besides the technical aspects, an RNA profile is a useful data structure to represent multiple structures of RNA sequences. Moreover, we propose a visualization of RNA consensus structures that is enriched by the full sequence information.
Affiliation(s)
- Matthias Höchsmann
- International Graduate School in Bioinformatics and Genome Research, University of Bielefeld, Bielefeld, Germany.
|
71
|
Gan HH, Pasquali S, Schlick T. Exploring the repertoire of RNA secondary motifs using graph theory; implications for RNA design. Nucleic Acids Res 2003; 31:2926-43. [PMID: 12771219 PMCID: PMC156709 DOI: 10.1093/nar/gkg365] [Citation(s) in RCA: 90] [Impact Index Per Article: 4.3] [Indexed: 11/14/2022]
Abstract
Understanding the structural repertoire of RNA is crucial for RNA genomics research. Yet current methods for finding novel RNAs are limited to small or known RNA families. To expand known RNA structural motifs, we develop a two-dimensional graphical representation approach for describing and estimating the size of RNA's secondary structural repertoire, including naturally occurring and other possible RNA motifs. We employ tree graphs to describe RNA tree motifs and more general (dual) graphs to describe both RNA tree and pseudoknot motifs. Our estimates of RNA's structural space are vastly smaller than the nucleotide sequence space, suggesting a new avenue for finding novel RNAs. Specifically our survey shows that known RNA trees and pseudoknots represent only a small subset of all possible motifs, implying that some of the 'missing' motifs may represent novel RNAs. To help pinpoint RNA-like motifs, we show that the motifs of existing functional RNAs are clustered in a narrow range of topological characteristics. We also illustrate the applications of our approach to the design of novel RNAs and automated comparison of RNA structures; we report several occurrences of RNA motifs within larger RNAs. Thus, our graph theory approach to RNA structures has implications for RNA genomics, structure analysis and design.
Affiliation(s)
- Hin Hark Gan
- Department of Chemistry, New York University, 251 Mercer Street, New York, 10012 NY, USA
|
72
|
|
73
|
A Fast Algorithm for Optimal Alignment between Similar Ordered Trees. ACTA ACUST UNITED AC 2001. [DOI: 10.1007/3-540-48194-x_22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/10/2023]
|
74
|
Billoud B, Guerrucci MA, Masselot M, Deutsch JS. Cirripede phylogeny using a novel approach: molecular morphometrics. Mol Biol Evol 2000; 17:1435-45. [PMID: 11018151 DOI: 10.1093/oxfordjournals.molbev.a026244] [Citation(s) in RCA: 64] [Impact Index Per Article: 2.7] [Indexed: 11/14/2022]
Abstract
We present a new method using nucleic acid secondary structure to assess phylogenetic relationships among species. In this method, which we term "molecular morphometrics," the measurable structural parameters of the molecules (geometrical features, bond energies, base composition, etc.) are used as specific characters to construct a phylogenetic tree. This method relies both on traditional morphological comparison and on molecular sequence comparison. Applied to the phylogenetic analysis of Cirripedia, molecular morphometrics supports the most recent morphological analyses arguing for the monophyly of Cirripedia sensu stricto (Thoracica + Rhizocephala + Acrothoracica). As a proof, a classical multiple alignment was also performed, either using or not using the structural information to realign the sequence segments considered in the molecular morphometrics analysis. These methods yielded the same tree topology as the direct use of structural characters as a phylogenetic signal. By taking into account the secondary structure of nucleic acids, the new method allows investigators to use the regions in which multiple alignments are barely reliable because of a large number of insertions and deletions. It thus appears to be complementary to classical primary sequence analysis in phylogenetic studies.
Affiliation(s)
- B Billoud
- Atelier de BioInformatique, Service Commun de Bio-Systématique, Université Pierre et Marie Curie, Paris, France.
|
75
|
Abstract
This paper introduces a novel class of tree comparison problems strongly motivated by an important and cost intensive step in drug discovery pipeline viz., mapping cell bound receptors to the ligands they bind to and vice versa. Tree comparison studies motivated by problems such as virus-host tree comparison, gene-species tree comparison and consensus tree problem have been reported. None of these studies are applicable in our context because in all these problems, there is a well-defined mapping of the nodes the trees are built on across the set of trees being compared. A new class of tree comparison problems arises in cases where finding the correspondence among the nodes of the trees being compared is itself the problem. The problem arises while trying to find the interclass correspondence between the members of a pair of coevolving classes, e.g., cell bound receptors and their ligands. Given the evolution of the two classes, the combinatorial problem is to find a mapping among the leaves of the two trees that optimizes a given cost function. In this work we formulate various combinatorial optimization problems motivated by the aforementioned biological problem for the first time. We present hardness results, give an efficient algorithm for a restriction of the problem and demonstrate its applicability.
Affiliation(s)
- V Bafna
- Informatics Research, Celera Genomics, Rockville, MD 20850, USA
|
76
|
|
77
|
Jiang T, Wang L, Zhang K. Alignment of trees - An alternative to tree edit. Combinatorial Pattern Matching 1994. [DOI: 10.1007/3-540-58094-8_7] [Citation(s) in RCA: 60] [Impact Index Per Article: 2.0] [Indexed: 12/14/2022]
|