1
Kolaitis A, Makris E, Karagiannis AA, Tsanakas P, Pavlatos C. Knotify_V2.0: Deciphering RNA Secondary Structures with H-Type Pseudoknots and Hairpin Loops. Genes (Basel) 2024; 15:670. [PMID: 38927606 PMCID: PMC11203014 DOI: 10.3390/genes15060670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2024] [Revised: 05/19/2024] [Accepted: 05/22/2024] [Indexed: 06/28/2024]
Abstract
Accurately predicting the pairing order of bases in RNA molecules is essential for anticipating RNA secondary structures. Consequently, this task holds significant importance in unveiling previously unknown biological processes. The urgent need to comprehend RNA structures has been accentuated by the unprecedented impact of the widespread COVID-19 pandemic. This paper presents a framework, Knotify_V2.0, which makes use of syntactic pattern recognition techniques in order to predict RNA structures, with a specific emphasis on tackling the demanding task of predicting H-type pseudoknots that encompass bulges and hairpins. By leveraging the expressive capabilities of a Context-Free Grammar (CFG), the suggested framework integrates the inherent benefits of CFG and makes use of minimum free energy and maximum base pairing criteria. This integration enables the effective management of this inherently ambiguous task. The main contribution of Knotify_V2.0 compared to earlier versions lies in its capacity to identify additional motifs like bulges and hairpins within the internal loops of the pseudoknot. Notably, the proposed methodology, Knotify_V2.0, demonstrates superior accuracy in predicting core stems compared to state-of-the-art frameworks. Knotify_V2.0 exhibited exceptional performance by accurately identifying both core base pairings that form the ground truth pseudoknot in 70% of the examined sequences. Furthermore, Knotify_V2.0 narrowed the performance gap with Knotty, which had outperformed the original Knotify, and even surpassed Knotty in the Recall and F1-score metrics. Knotify_V2.0 achieved a higher count of true positives (tp) and a significantly lower count of false negatives (fn) compared to Knotify, highlighting improvements in the Precision and Recall metrics, respectively. Consequently, Knotify_V2.0 achieved a higher F1-score than any other platform. The source code and comprehensive implementation details of Knotify_V2.0 are publicly available on GitHub.
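The tp, fn, Recall and F1-score figures mentioned in this abstract follow the usual base-pair scoring convention. The block below is a minimal sketch of that convention, not the Knotify_V2.0 evaluation code; the function name `pairing_metrics` and the representation of a structure as a list of index pairs are assumptions made for illustration.

```python
def pairing_metrics(predicted, reference):
    """Score predicted base pairs against a reference structure.

    Both arguments are iterables of (i, j) index pairs (hypothetical format).
    """
    # Normalise each pair so (i, j) and (j, i) count as the same base pair.
    pred = {tuple(sorted(p)) for p in predicted}
    ref = {tuple(sorted(p)) for p in reference}

    tp = len(pred & ref)   # pairs predicted and present in the reference
    fp = len(pred - ref)   # pairs predicted but absent from the reference
    fn = len(ref - pred)   # reference pairs the prediction missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    predicted = [(0, 9), (1, 8), (2, 7), (4, 12)]
    reference = [(0, 9), (1, 8), (3, 6), (4, 12)]
    print(pairing_metrics(predicted, reference))
```

Raising tp while lowering fn improves Recall directly, and F1 rises with it as long as Precision does not drop.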
Affiliation(s)
- Angelos Kolaitis
- School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Evangelos Makris
- School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Alexandros Anastasios Karagiannis
- School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Panayiotis Tsanakas
- School of Electrical and Computer Engineering, National Technical University of Athens, 9 Iroon Polytechniou St., 15780 Athens, Greece
- Christos Pavlatos
- Hellenic Air Force Academy, Dekelia Air Base, Acharnes, 13671 Athens, Greece
2
Marchand B, Will S, Berkemer SJ, Ponty Y, Bulteau L. Automated design of dynamic programming schemes for RNA folding with pseudoknots. Algorithms Mol Biol 2023; 18:18. [PMID: 38041153 PMCID: PMC10691146 DOI: 10.1186/s13015-023-00229-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2023] [Accepted: 06/10/2023] [Indexed: 12/03/2023]
Abstract
Although RNA secondary structure prediction is a textbook application of dynamic programming (DP) and a routine task in RNA structure analysis, it remains challenging whenever pseudoknots come into play. Since the prediction of pseudoknotted structures by minimizing (realistically modelled) energy is NP-hard, specialized algorithms have been proposed for restricted conformation classes that capture the most frequently observed configurations. To achieve good performance, these methods rely on specific and carefully hand-crafted DP schemes. In contrast, we generalize and fully automate the design of DP pseudoknot prediction algorithms. For this purpose, we formalize the problem of designing DP algorithms for an (infinite) class of conformations, modeled by (a finite number of) fatgraphs, and automatically build DP schemes minimizing their algorithmic complexity. We propose an algorithm for the problem, based on the tree-decomposition of a well-chosen representative structure, which we simplify and reinterpret as a DP scheme. The algorithm is fixed-parameter tractable for the treewidth tw of the fatgraph, and its output represents a [Formula: see text] algorithm (and even possibly [Formula: see text] in simple energy models) for predicting the MFE folding of an RNA of length n. We demonstrate, for the most common pseudoknot classes, that our automatically generated algorithms achieve the same complexities as reported in the literature for hand-crafted schemes. Our framework supports general energy models, partition function computations, recursive substructures and partial folding, and could pave the way for algebraic dynamic programming beyond the context-free case.
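For readers unfamiliar with the "textbook application of dynamic programming" that this abstract takes as its starting point, the sketch below shows the classical Nussinov-style base-pair maximization for pseudoknot-free structures. It is not the fatgraph or tree-decomposition scheme of the paper, and it maximizes pair count rather than minimizing energy; the function name, the allowed pair set and the `min_loop` parameter are illustrative assumptions.

```python
def nussinov_max_pairs(seq, min_loop=3):
    """Maximum number of nested (pseudoknot-free) base pairs in seq.

    min_loop is the minimum number of unpaired bases enclosed by a pair.
    """
    allowed = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] holds the best pair count for the subsequence seq[i..j].
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]                  # case 1: j stays unpaired
            for k in range(i, j - min_loop):     # case 2: j pairs with k
                if (seq[k], seq[j]) in allowed:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(nussinov_max_pairs("GGGAAAUCC"))
```

The three nested loops give cubic time on a quadratic table. Pseudoknots break the nesting assumption behind the recurrence, which is why the hand-crafted schemes the abstract mentions need tables with more indices.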
Affiliation(s)
- Bertrand Marchand
- LIX (UMR 7161), Ecole Polytechnique, Institut Polytechnique de Paris, Palaiseau, France
- LIGM, CNRS, University Gustave Eiffel, F77454, Marne-la-Vallée, France
- Sebastian Will
- LIX (UMR 7161), Ecole Polytechnique, Institut Polytechnique de Paris, Palaiseau, France
- Sarah J Berkemer
- LIX (UMR 7161), Ecole Polytechnique, Institut Polytechnique de Paris, Palaiseau, France
- Earth-Life Science Institute, Tokyo Institute of Technology 2-12-1-I7E-318, Ookayama, Tokyo, 152-8550, Japan
- Yann Ponty
- LIX (UMR 7161), Ecole Polytechnique, Institut Polytechnique de Paris, Palaiseau, France
- Laurent Bulteau
- LIGM, CNRS, University Gustave Eiffel, F77454, Marne-la-Vallée, France
3
Matarrese MAG, Loppini A, Nicoletti M, Filippi S, Chiodo L. Assessment of tools for RNA secondary structure prediction and extraction: a final-user perspective. J Biomol Struct Dyn 2022:1-20. [DOI: 10.1080/07391102.2022.2116110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
Affiliation(s)
- Margherita A. G. Matarrese
- Engineering Department, Campus Bio-Medico University of Rome, Rome, Italy
- Jane and John Justin Neurosciences Center, Cook Children’s Health Care System, TX, USA
- Department of Bioengineering, The University of Texas at Arlington, Arlington, TX, USA
- Alessandro Loppini
- Engineering Department, Campus Bio-Medico University of Rome, Rome, Italy
- Center for Life Nano & Neuroscience, Italian Institute of Technology, Rome, Italy
- Martina Nicoletti
- Engineering Department, Campus Bio-Medico University of Rome, Rome, Italy
- Center for Life Nano & Neuroscience, Italian Institute of Technology, Rome, Italy
- Simonetta Filippi
- Engineering Department, Campus Bio-Medico University of Rome, Rome, Italy
- Letizia Chiodo
- Engineering Department, Campus Bio-Medico University of Rome, Rome, Italy
4
Marchei D, Merelli E. RNA secondary structure factorization in prime tangles. BMC Bioinformatics 2022; 23:345. [PMID: 35982399 PMCID: PMC9386957 DOI: 10.1186/s12859-022-04879-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2022] [Accepted: 08/03/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Due to their key role in various biological processes, RNA secondary structures have always been the focus of in-depth analyses, with great efforts from mathematicians and biologists to find a suitable abstract representation for modelling their functional and structural properties. One contribution is due to Kauffman and Magarshak, who modelled RNA secondary structures as mathematical objects constructed in link theory: tangles of the Brauer Monoid. In this paper, we extend the tangle-based model with its minimal prime factorization, which is useful for analyzing patterns that characterize the RNA secondary structure. RESULTS: By leveraging the mapping between RNA and tangles, we prove that the prime factorizations of tangle-based models share some patterns with features of RNA folding. We analyze the E. coli tRNA and provide some visual examples of interesting patterns. CONCLUSIONS: We formulate an open question on the nature of the class of equivalent factorizations and discuss some research directions in this regard. We also propose some practical applications of the tangle-based method to RNA classification and folding prediction as a useful tool for learning algorithms, even though the full factorization is not known.
Affiliation(s)
- Daniele Marchei
- University of Camerino, Via Madonna delle Carceri 9, 62032, Camerino, Italy
- Emanuela Merelli
- University of Camerino, Via Madonna delle Carceri 9, 62032, Camerino, Italy
5
Naureen Z, Gilani SA, Benny BK, Sadia H, Hafeez FY, Khanum A. Metabolomic Profiling of Plant Growth-Promoting Rhizobacteria for Biological Control of Phytopathogens. Fungal Biol 2022. [DOI: 10.1007/978-3-031-04805-0_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
6
Quadrini M. Structural relation matching: an algorithm to identify structural patterns into RNAs and their interactions. J Integr Bioinform 2021; 18:111-126. [PMID: 34051708 PMCID: PMC9382659 DOI: 10.1515/jib-2020-0039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2020] [Accepted: 04/19/2021] [Indexed: 11/15/2022]
Abstract
RNA molecules play crucial roles in various biological processes. Their three-dimensional configurations determine their functions and, in turn, influence their interactions with other molecules. RNAs and their interaction structures, the so-called RNA-RNA interactions, can be abstracted in terms of secondary structures, i.e., a list of the nucleotide bases paired by hydrogen bonding within the nucleotide sequence. Each secondary structure, in turn, can be abstracted into cores and shadows. Both are determined by collapsing nucleotides and arcs properly. We formalize all of these abstractions as arc diagrams, whose arcs determine loops. A secondary structure, represented by an arc diagram, is pseudoknot-free if its arc diagram has no crossing arcs; otherwise, it is said to be pseudoknotted. In this study, we address the problem of identifying a given structural pattern in secondary structures, or in the associated cores or shadows, of both RNAs and RNA-RNA interactions characterized by arbitrary pseudoknots. These abstractions are mapped into a matrix, whose elements represent the relations among loops. We therefore tackle the problem by exploiting matrices and submatrices. The algorithms, implemented in Python, work in polynomial time. We test our approach on a set of 16S ribosomal RNAs with inhibitors of Thermus thermophilus, and we quantify the structural effect of the inhibitors.
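The pseudoknot-free criterion stated in this abstract (no two arcs of the arc diagram cross) can be checked directly. The following is a minimal sketch under the assumption that arcs are given as pairs of sequence positions and that each position belongs to at most one arc; it is not the paper's loop-relation matrix algorithm, and the function names are hypothetical.

```python
from itertools import combinations


def crossing_arcs(arcs):
    """Return every pair of arcs that cross in the arc diagram."""
    # Orient each arc as (left, right) and sort by left endpoint.
    norm = sorted(tuple(sorted(a)) for a in arcs)
    # After sorting, arc a starts no later than arc b, so the two arcs
    # cross exactly when b starts inside a and ends outside it.
    return [(a, b) for a, b in combinations(norm, 2)
            if a[0] < b[0] < a[1] < b[1]]


def is_pseudoknot_free(arcs):
    """True when no two arcs cross."""
    return not crossing_arcs(arcs)


if __name__ == "__main__":
    nested = [(0, 5), (1, 4)]
    h_type = [(0, 7), (1, 6), (4, 11), (5, 10)]
    print(is_pseudoknot_free(nested))
    print(crossing_arcs(h_type))
```

The pairwise check is quadratic in the number of arcs, which is enough for illustration.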
Affiliation(s)
- Michela Quadrini
- University of Camerino, School of Science and Technology, via Madonna delle Carceri, Camerino, Italy
7
Thanh VH, Korpela D, Orponen P. Cotranscriptional Kinetic Folding of RNA Secondary Structures Including Pseudoknots. J Comput Biol 2021; 28:892-908. [PMID: 33902324 DOI: 10.1089/cmb.2020.0606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Abstract
Computational prediction of ribonucleic acid (RNA) structures is an important problem in computational structural biology. Studies of RNA structure formation often assume that the process starts from a fully synthesized sequence. Experimental evidence, however, has shown that RNA folds concurrently with its elongation. We investigate RNA secondary structure formation, including pseudoknots, that takes into account the cotranscriptional effects. We propose a single-nucleotide resolution kinetic model of the folding process of RNA molecules, where the polymerase-driven elongation of an RNA strand by a new nucleotide is included as a primitive operation, together with a stochastic simulation method that implements this folding concurrently with the transcriptional synthesis. Numerical case studies show that our cotranscriptional RNA folding model can predict the formation of conformations that are favored in actual biological systems. Our new computational tool can thus provide quantitative predictions and offer useful insights into the kinetics of RNA folding.
Affiliation(s)
- Vo Hong Thanh
- Department of Computer Science, Aalto University, Espoo, Finland
- Certara UK Limited (Simcyp Division), Sheffield, United Kingdom
- Dani Korpela
- Department of Computer Science, Aalto University, Espoo, Finland
- Pekka Orponen
- Department of Computer Science, Aalto University, Espoo, Finland
8
Mak CH, Phan ENH. Diagrammatic approaches to RNA structures with trinucleotide repeats. Biophys J 2021; 120:2343-2354. [PMID: 33887227 PMCID: PMC8390803 DOI: 10.1016/j.bpj.2021.04.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2020] [Revised: 04/07/2021] [Accepted: 04/09/2021] [Indexed: 11/30/2022]
Abstract
Trinucleotide repeat expansion disorders are associated with the overexpansion of (CNG) repeats on the genome. Messenger RNA transcripts of sequences with greater than 60–100 (CNG) tandem units have been implicated in trinucleotide repeat expansion disorder pathogenesis. In this work, we develop a diagrammatic theory to study the structural diversity of these (CNG)n RNA sequences. Representing structural elements on the chain’s conformation by a set of graphs and employing elementary diagrammatic methods, we have formulated a renormalization procedure to re-sum these graphs and arrive at a closed-form expression for the ensemble partition function. With a simple approximation for the renormalization and applied to extended (CNG)n sequences, this theory can comprehensively capture an infinite set of conformations with any number and any combination of duplexes, hairpins, multiway junctions, and quadruplexes. To quantify the diversity of different (CNG)n ensembles, the analytical equations derived from the diagrammatic theory were solved numerically to derive equilibrium estimates for the secondary structural contents of the chains. The results suggest that the structural ensembles of (CNG)n repeat sequence with n ∼60 are surprisingly diverse, and the distribution is sensitive to the ability of the N nucleotide to make noncanonical pairs and whether the (CNG)n sequence can sustain stable quadruplexes. The results show how perturbations in the form of biases on the stabilities of the various structural motifs, duplexes, junctions, helices, and quadruplexes could affect the secondary structures of the chains and how these structures may switch when they are perturbed.
Affiliation(s)
- Chi H Mak
- Department of Chemistry, Center of Applied Mathematical Sciences and Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California
- Ethan N H Phan
- Department of Chemistry, University of Southern California, Los Angeles, California
9
Badelt S, Grun C, Sarma KV, Wolfe B, Shin SW, Winfree E. A domain-level DNA strand displacement reaction enumerator allowing arbitrary non-pseudoknotted secondary structures. J R Soc Interface 2020; 17:20190866. [PMID: 32486951 PMCID: PMC7328391 DOI: 10.1098/rsif.2019.0866] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 12/17/2019] [Accepted: 04/21/2020] [Indexed: 12/30/2022]
Abstract
Information technologies enable programmers and engineers to design and synthesize systems of startling complexity that nonetheless behave as intended. This mastery of complexity is made possible by a hierarchy of formal abstractions that span from high-level programming languages down to low-level implementation specifications, with rigorous connections between the levels. DNA nanotechnology presents us with a new molecular information technology whose potential has not yet been fully unlocked in this way. Developing an effective hierarchy of abstractions may be critical for increasing the complexity of programmable DNA systems. Here, we build on prior practice to provide a new formalization of 'domain-level' representations of DNA strand displacement systems that has a natural connection to nucleic acid biophysics while still being suitable for formal analysis. Enumeration of unimolecular and bimolecular reactions provides a semantics for programmable molecular interactions, with kinetics given by an approximate biophysical model. Reaction condensation provides a tractable simplification of the detailed reactions that respects overall kinetic properties. The applicability and accuracy of the model is evaluated across a wide range of engineered DNA strand displacement systems. Thus, our work can serve as an interface between lower-level DNA models that operate at the nucleotide sequence level, and high-level chemical reaction network models that operate at the level of interactions between abstract species.
Affiliation(s)
- Stefan Badelt
- California Institute of Technology, Pasadena, CA, USA
- Casey Grun
- Wyss Institute, Harvard University, Boston, MA, USA
- Brian Wolfe
- California Institute of Technology, Pasadena, CA, USA
- Erik Winfree
- California Institute of Technology, Pasadena, CA, USA
10
Abstract
The prediction of RNA structures involves several NP-hard problems, and predicting the folding structure of an RNA nucleotide sequence remains an unsolved challenge. We investigate algorithms for RNA folding structure prediction, including pseudoknots, based on extended structures and the basin hopping graph. This study presents a prediction algorithm based on extended structures and proposes an improved algorithm based on the barrier tree and the basin hopping graph, both of which are attractive approaches in RNA folding structure prediction. Experiments on the Rfam 14.1 and PseudoBase databases show that our two algorithms are more efficient and accurate than other existing algorithms.
Affiliation(s)
- Zhendong Liu
- School of Computer Science and Technology, Shandong Jianzhu University, Jinan 250101, P. R. China
- Department of Biostatistics, University of California, Los Angeles, Los Angeles 90095, USA
- Department of Statistics, Harvard University, Cambridge, MA 02138, USA
- Gang Li
- Department of Biostatistics, University of California, Los Angeles, Los Angeles 90095, USA
- Jun S. Liu
- Department of Statistics, Harvard University, Cambridge, MA 02138, USA
11
Li TJX, Burris CS, Reidys CM. The block spectrum of RNA pseudoknot structures. J Math Biol 2019; 79:791-822. [PMID: 31172257 DOI: 10.1007/s00285-019-01379-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/31/2018] [Revised: 04/29/2019] [Indexed: 01/08/2023]
Abstract
In this paper we analyze the length-spectrum of blocks in [Formula: see text]-structures. [Formula: see text]-structures are a class of RNA pseudoknot structures that play a key role in the context of polynomial time RNA folding. A [Formula: see text]-structure is constructed by nesting and concatenating specific building components having topological genus at most [Formula: see text]. A block is a substructure enclosed by crossing maximal arcs with respect to the partial order induced by nesting. We show that, in uniformly generated [Formula: see text]-structures, there is a significant gap in this length-spectrum, i.e., there asymptotically almost surely exists a unique longest block of length at least [Formula: see text] and that with high probability any other block has finite length. For fixed [Formula: see text], we prove that the length of the complement of the longest block converges to a discrete limit law, and that the distribution of short blocks of given length tends to a negative binomial distribution in the limit of long sequences. We refine this analysis to the length spectrum of blocks of specific pseudoknot types, such as H-type and kissing hairpins. Our results generalize the rainbow spectrum on secondary structures by the first and third authors and are being put into context with the structural prediction of long non-coding RNAs.
Affiliation(s)
- Thomas J X Li
- Biocomplexity Institute and Initiative, University of Virginia, Charlottesville, VA, USA
- Christian M Reidys
- Biocomplexity Institute and Initiative, University of Virginia, Charlottesville, VA, USA
- Department of Mathematics, University of Virginia, Charlottesville, VA, USA
12
Thiel BC, Beckmann IK, Kerpedjiev P, Hofacker IL. 3D based on 2D: Calculating helix angles and stacking patterns using forgi 2.0, an RNA Python library centered on secondary structure elements. F1000Res 2019; 8:ISCB Comm J-287. [PMID: 31069053 PMCID: PMC6480952 DOI: 10.12688/f1000research.18458.2] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Accepted: 04/04/2019] [Indexed: 01/01/2023]
Abstract
We present forgi, a Python library to analyze the tertiary structure of RNA secondary structure elements. Our representation of an RNA molecule is centered on secondary structure elements (stems, bulges and loops). By fitting a cylinder to the helix axis, these elements are carried over into a coarse-grained 3D structure representation. Integration with Biopython allows for handling of all-atom 3D information. forgi can deal with a variety of file formats including dotbracket strings, PDB and MMCIF files. We can handle modified residues, missing residues, cofold and multifold structures as well as nucleotide numbers starting at arbitrary positions. We apply this library to the study of stacking helices in junctions and pseudoknots and investigate how far stacking helices in solved experimental structures can divert from coaxial geometries.
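Dotbracket strings, one of the input formats named in this abstract, encode base pairs as matching brackets, with extra bracket types for pseudoknotted pairs. The sketch below parses such a string in plain Python to show the format. It deliberately does not use forgi, and the function name `parse_dotbracket` is a hypothetical helper, not part of that library's API.

```python
def parse_dotbracket(structure):
    """Return the sorted list of (i, j) base pairs encoded by a dotbracket string."""
    openers = {"(": ")", "[": "]", "{": "}", "<": ">"}
    closers = {close: open_ for open_, close in openers.items()}
    # One stack per bracket type, so pseudoknotted pairs written with
    # "[" and "]" can interleave with the "(" and ")" pairs.
    stacks = {open_: [] for open_ in openers}
    pairs = []
    for pos, ch in enumerate(structure):
        if ch in openers:
            stacks[ch].append(pos)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if any(stacks.values()):
        raise ValueError("unclosed opening bracket")
    return sorted(pairs)


if __name__ == "__main__":
    print(parse_dotbracket("(((...)))"))
    print(parse_dotbracket("((..[[..))..]]"))
```

The second call describes an H-type pseudoknot: the square-bracket stem opens inside the round-bracket stem and closes outside it.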
Affiliation(s)
- Bernhard C. Thiel
- Department of Theoretical Chemistry, Faculty of Chemistry, University of Vienna, Vienna, 1090, Austria
- Irene K. Beckmann
- Department of Theoretical Chemistry, Faculty of Chemistry, University of Vienna, Vienna, 1090, Austria
- Peter Kerpedjiev
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, 02115, USA
- Ivo L. Hofacker
- Department of Theoretical Chemistry, Faculty of Chemistry, University of Vienna, Vienna, 1090, Austria
- Research Group Bioinformatics and Computational Biology, Faculty of Computer Science, University of Vienna, Vienna, 1090, Austria
13
Thiel BC, Beckmann IK, Kerpedjiev P, Hofacker IL. 3D based on 2D: Calculating helix angles and stacking patterns using forgi 2.0, an RNA Python library centered on secondary structure elements. F1000Res 2019; 8:ISCB Comm J-287. [PMID: 31069053 PMCID: PMC6480952 DOI: 10.12688/f1000research.18458.1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Accepted: 03/06/2019] [Indexed: 10/12/2023]
Abstract
We present forgi, a Python library to analyze the tertiary structure of RNA secondary structure elements. Our representation of an RNA molecule is centered on secondary structure elements (stems, bulges and loops). By fitting a cylinder to the helix axis, these elements are carried over into a coarse-grained 3D structure representation. Integration with Biopython allows for handling of all-atom 3D information. forgi can deal with a variety of file formats including dotbracket strings, PDB and MMCIF files. We can handle modified residues, missing residues, cofold and multifold structures as well as nucleotide numbers starting at arbitrary positions. We apply this library to the study of stacking helices in junctions and pseudo knots and investigate how far stacking helices in solved experimental structures can divert from coaxial geometries.
Affiliation(s)
- Bernhard C. Thiel
- Department of Theoretical Chemistry, Faculty of Chemistry, University of Vienna, Vienna, 1090, Austria
- Irene K. Beckmann
- Department of Theoretical Chemistry, Faculty of Chemistry, University of Vienna, Vienna, 1090, Austria
- Peter Kerpedjiev
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, 02115, USA
- Ivo L. Hofacker
- Department of Theoretical Chemistry, Faculty of Chemistry, University of Vienna, Vienna, 1090, Austria
- Research Group Bioinformatics and Computational Biology, Faculty of Computer Science, University of Vienna, Vienna, 1090, Austria
14
Barrett C, He Q, Huang FW, Reidys CM. A Boltzmann Sampler for 1-Pairs with Double Filtration. J Comput Biol 2019; 26:173-192. [PMID: 30653353 DOI: 10.1089/cmb.2018.0095] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/13/2022]
Abstract
Recently, a framework considering RNA sequences and their RNA secondary structures as pairs led to some information-theoretic perspectives on how the semantics encoded in RNA sequences can be inferred. This pairing arises naturally from the energy model of RNA secondary structures. Fixing the sequence in the pairing produces the RNA energy landscape, whose partition function was discovered by McCaskill. Dually, fixing the structure induces the energy landscape of sequences. The latter has been considered originally for designing more efficient inverse folding algorithms and subsequently enhanced by facilitating the sampling of sequences. We present here a partition function of sequence/structure pairs, with endowed Hamming distance and base pair distance filtration. This partition function is an augmentation of the previously mentioned (dual) partition function. We develop an efficient dynamic programming routine to recursively compute the partition function with this double filtration. Our framework is capable of dealing with RNA secondary structures as well as 1-structures, where a 1-structure is an RNA pseudoknot structure consisting of "building blocks" of genus 0 or 1. In particular, 0-structures, consisting of only "building blocks" of genus 0, are exactly RNA secondary structures. The time complexity for calculating the partition function of 1-pairs, that is, sequence/structure pairs where the structures are 1-structures, is $O(h^3 b^3 n^6)$, where h, b, n denote the Hamming distance, base pair distance, and sequence length, respectively. The time complexity for the partition function of 0-pairs is $O(h^2 b^2 n^3)$.
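The two distances used for the double filtration in this abstract have simple standard definitions. The sketch below computes both under the common convention that base pair distance is the number of pairs present in exactly one of the two structures; the abstract itself does not spell out its definition, so that convention is an assumption here, as are the function names.

```python
def hamming_distance(seq_a, seq_b):
    """Number of positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(seq_a, seq_b))


def base_pair_distance(pairs_a, pairs_b):
    """Number of base pairs found in exactly one of the two structures."""
    # Normalise so (i, j) and (j, i) are treated as the same pair.
    a = {tuple(sorted(p)) for p in pairs_a}
    b = {tuple(sorted(p)) for p in pairs_b}
    return len(a ^ b)  # size of the symmetric difference


if __name__ == "__main__":
    print(hamming_distance("GGGAAACCC", "GGCAAAGCC"))
    print(base_pair_distance([(0, 8), (1, 7), (2, 6)], [(0, 8), (1, 7)]))
```

In the abstract's notation, h bounds the first quantity and b the second.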
Affiliation(s)
- Christopher Barrett
- Biocomplexity Initiative and Institute, University of Virginia, Charlottesville, Virginia
- Department of Computer Science, University of Virginia, Charlottesville, Virginia
- Qijun He
- Biocomplexity Initiative and Institute, University of Virginia, Charlottesville, Virginia
- Fenix W Huang
- Biocomplexity Initiative and Institute, University of Virginia, Charlottesville, Virginia
- Christian M Reidys
- Biocomplexity Initiative and Institute, University of Virginia, Charlottesville, Virginia
- Department of Mathematics, University of Virginia, Charlottesville, Virginia
15
Liu Z, Zhu D, Dai Q. Predicting Model and Algorithm in RNA Folding Structure Including Pseudoknots. Int J Pattern Recogn 2018. [DOI: 10.1142/s0218001418510059] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]
Abstract
The prediction of RNA structure with pseudoknots is a nondeterministic polynomial-time hard (NP-hard) problem. We investigate RNA pseudoknotted structures using minimum free energy models and computational methods. Our paper presents an efficient algorithm for predicting RNA structure with pseudoknots; the algorithm takes O([Formula: see text]) time and O([Formula: see text]) space, and experimental tests on Rfam10.1 and PseudoBase indicate that it is more effective and precise. Its prediction accuracy, time complexity, and space complexity outperform those of existing algorithms, such as the Maximum Weight Matching (MWM) algorithm, the PKNOTS algorithm, and the Inner Limiting Layer (ILM) algorithm, and it can predict arbitrary pseudoknots. There also exists a [Formula: see text] ([Formula: see text]) polynomial time approximation scheme for finding the maximum number of stackings, and we give the proof of this approximation scheme for RNA pseudoknotted structures. We have improved several types of pseudoknots considered in RNA folding structure, and we analyze the possible transitions between types of pseudoknots.
Affiliation(s)
- Zhendong Liu
- School of Computer Science and Technology, Shandong Jianzhu University, Jinan 250101, P. R. China
- Daming Zhu
- School of Computer Science and Technology, Shandong University, Jinan 250101, P. R. China
- Qionghai Dai
- Department of Automation, Tsinghua University, Beijing 100084, P. R. China
16
Barrett C, Huang FW, Reidys CM. Sequence-structure relations of biopolymers. Bioinformatics 2018; 33:382-389. [PMID: 28171628 DOI: 10.1093/bioinformatics/btw621] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 11/06/2015] [Revised: 05/16/2016] [Accepted: 09/26/2016] [Indexed: 12/12/2022]
Abstract
Motivation: DNA data is transcribed into single-stranded RNA, which folds into specific molecular structures. In this paper we pose the question to what extent sequence- and structure-information correlate. We view this correlation as structural semantics of sequence data that allows for a different interpretation than conventional sequence alignment. Structural semantics could enable us to identify more general embedded ‘patterns’ in DNA and RNA sequences. Results: We compute the partition function of sequences with respect to a fixed structure and connect this computation to the mutual information of a sequence–structure pair for RNA secondary structures. We present a Boltzmann sampler and obtain the a priori probability of specific sequence patterns. We present a detailed analysis for the three PDB-structures, 2JXV (hairpin), 2N3R (3-branch multi-loop) and 1EHZ (tRNA). We localize specific sequence patterns, contrast the energy spectrum of the Boltzmann sampled sequences versus those sequences that refold into the same structure and derive a criterion to identify native structures. We illustrate that there are multiple sequences in the partition function of a fixed structure, each having nearly the same mutual information, that are nevertheless poorly aligned. This indicates the possibility of the existence of relevant patterns embedded in the sequences that are not discoverable using alignments. Availability and Implementation: The source code is freely available at http://staff.vbi.vt.edu/fenixh/Sampler.zip. Contact: duckcr@vbi.vt.edu. Supplementary Information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Christopher Barrett
- Biocomplexity Institute of Virginia Tech, Virginia Tech University, Blacksburg, VA, USA
| | - Fenix W Huang
- Biocomplexity Institute of Virginia Tech, Virginia Tech University, Blacksburg, VA, USA
| | - Christian M Reidys
- Biocomplexity Institute of Virginia Tech, Virginia Tech University, Blacksburg, VA, USA
| |
|
17
|
Shabash B, Wiese KC. RNA Visualization: Relevance and the Current State-of-the-Art Focusing on Pseudoknots. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:696-712. [PMID: 26915129 DOI: 10.1109/tcbb.2016.2522421] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
RNA visualization is crucial in order to understand the relationship that exists between RNA structure and its function, as well as the development of better RNA structure prediction algorithms. However, in the context of RNA visualization, one key structure remains difficult to visualize: Pseudoknots. Pseudoknots occur in RNA folding when two secondary structural components form base-pairs between them. The three-dimensional nature of these components makes them challenging to visualize in two-dimensional media, such as print media or screens. In this review, we focus on the advancements that have been made in the field of RNA visualization in two-dimensional media in the past two decades. The review aims at presenting all relevant aspects of pseudoknot visualization. We start with an overview of several pseudoknotted structures and their relevance in RNA function. Next, we discuss the theoretical basis for RNA structural topology classification and present RNA classification systems for both pseudoknotted and non-pseudoknotted RNAs. Each description of RNA classification system is followed by a discussion of the software tools and algorithms developed to date to visualize RNA, comparing the different tools' strengths and shortcomings.
|
18
|
The Ecological Role of Volatile and Soluble Secondary Metabolites Produced by Soil Bacteria. Trends Microbiol 2017; 25:280-292. [DOI: 10.1016/j.tim.2016.12.002] [Citation(s) in RCA: 240] [Impact Index Per Article: 34.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2016] [Revised: 11/15/2016] [Accepted: 12/05/2016] [Indexed: 01/11/2023]
|
19
|
Topological Classification of RNA Structures via Intersection Graph. THEORY AND PRACTICE OF NATURAL COMPUTING 2017. [DOI: 10.1007/978-3-319-71069-3_16] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/02/2023]
|
20
|
Abstract
Topological data analysis has been recently used to extract meaningful information from biomolecules. Here we introduce the application of persistent homology, a topological data analysis tool, for computing persistent features (loops) of the RNA folding space. The scaffold of the RNA folding space is a complex graph from which the global features are extracted by completing the graph to a simplicial complex via the notion of clique and Vietoris-Rips complexes. The resulting simplicial complexes are characterised in terms of topological invariants, such as the number of holes in any dimension, i.e. Betti numbers. Our approach discovers persistent structural features, which are the set of smallest components to which the RNA folding space can be reduced. Thanks to this discovery, which in terms of data mining can be considered as a space dimension reduction, it is possible to extract a new insight that is crucial for understanding the mechanism of the RNA folding towards the optimal secondary structure. This structure is composed by the components discovered during the reduction step of the RNA folding space and is characterized by minimum free energy.
|
21
|
Huang FW, Reidys CM. Topological language for RNA. Math Biosci 2016; 282:109-120. [DOI: 10.1016/j.mbs.2016.10.006] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2016] [Revised: 10/17/2016] [Accepted: 10/17/2016] [Indexed: 12/26/2022]
|
22
|
Li TJX, Reidys CM. Statistics of topological RNA structures. J Math Biol 2016; 74:1793-1821. [DOI: 10.1007/s00285-016-1078-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2016] [Revised: 10/30/2016] [Indexed: 11/30/2022]
|
23
|
Vernizzi G, Orland H, Zee A. Classification and predictions of RNA pseudoknots based on topological invariants. Phys Rev E 2016; 94:042410. [PMID: 27841638 DOI: 10.1103/physreve.94.042410] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2016] [Indexed: 01/21/2023]
Abstract
We propose a new topological characterization of ribonucleic acid (RNA) secondary structures with pseudoknots based on two topological invariants. Starting from the classic arc representation of RNA secondary structures, we consider a model that couples both (i) the topological genus of the graph and (ii) the number of crossing arcs of the corresponding primitive graph. We add a term proportional to these topological invariants to the standard free energy of the RNA molecule, thus obtaining a novel free-energy parametrization that takes into account the abundance of topologies of RNA pseudoknots observed in RNA databases.
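As an illustration of the first invariant mentioned in this abstract, the sketch below computes the topological genus of a one-backbone arc diagram. It is not the authors' code. It relies on the standard counting argument for chord diagrams: with n arcs and r boundary cycles of the map "jump to the partner, then step one position along the closed backbone", the genus is g = (n + 1 − r)/2. The function name `genus` and the pair-list input format are assumptions made for this example, and the second invariant (crossing number of the primitive graph) is not covered.

```python
def genus(pairs):
    """Topological genus of a one-backbone arc diagram.

    `pairs` is a list of (i, j) base pairs (hypothetical input format).
    Unpaired positions are ignored because they do not change the genus.
    """
    endpoints = sorted(p for pair in pairs for p in pair)
    if len(set(endpoints)) != len(endpoints):
        raise ValueError("each position may take part in at most one arc")
    m = len(endpoints)  # m = 2n paired positions
    if m == 0:
        return 0

    # Relabel the paired positions 0..2n-1 in backbone order.
    index = {pos: k for k, pos in enumerate(endpoints)}
    partner = [0] * m
    for i, j in pairs:
        partner[index[i]] = index[j]
        partner[index[j]] = index[i]

    # Count cycles of k -> partner[k] + 1 (mod 2n); each cycle is one
    # boundary component of the diagram with the backbone closed to a circle.
    seen = [False] * m
    cycles = 0
    for start in range(m):
        if seen[start]:
            continue
        cycles += 1
        k = start
        while not seen[k]:
            seen[k] = True
            k = (partner[k] + 1) % m

    n = m // 2
    return (n + 1 - cycles) // 2
```

For example, two nested arcs `[(0, 3), (1, 2)]` form a planar diagram, while two crossing arcs `[(0, 2), (1, 3)]` form an H-type pseudoknot, the simplest diagram of non-zero genus.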
Affiliation(s)
| | - Henri Orland
- Institut de Physique Théorique, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France.,Beijing Computational Science Research Center, Haidian District Beijing, 100084, China.,Department of Physics, University of California, Santa Barbara, CA 93106, USA
| | - A Zee
- Institut de Physique Théorique, CEA Saclay, 91191 Gif-sur-Yvette Cedex, France.,Department of Physics, University of California, Santa Barbara, CA 93106, USA.,Kavli Institute for Theoretical Physics, University of California, Santa Barbara, CA 93106, USA
| |
|
24
|
Baulin E, Yacovlev V, Khachko D, Spirin S, Roytberg M. URS DataBase: universe of RNA structures and their motifs. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2016; 2016:baw085. [PMID: 27242032 PMCID: PMC4885603 DOI: 10.1093/database/baw085] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/08/2015] [Accepted: 05/02/2016] [Indexed: 12/17/2022]
Abstract
The Universe of RNA Structures DataBase (URSDB) stores information obtained from all RNA-containing PDB entries (2935 entries in October 2015). The content of the database is updated regularly. The database consists of 51 tables containing indexed data on various elements of the RNA structures. The database provides a web interface allowing user to select a subset of structures with desired features and to obtain various statistical data for a selected subset of structures or for all structures. In particular, one can easily obtain statistics on geometric parameters of base pairs, on structural motifs (stems, loops, etc.) or on different types of pseudoknots. The user can also view and get information on an individual structure or its selected parts, e.g. RNA–protein hydrogen bonds. URSDB employs a new original definition of loops in RNA structures. That definition fits both pseudoknot-free and pseudoknotted secondary structures and coincides with the classical definition in case of pseudoknot-free structures. To our knowledge, URSDB is the first database supporting searches based on topological classification of pseudoknots and on extended loop classification. Database URL: http://server3.lpm.org.ru/urs/
Affiliation(s)
- Eugene Baulin
- Laboratory of Applied Mathematics, Institute of Mathematical Problems of Biology, Russian Academy of Sciences, Pushchino, Moscow Region 142290, Russia Department of Algorithms and Technology of Programming, Faculty of Innovations and High Technology, Moscow Institute of Physics and Technology (State University), Dolgoprudny, Moscow Region 141700, Russia
| | - Victor Yacovlev
- Laboratory of Applied Mathematics, Institute of Mathematical Problems of Biology, Russian Academy of Sciences, Pushchino, Moscow Region 142290, Russia Department of Big Data and Information Retrieval, Faculty of Computer Science, National Research University Higher School of Economics, Moscow 101000, Russia
| | - Denis Khachko
- Laboratory of Applied Mathematics, Institute of Mathematical Problems of Biology, Russian Academy of Sciences, Pushchino, Moscow Region 142290, Russia
| | - Sergei Spirin
- Department of Mathematical Methods in Biology, Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow 119992, Russia
| | - Mikhail Roytberg
- Laboratory of Applied Mathematics, Institute of Mathematical Problems of Biology, Russian Academy of Sciences, Pushchino, Moscow Region 142290, Russia Department of Algorithms and Technology of Programming, Faculty of Innovations and High Technology, Moscow Institute of Physics and Technology (State University), Dolgoprudny, Moscow Region 141700, Russia Department of Big Data and Information Retrieval, Faculty of Computer Science, National Research University Higher School of Economics, Moscow 101000, Russia
| |
|
25
|
Alas SJ, González-Pérez PP. Simulating the folding of HP-sequences with a minimalist model in an inhomogeneous medium. Biosystems 2016; 142-143:52-67. [PMID: 27020756 DOI: 10.1016/j.biosystems.2016.03.010] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2015] [Revised: 03/16/2016] [Accepted: 03/24/2016] [Indexed: 11/24/2022]
Abstract
The phenomenon of protein folding is a fundamental issue in the field of the computational molecular biology. The protein folding inside the cells is performed in a highly inhomogeneous, tortuous, and correlated environment. Therefore, it is important to include in the theoretical studies the medium where the protein folding is developed. In this work we present the combination of three models to mimic the protein folding inside of an inhomogeneous medium. The models used here are Hydrophobic-Polar (HP) in 2D square arrangement, Evolutionary Algorithms (EA), and the Dual Site Bond Model (DSBM). The DSBM model is used to simulate the environment where the HP beads are folded; in this case the medium is correlated and is fractal-like. The analysis of five benchmark HP sequences shows that the inhomogeneous space provided with a given correlation length and fractal dimension plays an important role for correct folding of these sequences, which does not occur in a homogeneous space.
Affiliation(s)
- S J Alas
- Departamento de Ciencias Naturales, Universidad Autónoma Metropolitana Unidad Cuajimalpa, Av. Vasco de Quiroga 4871, Distrito Federal 05300, Mexico
| | - P P González-Pérez
- Departamento de Matemáticas Aplicadas y Sistemas, Universidad Autónoma Metropolitana Unidad Cuajimalpa, Av. Vasco de Quiroga 4871, Distrito Federal 05300, Mexico.
| |
|
26
|
Kucharík M, Hofacker IL, Stadler PF, Qin J. Pseudoknots in RNA folding landscapes. Bioinformatics 2016; 32:187-94. [PMID: 26428288 PMCID: PMC4708108 DOI: 10.1093/bioinformatics/btv572] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2015] [Revised: 09/10/2015] [Accepted: 09/27/2015] [Indexed: 02/04/2023] Open
Abstract
MOTIVATION: The function of an RNA molecule is not only linked to its native structure, which is usually taken to be the ground state of its folding landscape, but also in many cases crucially depends on the details of the folding pathways such as stable folding intermediates or the timing of the folding process itself. To model and understand these processes, it is necessary to go beyond ground state structures. The study of rugged RNA folding landscapes holds the key to answer these questions. Efficient coarse-graining methods are required to reduce the intractably vast energy landscapes into condensed representations such as barrier trees or basin hopping graphs (BHG) that convey an approximate but comprehensive picture of the folding kinetics. So far, exact and heuristic coarse-graining methods have been mostly restricted to the pseudoknot-free secondary structures. Pseudoknots, which are common motifs and have been repeatedly hypothesized to play an important role in guiding folding trajectories, were usually excluded. RESULTS: We generalize the BHG framework to include pseudoknotted RNA structures and systematically study the differences in predicted folding behavior depending on whether pseudoknotted structures are allowed to occur as folding intermediates or not. We observe that RNAs with pseudoknotted ground state structures tend to have more pseudoknotted folding intermediates than RNAs with pseudoknot-free ground state structures. The occurrence and influence of pseudoknotted intermediates on the folding pathway, however, appear to depend very strongly on the individual RNAs so that no general rule can be inferred. AVAILABILITY AND IMPLEMENTATION: The algorithms described here are implemented in C++ as standalone programs. Its source code and Supplemental material can be freely downloaded from http://www.tbi.univie.ac.at/bhg.html. CONTACT: qin@bioinf.uni-leipzig.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
| | - Ivo L Hofacker
- Institute for Theoretical Chemistry, Research Group BCB, Faculty of Computer Science, University of Vienna, Austria, RTH, University of Copenhagen, Frederiksberg, Denmark
| | - Peter F Stadler
- Institute for Theoretical Chemistry, RTH, University of Copenhagen, Frederiksberg, Denmark, Department of Computer Science & IZBI & iDiv & LIFE, Leipzig University, Max Planck Institute for Mathematics in the Sciences, Fraunhofer Institute IZI, Leipzig, Germany, Santa Fe Institute, Santa Fe, NM 87501, USA
| | - Jing Qin
- Institute for Theoretical Chemistry, RTH, University of Copenhagen, Frederiksberg, Denmark, IMADA, University of Southern Denmark, Campusvej 55, Odense, Denmark
| |
|
27
|
zu Siederdissen CH, Prohaska SJ, Stadler PF. Algebraic Dynamic Programming over general data structures. BMC Bioinformatics 2015; 16 Suppl 19:S2. [PMID: 26695390 PMCID: PMC4686793 DOI: 10.1186/1471-2105-16-s19-s2] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/22/2023] Open
Abstract
Background: Dynamic programming algorithms provide exact solutions to many problems in computational biology, such as sequence alignment, RNA folding, hidden Markov models (HMMs), and scoring of phylogenetic trees. Structurally analogous algorithms compute optimal solutions, evaluate score distributions, and perform stochastic sampling. This is explained in the theory of Algebraic Dynamic Programming (ADP) by a strict separation of state space traversal (usually represented by a context free grammar), scoring (encoded as an algebra), and choice rule. A key ingredient in this theory is the use of yield parsers that operate on the ordered input data structure, usually strings or ordered trees. The computation of ensemble properties, such as a posteriori probabilities of HMMs or partition functions in RNA folding, requires the combination of two distinct, but intimately related algorithms, known as the inside and the outside recursion. Only the inside recursions are covered by the classical ADP theory. Results: The ideas of ADP are generalized to a much wider scope of data structures by relaxing the concept of parsing. This allows us to formalize the conceptual complementarity of inside and outside variables in a natural way. We demonstrate that outside recursions are generically derivable from inside decomposition schemes. In addition to rephrasing the well-known algorithms for HMMs, pairwise sequence alignment, and RNA folding we show how the TSP and the shortest Hamiltonian path problem can be implemented efficiently in the extended ADP framework. As a showcase application we investigate the ancient evolution of HOX gene clusters in terms of shortest Hamiltonian paths. Conclusions: The generalized ADP framework presented here greatly facilitates the development and implementation of dynamic programming algorithms for a wide spectrum of applications.
|
28
|
Huang FWD, Reidys CM. Shapes of topological RNA structures. Math Biosci 2015; 270:57-65. [PMID: 26482318 DOI: 10.1016/j.mbs.2015.10.004] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2014] [Revised: 09/30/2015] [Accepted: 10/01/2015] [Indexed: 11/18/2022]
Abstract
A topological RNA structure is derived by fattening the edges of a contact structure into ribbons. The shape of a topological RNA structure is obtained by collapsing the stacks of the structure into single arcs and by removing any arcs of length one, as well as isolated vertices. A shape contains the key topological information of the molecular conformation and for fixed topological genus there exist only finitely many such shapes. In this paper we compute the generating polynomial of shapes of fixed topological genus g. We furthermore derive an algorithm having O(g log g) time complexity uniformly generating shapes of genus g and discuss some applications in the context of databases of RNA pseudoknot structures.
Affiliation(s)
- Fenix W D Huang
- Virginia Bioinformatics Institute, 1015 Life Sciences Circle, Blacksburg, VA, USA.
| | - Christian M Reidys
- Virginia Bioinformatics Institute, 1015 Life Sciences Circle, Blacksburg, VA, USA.
| |
|
29
|
Höner Zu Siederdissen C, Hofacker IL, Stadler PF. Product Grammars for Alignment and Folding. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2015; 12:507-519. [PMID: 26357262 DOI: 10.1109/tcbb.2014.2326155] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
We develop a theory of algebraic operations over linear and context-free grammars that makes it possible to combine simple "atomic" grammars operating on single sequences into complex, multi-dimensional grammars. We demonstrate the utility of this framework by constructing the search spaces of complex alignment problems on multiple input sequences explicitly as algebraic expressions of very simple one-dimensional grammars. In particular, we provide a fully worked frameshift-aware, semiglobal DNA-protein alignment algorithm whose grammar is composed of products of small, atomic grammars. The compiler accompanying our theory makes it easy to experiment with the combination of multiple grammars and different operations. Composite grammars can be written out in LaTeX for documentation and as a guide to implementation of dynamic programming algorithms. An embedding in Haskell as a domain-specific language makes the theory directly accessible to writing and using grammar products without the detour of an external compiler. Software and supplemental files available here: http://www.bioinf.uni-leipzig.de/Software/gramprod/.
|
30
|
Song Y, Hua L, Shapiro BA, Wang JTL. Effective alignment of RNA pseudoknot structures using partition function posterior log-odds scores. BMC Bioinformatics 2015; 16:39. [PMID: 25727492 PMCID: PMC4339682 DOI: 10.1186/s12859-015-0464-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2014] [Accepted: 01/13/2015] [Indexed: 11/18/2022] Open
Abstract
Background: RNA pseudoknots play important roles in many biological processes. Previous methods for comparative pseudoknot analysis mainly focus on simultaneous folding and alignment of RNA sequences. Little work has been done to align two known RNA secondary structures with pseudoknots taking into account both sequence and structure information of the two RNAs. Results: In this article we present a novel method for aligning two known RNA secondary structures with pseudoknots. We adopt the partition function methodology to calculate the posterior log-odds scores of the alignments between bases or base pairs of the two RNAs with a dynamic programming algorithm. The posterior log-odds scores are then used to calculate the expected accuracy of an alignment between the RNAs. The goal is to find an optimal alignment with the maximum expected accuracy. We present a heuristic to achieve this goal. The performance of our method is investigated and compared with existing tools for RNA structure alignment. An extension of the method to multiple alignment of pseudoknot structures is also discussed. Conclusions: The method described here has been implemented in a tool named RKalign, which is freely accessible on the Internet. As more and more pseudoknots are revealed, collected and stored in public databases, we anticipate a tool like RKalign will play a significant role in data comparison, annotation, analysis, and retrieval in these databases. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0464-9) contains supplementary material, which is available to authorized users.
|
31
|
Fu BMM, Han HSW, Reidys CM. On RNA-RNA interaction structures of fixed topological genus. Math Biosci 2015; 262:88-104. [PMID: 25640867 DOI: 10.1016/j.mbs.2014.12.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2014] [Revised: 12/03/2014] [Accepted: 12/17/2014] [Indexed: 11/29/2022]
Abstract
Interacting RNA complexes are studied via bicellular maps using a filtration via their topological genus. Our main result is a new bijection for RNA-RNA interaction structures and a linear time uniform sampling algorithm for RNA complexes of fixed topological genus. The bijection allows to either reduce the topological genus of a bicellular map directly, or to lose connectivity by decomposing the complex into a pair of single stranded RNA structures. Our main result is proved bijectively. It provides an explicit algorithm of how to rewire the corresponding complexes and an unambiguous decomposition grammar. Using the concept of genus induction, we construct bicellular maps of fixed topological genus g uniformly in linear time. We present various statistics on these topological RNA complexes and compare our findings with biological complexes. Furthermore we show how to construct loop-energy based complexes using our decomposition grammar.
Affiliation(s)
- Benjamin M M Fu
- Department of Mathematics and Computer Science, University of Southern Denmark, Campusvej 55, DK-5230 Odense M, Denmark.
| | - Hillary S W Han
- Department of Mathematics and Computer Science, University of Southern Denmark, Campusvej 55, DK-5230 Odense M, Denmark.
| | - Christian M Reidys
- Department of Mathematics and Computer Science, University of Southern Denmark, Campusvej 55, DK-5230 Odense M, Denmark.
| |
|
32
|
Harrison R, Li Y, Măndoiu I. Predicting RNA Secondary Structures: One-grammar-fits-all Solution. BIOINFORMATICS RESEARCH AND APPLICATIONS 2015. [PMCID: PMC7121278 DOI: 10.1007/978-3-319-19048-8_18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
RNA secondary structures are known to be important in many biological processes. Many available programs have been developed for RNA secondary structure prediction. Based on our knowledge, however, there still exist secondary structures of known RNA sequences which cannot be covered by these algorithms. In this paper, we provide an efficient algorithm that can handle all RNA secondary structures found in Rfam database. We designed a new stochastic context-free grammar named Rectangle Tree Grammar (RTG) which significantly expands the classes of structures that can be modelled. Our algorithm runs in O(n^6) time and the accuracy is reasonably high, with average PPV and sensitivity over 75%. In addition, the structures that RTG predicts are very similar to the real ones.
Affiliation(s)
| | | | - Ion Măndoiu
- University of Connecticut, Storrs, Connecticut USA
| |
|
33
|
Hydrogen bond rotations as a uniform structural tool for analyzing protein architecture. Nat Commun 2014; 5:5803. [DOI: 10.1038/ncomms6803] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2014] [Accepted: 11/07/2014] [Indexed: 11/09/2022] Open
|
34
|
Qin J, Fricke M, Marz M, Stadler PF, Backofen R. Graph-distance distribution of the Boltzmann ensemble of RNA secondary structures. Algorithms Mol Biol 2014; 9:19. [PMID: 25285153 PMCID: PMC4181469 DOI: 10.1186/1748-7188-9-19] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2013] [Accepted: 06/30/2014] [Indexed: 12/02/2022] Open
Abstract
Background: Large RNA molecules are often composed of multiple functional domains whose spatial arrangement strongly influences their function. Pre-mRNA splicing, for instance, relies on the spatial proximity of the splice junctions that can be separated by very long introns. Similar effects appear in the processing of RNA virus genomes. Albeit a crude measure, the distribution of spatial distances in thermodynamic equilibrium harbors useful information on the shape of the molecule that in turn can give insights into the interplay of its functional domains. Result: Spatial distance can be approximated by the graph-distance in RNA secondary structure. We show here that the equilibrium distribution of graph-distances between a fixed pair of nucleotides can be computed in polynomial time by means of dynamic programming. While a naïve implementation would yield recursions with a very high time complexity of O(n^6 D^5) for sequence length n and D distinct distance values, it is possible to reduce this to O(n^4) for practical applications in which predominantly small distances are of interest. Further reductions, however, seem to be difficult. Therefore, we introduced sampling approaches that are much easier to implement. They are also theoretically favorable for several real-life applications, in particular since these primarily concern long-range interactions in very large RNA molecules. Conclusions: The graph-distance distribution can be computed using a dynamic programming approach. Although a crude approximation of reality, our initial results indicate that the graph-distance can be related to the smFRET data. The additional file and the software of our paper are available from http://www.rna.uni-jena.de/RNAgraphdist.html.
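The graph-distance used in this abstract can be illustrated for a single, fixed secondary structure. The following minimal sketch is not the RNAgraphdist software and does not compute the equilibrium distribution over the Boltzmann ensemble. It only builds the graph whose edges are backbone bonds and base pairs, then runs a breadth-first search between two nucleotides. The dot-bracket input and the function names are assumptions made for this example.

```python
from collections import deque


def parse_dot_bracket(structure):
    """Return the base pairs (i, j), i < j, of a pseudoknot-free dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '('")
    return pairs


def graph_distance(structure, source, target):
    """Shortest path length between two nucleotides in one fixed structure."""
    n = len(structure)
    neighbours = [set() for _ in range(n)]
    for i in range(n - 1):              # backbone edges
        neighbours[i].add(i + 1)
        neighbours[i + 1].add(i)
    for i, j in parse_dot_bracket(structure):   # base-pair edges
        neighbours[i].add(j)
        neighbours[j].add(i)

    # Plain breadth-first search; every edge has length 1.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return dist[u]
        for v in neighbours[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    raise ValueError("target is not a valid position")
```

In the hairpin `"((...))"` the two ends are paired, so their graph-distance is 1 although they are six backbone steps apart. This is the sense in which base pairing shortens distances between otherwise remote positions.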
|
35
|
Abstract
Shapes of interacting RNA complexes are studied using a filtration via their topological genus. A shape of an RNA complex is obtained by (iteratively) collapsing stacks and eliminating hairpin loops. This shape projection preserves the topological core of the RNA complex, and for fixed topological genus there are only finitely many such shapes. Our main result is a new bijection that relates the shapes of RNA complexes with shapes of RNA structures. This allows for computing the shape polynomial of RNA complexes via the shape polynomial of RNA structures. We furthermore present a linear time uniform sampling algorithm for shapes of RNA complexes of fixed topological genus.
Affiliation(s)
- Benjamin M M Fu
- Department of Mathematics and Computer Science, University of Southern Denmark , Odense M, Denmark
| | | |
|
36
|
Abstract
In this article we study canonical γ-structures, a class of RNA pseudoknot structures that plays a key role in the context of polynomial time folding of RNA pseudoknot structures. A γ-structure is composed of specific building blocks that have topological genus less than or equal to γ, where composition means concatenation and nesting of such blocks. Our main result is the derivation of the generating function of γ-structures via symbolic enumeration using so called irreducible shadows. We furthermore recursively compute the generating polynomials of irreducible shadows of genus ≤ γ. The γ-structures are constructed via γ-matchings. For 1 ≤ γ ≤ 10, we compute Puiseux expansions at the unique, dominant singularities, allowing us to derive simple asymptotic formulas for the number of γ-structures.
Affiliation(s)
- Hillary S W Han
- Department of Mathematics and Computer Science, University of Southern Denmark , Odense, Denmark
| | | | | |
|
37
|
Combinatorial Insights into RNA Secondary Structure. DISCRETE AND TOPOLOGICAL MODELS IN MOLECULAR BIOLOGY 2014. [DOI: 10.1007/978-3-642-40193-0_7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
|
38
|
Koehl P. Mathematics's role in the grand challenge of deciphering the molecular basis of life. Front Mol Biosci 2014; 1:2. [PMID: 25988143 PMCID: PMC4428350 DOI: 10.3389/fmolb.2014.00002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2014] [Accepted: 03/19/2014] [Indexed: 11/13/2022] Open
Affiliation(s)
- Patrice Koehl
- Department of Computer Science and Genome Center, University of California at Davis Davis, CA, USA
| |
|
39
|
Abstract
Many methods have been proposed for RNA secondary structure comparison, and new ones are still being developed. In this chapter, we first consider structure representations and discuss their suitability for structure comparison. Then, we take a look at the more commonly used methods, restricting ourselves to structures without pseudo-knots. For comparing structures of the same sequence, we study base pair distances. For structures of different sequences (and of different length), we study variants of the tree edit model. We name some of the available tools and give pointers to the literature. We end with a short review on comparing structures with pseudo-knots as an unsolved problem and topic of active research.
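For structures of the same sequence, the base pair distance mentioned in this abstract is commonly defined as the number of base pairs present in exactly one of the two structures. The sketch below assumes that definition and pseudoknot-free dot-bracket input. The function names are chosen for this example and are not taken from the chapter or from any particular tool.

```python
def base_pairs(structure):
    """Set of base pairs (i, j), i < j, of a pseudoknot-free dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.add((stack.pop(), i))
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '('")
    return pairs


def base_pair_distance(a, b):
    """Size of the symmetric difference of the two base-pair sets."""
    if len(a) != len(b):
        raise ValueError("structures must belong to the same sequence length")
    return len(base_pairs(a) ^ base_pairs(b))
```

Two structures that differ only by one opened or closed pair, such as `"((..))"` and `"(....)"`, are at distance 1. A helix shifted by one position, such as `"(...)."` against `".(...)"`, is at distance 2 because no pair is shared.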
|
40
|
Generation of RNA pseudoknot structures with topological genus filtration. Math Biosci 2013; 245:216-25. [DOI: 10.1016/j.mbs.2013.07.014] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/29/2013] [Revised: 06/11/2013] [Accepted: 07/12/2013] [Indexed: 11/22/2022]
|
41
|
Abstract
Recently a folding algorithm of topological RNA pseudoknot structures was presented in Reidys et al. (2011). This algorithm folds single-stranded γ-structures, that is, RNA structures composed by distinct motifs of bounded topological genus. In this article, we set the theoretical foundations for the folding of the two backbone analogues of γ structures: the RNA γ-interaction structures. These are RNA-RNA interaction structures that are constructed by a finite number of building blocks over two backbones having genus at most γ. Combinatorial properties of γ-interaction structures are of practical interest since they have direct implications for the folding of topological interaction structures. We compute the generating function of γ-interaction structures and show that it is algebraic, which implies that the numbers of interaction structures can be computed recursively. We obtain simple asymptotic formulas for 0- and 1-interaction structures. The simplest class of interaction structures are the 0-interaction structures, which represent the two backbone analogues of secondary structures.
Affiliation(s)
- Jing Qin
- Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
|
42
|
Abstract
In the present article, we review a derivation of the numbers of RNA complexes of an arbitrary topology. These numbers are encoded in the free energy of the Hermitian matrix model with potential $V(x) = x^2/2 - stx/(1 - tx)$, where s and t are respective generating parameters for the number of RNA molecules and hydrogen bonds in a given complex. The free energies of this matrix model are computed using the so-called topological recursion, which is a powerful new formalism arising from random matrix theory. These numbers of RNA complexes also have profound meaning in mathematics: they provide the number of chord diagrams of fixed genus with specified numbers of backbones and chords as well as the number of cells in Riemann's moduli spaces for bordered surfaces of fixed topological type.
|
43
|
Nebel ME, Weinberg F. Algebraic and combinatorial properties of common RNA pseudoknot classes with applications. J Comput Biol 2013; 19:1134-50. [PMID: 23057823 DOI: 10.1089/cmb.2011.0094] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Indexed: 01/07/2023]
Abstract
Predicting RNA structures with pseudoknots in general is an NP-complete problem. Accordingly, several authors have suggested subclasses that provide polynomial time prediction algorithms by allowing (respectively, disallowing) certain structural motifs. In this article, we introduce a unifying algebraic view on most of these classes. That way it becomes possible to find linear time recognition algorithms that decide whether or not a given structure is a member of a class (we offer these algorithms as a web service to the scientific community). Furthermore, by presenting a general translation scheme of our algebraic descriptions into multiple context-free grammars, and proving a new correspondence of multiple context-free grammars and generating functions, it becomes possible to derive the precise asymptotic size of all the classes, solving some open problems such as enumerating the Rivas & Eddy class of pseudoknots.
Affiliation(s)
- Markus E Nebel
- Computer Science Department, University of Kaiserslautern, Gottlieb Daimler Str. 48, Kaiserslautern 67663, Germany.
|
44
|
Darling A, Stoye J. Distribution of Graph-Distances in Boltzmann Ensembles of RNA Secondary Structures. LECTURE NOTES IN COMPUTER SCIENCE 2013. [PMCID: PMC7114971 DOI: 10.1007/978-3-642-40453-5_10] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/30/2022]
Abstract
Large RNA molecules often carry multiple functional domains whose spatial arrangement is an important determinant of their function. Pre-mRNA splicing, furthermore, relies on the spatial proximity of the splice junctions that can be separated by very long introns. Similar effects appear in the processing of RNA virus genomes. Albeit a crude measure, the distribution of spatial distances in thermodynamic equilibrium therefore provides useful information on the overall shape of the molecule and can provide insights into the interplay of its functional domains. Spatial distance can be approximated by the graph-distance in RNA secondary structure. We show here that the equilibrium distribution of graph-distances between arbitrary nucleotides can be computed in polynomial time by means of dynamic programming. A naive implementation would yield recursions with a very high time complexity of $O(n^{11})$. Although we were able to reduce this to $O(n^6)$ for many practical applications, a further reduction seems difficult. We conclude, therefore, that sampling approaches, which are much easier to implement, are also theoretically favorable for most real-life applications, in particular since these primarily concern long-range interactions in very large RNA molecules.
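The graph-distance used in this abstract can be illustrated for a single, fixed secondary structure with a breadth-first search. The sketch below is only an illustration: it assumes unit weight for both backbone and base-pair edges, the name `graph_distance` is hypothetical, and it does not compute the Boltzmann-ensemble distribution that the paper addresses.

```python
from collections import deque


def graph_distance(pairs, length, source, target):
    """Shortest-path length between two nucleotides (0-based) in the graph
    whose edges are backbone bonds (i, i+1) and the given base pairs.
    Every edge has weight 1, which is an assumption of this sketch."""
    adjacency = {i: set() for i in range(length)}
    for i in range(length - 1):          # backbone edges
        adjacency[i].add(i + 1)
        adjacency[i + 1].add(i)
    for i, j in pairs:                   # base-pair edges
        adjacency[i].add(j)
        adjacency[j].add(i)

    distance = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return distance[u]
        for v in adjacency[u]:
            if v not in distance:
                distance[v] = distance[u] + 1
                queue.append(v)
    raise ValueError("target is not reachable from source")


if __name__ == "__main__":
    # A single base pair closing a 10-nucleotide chain shortens the 1-to-8 path.
    print(graph_distance([(0, 9)], 10, 1, 8))
```

A sampling approach of the kind the abstract favors could, in principle, apply such a routine to each sampled structure and collect the resulting distances into a histogram.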
Affiliation(s)
- Aaron Darling
- ithree institute, University of Technology Sydney, 2007 Ultimo, NSW Australia
- Jens Stoye
- Faculty of Technology, Bielefeld University, Universitätsstraße 25, 33615 Bielefeld, Germany
|
45
|
Bon M, Micheletti C, Orland H. McGenus: a Monte Carlo algorithm to predict RNA secondary structures with pseudoknots. Nucleic Acids Res 2012; 41:1895-900. [PMID: 23248008 PMCID: PMC3561945 DOI: 10.1093/nar/gks1204] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Indexed: 01/14/2023]
Abstract
We present McGenus, an algorithm to predict RNA secondary structures with pseudoknots. The method is based on a classification of RNA structures according to their topological genus. McGenus can treat sequences of up to 1000 bases and performs an advanced stochastic search of their minimum free energy structure allowing for non-trivial pseudoknot topologies. Specifically, McGenus uses a Monte Carlo algorithm with replica exchange for minimizing a general scoring function which includes not only free energy contributions for pair stacking, loop penalties, etc. but also a phenomenological penalty for the genus of the pairing graph. The good performance of the stochastic search strategy was successfully validated against TT2NE which uses the same free energy parametrization and performs exhaustive or partially exhaustive structure search, albeit for much shorter sequences (up to 200 bases). Next, the method was applied to other RNA sets, including an extensive tmRNA database, yielding results that are competitive with existing algorithms. Finally, it is shown that McGenus highlights possible limitations in the free energy scoring function. The algorithm is available as a web server at http://ipht.cea.fr/rna/mcgenus.php.
Affiliation(s)
- Michaël Bon
- Institut de Physique Théorique, CEA Saclay, CNRS URA 2306, 91191 Gif-sur-Yvette, France
|
46
|
Huang FWD, Reidys CM. On the combinatorics of sparsification. Algorithms Mol Biol 2012; 7:28. [PMID: 23088372 PMCID: PMC3549849 DOI: 10.1186/1748-7188-7-28] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 12/31/2011] [Accepted: 10/11/2012] [Indexed: 12/30/2022]
Abstract
Background: We study the sparsification of dynamic programming based on folding algorithms of RNA structures. Sparsification is a method that significantly improves the computation of minimum free energy (mfe) RNA structures. Results: We provide a quantitative analysis of the sparsification of a particular decomposition rule, Λ∗. This rule splits an interval of RNA secondary and pseudoknot structures of fixed topological genus. Key for quantifying sparsifications is the size of the so-called candidate sets. Here we assume mfe-structures to be specifically distributed (see Assumption 1) within arbitrary and irreducible RNA secondary and pseudoknot structures of fixed topological genus. We then present a combinatorial framework which allows, by means of probabilities of irreducible sub-structures, to obtain the expectation of the Λ∗-candidate set w.r.t. a uniformly random input sequence. We compute these expectations for arc-based energy models via energy-filtered generating functions (GF) in case of RNA secondary structures as well as RNA pseudoknot structures. Furthermore, for RNA secondary structures we also analyze a simplified loop-based energy model. Our combinatorial analysis is then compared to the expected number of Λ∗-candidates obtained from the folding mfe-structures. In case of the mfe-folding of RNA secondary structures with a simplified loop-based energy model our results imply that sparsification provides a significant, constant improvement of 91% (theory) to be compared to a 96% (experimental, simplified arc-based model) reduction. However, we do not observe a linear factor improvement. Finally, in case of the "full" loop-energy model we can report a reduction of 98% (experiment). Conclusions: Sparsification was initially attributed a linear factor improvement. This conclusion was based on the so-called polymer-zeta property, which stems from interpreting polymer chains as self-avoiding walks. Subsequent findings however reveal that the $O(n)$ improvement is not correct. The combinatorial analysis presented here shows that, assuming a specific distribution (see Assumption 1) of mfe-structures within irreducible and arbitrary structures, the expected number of Λ∗-candidates is $\Theta(n^2)$. However, the constant reduction is quite significant, being in the range of 96%. We furthermore show an analogous result for the sparsification of the Λ∗-decomposition rule for RNA pseudoknotted structures of genus one. Finally we observe that the effect of sparsification is sensitive to the employed energy model.
|
47
|
Topological classification and enumeration of RNA structures by genus. J Math Biol 2012; 67:1261-78. [PMID: 23053535 DOI: 10.1007/s00285-012-0594-x] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Received: 05/21/2011] [Revised: 06/27/2012] [Indexed: 10/27/2022]
Abstract
To an RNA pseudoknot structure is naturally associated a topological surface, which has its associated genus, and structures can thus be classified by the genus. Based on earlier work of Harer-Zagier, we compute the generating function $D_{g,\sigma}(z) = \sum_n d_{g,\sigma}(n) z^n$ for the number $d_{g,\sigma}(n)$ of those structures of fixed genus $g$ and minimum stack size $\sigma$ with $n$ nucleotides so that no two consecutive nucleotides are basepaired and show that $D_{g,\sigma}(z)$ is algebraic. In particular, we prove that $d_{g,2}(n) \sim k_g\, n^{3(g-1/2)} \gamma_2^n$, where $\gamma_2 \approx 1.9685$. Thus, for stack size at least two, the genus only enters through the sub-exponential factor, and the slow growth rate compared to the number of RNA molecules implies the existence of neutral networks of distinct molecules with the same structure of any genus. Certain RNA structures called shapes are shown to be in natural one-to-one correspondence with the cells in the Penner-Strebel decomposition of Riemann's moduli space of a surface of genus g with one boundary component, thus providing a link between RNA enumerative problems and the geometry of Riemann's moduli space.
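The genus that this classification relies on can be computed for a single one-backbone structure from Euler's relation $g = (n + 1 - r)/2$, where $n$ is the number of base pairs and $r$ the number of boundary components of the associated surface. The Python sketch below applies this standard relation; it is not code from the paper, and the function name `genus` and the input format (a list of 0-based index pairs) are assumptions made for illustration.

```python
def genus(pairs):
    """Topological genus of a one-backbone diagram given as (i, j) base pairs.

    Unpaired positions do not affect the genus, so the paired endpoints are
    first relabelled 0..2n-1 in backbone order.
    """
    endpoints = sorted(p for pair in pairs for p in pair)
    if len(set(endpoints)) != len(endpoints):
        raise ValueError("each position may take part in at most one pair")
    m = len(endpoints)
    if m == 0:
        return 0  # no arcs: the surface is a disc
    index = {p: k for k, p in enumerate(endpoints)}
    partner = [0] * m
    for i, j in pairs:
        partner[index[i]] = index[j]
        partner[index[j]] = index[i]

    # Boundary components are the cycles of k -> (partner[k] + 1) mod 2n:
    # cross the arc at k, then step to the next endpoint along the backbone.
    seen = [False] * m
    boundary_components = 0
    for start in range(m):
        if seen[start]:
            continue
        boundary_components += 1
        k = start
        while not seen[k]:
            seen[k] = True
            k = (partner[k] + 1) % m

    n = len(pairs)
    return (n + 1 - boundary_components) // 2


if __name__ == "__main__":
    print(genus([(0, 3), (1, 2)]))  # nested arcs, a plain secondary structure
    print(genus([(0, 2), (1, 3)]))  # two crossing arcs, an H-type pseudoknot
```

In this sketch, pseudoknot-free structures come out with genus 0, while an H-type pseudoknot and a kissing hairpin both come out with genus 1.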
|
48
|
The topological filtration of γ-structures. Math Biosci 2012; 241:24-33. [PMID: 23022027 DOI: 10.1016/j.mbs.2012.09.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/05/2012] [Revised: 09/14/2012] [Accepted: 09/15/2012] [Indexed: 11/23/2022]
Abstract
In this paper we study γ-structures filtered by topological genus. γ-structures are a class of RNA pseudoknot structures that plays a key role in the context of polynomial time folding of RNA pseudoknot structures. A γ-structure is composed of specific building blocks that have topological genus less than or equal to γ, where composition means concatenation and nesting of such blocks. Our main results are the derivation of a new bivariate generating function for γ-structures via symbolic methods, the singularity analysis of the solutions and a central limit theorem for the distribution of topological genus in γ-structures of given length. In our derivation specific bivariate polynomials play a central role. Their coefficients count particular motifs of fixed topological genus and they are of relevance in the context of genus recursion and novel folding algorithms.
|
49
|
Chiu JKH, Chen YPP. Conformational features of topologically classified RNA secondary structures. PLoS One 2012; 7:e39907. [PMID: 22792195 PMCID: PMC3390330 DOI: 10.1371/journal.pone.0039907] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 01/23/2012] [Accepted: 05/29/2012] [Indexed: 11/18/2022]
Abstract
Background: Current RNA secondary structure prediction approaches predict prevalent pseudoknots such as the H-pseudoknot and kissing hairpin. The number of possible structures increases drastically when more complex pseudoknots are considered, thus leading to computational limitations. On the other hand, the enormous population of possible structures means not all of them appear in real RNA molecules. Therefore, it is of interest to understand how many of them really exist and the reasons for their preferred existence over the others, as any new findings revealed by this study might enhance the capability of future structure prediction algorithms for more accurate prediction of complex pseudoknots. Methodology/Principal Findings: A novel algorithm was devised to estimate the exact number of structural possibilities for a pseudoknot constructed with a specified number of base pair stems. Then, topological classification was applied to classify RNA pseudoknotted structures from data in the RNA STRAND database. By showing the vast possibilities and the real population, it is clear that most of these plausible complex pseudoknots are not observed. Moreover, from these classified motifs that exist in nature, some features were identified for further investigation. It was found that some features are related to helical stacking. Other features are still left open to discover underlying tertiary interactions. Conclusions: Results from topological classification suggest that complex pseudoknots are usually some well-known motifs that are themselves complex or the interaction results of some special motifs. Heuristics can be proposed to predict the essential parts of these complex motifs, even if the required thermodynamic parameters are currently unknown.
Affiliation(s)
- Jimmy Ka Ho Chiu
- Department of Computer Science and Computer Engineering, La Trobe University, Melbourne, Victoria, Australia
- Yi-Ping Phoebe Chen
- Department of Computer Science and Computer Engineering, La Trobe University, Melbourne, Victoria, Australia
|
50
|
Andersen JE, Huang FW, Penner RC, Reidys CM. Topology of RNA-RNA Interaction Structures. J Comput Biol 2012; 19:928-43. [DOI: 10.1089/cmb.2011.0308] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 12/30/2022]
Affiliation(s)
- Jørgen E. Andersen
- Center for Quantum Geometry of Moduli Spaces, Aarhus University, Århus, Denmark
- Fenix W.D. Huang
- Institut for Matematik og Datalogi, University of Southern Denmark, Odense, Denmark
- Robert C. Penner
- Center for Quantum Geometry of Moduli Spaces, Aarhus University, Århus, Denmark
- Math and Physics Departments, California Institute of Technology, Pasadena, California
- Christian M. Reidys
- Institut for Matematik og Datalogi, University of Southern Denmark, Odense, Denmark
|