1
|
Jain S, Bayrak CS, Petingi L, Schlick T. Dual Graph Partitioning Highlights a Small Group of Pseudoknot-Containing RNA Submotifs. Genes (Basel) 2018; 9:E371. [PMID: 30044451 PMCID: PMC6115904 DOI: 10.3390/genes9080371] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2018] [Revised: 06/26/2018] [Accepted: 06/26/2018] [Indexed: 12/31/2022] Open
Abstract
RNA molecules are composed of modular architectural units that define their unique structural and functional properties. Characterization of these building blocks can help interpret RNA structure/function relationships. We present an RNA secondary structure motif and submotif library using dual graph representation and partitioning. Dual graphs represent RNA helices as vertices and loops as edges. Unlike tree graphs, dual graphs can represent RNA pseudoknots (intertwined base pairs). For a representative set of RNA structures, we construct dual graphs from their secondary structures, and apply our partitioning algorithm to identify non-separable subgraphs (or blocks) without breaking pseudoknots. We report 56 subgraph blocks up to nine vertices; among them, 22 are frequently occurring, 15 of which contain pseudoknots. We then catalog atomic fragments corresponding to the subgraph blocks to define a library of building blocks that can be used for RNA design, which we call RAG-3Dual, as we have done for tree graphs. As an application, we analyze the distribution of these subgraph blocks within ribosomal RNAs of various prokaryotic and eukaryotic species to identify common subgraphs and possible ancestry relationships. Other applications of dual graph partitioning and motif library can be envisioned for RNA structure analysis and design.
Collapse
Affiliation(s)
- Swati Jain
- Department of Chemistry, New York University, New York, NY 10003, USA.
| | - Cigdem S Bayrak
- Department of Chemistry, New York University, New York, NY 10003, USA.
| | - Louis Petingi
- Computer Science Department, College of Staten Island, City University of New York, Staten Island, New York, NY 10314, USA.
| | - Tamar Schlick
- Department of Chemistry, New York University, New York, NY 10003, USA.
- Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA.
- NYU-East China Normal University Center for Computational Chemistry, New York University Shanghai, Shanghai 3663, China.
| |
Collapse
|
2
|
Accurate Classification of RNA Structures Using Topological Fingerprints. PLoS One 2016; 11:e0164726. [PMID: 27755571 PMCID: PMC5068708 DOI: 10.1371/journal.pone.0164726] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2016] [Accepted: 09/29/2016] [Indexed: 12/26/2022] Open
Abstract
While RNAs are well known to possess complex structures, functionally similar RNAs often have little sequence similarity. While the exact size and spacing of base-paired regions vary, functionally similar RNAs have pronounced similarity in the arrangement, or topology, of base-paired stems. Furthermore, predicted RNA structures often lack pseudoknots (a crucial aspect of biological activity), and are only partially correct, or incomplete. A topological approach addresses all of these difficulties. In this work we describe each RNA structure as a graph that can be converted to a topological spectrum (RNA fingerprint). The set of subgraphs in an RNA structure, its RNA fingerprint, can be compared with the fingerprints of other RNA structures to identify and correctly classify functionally related RNAs. Topologically similar RNAs can be identified even when a large fraction, up to 30%, of the stems are omitted, indicating that highly accurate structures are not necessary. We investigate the performance of the RNA fingerprint approach on a set of eight highly curated RNA families, with diverse sizes and functions, containing pseudoknots, and with little sequence similarity-an especially difficult test set. In spite of the difficult test set, the RNA fingerprint approach is very successful (ROC AUC > 0.95). Due to the inclusion of pseudoknots, the RNA fingerprint approach both covers a wider range of possible structures than methods based only on secondary structure, and its tolerance for incomplete structures suggests that it can be applied even to predicted structures. Source code is freely available at https://github.rcac.purdue.edu/mgribsko/XIOS_RNA_fingerprint.
Collapse
|
3
|
Churkin A, Gabdank I, Barash D. On topological indices for small RNA graphs. Comput Biol Chem 2012; 41:35-40. [PMID: 23147564 DOI: 10.1016/j.compbiolchem.2012.10.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2012] [Revised: 10/11/2012] [Accepted: 10/12/2012] [Indexed: 11/29/2022]
Abstract
The secondary structure of RNAs can be represented by graphs at various resolutions. While it was shown that RNA secondary structures can be represented by coarse grain tree-graphs and meaningful topological indices can be used to distinguish between various structures, small RNAs are needed to be represented by full graphs. No meaningful topological index has yet been suggested for the analysis of such type of RNA graphs. Recalling that the second eigenvalue of the Laplacian matrix can be used to track topological changes in the case of coarse grain tree-graphs, it is plausible to assume that a topological index such as the Wiener index that represents all Laplacian eigenvalues may provide a similar guide for full graphs. However, by its original definition, the Wiener index was defined for acyclic graphs. Nevertheless, similarly to cyclic chemical graphs, small RNA graphs can be analyzed using elementary cuts, which enables the calculation of topological indices for small RNAs in an intuitive way. We show how to calculate a structural descriptor that is suitable for cyclic graphs, the Szeged index, for small RNA graphs by elementary cuts. We discuss potential uses of such a procedure that considers all eigenvalues of the associated Laplacian matrices to quantify the topology of small RNA graphs.
Collapse
Affiliation(s)
- Alexander Churkin
- Department of Computer Science, Ben-Gurion University, 84105 Beer-Sheva, Israel
| | | | | |
Collapse
|
4
|
MiRNA recognition with the yasMiR system: the quest for further improvements. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2011. [PMID: 21431542 DOI: 10.1007/978-1-4419-7046-6_2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register]
Abstract
The paper "Using Base Pairing Probabilities for MiRNA Recognition" by Daniel Pasailă, Irina Mohorianu, and Liviu Ciortuz, that has been published in Proceedings of the International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC) 2008, IEEE Computer Society, pp. 519-525, has introduced a new SVM for microRNA identification, whose novelty is twofolded: first, many of its features incorporate the base-pairing probabilities provided by McCaskill's algorithm, and second the classification performance is improved using a certain similarity ("profile"-based) measure between the training and test microRNAs and a set of carefully chosen ("pivot") RNA sequences. Comparisons with some of the best existing SVMs for microRNA identification proved that our SVM obtains truly competitive results. Here we add several significant extensions to the work reported in Daniel Pasailă et al. Proceedings of the International (SYNASC) 2008, pp. 519-525: testing this classifier on a more recent version of miRBase (12.0), evaluating the effect of using probabilistic patterns instead of non-probabilistic ones, analysing the discriminative power of different categories of features we used, and automatically searching for good pivot RNA sequences, which are critical for classification in our approach.
Collapse
|
5
|
Izzo JA, Kim N, Elmetwaly S, Schlick T. RAG: an update to the RNA-As-Graphs resource. BMC Bioinformatics 2011; 12:219. [PMID: 21627789 PMCID: PMC3123240 DOI: 10.1186/1471-2105-12-219] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2010] [Accepted: 05/31/2011] [Indexed: 02/08/2023] Open
Abstract
Background In 2004, we presented a web resource for stimulating the search for novel RNAs, RNA-As-Graphs (RAG), which classified, catalogued, and predicted RNA secondary structure motifs using clustering and build-up approaches. With the increased availability of secondary structures in recent years, we update the RAG resource and provide various improvements for analyzing RNA structures. Description Our RAG update includes a new supervised clustering algorithm that can suggest RNA motifs that may be "RNA-like". We use this utility to describe RNA motifs as three classes: existing, RNA-like, and non-RNA-like. This produces 126 tree and 16,658 dual graphs as candidate RNA-like topologies using the supervised clustering algorithm with existing RNAs serving as the training data. A comparison of this clustering approach to an earlier method shows considerable improvements. Additional RAG features include greatly expanded search capabilities, an interface to better utilize the benefits of relational database, and improvements to several of the utilities such as directed/labeled graphs and a subgraph search program. Conclusions The RAG updates presented here augment the database's intended function - stimulating the search for novel RNA functionality - by classifying available motifs, suggesting new motifs for design, and allowing for more specific searches for specific topologies. The updated RAG web resource offers users a graph-based tool for exploring available RNA motifs and suggesting new RNAs for design.
Collapse
Affiliation(s)
- Joseph A Izzo
- Department of Chemistry, New York University, New York, NY 10003, USA
| | | | | | | |
Collapse
|
6
|
Randić M, Zupan J, Balaban AT, Vikić-Topić D, Plavšić D. Graphical Representation of Proteins. Chem Rev 2010; 111:790-862. [PMID: 20939561 DOI: 10.1021/cr800198j] [Citation(s) in RCA: 93] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Affiliation(s)
- Milan Randić
- National Institute of Chemistry, P.O. Box 3430, 1001 Ljubljana, Slovenia; NMR Center, Ruđer Bošković Institute, P.O. Box 180, HR-10002 Zagreb, Croatia; and Texas A&M University at Galveston, Galveston, Texas 77553
| | - Jure Zupan
- National Institute of Chemistry, P.O. Box 3430, 1001 Ljubljana, Slovenia; NMR Center, Ruđer Bošković Institute, P.O. Box 180, HR-10002 Zagreb, Croatia; and Texas A&M University at Galveston, Galveston, Texas 77553
| | - Alexandru T. Balaban
- National Institute of Chemistry, P.O. Box 3430, 1001 Ljubljana, Slovenia; NMR Center, Ruđer Bošković Institute, P.O. Box 180, HR-10002 Zagreb, Croatia; and Texas A&M University at Galveston, Galveston, Texas 77553
| | - Dražen Vikić-Topić
- National Institute of Chemistry, P.O. Box 3430, 1001 Ljubljana, Slovenia; NMR Center, Ruđer Bošković Institute, P.O. Box 180, HR-10002 Zagreb, Croatia; and Texas A&M University at Galveston, Galveston, Texas 77553
| | - Dejan Plavšić
- National Institute of Chemistry, P.O. Box 3430, 1001 Ljubljana, Slovenia; NMR Center, Ruđer Bošković Institute, P.O. Box 180, HR-10002 Zagreb, Croatia; and Texas A&M University at Galveston, Galveston, Texas 77553
| |
Collapse
|
7
|
Ortega-Broche SE, Marrero-Ponce Y, Díaz YE, Torrens F, Pérez-Giménez F. tomocomd-camps and protein bilinear indices - novel bio-macromolecular descriptors for protein research: I. Predicting protein stability effects of a complete set of alanine substitutions in the Arc repressor. FEBS J 2010; 277:3118-46. [DOI: 10.1111/j.1742-4658.2010.07711.x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
|
8
|
Nucleotide's bilinear indices: novel bio-macromolecular descriptors for bioinformatics studies of nucleic acids. I. Prediction of paromomycin's affinity constant with HIV-1 Psi-RNA packaging region. J Theor Biol 2009; 259:229-41. [PMID: 19272394 DOI: 10.1016/j.jtbi.2009.02.021] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2008] [Revised: 02/24/2009] [Accepted: 02/25/2009] [Indexed: 02/03/2023]
Abstract
A new set of nucleotide-based bio-macromolecular descriptors are presented. This novel approach to bio-macromolecular design from a linear algebra point of view is relevant to nucleic acids quantitative structure-activity relationship (QSAR) studies. These bio-macromolecular indices are based on the calculus of bilinear maps on Re(n)[b(mk)(x (m),y (m)):Re(n) x Re(n)-->Re] in canonical basis. Nucleic acid's bilinear indices are calculated from kth power of non-stochastic and stochastic nucleotide's graph-theoretic electronic-contact matrices, M(m)(k) and (s)M(m)(k), respectively. That is to say, the kth non-stochastic and stochastic nucleic acid's bilinear indices are calculated using M(m)(k) and (s)M(m)(k) as matrix operators of bilinear transformations. Moreover, biochemical information is codified by using different pair combinations of nucleotide-base properties as weightings (experimental molar absorption coefficient epsilon(260) at 260 nm and pH=7.0, first (Delta E(1)) and second (Delta E(2)) single excitation energies in eV, and first (f(1)) and second (f(2)) oscillator strength values (of the first singlet excitation energies) of the nucleotide DNA-RNA bases. As example of this approach, an interaction study of the antibiotic paromomycin with the packaging region of the HIV-1 Psi-RNA have been performed and it have been obtained several linear models in order to predict the interaction strength. The best linear model obtained by using non-stochastic bilinear indices explains about 91% of the variance of the experimental Log K (R=0.95 and s=0.08 x 10(-4)M(-1)) as long as the best stochastic bilinear indices-based equation account for 93% of the Log K variance (R=0.97 and s=0.07 x 10(-4)M(-1)). The leave-one-out (LOO) press statistics, evidenced high predictive ability of both models (q(2)=0.86 and s(cv)=0.09 x 10(-4)M(-1) for non-stochastic and q(2)=0.91 and s(cv)=0.08 x 10(-4)M(-1) for stochastic bilinear indices). The nucleic acid's bilinear indices-based models compared favorably with other nucleic acid's indices-based approaches reported nowadays. These models also permit the interpretation of the driving forces of the interaction process. In this sense, developed equations involve short-reaching (k<or=3), middle-reaching (4<k<9), and far-reaching (k=10 or greater) nucleotide's bilinear indices. This situation points to electronic and topologic nucleotide's backbone interactions control of the stability profile of paromomycin-RNA complexes. Consequently, the present approach represents a novel and rather promising way to theoretical-biology studies.
Collapse
|