1
|
Kirkpatrick A, Patton K, Tetali P, Mitchell C. Markov Chain-Based Sampling for Exploring RNA Secondary Structure under the Nearest Neighbor Thermodynamic Model and Extended Applications. MATHEMATICAL AND COMPUTATIONAL APPLICATIONS 2020; 25. [PMID: 35924027 PMCID: PMC9344895 DOI: 10.3390/mca25040067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
Ribonucleic acid (RNA) secondary structures and branching properties are important for determining functional ramifications in biology. While energy minimization of the Nearest Neighbor Thermodynamic Model (NNTM) is commonly used to identify such properties (number of hairpins, maximum ladder distance, etc.), it is difficult to know whether the resultant values fall within expected dispersion thresholds for a given energy function. The goal of this study was to construct a Markov chain capable of examining the dispersion of RNA secondary structures and branching properties obtained from NNTM energy function minimization independent of a specific nucleotide sequence. Plane trees are studied as a model for RNA secondary structure, with energy assigned to each tree based on the NNTM, and a corresponding Gibbs distribution is defined on the trees. Through a bijection between plane trees and 2-Motzkin paths, a Markov chain converging to the Gibbs distribution is constructed, and fast mixing time is established by estimating the spectral gap of the chain. The spectral gap estimate is obtained through a series of decompositions of the chain and also by building on known mixing time results for other chains on Dyck paths. The resulting algorithm can be used as a tool for exploring the branching structure of RNA, especially for long sequences, and to examine branching structure dependence on energy model parameters. Full exposition is provided for the mathematical techniques used with the expectation that these techniques will prove useful in bioinformatics, computational biology, and additional extended applications.
Collapse
Affiliation(s)
- Anna Kirkpatrick
- School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30332, USA
| | - Kalen Patton
- School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30332, USA
- School of Computer Science, Georgia Institute of Technology, Atlanta, GA 30332, USA
| | - Prasad Tetali
- School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30332, USA
- School of Computer Science, Georgia Institute of Technology, Atlanta, GA 30332, USA
| | - Cassie Mitchell
- Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Correspondence:
| |
Collapse
|
2
|
Baba N, Elmetwaly S, Kim N, Schlick T. Predicting Large RNA-Like Topologies by a Knowledge-Based Clustering Approach. J Mol Biol 2015; 428:811-821. [PMID: 26478223 DOI: 10.1016/j.jmb.2015.10.009] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2015] [Accepted: 10/06/2015] [Indexed: 11/19/2022]
Abstract
An analysis and expansion of our resource for classifying, predicting, and designing RNA structures, RAG (RNA-As-Graphs), is presented, with the goal of understanding features of RNA-like and non-RNA-like motifs and exploiting this information for RNA design. RAG was first reported in 2004 for cataloging RNA secondary structure motifs using graph representations. In 2011, the RAG resource was updated with the increased availability of RNA structures and was improved by utilities for analyzing RNA structures, including substructuring and search tools. We also classified RNA structures as graphs up to 10 vertices (~200 nucleotides) into three classes: existing, RNA-like, and non-RNA-like using clustering approaches. Here, we focus on the tree graphs and evaluate the newly founded RNAs since 2011, which also support our refined predictions of RNA-like motifs. We expand the RAG resource for large tree graphs up to 13 vertices (~260 nucleotides), thereby cataloging more than 10 times as many secondary structures. We apply clustering algorithms based on features of RNA secondary structures translated from known tertiary structures to suggest which hypothetical large RNA motifs can be considered "RNA-like". The results by the PAM (Partitioning Around Medoids) approach, in particular, reveal good accuracy, with small error for the largest cases. The RAG update here up to 13 vertices offers a useful graph-based tool for exploring RNA motifs and suggesting large RNA motifs for design.
Collapse
Affiliation(s)
- Naoto Baba
- Department of Chemistry and Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012, USA; Department of Chemistry, Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Aichi 464-8601, Japan
| | - Shereef Elmetwaly
- Department of Chemistry and Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012, USA
| | - Namhee Kim
- Department of Chemistry and Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012, USA
| | - Tamar Schlick
- Department of Chemistry and Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012, USA; NYU-ECNU Center for Computational Chemistry at NYU Shanghai, 3663 Zhongshan Road North, Shanghai, 200062, China.
| |
Collapse
|
3
|
RNA graph partitioning for the discovery of RNA modularity: a novel application of graph partition algorithm to biology. PLoS One 2014; 9:e106074. [PMID: 25188578 PMCID: PMC4154854 DOI: 10.1371/journal.pone.0106074] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2014] [Accepted: 07/31/2014] [Indexed: 11/19/2022] Open
Abstract
Graph representations have been widely used to analyze and design various economic, social, military, political, and biological networks. In systems biology, networks of cells and organs are useful for understanding disease and medical treatments and, in structural biology, structures of molecules can be described, including RNA structures. In our RNA-As-Graphs (RAG) framework, we represent RNA structures as tree graphs by translating unpaired regions into vertices and helices into edges. Here we explore the modularity of RNA structures by applying graph partitioning known in graph theory to divide an RNA graph into subgraphs. To our knowledge, this is the first application of graph partitioning to biology, and the results suggest a systematic approach for modular design in general. The graph partitioning algorithms utilize mathematical properties of the Laplacian eigenvector (µ2) corresponding to the second eigenvalues (λ2) associated with the topology matrix defining the graph: λ2 describes the overall topology, and the sum of µ2's components is zero. The three types of algorithms, termed median, sign, and gap cuts, divide a graph by determining nodes of cut by median, zero, and largest gap of µ2's components, respectively. We apply these algorithms to 45 graphs corresponding to all solved RNA structures up through 11 vertices (∼ 220 nucleotides). While we observe that the median cut divides a graph into two similar-sized subgraphs, the sign and gap cuts partition a graph into two topologically-distinct subgraphs. We find that the gap cut produces the best biologically-relevant partitioning for RNA because it divides RNAs at less stable connections while maintaining junctions intact. The iterative gap cuts suggest basic modules and assembly protocols to design large RNA structures. Our graph substructuring thus suggests a systematic approach to explore the modularity of biological networks. In our applications to RNA structures, subgraphs also suggest design strategies for novel RNA motifs.
Collapse
|
4
|
Gopal A, Egecioglu DE, Yoffe AM, Ben-Shaul A, Rao ALN, Knobler CM, Gelbart WM. Viral RNAs are unusually compact. PLoS One 2014; 9:e105875. [PMID: 25188030 PMCID: PMC4154850 DOI: 10.1371/journal.pone.0105875] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2014] [Accepted: 07/21/2014] [Indexed: 01/28/2023] Open
Abstract
A majority of viruses are composed of long single-stranded genomic RNA molecules encapsulated by protein shells with diameters of just a few tens of nanometers. We examine the extent to which these viral RNAs have evolved to be physically compact molecules to facilitate encapsulation. Measurements of equal-length viral, non-viral, coding and non-coding RNAs show viral RNAs to have among the smallest sizes in solution, i.e., the highest gel-electrophoretic mobilities and the smallest hydrodynamic radii. Using graph-theoretical analyses we demonstrate that their sizes correlate with the compactness of branching patterns in predicted secondary structure ensembles. The density of branching is determined by the number and relative positions of 3-helix junctions, and is highly sensitive to the presence of rare higher-order junctions with 4 or more helices. Compact branching arises from a preponderance of base pairing between nucleotides close to each other in the primary sequence. The density of branching represents a degree of freedom optimized by viral RNA genomes in response to the evolutionary pressure to be packaged reliably. Several families of viruses are analyzed to delineate the effects of capsid geometry, size and charge stabilization on the selective pressure for RNA compactness. Compact branching has important implications for RNA folding and viral assembly.
Collapse
Affiliation(s)
- Ajaykumar Gopal
- Department of Chemistry & Biochemistry, University of California Los Angeles, Los Angeles, California, United States of America
| | - Defne E. Egecioglu
- Department of Chemistry & Biochemistry, University of California Los Angeles, Los Angeles, California, United States of America
| | - Aron M. Yoffe
- Department of Chemistry & Biochemistry, University of California Los Angeles, Los Angeles, California, United States of America
| | - Avinoam Ben-Shaul
- Institute of Chemistry & The Fritz Haber Research Center, The Hebrew University of Jerusalem, Givat Ram, Jerusalem, Israel
| | - Ayala L. N. Rao
- Department of Plant Pathology, University of California Riverside, Riverside, California, United States of America
| | - Charles M. Knobler
- Department of Chemistry & Biochemistry, University of California Los Angeles, Los Angeles, California, United States of America
| | - William M. Gelbart
- Department of Chemistry & Biochemistry, University of California Los Angeles, Los Angeles, California, United States of America
- * E-mail:
| |
Collapse
|
5
|
Combinatorial Insights into RNA Secondary Structure. DISCRETE AND TOPOLOGICAL MODELS IN MOLECULAR BIOLOGY 2014. [DOI: 10.1007/978-3-642-40193-0_7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
|
6
|
Izzo JA, Kim N, Elmetwaly S, Schlick T. RAG: an update to the RNA-As-Graphs resource. BMC Bioinformatics 2011; 12:219. [PMID: 21627789 PMCID: PMC3123240 DOI: 10.1186/1471-2105-12-219] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2010] [Accepted: 05/31/2011] [Indexed: 02/08/2023] Open
Abstract
Background In 2004, we presented a web resource for stimulating the search for novel RNAs, RNA-As-Graphs (RAG), which classified, catalogued, and predicted RNA secondary structure motifs using clustering and build-up approaches. With the increased availability of secondary structures in recent years, we update the RAG resource and provide various improvements for analyzing RNA structures. Description Our RAG update includes a new supervised clustering algorithm that can suggest RNA motifs that may be "RNA-like". We use this utility to describe RNA motifs as three classes: existing, RNA-like, and non-RNA-like. This produces 126 tree and 16,658 dual graphs as candidate RNA-like topologies using the supervised clustering algorithm with existing RNAs serving as the training data. A comparison of this clustering approach to an earlier method shows considerable improvements. Additional RAG features include greatly expanded search capabilities, an interface to better utilize the benefits of relational database, and improvements to several of the utilities such as directed/labeled graphs and a subgraph search program. Conclusions The RAG updates presented here augment the database's intended function - stimulating the search for novel RNA functionality - by classifying available motifs, suggesting new motifs for design, and allowing for more specific searches for specific topologies. The updated RAG web resource offers users a graph-based tool for exploring available RNA motifs and suggesting new RNAs for design.
Collapse
Affiliation(s)
- Joseph A Izzo
- Department of Chemistry, New York University, New York, NY 10003, USA
| | | | | | | |
Collapse
|
7
|
Laubenbacher R. Algebraic Methods in Mathematical Biology. Bull Math Biol 2011; 73:701-5. [DOI: 10.1007/s11538-011-9643-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|
8
|
Hower V, Heitsch CE. Parametric Analysis of RNA Branching Configurations. Bull Math Biol 2011; 73:754-76. [DOI: 10.1007/s11538-010-9607-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2009] [Accepted: 11/04/2010] [Indexed: 01/30/2023]
|
9
|
Abstract
We consider large random trees under Gibbs distributions and prove a Large Deviation Principle (LDP) for the distribution of degrees of vertices of the tree. The LDP rate function is given explicitly. An immediate consequence is a Law of Large Numbers for the distribution of vertex degrees in a large random tree. Our motivation for this study comes from the analysis of RNA secondary structures.
Collapse
|