1
Matthies MC, Krueger R, Torda AE, Ward M. Differentiable partition function calculation for RNA. Nucleic Acids Res 2024; 52:e14. [PMID: 38038257 PMCID: PMC10853804 DOI: 10.1093/nar/gkad1168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Revised: 10/24/2023] [Accepted: 11/28/2023] [Indexed: 12/02/2023]
Abstract
Ribonucleic acid (RNA) is an essential molecule in a wide range of biological functions. In 1990, McCaskill introduced a dynamic programming algorithm for computing the partition function of an RNA sequence. McCaskill's algorithm is widely used today for understanding the thermodynamic properties of RNA. In this work, we introduce a generalization of McCaskill's algorithm that is well-defined over continuous inputs. Crucially, this enables us to implement an end-to-end differentiable partition function calculation. The derivative can be computed with respect to the input, or to any other fixed values, such as the parameters of the energy model. This builds a bridge between RNA thermodynamics and the tools of differentiable programming including deep learning as it enables the partition function to be incorporated directly into any end-to-end differentiable pipeline. To demonstrate the effectiveness of our new approach, we tackle the inverse folding problem directly using gradient optimization. We find that using the gradient to optimize the sequence directly is sufficient to arrive at sequences with a high probability of folding into the desired structure. This indicates that the gradients we compute are meaningful.
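As a rough illustration of the idea in this abstract, the sketch below computes a partition function by dynamic programming over a relaxed sequence, in which each position holds a probability distribution over A, C, G and U rather than a single nucleotide. It is not the authors' implementation: the energy model (a flat -1 kcal/mol for every canonical or wobble pair, with loops costing nothing), the Nussinov-style recursion, and all names (`partition_function`, `one_hot`, `PAIR_WEIGHT`) are assumptions chosen for brevity. McCaskill's algorithm itself uses a full loop-based energy model.

```python
import math

ALPHABET = "ACGU"
RT = 0.0019872 * 310.15  # gas constant in kcal/(mol*K) times 37 C in kelvin
# Toy assumption: every allowed pair has free energy -1 kcal/mol.
PAIR_WEIGHT = math.exp(1.0 / RT)
ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
# B[a][b] is the Boltzmann weight of pairing nucleotide a with nucleotide b.
B = [[PAIR_WEIGHT if (a, b) in ALLOWED else 0.0 for b in ALPHABET] for a in ALPHABET]


def one_hot(seq):
    """Encode a discrete sequence as rows of probabilities over ALPHABET."""
    return [[1.0 if c == a else 0.0 for a in ALPHABET] for c in seq]


def partition_function(probs, min_loop=3):
    """Partition function of the toy model for an n x 4 probability matrix."""
    n = len(probs)

    def pair_weight(k, j):
        # Expected Boltzmann weight of a pair between positions k and j.
        return sum(probs[k][a] * probs[j][b] * B[a][b]
                   for a in range(4) for b in range(4))

    # Z[i][j] covers the half-open interval [i, j); empty intervals have Z = 1.
    Z = [[1.0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            last = j - 1
            total = Z[i][last]  # case 1: the last position is unpaired
            # case 2: the last position pairs with k, enclosing >= min_loop bases
            for k in range(i, last - min_loop):
                total += Z[i][k] * pair_weight(k, last) * Z[k + 1][last]
            Z[i][j] = total
    return Z[0][n]
```

For a one-hot input such as `one_hot("GAAAC")` the result equals the ordinary discrete partition function of this toy model. Because the recursion uses only sums and products of the input probabilities, it could in principle be passed to an automatic differentiation library to obtain gradients with respect to the sequence.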
Affiliation(s)
- Marco C Matthies
  - Centre for Bioinformatics, University of Hamburg, Bundesstr. 43, 20146 Hamburg, Germany
- Ryan Krueger
  - Department of Applied Mathematics, Harvard University, 29 Oxford St, Cambridge, MA 02138, USA
- Andrew E Torda
  - Centre for Bioinformatics, University of Hamburg, Bundesstr. 43, 20146 Hamburg, Germany
- Max Ward
  - Department of Computer Science and Software Engineering, The University of Western Australia, 241, 35 Stirling Hwy, Crawley, WA 6009, Australia
2
Zhou T, Dai N, Li S, Ward M, Mathews DH, Huang L. RNA design via structure-aware multifrontier ensemble optimization. Bioinformatics 2023; 39:i563-i571. [PMID: 37387188 DOI: 10.1093/bioinformatics/btad252] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023]
Abstract
MOTIVATION: RNA design is the search for a sequence or set of sequences that will fold to a desired structure, also known as the inverse problem of RNA folding. However, the sequences designed by existing algorithms often suffer from low ensemble stability, which worsens for long sequence design. Additionally, for many methods only a small number of sequences satisfying the MFE criterion can be found by each run of design. These drawbacks limit their use cases.
RESULTS: We propose an innovative optimization paradigm, SAMFEO, which optimizes ensemble objectives (equilibrium probability or ensemble defect) by iterative search and yields a very large number of successfully designed RNA sequences as byproducts. We develop a search method which leverages structure level and ensemble level information at different stages of the optimization: initialization, sampling, mutation, and updating. Our work, while being less complicated than others, is the first algorithm that is able to design thousands of RNA sequences for the puzzles from the Eterna100 benchmark. In addition, our algorithm solves the most Eterna100 puzzles among all the general optimization based methods in our study. The only baseline solving more puzzles than our work is dependent on handcrafted heuristics designed for a specific folding model. Surprisingly, our approach shows superiority on designing long sequences for structures adapted from the database of 16S Ribosomal RNAs.
AVAILABILITY AND IMPLEMENTATION: Our source code and data used in this article are available at https://github.com/shanry/SAMFEO.
Affiliation(s)
- Tianshuo Zhou
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97330, United States
- Ning Dai
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97330, United States
- Sizhen Li
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97330, United States
- Max Ward
  - Department of Computer Science and Software Engineering, The University of Western Australia, Perth, Australia
- David H Mathews
  - Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, NY 14642, United States
  - Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, United States
  - Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY 14642, United States
- Liang Huang
  - School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97330, United States
3
Albers S, Beckert B, Matthies MC, Mandava CS, Schuster R, Seuring C, Riedner M, Sanyal S, Torda AE, Wilson DN, Ignatova Z. Repurposing tRNAs for nonsense suppression. Nat Commun 2021; 12:3850. [PMID: 34158503 PMCID: PMC8219837 DOI: 10.1038/s41467-021-24076-x] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 12/09/2020] [Accepted: 06/01/2021] [Indexed: 02/06/2023]
Abstract
Three stop codons (UAA, UAG and UGA) terminate protein synthesis and are almost exclusively recognized by release factors. Here, we design de novo transfer RNAs (tRNAs) that efficiently decode UGA stop codons in Escherichia coli. The tRNA designs harness various functionally conserved aspects of sense-codon decoding tRNAs. Optimization within the TΨC-stem to stabilize binding to the elongation factor displays the most potent effect in enhancing suppression activity. We determine the structure of the ribosome in a complex with the designed tRNA bound to a UGA stop codon in the A site at 2.9 Å resolution. In the context of the suppressor tRNA, the conformation of the UGA codon resembles that of a sense-codon rather than when canonical translation termination release factors are bound, suggesting conformational flexibility of the stop codons dependent on the nature of the A-site ligand. The systematic analysis, combined with structural insights, provides a rationale for targeted repurposing of tRNAs to correct devastating nonsense mutations that introduce a premature stop codon.
Affiliation(s)
- Suki Albers
  - Institute of Biochemistry and Molecular Biology, University of Hamburg, Hamburg, Germany
- Bertrand Beckert
  - Institute of Biochemistry and Molecular Biology, University of Hamburg, Hamburg, Germany
- Marco C. Matthies
  - Center for Bioinformatics, University of Hamburg, Hamburg, Germany
- Chandra Sekhar Mandava
  - Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- Raphael Schuster
  - Institute of Organic Chemistry, University of Hamburg, Hamburg, Germany
- Maria Riedner
  - Institute of Organic Chemistry, University of Hamburg, Hamburg, Germany
- Suparna Sanyal
  - Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- Andrew E. Torda
  - Center for Bioinformatics, University of Hamburg, Hamburg, Germany
- Daniel N. Wilson
  - Institute of Biochemistry and Molecular Biology, University of Hamburg, Hamburg, Germany
- Zoya Ignatova
  - Institute of Biochemistry and Molecular Biology, University of Hamburg, Hamburg, Germany
4
Inverse folding with RNA-As-Graphs produces a large pool of candidate sequences with target topologies. J Struct Biol 2019; 209:107438. [PMID: 31874236 DOI: 10.1016/j.jsb.2019.107438] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 07/04/2019] [Revised: 12/18/2019] [Accepted: 12/19/2019] [Indexed: 02/07/2023]
Abstract
We present an RNA-As-Graphs (RAG) based inverse folding algorithm, RAG-IF, to design novel RNA sequences that fold onto target tree graph topologies. The algorithm can be used to enhance our recently reported computational design pipeline (Jain et al., NAR 2018). The RAG approach represents RNA secondary structures as tree and dual graphs, where RNA loops and helices are coarse-grained as vertices and edges, opening the usage of graph theory methods to study, predict, and design RNA structures. Our recently developed computational pipeline for design utilizes graph partitioning (RAG-3D) and atomic fragment assembly (F-RAG) to design sequences to fold onto RNA-like tree graph topologies; the atomic fragments are taken from existing RNA structures that correspond to tree subgraphs. Because F-RAG may not produce the target folds for all designs, automated mutations by the RAG-IF algorithm enhance the candidate pool markedly. The crucial residues for mutation are identified by differences between the predicted and the target topology. A genetic algorithm then mutates the selected residues, and the successful sequences are optimized to retain only the minimal or essential mutations. Here we evaluate RAG-IF for 6 RNA-like topologies and generate a large pool of successful candidate sequences with a variety of minimal mutations. We find that RAG-IF adds robustness and efficiency to our RNA design pipeline, making inverse folding motivated by graph topology rather than secondary structure more productive.
5
Koodli RV, Keep B, Coppess KR, Portela F, Das R. EternaBrain: Automated RNA design through move sets and strategies from an Internet-scale RNA videogame. PLoS Comput Biol 2019; 15:e1007059. [PMID: 31247029 PMCID: PMC6597038 DOI: 10.1371/journal.pcbi.1007059] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 03/18/2019] [Accepted: 04/30/2019] [Indexed: 11/18/2022]
Abstract
Emerging RNA-based approaches to disease detection and gene therapy require RNA sequences that fold into specific base-pairing patterns, but computational algorithms generally remain inadequate for these secondary structure design tasks. The Eterna project has crowdsourced RNA design to human video game players in the form of puzzles that reach extraordinary difficulty. Here, we demonstrate that Eterna participants' moves and strategies can be leveraged to improve automated computational RNA design. We present an eternamoves-large repository consisting of 1.8 million player moves on 12 of the most-played Eterna puzzles as well as an eternamoves-select repository of 30,477 moves from the top 72 players on a select set of more advanced puzzles. On eternamoves-select, we present a multilayer convolutional neural network (CNN) EternaBrain that achieves test accuracies of 51% and 34% in base prediction and location prediction, respectively, suggesting that top players' moves are partially stereotyped. Pipelining this CNN's move predictions with single-action-playout (SAP) of six strategies compiled by human players solves 61 out of 100 independent puzzles in the Eterna100 benchmark. EternaBrain-SAP outperforms previously published RNA design algorithms and achieves similar or better performance than a newer generation of deep learning methods, while being largely orthogonal to these other methods. Our study provides useful lessons for future efforts to achieve human-competitive performance with automated RNA design algorithms.
Affiliation(s)
- Rohan V. Koodli
  - Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, United States of America
- Benjamin Keep
  - Department of Education, Stanford University, Stanford, CA, United States of America
- Katherine R. Coppess
  - Department of Physics, Stanford University, Stanford, CA, United States of America
- Fernando Portela
  - Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, United States of America
- Rhiju Das
  - Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, United States of America
  - Department of Physics, Stanford University, Stanford, CA, United States of America
6
Jain S, Saju S, Petingi L, Schlick T. An extended dual graph library and partitioning algorithm applicable to pseudoknotted RNA structures. Methods 2019; 162-163:74-84. [PMID: 30928508 DOI: 10.1016/j.ymeth.2019.03.022] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 01/14/2019] [Revised: 02/28/2019] [Accepted: 03/22/2019] [Indexed: 12/18/2022]
Abstract
Exploring novel RNA topologies is imperative for understanding RNA structure and pursuing its design. Our RNA-As-Graphs (RAG) approach exploits graph theory tools and uses coarse-grained tree and dual graphs to represent RNA helices and loops by vertices and edges. Only dual graphs represent pseudoknotted RNAs fully. Here we develop a dual graph enumeration algorithm to generate an expanded library of dual graph topologies for 2-9 vertices, and extend our dual graph partitioning algorithm to identify all possible RNA subgraphs. Our enumeration algorithm connects smaller-vertex graphs, using all possible edge combinations, to build larger-vertex graphs and retain all non-isomorphic graph topologies, thereby more than doubling the size of our prior library to a total of 110,667 dual graph topologies. We apply our dual graph partitioning algorithm, which keeps pseudoknots and junctions intact, to all existing RNA structures to identify all possible substructures up to 9 vertices. In addition, our expanded dual graph library assigns graph topologies to all RNA graphs and subgraphs, rectifying prior inconsistencies. We update our RAG-3Dual database of RNA atomic fragments with all newly identified substructures and their graph IDs, increasing its size by more than 50 times. The enlarged dual graph library and RAG-3Dual database provide a comprehensive repertoire of graph topologies and atomic fragments to study yet undiscovered RNA molecules and design RNA sequences with novel topologies, including a variety of pseudoknotted RNAs.
Affiliation(s)
- Swati Jain
  - Department of Chemistry, New York University, 1021 Silver, 100 Washington Square East, New York, NY 10003, USA
- Sera Saju
  - Department of Chemistry, New York University, 1021 Silver, 100 Washington Square East, New York, NY 10003, USA
- Louis Petingi
  - Computer Science Department, College of Staten Island, City University of New York, Staten Island, New York, NY 10314, USA
- Tamar Schlick
  - Department of Chemistry, New York University, 1021 Silver, 100 Washington Square East, New York, NY 10003, USA
  - Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012, USA
  - NYU-East China Normal University Center for Computational Chemistry at New York University Shanghai, Room 340, Geography Building, North Zhongshan Road, 3663 Shanghai, China
7
Jain S, Laederach A, Ramos SBV, Schlick T. A pipeline for computational design of novel RNA-like topologies. Nucleic Acids Res 2018; 46:7040-7051. [PMID: 30137633 PMCID: PMC6101589 DOI: 10.1093/nar/gky524] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 10/24/2017] [Revised: 05/22/2018] [Accepted: 05/24/2018] [Indexed: 12/11/2022]
Abstract
Designing novel RNA topologies is a challenge, with important therapeutic and industrial applications. We describe a computational pipeline for design of novel RNA topologies based on our coarse-grained RNA-As-Graphs (RAG) framework. RAG represents RNA structures as tree graphs and describes RNA secondary (2D) structure topologies (currently up to 13 vertices, ≈260 nucleotides). We have previously identified novel graph topologies that are RNA-like among these. Here we describe a systematic design pipeline and illustrate design for six broad design problems using recently developed tools for graph-partitioning and fragment assembly (F-RAG). Following partitioning of the target graph, corresponding atomic fragments from our RAG-3D database are combined using F-RAG, and the candidate atomic models are scored using a knowledge-based potential developed for 3D structure prediction. The sequences of the top scoring models are screened further using available tools for 2D structure prediction. The results indicate that our modular approach based on RNA-like topologies rather than specific 2D structures allows for greater flexibility in the design process, and generates a large number of candidate sequences quickly. Experimental structure probing using SHAPE-MaP for two sequences agrees with our predictions and suggests that our combined tools yield excellent candidates for further sequence and experimental screening.
Affiliation(s)
- Swati Jain
  - Department of Chemistry, New York University, 1001 Silver, 100 Washington Square East, New York, NY 10003, USA
- Alain Laederach
  - Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Silvia B V Ramos
  - Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Tamar Schlick
  - Department of Chemistry, New York University, 1001 Silver, 100 Washington Square East, New York, NY 10003, USA
  - Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012, USA
  - NYU-ECNU Center for Computational Chemistry at New York University Shanghai, Room 340, Geography Building, North Zhongshan Road, 3663 Shanghai, China
8
Eastman P, Shi J, Ramsundar B, Pande VS. Solving the RNA design problem with reinforcement learning. PLoS Comput Biol 2018; 14:e1006176. [PMID: 29927936 PMCID: PMC6029810 DOI: 10.1371/journal.pcbi.1006176] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 12/30/2017] [Revised: 07/03/2018] [Accepted: 05/04/2018] [Indexed: 11/19/2022]
Abstract
We use reinforcement learning to train an agent for computational RNA design: given a target secondary structure, design a sequence that folds to that structure in silico. Our agent uses a novel graph convolutional architecture allowing a single model to be applied to arbitrary target structures of any length. After training it on randomly generated targets, we test it on the Eterna100 benchmark and find it outperforms all previous algorithms. Analysis of its solutions shows it has successfully learned some advanced strategies identified by players of the game Eterna, allowing it to solve some very difficult structures. On the other hand, it has failed to learn other strategies, possibly because they were not required for the targets in the training set. This suggests the possibility that future improvements to the training protocol may yield further gains in performance.
Affiliation(s)
- Peter Eastman
  - Department of Bioengineering, Stanford University, Stanford, CA, United States of America
- Jade Shi
  - Department of Chemistry, Stanford University, Stanford, CA, United States of America
- Bharath Ramsundar
  - Department of Computer Science, Stanford University, Stanford, CA, United States of America
- Vijay S. Pande
  - Department of Bioengineering, Stanford University, Stanford, CA, United States of America
9
Wolfe BR, Porubsky NJ, Zadeh JN, Dirks RM, Pierce NA. Constrained Multistate Sequence Design for Nucleic Acid Reaction Pathway Engineering. J Am Chem Soc 2017; 139:3134-3144. [DOI: 10.1021/jacs.6b12693] [Citation(s) in RCA: 72] [Impact Index Per Article: 10.3] [Indexed: 01/14/2023]
Affiliation(s)
- Brian R. Wolfe
  - Division of Biology & Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Nicholas J. Porubsky
  - Division of Chemistry & Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Joseph N. Zadeh
  - Division of Biology & Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Robert M. Dirks
  - Division of Biology & Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Niles A. Pierce
  - Division of Biology & Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
  - Division of Engineering & Applied Science, California Institute of Technology, Pasadena, California 91125, United States
  - Weatherall Institute of Molecular Medicine, University of Oxford, Oxford OX3 9DS, United Kingdom
10
Anderson-Lee J, Fisker E, Kosaraju V, Wu M, Kong J, Lee J, Lee M, Zada M, Treuille A, Das R. Principles for Predicting RNA Secondary Structure Design Difficulty. J Mol Biol 2016; 428:748-757. [PMID: 26902426 PMCID: PMC4833017 DOI: 10.1016/j.jmb.2015.11.013] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 08/16/2015] [Revised: 11/04/2015] [Accepted: 11/10/2015] [Indexed: 11/27/2022]
Abstract
Designing RNAs that form specific secondary structures is enabling better understanding and control of living systems through RNA-guided silencing, genome editing and protein organization. Little is known, however, about which RNA secondary structures might be tractable for downstream sequence design, increasing the time and expense of design efforts due to inefficient secondary structure choices. Here, we present insights into specific structural features that increase the difficulty of finding sequences that fold into a target RNA secondary structure, summarizing the design efforts of tens of thousands of human participants and three automated algorithms (RNAInverse, INFO-RNA and RNA-SSD) in the Eterna massive open laboratory. Subsequent tests through three independent RNA design algorithms (NUPACK, DSS-Opt and MODENA) confirmed the hypothesized importance of several features in determining design difficulty, including sequence length, mean stem length, symmetry and specific difficult-to-design motifs such as zigzags. Based on these results, we have compiled an Eterna100 benchmark of 100 secondary structure design challenges that span a large range in design difficulty to help test future efforts. Our in silico results suggest new routes for improving computational RNA design methods and for extending these insights to assess "designability" of single RNA structures, as well as of switches for in vitro and in vivo applications.
Affiliation(s)
- Vineet Kosaraju
  - Eterna Massive Open Laboratory; Department of Biochemistry, Stanford University, Stanford, CA 94305, USA
- Michelle Wu
  - Eterna Massive Open Laboratory; Program in Biomedical Informatics, Stanford University, Stanford, CA 94305, USA
- Justin Kong
  - Eterna Massive Open Laboratory; Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Jeehyung Lee
  - Eterna Massive Open Laboratory; Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Minjae Lee
  - Eterna Massive Open Laboratory; Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Adrien Treuille
  - Eterna Massive Open Laboratory; Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Rhiju Das
  - Eterna Massive Open Laboratory; Department of Biochemistry, Stanford University, Stanford, CA 94305, USA; Department of Physics, Stanford University, Stanford, CA 94305, USA
11
Zhou Q, Xia X, Luo Z, Liang H, Shakhnovich E. Searching the Sequence Space for Potent Aptamers Using SELEX in Silico. J Chem Theory Comput 2015; 11:5939-46. [PMID: 26642994 DOI: 10.1021/acs.jctc.5b00707] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Indexed: 12/19/2022]
Abstract
To isolate functional nucleic acids that bind to defined targets with high affinity and specificity, which are known as aptamers, the systematic evolution of ligands by exponential enrichment (SELEX) methodology has emerged as the preferred approach. Here, we propose a computational approach, SELEX in silico, that allows the sequence space to be more thoroughly explored regarding binding of a certain target. Our approach consists of two steps: (i) secondary structure-based sequence screening, which aims to collect the sequences that can form a desired RNA motif as an enhanced initial library, followed by (ii) sequence enrichment regarding target binding by molecular dynamics simulation-based virtual screening. Our SELEX in silico method provided a practical computational solution to three key problems in aptamer sequence searching: design of nucleic acid libraries, knowledge of sequence enrichment, and identification of potent aptamers. Six potent theophylline-binding aptamers, which were isolated by SELEX in silico from a sequence space containing 4^13 sequences, were experimentally verified to bind theophylline with high affinity: Kd ranging from 0.16 to 0.52 μM, compared with the dissociation constant of the original aptamer-theophylline, 0.32 μM. These results demonstrate the significant potential of SELEX in silico as a new method for aptamer discovery and optimization.
Affiliation(s)
- Qingtong Zhou
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, United States
- Xiaole Xia
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, United States
  - Key Laboratory of Industrial Biotechnology, Ministry of Education, School of Biotechnology, Jiangnan University, Wuxi, Jiangsu 214122, People's Republic of China
- Eugene Shakhnovich
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, United States
12
Wolfe BR, Pierce NA. Sequence Design for a Test Tube of Interacting Nucleic Acid Strands. ACS Synth Biol 2015; 4:1086-100. [PMID: 25329866 DOI: 10.1021/sb5002196] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.6] [Indexed: 01/22/2023]
Abstract
We describe an algorithm for designing the equilibrium base-pairing properties of a test tube of interacting nucleic acid strands. A target test tube is specified as a set of desired "on-target" complexes, each with a target secondary structure and target concentration, and a set of undesired "off-target" complexes, each with vanishing target concentration. Sequence design is performed by optimizing the test tube ensemble defect, corresponding to the concentration of incorrectly paired nucleotides at equilibrium evaluated over the ensemble of the test tube. To reduce the computational cost of accepting or rejecting mutations to a random initial sequence, the structural ensemble of each on-target complex is hierarchically decomposed into a tree of conditional subensembles, yielding a forest of decomposition trees. Candidate sequences are evaluated efficiently at the leaf level of the decomposition forest by estimating the test tube ensemble defect from conditional physical properties calculated over the leaf subensembles. As optimized subsequences are merged toward the root level of the forest, any emergent defects are eliminated via ensemble redecomposition and sequence reoptimization. After successfully merging subsequences to the root level, the exact test tube ensemble defect is calculated for the first time, explicitly checking for the effect of the previously neglected off-target complexes. Any off-target complexes that form at appreciable concentration are hierarchically decomposed, added to the decomposition forest, and actively destabilized during subsequent forest reoptimization. For target test tubes representative of design challenges in the molecular programming and synthetic biology communities, our test tube design algorithm typically succeeds in achieving a normalized test tube ensemble defect ≤1% at a design cost within an order of magnitude of the cost of test tube analysis.
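The objective named in this abstract builds on the ensemble defect of a single complex: the expected number of nucleotides whose pairing state differs from the target at equilibrium. The sketch below computes only that simpler single-structure quantity from a base-pairing probability matrix. It does not implement the concentration-weighted test tube ensemble defect or the hierarchical decomposition described in the paper, and the function names and the dot-bracket input format are assumptions made for illustration.

```python
def parse_dot_bracket(structure):
    """Return partner[i] = index paired with i, or None if i is unpaired."""
    partner = [None] * len(structure)
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def ensemble_defect(pair_probs, structure):
    """Expected number of nucleotides not in their target pairing state.

    pair_probs[i][j] is the equilibrium probability that i pairs with j.
    """
    partner = parse_dot_bracket(structure)
    defect = 0.0
    for i in range(len(structure)):
        if partner[i] is None:
            # Target says unpaired: correct with probability 1 - sum_j P[i][j].
            p_correct = 1.0 - sum(pair_probs[i])
        else:
            # Target says paired: correct only if i pairs with its target partner.
            p_correct = pair_probs[i][partner[i]]
        defect += 1.0 - p_correct
    return defect


def normalized_ensemble_defect(pair_probs, structure):
    """Ensemble defect divided by sequence length, a value in [0, 1]."""
    return ensemble_defect(pair_probs, structure) / len(structure)
```

In practice the pair probability matrix would come from a partition function calculation with a thermodynamic folding package; here it is simply an input.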
Affiliation(s)
- Brian R. Wolfe
  - Division of Biology and Biological Engineering and Division of Engineering and Applied Science, California Institute of Technology, Pasadena, California 91125, United States
- Niles A. Pierce
  - Division of Biology and Biological Engineering and Division of Engineering and Applied Science, California Institute of Technology, Pasadena, California 91125, United States
13
Jabbari H, Aminpour M, Montemagno C. Computational Approaches to Nucleic Acid Origami. ACS Comb Sci 2015; 17:535-47. [PMID: 26348196 DOI: 10.1021/acscombsci.5b00079] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Indexed: 12/28/2022]
Abstract
Recent advances in experimental DNA origami have dramatically expanded the horizon of DNA nanotechnology. Complex 3D suprastructures have been designed and developed using DNA origami with applications in biomaterial science, nanomedicine, nanorobotics, and molecular computation. Ribonucleic acid (RNA) origami has recently been realized as a new approach. Similar to DNA, RNA molecules can be designed to form complex 3D structures through complementary base pairings. RNA origami structures are, however, more compact and more thermodynamically stable due to RNA's non-canonical base pairing and tertiary interactions. With all these advantages, the development of RNA origami lags behind DNA origami by a large gap. Furthermore, although computational methods have proven to be effective in designing DNA and RNA origami structures and in their evaluation, advances in computational nucleic acid origami is even more limited. In this paper, we review major milestones in experimental and computational DNA and RNA origami and present current challenges in these fields. We believe collaboration between experimental nanotechnologists and computer scientists are critical for advancing these new research paradigms.
Affiliation(s)
- Hosna Jabbari
- Ingenuity Lab, 11421 Saskatchewan Drive, Edmonton, Alberta T6G 2M9, Canada
- Department of Chemical and Materials Engineering, University of Alberta, Edmonton T6G 2V4, Canada
| | - Maral Aminpour
- Ingenuity Lab, 11421 Saskatchewan Drive, Edmonton, Alberta T6G 2M9, Canada
- Department of Chemical and Materials Engineering, University of Alberta, Edmonton T6G 2V4, Canada
| | - Carlo Montemagno
- Ingenuity Lab, 11421 Saskatchewan Drive, Edmonton, Alberta T6G 2M9, Canada
- Department of Chemical and Materials Engineering, University of Alberta, Edmonton T6G 2V4, Canada
| |
|
14
|
Taneda A. Multi-objective optimization for RNA design with multiple target secondary structures. BMC Bioinformatics 2015; 16:280. [PMID: 26335276 PMCID: PMC4559319 DOI: 10.1186/s12859-015-0706-x] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 11/23/2014] [Accepted: 08/17/2015] [Indexed: 12/24/2022] Open
Abstract
Background: RNAs are attractive molecules as biological parts for synthetic biology. In particular, the capacity for conformational change, which can be encoded in designer RNAs, enables us to create multistable molecular switches that function in biological circuits. Although various algorithms for designing such RNA switches have been proposed, previous algorithms optimize RNA sequences against a weighted sum of objective functions, using empirical weights among the objective functions. In addition, an RNA design algorithm for multiple pseudoknot targets is currently not available. Results: We developed a novel computational tool, MODENA, for automatically designing RNA sequences that fold into multiple target secondary structures. Our algorithm designs RNA sequences with a multi-objective genetic algorithm, which lets us explore RNA sequences with good objective function values without empirical weight parameters among the objective functions. This weight-free nature gives the algorithm great flexibility. We benchmarked our multi-target RNA design algorithm on datasets with two, three, and four target structures and found that it shows better or comparable design performance compared with the previous algorithms, RNAdesign and Frnakenstein. In addition to the benchmarks on pseudoknot-free datasets, we benchmarked MODENA on two-target pseudoknot datasets and found that it can design RNAs that have the target pseudoknotted secondary structures, with free energies close to the lowest free energy. Moreover, we applied our algorithm to a ribozyme-based ON-switch that takes a ribozyme-inactive secondary structure when the theophylline aptamer structure is assumed. Conclusions: Currently, MODENA is the only RNA design software that can be applied to multiple pseudoknot targets. Successful design results for the multiple targets and an RNA device indicate the usefulness of our multi-objective RNA design algorithm. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0706-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Akito Taneda
- Graduate School of Science and Technology, Hirosaki University, 3 Bunkyo-cho, Hirosaki, Aomori, Japan.
| |
|
15
|
Abstract
In this chapter, we review both computational and experimental aspects of de novo RNA sequence design. We give an overview of currently available design software and its limitations, and discuss the setup necessary to experimentally validate proper function in vitro and in vivo. We focus on transcription-regulating riboswitches, a task that has only recently led to the first successful designs of such RNA elements.
Affiliation(s)
- Sven Findeiß
- Research Group Bioinformatics and Computational Biology, Faculty of Computer Science, University of Vienna, Vienna, Austria; Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria
| | - Manja Wachsmuth
- Institute for Biochemistry, University of Leipzig, Leipzig, Germany
| | - Mario Mörl
- Institute for Biochemistry, University of Leipzig, Leipzig, Germany.
| | - Peter F Stadler
- Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria; Bioinformatics Group, Department of Computer Science and the Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany; Center for RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark; Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany; Fraunhofer Institute for Cell Therapy and Immunology, Leipzig, Germany; Santa Fe Institute, Santa Fe, New Mexico, USA
| |
|