1. Evolving methods for rational de novo design of functional RNA molecules. Methods 2019; 161:54-63. [PMID: 31059832; DOI: 10.1016/j.ymeth.2019.04.022] (Received 01/30/2019; revised 04/26/2019; accepted 04/29/2019)
Abstract
Artificial RNA molecules with novel functionality have many applications in synthetic biology, pharmacy and white biotechnology. The de novo design of such devices using computational methods and prediction tools is a resource-efficient alternative to experimental screening and selection pipelines. In this review, we describe methods common to many such computational approaches, thoroughly dissect these methods and highlight open questions for the individual steps. Initially, it is essential to investigate the biological target system, the regulatory mechanism that will be exploited, as well as the desired components in order to define design objectives. Subsequent computational design is needed to combine the selected components and to obtain novel functionality. This process can usually be split into constrained sequence sampling, the formulation of an optimization problem and an in silico analysis to narrow down the number of candidates with respect to secondary goals. Finally, experimental analysis is important to check whether the defined design objectives are indeed met in the target environment and detailed characterization experiments should be performed to improve the mechanistic models and detect missing design requirements.
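The constrained sequence sampling step named above can be illustrated with a small sketch. It makes three assumptions: the target is a plain dot-bracket string with no pseudoknots, only Watson-Crick plus G-U pairs are allowed, and a per-position constraint string uses `N` to mean "any base". Under those assumptions it draws a random sequence that is at least compatible with every target pair. The helper names `pair_table` and `sample_compatible` are hypothetical and are not taken from any of the tools discussed.

```python
import random

# Assumed pairing rules: Watson-Crick plus G-U wobble.
PAIRS = [("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")]
BASES = "ACGU"


def pair_table(structure):
    """Partner index for each position of a dot-bracket string (-1 if unpaired)."""
    stack = []
    table = [-1] * len(structure)
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i}")
            j = stack.pop()
            table[i], table[j] = j, i
        elif ch != ".":
            raise ValueError(f"unsupported character {ch!r}")
    if stack:
        raise ValueError("unbalanced '('")
    return table


def sample_compatible(structure, constraint, rng):
    """Sample a sequence that can form every target pair and respects the
    per-position constraint ('N' = any base, otherwise that fixed base)."""
    table = pair_table(structure)
    seq = [""] * len(structure)
    for i, j in enumerate(table):
        if seq[i]:
            continue  # already assigned as the partner of an earlier position
        if j == -1:
            options = BASES if constraint[i] == "N" else constraint[i]
            seq[i] = rng.choice(options)
        else:
            options = [p for p in PAIRS
                       if constraint[i] in ("N", p[0]) and constraint[j] in ("N", p[1])]
            if not options:
                raise ValueError(f"constraints forbid the pair ({i}, {j})")
            seq[i], seq[j] = rng.choice(options)
    return "".join(seq)
```

For example, `sample_compatible("((((...))))", "G" + "N" * 10, random.Random(0))` returns an 11-nt sequence that starts with G and can close all four target pairs. Whether it actually folds into the target still has to be checked with a structure predictor.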
2. Bellaousov S, Kayedkhordeh M, Peterson RJ, Mathews DH. Accelerated RNA secondary structure design using preselected sequences for helices and loops. RNA 2018; 24:1555-1567. [PMID: 30097542; PMCID: PMC6191713; DOI: 10.1261/rna.066324.118] (Received 03/07/2018; accepted 08/06/2018)
Abstract
Nucleic acids can be designed to be nano-machines, pharmaceuticals, or probes. RNA secondary structures can form the basis of self-assembling nanostructures. Because there are only four natural RNA bases, it can be difficult to design sequences that fold into a single, specified structure: many other structures are often possible for a given sequence. One approach taken by state-of-the-art sequence design methods is to select sequences that fold into the specified structure using stochastic, iterative refinement. The goal of this work is to accelerate design. Many existing iterative methods select and refine sequences one base pair and one unpaired nucleotide at a time. Here, the hypothesis that sequences can be preselected in order to accelerate design was tested. To this end, a database was built of helix sequences that demonstrate thermodynamic features found in natural sequences and that also have little tendency to cross-hybridize. Additionally, a database was assembled of RNA loop sequences with low helix-formation propensity and little tendency to cross-hybridize with either the helices or other loops. These databases of preselected sequences accelerate the selection of sequences that fold with minimal ensemble defect by replacing some of the trial and error of current refinement approaches. When the database of preselected sequences is used instead of randomly chosen sequences, sequences for natural structures are designed 36 times faster, and sequences for random structures six times faster. The sequences selected with the aid of the database have ensemble defects similar to those of sequences selected at random. The sequence database is part of the RNAstructure package at http://rna.urmc.rochester.edu/RNAstructure.html.
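The preselection idea can be sketched as a greedy filter over candidate helix strands. The sketch assumes a crude cross-hybridization score: the longest run of consecutive antiparallel complementary pairs (G-U included) between two strands. It also assumes an arbitrary threshold `max_run`. The actual construction of the RNAstructure database also uses thermodynamic criteria that are not reproduced here, and both function names are hypothetical.

```python
# Assumed complementarity rules: Watson-Crick plus G-U wobble.
COMPLEMENT = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def longest_antiparallel_run(a, b):
    """Longest stretch in which a[i], a[i+1], ... pair with b[j], b[j-1], ...
    (both strands written 5'->3')."""
    best = 0
    prev = [0] * (len(b) + 1)  # run lengths from the previous row of a
    for i in range(len(a)):
        cur = [0] * (len(b) + 1)
        for j in range(len(b)):
            if (a[i], b[j]) in COMPLEMENT:
                # extend the run ending at (i-1, j+1): antiparallel diagonal
                cur[j] = prev[j + 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def preselect(candidates, max_run):
    """Greedily keep candidates whose cross-complementarity with every
    previously accepted strand is at most max_run (hypothetical criterion)."""
    accepted = []
    for cand in candidates:
        if all(longest_antiparallel_run(cand, other) <= max_run for other in accepted):
            accepted.append(cand)
    return accepted
```

A fuller filter would also score each strand against its own reverse complement and against loop sequences, as the abstract describes for the loop database.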
Affiliation(s)
- Stanislav Bellaousov
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York 14642, USA
- Mohammad Kayedkhordeh
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York 14642, USA
- David H Mathews
- Department of Biochemistry and Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York 14642, USA
- Department of Biostatistics and Computational Biology, University of Rochester Medical Center, Rochester, New York 14642, USA
3. Wolfe BR, Porubsky NJ, Zadeh JN, Dirks RM, Pierce NA. Constrained Multistate Sequence Design for Nucleic Acid Reaction Pathway Engineering. J Am Chem Soc 2017; 139:3134-3144. [DOI: 10.1021/jacs.6b12693]
Affiliation(s)
- Brian R. Wolfe
- Division of Biology & Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Nicholas J. Porubsky
- Division of Chemistry & Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Joseph N. Zadeh
- Division of Biology & Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Robert M. Dirks
- Division of Biology & Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Niles A. Pierce
- Division of Biology & Biological Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Division of Engineering & Applied Science, California Institute of Technology, Pasadena, California 91125, United States
- Weatherall Institute of Molecular Medicine, University of Oxford, Oxford OX3 9DS, United Kingdom
4. Zandi K, Butler G, Kharma N. An Adaptive Defect Weighted Sampling Algorithm to Design Pseudoknotted RNA Secondary Structures. Front Genet 2016; 7:129. [PMID: 27499762; PMCID: PMC4956659; DOI: 10.3389/fgene.2016.00129] (Received 02/06/2016; accepted 07/06/2016)
Abstract
Computational design of RNA sequences that fold into targeted secondary structures has many applications in biomedicine, nanotechnology and synthetic biology. An RNA molecule is made of different types of secondary structure elements, and one important element, the pseudoknot, plays a key role in stabilizing the functional form of the molecule. However, because pseudoknotted RNA structures are computationally complex to characterize, most existing RNA sequence design algorithms ignore this structural element, which limits their applications. In this paper we present a new algorithm to design RNA sequences for pseudoknotted secondary structures. We use NUPACK as the folding algorithm to compute the equilibrium characteristics of the pseudoknotted RNAs, and describe a new adaptive defect weighted sampling algorithm named Enzymer to design low ensemble defect RNA sequences for targeted secondary structures including pseudoknots. We used a biological data set of 201 pseudoknotted structures from the Pseudobase library to benchmark the performance of our algorithm. We compared the quality characteristics of the RNA sequences designed by Enzymer with the results obtained from the state-of-the-art MODENA and antaRNA. Our results show that our method succeeds more frequently than MODENA and antaRNA do, and generates sequences that have lower ensemble defect, lower probability defect and higher thermostability. Finally, by using Enzymer and constraining the design to a naturally occurring and highly conserved Hammerhead motif, we designed 8 sequences for a pseudoknotted cis-acting Hammerhead ribozyme. Enzymer is available for download at https://bitbucket.org/casraz/enzymer.
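The core move of defect-weighted sampling works as follows. Positions that contribute more to the ensemble defect are more likely to be mutated, and paired positions are mutated together so the target pair stays valid. This is a generic sketch, not Enzymer's code. It assumes that per-nucleotide defect contributions and a partner table (index of the target partner, or -1) are already available from a folding engine such as NUPACK. Enzymer's adaptive weighting scheme is not reproduced, and the function names are hypothetical.

```python
import random

# Assumed allowed pairs for mutating both ends of a target pair.
PAIRS = [("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")]


def pick_mutation_site(defects, rng):
    """Position chosen with probability proportional to its defect contribution."""
    if sum(defects) <= 0:
        return None  # no defect left to remove
    return rng.choices(range(len(defects)), weights=defects, k=1)[0]


def defect_weighted_mutation(seq, defects, partner, rng):
    """Mutate one defect-weighted position; mutate its partner too if it is paired."""
    pos = pick_mutation_site(defects, rng)
    if pos is None:
        return seq
    s = list(seq)
    j = partner[pos]
    if j == -1:
        s[pos] = rng.choice([b for b in "ACGU" if b != s[pos]])
    else:
        # replace the whole pair so the target pair remains a valid pair
        s[pos], s[j] = rng.choice([p for p in PAIRS if p != (s[pos], s[j])])
    return "".join(s)
```

In a full design loop, the mutated sequence would be re-evaluated and kept only if its ensemble defect does not increase.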
Affiliation(s)
- Kasra Zandi
- Computer Science Department, Concordia University, Montreal, QC, Canada
- Gregory Butler
- Computer Science Department, Concordia University, Montreal, QC, Canada
- Centre for Structural and Functional Genomics, Concordia University, Montreal, QC, Canada
- Nawwaf Kharma
- Centre for Structural and Functional Genomics, Concordia University, Montreal, QC, Canada
- Electrical and Computer Engineering Department, Concordia University, Montreal, QC, Canada
5. Kleinkauf R, Houwaart T, Backofen R, Mann M. antaRNA--Multi-objective inverse folding of pseudoknot RNA using ant-colony optimization. BMC Bioinformatics 2015; 16:389. [PMID: 26581440; PMCID: PMC4652366; DOI: 10.1186/s12859-015-0815-6] (Received 07/20/2015; accepted 10/20/2015)
Abstract
Background Many functional RNA molecules fold into pseudoknot structures, which are often essential for the formation of an RNA's 3D structure. Current methods for designing RNA molecules that fold into a specific structure (known as RNA inverse folding) for biotechnological applications cannot incorporate pseudoknots into the design. Hairpin (H-type) and kissing hairpin (K-type) pseudoknots cover a wide range of biologically functional pseudoknots and can be represented at the secondary structure level. Results The RNA inverse folding program antaRNA, which takes a secondary structure, a target GC content and sequence constraints as input, is extended to provide solutions for such H- and K-type pseudoknotted secondary structure constraints. We demonstrate the easy and flexible interchangeability of modules within the antaRNA framework by incorporating pKiss, a structure prediction tool capable of predicting these pseudoknot types. The performance of the approach is demonstrated on a subset of the Pseudobase++ dataset. Conclusions This new service is available as a standalone version and as part of the Freiburg RNA Tools web service. Furthermore, antaRNA is available in Galaxy and is part of the RNA-workbench Docker image.
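Below is a toy version of ant-colony sequence design. It is in the spirit of antaRNA but is not its actual algorithm. Pheromone values per position and nucleotide bias the sampling. After each round they evaporate by a factor `rho`, and the best sequence so far is reinforced. The placeholder cost counts non-complementary target pairs plus the deviation from the target GC content, and it performs no folding. antaRNA instead evaluates candidates with a structure predictor (pKiss in the pseudoknot setting described above). The brackets `()`, `[]` and `{}` are accepted so that H- and K-type pseudoknot targets can be written down. All names and parameter values are hypothetical.

```python
import random

PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}
BASES = "ACGU"


def pair_table(structure):
    """Partner index per position (-1 if unpaired); supports (), [] and {}."""
    closing = {")": "(", "]": "[", "}": "{"}
    stacks = {"(": [], "[": [], "{": []}
    table = [-1] * len(structure)
    for i, ch in enumerate(structure):
        if ch in stacks:
            stacks[ch].append(i)
        elif ch in closing:
            j = stacks[closing[ch]].pop()
            table[i], table[j] = j, i
    return table


def cost(seq, table, target_gc):
    """Placeholder objective: non-complementary target pairs + |GC - target|."""
    bad_pairs = sum(1 for i, j in enumerate(table)
                    if j > i and (seq[i], seq[j]) not in PAIRS)
    gc = sum(b in "GC" for b in seq) / len(seq)
    return bad_pairs + abs(gc - target_gc)


def ant_colony(structure, target_gc, n_ants=20, n_iter=50, rho=0.1, seed=0):
    rng = random.Random(seed)
    table = pair_table(structure)
    n = len(structure)
    tau = [[1.0] * len(BASES) for _ in range(n)]  # pheromone per position and base
    best_seq, best_cost = None, float("inf")
    for _ in range(n_iter):
        for _ in range(n_ants):
            # each ant picks every base independently, biased by pheromone
            seq = "".join(rng.choices(BASES, weights=tau[i])[0] for i in range(n))
            c = cost(seq, table, target_gc)
            if c < best_cost:
                best_seq, best_cost = seq, c
        for i in range(n):
            tau[i] = [t * (1.0 - rho) for t in tau[i]]  # evaporation
            tau[i][BASES.index(best_seq[i])] += 1.0 / (1.0 + best_cost)  # reinforcement
    return best_seq, best_cost
```

For example, `ant_colony("((..[[..))..]]", target_gc=0.5)` targets an H-type pseudoknot. Sampling each position independently is a simplification. A real design tool would couple the two ends of each pair and score candidates with the predictor's folded structure.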
Affiliation(s)
- Robert Kleinkauf
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, Freiburg, 79110, Germany.
- Torsten Houwaart
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, Freiburg, 79110, Germany.
- Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, Freiburg, 79110, Germany; Center for Biological Signaling Studies (BIOSS), University of Freiburg, Freiburg, Germany; Center for Biological Systems Analysis (ZBSA), University of Freiburg, Freiburg, Germany; Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, Frederiksberg C, 1870, Denmark.
- Martin Mann
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Köhler-Allee 106, Freiburg, 79110, Germany.
6. Wolfe BR, Pierce NA. Sequence Design for a Test Tube of Interacting Nucleic Acid Strands. ACS Synth Biol 2015; 4:1086-100. [PMID: 25329866; DOI: 10.1021/sb5002196]
Abstract
We describe an algorithm for designing the equilibrium base-pairing properties of a test tube of interacting nucleic acid strands. A target test tube is specified as a set of desired "on-target" complexes, each with a target secondary structure and target concentration, and a set of undesired "off-target" complexes, each with vanishing target concentration. Sequence design is performed by optimizing the test tube ensemble defect, corresponding to the concentration of incorrectly paired nucleotides at equilibrium evaluated over the ensemble of the test tube. To reduce the computational cost of accepting or rejecting mutations to a random initial sequence, the structural ensemble of each on-target complex is hierarchically decomposed into a tree of conditional subensembles, yielding a forest of decomposition trees. Candidate sequences are evaluated efficiently at the leaf level of the decomposition forest by estimating the test tube ensemble defect from conditional physical properties calculated over the leaf subensembles. As optimized subsequences are merged toward the root level of the forest, any emergent defects are eliminated via ensemble redecomposition and sequence reoptimization. After successfully merging subsequences to the root level, the exact test tube ensemble defect is calculated for the first time, explicitly checking for the effect of the previously neglected off-target complexes. Any off-target complexes that form at appreciable concentration are hierarchically decomposed, added to the decomposition forest, and actively destabilized during subsequent forest reoptimization. For target test tubes representative of design challenges in the molecular programming and synthetic biology communities, our test tube design algorithm typically succeeds in achieving a normalized test tube ensemble defect ≤1% at a design cost within an order of magnitude of the cost of test tube analysis.
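The test tube ensemble defect can be written as a sum over on-target complexes of two terms. The structural term counts incorrectly paired nucleotides within complexes that do form. The concentration term counts nucleotides missing because a complex is below its target concentration. The sketch below follows our reading of that definition, which should be checked against the paper before reuse. It takes as given each complex's ensemble defect n(φ_j, s_j), its size |φ_j|, its equilibrium concentration x_j and its target concentration y_j, and it normalizes by the total target nucleotide concentration. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OnTargetComplex:
    size: int               # |phi_j|: number of nucleotides in the target complex
    ensemble_defect: float  # n(phi_j, s_j): expected mispaired nucleotides per complex
    conc: float             # x_j: equilibrium concentration
    target_conc: float      # y_j: target concentration


def tube_defect(complexes):
    """Concentration of incorrectly paired nucleotides over on-target complexes."""
    total = 0.0
    for c in complexes:
        structural = c.ensemble_defect * min(c.conc, c.target_conc)
        missing = c.size * max(c.target_conc - c.conc, 0.0)
        total += structural + missing
    return total


def normalized_tube_defect(complexes):
    """Divide by the total target nucleotide concentration, giving a value in [0, 1]."""
    y_nt = sum(c.size * c.target_conc for c in complexes)
    return tube_defect(complexes) / y_nt
```

With this form, the 1% design goal quoted above corresponds to `normalized_tube_defect(...) <= 0.01`. Off-target complexes enter only indirectly, by lowering the on-target concentrations x_j.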
Affiliation(s)
- Brian R. Wolfe
- Division of Biology and Biological Engineering and Division of Engineering and Applied Science, California Institute of Technology, Pasadena, California 91125, United States
- Niles A. Pierce
- Division of Biology and Biological Engineering and Division of Engineering and Applied Science, California Institute of Technology, Pasadena, California 91125, United States
7. Jabbari H, Aminpour M, Montemagno C. Computational Approaches to Nucleic Acid Origami. ACS Comb Sci 2015; 17:535-47. [PMID: 26348196; DOI: 10.1021/acscombsci.5b00079]
Abstract
Recent advances in experimental DNA origami have dramatically expanded the horizon of DNA nanotechnology. Complex 3D suprastructures have been designed and developed using DNA origami, with applications in biomaterial science, nanomedicine, nanorobotics, and molecular computation. Ribonucleic acid (RNA) origami has recently been realized as a new approach. Like DNA, RNA molecules can be designed to form complex 3D structures through complementary base pairing. RNA origami structures are, however, more compact and more thermodynamically stable due to RNA's non-canonical base pairing and tertiary interactions. Despite these advantages, the development of RNA origami lags far behind that of DNA origami. Furthermore, although computational methods have proven effective in designing and evaluating DNA and RNA origami structures, advances in computational nucleic acid origami are even more limited. In this paper, we review major milestones in experimental and computational DNA and RNA origami and present current challenges in these fields. We believe collaboration between experimental nanotechnologists and computer scientists is critical for advancing these new research paradigms.
Affiliation(s)
- Hosna Jabbari
- Ingenuity Lab, 11421 Saskatchewan Drive, Edmonton, Alberta T6G 2M9, Canada
- Department of Chemical and Materials Engineering, University of Alberta, Edmonton T6G 2V4, Canada
- Maral Aminpour
- Ingenuity Lab, 11421 Saskatchewan Drive, Edmonton, Alberta T6G 2M9, Canada
- Department of Chemical and Materials Engineering, University of Alberta, Edmonton T6G 2V4, Canada
- Carlo Montemagno
- Ingenuity Lab, 11421 Saskatchewan Drive, Edmonton, Alberta T6G 2M9, Canada
- Department of Chemical and Materials Engineering, University of Alberta, Edmonton T6G 2V4, Canada
8. Dotu I, Garcia-Martin JA, Slinger BL, Mechery V, Meyer MM, Clote P. Complete RNA inverse folding: computational design of functional hammerhead ribozymes. Nucleic Acids Res 2014; 42:11752-62. [PMID: 25209235; PMCID: PMC4191386; DOI: 10.1093/nar/gku740]
Abstract
Nanotechnology and synthetic biology currently constitute one of the most innovative, interdisciplinary fields of research, poised to radically transform society in the 21st century. This paper concerns the synthetic design of ribonucleic acid molecules, using our recent algorithm, RNAiFold, which can determine all RNA sequences whose minimum free energy secondary structure is a user-specified target structure. Using RNAiFold, we design ten cis-cleaving hammerhead ribozymes, all of which are shown to be functional by a cleavage assay. We additionally use RNAiFold to design a functional cis-cleaving hammerhead as a modular unit of a synthetic larger RNA. Analysis of kinetics on this small set of hammerheads suggests that cleavage rate of computationally designed ribozymes may be correlated with positional entropy, ensemble defect, structural flexibility/rigidity and related measures. Artificial ribozymes have been designed in the past either manually or by SELEX (Systematic Evolution of Ligands by Exponential Enrichment); however, this appears to be the first purely computational design and experimental validation of novel functional ribozymes. RNAiFold is available at http://bioinformatics.bc.edu/clotelab/RNAiFold/.
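Two of the measures mentioned above, positional entropy and ensemble defect, can be computed directly from a base-pair probability matrix. The sketch assumes such a matrix is already available, for example from a partition function calculation, and uses a base-2 logarithm for positional entropy. It computes the ensemble defect as the expected number of nucleotides whose pairing state differs from the target. The function names are hypothetical, and the implementation used in the study may differ.

```python
import math


def positional_entropy(bpp):
    """Per-position Shannon entropy (bits) over 'paired with j' and 'unpaired' outcomes.

    bpp is an n x n symmetric list of base-pair probabilities; the diagonal is ignored.
    """
    n = len(bpp)
    result = []
    for i in range(n):
        probs = [bpp[i][j] for j in range(n) if j != i]
        probs.append(max(0.0, 1.0 - sum(probs)))  # probability that i is unpaired
        result.append(-sum(p * math.log2(p) for p in probs if p > 0))
    return result


def ensemble_defect(bpp, target):
    """Expected number of nucleotides paired differently from the target.

    target[i] is the partner of i in the target structure, or -1 if unpaired.
    """
    n = len(bpp)
    correct = 0.0
    for i, j in enumerate(target):
        if j == -1:
            correct += max(0.0, 1.0 - sum(bpp[i][k] for k in range(n) if k != i))
        else:
            correct += bpp[i][j]
    return n - correct
```

Averaging `positional_entropy(bpp)` over all positions gives a single flexibility score per design. Such a score could then be compared against measured cleavage rates.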
Affiliation(s)
- Ivan Dotu
- Biology Department, Boston College, 140 Commonwealth Avenue, Chestnut Hill, MA 02467, USA
- Betty L Slinger
- Biology Department, Boston College, 140 Commonwealth Avenue, Chestnut Hill, MA 02467, USA
- Vinodh Mechery
- Hofstra North Shore-LIJ School of Medicine, Hempstead, NY 11549, USA
- Michelle M Meyer
- Biology Department, Boston College, 140 Commonwealth Avenue, Chestnut Hill, MA 02467, USA
- Peter Clote
- Biology Department, Boston College, 140 Commonwealth Avenue, Chestnut Hill, MA 02467, USA
9. Esmaili-Taheri A, Ganjtabesh M, Mohammad-Noori M. Evolutionary solution for the RNA design problem. Bioinformatics 2014; 30:1250-8. [PMID: 24407223; DOI: 10.1093/bioinformatics/btu001]
Abstract
MOTIVATION RNAs play fundamental roles in cellular processes. The function of an RNA is highly dependent on its 3D conformation, which is referred to as the RNA tertiary structure. Because predicting or experimentally determining these structures is difficult, many studies focus on problems associated with RNA secondary structure. Here, we consider the RNA inverse folding problem, in which an RNA secondary structure is given as a target and the goal is to design an RNA sequence that folds into it. In this article, we introduce a new evolutionary algorithm for the RNA inverse folding problem. Our algorithm, called Evolutionary RNA Design, generates a sequence whose minimum free energy structure is identical to the target structure. RESULTS We compare our algorithm with INFO-RNA, MODENA, RNAiFold and NUPACK on several biological test sets. The results presented in this article indicate that for longer structures, our algorithm performs better than the other algorithms in terms of energy range, accuracy, speedup and nucleotide distribution. In particular, the RNA sequences generated by our method are much more reliable and more similar to natural RNA sequences.
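The basic loop of evolutionary inverse folding is: mutate, refold, and keep the candidate if it is no farther from the target. This loop can be sketched without a thermodynamic package by substituting Nussinov base-pair maximization for minimum free energy folding. That substitution is an assumption made only to keep the example self-contained. Evolutionary RNA Design uses an energy model and its own operators. The hairpin minimum of three unpaired nucleotides, the single-point mutation and the acceptance rule here are illustrative choices.

```python
import random

PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # minimum number of unpaired bases enclosed by a hairpin


def nussinov_fold(seq):
    """Stand-in for MFE folding: one structure with the maximum number of pairs."""
    n = len(seq)
    dp = [[0] * n for _ in range(n)]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: j unpaired
            for k in range(i, j - MIN_LOOP):  # case: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + dp[k + 1][j - 1] + 1)
            dp[i][j] = best
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:  # traceback of one optimal structure
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - MIN_LOOP):
            if (seq[k], seq[j]) in PAIRS:
                left = dp[i][k - 1] if k > i else 0
                if left + dp[k + 1][j - 1] + 1 == dp[i][j]:
                    structure[k], structure[j] = "(", ")"
                    stack.append((k + 1, j - 1))
                    stack.append((i, k - 1))
                    break
    return "".join(structure)


def pairs_of(structure):
    stack, pairs = [], set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs


def bp_distance(a, b):
    """Number of base pairs present in exactly one of the two structures."""
    return len(pairs_of(a) ^ pairs_of(b))


def evolve(target, generations=2000, seed=0):
    rng = random.Random(seed)
    seq = [rng.choice("ACGU") for _ in target]
    best = bp_distance(nussinov_fold("".join(seq)), target)
    for _ in range(generations):
        if best == 0:
            break
        child = seq[:]
        child[rng.randrange(len(child))] = rng.choice("ACGU")
        d = bp_distance(nussinov_fold("".join(child)), target)
        if d <= best:  # accept neutral moves to drift across plateaus
            seq, best = child, d
    return "".join(seq), best
```

Because base-pair maximization has many tied optima, this toy may plateau above distance 0. An energy-based folding model breaks most of those ties.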
Affiliation(s)
- Ali Esmaili-Taheri
- Department of Computer Science, School of Mathematics, Statistics, and Computer Science, University of Tehran, P. O. Box: 14155-6455, Tehran, Iran, Laboratoire d'Informatique (LIX), Ecole Polytechnique, 91128 Palaiseau CEDEX, France and School of Biological Science, Institute for Research in Fundamental Sciences (IPM), P.O. Box: 19395-5746 Tehran, Iran
10. Pitman DJ, Schenkelberg CD, Huang YM, Teets FD, DiTursi D, Bystroff C. Improving computational efficiency and tractability of protein design using a piecemeal approach. A strategy for parallel and distributed protein design. Bioinformatics 2013; 30:1138-1145. [PMID: 24371152; DOI: 10.1093/bioinformatics/btt735] (Received 08/29/2013; accepted 12/15/2013)
Abstract
MOTIVATION Accuracy in protein design requires a fine-grained rotamer search, multiple backbone conformations, and a detailed energy function, creating a burden in runtime and memory requirements. A design task may be split into manageable pieces in both three-dimensional space and in the rotamer search space to produce small, fast jobs that are easily distributed. However, these jobs must overlap, presenting a problem in resolving conflicting solutions in the overlap regions. RESULTS Piecemeal design, in which the design space is split into overlapping regions and rotamer search spaces, accelerates the design process whether jobs are run in series or in parallel. Large jobs that cannot fit in memory were made possible by splitting. Accepting the consensus amino acid selection in conflict regions led to non-optimal choices. Instead, conflicts were resolved using a second pass, in which the split regions were re-combined and designed as one, producing results that were closer to optimal with a minimal increase in runtime over the consensus strategy. Splitting the search space at the rotamer level instead of at the amino acid level further improved the efficiency by reducing the search space in the second pass. AVAILABILITY AND IMPLEMENTATION Programs for splitting protein design expressions are available at www.bioinfo.rpi.edu/tools/piecemeal.html CONTACT: bystrc@rpi.edu Supplementary information: Supplementary data are available at Bioinformatics online.
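The splitting step can be illustrated in one dimension. Positions are cut into overlapping windows, and each window is designed independently. Positions where overlapping windows disagree are then flagged for a second, joint design pass. This simplified sketch assumes a linear sequence of positions rather than the 3D regions and rotamer search spaces that the paper splits, and the function names are hypothetical.

```python
def split_overlapping(n, window, overlap):
    """Split positions 0..n-1 into windows of length `window` overlapping by `overlap`."""
    if not 0 <= overlap < window:
        raise ValueError("need 0 <= overlap < window")
    step = window - overlap
    windows = []
    start = 0
    while True:
        end = min(start + window, n)
        windows.append(range(start, end))
        if end == n:
            break
        start += step
    return windows


def conflicts(windows, solutions):
    """Positions where overlapping windows chose different residues."""
    choices = {}
    for win, sol in zip(windows, solutions):
        for pos, residue in zip(win, sol):
            choices.setdefault(pos, set()).add(residue)
    return sorted(p for p, residues in choices.items() if len(residues) > 1)
```

Following the abstract, the flagged positions would not be resolved by majority vote. Instead, the windows around them would be merged and redesigned as one job.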
Affiliation(s)
- Derek J Pitman
- Department of Biology, Rensselaer Polytechnic Institute, Troy, NY 12180; Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94143; Department of Computer Science and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
- Christian D Schenkelberg
- Department of Biology, Rensselaer Polytechnic Institute, Troy, NY 12180; Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94143; Department of Computer Science and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
- Yao-Ming Huang
- Department of Biology, Rensselaer Polytechnic Institute, Troy, NY 12180; Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94143; Department of Computer Science and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
- Frank D Teets
- Department of Biology, Rensselaer Polytechnic Institute, Troy, NY 12180; Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94143; Department of Computer Science and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
- Daniel DiTursi
- Department of Biology, Rensselaer Polytechnic Institute, Troy, NY 12180; Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94143; Department of Computer Science and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
- Christopher Bystroff
- Department of Biology, Rensselaer Polytechnic Institute, Troy, NY 12180; Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94143; Department of Computer Science and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
11. Viral IRES prediction system - a web server for prediction of the IRES secondary structure in silico. PLoS One 2013; 8:e79288. [PMID: 24223923; PMCID: PMC3818432; DOI: 10.1371/journal.pone.0079288] (Received 06/23/2013; accepted 09/22/2013)
Abstract
Internal ribosomal entry sites (IRESs) function as cap-independent translation initiation sites in eukaryotic cells. IRES elements have been applied as useful tools in bicistronic expression vectors. Current RNA structure prediction programs are unable to predict potential IRES elements precisely. We have designed a viral IRES prediction system (VIPS) to predict IRES secondary structures. To improve IRES prediction, VIPS can evaluate and predict all four groups of IRESs with higher accuracy. RNA secondary structure prediction, comparison, and pseudoknot prediction programs were combined into a three-stage procedure for VIPS. The backbone of VIPS includes the RNALfold program, which predicts local RNA secondary structures by the minimum free energy method; the RNA Align program, which compares predicted structures; and the pknotsRG program, which calculates pseudoknot structures. VIPS was evaluated using the UTR database, the IRES database and the Virus database, and its accuracy was assessed as 98.53%, 90.80%, 82.36% and 80.41% for IRES groups 1, 2, 3, and 4, respectively. This search approach for IRES structures should facilitate IRES-related studies. The VIPS online service is available at http://140.135.61.250/vips/.
12. Generation of RNA pseudoknot structures with topological genus filtration. Math Biosci 2013; 245:216-25. [DOI: 10.1016/j.mbs.2013.07.014] (Received 04/29/2013; revised 06/11/2013; accepted 07/12/2013)
13. Ganjtabesh M, Zare-Mirakabad F, Nowzari-Dalini A. Inverse RNA folding solution based on multi-objective genetic algorithm and Gibbs sampling method. EXCLI J 2013; 12:546-55. [PMID: 26933401; PMCID: PMC4763459] (Received 04/20/2013; accepted 05/02/2013)
Abstract
In living systems, RNAs perform important biological functions. The functional form of an RNA frequently requires a specific tertiary structure, whose scaffold is provided by secondary structure elements formed by hydrogen bonds within the molecule. Here, we concentrate on the inverse RNA folding problem: given an RNA secondary structure as a target, the goal is to design an RNA sequence whose structure is the same as (or very similar to) the target structure. Different heuristic search methods have been proposed for this problem. A common feature of these methods is that they use a folding algorithm to evaluate the accuracy of the designed RNA sequence during the generation process. Well-known folding algorithms take O(n^3) time, where n is the length of the RNA sequence. In this paper, we introduce a new algorithm called GGI-Fold, based on a multi-objective genetic algorithm and the Gibbs sampling method, for the inverse RNA folding problem. Our algorithm generates a sequence whose structure is the same as or very similar to the given target structure. The key feature of our method is that it never uses a folding algorithm to improve the quality of the generated sequences. We compare our algorithm with RNA-SSD on several biological test samples. In all test samples, our algorithm generates sequences whose structures are more stable than those generated by RNA-SSD.
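A multi-objective genetic algorithm keeps candidates that are not Pareto-dominated. The sketch below shows only that selection step, assuming every objective is to be minimized. The objectives GGI-Fold actually uses and its Gibbs sampling step are not reproduced, so the example scores are placeholders.

```python
def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and strictly
    better somewhere (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scored):
    """scored: list of (candidate, objective_vector); return non-dominated candidates."""
    return [cand for cand, f in scored
            if not any(dominates(g, f) for _, g in scored)]
```

For example, among the scores (1, 5), (2, 2), (3, 3) and (5, 1), only (3, 3) is dropped, because (2, 2) is better on both objectives. A vector never dominates itself, so each candidate can safely be compared against the whole list.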
Affiliation(s)
- M Ganjtabesh
- School of Mathematics, Statistics, and Computer Science, College of Science, University of Tehran, Tehran, Iran; School of Computer Science, Institute for Studies in Theoretical Physics and Mathematics (IPM), Tehran, Iran; Laboratoire d'Informatique (LIX), Ecole Polytechnique, Palaiseau CEDEX, 91128, France (corresponding author)
- F Zare-Mirakabad
- Department of Applied Mathematics, Faculty of Mathematics and Computer Science, Amirkabir University of Technology, Tehran, Iran
- A Nowzari-Dalini
- School of Mathematics, Statistics, and Computer Science, College of Science, University of Tehran, Tehran, Iran
14. Garcia-Martin JA, Clote P, Dotu I. RNAiFold: a web server for RNA inverse folding and molecular design. Nucleic Acids Res 2013; 41:W465-70. [PMID: 23700314; PMCID: PMC3692061; DOI: 10.1093/nar/gkt280]
Abstract
Synthetic biology and nanotechnology are poised to make revolutionary contributions to the 21st century. In this article, we describe a new web server to support in silico RNA molecular design. The server takes an input target RNA secondary structure together with optional constraints, such as requiring the GC content to lie within a certain range or requiring the numbers of strong (GC), weak (AU) and wobble (GU) base pairs to lie within certain ranges. From these, the RNAiFold web server determines one or more RNA sequences whose minimum free energy secondary structure is the target structure. RNAiFold provides access to two servers: RNA-CPdesign, which applies constraint programming, and RNA-LNSdesign, which applies the large neighborhood search heuristic and is therefore suited to larger input structures. Both servers can also solve the RNA inverse hybridization problem: given a representation of the desired hybridization structure, RNAiFold returns two sequences whose minimum free energy hybridization is the input target structure. The web server is publicly accessible at http://bioinformatics.bc.edu/clotelab/RNAiFold. Source code for the underlying algorithms, implemented in COMET and supported on Linux, can be downloaded from the server website.
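The kinds of constraints listed above can be expressed as a simple acceptance test on a candidate sequence. RNAiFold enforces such constraints inside its constraint programming and large neighborhood search solvers. This hypothetical checker only verifies a finished candidate against a dot-bracket target. It uses inclusive (low, high) ranges for GC content and for the numbers of strong, weak and wobble pairs.

```python
def pairs_of(structure):
    """Base pairs (i, j) of a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return pairs


PAIR_CLASS = {frozenset("GC"): "strong", frozenset("AU"): "weak", frozenset("GU"): "wobble"}


def in_range(value, bounds):
    low, high = bounds
    return low <= value <= high


def satisfies(seq, structure, gc_range, strong_range, weak_range, wobble_range):
    """True if every target pair is GC/AU/GU and all counts fall in their ranges."""
    counts = {"strong": 0, "weak": 0, "wobble": 0}
    for i, j in pairs_of(structure):
        kind = PAIR_CLASS.get(frozenset((seq[i], seq[j])))
        if kind is None:
            return False  # target pair is neither canonical nor wobble
        counts[kind] += 1
    gc = sum(b in "GC" for b in seq) / len(seq)
    return (in_range(gc, gc_range)
            and in_range(counts["strong"], strong_range)
            and in_range(counts["weak"], weak_range)
            and in_range(counts["wobble"], wobble_range))
```

A check like this covers only sequence composition. It says nothing about whether the target is actually the minimum free energy structure, which still requires folding the candidate.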
15
Lyngsø RB, Anderson JWJ, Sizikova E, Badugu A, Hyland T, Hein J. Frnakenstein: multiple target inverse RNA folding. BMC Bioinformatics 2012; 13:260. [PMID: 23043260 PMCID: PMC3534541 DOI: 10.1186/1471-2105-13-260] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.3] [Received: 07/07/2012] [Accepted: 09/23/2012] [Indexed: 11/10/2022]
Abstract
BACKGROUND RNA secondary structure prediction, or folding, is a classic problem in bioinformatics: given a sequence of nucleotides, the aim is to predict the base pairs formed in its three-dimensional conformation. The inverse problem of designing a sequence that folds into a particular target structure has only more recently received notable interest. With a growing appreciation and understanding of the functional and structural properties of RNA motifs, and a growing interest in utilising biomolecules in nano-scale designs, interest in the inverse RNA folding problem is bound to increase. However, whereas the RNA folding problem has an elegant and efficient algorithmic solution, the inverse RNA folding problem appears to be hard.
RESULTS In this paper we present a genetic algorithm approach to the inverse folding problem. The main aims of the development were to address the hitherto mostly ignored multi-target extension of the inverse folding problem, while simultaneously designing a method with superior performance as measured by the quality of the designed sequences. The genetic algorithm has been implemented as a Python program called Frnakenstein. It was benchmarked against four existing methods on several data sets totalling 769 real and predicted single-structure targets, and on 292 two-structure targets. It performed as well as or better than all existing methods at finding sequences that fold in silico into the target structure, without the heavy bias towards CG base pairs observed for all other top-performing methods. On the two-structure targets it also performed well, generating a perfect design for about 80% of the targets.
CONCLUSIONS Our method illustrates that successful designs for the inverse RNA folding problem do not necessarily have to rely on heavy biases in base pair and unpaired base distributions.
The design problem seems to become more difficult for larger structures when the targets are real structures, while no such deterioration was observed for predicted structures. Design for two-structure targets is considerably more difficult, but far from impossible, demonstrating the feasibility of automated design of artificial riboswitches. The Python implementation is available at http://www.stats.ox.ac.uk/research/genome/software/frnakenstein.
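A common way to score a candidate in an inverse-folding search, including the multi-target case, is the base-pair distance between each predicted structure and its target. The sketch below is a generic illustration, not Frnakenstein's actual fitness function: it assumes the predicted structures come from some external folding tool (not shown), one per target condition, and simply sums the distances. The function names are hypothetical. A total cost of zero corresponds to a perfect in silico design.

```python
def base_pairs(structure: str) -> set[tuple[int, int]]:
    """Set of (i, j) base pairs in a pseudoknot-free dot-bracket string."""
    stack: list[int] = []
    pairs: set[tuple[int, int]] = set()
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.add((stack.pop(), j))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {j}")
    if stack:
        raise ValueError("unmatched '(' in structure")
    return pairs


def bp_distance(predicted: str, target: str) -> int:
    """Number of base pairs present in exactly one of the two structures."""
    if len(predicted) != len(target):
        raise ValueError("structures must have the same length")
    # Symmetric difference: pairs that are missing plus pairs that are extra.
    return len(base_pairs(predicted) ^ base_pairs(target))


def multi_target_cost(predicted: list[str], targets: list[str]) -> int:
    """Sum of base-pair distances, one predicted structure per target."""
    if len(predicted) != len(targets):
        raise ValueError("need exactly one predicted structure per target")
    return sum(bp_distance(p, t) for p, t in zip(predicted, targets))


if __name__ == "__main__":
    # Hypothetical two-target case: the candidate matches the first target
    # but forms two unwanted pairs under the second condition.
    targets = ["((...))", "......."]
    predicted = ["((...))", "((...))"]
    print(multi_target_cost(predicted, targets))  # 2
```

Summing treats all targets as equally important; a weighted sum or a separate objective per target are equally plausible choices.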
Affiliation(s)
- Rune B Lyngsø
- Department of Statistics, University of Oxford, Oxford OX1 3TG, UK
- Elena Sizikova
- Department of Computer Science, University of Oxford, Oxford OX1 3QD, UK
- Amarendra Badugu
- ETH Zürich, Department of Biosystems Science and Engineering, 4058 Basel, Switzerland
- Tomas Hyland
- Mathematics Institute, University of Oxford, Oxford OX1 3LB, UK
- Jotun Hein
- Department of Statistics, University of Oxford, Oxford OX1 3TG, UK
16
Topological classification and enumeration of RNA structures by genus. J Math Biol 2012; 67:1261-78. [PMID: 23053535 DOI: 10.1007/s00285-012-0594-x] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Received: 05/21/2011] [Revised: 06/27/2012] [Indexed: 10/27/2022]
Abstract
Every RNA pseudoknot structure is naturally associated with a topological surface, which has a genus, so structures can be classified by genus. Building on earlier work of Harer and Zagier, we compute the generating function $D_{g,\sigma}(z) = \sum_n d_{g,\sigma}(n)\, z^n$ for the number $d_{g,\sigma}(n)$ of structures with $n$ nucleotides, fixed genus $g$ and minimum stack size $\sigma$ in which no two consecutive nucleotides are base-paired, and show that $D_{g,\sigma}(z)$ is algebraic. In particular, we prove that $d_{g,2}(n) \sim k_g\, n^{3(g-1/2)}\, \gamma_2^{\,n}$, where $\gamma_2 \approx 1.9685$. Thus, for stack size at least two, the genus enters only through the sub-exponential factor, and the slow growth rate compared to the number of RNA molecules implies the existence of neutral networks of distinct molecules with the same structure of any genus. Certain RNA structures called shapes are shown to be in natural one-to-one correspondence with the cells in the Penner-Strebel decomposition of Riemann's moduli space of a surface of genus $g$ with one boundary component, thus providing a link between RNA enumerative problems and the geometry of Riemann's moduli space.
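The genus of a given structure can be computed directly. The sketch below uses a standard fatgraph construction, assumed here rather than taken from the paper: collapsing the backbone to a single vertex leaves a one-vertex surface with $k$ arcs and $r$ boundary components, so Euler's formula $1 - k + r = 2 - 2g$ gives $g = (k - r + 1)/2$. Here $r$ is counted as the number of cycles of the permutation that steps to the next arc endpoint along the backbone (wrapping around) and then jumps to that endpoint's partner. The function names and the multi-bracket dot-bracket input (`()`, `[]`, `{}`, `<>`) are illustrative assumptions.

```python
BRACKETS = {"(": ")", "[": "]", "{": "}", "<": ">"}


def arcs(structure: str) -> list[tuple[int, int]]:
    """Parse a dot-bracket string that may use several bracket types."""
    closers = {close: open_ for open_, close in BRACKETS.items()}
    stacks: dict[str, list[int]] = {open_: [] for open_ in BRACKETS}
    pairs: list[tuple[int, int]] = []
    for j, ch in enumerate(structure):
        if ch in BRACKETS:
            stacks[ch].append(j)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {j}")
            pairs.append((stack.pop(), j))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {j}")
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return pairs


def genus(structure: str) -> int:
    """Genus of the surface associated with an RNA structure."""
    pairs = arcs(structure)
    k = len(pairs)
    if k == 0:
        return 0  # no arcs: the structure is planar
    # Unpaired bases do not affect the topology, so keep only arc endpoints.
    ends = sorted(pos for pair in pairs for pos in pair)
    index = {pos: idx for idx, pos in enumerate(ends)}
    m = len(ends)  # equals 2k
    partner = [0] * m
    for i, j in pairs:
        partner[index[i]], partner[index[j]] = index[j], index[i]
    # Count cycles of "move to next endpoint, then jump to its partner".
    seen = [False] * m
    r = 0
    for start in range(m):
        if seen[start]:
            continue
        r += 1
        h = start
        while not seen[h]:
            seen[h] = True
            h = partner[(h + 1) % m]
    return (k - r + 1) // 2


if __name__ == "__main__":
    for s in ["((...))", "..((..[[..))..]]..", "([)(])", "([{<)]}>"]:
        print(s, genus(s))
```

Secondary structures without crossings come out as genus 0, while an H-type pseudoknot or a kissing hairpin gives genus 1, consistent with the classification by genus described in the abstract.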
17
Taneda A. Multi-objective genetic algorithm for pseudoknotted RNA sequence design. Front Genet 2012; 3:36. [PMID: 22558001 PMCID: PMC3337422 DOI: 10.3389/fgene.2012.00036] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 11/10/2011] [Accepted: 02/25/2012] [Indexed: 01/28/2023]
Abstract
RNA inverse folding is a computational technique for designing RNA sequences that fold into a user-specified secondary structure. Although pseudoknots are functionally important motifs in RNA structures, fewer studies have addressed the inverse folding of pseudoknotted RNAs than pseudoknot-free RNA design. In this paper, we present a new version of MODENA, the multi-objective genetic algorithm (MOGA) we previously proposed for pseudoknot-free RNA inverse folding. In the new version of MODENA, (i) a new crossover operator is implemented and (ii) the pseudoknot prediction methods IPknot and HotKnots are used to evaluate the designed RNA sequences, allowing us to perform the inverse folding of pseudoknotted RNAs. The new version of MODENA with the new crossover operator was benchmarked on a dataset of natural pseudoknotted RNA secondary structures, and we found that MODENA can successfully design more pseudoknotted RNAs than the other pseudoknot design algorithm. In addition, a newly implemented sequence constraint function was tested by designing RNA sequences that fold into the pseudoknotted structure of a hepatitis delta virus ribozyme; as a result, we successfully designed eight RNA sequences. The new version of MODENA is downloadable from http://rna.eit.hirosaki-u.ac.jp/modena/.
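Multi-objective genetic algorithms typically compare candidates by Pareto dominance rather than by a single combined score. The sketch below is a generic, minimal illustration of that idea, assuming every objective is to be minimized; the objective values are made-up placeholders, the function names are hypothetical, and the peeling of successive non-dominated fronts is the textbook scheme, not necessarily MODENA's exact selection procedure.

```python
from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    if len(a) != len(b):
        raise ValueError("objective vectors must have the same length")
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scores: Sequence[Sequence[float]]) -> list[int]:
    """Indices of candidates that no other candidate dominates."""
    return [
        i for i, a in enumerate(scores)
        if not any(dominates(b, a) for j, b in enumerate(scores) if j != i)
    ]


def nondominated_ranks(scores: Sequence[Sequence[float]]) -> list[int]:
    """Rank 0 for the first front, 1 for the front left after removing it, etc."""
    remaining = set(range(len(scores)))
    ranks = [0] * len(scores)
    level = 0
    while remaining:
        front = [
            i for i in remaining
            if not any(dominates(scores[j], scores[i]) for j in remaining if j != i)
        ]
        for i in front:
            ranks[i] = level
        remaining -= set(front)
        level += 1
    return ranks


if __name__ == "__main__":
    # Placeholder objective pairs for four hypothetical candidate sequences,
    # e.g. (structure mismatch, energy-based penalty), both minimized.
    scores = [(3, 1.0), (2, 2.0), (4, 0.5), (3, 2.5)]
    print(pareto_front(scores))
    print(nondominated_ranks(scores))
```

Selection can then prefer lower ranks, so that sequences trading off the objectives differently survive side by side instead of being collapsed into one weighted score.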
Affiliation(s)
- Akito Taneda
- Graduate School of Science and Technology, Hirosaki University, Hirosaki, Japan