1
Matsuda T, Hori H, Yamagami R. Rational design of oligonucleotides for enhanced in vitro transcription of small RNA. RNA (NEW YORK, N.Y.) 2024; 30:710-727. [PMID: 38423625 PMCID: PMC11098460 DOI: 10.1261/rna.079923.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2023] [Accepted: 02/14/2024] [Indexed: 03/02/2024]
Abstract
All kinds of RNA molecules can be produced by in vitro transcription with T7 RNA polymerase from DNA templates obtained by solid-phase chemical synthesis, primer extension, PCR, or DNA cloning. The oligonucleotide design, however, is a challenge for nonexperts because it relies on a set of rules that have been established empirically over time. Here, we describe a Python program to facilitate the rational design of oligonucleotides, calculated with kinetic parameters for enhanced in vitro transcription (ROCKET). The Python tool uses thermodynamic parameters, performs folding-energy calculations, and selects oligonucleotides suitable for the polymerase extension reaction. These oligonucleotides improve yields of template DNA. With the oligonucleotides selected by the program, the tRNA transcripts can be prepared by a one-pot reaction of the DNA polymerase extension reaction and the transcription reaction. Also, the ROCKET-selected oligonucleotides provide greater transcription yields than oligonucleotides selected by Primerize, a leading software for designing oligonucleotides for in vitro transcription, due to the enhancement of template DNA synthesis. Apart from over 50 tRNA genes tested, an in vitro transcribed self-cleaving ribozyme was found to have catalytic activity. In addition, the program can be applied to the synthesis of mRNA, demonstrating the wide applicability of the ROCKET software.
MESH Headings
- Transcription, Genetic
- Oligonucleotides/chemistry
- Oligonucleotides/genetics
- Oligonucleotides/chemical synthesis
- Software
- DNA-Directed RNA Polymerases/metabolism
- DNA-Directed RNA Polymerases/genetics
- RNA, Catalytic/genetics
- RNA, Catalytic/metabolism
- RNA, Catalytic/chemistry
- Thermodynamics
- RNA, Transfer/genetics
- RNA, Transfer/chemistry
- RNA, Transfer/metabolism
- Kinetics
- RNA, Messenger/genetics
- RNA, Messenger/chemistry
- RNA, Messenger/metabolism
- Viral Proteins/genetics
- Viral Proteins/metabolism
Affiliation(s)
- Teppei Matsuda
  - Department of Applied Chemistry, Graduate School of Science and Engineering, Ehime University, Matsuyama, Ehime 790-8577, Japan
- Hiroyuki Hori
  - Department of Applied Chemistry, Graduate School of Science and Engineering, Ehime University, Matsuyama, Ehime 790-8577, Japan
- Ryota Yamagami
  - Department of Applied Chemistry, Graduate School of Science and Engineering, Ehime University, Matsuyama, Ehime 790-8577, Japan
2
Sumi S, Hamada M, Saito H. Deep generative design of RNA family sequences. Nat Methods 2024; 21:435-443. [PMID: 38238559 DOI: 10.1038/s41592-023-02148-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2023] [Accepted: 12/07/2023] [Indexed: 03/13/2024]
Abstract
RNA engineering has immense potential to drive innovation in biotechnology and medicine. Despite its importance, a versatile platform for the automated design of functional RNA is still lacking. Here, we propose RNA family sequence generator (RfamGen), a deep generative model that designs RNA family sequences in a data-efficient manner by explicitly incorporating alignment and consensus secondary structure information. RfamGen can generate novel and functional RNA family sequences by sampling points from a semantically rich and continuous representation. We have experimentally demonstrated the versatility of RfamGen using diverse RNA families. Furthermore, we confirmed the high success rate of RfamGen in designing functional ribozymes through a quantitative massively parallel assay. Notably, RfamGen successfully generates artificial sequences with higher activity than natural sequences. Overall, RfamGen significantly improves our ability to design functional RNA and opens up new potential for generative RNA engineering in synthetic biology.
Affiliation(s)
- Shunsuke Sumi
  - Department of Life Science Frontiers, Center for iPS Cell Research and Application (CiRA), Kyoto University, Kyoto, Japan
  - Graduate School of Medicine, Kyoto University, Kyoto, Japan
  - Graduate School of Advanced Science and Engineering, Waseda University, Tokyo, Japan
- Michiaki Hamada
  - Graduate School of Advanced Science and Engineering, Waseda University, Tokyo, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
  - Graduate School of Medicine, Nippon Medical School, Tokyo, Japan
- Hirohide Saito
  - Department of Life Science Frontiers, Center for iPS Cell Research and Application (CiRA), Kyoto University, Kyoto, Japan
  - Graduate School of Medicine, Kyoto University, Kyoto, Japan
3
Pham TM, Miffin T, Sun H, Sharp KK, Wang X, Zhu M, Hoshika S, Peterson RJ, Benner SA, Kahn JD, Mathews DH. DNA Structure Design Is Improved Using an Artificially Expanded Alphabet of Base Pairs Including Loop and Mismatch Thermodynamic Parameters. ACS Synth Biol 2023; 12:2750-2763. [PMID: 37671922 PMCID: PMC10510751 DOI: 10.1021/acssynbio.3c00358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2023] [Indexed: 09/07/2023]
Abstract
We show that in silico design of DNA secondary structures is improved by extending the base pairing alphabet beyond A-T and G-C to include the pair between 2-amino-8-(1'-β-d-2'-deoxyribofuranosyl)-imidazo-[1,2-a]-1,3,5-triazin-(8H)-4-one and 6-amino-3-(1'-β-d-2'-deoxyribofuranosyl)-5-nitro-(1H)-pyridin-2-one, abbreviated as P and Z. To obtain the thermodynamic parameters needed to include P-Z pairs in the designs, we performed 47 optical melting experiments and combined the results with previous work to fit free energy and enthalpy nearest neighbor folding parameters for P-Z pairs and G-Z wobble pairs. We find G-Z pairs have stability comparable to that of A-T pairs and should therefore be included as base pairs in structure prediction and design algorithms. Additionally, we extrapolated the set of loop, terminal mismatch, and dangling end parameters to include the P and Z nucleotides. These parameters were incorporated into the RNAstructure software package for secondary structure prediction and analysis. Using the RNAstructure Design program, we solved 99 of the 100 design problems posed by Eterna using the ACGT alphabet or supplementing it with P-Z pairs. Extending the alphabet reduced the propensity of sequences to fold into off-target structures, as evaluated by the normalized ensemble defect (NED). The NED values were improved relative to those from the Eterna example solutions in 91 of 99 cases in which Eterna-player solutions were provided. P-Z-containing designs had average NED values of 0.040, significantly below the 0.074 of standard-DNA-only designs, and inclusion of the P-Z pairs decreased the time needed to converge on a design. This work provides a sample pipeline for inclusion of any expanded alphabet nucleotides into prediction and design workflows.
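The normalized ensemble defect used above as the evaluation metric can be illustrated with a short sketch. It assumes the commonly used definition (the average, over all nucleotides, of the probability that a nucleotide is not in its target state) and takes a base-pair probability matrix as given; the function name and the input layout are hypothetical, and this is not the RNAstructure implementation.

```python
from typing import List, Optional


def normalized_ensemble_defect(
    pair_probs: List[List[float]],
    target_partner: List[Optional[int]],
) -> float:
    """Average probability that a nucleotide is NOT in its target state.

    pair_probs[i][j]  : equilibrium probability that i pairs with j
                        (symmetric; assumed to come from a partition
                        function calculation done elsewhere).
    target_partner[i] : index of the intended partner of nucleotide i,
                        or None if i should be unpaired in the target.
    """
    n = len(target_partner)
    defect = 0.0
    for i, partner in enumerate(target_partner):
        if partner is None:
            # Probability of being unpaired is 1 minus all pairing probabilities.
            p_correct = 1.0 - sum(pair_probs[i])
        else:
            p_correct = pair_probs[i][partner]
        defect += 1.0 - p_correct
    return defect / n


# Toy 4-nt example: target pairs nucleotide 0 with 3; 1 and 2 are unpaired.
probs = [
    [0.0, 0.0, 0.0, 0.9],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.9, 0.0, 0.0, 0.0],
]
target = [3, None, None, 0]
ned = normalized_ensemble_defect(probs, target)
```

A value of 0 means every nucleotide adopts its target state with probability 1, so lower values (such as the 0.040 reported for P-Z-containing designs versus 0.074 for standard DNA) indicate less propensity to fold into off-target structures.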
Affiliation(s)
- Tuan M. Pham
  - Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York 14642, United States
- Terrel Miffin
  - Department of Chemistry & Biochemistry, University of Maryland, College Park, Maryland 20742, United States
- Hongying Sun
  - Department of Surgery, University of Rochester Medical Center, Rochester, New York 14642, United States
- Kenneth K. Sharp
  - Department of Chemistry & Biochemistry, University of Maryland, College Park, Maryland 20742, United States
- Xiaoyu Wang
  - Department of Chemistry & Biochemistry, University of Maryland, College Park, Maryland 20742, United States
- Mingyi Zhu
  - Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York 14642, United States
- Shuichi Hoshika
  - Foundation for Applied Molecular Evolution, Alachua, Florida 32615, United States
- Steven A. Benner
  - Foundation for Applied Molecular Evolution, Alachua, Florida 32615, United States
- Jason D. Kahn
  - Department of Chemistry & Biochemistry, University of Maryland, College Park, Maryland 20742, United States
- David H. Mathews
  - Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York 14642, United States
4
Pham TM, Miffin T, Sun H, Sharp KK, Wang X, Zhu M, Hoshika S, Peterson RJ, Benner SA, Kahn JD, Mathews DH. DNA Structure Design Is Improved Using an Artificially Expanded Alphabet of Base Pairs Including Loop and Mismatch Thermodynamic Parameters. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.06.06.543917. [PMID: 37333404 PMCID: PMC10274641 DOI: 10.1101/2023.06.06.543917] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/20/2023]
Abstract
We show that in silico design of DNA secondary structures is improved by extending the base pairing alphabet beyond A-T and G-C to include the pair between 2-amino-8-(1'-β-D-2'-deoxyribofuranosyl)-imidazo-[1,2-a]-1,3,5-triazin-(8H)-4-one and 6-amino-3-(1'-β-D-2'-deoxyribofuranosyl)-5-nitro-(1H)-pyridin-2-one, simply P and Z. To obtain the thermodynamic parameters needed to include P-Z pairs in the designs, we performed 47 optical melting experiments and combined the results with previous work to fit a new set of free energy and enthalpy nearest neighbor folding parameters for P-Z pairs and G-Z wobble pairs. We find that G-Z pairs have stability comparable to A-T pairs and therefore should be considered quantitatively by structure prediction and design algorithms. Additionally, we extrapolated the set of loop, terminal mismatch, and dangling end parameters to include P and Z nucleotides. These parameters were incorporated into the RNAstructure software package for secondary structure prediction and analysis. Using the RNAstructure Design program, we solved 99 of the 100 design problems posed by Eterna using the ACGT alphabet or supplementing with P-Z pairs. Extending the alphabet reduced the propensity of sequences to fold into off-target structures, as evaluated by the normalized ensemble defect (NED). The NED values were improved relative to those from the Eterna example solutions in 91 of 99 cases where Eterna-player solutions were provided. P-Z-containing designs had average NED values of 0.040, significantly below the 0.074 of standard-DNA-only designs, and inclusion of the P-Z pairs decreased the time needed to converge on a design. This work provides a sample pipeline for inclusion of any expanded alphabet nucleotides into prediction and design workflows.
Affiliation(s)
- Tuan M. Pham
  - Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, NY
- Terrel Miffin
  - Department of Chemistry & Biochemistry, University of Maryland, College Park, MD
- Hongying Sun
  - Department of Surgery, University of Rochester Medical Center, Rochester, NY
- Kenneth K. Sharp
  - Department of Chemistry & Biochemistry, University of Maryland, College Park, MD
- Xiaoyu Wang
  - Department of Chemistry & Biochemistry, University of Maryland, College Park, MD
- Mingyi Zhu
  - Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, NY
- Jason D. Kahn
  - Department of Chemistry & Biochemistry, University of Maryland, College Park, MD
- David H. Mathews
  - Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, NY
5
Najeh S, Zandi K, Kharma N, Perreault J. Computational design and experimental verification of pseudoknotted ribozymes. RNA (NEW YORK, N.Y.) 2023; 29:764-776. [PMID: 36868786 PMCID: PMC10187678 DOI: 10.1261/rna.079148.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2022] [Accepted: 05/27/2022] [Indexed: 05/18/2023]
Abstract
The design of new RNA sequences that retain the function of a model RNA structure is a challenge in bioinformatics because of the structural complexity of these molecules. RNA can fold into its secondary and tertiary structures by forming stem-loops and pseudoknots. A pseudoknot is a set of base pairs between a region within a stem-loop and nucleotides outside of this stem-loop; this motif is very important for numerous functional structures. It is important for any computational design algorithm to take into account these interactions to give a reliable result for any structures that include pseudoknots. In our study, we experimentally validated synthetic ribozymes designed by Enzymer, which implements algorithms allowing for the design of pseudoknots. Enzymer is a program that uses an inverse folding approach to design pseudoknotted RNAs; we used it in this study to design two types of ribozymes. The ribozymes tested were the hammerhead and the glmS, which have a self-cleaving activity that allows them to liberate the new RNA genome copy during rolling-circle replication or to control the expression of the downstream genes, respectively. We demonstrated the efficiency of Enzymer by showing that the pseudoknotted hammerhead and glmS ribozymes sequences it designed were extensively modified compared to wild-type sequences and were still active.
Affiliation(s)
- Sabrine Najeh
  - INRS - Institut Armand-Frappier, Laval, QC H7V 1B7, Canada
- Kasra Zandi
  - Software Engineering and Computer Science Department, Concordia University, Montreal, Quebec H3G 1M8, Canada
- Nawwaf Kharma
  - Electrical and Computer Engineering Department, Concordia University, Montreal, Quebec H3G 1M8, Canada
6
Beck JD, Roberts JM, Kitzhaber JM, Trapp A, Serra E, Spezzano F, Hayden EJ. Predicting higher-order mutational effects in an RNA enzyme by machine learning of high-throughput experimental data. Front Mol Biosci 2022; 9:893864. [PMID: 36046603 PMCID: PMC9421044 DOI: 10.3389/fmolb.2022.893864] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/10/2022] [Accepted: 06/28/2022] [Indexed: 11/13/2022]
Abstract
Ribozymes are RNA molecules that catalyze biochemical reactions. Self-cleaving ribozymes are a common naturally occurring class of ribozymes that catalyze site-specific cleavage of their own phosphodiester backbone. In addition to their natural functions, self-cleaving ribozymes have been used to engineer control of gene expression because they can be designed to alter RNA processing and stability. However, the rational design of ribozyme activity remains challenging, and many ribozyme-based systems are engineered or improved by random mutagenesis and selection (in vitro evolution). Improving a ribozyme-based system often requires several mutations to achieve the desired function, but extensive pairwise and higher-order epistasis prevent a simple prediction of the effect of multiple mutations that is needed for rational design. Recently, high-throughput sequencing-based approaches have produced data sets on the effects of numerous mutations in different ribozymes (RNA fitness landscapes). Here we used such high-throughput experimental data from variants of the CPEB3 self-cleaving ribozyme to train a predictive model through machine learning approaches. We trained models using either a random forest or long short-term memory (LSTM) recurrent neural network approach. We found that models trained on a comprehensive set of pairwise mutant data could predict active sequences at higher mutational distances, but the correlation between predicted and experimentally observed self-cleavage activity decreased with increasing mutational distance. Adding sequences with increasingly higher numbers of mutations to the training data improved the correlation at increasing mutational distances. Systematically reducing the size of the training data set suggests that a wide distribution of ribozyme activity may be the key to accurate predictions. 
Because the model predictions are based only on sequence and activity data, the results demonstrate that this machine learning approach allows readily obtainable experimental data to be used for RNA design efforts even for RNA molecules with unknown structures. The accurate prediction of RNA functions will enable a more comprehensive understanding of RNA fitness landscapes for studying evolution and for guiding RNA-based engineering efforts.
Affiliation(s)
- Jessica M. Roberts
  - Biomolecular Sciences Graduate Programs, Boise State University, Boise, ID, United States
- Joey M. Kitzhaber
  - Department of Computer Science, Boise State University, Boise, ID, United States
- Ashlyn Trapp
  - Department of Biological Sciences, Boise State University, Boise, ID, United States
- Eric J. Hayden
  - Biomolecular Sciences Graduate Programs, Boise State University, Boise, ID, United States
  - Department of Computer Science, Boise State University, Boise, ID, United States
  - *Correspondence: Eric J. Hayden,
7
Inverse RNA Folding Workflow to Design and Test Ribozymes that Include Pseudoknots. Methods Mol Biol 2021; 2167:113-143. [PMID: 32712918 DOI: 10.1007/978-1-0716-0716-9_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
Abstract
Ribozymes are RNAs that catalyze reactions. They occur in nature, and can also be evolved in vitro to catalyze novel reactions. This chapter provides detailed protocols for using inverse folding software to design a ribozyme sequence that will fold to a known ribozyme secondary structure and for testing the catalytic activity of the sequence experimentally. This protocol is able to design sequences that include pseudoknots, which is important as all naturally occurring full-length ribozymes have pseudoknots. The starting point is the known pseudoknot-containing secondary structure of the ribozyme and knowledge of any nucleotides whose identity is required for function. The output of the protocol is a set of sequences that have been tested for function. Using this protocol, we were previously successful at designing highly active double-pseudoknotted HDV ribozymes.
8
Yamagami R, Huang R, Bevilacqua PC. Cellular Concentrations of Nucleotide Diphosphate-Chelated Magnesium Ions Accelerate Catalysis by RNA and DNA Enzymes. Biochemistry 2019; 58:3971-3979. [PMID: 31512860 DOI: 10.1021/acs.biochem.9b00578] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 12/17/2022]
Abstract
RNAs are involved in myriad cellular events. In general, RNA function is affected by cellular conditions. For instance, molecular crowding promotes RNA folding through compaction of the RNA. Metabolites generally destabilize RNA secondary structure, which improves RNA folding cooperativity and increases the fraction of functional RNA. Our recent studies demonstrate that cellular concentrations of amino acid-chelated magnesium (aaCM) stimulate RNA folding and catalysis while protecting RNAs from magnesium ion-induced degradation. However, effects of other cellular magnesium ion chelators on RNA function have not been tested. Herein, we report that nucleotide diphosphate-chelated magnesium, which is of intermediate strength, promotes RNA catalysis much like aaCM. Nucleotides are some of the major metabolites in cells and have one to three phosphates, which have increasingly tight binding of magnesium. On the basis of binding calculations, ∼85% ATP, ∼40% ADP, and only 5% AMP are estimated to possess a magnesium ion under cellular conditions of 0.50 mM free Mg2+. We tested the self-cleaving activity of the hammerhead ribozyme in the presence of these chelated magnesium species. Our results indicate that NTP-chelated magnesium and NMP-chelated magnesium do not appreciably stimulate RNA catalysis, whereas NDP-chelated magnesium promotes RNA catalysis up to 6.5-fold. Inspired by NDP, we observed similar stimulatory effects for several other Mg2+ diphosphate-containing metabolites, including UDP-GlcNAc and UDP-Glc; in addition, we found similar effects for a DNAzyme. Thus, rate stimulatory effects are general with respect to the diphosphate and nucleic acid enzyme. These results implicate magnesium-chelated diphosphate metabolites as general facilitators of RNA function inside cells.
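The binding estimates quoted in this abstract can be related to apparent dissociation constants with a short sketch. It assumes a simple 1:1 binding isotherm, fraction bound = [Mg]free / (Kd + [Mg]free), which may be simpler than the calculation the authors actually performed; the function names are hypothetical, and the Kd values it yields are back-calculated illustrations, not figures reported in the paper.

```python
def fraction_bound(free_mg_mM: float, kd_mM: float) -> float:
    """Fraction of nucleotide carrying Mg2+ under a 1:1 binding model (assumption)."""
    return free_mg_mM / (kd_mM + free_mg_mM)


def apparent_kd(free_mg_mM: float, fraction: float) -> float:
    """Invert the 1:1 isotherm: Kd = [Mg]free * (1 - f) / f."""
    return free_mg_mM * (1.0 - fraction) / fraction


FREE_MG_MM = 0.50  # free Mg2+ concentration stated in the abstract

# Approximate bound fractions stated in the abstract.
reported = {"ATP": 0.85, "ADP": 0.40, "AMP": 0.05}

for nucleotide, f in reported.items():
    kd = apparent_kd(FREE_MG_MM, f)
    # Round trip: the implied Kd reproduces the stated fraction.
    check = fraction_bound(FREE_MG_MM, kd)
    print(f"{nucleotide}: implied Kd ~ {kd:.2f} mM, fraction bound = {check:.2f}")
```

Under this assumption the implied Kd rises by roughly an order of magnitude with each phosphate removed, consistent with the abstract's statement that more phosphates bind magnesium more tightly.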
Affiliation(s)
- Ryota Yamagami
  - Department of Chemistry, Pennsylvania State University, University Park, Pennsylvania 16802, United States
  - Center for RNA Molecular Biology, Pennsylvania State University, University Park, Pennsylvania 16802, United States
- Ruochuan Huang
  - Department of Chemistry, Pennsylvania State University, University Park, Pennsylvania 16802, United States
- Philip C Bevilacqua
  - Department of Chemistry, Pennsylvania State University, University Park, Pennsylvania 16802, United States
  - Center for RNA Molecular Biology, Pennsylvania State University, University Park, Pennsylvania 16802, United States
  - Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania 16802, United States