1
Tsishevskaya AA, Alkhireenko DA, Bayandin RB, Kartashov MY, Ternovoi VA, Gladysheva AV. Untranslated Regions of a Segmented Kindia Tick Virus Genome Are Highly Conserved and Contain Multiple Regulatory Elements for Viral Replication. Microorganisms 2024; 12:239. [PMID: 38399643 PMCID: PMC10893285 DOI: 10.3390/microorganisms12020239] [Received: 01/03/2024] [Revised: 01/21/2024] [Accepted: 01/22/2024] [Indexed: 02/25/2024]
Abstract
Novel segmented tick-borne RNA viruses belonging to the group of Jingmenviruses (JMVs) are widespread across Africa, Asia, Europe, and America. In this work, we obtained whole-genome sequences of two Kindia tick virus (KITV) isolates and performed modeling and functional annotation of the secondary structures of the 5' and 3' UTRs of JMV and KITV viruses. The UTRs of the various KITV segments are characterized by the following features: (1) a polyadenylated 3' UTR; (2) 5' DAR and 3' DAR motifs; (3) a highly conserved 5'-CACAG-3' pentanucleotide; (4) a binding site for the La protein; (5) multiple UAG sites providing interactions with the MSI1 protein; (6) three homologous sequences in the 5' UTR and 3' UTR of segment 2; (7) in the KITV/2017/1 isolate, a segment 2 3' UTR comprising two consecutive 40-nucleotide repeats that form a Y-3 structure; (8) a 35-nucleotide deletion in the second repeat of the segment 2 3' UTR of the KITV/2018/1 and KITV/2018/2 isolates, leading to a modification of the Y-3 structure; (9) two pseudoknots in the segment 2 3' UTR; (10) 5' UTRs and 3' UTRs represented by patterns of conserved motifs; (11) the 5'-CAAGUG-3' sequence occurring in early UTR hairpins. Thus, we identified regulatory elements in the UTRs of KITV that are characteristic of orthoflaviviruses. This suggests that these elements are functionally significant for JMV replication and points to an evolutionary similarity between orthoflaviviruses and segmented flavi-like viruses.
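Several of the features listed above are short sequence motifs, such as the 5'-CACAG-3' pentanucleotide and UAG sites. The sketch below shows one way such motifs could be located in a UTR sequence by plain string search. The helper name `find_motif` is hypothetical, and the sequence is a made-up toy fragment, not KITV data. String search finds candidate sites only; it does not capture the secondary-structure context that the modeling in this work addresses.

```python
from typing import List


def find_motif(seq: str, motif: str) -> List[int]:
    """Return 0-based start positions of every occurrence, overlaps included."""
    # Normalize case and treat T as U so DNA-style input can be searched as RNA.
    seq = seq.upper().replace("T", "U")
    motif = motif.upper().replace("T", "U")
    positions: List[int] = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        # Resume one position later so overlapping matches are also reported.
        start = seq.find(motif, start + 1)
    return positions


# Hypothetical toy UTR fragment for illustration only (not KITV sequence data).
utr = "GGCACAGUUAGCCACAGAUAGUAGC"
print(find_motif(utr, "CACAG"))
print(find_motif(utr, "UAG"))
```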
Affiliation(s)
- Anastasia A. Tsishevskaya
- State Research Center of Virology and Biotechnology «Vector», 630559 Kol’tsovo, Russia
- Physics Department, Novosibirsk State University, 630090 Novosibirsk, Russia
- Daria A. Alkhireenko
- State Research Center of Virology and Biotechnology «Vector», 630559 Kol’tsovo, Russia
- Natural Sciences Department, Novosibirsk State University, 630090 Novosibirsk, Russia
- Roman B. Bayandin
- State Research Center of Virology and Biotechnology «Vector», 630559 Kol’tsovo, Russia
- Mikhail Yu. Kartashov
- State Research Center of Virology and Biotechnology «Vector», 630559 Kol’tsovo, Russia
- Vladimir A. Ternovoi
- State Research Center of Virology and Biotechnology «Vector», 630559 Kol’tsovo, Russia
- Anastasia V. Gladysheva
- State Research Center of Virology and Biotechnology «Vector», 630559 Kol’tsovo, Russia
2
Zuber J, Mathews DH. Estimating RNA Secondary Structure Folding Free Energy Changes with efn2. Methods Mol Biol 2024; 2726:1-13. [PMID: 38780725 DOI: 10.1007/978-1-0716-3519-3_1] [Indexed: 05/25/2024]
Abstract
A number of analyses require estimates of the folding free energy changes of specific RNA secondary structures. These predictions are often based on a set of nearest neighbor parameters that model the folding stability of an RNA secondary structure as the sum of the folding stabilities of the structural elements that comprise it. In the RNAstructure software suite, the free energy change calculation is implemented in the program efn2, which estimates both the folding free energy change and its experimental uncertainty. efn2 can be run through the RNAstructure graphical user interface, from the command line, or through a web server. This chapter provides detailed protocols for using efn2.
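The additivity assumption behind nearest neighbor models (total stability equals the sum of per-element contributions) can be illustrated with a toy calculation. The sketch below sums stacking terms along a helix and adds one hairpin-loop term. All energy values, the key convention in `STACK_DG`, and the helper names are hypothetical placeholders, not the Turner parameters or code used by efn2; real calculations include many more terms (for example loop-length dependence, terminal mismatches, and dangling ends) and should be run with RNAstructure itself.

```python
from typing import Dict, List

# Placeholder stacking free energies (kcal/mol). A key "GC/CG" means an outer
# G-C pair (G on the 5' strand) stacked on an inner C-G pair. These values are
# illustrative only and are NOT the nearest neighbor parameters used by efn2.
STACK_DG: Dict[str, float] = {
    "GC/CG": -3.0,
    "CG/GC": -2.5,
    "GC/AU": -2.0,
    "AU/GC": -2.0,
}
HAIRPIN_LOOP_DG = 5.0  # placeholder penalty for closing a hairpin loop


def helix_energy(pairs: List[str]) -> float:
    """Sum one stacking term per pair of consecutive base pairs (outermost first)."""
    total = 0.0
    for outer, inner in zip(pairs, pairs[1:]):
        total += STACK_DG[f"{outer}/{inner}"]
    return total


def hairpin_energy(pairs: List[str]) -> float:
    """Additive model: helix stacking terms plus one hairpin-loop term."""
    return helix_energy(pairs) + HAIRPIN_LOOP_DG


print(hairpin_energy(["GC", "CG", "GC", "AU"]))
```

Because each term is looked up independently, changing one base pair changes only the stacks it participates in, which is what makes nearest neighbor evaluation fast.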
Affiliation(s)
- Jeffrey Zuber
- Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, NY, USA
- David H. Mathews
- Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, NY, USA
- Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY, USA
3
Pham TM, Miffin T, Sun H, Sharp KK, Wang X, Zhu M, Hoshika S, Peterson RJ, Benner SA, Kahn JD, Mathews DH. DNA Structure Design Is Improved Using an Artificially Expanded Alphabet of Base Pairs Including Loop and Mismatch Thermodynamic Parameters. ACS Synth Biol 2023; 12:2750-2763. [PMID: 37671922 PMCID: PMC10510751 DOI: 10.1021/acssynbio.3c00358] [Received: 06/11/2023] [Indexed: 09/07/2023]
Abstract
We show that in silico design of DNA secondary structures is improved by extending the base pairing alphabet beyond A-T and G-C to include the pair between 2-amino-8-(1'-β-d-2'-deoxyribofuranosyl)-imidazo-[1,2-a]-1,3,5-triazin-(8H)-4-one and 6-amino-3-(1'-β-d-2'-deoxyribofuranosyl)-5-nitro-(1H)-pyridin-2-one, abbreviated as P and Z. To obtain the thermodynamic parameters needed to include P-Z pairs in the designs, we performed 47 optical melting experiments and combined the results with previous work to fit free energy and enthalpy nearest neighbor folding parameters for P-Z pairs and G-Z wobble pairs. We find G-Z pairs have stability comparable to that of A-T pairs and should therefore be included as base pairs in structure prediction and design algorithms. Additionally, we extrapolated the set of loop, terminal mismatch, and dangling end parameters to include the P and Z nucleotides. These parameters were incorporated into the RNAstructure software package for secondary structure prediction and analysis. Using the RNAstructure Design program, we solved 99 of the 100 design problems posed by Eterna using the ACGT alphabet or supplementing it with P-Z pairs. Extending the alphabet reduced the propensity of sequences to fold into off-target structures, as evaluated by the normalized ensemble defect (NED). The NED values were improved relative to those from the Eterna example solutions in 91 of 99 cases in which Eterna-player solutions were provided. P-Z-containing designs had average NED values of 0.040, significantly below the 0.074 of standard-DNA-only designs, and inclusion of the P-Z pairs decreased the time needed to converge on a design. This work provides a sample pipeline for inclusion of any expanded alphabet nucleotides into prediction and design workflows.
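The normalized ensemble defect (NED) used above to compare designs can be made concrete with a short sketch. It assumes the commonly used definition: the expected fraction of nucleotides that are not in their target state (paired with the intended partner, or unpaired) across the Boltzmann ensemble, computed from base-pair probabilities. The function names and toy probabilities are hypothetical; in practice the probabilities would come from a partition function calculation, and this is not RNAstructure's implementation.

```python
from typing import Dict, List, Tuple


def parse_dot_bracket(structure: str) -> List[int]:
    """Partner index for each position, or -1 if unpaired (nested '(' ')' only)."""
    partner = [-1] * len(structure)
    stack: List[int] = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def normalized_ensemble_defect(structure: str, bpp: Dict[Tuple[int, int], float]) -> float:
    """NED = 1 - (1/N) * sum over i of P(nucleotide i is in its target state).

    bpp maps (i, j) with i < j to the equilibrium probability that i pairs with j.
    """
    n = len(structure)
    partner = parse_dot_bracket(structure)
    # Total probability that each nucleotide is paired with anything.
    paired = [0.0] * n
    for (i, j), p in bpp.items():
        paired[i] += p
        paired[j] += p
    correct = 0.0
    for i in range(n):
        if partner[i] == -1:
            correct += 1.0 - paired[i]  # target: unpaired
        else:
            a, b = sorted((i, partner[i]))
            correct += bpp.get((a, b), 0.0)  # target: paired with partner[i]
    return 1.0 - correct / n


# Toy probabilities for illustration only.
print(normalized_ensemble_defect("((..))", {(0, 5): 0.9, (1, 4): 0.8, (1, 3): 0.1}))
```

A design whose ensemble is dominated by the target structure drives NED toward 0, which is why lower values such as those reported for P-Z-containing designs indicate less off-target folding.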
Affiliation(s)
- Tuan M. Pham
- Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York 14642, United States
- Terrel Miffin
- Department of Chemistry & Biochemistry, University of Maryland, College Park, Maryland 20742, United States
- Hongying Sun
- Department of Surgery, University of Rochester Medical Center, Rochester, New York 14642, United States
- Kenneth K. Sharp
- Department of Chemistry & Biochemistry, University of Maryland, College Park, Maryland 20742, United States
- Xiaoyu Wang
- Department of Chemistry & Biochemistry, University of Maryland, College Park, Maryland 20742, United States
- Mingyi Zhu
- Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York 14642, United States
- Shuichi Hoshika
- Foundation for Applied Molecular Evolution, Alachua, Florida 32615, United States
- Steven A. Benner
- Foundation for Applied Molecular Evolution, Alachua, Florida 32615, United States
- Jason D. Kahn
- Department of Chemistry & Biochemistry, University of Maryland, College Park, Maryland 20742, United States
- David H. Mathews
- Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, New York 14642, United States
4
Zhou T, Dai N, Li S, Ward M, Mathews DH, Huang L. RNA design via structure-aware multifrontier ensemble optimization. Bioinformatics 2023; 39:i563-i571. [PMID: 37387188 DOI: 10.1093/bioinformatics/btad252] [Indexed: 07/01/2023]
Abstract
MOTIVATION RNA design is the search for a sequence or set of sequences that will fold into a desired structure, also known as the inverse problem of RNA folding. However, sequences designed by existing algorithms often suffer from low ensemble stability, which worsens for long sequences. Additionally, for many methods, each design run finds only a small number of sequences satisfying the MFE criterion. These drawbacks limit their use cases. RESULTS We propose an optimization paradigm, SAMFEO, which optimizes ensemble objectives (equilibrium probability or ensemble defect) by iterative search and yields a very large number of successfully designed RNA sequences as byproducts. Our search method leverages structure-level and ensemble-level information at different stages of the optimization: initialization, sampling, mutation, and updating. Although less complicated than other approaches, ours is the first algorithm able to design thousands of RNA sequences for the puzzles in the Eterna100 benchmark. In addition, our algorithm solves the most Eterna100 puzzles among all the general optimization-based methods in our study. The only baseline that solves more puzzles depends on handcrafted heuristics designed for a specific folding model. Surprisingly, our approach shows superior performance in designing long sequences for structures adapted from the database of 16S ribosomal RNAs. AVAILABILITY AND IMPLEMENTATION Our source code and the data used in this article are available at https://github.com/shanry/SAMFEO.
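SAMFEO's actual operators are in the linked repository. The sketch below only illustrates the general idea of a structure-aware mutation: a position that is paired in the target is mutated together with its partner, so every candidate stays able to form the target helices. The seeding rule (G-C at every target pair, A at every unpaired site), the allowed pair list, and all function names are assumptions chosen for illustration, not SAMFEO's code.

```python
import random
from typing import List

# Canonical Watson-Crick and G-U wobble pairs allowed at paired positions.
PAIRS = ["GC", "CG", "AU", "UA", "GU", "UG"]


def parse_dot_bracket(structure: str) -> List[int]:
    """Partner index for each position, or -1 if unpaired (nested '(' ')' only)."""
    partner = [-1] * len(structure)
    stack: List[int] = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def initial_sequence(structure: str) -> str:
    """Simple seed (an assumption): G-C at every target pair, A at every unpaired site."""
    seq = ["A"] * len(structure)
    for i, j in enumerate(parse_dot_bracket(structure)):
        if j > i:
            seq[i], seq[j] = "G", "C"
    return "".join(seq)


def structure_aware_mutation(seq: str, structure: str, rng: random.Random) -> str:
    """Mutate one site; a paired site is mutated together with its partner."""
    partner = parse_dot_bracket(structure)
    out = list(seq)
    i = rng.randrange(len(seq))
    j = partner[i]
    if j == -1:
        # Unpaired: substitute any different nucleotide.
        out[i] = rng.choice([nt for nt in "ACGU" if nt != seq[i]])
    else:
        # Paired: replace the whole pair so the target helix stays formable.
        a, b = min(i, j), max(i, j)
        current = seq[a] + seq[b]
        new_pair = rng.choice([p for p in PAIRS if p != current])
        out[a], out[b] = new_pair[0], new_pair[1]
    return "".join(out)


rng = random.Random(0)
seed = initial_sequence("((..))")
print(seed, structure_aware_mutation(seed, "((..))", rng))
```

In a full design loop, each mutated candidate would be scored with an ensemble objective (such as ensemble defect) and kept only if it improves; that scoring step is omitted here.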
Affiliation(s)
- Tianshuo Zhou
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97330, United States
- Ning Dai
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97330, United States
- Sizhen Li
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97330, United States
- Max Ward
- Department of Computer Science and Software Engineering, The University of Western Australia, Perth, Australia
- David H. Mathews
- Department of Biochemistry and Biophysics, University of Rochester Medical Center, Rochester, NY 14642, United States
- Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642, United States
- Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY 14642, United States
- Liang Huang
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97330, United States
5
Pham TM, Miffin T, Sun H, Sharp KK, Wang X, Zhu M, Hoshika S, Peterson RJ, Benner SA, Kahn JD, Mathews DH. DNA Structure Design Is Improved Using an Artificially Expanded Alphabet of Base Pairs Including Loop and Mismatch Thermodynamic Parameters. bioRxiv 2023:2023.06.06.543917. [PMID: 37333404 PMCID: PMC10274641 DOI: 10.1101/2023.06.06.543917] [Indexed: 06/20/2023]
Abstract
We show that in silico design of DNA secondary structures is improved by extending the base pairing alphabet beyond A-T and G-C to include the pair between 2-amino-8-(1'-β-D-2'-deoxyribofuranosyl)-imidazo-[1,2-a]-1,3,5-triazin-(8H)-4-one and 6-amino-3-(1'-β-D-2'-deoxyribofuranosyl)-5-nitro-(1H)-pyridin-2-one, or simply P and Z. To obtain the thermodynamic parameters needed to include P-Z pairs in the designs, we performed 47 optical melting experiments and combined the results with previous work to fit a new set of free energy and enthalpy nearest neighbor folding parameters for P-Z pairs and G-Z wobble pairs. We find that G-Z pairs have stability comparable to A-T pairs and therefore should be considered quantitatively by structure prediction and design algorithms. Additionally, we extrapolated the set of loop, terminal mismatch, and dangling end parameters to include P and Z nucleotides. These parameters were incorporated into the RNAstructure software package for secondary structure prediction and analysis. Using the RNAstructure Design program, we solved 99 of the 100 design problems posed by Eterna using the ACGT alphabet or supplementing it with P-Z pairs. Extending the alphabet reduced the propensity of sequences to fold into off-target structures, as evaluated by the normalized ensemble defect (NED). The NED values were improved relative to those from the Eterna example solutions in 91 of 99 cases in which Eterna-player solutions were provided. P-Z-containing designs had average NED values of 0.040, significantly below the 0.074 of standard-DNA-only designs, and inclusion of the P-Z pairs decreased the time needed to converge on a design. This work provides a sample pipeline for inclusion of any expanded alphabet nucleotides into prediction and design workflows.
Affiliation(s)
- Tuan M. Pham
- Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, NY
- Terrel Miffin
- Department of Chemistry & Biochemistry, University of Maryland, College Park, MD
- Hongying Sun
- Department of Surgery, University of Rochester Medical Center, Rochester, NY
- Kenneth K. Sharp
- Department of Chemistry & Biochemistry, University of Maryland, College Park, MD
- Xiaoyu Wang
- Department of Chemistry & Biochemistry, University of Maryland, College Park, MD
- Mingyi Zhu
- Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, NY
- Jason D. Kahn
- Department of Chemistry & Biochemistry, University of Maryland, College Park, MD
- David H. Mathews
- Department of Biochemistry & Biophysics and Center for RNA Biology, University of Rochester Medical Center, Rochester, NY
6
Inverse RNA Folding Workflow to Design and Test Ribozymes that Include Pseudoknots. Methods Mol Biol 2021; 2167:113-143. [PMID: 32712918 DOI: 10.1007/978-1-0716-0716-9_8] [Indexed: 10/23/2022]
Abstract
Ribozymes are RNAs that catalyze reactions. They occur in nature and can also be evolved in vitro to catalyze novel reactions. This chapter provides detailed protocols for using inverse folding software to design a ribozyme sequence that will fold into a known ribozyme secondary structure and for experimentally testing the catalytic activity of that sequence. This protocol can design sequences that include pseudoknots, which is important because all naturally occurring full-length ribozymes have pseudoknots. The starting point is the known pseudoknot-containing secondary structure of the ribozyme and knowledge of any nucleotides whose identity is required for function. The output of the protocol is a set of sequences that have been tested for function. Using this protocol, we were previously successful at designing highly active double-pseudoknotted HDV ribozymes.
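Because the protocol starts from a pseudoknot-containing secondary structure, it helps to see how such a structure can be represented. A common convention is extended dot-bracket notation, where crossing (pseudoknotted) pairs use a second bracket type such as `[ ]`. The sketch below parses that notation with one stack per bracket type and checks whether a candidate sequence can form every target pair as a canonical or G-U pair. The function names and bracket set are assumptions for illustration; the input format accepted by the software used in the chapter is not specified here.

```python
from typing import Dict, List

BRACKETS = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSERS = {close: open_ for open_, close in BRACKETS.items()}
VALID_PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}


def parse_pseudoknot_dot_bracket(structure: str) -> List[int]:
    """Partner index per position (-1 = unpaired).

    One stack per bracket type lets pairs of different types cross,
    which is how pseudoknots are written in extended dot-bracket notation.
    """
    stacks: Dict[str, List[int]] = {open_: [] for open_ in BRACKETS}
    partner = [-1] * len(structure)
    for i, ch in enumerate(structure):
        if ch in BRACKETS:
            stacks[ch].append(i)
        elif ch in CLOSERS:
            stack = stacks[CLOSERS[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {i}")
            j = stack.pop()
            partner[i], partner[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    for open_, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {open_!r} at position {stack[-1]}")
    return partner


def is_compatible(seq: str, structure: str) -> bool:
    """True if every target pair, pseudoknotted or not, is a canonical or G-U pair."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    partner = parse_pseudoknot_dot_bracket(structure)
    return all(seq[i] + seq[j] in VALID_PAIRS for i, j in enumerate(partner) if j > i)


# Toy H-type pseudoknot for illustration only.
print(is_compatible("GGAACCAACCAAGG", "((..[[..))..]]"))
```

A compatibility check like this would only be a first filter; nucleotides whose identity is required for function would still need to be fixed separately before design.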
7
Koodli RV, Keep B, Coppess KR, Portela F, Das R. EternaBrain: Automated RNA design through move sets and strategies from an Internet-scale RNA videogame. PLoS Comput Biol 2019; 15:e1007059. [PMID: 31247029 PMCID: PMC6597038 DOI: 10.1371/journal.pcbi.1007059] [Received: 03/18/2019] [Accepted: 04/30/2019] [Indexed: 11/18/2022]
Abstract
Emerging RNA-based approaches to disease detection and gene therapy require RNA sequences that fold into specific base-pairing patterns, but computational algorithms generally remain inadequate for these secondary structure design tasks. The Eterna project has crowdsourced RNA design to human video game players in the form of puzzles that reach extraordinary difficulty. Here, we demonstrate that Eterna participants' moves and strategies can be leveraged to improve automated computational RNA design. We present an eternamoves-large repository consisting of 1.8 million player moves on 12 of the most-played Eterna puzzles, as well as an eternamoves-select repository of 30,477 moves from the top 72 players on a select set of more advanced puzzles. On eternamoves-select, we present a multilayer convolutional neural network (CNN), EternaBrain, that achieves test accuracies of 51% and 34% in base prediction and location prediction, respectively, suggesting that top players' moves are partially stereotyped. Pipelining this CNN's move predictions with single-action-playout (SAP) of six strategies compiled by human players solves 61 out of 100 independent puzzles in the Eterna100 benchmark. EternaBrain-SAP outperforms previously published RNA design algorithms and achieves similar or better performance than a newer generation of deep learning methods, while being largely orthogonal to these other methods. Our study provides useful lessons for future efforts to achieve human-competitive performance with automated RNA design algorithms.
Affiliation(s)
- Rohan V. Koodli
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, United States of America
- Benjamin Keep
- Department of Education, Stanford University, Stanford, CA, United States of America
- Katherine R. Coppess
- Department of Physics, Stanford University, Stanford, CA, United States of America
- Fernando Portela
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, United States of America
- Rhiju Das
- Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, United States of America
- Department of Physics, Stanford University, Stanford, CA, United States of America
8
Pankovics P, Boros Á, Kiss T, Engelmann P, Reuter G. Genetically highly divergent RNA virus with astrovirus-like (5'-end) and hepevirus-like (3'-end) genome organization in carnivorous birds, European roller (Coracias garrulus). Infect Genet Evol 2019; 71:215-223. [PMID: 30959207 DOI: 10.1016/j.meegid.2019.04.003] [Received: 02/22/2019] [Revised: 04/02/2019] [Accepted: 04/04/2019] [Indexed: 11/15/2022]
Abstract
Astroviruses (family Astroviridae) and hepeviruses (family Hepeviridae) are small, non-enveloped viruses with genetically diverse +ssRNA genomes and are thought to be enteric pathogens of vertebrates, including humans. Recently, many novel astro- and hepatitis E virus-like +ssRNA viruses have been described in lower vertebrate species. The non-structural proteins of astro- and hepeviruses are highly diverse, but their structural/capsid proteins share a common phylogenetic position, which sheds light on a common origin through inter-viral recombination. In this study, a novel astrovirus/hepevirus-like virus with a +ssRNA genome (Er/SZAL5/HUN/2011, MK450332) was serendipitously identified by RT-PCR in 3 (8.5%) of 35 European roller (Coracias garrulus) faecal samples from Hungary and was characterized. The complete genome of Er/SZAL5/HUN/2011 (MK450332) is 8402 nt long and potentially comprises three non-overlapping open reading frames (ORFs): ORF1a (4449 nt/1482 aa), ORF1b (1206 nt/401 aa), and ORF2 (1491 nt/496 aa). ORF1ab has an astrovirus-like genome organization, containing the conserved non-structural elements (TM, CC, NLS, VPg) and enzyme residues (trypsin-like protease, RNA-dependent RNA polymerase), with low amino acid sequence identity to astroviruses: 15% (ORF1a) and 44% (ORF1b). ORF2 presumably encodes a capsid protein, but neither an astrovirus-like subgenomic RNA (sgRNA) promoter nor astrovirus-like capsid characteristics could be identified. However, the predicted capsid protein (ORF2) showed 26% identity to the corresponding protein of the hepevirus-like novel Rana hepevirus (MH330682). This novel +ssRNA virus strain, Er/SZAL5/HUN/2011, with an astrovirus-like genome organization in the non-structural genome regions (ORF1a and ORF1b) and a Rana hepevirus-related capsid (ORF2) protein, represents a potentially recombinant virus species and supports the common-origin hypothesis, although the taxonomic position of the studied virus is still under discussion.
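The ORF lengths reported above appear consistent with nucleotide counts that include the stop codon: for example, 4449 / 3 - 1 = 1482 aa, 1206 / 3 - 1 = 401 aa, and 1491 / 3 - 1 = 496 aa. The sketch below shows that arithmetic together with a naive forward-strand ORF scan (first ATG through the next in-frame stop). The function names are hypothetical, and the scan ignores reverse-strand ORFs, alternative start codons, and ribosomal frameshifting, so it is not the annotation procedure used in the study.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def aa_length(orf_nt: int) -> int:
    """Amino acids encoded by an ORF whose nucleotide length includes the stop codon."""
    if orf_nt % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_nt // 3 - 1


def find_orfs(seq: str, min_aa: int = 1) -> List[Tuple[int, int]]:
    """Forward-strand ORFs as 0-based half-open (start, end), ATG through stop codon.

    Only the first ATG after the previous stop opens an ORF in each frame;
    nested downstream ATGs are not reported separately.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA input
    orfs: List[Tuple[int, int]] = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if aa_length(i + 3 - start) >= min_aa:
                    orfs.append((start, i + 3))
                start = None
    return orfs


# Reported ORF sizes from the abstract (nt, aa).
for nt, aa in [(4449, 1482), (1206, 401), (1491, 496)]:
    print(nt, aa_length(nt), aa)
print(find_orfs("CCATGAAATAGCC"))
```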
Affiliation(s)
- Péter Pankovics
- Department of Medical Microbiology and Immunology, Medical School, University of Pécs, Pécs, Hungary
- Ákos Boros
- Department of Medical Microbiology and Immunology, Medical School, University of Pécs, Pécs, Hungary
- Tamás Kiss
- Hungarian Ornithological and Nature Conservation Society, Budapest, Hungary
- Péter Engelmann
- Department of Immunology and Biotechnology, Clinical Center, Medical School, University of Pécs, Pécs, Hungary
- Gábor Reuter
- Department of Medical Microbiology and Immunology, Medical School, University of Pécs, Pécs, Hungary