1
Ontiveros-Palacios N, Cooke E, Nawrocki E, Triebel S, Marz M, Rivas E, Griffiths-Jones S, Petrov A, Bateman A, Sweeney B. Rfam 15: RNA families database in 2025. Nucleic Acids Res 2025; 53:D258-D267. [PMID: 39526405 PMCID: PMC11701678 DOI: 10.1093/nar/gkae1023] [Received: 09/15/2024] [Revised: 10/09/2024] [Accepted: 10/24/2024] [Indexed: 11/16/2024]
Abstract
The Rfam database, a widely used repository of non-coding RNA families, has undergone significant updates in release 15.0. This paper introduces major improvements, including the expansion of Rfamseq to 26 106 genomes, a 76% increase, incorporating the latest UniProt reference proteomes and additional viral genomes. Sixty-five RNA families were enhanced using experimentally determined 3D structures, improving the accuracy of consensus secondary structures and annotations. R-scape covariation analysis was used to refine structural predictions in 26 families. Gene Ontology (GO) and Sequence Ontology annotations were comprehensively updated, increasing GO term coverage to 75% of families. The release adds 14 new Hepatitis C Virus RNA families and completes microRNA family synchronization with miRBase, resulting in 1603 microRNA families. New data types, including FULL alignments, have been implemented. Integration with APICURON for improved curator attribution and multiple website enhancements further improve user experience. These updates significantly expand Rfam's coverage and improve annotation quality, reinforcing its critical role in RNA research, genome annotation and the development of machine learning models. Rfam is freely available at https://rfam.org.
Affiliation(s)
- Nancy Ontiveros-Palacios
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Emma Cooke
  - SciBite Limited, BioData Innovation Centre, Wellcome Genome Campus, Hinxton, Cambridge CB10 1DR, UK
- Eric P Nawrocki
  - National Center for Biotechnology Information, U.S. National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Sandra Triebel
  - RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, 07743 Jena, Germany
  - European Virus Bioinformatics Center, Friedrich Schiller University Jena, 07743 Jena, Germany
- Manja Marz
  - RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, 07743 Jena, Germany
  - European Virus Bioinformatics Center, Friedrich Schiller University Jena, 07743 Jena, Germany
- Elena Rivas
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA
- Sam Griffiths-Jones
  - School of Biological Sciences, Faculty of Medicine, Biology and Health, Michael Smith Building, The University of Manchester, Dover St, Manchester M13 9NT, UK
- Alex Bateman
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Blake Sweeney
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
2
Karan A, Rivas E. All-at-once RNA folding with 3D motif prediction framed by evolutionary information. bioRxiv 2024:2024.12.17.628809. [PMID: 39764046 PMCID: PMC11702757 DOI: 10.1101/2024.12.17.628809] [Indexed: 01/15/2025]
Abstract
Structural RNAs exhibit a vast array of recurrent short 3D elements involving non-Watson-Crick interactions that help arrange canonical double helices into tertiary structures. We present CaCoFold-R3D, a probabilistic grammar that predicts these RNA 3D motifs (also termed modules) jointly with RNA secondary structure over a sequence or alignment. CaCoFold-R3D uses evolutionary information present in an RNA alignment to reliably identify canonical helices (including pseudoknots) by covariation. We further introduce the R3D grammars, which also exploit helix covariation that constrains the positioning of the mostly non-covarying RNA 3D motifs. Our method runs predictions over an almost-exhaustive list of over fifty known RNA motifs (everything). Motifs can appear in any non-helical loop region (including 3-way, 4-way and higher junctions) (everywhere). All structural motifs as well as the canonical helices are arranged into one single structure predicted by one single joint probabilistic grammar (all-at-once). Our results demonstrate that CaCoFold-R3D is a valid alternative for predicting the all-residue interactions present in an RNA 3D structure. Furthermore, CaCoFold-R3D is fast and easily customizable for novel motif discovery.
3
Aleksashin NA, Langeberg CJ, Shelke RR, Yin T, Cate JHD. RNA elements required for the high efficiency of West Nile virus-induced ribosomal frameshifting. Nucleic Acids Res 2024:gkae1248. [PMID: 39698810 DOI: 10.1093/nar/gkae1248] [Received: 10/25/2024] [Revised: 11/29/2024] [Accepted: 12/04/2024] [Indexed: 12/20/2024]
Abstract
West Nile virus (WNV) requires programmed -1 ribosomal frameshifting for translation of the viral genome. The efficiency of WNV frameshifting is among the highest known. However, it remains unclear why WNV exhibits such a high frameshifting efficiency. Here, we employed dual-luciferase reporter assays in multiple human cell lines to probe the RNA requirements for highly efficient frameshifting by the WNV genome. We find that both the sequence and structure of a predicted RNA pseudoknot downstream of the slippery sequence (the codons in the genome on which frameshifting occurs) are required for efficient frameshifting. We also show that multiple proposed RNA secondary structures downstream of the slippery sequence are inconsistent with efficient frameshifting. We also find that the base of the pseudoknot structure is likely unfolded prior to frameshifting. Finally, we show that many mutations in the WNV slippery sequence allow efficient frameshifting, but often result in aberrant shifting into other reading frames. Mutations in the slippery sequence also support a model in which frameshifting occurs concurrent with or after ribosome translocation. These results provide a comprehensive analysis of the molecular determinants of WNV-programmed ribosomal frameshifting and provide a foundation for the development of new antiviral strategies targeting viral gene expression.
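As a reader aid, the reading-frame change that a -1 frameshift produces can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_codons`, the parameter `slip_end`, and the 15-nt toy sequence are hypothetical and are not taken from the WNV genome or from the paper.

```python
def split_codons(seq, slip_end=None):
    """Split an RNA string into codons.

    If slip_end is given (hypothetical parameter: the index just past the
    slippery site), the reader steps back one nucleotide once it reaches
    that index, which models a single -1 frameshift.
    """
    codons = []
    i = 0
    shifted = False
    while i + 3 <= len(seq):
        codons.append(seq[i:i + 3])
        i += 3
        # Apply the -1 shift exactly once, after the slippery site is read.
        if slip_end is not None and not shifted and i >= slip_end:
            i -= 1
            shifted = True
    return codons


# Toy sequence (an assumption, not a viral sequence).
toy = "AUGUUUAAACGCUGA"
in_frame = split_codons(toy)
shifted_frame = split_codons(toy, slip_end=9)
print(in_frame)
print(shifted_frame)
```

In this toy case the unshifted read ends at the in-frame `UGA` codon, whereas the shifted read continues in the -1 frame and never encounters it.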
Affiliation(s)
- Nikolay A Aleksashin
  - Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA 94720, USA
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- Conner J Langeberg
  - Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA 94720, USA
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- Rohan R Shelke
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- Tianhao Yin
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA
- Jamie H D Cate
  - Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA 94720, USA
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA
  - Department of Chemistry, University of California, Berkeley, Berkeley, CA 94720, USA
4
McKinley LN, Meyer MO, Sebastian A, Chang BK, Messina KJ, Albert I, Bevilacqua PC. Direct testing of natural twister ribozymes from over a thousand organisms reveals a broad tolerance for structural imperfections. Nucleic Acids Res 2024; 52:14133-14153. [PMID: 39498486 DOI: 10.1093/nar/gkae908] [Received: 07/11/2024] [Revised: 09/25/2024] [Accepted: 10/02/2024] [Indexed: 11/13/2024]
Abstract
Twister ribozymes are an extensively studied class of nucleolytic RNAs. Thousands of natural twisters have been proposed using sequence homology and structural descriptors. Yet, most of these candidates have not been validated experimentally. To address this gap, we developed Cleavage High-Throughput Assay (CHiTA), a high-throughput pipeline utilizing massively parallel oligonucleotide synthesis and next-generation sequencing to test putative ribozymes en masse in a scarless fashion. As proof of principle, we applied CHiTA to a small set of known active and mutant ribozymes. We then used CHiTA to test two large sets of naturally occurring twister ribozymes: over 1600 previously reported putative twisters and ∼1000 new candidate twisters. The new candidates were identified computationally in ∼1000 organisms, representing a massive increase in the number of ribozyme-harboring organisms. Approximately 94% of the twisters we tested were active and cleaved site-specifically. Analysis of their structural features revealed that many substitutions and helical imperfections can be tolerated. We repeated our computational search with structural descriptors updated from this analysis, whereupon we identified and confirmed the first intrinsically active twister ribozyme in mammals. CHiTA broadly expands the number of active twister ribozymes found in nature and provides a powerful method for functional analyses of other RNAs.
Affiliation(s)
- Lauren N McKinley
  - Department of Chemistry, Pennsylvania State University, 104 Benkovic Building, 376 Science Drive, University Park, PA 16802, USA
  - Center for RNA Molecular Biology, Pennsylvania State University, University Park, PA 16802, USA
- McCauley O Meyer
  - Center for RNA Molecular Biology, Pennsylvania State University, University Park, PA 16802, USA
  - Department of Biochemistry and Molecular Biology, Althouse Room 107, 363 Science Drive, Pennsylvania State University, University Park, PA 16802, USA
- Aswathy Sebastian
  - Huck Institutes of Life Sciences, 401 Huck Life Sciences Building, 432 Science Drive, Pennsylvania State University, University Park, PA 16802, USA
- Benjamin K Chang
  - Center for RNA Molecular Biology, Pennsylvania State University, University Park, PA 16802, USA
  - Department of Biochemistry and Molecular Biology, Althouse Room 107, 363 Science Drive, Pennsylvania State University, University Park, PA 16802, USA
- Kyle J Messina
  - Department of Chemistry, Pennsylvania State University, 104 Benkovic Building, 376 Science Drive, University Park, PA 16802, USA
  - Center for RNA Molecular Biology, Pennsylvania State University, University Park, PA 16802, USA
- Istvan Albert
  - Department of Biochemistry and Molecular Biology, Althouse Room 107, 363 Science Drive, Pennsylvania State University, University Park, PA 16802, USA
  - Huck Institutes of Life Sciences, 401 Huck Life Sciences Building, 432 Science Drive, Pennsylvania State University, University Park, PA 16802, USA
- Philip C Bevilacqua
  - Department of Chemistry, Pennsylvania State University, 104 Benkovic Building, 376 Science Drive, University Park, PA 16802, USA
  - Center for RNA Molecular Biology, Pennsylvania State University, University Park, PA 16802, USA
  - Department of Biochemistry and Molecular Biology, Althouse Room 107, 363 Science Drive, Pennsylvania State University, University Park, PA 16802, USA
5
Tong Y, Childs-Disney JL, Disney MD. Targeting RNA with small molecules, from RNA structures to precision medicines: IUPHAR review: 40. Br J Pharmacol 2024; 181:4152-4173. [PMID: 39224931 DOI: 10.1111/bph.17308] [Received: 04/30/2024] [Revised: 06/10/2024] [Accepted: 07/09/2024] [Indexed: 09/04/2024]
Abstract
RNA plays important roles in regulating both health and disease biology in all kingdoms of life. Notably, RNA can form intricate three-dimensional structures, and its biological functions depend on these structures. Targeting the structured regions of RNA with small molecules has gained increasing attention over the past decade, because it provides both chemical probes to study fundamental biological processes and lead medicines for diseases with unmet medical needs. Recent advances in RNA structure prediction and determination and RNA biology have accelerated the rational design and development of RNA-targeted small molecules to modulate disease pathology. However, challenges remain in advancing RNA-targeted small molecules towards clinical applications. This review summarizes strategies to study RNA structures, to identify small molecules recognizing these structures, and to augment the functionality of RNA-binding small molecules. We focus on recent advances in developing RNA-targeted small molecules as potential therapeutics in a variety of diseases, encompassing different modes of action and targeting strategies. Furthermore, we present the current gaps between early-stage discovery of RNA-binding small molecules and their clinical applications, as well as a roadmap to overcome these challenges in the near future.
Affiliation(s)
- Yuquan Tong
  - Department of Chemistry, The Scripps Research Institute, Jupiter, Florida, USA
  - Department of Chemistry, The Herbert Wertheim UF Scripps Institute for Biomedical Innovation & Technology, Jupiter, Florida, USA
- Jessica L Childs-Disney
  - Department of Chemistry, The Herbert Wertheim UF Scripps Institute for Biomedical Innovation & Technology, Jupiter, Florida, USA
- Matthew D Disney
  - Department of Chemistry, The Scripps Research Institute, Jupiter, Florida, USA
  - Department of Chemistry, The Herbert Wertheim UF Scripps Institute for Biomedical Innovation & Technology, Jupiter, Florida, USA
6
Aleksashin NA, Langeberg CJ, Shelke RR, Yin T, Cate JHD. RNA elements required for the high efficiency of West Nile Virus-induced ribosomal frameshifting. bioRxiv 2024:2024.10.16.618579. [PMID: 39464146 PMCID: PMC11507841 DOI: 10.1101/2024.10.16.618579] [Indexed: 10/29/2024]
Abstract
West Nile Virus (WNV), a member of the Flaviviridae family, requires programmed -1 ribosomal frameshifting (PRF) for translation of the viral genome. The efficiency of WNV frameshifting is among the highest observed to date. Despite structural similarities to frameshifting sites in other viruses, it remains unclear why WNV exhibits such a high frameshifting efficiency. Here we employed dual-luciferase reporter assays in multiple human cell lines to probe the RNA requirements for highly efficient frameshifting by the WNV genome. We find that both the sequence and structure of a predicted RNA pseudoknot downstream of the slippery sequence (the codons in the genome on which frameshifting occurs) are required for efficient frameshifting. We also show that multiple proposed RNA secondary structures downstream of the slippery sequence are inconsistent with efficient frameshifting. We mapped the most favorable distance between the slippery site and the pseudoknot essential for optimal frameshifting, and found that the base of the pseudoknot structure is likely unfolded prior to frameshifting. Finally, we find that many mutations in the WNV slippery sequence allow efficient frameshifting, but often result in aberrant shifting into other reading frames. Mutations in the slippery sequence also support a model in which frameshifting occurs concurrent with or after translocation of the mRNA and tRNA on the ribosome. These results provide a comprehensive analysis of the molecular determinants of WNV-programmed ribosomal frameshifting and provide a foundation for the development of new antiviral strategies targeting viral gene expression.
Affiliation(s)
- Nikolay A. Aleksashin
  - Innovative Genomics Institute, University of California-Berkeley, Berkeley, CA, USA
  - Department of Molecular & Cell Biology, University of California-Berkeley, Berkeley, CA, USA
- Conner J. Langeberg
  - Innovative Genomics Institute, University of California-Berkeley, Berkeley, CA, USA
  - Department of Molecular & Cell Biology, University of California-Berkeley, Berkeley, CA, USA
- Rohan R. Shelke
  - Department of Molecular & Cell Biology, University of California-Berkeley, Berkeley, CA, USA
- Tianhao Yin
  - Department of Molecular & Cell Biology, University of California-Berkeley, Berkeley, CA, USA
- Jamie H. D. Cate
  - Innovative Genomics Institute, University of California-Berkeley, Berkeley, CA, USA
  - Department of Molecular & Cell Biology, University of California-Berkeley, Berkeley, CA, USA
  - Department of Chemistry, University of California-Berkeley, Berkeley, CA, USA
7
Ontiveros N, Cooke E, Nawrocki EP, Triebel S, Marz M, Rivas E, Griffiths-Jones S, Petrov AI, Bateman A, Sweeney B. Rfam 15: RNA families database in 2025. bioRxiv 2024:2024.09.23.614430. [PMID: 39372780 PMCID: PMC11451735 DOI: 10.1101/2024.09.23.614430] [Indexed: 10/08/2024]
Abstract
The Rfam database, a widely used repository of non-coding RNA (ncRNA) families, has undergone significant updates in release 15.0. This paper introduces major improvements, including the expansion of Rfamseq to 26,106 genomes, a 76% increase, incorporating the latest UniProt reference proteomes and additional viral genomes. Sixty-five RNA families were enhanced using experimentally determined 3D structures, improving the accuracy of consensus secondary structures and annotations. R-scape covariation analysis was used to refine structural predictions in 26 families. Gene Ontology (GO) and Sequence Ontology annotations were comprehensively updated, increasing GO term coverage to 75% of families. The release adds 14 new Hepatitis C Virus RNA families and completes microRNA family synchronisation with miRBase, resulting in 1,603 microRNA families. New data types, including FULL alignments, have been implemented. Integration with APICURON for improved curator attribution and multiple website enhancements further improve user experience. These updates significantly expand Rfam's coverage and improve annotation quality, reinforcing its critical role in RNA research, genome annotation, and the development of machine learning models. Rfam is freely available at https://rfam.org.
Affiliation(s)
- Nancy Ontiveros
  - European Molecular Biology Laboratory, Wellcome Genome Campus, European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK
- Eric P Nawrocki
  - National Center for Biotechnology Information, U.S. National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA (EPN)
- Sandra Triebel
  - RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, 07743, Jena, Germany
  - European Virus Bioinformatics Center, Friedrich Schiller University Jena, 07743, Jena, Germany
- Manja Marz
  - RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, 07743, Jena, Germany
  - European Virus Bioinformatics Center, Friedrich Schiller University Jena, 07743, Jena, Germany
- Elena Rivas
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, 02138, MA, USA
- Sam Griffiths-Jones
  - School of Biological Sciences, Faculty of Medicine, Biology and Health, Michael Smith Building, The University of Manchester, Manchester M13 9GB, UK
- Alex Bateman
  - European Molecular Biology Laboratory, Wellcome Genome Campus, European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK
- Blake Sweeney
  - European Molecular Biology Laboratory, Wellcome Genome Campus, European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK
8
Eich T, O’Leary C, Moss W. Intronic RNA secondary structural information captured for the human MYC pre-mRNA. NAR Genom Bioinform 2024; 6:lqae143. [PMID: 39450312 PMCID: PMC11500451 DOI: 10.1093/nargab/lqae143] [Received: 07/01/2024] [Revised: 09/06/2024] [Accepted: 10/04/2024] [Indexed: 10/26/2024]
Abstract
To address the lack of intronic reads in secondary structure probing data for the human MYC pre-mRNA, we developed a method that combines spliceosomal inhibition with RNA probing and sequencing. Here, the SIRP-seq method was applied to study the secondary structure of human MYC RNAs by chemically probing HeLa cells with dimethyl sulfate in the presence of the small molecule spliceosome inhibitor pladienolide B. Pladienolide B binds to the SF3B complex of the spliceosome to inhibit intron removal during splicing, resulting in retained intronic sequences. This method was used to increase the read coverage over intronic regions of MYC. The purpose of increasing coverage across introns was to generate complete reactivity profiles for intronic sequences via the DMS-MaPseq approach. Notably, depth was sufficient for analysis by the program DRACO, which was able to deduce distinct reactivity profiles and predict multiple secondary structural conformations as well as their suggested stoichiometric abundances. The results presented here provide a new method for intronic RNA secondary structural analyses, as well as specific structural insights relevant to MYC RNA splicing regulation and therapeutic targeting.
Affiliation(s)
- Taylor O Eich
  - Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011, USA
- Collin A O’Leary
  - Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011, USA
  - Current Address: Department of Biology and Chemistry, Cornell College, Mount Vernon, IA 52314, USA
- Walter N Moss
  - Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011, USA
9
Rouse WB, Tompkins VS, O’Leary CA, Moss WN. The RNA secondary structure of androgen receptor-FL and V7 transcripts reveals novel regulatory regions. Nucleic Acids Res 2024; 52:6596-6613. [PMID: 38554103 PMCID: PMC11194067 DOI: 10.1093/nar/gkae220] [Received: 11/21/2023] [Accepted: 03/18/2024] [Indexed: 04/01/2024]
Abstract
The androgen receptor (AR) is a ligand-dependent nuclear transcription factor belonging to the steroid hormone nuclear receptor family. Due to its roles in regulating cell proliferation and differentiation, AR is tightly regulated to maintain proper levels of itself and the many genes it controls. AR dysregulation is a driver of many human diseases including prostate cancer. Though this dysregulation often occurs at the RNA level, there are many unknowns surrounding post-transcriptional regulation of AR mRNA, particularly the role that RNA secondary structure plays. Thus, a comprehensive analysis of AR transcript secondary structure is needed. We address this through the computational and experimental analyses of two key isoforms, full length (AR-FL) and truncated (AR-V7). Here, a combination of in-cell RNA secondary structure probing experiments (targeted DMS-MaPseq) and computational predictions was used to characterize the static structural landscape and conformational dynamics of both isoforms. Additionally, in-cell assays were used to identify functionally relevant structures in the 5' and 3' UTRs of AR-FL. A notable example is a conserved stem loop structure in the 5' UTR of AR-FL that can bind to Poly(RC) Binding Protein 2 (PCBP2). Taken together, our results reveal novel features that regulate AR expression.
Affiliation(s)
- Warren B Rouse
  - Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011, USA
- Van S Tompkins
  - Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011, USA
- Collin A O’Leary
  - Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011, USA
  - Current Address: Departments of Biology and Chemistry, Cornell College, Mount Vernon, IA 52314, USA
- Walter N Moss
  - Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011, USA
10
Trinity L, Stege U, Jabbari H. Tying the knot: Unraveling the intricacies of the coronavirus frameshift pseudoknot. PLoS Comput Biol 2024; 20:e1011787. [PMID: 38713726 PMCID: PMC11108256 DOI: 10.1371/journal.pcbi.1011787] [Received: 12/27/2023] [Revised: 05/21/2024] [Accepted: 04/27/2024] [Indexed: 05/09/2024]
Abstract
Understanding and targeting functional RNA structures towards treatment of coronavirus infection can help us to prepare for novel variants of SARS-CoV-2 (the virus causing COVID-19), and any other coronaviruses that could emerge via human-to-human transmission or potential zoonotic (inter-species) events. Leveraging the fact that all coronaviruses use a mechanism known as -1 programmed ribosomal frameshifting (-1 PRF) to replicate, we apply algorithms to predict the most energetically favourable secondary structures (each nucleotide involved in at most one pairing) that may be involved in regulating the -1 PRF event in coronaviruses, especially SARS-CoV-2. We compute previously unknown most stable structure predictions for the frameshift site of coronaviruses via hierarchical folding, a biologically motivated framework where an initial non-crossing structure folds first, followed by subsequent, possibly crossing (pseudoknotted), structures. Using mutual information from 181 coronavirus sequences, in conjunction with the algorithm KnotAli, we compute secondary structure predictions for the frameshift site of different coronaviruses. We then utilize the Shapify algorithm to obtain most stable SARS-CoV-2 secondary structure predictions guided by frameshift sequence-specific and genome-wide experimental data. We build on our previous secondary structure investigation of the singular SARS-CoV-2 68 nt frameshift element sequence, by using Shapify to obtain predictions for 132 extended sequences and including covariation information. Previous investigations have not applied hierarchical folding to extended length SARS-CoV-2 frameshift sequences. By doing so, we simulate the effects of ribosome interaction with the frameshift site, providing insight into biological function.
We contribute in-depth discussion to contextualize secondary structure dual-graph motifs for SARS-CoV-2, highlighting the energetic stability of the previously identified 3_8 motif alongside the known dominant 3_3 and 3_6 (native-type) -1 PRF structures. Using a combination of thermodynamic methods and sequence covariation, our novel predictions suggest function of the attenuator hairpin via previously unknown pseudoknotted base pairing. While certain initial RNA folding is consistent, other pseudoknotted base pairs form, which indicates potential conformational switching between the two structures.
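The two structural notions used in this abstract, "each nucleotide involved in at most one pairing" and "crossing (pseudoknotted)" base pairs, can be made concrete with a short Python sketch. It is only a reader aid and not the KnotAli or Shapify algorithm; the function names and the example pair lists are hypothetical. Two pairs (i, j) and (k, l) are treated as crossing when i < k < j < l.

```python
def is_valid_secondary_structure(pairs):
    """Return True if every position takes part in at most one base pair."""
    seen = set()
    for i, j in pairs:
        if i == j or i in seen or j in seen:
            return False
        seen.update((i, j))
    return True


def crossing_pairs(pairs):
    """Return every pair of base pairs that cross (i < k < j < l)."""
    # Normalise each pair to (smaller, larger) and sort by the opening position.
    norm = sorted((min(i, j), max(i, j)) for i, j in pairs)
    crossings = []
    for a in range(len(norm)):
        i, j = norm[a]
        for b in range(a + 1, len(norm)):
            k, l = norm[b]
            if i < k < j < l:
                crossings.append((norm[a], norm[b]))
    return crossings


# Hypothetical examples: a nested hairpin and a simple H-type pseudoknot.
hairpin = [(0, 9), (1, 8), (2, 7)]
pseudoknot = [(0, 9), (1, 8), (5, 14), (6, 13)]
print(crossing_pairs(hairpin))          # nested pairs only, so nothing crosses
print(len(crossing_pairs(pseudoknot)))  # each pair of stem 1 crosses each pair of stem 2
```

In the hierarchical-folding picture described above, a structure like `hairpin` corresponds to the initial non-crossing fold, and pairs like the second stem of `pseudoknot` are the ones that may be added afterwards.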
Affiliation(s)
- Luke Trinity
  - Department of Computer Science, University of Victoria, Victoria, British Columbia, Canada
- Ulrike Stege
  - Department of Computer Science, University of Victoria, Victoria, British Columbia, Canada
- Hosna Jabbari
  - Department of Biomedical Engineering, University of Alberta, Edmonton, Alberta, Canada
  - Institute on Aging and Lifelong Health, Victoria, British Columbia, Canada
11
Mediati DG, Dan W, Lalaouna D, Dinh H, Pokhrel A, Rowell KN, Michie KA, Stinear TP, Cain AK, Tree JJ. The 3' UTR of vigR is required for virulence in Staphylococcus aureus and has expanded through STAR sequence repeat insertions. Cell Rep 2024; 43:114082. [PMID: 38583155 DOI: 10.1016/j.celrep.2024.114082] [Received: 05/22/2023] [Revised: 01/17/2024] [Accepted: 03/25/2024] [Indexed: 04/09/2024]
Abstract
Infections caused by methicillin-resistant Staphylococcus aureus (MRSA) are alarmingly common, and treatment is confined to last-line antibiotics. Vancomycin is the treatment of choice for MRSA bacteremia, and treatment failure is often associated with vancomycin-intermediate S. aureus isolates. The regulatory 3' UTR of the vigR mRNA contributes to vancomycin tolerance and upregulates the autolysin IsaA. Using MS2-affinity purification coupled with RNA sequencing, we find that the vigR 3' UTR also regulates dapE, a succinyl-diaminopimelate desuccinylase required for lysine and peptidoglycan synthesis, suggesting a broader role in controlling cell wall metabolism and vancomycin tolerance. Deletion of the 3' UTR attenuated virulence, while the isaA mutant is completely attenuated in a wax moth larvae model. Sequence and structural analyses of vigR indicated that the 3' UTR has expanded through the acquisition of Staphylococcus aureus repeat insertions that contribute sequence for the isaA interaction seed and may functionalize the 3' UTR.
Affiliation(s)
- Daniel G Mediati
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia; Australian Institute for Microbiology and Infection, University of Technology Sydney, Ultimo, NSW, Australia.
| | - William Dan
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia
| | - David Lalaouna
- Université de Strasbourg, CNRS, ARN UPR 9002, Strasbourg, France
| | - Hue Dinh
- School of Natural Sciences, ARC Centre of Excellence in Synthetic Biology, Macquarie University, Sydney, NSW, Australia
| | - Alaska Pokhrel
- Australian Institute for Microbiology and Infection, University of Technology Sydney, Ultimo, NSW, Australia; School of Natural Sciences, ARC Centre of Excellence in Synthetic Biology, Macquarie University, Sydney, NSW, Australia
| | - Keiran N Rowell
- Structural Biology Facility, University of New South Wales, Sydney, NSW, Australia
| | - Katharine A Michie
- Structural Biology Facility, University of New South Wales, Sydney, NSW, Australia
| | - Timothy P Stinear
- Department of Microbiology and Immunology, Peter Doherty Institute, University of Melbourne, Melbourne, VIC, Australia
| | - Amy K Cain
- School of Natural Sciences, ARC Centre of Excellence in Synthetic Biology, Macquarie University, Sydney, NSW, Australia
| | - Jai J Tree
- School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia.
|
12
|
Sumi S, Hamada M, Saito H. Deep generative design of RNA family sequences. Nat Methods 2024; 21:435-443. [PMID: 38238559 DOI: 10.1038/s41592-023-02148-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2023] [Accepted: 12/07/2023] [Indexed: 03/13/2024]
Abstract
RNA engineering has immense potential to drive innovation in biotechnology and medicine. Despite its importance, a versatile platform for the automated design of functional RNA is still lacking. Here, we propose RNA family sequence generator (RfamGen), a deep generative model that designs RNA family sequences in a data-efficient manner by explicitly incorporating alignment and consensus secondary structure information. RfamGen can generate novel and functional RNA family sequences by sampling points from a semantically rich and continuous representation. We have experimentally demonstrated the versatility of RfamGen using diverse RNA families. Furthermore, we confirmed the high success rate of RfamGen in designing functional ribozymes through a quantitative massively parallel assay. Notably, RfamGen successfully generates artificial sequences with higher activity than natural sequences. Overall, RfamGen significantly improves our ability to design functional RNA and opens up new potential for generative RNA engineering in synthetic biology.
Affiliation(s)
- Shunsuke Sumi
- Department of Life Science Frontiers, Center for iPS Cell Research and Application (CiRA), Kyoto University, Kyoto, Japan
- Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Graduate School of Advanced Science and Engineering, Waseda University, Tokyo, Japan
| | - Michiaki Hamada
- Graduate School of Advanced Science and Engineering, Waseda University, Tokyo, Japan.
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan.
- Graduate School of Medicine, Nippon Medical School, Tokyo, Japan.
| | - Hirohide Saito
- Department of Life Science Frontiers, Center for iPS Cell Research and Application (CiRA), Kyoto University, Kyoto, Japan.
- Graduate School of Medicine, Kyoto University, Kyoto, Japan.
|
13
|
Backofen R, Gorodkin J, Hofacker IL, Stadler PF. Comparative RNA Genomics. Methods Mol Biol 2024; 2802:347-393. [PMID: 38819565 DOI: 10.1007/978-1-0716-3838-5_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Over the last quarter of a century it has become clear that RNA is much more than just a boring intermediate in protein expression. Ancient RNAs still appear in the core information metabolism and comprise a surprisingly large component in bacterial gene regulation. A common theme with these types of mostly small RNAs is their reliance on conserved secondary structures. Large-scale sequencing projects, on the other hand, have profoundly changed our understanding of eukaryotic genomes. Pervasively transcribed, they give rise to a plethora of large and evolutionarily extremely flexible non-coding RNAs that exert a vastly diverse array of molecular functions. In this chapter we provide a (necessarily incomplete) overview of the current state of comparative analysis of non-coding RNAs, emphasizing computational approaches as a means to gain a global picture of the modern RNA world.
Affiliation(s)
- Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
- Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
| | - Jan Gorodkin
- Center for Non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Frederiksberg, Denmark
| | - Ivo L Hofacker
- Institute for Theoretical Chemistry, University of Vienna, Wien, Austria
- Bioinformatics and Computational Biology research group, University of Vienna, Vienna, Austria
- Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
| | - Peter F Stadler
- Bioinformatics Group, Department of Computer Science, University of Leipzig, Leipzig, Germany.
- Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany.
- Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany.
- Universidad Nacional de Colombia, Bogotá, Colombia.
- Institute for Theoretical Chemistry, University of Vienna, Wien, Austria.
- Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark.
- Santa Fe Institute, Santa Fe, NM, USA.
|
14
|
Peterson JM, O'Leary CA, Coppenbarger EC, Tompkins VS, Moss WN. Discovery of RNA secondary structural motifs using sequence-ordered thermodynamic stability and comparative sequence analysis. MethodsX 2023; 11:102275. [PMID: 37448951 PMCID: PMC10336498 DOI: 10.1016/j.mex.2023.102275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Accepted: 06/28/2023] [Indexed: 07/18/2023] Open
Abstract
Major advances in RNA secondary structural motif prediction have been achieved in the last few years; however, few methods harness the predictive power of multiple approaches to deliver in-depth characterizations of local RNA motifs and their potential functionality. Additionally, most available methods do not predict RNA pseudoknots. This work combines complementary bioinformatic systems into one robust discovery pipeline where:
- RNA sequences are folded to search for thermodynamically favorable motifs utilizing ScanFold.
- Motifs are expanded and refolded into alternate pseudoknot conformations by Knotty/Iterative HFold.
- All conformations are evaluated for covariance via the cm-builder pipeline (Infernal and R-scape).
|
15
|
Escamilla-Gutiérrez A, Córdova-Espinoza MG, Sánchez-Monciváis A, Tecuatzi-Cadena B, Regalado-García AG, Medina-Quero K. In silico selection of aptamers for bacterial toxins detection. J Biomol Struct Dyn 2023; 41:10909-10918. [PMID: 36546716 DOI: 10.1080/07391102.2022.2159529] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Accepted: 12/10/2022] [Indexed: 12/24/2022]
Abstract
The most commonly used toxins in biological warfare are staphylococcal enterotoxin B (3SEB), cholera toxin (1XTC), and botulinum toxin (3BTA). Uncovering novel strategies for identifying these toxins is paramount; therefore, aptamers are used for this purpose. Aptamers are single-stranded DNA or RNA oligonucleotides selected via Systematic Evolution of Ligands by Exponential Enrichment (SELEX) with high binding affinity and specificity against target molecules. However, SELEX in vitro is tedious; hence, adopting alternative in silico molecular docking approaches is necessary. We aimed to conduct molecular docking with accessible tools and obtain RNA aptamers. First, 4,820,095 sequences obtained from an initial library of 9.5 × 10^9 sequences generated with a Python script were used. The GraphClust program was used to create representative groups or clusters, and the DoGSiteScorer (https://proteins.plus/) was used to conduct binding site detection of the proteins: 5DO4 (thrombin), 3SEB, 1XTC, and 3BTA. rDock, HDock, and PatchDock were adopted, combining different docking program results (consensus scoring), to improve receptor-ligand prediction. An analysis of the poses and root mean square deviation (RMSD) was performed, and 468 structurally different aptamers were obtained. The DoGSiteScorer program predicted the binding site of each protein to direct the interaction with the aptamer. Candidate aptamers for 3SEB, 1XTC, and 3BTA were selected according to the pose value considering the closeness of the interaction with a lower mean of 45.923 Å, 45.854 Å, and 72.490 Å, respectively. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Alejandro Escamilla-Gutiérrez
- Laboratorio de Bacteriología Médica, Departamento de Microbiología, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Ciudad de México, México
- Hospital General, Instituto Mexicano del Seguro Social IMSS, Ciudad de México, México
| | - María Guadalupe Córdova-Espinoza
- Laboratorio de Bacteriología Médica, Departamento de Microbiología, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Ciudad de México, México
- Laboratorio de Inmunología, Escuela Militar de Graduados de Sanidad, Secretaría de la Defensa Nacional, Ciudad de México, México
| | - Anahí Sánchez-Monciváis
- Laboratorio de Inmunología, Escuela Militar de Graduados de Sanidad, Secretaría de la Defensa Nacional, Ciudad de México, México
| | - Brenda Tecuatzi-Cadena
- Laboratorio de Inmunología, Escuela Militar de Graduados de Sanidad, Secretaría de la Defensa Nacional, Ciudad de México, México
| | - Ana Gabriela Regalado-García
- Laboratorio de Inmunología, Escuela Militar de Graduados de Sanidad, Secretaría de la Defensa Nacional, Ciudad de México, México
| | - Karen Medina-Quero
- Laboratorio de Inmunología, Escuela Militar de Graduados de Sanidad, Secretaría de la Defensa Nacional, Ciudad de México, México
|
16
|
Fremin BJ, Bhatt AS, Kyrpides NC. Identification of over ten thousand candidate structured RNAs in viruses and phages. Comput Struct Biotechnol J 2023; 21:5630-5639. [PMID: 38047235 PMCID: PMC10690425 DOI: 10.1016/j.csbj.2023.11.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Revised: 11/03/2023] [Accepted: 11/03/2023] [Indexed: 12/05/2023] Open
Abstract
Structured RNAs play crucial roles in viruses, exerting influence over both viral and host gene expression. However, the extensive diversity of structured RNAs and their ability to act in cis or trans positions pose challenges for predicting and assigning their functions. While comparative genomics approaches have successfully predicted candidate structured RNAs in microbes on a large scale, similar efforts for viruses have been lacking. In this study, we screened over 5 million DNA and RNA viral sequences, resulting in the prediction of 10,006 novel candidate structured RNAs. These predictions are widely distributed across taxonomy and ecosystem. We found transcriptional evidence for 206 of these candidate structured RNAs in the human fecal microbiome. These candidate RNAs exhibited evidence of nucleotide covariation, indicative of selective pressure maintaining the predicted secondary structures. Our analysis revealed a diverse repertoire of candidate structured RNAs, encompassing a substantial number of putative tRNAs or tRNA-like structures, Rho-independent transcription terminators, and potentially cis-regulatory structures consistently positioned upstream of genes. In summary, our findings shed light on the extensive diversity of structured RNAs in viruses, offering a valuable resource for further investigations into their functional roles and implications in viral gene expression and pave the way for a deeper understanding of the intricate interplay between viruses and their hosts at the molecular level.
Affiliation(s)
- Brayon J. Fremin
- Department of Energy, Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Ami S. Bhatt
- Department of Medicine (Hematology, Blood and Marrow Transplantation) and Genetics, Stanford University, Stanford, CA, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
| | - Nikos C. Kyrpides
- Department of Energy, Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
|
17
|
Rivas E. RNA covariation at helix-level resolution for the identification of evolutionarily conserved RNA structure. PLoS Comput Biol 2023; 19:e1011262. [PMID: 37450549 PMCID: PMC10370758 DOI: 10.1371/journal.pcbi.1011262] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/08/2023] [Accepted: 06/12/2023] [Indexed: 07/18/2023] Open
Abstract
Many biologically important RNAs fold into specific 3D structures conserved through evolution. Knowing when an RNA sequence includes a conserved RNA structure that could lead to new biology is not trivial and depends on clues left behind by conservation in the form of covariation and variation. For that purpose, the R-scape statistical test was created to identify from alignments of RNA sequences, the base pairs that significantly covary above phylogenetic expectation. R-scape treats base pairs as independent units. However, RNA base pairs do not occur in isolation. The Watson-Crick (WC) base pairs stack together forming helices that constitute the scaffold that facilitates the formation of the non-WC base pairs, and ultimately the complete 3D structure. The helix-forming WC base pairs carry most of the covariation signal in an RNA structure. Here, I introduce a new measure of statistically significant covariation at helix-level by aggregation of the covariation significance and covariation power calculated at base-pair-level resolution. Performance benchmarks show that helix-level aggregated covariation increases sensitivity in the detection of evolutionarily conserved RNA structure without sacrificing specificity. This additional helix-level sensitivity reveals an artifact that results from using covariation to build an alignment for a hypothetical structure and then testing the alignment for whether its covariation significantly supports the structure. Helix-level reanalysis of the evolutionary evidence for a selection of long non-coding RNAs (lncRNAs) reinforces the evidence against these lncRNAs having a conserved secondary structure.
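One standard way to aggregate base-pair-level significance into a single helix-level value is Fisher's method for combining p-values. The sketch below is illustrative only. The abstract does not say which aggregation R-scape applies, so Fisher's method is an assumption here, and the function name `fisher_combined_pvalue` and the example p-values are hypothetical. The method also assumes independent tests, which stacked base pairs in a helix only approximate.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis. For even degrees
    of freedom the survival function has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!, so no external library is needed.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # this is X / 2

    # Accumulate the finite series sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term

    return min(1.0, math.exp(-half_stat) * total)


# Hypothetical per-base-pair p-values for one helix (illustrative numbers only).
helix_pair_pvalues = [0.04, 0.03, 0.20, 0.06]
print(fisher_combined_pvalue(helix_pair_pvalues))
```

Several individually weak base pairs can yield a combined value smaller than any single one, which is the intuition behind gaining sensitivity by scoring helices rather than isolated pairs.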
Affiliation(s)
- Elena Rivas
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
|
18
|
Kumar A, Daripa P, Maiti S, Jain N. Interaction of hnRNPB1 with Helix-12 of hHOTAIR Reveals the Distinctive Mode of RNA Recognition That Enables the Structural Rearrangement by LCD. Biochemistry 2023; 62:2041-2054. [PMID: 37307069 DOI: 10.1021/acs.biochem.3c00181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
The lncRNA human Hox transcript antisense intergenic RNA (hHOTAIR) regulates gene expression by recruiting chromatin modifiers. The prevailing model suggests that hHOTAIR recruits hnRNPB1 to facilitate intermolecular RNA-RNA interactions between the lncRNA HOTAIR and its target gene transcripts. This B1-mediated RNA-RNA interaction modulates the structure of hHOTAIR, attenuates its inhibitory effect on polycomb repressive complex 2, and enhances its methyltransferase activity. However, the molecular details by which the nuclear hnRNPB1 protein assembles on the lncRNA HOTAIR have not yet been described. Here, we investigate the molecular interactions between hnRNPB1 and Helix-12 (hHOTAIR). We show that the low-complexity domain segment (LCD) of hnRNPB1 binds Helix-12 with strong affinity. Our studies revealed that unbound Helix-12 folds into a specific base-pairing pattern and contains an internal loop that, as determined by thermal melting and NMR studies, exhibits hydrogen bonding between strands and forms the recognition site for the LCD segment. In addition, mutation studies show that the secondary structure of Helix-12 makes an important contribution by acting as a landing pad for hnRNPB1. The secondary structure of Helix-12 is involved in specific interactions with different domains of hnRNPB1. Finally, we show that the LCD unwinds Helix-12 locally, indicating its importance in the hHOTAIR restructuring mechanism.
Affiliation(s)
- Ajit Kumar
- CSIR Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
| | - Purba Daripa
- CSIR Institute of Genomics and Integrative Biology, New Delhi 110025, India
| | - Souvik Maiti
- CSIR Institute of Genomics and Integrative Biology, New Delhi 110025, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
| | - Niyati Jain
- CSIR Institute of Genomics and Integrative Biology, New Delhi 110025, India
|
19
|
Gao W, Yang A, Rivas E. Thirteen dubious ways to detect conserved structural RNAs. IUBMB Life 2023; 75:471-492. [PMID: 36495545 PMCID: PMC11234323 DOI: 10.1002/iub.2694] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/03/2022] [Accepted: 10/24/2022] [Indexed: 12/14/2022]
Abstract
Covariation induced by compensatory base substitutions in RNA alignments is a great way to deduce conserved RNA structure, in principle. In practice, success depends on many factors, importantly the quality and depth of the alignment and the choice of covariation statistic. Measuring covariation between pairs of aligned positions is easy. However, using covariation to infer evolutionarily conserved RNA structure is complicated by other extraneous sources of covariation such as that resulting from homologous sequences having evolved from a common ancestor. In order to provide evidence of evolutionarily conserved RNA structure, a method to distinguish covariation due to sources other than RNA structure is necessary. Moreover, there are several sorts of artifactually generated covariation signals that can further confound the analysis. Additionally, some covariation signal is difficult to detect due to incomplete comparative data. Here, we investigate and critically discuss the practice of inferring conserved RNA structure by comparative sequence analysis. We provide new methods on how to approach and decide which of the numerous long non-coding RNAs (lncRNAs) have biologically relevant structures.
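As a concrete instance of "measuring covariation between pairs of aligned positions", the sketch below computes the mutual information between two alignment columns. Mutual information is only one of several possible covariation statistics and is chosen here for illustration; the function name `column_mutual_information` and the toy columns are hypothetical. As the abstract stresses, a high raw score does not by itself separate structural covariation from covariation inherited through shared ancestry.

```python
import math
from collections import Counter


def column_mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns.

    col_i and col_j hold one residue per aligned sequence, in the same
    sequence order. Rows with a gap ('-') in either column are skipped.
    """
    if len(col_i) != len(col_j):
        raise ValueError("columns must have the same number of rows")

    pairs = [(a, b) for a, b in zip(col_i, col_j) if a != "-" and b != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0

    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)

    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b) simplifies to count * n / (left[a] * right[b])
        mi += p_ab * math.log2(count * n / (left[a] * right[b]))
    return mi


# Toy columns: compensatory changes keep Watson-Crick pairing in every row.
print(column_mutual_information("GCAU", "CGUA"))
# Fully conserved columns carry no covariation signal.
print(column_mutual_information("GGGG", "CCCC"))
```

A perfectly conserved base pair scores zero here even though it may be structurally important, which illustrates why covariation evidence depends on sufficient variation in the alignment.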
Affiliation(s)
- William Gao
- Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Ann Yang
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, USA
| | - Elena Rivas
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, USA
|
20
|
Rivas E. RNA covariation at helix-level resolution for the identification of evolutionarily conserved RNA structure. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.04.14.536965. [PMID: 37131783 PMCID: PMC10153129 DOI: 10.1101/2023.04.14.536965] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Many biologically important RNAs fold into specific 3D structures conserved through evolution. Knowing when an RNA sequence includes a conserved RNA structure that could lead to new biology is not trivial and depends on clues left behind by conservation in the form of covariation and variation. For that purpose, the R-scape statistical test was created to identify from alignments of RNA sequences, the base pairs that significantly covary above phylogenetic expectation. R-scape treats base pairs as independent units. However, RNA base pairs do not occur in isolation. The Watson-Crick (WC) base pairs stack together forming helices that constitute the scaffold that facilitates the formation of the non-WC base pairs, and ultimately the complete 3D structure. The helix-forming WC base pairs carry most of the covariation signal in an RNA structure. Here, I introduce a new measure of statistically significant covariation at helix-level by aggregation of the covariation significance and covariation power calculated at base-pair-level resolution. Performance benchmarks show that helix-level aggregated covariation increases sensitivity in the detection of evolutionarily conserved RNA structure without sacrificing specificity. This additional helix-level sensitivity reveals an artifact that results from using covariation to build an alignment for a hypothetical structure and then testing the alignment for whether its covariation significantly supports the structure. Helix-level reanalysis of the evolutionary evidence for a selection of long non-coding RNAs (lncRNAs) reinforces the evidence against these lncRNAs having a conserved secondary structure.
Availability: Helix aggregated E-values are integrated in the R-scape software package (version 2.0.0.p and higher). The R-scape web server eddylab.org/R-scape includes a link to download the source code.
Contact: elenarivas@fas.harvard.edu.
Supplementary information: Supplementary data and code are provided with this manuscript at rivaslab.org.
|
21
|
Mattick JS. RNA out of the mist. Trends Genet 2023; 39:187-207. [PMID: 36528415 DOI: 10.1016/j.tig.2022.11.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/26/2022] [Revised: 11/08/2022] [Accepted: 11/27/2022] [Indexed: 12/23/2022]
Abstract
RNA has long been regarded primarily as the intermediate between genes and proteins. It was a surprise then to discover that eukaryotic genes are mosaics of mRNA sequences interrupted by large tracts of transcribed but untranslated sequences, and that multicellular organisms also express many long 'intergenic' and antisense noncoding RNAs (lncRNAs). The identification of small RNAs that regulate mRNA translation and half-life did not disturb the prevailing view that animals and plant genomes are full of evolutionary debris and that their development is mainly supervised by transcription factors. Gathering evidence to the contrary involved addressing the low conservation, expression, and genetic visibility of lncRNAs, demonstrating their cell-specific roles in cell and developmental biology, and their association with chromatin-modifying complexes and phase-separated domains. The emerging picture is that most lncRNAs are the products of genetic loci termed 'enhancers', which marshal generic effector proteins to their sites of action to control cell fate decisions during development.
Affiliation(s)
- John S Mattick
- School of Biotechnology and Biomolecular Sciences, UNSW, Sydney, NSW 2052, Australia; UNSW RNA Institute, UNSW, Sydney, NSW 2052, Australia.
|
22
|
O’Leary CA, Tompkins VS, Rouse WB, Nam G, Moss W. Thermodynamic and structural characterization of an EBV infected B-cell lymphoma transcriptome. NAR Genom Bioinform 2022; 4:lqac082. [PMID: 36285286 PMCID: PMC9585548 DOI: 10.1093/nargab/lqac082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2022] [Revised: 09/30/2022] [Accepted: 10/06/2022] [Indexed: 11/12/2022] Open
Abstract
Epstein-Barr virus (EBV) is a widely prevalent human herpes virus infecting over 95% of all adults and is associated with a variety of B-cell cancers and induction of multiple sclerosis. EBV accomplishes this in part by expression of coding and noncoding RNAs and alteration of the host cell transcriptome. To better understand the structures which are forming in the viral and host transcriptomes of infected cells, the RNA structure probing technique Structure-seq2 was applied to the BJAB-B1 cell line (an EBV infected B-cell lymphoma). This resulted in reactivity profiles and secondary structural analyses for over 10000 human mRNAs and lncRNAs, along with 19 lytic and latent EBV transcripts. We report in-depth structural analyses for the human MYC mRNA and the human lncRNA CYTOR. Additionally, we provide a new model for the EBV noncoding RNA EBER2 and provide the first reported model for the EBV tandem terminal repeat RNA. In-depth thermodynamic and structural analyses were carried out with the motif discovery tool ScanFold and RNAfold prediction tool; subsequent covariation analyses were performed on resulting models finding various levels of support. ScanFold results for all analyzed transcripts are made available for viewing and download on the user-friendly RNAStructuromeDB.
Affiliation(s)
- Collin A O’Leary
- Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA 50011, USA
| | - Van S Tompkins
- Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA 50011, USA
| | - Warren B Rouse
- Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA 50011, USA
| | - Gijong Nam
- Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA 50011, USA
| | - Walter N Moss
- Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA 50011, USA
|
23
|
rMSA: a sequence search and alignment algorithm to improve RNA structure modeling. J Mol Biol 2022. [DOI: 10.1016/j.jmb.2022.167904] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2022]
|
24
|
Andrews RJ, Rouse WB, O’Leary CA, Booher NJ, Moss WN. ScanFold 2.0: a rapid approach for identifying potential structured RNA targets in genomes and transcriptomes. PeerJ 2022; 10:e14361. [PMID: 36389431 PMCID: PMC9651051 DOI: 10.7717/peerj.14361] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 07/27/2022] [Accepted: 10/18/2022] [Indexed: 11/11/2022] Open
Abstract
A major limiting factor in target discovery for both basic research and therapeutic intervention is the identification of structural and/or functional RNA elements in genomes and transcriptomes. This was the impetus for the original ScanFold algorithm, which provides maps of local RNA structural stability, evidence of sequence-ordered (potentially evolved) structure, and unique model structures comprised of recurring base pairs with the greatest structural bias. A key step in quantifying this propensity for ordered structure is the prediction of secondary structural stability for randomized sequences which, in the original implementation of ScanFold, is explicitly evaluated. This slow process has limited the rapid identification of ordered structures in large genomes/transcriptomes, which we seek to overcome in this current work introducing ScanFold 2.0. In this revised version of ScanFold, we no longer explicitly evaluate randomized sequence folding energy, but rather estimate it using a machine learning approach. For high randomization numbers, this can increase prediction speeds over 100-fold compared to ScanFold 1.0, allowing for the analysis of large sequences, as well as the use of additional folding algorithms that may be computationally expensive. In the testing of ScanFold 2.0, we re-evaluate the Zika, HIV, and SARS-CoV-2 genomes and compare both the consistency of results and the time of each run to ScanFold 1.0. We also re-evaluate the SARS-CoV-2 genome to assess the quality of ScanFold 2.0 predictions vs several biochemical structure probing datasets and compare the results to those of the original ScanFold program.
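The explicit randomization step described for the original ScanFold can be illustrated with the conventional thermodynamic z-score, which compares a native folding energy with the energies of shuffled versions of the same sequence. The abstract gives neither ScanFold's formula nor its parameters, so everything in this sketch is an assumption: `toy_energy` is a hypothetical stand-in (the negated Nussinov base-pair count) for a real minimum free energy predictor, and the mononucleotide shuffle, `n_random` and `seed` are illustrative choices. It shows the explicit-randomization idea only, not the machine-learning estimate introduced in ScanFold 2.0.

```python
import random
import statistics

# Canonical and wobble pairs allowed by the toy model.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def toy_energy(seq, min_loop=3):
    """Hypothetical energy: minus the maximum number of nested base pairs.

    Uses the Nussinov recursion with at least `min_loop` unpaired bases
    inside every hairpin. A real analysis would use a thermodynamic model.
    """
    n = len(seq)
    if n == 0:
        return 0.0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case: position j stays unpaired
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return -float(best[0][n - 1])


def stability_z_score(seq, n_random=100, seed=0):
    """z = (E_native - mean(E_shuffled)) / sd(E_shuffled); negative means
    the native order is more stable than its shuffled counterparts."""
    rng = random.Random(seed)
    native = toy_energy(seq)
    shuffled = []
    for _ in range(n_random):
        letters = list(seq)
        rng.shuffle(letters)  # mononucleotide shuffle keeps base composition
        shuffled.append(toy_energy("".join(letters)))
    sd = statistics.pstdev(shuffled)
    if sd == 0:
        return 0.0  # no spread among shuffles, so no evidence either way
    return (native - statistics.fmean(shuffled)) / sd


print(stability_z_score("GGGGGGAAAACCCCCC", n_random=50, seed=1))
```

Because every shuffled sequence must be folded, the cost grows linearly with `n_random`, which is the bottleneck the abstract says ScanFold 2.0 avoids by estimating the randomized energies instead.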
Affiliation(s)
- Ryan J. Andrews
- Department of Biochemistry, University of Utah, Salt Lake City, UT, United States
| | - Warren B. Rouse
- The Roy J Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, Iowa, United States
| | - Collin A. O’Leary
- The Roy J Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, Iowa, United States
| | - Nicholas J. Booher
- Infrastructure and Research IT Services, Iowa State University, Ames, IA, United States
| | - Walter N. Moss
- The Roy J Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, Iowa, United States
|
25
|
Childs-Disney JL, Yang X, Gibaut QMR, Tong Y, Batey RT, Disney MD. Targeting RNA structures with small molecules. Nat Rev Drug Discov 2022; 21:736-762. [PMID: 35941229 PMCID: PMC9360655 DOI: 10.1038/s41573-022-00521-4] [Citation(s) in RCA: 232] [Impact Index Per Article: 77.3] [Accepted: 06/17/2022] [Indexed: 01/07/2023]
Abstract
RNA adopts 3D structures that confer varied functional roles in human biology and dysfunction in disease. Approaches to therapeutically target RNA structures with small molecules are being actively pursued, aided by key advances in the field including the development of computational tools that predict evolutionarily conserved RNA structures, as well as strategies that expand mode of action and facilitate interactions with cellular machinery. Existing RNA-targeted small molecules use a range of mechanisms including directing splicing - by acting as molecular glues with cellular proteins (such as branaplam and the FDA-approved risdiplam), inhibition of translation of undruggable proteins and deactivation of functional structures in noncoding RNAs. Here, we describe strategies to identify, validate and optimize small molecules that target the functional transcriptome, laying out a roadmap to advance these agents into the next decade.
Affiliation(s)
| | - Xueyi Yang
- Department of Chemistry, Scripps Research, Jupiter, FL, USA
| | | | - Yuquan Tong
- Department of Chemistry, Scripps Research, Jupiter, FL, USA
| | - Robert T Batey
- Department of Biochemistry, University of Colorado, Boulder, CO, USA.
|
26
|
Zhang J, Fei Y, Sun L, Zhang QC. Advances and opportunities in RNA structure experimental determination and computational modeling. Nat Methods 2022; 19:1193-1207. [PMID: 36203019 DOI: 10.1038/s41592-022-01623-y] [Citation(s) in RCA: 53] [Impact Index Per Article: 17.7] [Received: 06/10/2022] [Accepted: 08/23/2022] [Indexed: 11/09/2022]
Abstract
Beyond transferring genetic information, RNAs are molecules with diverse functions that include catalyzing biochemical reactions and regulating gene expression. Most of these activities depend on RNAs' specific structures. Therefore, accurately determining RNA structure is integral to advancing our understanding of RNA functions. Here, we summarize the state-of-the-art experimental and computational technologies developed to evaluate RNA secondary and tertiary structures. We also highlight how the rapid increase of experimental data facilitates the integrative modeling approaches for better resolving RNA structures. Finally, we provide our thoughts on the latest advances and challenges in RNA structure determination methods, as well as on future directions for both experimental approaches and artificial intelligence-based computational tools to model RNA structure. Ultimately, we hope the technological advances will deepen our understanding of RNA biology and facilitate RNA structure-based biomedical research such as designing specific RNA structures for therapeutics and deploying RNA-targeting small-molecule drugs.
Affiliation(s)
- Jinsong Zhang
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing, China; Beijing Advanced Innovation Center for Structural Biology & Frontier Research Center for Biological Structure, School of Life Sciences, Tsinghua University, Beijing, China; Tsinghua-Peking Center for Life Sciences, Beijing, China
- Yuhan Fei
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing, China; Beijing Advanced Innovation Center for Structural Biology & Frontier Research Center for Biological Structure, School of Life Sciences, Tsinghua University, Beijing, China; Tsinghua-Peking Center for Life Sciences, Beijing, China
- Lei Sun
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing, China; Beijing Advanced Innovation Center for Structural Biology & Frontier Research Center for Biological Structure, School of Life Sciences, Tsinghua University, Beijing, China; Tsinghua-Peking Center for Life Sciences, Beijing, China
- Qiangfeng Cliff Zhang
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing, China; Beijing Advanced Innovation Center for Structural Biology & Frontier Research Center for Biological Structure, School of Life Sciences, Tsinghua University, Beijing, China; Tsinghua-Peking Center for Life Sciences, Beijing, China
|
27
|
Variant-Specific Analysis Reveals a Novel Long-Range RNA-RNA Interaction in SARS-CoV-2 Orf1a. Int J Mol Sci 2022; 23:ijms231911050. [PMID: 36232353 PMCID: PMC9570297 DOI: 10.3390/ijms231911050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Revised: 09/05/2022] [Accepted: 09/08/2022] [Indexed: 01/08/2023] Open
Abstract
Since the start of the COVID-19 pandemic, understanding the pathology of the SARS-CoV-2 RNA virus and its life cycle has been the priority of many researchers. Currently, new variants of the virus have emerged with various levels of pathogenicity and abundance within the human-host population. Although much of viral pathogenicity is attributed to the viral Spike protein’s binding affinity to human lung cells’ ACE2 receptor, comprehensive knowledge on the distinctive features of viral variants that might affect their life cycle and pathogenicity is yet to be attained. Recent in vivo studies into the RNA structure of the SARS-CoV-2 genome have revealed certain long-range RNA-RNA interactions. Using in silico predictions and a large population of SARS-CoV-2 sequences, we observed variant-specific evolutionary changes for certain long-range RRIs. We also found statistical evidence for the existence of one of the thermodynamic-based RRI predictions, namely Comp1, in the Beta variant sequences. A similar test that disregarded sequence variant information did not, however, lead to significant results. When performing population-based analyses, aggregate tests may fail to identify novel interactions due to variant-specific changes. Variant-specific analyses can result in de novo RRI identification.
|
28
|
False-positive IRESes from Hoxa9 and other genes resulting from errors in mammalian 5' UTR annotations. Proc Natl Acad Sci U S A 2022; 119:e2122170119. [PMID: 36037358 PMCID: PMC9456764 DOI: 10.1073/pnas.2122170119] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 12/12/2022] Open
Abstract
Hyperconserved genomic sequences have great promise for understanding core biological processes. It has been recently proposed that scores of hyperconserved 5' untranslated regions (UTRs), also known as transcript leaders (hTLs), encode internal ribosome entry sites (IRESes) that drive cap-independent translation, in part, via interactions with ribosome expansion segments. However, the direct functional significance of such interactions has not yet been definitively demonstrated. We provide evidence that the putative IRESes previously reported in Hox gene hTLs are rarely included in transcript leaders. Instead, these regions function independently as transcriptional promoters. In addition, we find the proposed RNA structure of the putative Hoxa9 IRES is not conserved. Instead, sequences previously shown to be essential for putative IRES activity encode a hyperconserved transcription factor binding site (E-box) that contributes to its promoter activity and is bound by several transcription factors, including USF1 and USF2. Similar E-box sequences enhance the promoter activities of other putative Hoxa gene IRESes. Moreover, we provide evidence that the vast majority of hTLs with putative IRES activity overlap transcriptional promoters, enhancers, and 3' splice sites that are most likely responsible for their reported IRES activities. These results argue strongly against recently reported widespread IRES-like activities from hTLs and contradict proposed interactions between ribosomal expansion segment ES9S and putative IRESes. Furthermore, our work underscores the importance of accurate transcript annotations, controls in bicistronic reporter assays, and the power of synthesizing publicly available data from multiple sources.
|
29
|
Omoru OB, Pereira F, Janga SC, Manzourolajdad A. A Putative long-range RNA-RNA interaction between ORF8 and Spike of SARS-CoV-2. PLoS One 2022; 17:e0260331. [PMID: 36048827 PMCID: PMC9436084 DOI: 10.1371/journal.pone.0260331] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/04/2021] [Accepted: 06/22/2022] [Indexed: 12/15/2022] Open
Abstract
SARS-CoV-2 has affected people worldwide as the causative agent of COVID-19. The virus is related to the highly lethal SARS-CoV-1 responsible for the 2002-2003 SARS outbreak in Asia. Research is ongoing to understand why both viruses have different spreading capacities and mortality rates. Like other beta coronaviruses, RNA-RNA interactions occur between different parts of the viral genomic RNA, resulting in discontinuous transcription and production of various sub-genomic RNAs. These sub-genomic RNAs are then translated into other viral proteins. In this work, we performed a comparative analysis for novel long-range RNA-RNA interactions that may involve the Spike region. Comparing in-silico fragment-based predictions between reference sequences of SARS-CoV-1 and SARS-CoV-2 revealed several predictions amongst which a thermodynamically stable long-range RNA-RNA interaction between (23660-23703 Spike) and (28025-28060 ORF8) unique to SARS-CoV-2 was observed. The patterns of sequence variation using data gathered worldwide further supported the predicted stability of the sub-interacting region (23679-23690 Spike) and (28031-28042 ORF8). Such RNA-RNA interactions can potentially impact viral life cycle including sub-genomic RNA production rates.
Affiliation(s)
- Okiemute Beatrice Omoru
- Department of Biohealth Informatics, School of Informatics and Computing, Indiana University Purdue University, Indianapolis, IN, United States of America
- Filipe Pereira
- Centre for Functional Ecology, Department of Life Sciences, University of Coimbra, Coimbra, Portugal
- IDENTIFICA Genetic Testing, Maia, Portugal
- Sarath Chandra Janga
- Department of Biohealth Informatics, School of Informatics and Computing, Indiana University Purdue University, Indianapolis, IN, United States of America
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Medical Research and Library Building, Indianapolis, Indiana, United States of America
- Centre for Computational Biology and Bioinformatics, Indiana University School of Medicine, 5021 Health Information and Translational Sciences (HITS), Indianapolis, Indiana, United States of America
- Amirhossein Manzourolajdad
- Department of Biohealth Informatics, School of Informatics and Computing, Indiana University Purdue University, Indianapolis, IN, United States of America
- Department of Computer Science, Colgate University, Hamilton, NY, United States of America
|
30
|
Westhof E. Data, data, burning deep, in the forests of the net. Biochem Biophys Res Commun 2022; 633:42-44. [DOI: 10.1016/j.bbrc.2022.09.030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Accepted: 09/07/2022] [Indexed: 11/28/2022]
|
31
|
Ponting CP, Haerty W. Genome-Wide Analysis of Human Long Noncoding RNAs: A Provocative Review. Annu Rev Genomics Hum Genet 2022; 23:153-172. [PMID: 35395170 DOI: 10.1146/annurev-genom-112921-123710] [Citation(s) in RCA: 51] [Impact Index Per Article: 17.0] [Indexed: 11/09/2022]
Abstract
Do long noncoding RNAs (lncRNAs) contribute little or substantively to human biology? To address how lncRNA loci and their transcripts, structures, interactions, and functions contribute to human traits and disease, we adopt a genome-wide perspective. We intend to provoke alternative interpretation of questionable evidence and thorough inquiry into unsubstantiated claims. We discuss pitfalls of lncRNA experimental and computational methods as well as opposing interpretations of their results. The majority of evidence, we argue, indicates that most lncRNA transcript models reflect transcriptional noise or provide minor regulatory roles, leaving relatively few human lncRNAs that contribute centrally to human development, physiology, or behavior. These important few tend to be spliced and better conserved but lack a simple syntax relating sequence to structure and mechanism, and so resist simple categorization. This genome-wide view should help investigators prioritize individual lncRNAs based on their likely contribution to human biology.
Affiliation(s)
- Chris P Ponting
- MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, United Kingdom
|
32
|
Rouse WB, O'Leary CA, Booher NJ, Moss WN. Expansion of the RNAStructuromeDB to include secondary structural data spanning the human protein-coding transcriptome. Sci Rep 2022; 12:14515. [PMID: 36008510 PMCID: PMC9403969 DOI: 10.1038/s41598-022-18699-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2022] [Accepted: 08/17/2022] [Indexed: 11/22/2022] Open
Abstract
RNA plays vital functional roles in almost every component of biology, and these functional roles are often influenced by its folding into secondary and tertiary structures. An important role of RNA secondary structure is in maintaining proper gene regulation; therefore, making accurate predictions of the structures involved in these processes is important. In this study, we have expanded on our previous work that led to the creation of the RNAStructuromeDB. Unlike this previous study that analyzed the human genome at low resolution, we have now scanned the protein-coding human transcriptome at high (single nt) resolution. This provides more robust structure predictions for over 100,000 isoforms of known protein-coding genes. Notably, we also utilize the motif identification tool, ScanFold, to model structures with high propensity for ordered/evolved stability. All data have been uploaded to the RNAStructuromeDB, allowing for easy searching of transcripts, visualization of data tracks (via the Integrative Genomics Viewer or IGV), and download of ScanFold data—including unique highly-ordered motifs. Herein, we provide an example analysis of MAT2A to demonstrate the utility of ScanFold at finding known and novel secondary structures, highlighting regions of potential functionality, and guiding generation of functional hypotheses through use of the data.
Affiliation(s)
- Warren B Rouse
- Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, 50011, USA
- Collin A O'Leary
- Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, 50011, USA
- Nicholas J Booher
- Infrastructure and Research IT Services, Iowa State University, Ames, IA, 50011, USA
- Walter N Moss
- Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, 50011, USA
|
33
|
Ross CJ, Ulitsky I. Discovering functional motifs in long noncoding RNAs. Wiley Interdiscip Rev RNA 2022; 13:e1708. [PMID: 34981665 DOI: 10.1002/wrna.1708] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/01/2021] [Revised: 11/19/2021] [Accepted: 12/04/2021] [Indexed: 12/27/2022]
Abstract
Long noncoding RNAs (lncRNAs) are products of pervasive transcription that closely resemble messenger RNAs on the molecular level, yet function through largely unknown modes of action. The current model is that the function of lncRNAs often relies on specific, typically short, conserved elements, connected by linkers in which specific sequences and/or structures are less important. This notion has fueled the development of both computational and experimental methods focused on the discovery of functional elements within lncRNA genes, based on diverse signals such as evolutionary conservation, predicted structural elements, or the ability to rescue loss-of-function phenotypes. In this review, we outline the main challenges that the different methods need to overcome, describe the recently developed approaches, and discuss their respective limitations. This article is categorized under: RNA Evolution and Genomics > Computational Analyses of RNA; RNA Interactions with Proteins and Other Molecules > Protein-RNA Interactions: Functional Implications; Regulatory RNAs/RNAi/Riboswitches > Regulatory RNAs.
Affiliation(s)
- Caroline Jane Ross
- Biological Regulation and Molecular Neuroscience, Weizmann Institute of Science, Rehovot, Israel
- Igor Ulitsky
- Biological Regulation and Molecular Neuroscience, Weizmann Institute of Science, Rehovot, Israel
|
34
|
Mestre MR, Gao LA, Shah SA, López-Beltrán A, González-Delgado A, Martínez-Abarca F, Iranzo J, Redrejo-Rodríguez M, Zhang F, Toro N. UG/Abi: a highly diverse family of prokaryotic reverse transcriptases associated with defense functions. Nucleic Acids Res 2022; 50:6084-6101. [PMID: 35648479 PMCID: PMC9226505 DOI: 10.1093/nar/gkac467] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/17/2021] [Revised: 04/11/2022] [Accepted: 05/17/2022] [Indexed: 11/20/2022] Open
Abstract
Reverse transcriptases (RTs) are enzymes capable of synthesizing DNA using RNA as a template. Within the last few years, a burst of research has led to the discovery of novel prokaryotic RTs with diverse antiviral properties, such as DRTs (Defense-associated RTs), which belong to the so-called group of unknown RTs (UG) and are closely related to the Abortive Infection system (Abi) RTs. In this work, we performed a systematic analysis of UG and Abi RTs, increasing the number of UG/Abi members up to 42 highly diverse groups, most of which are predicted to be functionally associated with other gene(s) or domain(s). Based on this information, we classified these systems into three major classes. In addition, we reveal that most of these groups are associated with defense functions and/or mobile genetic elements, and demonstrate the antiphage role of four novel groups. Besides, we highlight the presence of one of these systems in novel families of human gut viruses infecting members of the Bacteroidetes and Firmicutes phyla. This work lays the foundation for a comprehensive and unified understanding of these highly diverse RTs with enormous biotechnological potential.
Affiliation(s)
- Mario Rodríguez Mestre
- Departamento de Bioquímica, Universidad Autónoma de Madrid (UAM) and Instituto de Investigaciones Biomédicas Alberto Sols (CSIC-UAM), Madrid, Spain
- Linyi Alex Gao
- Howard Hughes Medical Institute, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Society of Fellows, Harvard University, Cambridge, MA 02138, USA
- Shiraz A Shah
- Copenhagen Prospective Studies on Asthma in Childhood, Copenhagen University Hospital, Herlev-Gentofte, Ledreborg Allé 34, DK-2820 Gentofte, Denmark
- Adrián López-Beltrán
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) – Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Madrid, Spain
- Alejandro González-Delgado
- Department of Soil Microbiology and Symbiotic Systems, Estación Experimental del Zaidín, Consejo Superior de Investigaciones Científicas, Structure, Dynamics and Function of Rhizobacterial Genomes, Grupo de Ecología Genética de la Rizosfera, Spain
- Francisco Martínez-Abarca
- Department of Soil Microbiology and Symbiotic Systems, Estación Experimental del Zaidín, Consejo Superior de Investigaciones Científicas, Structure, Dynamics and Function of Rhizobacterial Genomes, Grupo de Ecología Genética de la Rizosfera, Spain
- Jaime Iranzo
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) – Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Madrid, Spain
- Institute for Biocomputation and Physics of Complex Systems (BIFI), University of Zaragoza, Zaragoza, Spain
- Modesto Redrejo-Rodríguez
- Departamento de Bioquímica, Universidad Autónoma de Madrid (UAM) and Instituto de Investigaciones Biomédicas Alberto Sols (CSIC-UAM), Madrid, Spain
- Feng Zhang
- Howard Hughes Medical Institute, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Massachusetts Institute of Technology, Cambridge, MA, USA
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Nicolás Toro
- Department of Soil Microbiology and Symbiotic Systems, Estación Experimental del Zaidín, Consejo Superior de Investigaciones Científicas, Structure, Dynamics and Function of Rhizobacterial Genomes, Grupo de Ecología Genética de la Rizosfera, Spain
|
35
|
Gray M, Chester S, Jabbari H. KnotAli: informed energy minimization through the use of evolutionary information. BMC Bioinformatics 2022; 23:159. [PMID: 35505276 PMCID: PMC9063079 DOI: 10.1186/s12859-022-04673-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/08/2021] [Accepted: 04/05/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Improving the prediction of structures, especially those containing pseudoknots (structures with crossing base pairs) is an ongoing challenge. Homology-based methods utilize structural similarities within a family to predict the structure. However, their prediction is limited to the consensus structure, and by the quality of the alignment. Minimum free energy (MFE) based methods, on the other hand, do not rely on familial information and can predict structures of novel RNA molecules. Their prediction normally suffers from inaccuracies due to their underlying energy parameters. RESULTS: We present a new method for prediction of RNA pseudoknotted secondary structures that combines the strengths of MFE prediction and alignment-based methods. KnotAli takes a multiple RNA sequence alignment as input and uses covariation and thermodynamic energy minimization to predict possibly pseudoknotted secondary structures for each individual sequence in the alignment. We compared KnotAli's performance to that of three other alignment-based programs, two that can handle pseudoknotted structures and one control, on a large data set of 3034 RNA sequences with varying lengths and levels of sequence conservation from 10 families with pseudoknotted and pseudoknot-free reference structures. We produced sequence alignments for each family using two well-known sequence aligners (MUSCLE and MAFFT). CONCLUSIONS: We found KnotAli's performance to be superior in 6 of the 10 families for MUSCLE and 7 of the 10 for MAFFT. While both KnotAli and CaCoFold use background noise correction strategies, we found KnotAli's predictions to be less dependent on the alignment quality. KnotAli can be found online at the Zenodo image: https://doi.org/10.5281/zenodo.5794719.
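Note on covariation: the abstract above refers to covariation signals in a multiple sequence alignment. As a rough illustration only, and not KnotAli's actual procedure, the sketch below computes mutual information between two alignment columns, one common way to quantify covariation. The function name `mutual_information` and the toy alignment are hypothetical and chosen for this example.

```python
from collections import Counter
from math import log2


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    `alignment` is a list of equal-length aligned sequences. This is an
    illustrative covariation score, not the scoring used by KnotAli.
    """
    n = len(alignment)
    pair_counts = Counter((seq[i], seq[j]) for seq in alignment)
    col_i = Counter(seq[i] for seq in alignment)
    col_j = Counter(seq[j] for seq in alignment)

    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = col_i[a] / n
        p_b = col_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment: columns 0 and 4 always form a Watson-Crick
# pair (G-C, C-G, A-U, U-A), while columns 1 to 3 are invariant.
toy_alignment = [
    "GAAAC",
    "CAAAG",
    "AAAAU",
    "UAAAA",
]

print(mutual_information(toy_alignment, 0, 4))  # covarying columns
print(mutual_information(toy_alignment, 1, 2))  # invariant columns
```

In this toy case the covarying pair scores the maximum for four equally frequent states, while invariant columns score zero. Real covariation analyses also correct for phylogeny and background noise, which this sketch does not attempt.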
Affiliation(s)
- Mateo Gray
- Department of Computer Science, University of Victoria, Victoria, Canada
- Sean Chester
- Department of Computer Science, University of Victoria, Victoria, Canada
- Hosna Jabbari
- Department of Computer Science, University of Victoria, Victoria, Canada; Institute on Aging and Lifelong Health, University of Victoria, Victoria, Canada
|
36
|
Abstract
Noncoding RNAs with secondary structures play important roles in CRISPR-Cas systems. Many of these structures likely remain undiscovered. We used a large-scale comparative genomics approach to predict 156 novel candidate structured RNAs from 36,111 CRISPR-Cas systems. A number of these were found to overlap with coding genes, including palindromic candidates that overlapped with a variety of Cas genes in type I and III systems. Among these 156 candidates, we identified 46 new models of CRISPR direct repeats and 1 tracrRNA. This tracrRNA model occasionally overlapped with predicted cas9 coding regions, emphasizing the importance of expanding our search windows for novel structure RNAs in coding regions. We also demonstrated that the antirepeat sequence in this tracrRNA model can be used to accurately assign thousands of predicted CRISPR arrays to type II-C systems. This study highlights the importance of unbiased identification of candidate structured RNAs across CRISPR-Cas systems.
Affiliation(s)
- Brayon J. Fremin
- Department of Energy, Joint Genome Institute, Berkeley, CA, USA
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Nikos C. Kyrpides
- Department of Energy, Joint Genome Institute, Berkeley, CA, USA
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
|
37
|
Prince S, Munoz C, Filion-Bienvenue F, Rioux P, Sarrasin M, Lang BF. Refining Mitochondrial Intron Classification With ERPIN: Identification Based on Conservation of Sequence Plus Secondary Structure Motifs. Front Microbiol 2022; 13:866187. [PMID: 35369492 PMCID: PMC8971849 DOI: 10.3389/fmicb.2022.866187] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 01/30/2022] [Accepted: 02/28/2022] [Indexed: 12/02/2022] Open
Abstract
Mitochondrial genomes—in particular those of fungi—often encode genes with a large number of Group I and Group II introns that are conserved at both the sequence and the RNA structure level. They provide a rich resource for the investigation of intron and gene structure, self- and protein-guided splicing mechanisms, and intron evolution. Yet, the degree of sequence conservation of introns is limited, and the primary sequence differs considerably among the distinct intron sub-groups. It makes intron identification, classification, structural modeling, and the inference of gene models a most challenging and error-prone task—frequently passed on to an “expert” for manual intervention. To reduce the need for manual curation of intron structures and mitochondrial gene models, computational methods using ERPIN sequence profiles were initially developed in 2007. Here we present a refinement of search models and alignments using the now abundant publicly available fungal mtDNA sequences. In addition, we have tested in how far members of the originally proposed sub-groups are clearly distinguished and validated by our computational approach. We confirm clearly distinct mitochondrial Group I sub-groups IA1, IA3, IB3, IC1, IC2, and ID. Yet, IB1, IB2, and IB4 ERPIN models are overlapping substantially in predictions, and are therefore combined and reported as IB. We have further explored the conversion of our ERPIN profiles into covariance models (CM). Current limitations and prospects of the CM approach will be discussed.
|
38
|
Tompkins VS, Rouse WB, O’Leary CA, Andrews RJ, Moss WN. Analyses of human cancer driver genes uncovers evolutionarily conserved RNA structural elements involved in posttranscriptional control. PLoS One 2022; 17:e0264025. [PMID: 35213597 PMCID: PMC8880891 DOI: 10.1371/journal.pone.0264025] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/10/2021] [Accepted: 02/01/2022] [Indexed: 12/02/2022] Open
Abstract
Experimental breakthroughs have provided unprecedented insights into the genes involved in cancer. The identification of such cancer driver genes is a major step in gaining a fuller understanding of oncogenesis and provides novel lists of potential therapeutic targets. A key area that requires additional study is the posttranscriptional control mechanisms at work in cancer driver genes. This is important not only for basic insights into the biology of cancer, but also to advance new therapeutic modalities that target RNA—an emerging field with great promise toward the treatment of various cancers. In the current study we performed an in silico analysis on the transcripts associated with 800 cancer driver genes (10,390 unique transcripts) that identified 179,190 secondary structural motifs with evidence of evolutionarily ordered structures with unusual thermodynamic stability. Narrowing to one transcript per gene, 35,426 predicted structures were subjected to phylogenetic comparisons of sequence and structural conservation. This identified 7,001 RNA secondary structures embedded in transcripts with evidence of covariation between paired sites, supporting structure models and suggesting functional significance. A select set of seven structures were tested in vitro for their ability to regulate gene expression; all were found to have significant effects. These results indicate potentially widespread roles for RNA structure in posttranscriptional control of human cancer driver genes.
Affiliation(s)
- Van S. Tompkins
- Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, United States of America
- Warren B. Rouse
- Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, United States of America
- Collin A. O’Leary
- Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, United States of America
- Ryan J. Andrews
- Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, United States of America
- Walter N. Moss
- Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, United States of America
|
39
|
Soszynska-Jozwiak M, Ruszkowska A, Kierzek R, O’Leary CA, Moss WN, Kierzek E. Secondary Structure of Subgenomic RNA M of SARS-CoV-2. Viruses 2022; 14:322. [PMID: 35215915 PMCID: PMC8878378 DOI: 10.3390/v14020322] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/10/2021] [Revised: 01/25/2022] [Accepted: 01/31/2022] [Indexed: 02/06/2023] Open
Abstract
SARS-CoV-2 belongs to the Coronavirinae family. Like other coronaviruses, SARS-CoV-2 is enveloped and possesses a positive-sense, single-stranded RNA genome of ~30 kb. Genomic RNA is used as the template for replication and transcription. During these processes, positive-sense genomic RNA (gRNA) and subgenomic RNAs (sgRNAs) are created. Several studies presented the importance of the genomic RNA secondary structure in SARS-CoV-2 replication. However, the structure of sgRNAs has remained largely unsolved so far. In this study, we probed the sgRNA M model of SARS-CoV-2 in vitro. The presented model molecule includes 5'UTR and a coding sequence of gene M. This is the first experimentally informed secondary structure model of sgRNA M, which presents features likely to be important in sgRNA M function. The knowledge of sgRNA M structure provides insights to better understand virus biology and could be used for designing new therapeutics.
Affiliation(s)
- Marta Soszynska-Jozwiak
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznan, Poland
- Agnieszka Ruszkowska
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznan, Poland
- Ryszard Kierzek
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznan, Poland
- Collin A. O’Leary
- Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA 50011, USA
- Walter N. Moss
- Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA 50011, USA
- Elzbieta Kierzek
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznan, Poland
|
40
|
Peterson JM, O'Leary CA, Moss WN. In silico analysis of local RNA secondary structure in influenza virus A, B and C finds evidence of widespread ordered stability but little evidence of significant covariation. Sci Rep 2022; 12:310. [PMID: 35013354 PMCID: PMC8748542 DOI: 10.1038/s41598-021-03767-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/24/2021] [Accepted: 12/02/2021] [Indexed: 12/13/2022] Open
Abstract
Influenza virus is a persistent threat to human health; indeed, the deadliest modern pandemic was in 1918 when an H1N1 virus killed an estimated 50 million people globally. The intent of this work is to better understand influenza from an RNA-centric perspective to provide local, structural motifs with likely significance to the influenza infectious cycle for therapeutic targeting. To accomplish this, we analyzed over four hundred thousand RNA sequences spanning three major clades: influenza A, B and C. We scanned influenza segments for local secondary structure, identified/modeled motifs of likely functionality, and coupled the results to an analysis of evolutionary conservation. We discovered 185 significant regions of predicted ordered stability, yet evidence of sequence covariation was limited to 7 motifs, where 3 (found in influenza C) had higher than expected amounts of sequence covariation.
Affiliation(s)
- Jake M Peterson
- Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, 50011, USA
| | - Collin A O'Leary
- Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, 50011, USA
| | - Walter N Moss
- Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, 50011, USA
|
41
|
Dingle K, Ghaddar F, Šulc P, Louis AA. Phenotype Bias Determines How Natural RNA Structures Occupy the Morphospace of All Possible Shapes. Mol Biol Evol 2022; 39:msab280. [PMID: 34542628 PMCID: PMC8763027 DOI: 10.1093/molbev/msab280] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Indexed: 12/15/2022] Open
Abstract
Morphospaces, representations of phenotypic characteristics, are often populated unevenly, leaving large parts unoccupied. Such patterns are typically ascribed to contingency, or else to natural selection disfavoring certain parts of the morphospace. The extent to which developmental bias, the tendency of certain phenotypes to preferentially appear as potential variation, also explains these patterns is hotly debated. Here we demonstrate quantitatively that developmental bias is the primary explanation for the occupation of the morphospace of RNA secondary structure (SS) shapes. Upon random mutations, some RNA SS shapes (the frequent ones) are much more likely to appear than others. By using the RNAshapes method to define coarse-grained SS classes, we can directly compare the frequencies that noncoding RNA SS shapes appear in the RNAcentral database to frequencies obtained upon a random sampling of sequences. We show that: 1) only the most frequent structures appear in nature; the vast majority of possible structures in the morphospace have not yet been explored; 2) remarkably small numbers of random sequences are needed to produce all the RNA SS shapes found in nature so far; and 3) perhaps most surprisingly, the natural frequencies are accurately predicted, over several orders of magnitude in variation, by the likelihood that structures appear upon a uniform random sampling of sequences. The ultimate cause of these patterns is not natural selection, but rather a strong phenotype bias in the RNA genotype-phenotype map, a type of developmental bias or "findability constraint," which limits evolutionary dynamics to a hugely reduced subset of structures that are easy to "find."
Affiliation(s)
- Kamaludin Dingle
- Centre for Applied Mathematics and Bioinformatics, Department of Mathematics and Natural Sciences, Gulf University for Science and Technology, Hawally, Kuwait
| | - Fatme Ghaddar
- Centre for Applied Mathematics and Bioinformatics, Department of Mathematics and Natural Sciences, Gulf University for Science and Technology, Hawally, Kuwait
| | - Petr Šulc
- School of Molecular Sciences and Center for Molecular Design and Biomimetics at the Biodesign Institute, Arizona State University, Tempe, AZ, USA
| | - Ard A Louis
- Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford, United Kingdom
|
42
|
Zhao J, Kennedy SD, Turner DH. Nuclear Magnetic Resonance Spectra and AMBER OL3 and ROC-RNA Simulations of UCUCGU Reveal Force Field Strengths and Weaknesses for Single-Stranded RNA. J Chem Theory Comput 2022; 18:1241-1254. [PMID: 34990548 DOI: 10.1021/acs.jctc.1c00643] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 12/16/2022]
Abstract
Single-stranded regions of RNA are important for folding of sequences into 3D structures and for design of therapeutics targeting RNA. Prediction of ensembles of 3D structures for single-stranded regions often involves classical mechanical approximations of interactions defined by quantum mechanical calculations on small model systems. Nuclear magnetic resonance (NMR) spectra and molecular dynamics (MD) simulations of short single strands provide tests for how well the approximations model many of the interactions. Here, the NMR spectra for UCUCGU at 2, 15, and 30 °C are compared to simulations with the AMBER force fields, OL3 and ROC-RNA. This is the first such comparison to an oligoribonucleotide containing an internal guanosine nucleotide (G). G is particularly interesting because of its many H-bonding groups, large dipole moment, and proclivity for both syn and anti conformations. Results reveal formation of a G amino to phosphate non-bridging oxygen H-bond. The results also demonstrate dramatic differences in details of the predicted structures. The variations emphasize the dependence of predictions on individual parameters and their balance with the rest of the force field. The NMR data can serve as a benchmark for future force fields.
|
43
|
Seemann SE, Mirza AH, Bang-Berthelsen CH, Garde C, Christensen-Dalsgaard M, Workman CT, Pociot F, Tommerup N, Gorodkin J, Ruzzo WL. Does rapid sequence divergence preclude RNA structure conservation in vertebrates? Nucleic Acids Res 2022; 50:2452-2463. [PMID: 35188540 PMCID: PMC8934657 DOI: 10.1093/nar/gkac067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2021] [Revised: 01/07/2022] [Accepted: 01/25/2022] [Indexed: 12/01/2022] Open
Abstract
Accelerated evolution of any portion of the genome is of significant interest, potentially signaling positive selection of phenotypic traits and adaptation. Accelerated evolution remains understudied for structured RNAs, despite the fact that an RNA’s structure is often key to its function. RNA structures are typically characterized by compensatory (structure-preserving) basepair changes that are unexpected given the underlying sequence variation, i.e., they have evolved through negative selection on structure. We address the question of how fast the primary sequence of an RNA can change through evolution while conserving its structure. Specifically, we consider predicted and known structures in vertebrate genomes. After careful control of false discovery rates, we obtain 13 de novo structures (and three known Rfam structures) that we predict to have rapidly evolving sequences—defined as structures where the primary sequences of human and mouse have diverged at least twice as fast (1.5 times for Rfam) as nearby neutrally evolving sequences. Two of the three known structures function in translation inhibition related to infection and immune response. We conclude that rapid sequence divergence does not preclude RNA structure conservation in vertebrates, although these events are relatively rare.
Affiliation(s)
| | - Aashiq H Mirza
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- Steno Diabetes Center Copenhagen, Gentofte, Denmark
| | - Claus H Bang-Berthelsen
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- National Food Institute, Technical University of Denmark, Kgs. Lyngby, Denmark
| | - Christian Garde
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
| | | | - Christopher T Workman
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- Center for Biological Sequence Analysis, Technical University of Denmark, Denmark
| | - Flemming Pociot
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- Steno Diabetes Center Copenhagen, Gentofte, Denmark
| | - Niels Tommerup
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- Department of Cellular and Molecular Medicine (ICMM), University of Copenhagen, Denmark
| | - Jan Gorodkin
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- Department of Veterinary and Animal Sciences, University of Copenhagen, Denmark
| | - Walter L Ruzzo
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- Computer Science and Engineering and Genome Sciences, University of Washington, USA
- Fred Hutchinson Cancer Research Center, Seattle, USA
|
44
|
Chen L, Zhu QH. The evolutionary landscape and expression pattern of plant lincRNAs. RNA Biol 2022; 19:1190-1207. [PMID: 36382947 PMCID: PMC9673970 DOI: 10.1080/15476286.2022.2144609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2022] [Accepted: 11/02/2022] [Indexed: 11/17/2022] Open
Abstract
Long intergenic non-coding RNAs (lincRNAs) are important regulators of cellular processes, including development and stress response. Many lincRNAs have been bioinformatically identified in plants, but their evolutionary dynamics and expression characteristics are still elusive. Here, we systematically identified thousands of lincRNAs in 26 plant species, including 6 non-flowering plants, investigated the conservation of the identified lincRNAs at different levels of plant lineages based on sequence and/or synteny homology, and explored characteristics of the conserved lincRNAs during plant evolution and their co-expression relationship with protein-coding genes (PCGs). In addition to confirming features well documented in the literature for lincRNAs, such as species specificity, fewer exons, tissue-specific expression patterns and lower expression levels, we revealed that histone modification signals and/or binding sites of transcription factors were enriched in the conserved lincRNAs, implying their biological functionality, as demonstrated by identifying conserved lincRNAs related to flower development in both the Brassicaceae and grass families and ancient lincRNAs potentially functioning in meristem development of non-flowering plants. Compared to PCGs, lincRNAs are more likely to be associated with transposable elements (TEs), but with different characteristics in different evolutionary lineages, for instance, the types of TEs and the variable level of association in lincRNAs with different degrees of conservation. Together, these results provide a comprehensive view of the evolutionary landscape of plant lincRNAs and shed new light on the conservation and functionality of plant lincRNAs.
Affiliation(s)
- Li Chen
- School of Life Sciences, Westlake University, Hangzhou, China
- Institute for Biology, Plant Cell and Molecular Biology, Humboldt-Universität zu Berlin, Berlin, Germany
|
45
|
Li S, Zhang H, Zhang L, Liu K, Liu B, Mathews DH, Huang L. LinearTurboFold: Linear-time global prediction of conserved structures for RNA homologs with applications to SARS-CoV-2. Proc Natl Acad Sci U S A 2021; 118:e2116269118. [PMID: 34887342 PMCID: PMC8719904 DOI: 10.1073/pnas.2116269118] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Accepted: 11/05/2021] [Indexed: 12/26/2022] Open
Abstract
The constant emergence of COVID-19 variants reduces the effectiveness of existing vaccines and test kits. Therefore, it is critical to identify conserved structures in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes as potential targets for variant-proof diagnostics and therapeutics. However, the algorithms to predict these conserved structures, which simultaneously fold and align multiple RNA homologs, scale at best cubically with sequence length and are thus infeasible for coronaviruses, which possess the longest genomes (∼30,000 nt) among RNA viruses. As a result, existing efforts on modeling SARS-CoV-2 structures resort to single-sequence folding as well as local folding methods with short window sizes, which inevitably neglect long-range interactions that are crucial in RNA functions. Here we present LinearTurboFold, an efficient algorithm for folding RNA homologs that scales linearly with sequence length, enabling unprecedented global structural analysis on SARS-CoV-2. Surprisingly, on a group of SARS-CoV-2 and SARS-related genomes, LinearTurboFold's purely in silico prediction not only is close to experimentally guided models for local structures, but also goes far beyond them by capturing the end-to-end pairs between 5' and 3' untranslated regions (UTRs) (∼29,800 nt apart) that match perfectly with a purely experimental work. Furthermore, LinearTurboFold identifies undiscovered conserved structures and conserved accessible regions as potential targets for designing efficient and mutation-insensitive small-molecule drugs, antisense oligonucleotides, small interfering RNAs (siRNAs), CRISPR-Cas13 guide RNAs, and RT-PCR primers. LinearTurboFold is a general technique that can also be applied to other RNA viruses and full-length genome studies and will be a useful tool in fighting the current and future pandemics.
Affiliation(s)
- Sizhen Li
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR 97331
| | - He Zhang
- Baidu Research, Sunnyvale, CA 94089
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR 97331
| | - Liang Zhang
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR 97331
- Baidu Research, Sunnyvale, CA 94089
| | - Kaibo Liu
- Baidu Research, Sunnyvale, CA 94089
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR 97331
| | | | - David H Mathews
- Department of Biochemistry & Biophysics, University of Rochester Medical Center, Rochester, NY 14642
- Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642
- Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY 14642
| | - Liang Huang
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR 97331
- Baidu Research, Sunnyvale, CA 94089
|
46
|
Soszynska-Jozwiak M, Pszczola M, Piasecka J, Peterson JM, Moss WN, Taras-Goslinska K, Kierzek R, Kierzek E. Universal and strain specific structure features of segment 8 genomic RNA of influenza A virus-application of 4-thiouridine photocrosslinking. J Biol Chem 2021; 297:101245. [PMID: 34688660 PMCID: PMC8666676 DOI: 10.1016/j.jbc.2021.101245] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/30/2021] [Revised: 09/22/2021] [Accepted: 09/23/2021] [Indexed: 11/24/2022] Open
Abstract
RNA structure in the influenza A virus (IAV) has been the focus of several studies that have shown connections between conserved secondary structure motifs and their biological function in the virus replication cycle. Questions have arisen on how to best recognize and understand the pandemic properties of IAV strains from an RNA perspective, but determination of the RNA secondary structure has been challenging. Herein, we used chemical mapping to determine the secondary structure of segment 8 viral RNA (vRNA) of the pandemic A/California/04/2009 (H1N1) strain of IAV. Additionally, this long, naturally occurring RNA served as a model to evaluate RNA mapping with 4-thiouridine (4sU) crosslinking. We explored 4-thiouridine as a probe of nucleotides in close proximity, through its incorporation into newly transcribed RNA and subsequent photoactivation. RNA secondary structural features both universal to type A strains and unique to the A/California/04/2009 (H1N1) strain were recognized. 4sU mapping confirmed and facilitated RNA structure prediction, according to several rules: 4sU photocross-linking forms efficiently in the double-stranded region of RNA with some flexibility, in the ends of helices, and across bulges and loops when their structural mobility is permitted. This method highlighted three-dimensional properties of segment 8 vRNA secondary structure motifs and allowed us to propose several long-range three-dimensional interactions. 4sU mapping combined with chemical mapping and bioinformatic analysis could be used to enhance RNA structure determination as well as recognition of target regions for antisense strategies or viral RNA detection.
Affiliation(s)
| | - Maciej Pszczola
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
| | - Julita Piasecka
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
| | - Jake M Peterson
- Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, Iowa, USA
| | - Walter N Moss
- Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, Iowa, USA
| | | | - Ryszard Kierzek
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
| | - Elzbieta Kierzek
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
|
47
|
Bonilla SL, Sherlock ME, MacFadden A, Kieft JS. A viral RNA hijacks host machinery using dynamic conformational changes of a tRNA-like structure. Science 2021; 374:955-960. [PMID: 34793227 PMCID: PMC9033304 DOI: 10.1126/science.abe8526] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Indexed: 12/29/2022]
Abstract
Viruses require multifunctional structured RNAs to hijack their host’s biochemistry, but their mechanisms can be obscured by the difficulty of solving conformationally dynamic RNA structures. Using cryo–electron microscopy (cryo-EM), we visualized the structure of the mysterious viral transfer RNA (tRNA)–like structure (TLS) from the brome mosaic virus, which affects replication, translation, and genome encapsidation. Structures in isolation and those bound to tyrosyl-tRNA synthetase (TyrRS) show that this ~55-kilodalton purported tRNA mimic undergoes large conformational rearrangements to bind TyrRS in a form that differs substantially from that of tRNA. Our study reveals how viral RNAs can use a combination of static and dynamic RNA structures to bind host machinery through highly noncanonical interactions, and we highlight the utility of cryo-EM for visualizing small, conformationally dynamic structured RNAs.
Affiliation(s)
- Steve L. Bonilla
- Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Madeline E. Sherlock
- Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Andrea MacFadden
- Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
| | - Jeffrey S. Kieft
- Department of Biochemistry and Molecular Genetics, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- RNA BioScience Initiative, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
|
48
|
Li S, Zhang H, Zhang L, Liu K, Liu B, Mathews DH, Huang L. LinearTurboFold: Linear-Time Global Prediction of Conserved Structures for RNA Homologs with Applications to SARS-CoV-2. bioRxiv 2021:2020.11.23.393488. [PMID: 34816262 PMCID: PMC8609897 DOI: 10.1101/2020.11.23.393488] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/20/2022]
Abstract
The constant emergence of COVID-19 variants reduces the effectiveness of existing vaccines and test kits. Therefore, it is critical to identify conserved structures in SARS-CoV-2 genomes as potential targets for variant-proof diagnostics and therapeutics. However, the algorithms to predict these conserved structures, which simultaneously fold and align multiple RNA homologs, scale at best cubically with sequence length, and are thus infeasible for coronaviruses, which possess the longest genomes (∼30,000 nt) among RNA viruses. As a result, existing efforts on modeling SARS-CoV-2 structures resort to single sequence folding as well as local folding methods with short window sizes, which inevitably neglect long-range interactions that are crucial in RNA functions. Here we present LinearTurboFold, an efficient algorithm for folding RNA homologs that scales linearly with sequence length, enabling unprecedented global structural analysis on SARS-CoV-2. Surprisingly, on a group of SARS-CoV-2 and SARS-related genomes, LinearTurboFold's purely in silico prediction not only is close to experimentally-guided models for local structures, but also goes far beyond them by capturing the end-to-end pairs between 5' and 3' UTRs (∼29,800 nt apart) that match perfectly with a purely experimental work. Furthermore, LinearTurboFold identifies novel conserved structures and conserved accessible regions as potential targets for designing efficient and mutation-insensitive small-molecule drugs, antisense oligonucleotides, siRNAs, CRISPR-Cas13 guide RNAs and RT-PCR primers. LinearTurboFold is a general technique that can also be applied to other RNA viruses and full-length genome studies, and will be a useful tool in fighting the current and future pandemics.
Significance Statement: Conserved RNA structures are critical for designing diagnostic and therapeutic tools for many diseases including COVID-19. However, existing algorithms are much too slow to model the global structures of full-length RNA viral genomes. We present LinearTurboFold, a linear-time algorithm that is orders of magnitude faster, making it the first method to simultaneously fold and align whole genomes of SARS-CoV-2 variants, the longest known RNA virus (∼30 kilobases). Our work enables unprecedented global structural analysis and captures long-range interactions that are out of reach for existing algorithms but crucial for RNA functions. LinearTurboFold is a general technique for full-length genome studies and can help fight the current and future pandemics.
Affiliation(s)
- Sizhen Li
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR
| | - He Zhang
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR
- Baidu Research, Sunnyvale, CA
| | - Liang Zhang
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR
- Baidu Research, Sunnyvale, CA
| | - Kaibo Liu
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR
- Baidu Research, Sunnyvale, CA
| | | | - David H. Mathews
- Department of Biochemistry & Biophysics, Center for RNA Biology, and Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY
| | - Liang Huang
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR
- Baidu Research, Sunnyvale, CA
|
49
|
Gao W, Jones TA, Rivas E. Discovery of 17 conserved structural RNAs in fungi. Nucleic Acids Res 2021; 49:6128-6143. [PMID: 34086938 PMCID: PMC8216456 DOI: 10.1093/nar/gkab355] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/03/2021] [Revised: 03/25/2021] [Accepted: 04/21/2021] [Indexed: 11/13/2022] Open
Abstract
Many non-coding RNAs with known functions are structurally conserved: their intramolecular secondary and tertiary interactions are maintained across evolutionary time. Consequently, the presence of conserved structure in multiple sequence alignments can be used to identify candidate functional non-coding RNAs. Here, we present a bioinformatics method that couples iterative homology search with covariation analysis to assess whether a genomic region has evidence of conserved RNA structure. We used this method to examine all unannotated regions of five well-studied fungal genomes (Saccharomyces cerevisiae, Candida albicans, Neurospora crassa, Aspergillus fumigatus, and Schizosaccharomyces pombe). We identified 17 novel structurally conserved non-coding RNA candidates, which include four H/ACA box small nucleolar RNAs, four intergenic RNAs and nine RNA structures located within the introns and untranslated regions (UTRs) of mRNAs. For the two structures in the 3' UTRs of the metabolic genes GLY1 and MET13, we performed experiments that provide evidence against them being eukaryotic riboswitches.
Affiliation(s)
- William Gao
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, USA
| | - Thomas A Jones
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, USA
- Howard Hughes Medical Institute, Harvard University, Cambridge, USA
| | - Elena Rivas
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, USA
|
50
|
Chen SC, Olsthoorn RCL, Yu CH. Structural phylogenetic analysis reveals lineage-specific RNA repetitive structural motifs in all coronaviruses and associated variations in SARS-CoV-2. Virus Evol 2021; 7:veab021. [PMID: 34141447 PMCID: PMC8206606 DOI: 10.1093/ve/veab021] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/23/2022] Open
Abstract
In many single-stranded (ss) RNA viruses, the cis-acting packaging signal that confers selective genome packaging usually encompasses short structured RNA repeats. These structural units, termed repetitive structural motifs (RSMs), potentially mediate capsid assembly by specific RNA–protein interactions. However, general knowledge of the conservation and/or the diversity of RSMs in the positive-sense ssRNA coronaviruses (CoVs) is limited. By performing structural phylogenetic analysis, we identified a variety of RSMs in nearly all CoV genomic RNAs, which are exclusively located in the 5′-untranslated regions (UTRs) and/or in the inter-domain regions of poly-protein 1ab coding sequences in a lineage-specific manner. In all alpha- and beta-CoVs, except for Embecovirus spp., two to four copies of 5′-gUUYCGUc-3′ RSMs displaying conserved hexa-loop sequences were generally identified in Stem-loop 5 (SL5) located in the 5′-UTRs of genomic RNAs. In Embecovirus spp., however, two to eight copies of 5′-agc-3′/guAAu RSMs were found in the coding regions of non-structural protein (NSP) 3 and/or NSP15 in open reading frame (ORF) 1ab. In gamma- and delta-CoVs, other types of RSMs were found in several clustered structural elements in 5′-UTRs and/or ORF1ab. The identification of RSM-encompassing structural elements in all CoVs suggests that these RNA elements play fundamental roles in the life cycle of CoVs. In the recently emerged SARS-CoV-2, beta-CoV-specific RSMs are also found in its SL5, displaying two copies of 5′-gUUUCGUc-3′ motifs. However, multiple sequence alignment reveals that the majority of SARS-CoV-2 possesses a variant RSM harboring SL5b C241U, and intriguingly, several variations in the coding sequences of viral proteins, such as Nsp12 P323L, S protein D614G, and N protein R203K-G204R, are concurrently found with such variant RSM. In conclusion, the comprehensive exploration of RSMs reveals phylogenetic insights into the RNA structural elements in CoVs as a whole and provides a new perspective on variations currently found in SARS-CoV-2.
Affiliation(s)
- Shih-Cheng Chen
- Department of Biochemistry and Molecular Biology, College of Medicine, National Cheng-Kung University, No.1, University Road, Tainan City 701, Taiwan
| | - René C L Olsthoorn
- Department of Supramolecular Biomaterials Chemistry, Leiden Institute of Chemistry, Gorlaeus Laboratories, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
| | - Chien-Hung Yu
- Department of Biochemistry and Molecular Biology, College of Medicine, National Cheng-Kung University, No.1, University Road, Tainan City 701, Taiwan
|