1
Zheludev IN, Edgar RC, Lopez-Galiano MJ, de la Peña M, Babaian A, Bhatt AS, Fire AZ. Viroid-like colonists of human microbiomes. Cell 2024; 187:6521-6536.e18. [PMID: 39481381 DOI: 10.1016/j.cell.2024.09.033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2024] [Revised: 07/03/2024] [Accepted: 09/18/2024] [Indexed: 11/02/2024]
Abstract
Here, we describe "obelisks," a class of heritable RNA elements sharing several properties: (1) apparently circular RNA ∼1 kb genome assemblies, (2) predicted rod-like genome-wide secondary structures, and (3) open reading frames encoding a novel "Oblin" protein superfamily. A subset of obelisks includes a variant hammerhead self-cleaving ribozyme. Obelisks form their own phylogenetic group without detectable similarity to known biological agents. Surveying globally, we identified 29,959 distinct obelisks (clustered at 90% sequence identity) from diverse ecological niches. Obelisks are prevalent in human microbiomes, with detection in ∼7% (29/440) and ∼50% (17/32) of queried stool and oral metatranscriptomes, respectively. We establish Streptococcus sanguinis as a cellular host of a specific obelisk and find that this obelisk's maintenance is not essential for bacterial growth. Our observations identify obelisks as a class of diverse RNAs of yet-to-be-determined impact that have colonized and gone unnoticed in human and global microbiomes.
Affiliation(s)
- Ivan N Zheludev
  - Stanford University, Department of Biochemistry, Stanford, CA, USA
- Maria Jose Lopez-Galiano
  - Instituto de Biología Molecular y Celular de Plantas, Universidad Politécnica de Valencia-CSIC, Valencia, Spain
- Marcos de la Peña
  - Instituto de Biología Molecular y Celular de Plantas, Universidad Politécnica de Valencia-CSIC, Valencia, Spain
- Artem Babaian
  - University of Toronto, Department of Molecular Genetics, Toronto, ON, Canada; University of Toronto, Donnelly Centre for Cellular and Biomolecular Research, Toronto, ON, Canada
- Ami S Bhatt
  - Stanford University, Department of Genetics, Stanford, CA, USA; Stanford University, Department of Medicine, Division of Hematology, Stanford, CA, USA
- Andrew Z Fire
  - Stanford University, Department of Genetics, Stanford, CA, USA; Stanford University, Department of Pathology, Stanford, CA, USA
2
Ontiveros-Palacios N, Cooke E, Nawrocki EP, Triebel S, Marz M, Rivas E, Griffiths-Jones S, Petrov AI, Bateman A, Sweeney B. Rfam 15: RNA families database in 2025. Nucleic Acids Res 2024:gkae1023. [PMID: 39526405 DOI: 10.1093/nar/gkae1023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2024] [Revised: 10/09/2024] [Accepted: 10/24/2024] [Indexed: 11/16/2024] Open
Abstract
The Rfam database, a widely used repository of non-coding RNA families, has undergone significant updates in release 15.0. This paper introduces major improvements, including the expansion of Rfamseq to 26 106 genomes, a 76% increase, incorporating the latest UniProt reference proteomes and additional viral genomes. Sixty-five RNA families were enhanced using experimentally determined 3D structures, improving the accuracy of consensus secondary structures and annotations. R-scape covariation analysis was used to refine structural predictions in 26 families. Gene Ontology (GO) and Sequence Ontology annotations were comprehensively updated, increasing GO term coverage to 75% of families. The release adds 14 new Hepatitis C Virus RNA families and completes microRNA family synchronization with miRBase, resulting in 1603 microRNA families. New data types, including FULL alignments, have been implemented. Integration with APICURON for improved curator attribution and multiple website enhancements further improve user experience. These updates significantly expand Rfam's coverage and improve annotation quality, reinforcing its critical role in RNA research, genome annotation and the development of machine learning models. Rfam is freely available at https://rfam.org.
Affiliation(s)
- Nancy Ontiveros-Palacios
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Emma Cooke
  - SciBite Limited, BioData Innovation Centre, Wellcome Genome Campus, Hinxton, Cambridge CB10 1DR, UK
- Eric P Nawrocki
  - National Center for Biotechnology Information, U.S. National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Sandra Triebel
  - RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, 07743 Jena, Germany
  - European Virus Bioinformatics Center, Friedrich Schiller University Jena, 07743 Jena, Germany
- Manja Marz
  - RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, 07743 Jena, Germany
  - European Virus Bioinformatics Center, Friedrich Schiller University Jena, 07743 Jena, Germany
- Elena Rivas
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA
- Sam Griffiths-Jones
  - School of Biological Sciences, Faculty of Medicine, Biology and Health, Michael Smith Building, The University of Manchester, Dover St, Manchester M13 9NT, UK
- Alex Bateman
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Blake Sweeney
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
3
Tang S, Conte V, Zhang DJ, Žedaveinytė R, Lampe GD, Wiegand T, Tang LC, Wang M, Walker MWG, George JT, Berchowitz LE, Jovanovic M, Sternberg SH. De novo gene synthesis by an antiviral reverse transcriptase. Science 2024; 386:eadq0876. [PMID: 39116258 DOI: 10.1126/science.adq0876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2024] [Accepted: 07/17/2024] [Indexed: 08/10/2024]
Abstract
Defense-associated reverse transcriptase (DRT) systems perform DNA synthesis to protect bacteria against viral infection, but the identities and functions of their DNA products remain largely unknown. We show that DRT2 systems encode an unprecedented immune pathway that involves de novo gene synthesis through rolling circle reverse transcription of a noncoding RNA (ncRNA). Programmed template jumping on the ncRNA generates a concatemeric cDNA, which becomes double-stranded upon viral infection. This DNA product constitutes a protein-coding, nearly endless open reading frame (neo) gene whose expression leads to potent cell growth arrest, restricting the viral infection. Our work highlights an elegant expansion of genome coding potential through RNA-templated gene creation and challenges conventional paradigms of genetic information encoded along the one-dimensional axis of genomic DNA.
Affiliation(s)
- Stephen Tang
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
- Valentin Conte
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
- Dennis J Zhang
  - Department of Biological Sciences, Columbia University, New York, NY, USA
- Rimantė Žedaveinytė
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
- George D Lampe
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
- Tanner Wiegand
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
- Lauren C Tang
  - Department of Biological Sciences, Columbia University, New York, NY, USA
- Megan Wang
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
- Matt W G Walker
  - Department of Biological Sciences, Columbia University, New York, NY, USA
- Jerrin Thomas George
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
- Luke E Berchowitz
  - Department of Genetics and Development, Columbia University, New York, NY, USA
  - Taub Institute for Research on Alzheimer's and the Aging Brain, New York, NY, USA
- Marko Jovanovic
  - Department of Biological Sciences, Columbia University, New York, NY, USA
- Samuel H Sternberg
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
4
Ontiveros N, Cooke E, Nawrocki EP, Triebel S, Marz M, Rivas E, Griffiths-Jones S, Petrov AI, Bateman A, Sweeney B. Rfam 15: RNA families database in 2025. bioRxiv 2024:2024.09.23.614430. [PMID: 39372780 PMCID: PMC11451735 DOI: 10.1101/2024.09.23.614430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/08/2024]
Abstract
The Rfam database, a widely-used repository of non-coding RNA (ncRNA) families, has undergone significant updates in release 15.0. This paper introduces major improvements, including the expansion of Rfamseq to 26,106 genomes, a 76% increase, incorporating the latest UniProt reference proteomes and additional viral genomes. Sixty-five RNA families were enhanced using experimentally determined 3D structures, improving the accuracy of consensus secondary structures and annotations. R-scape covariation analysis was used to refine structural predictions in 26 families. Gene Ontology and Sequence Ontology annotations were comprehensively updated, increasing GO term coverage to 75% of families. The release adds 14 new Hepatitis C Virus RNA families and completes microRNA family synchronisation with miRBase, resulting in 1,603 microRNA families. New data types, including FULL alignments, have been implemented. Integration with APICURON for improved curator attribution and multiple website enhancements further improve user experience. These updates significantly expand Rfam's coverage and improve annotation quality, reinforcing its critical role in RNA research, genome annotation, and the development of machine learning models. Rfam is freely available at https://rfam.org.
Affiliation(s)
- Nancy Ontiveros
  - European Molecular Biology Laboratory, Wellcome Genome Campus, European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK
- Eric P Nawrocki
  - National Center for Biotechnology Information, U.S. National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA (EPN)
- Sandra Triebel
  - RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, 07743, Jena, Germany
  - European Virus Bioinformatics Center, Friedrich Schiller University Jena, 07743, Jena, Germany
- Manja Marz
  - RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, 07743, Jena, Germany
  - European Virus Bioinformatics Center, Friedrich Schiller University Jena, 07743, Jena, Germany
- Elena Rivas
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, 02138, MA, USA
- Sam Griffiths-Jones
  - School of Biological Sciences, Faculty of Medicine, Biology and Health, Michael Smith Building, The University of Manchester, Manchester M13 9GB, UK
- Alex Bateman
  - European Molecular Biology Laboratory, Wellcome Genome Campus, European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK
- Blake Sweeney
  - European Molecular Biology Laboratory, Wellcome Genome Campus, European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK
5
Eich T, O’Leary C, Moss W. Intronic RNA secondary structural information captured for the human MYC pre-mRNA. NAR Genom Bioinform 2024; 6:lqae143. [PMID: 39450312 PMCID: PMC11500451 DOI: 10.1093/nargab/lqae143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2024] [Revised: 09/06/2024] [Accepted: 10/04/2024] [Indexed: 10/26/2024] Open
Abstract
To address the lack of intronic reads in secondary structure probing data for the human MYC pre-mRNA, we developed a method that combines spliceosomal inhibition with RNA probing and sequencing. Here, the SIRP-seq method was applied to study the secondary structure of human MYC RNAs by chemically probing HeLa cells with dimethyl sulfate in the presence of the small molecule spliceosome inhibitor pladienolide B. Pladienolide B binds to the SF3B complex of the spliceosome to inhibit intron removal during splicing, resulting in retained intronic sequences. This method was used to increase the read coverage over intronic regions of MYC. The purpose for increasing coverage across introns was to generate complete reactivity profiles for intronic sequences via the DMS-MaPseq approach. Notably, depth was sufficient for analysis by the program DRACO, which was able to deduce distinct reactivity profiles and predict multiple secondary structural conformations as well as their suggested stoichiometric abundances. The results presented here provide a new method for intronic RNA secondary structural analyses, as well as specific structural insights relevant to MYC RNA splicing regulation and therapeutic targeting.
Affiliation(s)
- Taylor O Eich
  - Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011, USA
- Collin A O’Leary
  - Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011, USA
  - Current Address: Department of Biology and Chemistry, Cornell College, Mount Vernon, IA 52314, USA
- Walter N Moss
  - Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011, USA
6
Žedaveinytė R, Meers C, Le HC, Mortman EE, Tang S, Lampe GD, Pesari SR, Gelsinger DR, Wiegand T, Sternberg SH. Antagonistic conflict between transposon-encoded introns and guide RNAs. Science 2024; 385:eadm8189. [PMID: 38991068 DOI: 10.1126/science.adm8189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Accepted: 05/08/2024] [Indexed: 07/13/2024]
Abstract
TnpB nucleases represent the evolutionary precursors to CRISPR-Cas12 and are widespread in all domains of life. IS605-family TnpB homologs function as programmable RNA-guided homing endonucleases in bacteria, driving transposon maintenance through DNA double-strand break-stimulated homologous recombination. In this work, we uncovered molecular mechanisms of the transposition life cycle of IS607-family elements that, notably, also encode group I introns. We identified specific features for a candidate "IStron" from Clostridium botulinum that allow the element to carefully control the relative levels of spliced products versus functional guide RNAs. Our results suggest that IStron transcripts evolved an ability to balance competing and mutually exclusive activities that promote selfish transposon spread while limiting adverse fitness costs on the host. Collectively, this work highlights molecular innovation in the multifunctional utility of transposon-encoded noncoding RNAs.
Affiliation(s)
- Rimantė Žedaveinytė
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
- Chance Meers
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
- Hoang C Le
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
- Edan E Mortman
  - Department of Genetics and Development, Columbia University, New York, NY 10032, USA
- Stephen Tang
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
- George D Lampe
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
- Sanjana R Pesari
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
- Diego R Gelsinger
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
- Tanner Wiegand
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
- Samuel H Sternberg
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
7
Triebel S, Lamkiewicz K, Ontiveros N, Sweeney B, Stadler PF, Petrov AI, Niepmann M, Marz M. Comprehensive survey of conserved RNA secondary structures in full-genome alignment of Hepatitis C virus. Sci Rep 2024; 14:15145. [PMID: 38956134 PMCID: PMC11219754 DOI: 10.1038/s41598-024-62897-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Accepted: 05/22/2024] [Indexed: 07/04/2024] Open
Abstract
Hepatitis C virus (HCV) is a plus-stranded RNA virus that often chronically infects liver hepatocytes and causes liver cirrhosis and cancer. These viruses replicate their genomes employing error-prone replicases. Thereby, they routinely generate a large 'cloud' of RNA genomes (quasispecies) which, by trial and error, comprehensively explore the sequence space available for functional RNA genomes that maintain the ability for efficient replication and immune escape. In this context, it is important to identify which RNA secondary structures in the sequence space of the HCV genome are conserved, likely due to functional requirements. Here, we provide the first genome-wide multiple sequence alignment (MSA) with the prediction of RNA secondary structures throughout all representative full-length HCV genomes. We selected 57 representative genomes by clustering all complete HCV genomes from the BV-BRC database based on k-mer distributions and dimension reduction and adding RefSeq sequences. We include annotations of previously recognized features for easy comparison to other studies. Our results indicate that mainly the core coding region, the C-terminal NS5A region, and the NS5B region contain secondary structure elements that are conserved beyond coding sequence requirements, indicating functionality on the RNA level. In contrast, the genome regions in between contain less highly conserved structures. The results provide a complete description of all conserved RNA secondary structures and make clear that functionally important RNA secondary structures are present in certain HCV genome regions but are largely absent from other regions. Full-genome alignments of all branches of Hepacivirus C are provided in the supplement.
Affiliation(s)
- Sandra Triebel
  - RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, 07743, Jena, Germany
  - European Virus Bioinformatics Center, Friedrich Schiller University Jena, 07743, Jena, Germany
- Kevin Lamkiewicz
  - RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, 07743, Jena, Germany
  - European Virus Bioinformatics Center, Friedrich Schiller University Jena, 07743, Jena, Germany
- Nancy Ontiveros
  - European Molecular Biology Laboratory, Wellcome Genome Campus, European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK
- Blake Sweeney
  - European Molecular Biology Laboratory, Wellcome Genome Campus, European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD, UK
- Peter F Stadler
  - European Virus Bioinformatics Center, Friedrich Schiller University Jena, 07743, Jena, Germany
  - Bioinformatics Group, Institute of Computer Science, and Interdisciplinary Center for Bioinformatics, University Leipzig, 04107, Leipzig, Germany
  - German Center for Integrative Biodiversity Research (iDiv), 04103, Leipzig, Germany
- Michael Niepmann
  - Institute for Biochemistry, Justus-Liebig-University Giessen, 35392, Giessen, Germany
- Manja Marz
  - RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, 07743, Jena, Germany
  - European Virus Bioinformatics Center, Friedrich Schiller University Jena, 07743, Jena, Germany
  - Leibniz Institute on Aging-Fritz Lipmann Institute, 07745, Jena, Germany
  - German Center for Integrative Biodiversity Research (iDiv), 04103, Leipzig, Germany
  - Michael Stifel Center Jena, Friedrich Schiller University Jena, 07743, Jena, Germany
  - Cluster of Excellence Balance of the Microverse, Friedrich Schiller University Jena, 07743, Jena, Germany
8
Tang S, Conte V, Zhang DJ, Žedaveinytė R, Lampe GD, Wiegand T, Tang LC, Wang M, Walker MW, George JT, Berchowitz LE, Jovanovic M, Sternberg SH. De novo gene synthesis by an antiviral reverse transcriptase. bioRxiv 2024:2024.05.08.593200. [PMID: 38766058 PMCID: PMC11100668 DOI: 10.1101/2024.05.08.593200] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
Bacteria defend themselves from viral infection using diverse immune systems, many of which sense and target foreign nucleic acids. Defense-associated reverse transcriptase (DRT) systems provide an intriguing counterpoint to this immune strategy by instead leveraging DNA synthesis, but the identities and functions of their DNA products remain largely unknown. Here we show that DRT2 systems execute an unprecedented immunity mechanism that involves de novo gene synthesis via rolling-circle reverse transcription of a non-coding RNA (ncRNA). Unbiased profiling of RT-associated RNA and DNA ligands in DRT2-expressing cells revealed that reverse transcription generates concatenated cDNA repeats through programmed template jumping on the ncRNA. The presence of phage then triggers second-strand cDNA synthesis, leading to the production of long double-stranded DNA. Remarkably, this DNA product is efficiently transcribed, generating messenger RNAs that encode a stop codon-less, never-ending ORF (neo) whose translation causes potent growth arrest. Phylogenetic analyses and screening of diverse DRT2 homologs further revealed broad conservation of rolling-circle reverse transcription and Neo protein function. Our work highlights an elegant expansion of genome coding potential through RNA-templated gene creation, and challenges conventional paradigms of genetic information encoded along the one-dimensional axis of genomic DNA.
Affiliation(s)
- Stephen Tang
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
- Valentin Conte
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
- Dennis J. Zhang
  - Department of Biological Sciences, Columbia University, New York, NY, USA
- Rimantė Žedaveinytė
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
- George D. Lampe
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
- Tanner Wiegand
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
- Lauren C. Tang
  - Department of Biological Sciences, Columbia University, New York, NY, USA
- Megan Wang
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
- Matt W.G. Walker
  - Department of Biological Sciences, Columbia University, New York, NY, USA
- Jerrin Thomas George
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
- Luke E. Berchowitz
  - Department of Genetics and Development, Columbia University, New York, NY, USA
  - Taub Institute for Research on Alzheimer’s and the Aging Brain, New York, NY, USA
- Marko Jovanovic
  - Department of Biological Sciences, Columbia University, New York, NY, USA
- Samuel H. Sternberg
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
9
Ho LLY, Schiess GHA, Miranda P, Weber G, Astakhova K. Pseudouridine and N1-methylpseudouridine as potent nucleotide analogues for RNA therapy and vaccine development. RSC Chem Biol 2024; 5:418-425. [PMID: 38725905 PMCID: PMC11078203 DOI: 10.1039/d4cb00022f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2024] [Accepted: 03/10/2024] [Indexed: 05/12/2024] Open
Abstract
Modified nucleosides are integral to modern drug development, serving as crucial building blocks for creating safer, more potent, and more precisely targeted therapeutic interventions. Nucleobase modifications often confer antiviral and anti-cancer activity as monomers. When incorporated into nucleic acid oligomers, they increase stability against degradation by enzymes, enhancing the drugs' lifespan within the body. Moreover, modification strategies can mitigate potential toxic effects and reduce immunogenicity, making drugs safer and better tolerated. In particular, N1-methylpseudouridine modification improved the efficacy of the mRNA coding for the spike protein of COVID-19. This became a crucial step in developing the COVID-19 vaccines used during the 2020 pandemic. This makes N1-methylpseudouridine, and its "parent" analogue pseudouridine, potent nucleotide analogues for future RNA therapy and vaccine development. This review focuses on the structure and properties of pseudouridine and N1-methylpseudouridine. RNA has greater structural versatility, different conformations, and different chemical reactivity than DNA. RNA does not strictly follow Watson-Crick pairing and has more unusual base pairs and base triplets. This requires detailed structural studies and structure-activity relationship analyses for RNA, including when modifications are incorporated. Recent successes in this direction are reviewed here. We describe recent successes with using pseudouridine and N1-methylpseudouridine in mRNA drug candidates. We also highlight remaining challenges that need to be solved to develop new mRNA vaccines and therapies.
Affiliation(s)
- Lyana L Y Ho
  - Technical University of Denmark 2800 Kongens Lyngby Denmark
  - The Hong Kong Polytechnic University 11 Yuk Choi Rd Hung Hom Hong Kong
- Gabriel H A Schiess
  - Departamento de Física, Universidade Federal de Minas Gerais Belo Horizonte MG Brazil
- Pâmella Miranda
  - Departamento de Física, Universidade Federal de Minas Gerais Belo Horizonte MG Brazil
  - Programa Interunidades de Pós-Graduação em Bioinformática, Universidade Federal de Minas Gerais Belo Horizonte MG Brazil
- Gerald Weber
  - Departamento de Física, Universidade Federal de Minas Gerais Belo Horizonte MG Brazil
- Kira Astakhova
  - Technical University of Denmark 2800 Kongens Lyngby Denmark
10
Ziesel A, Jabbari H. Unveiling hidden structural patterns in the SARS-CoV-2 genome: Computational insights and comparative analysis. PLoS One 2024; 19:e0298164. [PMID: 38574063 PMCID: PMC10994416 DOI: 10.1371/journal.pone.0298164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Accepted: 01/19/2024] [Indexed: 04/06/2024] Open
Abstract
SARS-CoV-2, the causative agent of COVID-19, is known to exhibit secondary structures in its 5' and 3' untranslated regions, along with the frameshifting stimulatory element situated between ORF1a and 1b. To identify additional regions containing conserved structures, we utilized a multiple sequence alignment with related coronaviruses as a starting point. We applied a computational pipeline developed for identifying non-coding RNA elements. Our pipeline employed three different RNA structural prediction approaches. We identified forty genomic regions likely to harbor structures, with ten of them showing three-way consensus substructure predictions among our predictive utilities. We conducted intracomparisons of the predictive utilities within the pipeline and intercomparisons with four previously published SARS-CoV-2 structural datasets. While there was limited agreement on the precise structure, different approaches seemed to converge on regions likely to contain structures in the viral genome. By comparing and combining various computational approaches, we can predict regions most likely to form structures, as well as a probable structure or ensemble of structures. These predictions can be used to guide surveillance, prophylactic measures, or therapeutic efforts. Data and scripts employed in this study may be found at https://doi.org/10.5281/zenodo.8298680.
Affiliation(s)
- Alison Ziesel
  - Department of Biomedical Engineering, University of Alberta, Edmonton, Alberta, Canada
- Hosna Jabbari
  - Department of Biomedical Engineering, University of Alberta, Edmonton, Alberta, Canada
11
Rinaldi S, Moroni E, Rozza R, Magistrato A. Frontiers and Challenges of Computing ncRNAs Biogenesis, Function and Modulation. J Chem Theory Comput 2024; 20:993-1018. [PMID: 38287883 DOI: 10.1021/acs.jctc.3c01239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/31/2024]
Abstract
Non-coding RNAs (ncRNAs) are generated from non-protein-coding DNA sequences, which constitute 98-99% of the human genome. Non-coding RNAs encompass diverse functional classes, including microRNAs, small interfering RNAs, PIWI-interacting RNAs, small nuclear RNAs, small nucleolar RNAs, and long non-coding RNAs. With critical involvement in gene expression and regulation across various biological and physiopathological contexts, such as neuronal disorders, immune responses, cardiovascular diseases, and cancer, non-coding RNAs are emerging as disease biomarkers and therapeutic targets. In this review, after providing an overview of non-coding RNAs' role in cell homeostasis, we illustrate the potential and the challenges of state-of-the-art computational methods exploited to study non-coding RNAs' biogenesis, function, and modulation. Modulation can be achieved by directly targeting ncRNAs with small molecules or by altering their expression by targeting the cellular engines underlying their biosynthesis. Drawing from applications, also taken from our work, we showcase the significance and role of computer simulations in uncovering fundamental facets of ncRNA mechanisms and modulation. This information may set the basis to advance gene modulation tools and therapeutic strategies to address unmet medical needs.
Affiliation(s)
- Silvia Rinaldi
- National Research Council of Italy (CNR) - Institute of Chemistry of OrganoMetallic Compounds (ICCOM), c/o Area di Ricerca CNR di Firenze Via Madonna del Piano 10, 50019 Sesto Fiorentino, Florence, Italy
| | - Elisabetta Moroni
- National Research Council of Italy (CNR) - Institute of Chemical Sciences and Technologies (SCITEC), via Mario Bianco 9, 20131 Milano, Italy
| | - Riccardo Rozza
- National Research Council of Italy (CNR) - Institute of Material Foundry (IOM) c/o International School for Advanced Studies (SISSA), Via Bonomea, 265, 34136 Trieste, Italy
| | - Alessandra Magistrato
- National Research Council of Italy (CNR) - Institute of Material Foundry (IOM) c/o International School for Advanced Studies (SISSA), Via Bonomea, 265, 34136 Trieste, Italy
| |
|
12
|
Zheludev IN, Edgar RC, Lopez-Galiano MJ, de la Peña M, Babaian A, Bhatt AS, Fire AZ. Viroid-like colonists of human microbiomes. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.20.576352. [PMID: 38293115 PMCID: PMC10827157 DOI: 10.1101/2024.01.20.576352] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 02/01/2024]
Abstract
Here, we describe the "Obelisks," a previously unrecognised class of viroid-like elements that we first identified in human gut metatranscriptomic data. "Obelisks" share several properties: (i) apparently circular RNA ~1kb genome assemblies, (ii) predicted rod-like secondary structures encompassing the entire genome, and (iii) open reading frames coding for a novel protein superfamily, which we call the "Oblins". We find that Obelisks form their own distinct phylogenetic group with no detectable sequence or structural similarity to known biological agents. Further, Obelisks are prevalent in tested human microbiome metatranscriptomes with representatives detected in ~7% of analysed stool metatranscriptomes (29/440) and in ~50% of analysed oral metatranscriptomes (17/32). Obelisk compositions appear to differ between the anatomic sites and are capable of persisting in individuals, with continued presence over >300 days observed in one case. Large scale searches identified 29,959 Obelisks (clustered at 90% nucleotide identity), with examples from all seven continents and in diverse ecological niches. From this search, a subset of Obelisks are identified to code for Obelisk-specific variants of the hammerhead type-III self-cleaving ribozyme. Lastly, we identified one case of a bacterial species (Streptococcus sanguinis) in which a subset of defined laboratory strains harboured a specific Obelisk RNA population. As such, Obelisks comprise a class of diverse RNAs that have colonised, and gone unnoticed in, human, and global microbiomes.
Affiliation(s)
- Ivan N Zheludev
- Stanford University, Department of Biochemistry, Stanford, CA, USA
| | | | - Maria Jose Lopez-Galiano
- Instituto de Biología Molecular y Celular de Plantas, Universidad Politécnica de Valencia-CSIC, Valencia, Spain
| | - Marcos de la Peña
- Instituto de Biología Molecular y Celular de Plantas, Universidad Politécnica de Valencia-CSIC, Valencia, Spain
| | - Artem Babaian
- University of Toronto, Department of Molecular Genetics, Ontario, Canada
- University of Toronto, Donnelly Centre for Cellular and Biomolecular Research, Ontario, Canada
| | - Ami S Bhatt
- Stanford University, Department of Genetics, Stanford, CA, USA
- Stanford University, Department of Medicine, Division of Hematology, Stanford, CA, USA
| | - Andrew Z Fire
- Stanford University, Department of Genetics, Stanford, CA, USA
- Stanford University, Department of Pathology, Stanford, CA, USA
| |
|
13
|
Žedaveinytė R, Meers C, Le HC, Mortman EE, Tang S, Lampe GD, Pesari SR, Gelsinger DR, Wiegand T, Sternberg SH. Antagonistic conflict between transposon-encoded introns and guide RNAs. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.20.567912. [PMID: 38045383 PMCID: PMC10690162 DOI: 10.1101/2023.11.20.567912] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/05/2023]
Abstract
TnpB nucleases represent the evolutionary precursors to CRISPR-Cas12 and are widespread in all domains of life, presumably due to the critical roles they play in transposon proliferation. IS605-family TnpB homologs function in bacteria as programmable homing endonucleases by exploiting transposon-encoded guide RNAs to cleave vacant genomic sites, thereby driving transposon maintenance through DSB-stimulated homologous recombination. Whether this pathway is conserved in other genetic contexts, and in association with other transposases, is unknown. Here we uncover molecular mechanisms of transposition and RNA-guided DNA cleavage by IS607-family elements that, remarkably, also encode catalytic, self-splicing group I introns. After reconstituting and systematically investigating each of these biochemical activities for a candidate 'IStron' derived from Clostridium botulinum, we discovered sequence and structural features of the transposon-encoded RNA that satisfy molecular requirements of a group I intron and TnpB guide RNA, while still retaining the ability to be faithfully mobilized at the DNA level by the TnpA transposase. Strikingly, intron splicing was strongly repressed not only by TnpB, but also by the secondary structure of ωRNA alone, allowing the element to carefully control the relative levels of spliced products versus functional guide RNAs. Our results suggest that IStron transcripts have evolved a sensitive equilibrium to balance competing and mutually exclusive activities that promote transposon maintenance while limiting adverse fitness costs on the host. Collectively, this work explains how diverse enzymatic activities emerged during the selfish spread of IS607-family elements and highlights molecular innovation in the multi-functional utility of transposon-encoded noncoding RNAs.
Affiliation(s)
- Rimantė Žedaveinytė
- Department of Biochemistry and Molecular Biophysics, Columbia University; New York, NY 10032, USA
| | - Chance Meers
- Department of Biochemistry and Molecular Biophysics, Columbia University; New York, NY 10032, USA
| | - Hoang C. Le
- Department of Biochemistry and Molecular Biophysics, Columbia University; New York, NY 10032, USA
| | - Edan E. Mortman
- Department of Genetics and Development, Columbia University; New York, NY 10032, USA
| | - Stephen Tang
- Department of Biochemistry and Molecular Biophysics, Columbia University; New York, NY 10032, USA
| | - George D. Lampe
- Department of Biochemistry and Molecular Biophysics, Columbia University; New York, NY 10032, USA
| | - Sanjana R. Pesari
- Department of Biochemistry and Molecular Biophysics, Columbia University; New York, NY 10032, USA
- Present address: Biochemistry and Molecular Biophysics Program, University of California, San Diego, CA, USA
| | - Diego R. Gelsinger
- Department of Biochemistry and Molecular Biophysics, Columbia University; New York, NY 10032, USA
| | - Tanner Wiegand
- Department of Biochemistry and Molecular Biophysics, Columbia University; New York, NY 10032, USA
| | - Samuel H. Sternberg
- Department of Biochemistry and Molecular Biophysics, Columbia University; New York, NY 10032, USA
| |
|
14
|
Zhang M, Li K, Bai J, Van Damme R, Zhang W, Alba M, Stiles BL, Chen JF, Lu Z. A snoRNA-tRNA modification network governs codon-biased cellular states. Proc Natl Acad Sci U S A 2023; 120:e2312126120. [PMID: 37792516 PMCID: PMC10576143 DOI: 10.1073/pnas.2312126120] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 07/17/2023] [Accepted: 09/06/2023] [Indexed: 10/06/2023]
Abstract
The dynamic balance between tRNA supply and codon usage demand is a fundamental principle in the cellular translation economy. However, the regulation and functional consequences of this balance remain unclear. Here, we use PARIS2 interactome capture, structure modeling, conservation analysis, RNA-protein interaction analysis, and modification mapping to reveal the targets of hundreds of snoRNAs, many of which were previously considered orphans. We identify a snoRNA-tRNA interaction network that is required for global tRNA modifications, including 2'-O-methylation and others. Loss of Fibrillarin, the snoRNA-guided 2'-O-methyltransferase, induces global upregulation of tRNA fragments, a large group of regulatory RNAs. In particular, the snoRNAs D97/D133 guide the 2'-O-methylation of multiple tRNAs, especially for the amino acid methionine (Met), a protein-intrinsic antioxidant. Loss of D97/D133 snoRNAs in human HEK293 cells reduced target tRNA levels and induced codon adaptation of the transcriptome and translatome. Both single and double knockouts of D97 and D133 in HEK293 cells suppress Met-enriched proliferation-related gene expression programs, including translation, splicing, and mitochondrial energy metabolism, and promote Met-depleted programs related to development, differentiation, and morphogenesis. In a mouse embryonic stem cell model of development, knockdown and knockout of D97/D133 promote differentiation to mesoderm and endoderm fates, such as cardiomyocytes, without compromising pluripotency, consistent with the enhanced development-related gene expression programs in human cells. This work solves a decades-old mystery about orphan snoRNAs and reveals a function of snoRNAs in controlling the codon-biased dichotomous cellular states of proliferation and development.
Affiliation(s)
- Minjie Zhang
- Department of Pharmacology and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA 90089
| | - Kongpan Li
- Department of Pharmacology and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA 90089
| | - Jianhui Bai
- Department of Pharmacology and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA 90089
| | - Ryan Van Damme
- Department of Pharmacology and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA 90089
| | - Wei Zhang
- Center for Craniofacial Molecular Biology, University of Southern California, Los Angeles, CA 90089
| | - Mario Alba
- Department of Pharmacology and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA 90089
| | - Bangyan L. Stiles
- Department of Pharmacology and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA 90089
- Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA 90089
| | - Jian-Fu Chen
- Center for Craniofacial Molecular Biology, University of Southern California, Los Angeles, CA 90089
| | - Zhipeng Lu
- Department of Pharmacology and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA 90089
- Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA 90089
| |
|
15
|
Meers C, Le HC, Pesari SR, Hoffmann FT, Walker MWG, Gezelle J, Tang S, Sternberg SH. Transposon-encoded nucleases use guide RNAs to promote their selfish spread. Nature 2023; 622:863-871. [PMID: 37758954 DOI: 10.1038/s41586-023-06597-1] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 02/23/2023] [Accepted: 08/31/2023] [Indexed: 09/29/2023]
Abstract
Insertion sequences are compact and pervasive transposable elements found in bacteria, which encode only the genes necessary for their mobilization and maintenance. IS200- and IS605-family transposons undergo 'peel-and-paste' transposition catalysed by a TnpA transposase, but they also encode diverse, TnpB- and IscB-family proteins that are evolutionarily related to the CRISPR-associated effectors Cas12 and Cas9, respectively. Recent studies have demonstrated that TnpB and IscB function as RNA-guided DNA endonucleases, but the broader biological role of this activity has remained enigmatic. Here we show that TnpB and IscB are essential to prevent permanent transposon loss as a consequence of the TnpA transposition mechanism. We selected a family of related insertion sequences from Geobacillus stearothermophilus that encode several TnpB and IscB orthologues, and showed that a single TnpA transposase was broadly active for transposon mobilization. The donor joints formed upon religation of transposon-flanking sequences were efficiently targeted for cleavage by RNA-guided TnpB and IscB nucleases, and co-expression of TnpB and TnpA led to substantially greater transposon retention relative to conditions in which TnpA was expressed alone. Notably, TnpA and TnpB also stimulated recombination frequencies, surpassing rates observed with TnpB alone. Collectively, this study reveals that RNA-guided DNA cleavage arose as a primal biochemical activity to bias the selfish inheritance and spread of transposable elements, which was later co-opted during the evolution of CRISPR-Cas adaptive immunity for antiviral defence.
Affiliation(s)
- Chance Meers
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
| | - Hoang C Le
- Department of Biology, University of Pennsylvania, Philadelphia, PA, USA
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
| | - Sanjana R Pesari
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
- Biochemistry and Molecular Biophysics Program, University of California, San Diego, CA, USA
| | - Florian T Hoffmann
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
| | - Matt W G Walker
- Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Jeanine Gezelle
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
| | - Stephen Tang
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
| | - Samuel H Sternberg
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA.
| |
|
16
|
Dupont MJ, Major F. D-ORB: A Web Server to Extract Structural Features of Related But Unaligned RNA Sequences. J Mol Biol 2023; 435:168181. [PMID: 37468182 DOI: 10.1016/j.jmb.2023.168181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2022] [Revised: 06/02/2023] [Accepted: 06/06/2023] [Indexed: 07/21/2023]
Abstract
Identifying the common structural elements of functionally related RNA sequences (family) is usually based on an alignment of the sequences, which is often subject to human bias and may not be accurate. The resulting covariance model (CM) provides probabilities for each base to covary with another, which allows to support evolutionarily the formation of double helical regions and possibly pseudoknots. The coexistence of alternative folds in RNA, resulting from its dynamic nature, may lead to the potential omission of motifs by CM. To overcome this limitation, we present D-ORB, a system of algorithms that identifies overrepresented motifs in the secondary conformational landscapes of a family when compared to those of unrelated sequences. The algorithms are bundled into an easy-to-use website allowing users to submit a family, and optionally provide unrelated sequences. D-ORB produces a non-pseudoknotted secondary structure based on the overrepresented motifs, a deep neural network classifier and two decision trees. When used to model an Rfam family, D-ORB fits overrepresented motifs in the corresponding Rfam structure; more than a hundred Rfam families have been modeled. The statistical approach behind D-ORB derives the structural composition of an RNA family, making it a valuable tool for analyzing and modeling it. Its easy-to-use interface and advanced algorithms make it an essential resource for researchers studying RNA structure. D-ORB is available at https://d-orb.major.iric.ca/.
Affiliation(s)
- Mathieu J Dupont
- Department of Computer Science and Operations Research, and the Institute for Research in Immunology and Cancer, Université de Montréal, Montreal, Quebec H3C 3J7, Canada
| | - François Major
- Department of Computer Science and Operations Research, and the Institute for Research in Immunology and Cancer, Université de Montréal, Montreal, Quebec H3C 3J7, Canada. https://twitter.com/francois_major
| |
|
17
|
Rivas E. RNA covariation at helix-level resolution for the identification of evolutionarily conserved RNA structure. PLoS Comput Biol 2023; 19:e1011262. [PMID: 37450549 PMCID: PMC10370758 DOI: 10.1371/journal.pcbi.1011262] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/08/2023] [Accepted: 06/12/2023] [Indexed: 07/18/2023]
Abstract
Many biologically important RNAs fold into specific 3D structures conserved through evolution. Knowing when an RNA sequence includes a conserved RNA structure that could lead to new biology is not trivial and depends on clues left behind by conservation in the form of covariation and variation. For that purpose, the R-scape statistical test was created to identify from alignments of RNA sequences, the base pairs that significantly covary above phylogenetic expectation. R-scape treats base pairs as independent units. However, RNA base pairs do not occur in isolation. The Watson-Crick (WC) base pairs stack together forming helices that constitute the scaffold that facilitates the formation of the non-WC base pairs, and ultimately the complete 3D structure. The helix-forming WC base pairs carry most of the covariation signal in an RNA structure. Here, I introduce a new measure of statistically significant covariation at helix-level by aggregation of the covariation significance and covariation power calculated at base-pair-level resolution. Performance benchmarks show that helix-level aggregated covariation increases sensitivity in the detection of evolutionarily conserved RNA structure without sacrificing specificity. This additional helix-level sensitivity reveals an artifact that results from using covariation to build an alignment for a hypothetical structure and then testing the alignment for whether its covariation significantly supports the structure. Helix-level reanalysis of the evolutionary evidence for a selection of long non-coding RNAs (lncRNAs) reinforces the evidence against these lncRNAs having a conserved secondary structure.
Affiliation(s)
- Elena Rivas
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
| |
|
18
|
Gao W, Yang A, Rivas E. Thirteen dubious ways to detect conserved structural RNAs. IUBMB Life 2023; 75:471-492. [PMID: 36495545 PMCID: PMC11234323 DOI: 10.1002/iub.2694] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/03/2022] [Accepted: 10/24/2022] [Indexed: 12/14/2022]
Abstract
Covariation induced by compensatory base substitutions in RNA alignments is a great way to deduce conserved RNA structure, in principle. In practice, success depends on many factors, importantly the quality and depth of the alignment and the choice of covariation statistic. Measuring covariation between pairs of aligned positions is easy. However, using covariation to infer evolutionarily conserved RNA structure is complicated by other extraneous sources of covariation such as that resulting from homologous sequences having evolved from a common ancestor. In order to provide evidence of evolutionarily conserved RNA structure, a method to distinguish covariation due to sources other than RNA structure is necessary. Moreover, there are several sorts of artifactually generated covariation signals that can further confound the analysis. Additionally, some covariation signal is difficult to detect due to incomplete comparative data. Here, we investigate and critically discuss the practice of inferring conserved RNA structure by comparative sequence analysis. We provide new methods on how to approach and decide which of the numerous long non-coding RNAs (lncRNAs) have biologically relevant structures.
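As a rough illustration of the abstract's remark that measuring covariation between pairs of aligned positions is easy, the sketch below computes mutual information between two alignment columns. Mutual information is one commonly used covariation statistic and is chosen here only as an assumption; it is not necessarily the statistic the authors evaluate. The toy alignment and the helper names `column` and `mutual_information` are hypothetical. The sketch does nothing to separate structural covariation from covariation due to shared ancestry, which is the difficulty the abstract goes on to describe.

```python
import math
from collections import Counter


def column(alignment, i):
    """Return column i of an alignment given as equal-length strings."""
    return [seq[i] for seq in alignment]


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    freq_a = Counter(col_a)                 # single-column residue counts
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))    # joint counts of residue pairs
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        p_a = freq_a[a] / n
        p_b = freq_b[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment: positions 0 and 3 change together so that a
# Watson-Crick pair is kept in every sequence; positions 1 and 2 are invariant.
alignment = ["GAAC", "CAAG", "AAAU", "UAAA"]

paired = mutual_information(column(alignment, 0), column(alignment, 3))
invariant = mutual_information(column(alignment, 1), column(alignment, 2))
print(f"MI(0,3) = {paired:.2f} bits, MI(1,2) = {invariant:.2f} bits")
```

A high score for positions 0 and 3 shows only that the two columns vary together in this toy input; it carries no statistical significance by itself.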
Affiliation(s)
- William Gao
- Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Ann Yang
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, USA
| | - Elena Rivas
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, USA
| |
|
19
|
How does precursor RNA structure influence RNA processing and gene expression? Biosci Rep 2023; 43:232489. [PMID: 36689327 PMCID: PMC9977717 DOI: 10.1042/bsr20220149] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/20/2022] [Revised: 01/17/2023] [Accepted: 01/23/2023] [Indexed: 01/24/2023]
Abstract
RNA is a fundamental biomolecule that has many purposes within cells. Due to its single-stranded and flexible nature, RNA naturally folds into complex and dynamic structures. Recent technological and computational advances have produced an explosion of RNA structural data. Many RNA structures have regulatory and functional properties. Studying the structure of nascent RNAs is particularly challenging due to their low abundance and long length, but their structures are important because they can influence RNA processing. Precursor RNA processing is a nexus of pathways that determines mature isoform composition and that controls gene expression. In this review, we examine what is known about human nascent RNA structure and the influence of RNA structure on processing of precursor RNAs. These known structures provide examples of how other nascent RNAs may be structured and show how novel RNA structures may influence RNA processing including splicing and polyadenylation. RNA structures can be targeted therapeutically to treat disease.
|
20
|
Meers C, Le H, Pesari SR, Hoffmann FT, Walker MW, Gezelle J, Sternberg SH. Transposon-encoded nucleases use guide RNAs to selfishly bias their inheritance. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.03.14.532601. [PMID: 36993599 PMCID: PMC10055086 DOI: 10.1101/2023.03.14.532601] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 05/29/2023]
Abstract
Insertion sequences (IS) are compact and pervasive transposable elements found in bacteria, which encode only the genes necessary for their mobilization and maintenance. IS200/IS605 elements undergo 'peel-and-paste' transposition catalyzed by a TnpA transposase, but intriguingly, they also encode diverse, TnpB- and IscB-family proteins that are evolutionarily related to the CRISPR-associated effectors Cas12 and Cas9, respectively. Recent studies demonstrated that TnpB-family enzymes function as RNA-guided DNA endonucleases, but the broader biological role of this activity has remained enigmatic. Here we show that TnpB/IscB are essential to prevent permanent transposon loss as a consequence of the TnpA transposition mechanism. We selected a family of related IS elements from Geobacillus stearothermophilus that encode diverse TnpB/IscB orthologs, and showed that a single TnpA transposase was active for transposon excision. The donor joints formed upon religation of IS-flanking sequences were efficiently targeted for cleavage by RNA-guided TnpB/IscB nucleases, and co-expression of TnpB together with TnpA led to significantly greater transposon retention, relative to conditions in which TnpA was expressed alone. Remarkably, TnpA and TnpB/IscB recognize the same AT-rich transposon-adjacent motif (TAM) during transposon excision and RNA-guided DNA cleavage, respectively, revealing a striking convergence in the evolution of DNA sequence specificity between collaborating transposase and nuclease proteins. Collectively, our study reveals that RNA-guided DNA cleavage is a primal biochemical activity that arose to bias the selfish inheritance and spread of transposable elements, which was later co-opted during the evolution of CRISPR-Cas adaptive immunity for antiviral defense.
Affiliation(s)
- Chance Meers
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY
| | - Hoang Le
- Department of Biology, University of Pennsylvania, Philadelphia, PA
| | - Sanjana R. Pesari
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY
| | - Florian T. Hoffmann
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY
| | - Matt W.G. Walker
- Department of Biological Sciences, Columbia University, New York, NY
| | - Jeanine Gezelle
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY
| | - Samuel H. Sternberg
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY
| |
|
21
|
Hollar A, Bursey H, Jabbari H. Pseudoknots in RNA Structure Prediction. Curr Protoc 2023; 3:e661. [PMID: 36779804 DOI: 10.1002/cpz1.661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/14/2023]
Abstract
RNA molecules play active roles in the cell and are important for numerous applications in biotechnology and medicine. The function of an RNA molecule stems from its structure. RNA structure determination is time consuming, challenging, and expensive using experimental methods. Thus, much research has been directed at RNA structure prediction through computational means. Many of these methods focus primarily on the secondary structure of the molecule, ignoring the possibility of pseudoknotted structures. However, pseudoknots are known to play functional roles in many RNA molecules or in their method of interaction with other molecules. Improving the accuracy and efficiency of computational methods that predict pseudoknots is an ongoing challenge for single RNA molecules, RNA-RNA interactions, and RNA-protein interactions. To improve the accuracy of prediction, many methods focus on specific applications while restricting the length and the class of the pseudoknotted structures they can identify. In recent years, computational methods for structure prediction have begun to catch up with the impressive developments seen in biotechnology. Here, we provide a non-comprehensive overview of available pseudoknot prediction methods and their best-use cases. © 2023 Wiley Periodicals LLC.
Affiliation(s)
- Andrew Hollar
- Department of Computer Science, University of Victoria, Victoria, Canada
| | - Hunter Bursey
- Department of Computer Science, University of Victoria, Victoria, Canada
| | - Hosna Jabbari
- Department of Computer Science, University of Victoria, Victoria, Canada
| |
|
22
|
Biesiada M, Hu MY, Williams LD, Purzycka KJ, Petrov AS. rRNA expansion segment 7 in eukaryotes: from Signature Fold to tentacles. Nucleic Acids Res 2022; 50:10717-10732. [PMID: 36200812 PMCID: PMC9561286 DOI: 10.1093/nar/gkac844] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/27/2021] [Revised: 09/13/2022] [Accepted: 09/22/2022] [Indexed: 11/14/2022]
Abstract
The ribosomal core is universally conserved across the tree of life. However, eukaryotic ribosomes contain diverse rRNA expansion segments (ESs) on their surfaces. Sites of ES insertions are predicted from sites of insertion of micro-ESs in archaea. Expansion segment 7 (ES7) is one of the most diverse regions of the ribosome, emanating from a short stem loop and ranging to over 750 nucleotides in mammals. We present secondary and full-atom 3D structures of ES7 from species spanning eukaryotic diversity. Our results are based on experimental 3D structures, the accretion model of ribosomal evolution, phylogenetic relationships, multiple sequence alignments, RNA folding algorithms and 3D modeling by RNAComposer. ES7 contains a distinct motif, the 'ES7 Signature Fold', which is generally invariant in 2D topology and 3D structure in all eukaryotic ribosomes. We establish a model in which ES7 developed over evolution through a series of elementary and recursive growth events. The data are sufficient to support an atomic-level accretion path for rRNA growth. The non-monophyletic distribution of some ES7 features across the phylogeny suggests acquisition via convergent processes. And finally, illustrating the power of our approach, we constructed the 2D and 3D structure of the entire LSU rRNA of Mus musculus.
Affiliation(s)
- Marcin Biesiada
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan 61-704, Poland
| | - Michael Y Hu
- Center for the Origins of Life, Georgia Institute of Technology, Atlanta, GA 30332, USA; School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA 30332, USA
| | - Loren Dean Williams
- Center for the Origins of Life, Georgia Institute of Technology, Atlanta, GA 30332, USA; School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA 30332, USA
| | - Katarzyna J Purzycka
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan 61-704, Poland
| | - Anton S Petrov
- Center for the Origins of Life, Georgia Institute of Technology, Atlanta, GA 30332, USA; School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA 30332, USA
| |
|
23
|
Zhang J, Fei Y, Sun L, Zhang QC. Advances and opportunities in RNA structure experimental determination and computational modeling. Nat Methods 2022; 19:1193-1207. [PMID: 36203019 DOI: 10.1038/s41592-022-01623-y] [Citation(s) in RCA: 53] [Impact Index Per Article: 17.7] [Received: 06/10/2022] [Accepted: 08/23/2022] [Indexed: 11/09/2022]
Abstract
Beyond transferring genetic information, RNAs are molecules with diverse functions that include catalyzing biochemical reactions and regulating gene expression. Most of these activities depend on RNAs' specific structures. Therefore, accurately determining RNA structure is integral to advancing our understanding of RNA functions. Here, we summarize the state-of-the-art experimental and computational technologies developed to evaluate RNA secondary and tertiary structures. We also highlight how the rapid increase of experimental data facilitates the integrative modeling approaches for better resolving RNA structures. Finally, we provide our thoughts on the latest advances and challenges in RNA structure determination methods, as well as on future directions for both experimental approaches and artificial intelligence-based computational tools to model RNA structure. Ultimately, we hope the technological advances will deepen our understanding of RNA biology and facilitate RNA structure-based biomedical research such as designing specific RNA structures for therapeutics and deploying RNA-targeting small-molecule drugs.
Affiliation(s)
- Jinsong Zhang
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing, China; Beijing Advanced Innovation Center for Structural Biology & Frontier Research Center for Biological Structure, School of Life Sciences, Tsinghua University, Beijing, China; Tsinghua-Peking Center for Life Sciences, Beijing, China
| | - Yuhan Fei
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing, China; Beijing Advanced Innovation Center for Structural Biology & Frontier Research Center for Biological Structure, School of Life Sciences, Tsinghua University, Beijing, China; Tsinghua-Peking Center for Life Sciences, Beijing, China
| | - Lei Sun
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing, China; Beijing Advanced Innovation Center for Structural Biology & Frontier Research Center for Biological Structure, School of Life Sciences, Tsinghua University, Beijing, China; Tsinghua-Peking Center for Life Sciences, Beijing, China
| | - Qiangfeng Cliff Zhang
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing, China; Beijing Advanced Innovation Center for Structural Biology & Frontier Research Center for Biological Structure, School of Life Sciences, Tsinghua University, Beijing, China; Tsinghua-Peking Center for Life Sciences, Beijing, China
| |
Collapse
|
24
|
Variant-Specific Analysis Reveals a Novel Long-Range RNA-RNA Interaction in SARS-CoV-2 Orf1a. Int J Mol Sci 2022; 23:ijms231911050. [PMID: 36232353 PMCID: PMC9570297 DOI: 10.3390/ijms231911050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Revised: 09/05/2022] [Accepted: 09/08/2022] [Indexed: 01/08/2023] Open
Abstract
Since the start of the COVID-19 pandemic, understanding the pathology of the SARS-CoV-2 RNA virus and its life cycle has been the priority of many researchers. Currently, new variants of the virus have emerged with various levels of pathogenicity and abundance within the human-host population. Although much of viral pathogenicity is attributed to the viral Spike protein’s binding affinity to human lung cells’ ACE2 receptor, comprehensive knowledge on the distinctive features of viral variants that might affect their life cycle and pathogenicity is yet to be attained. Recent in vivo studies into the RNA structure of the SARS-CoV-2 genome have revealed certain long-range RNA-RNA interactions. Using in silico predictions and a large population of SARS-CoV-2 sequences, we observed variant-specific evolutionary changes for certain long-range RRIs. We also found statistical evidence for the existence of one of the thermodynamic-based RRI predictions, namely Comp1, in the Beta variant sequences. A similar test that disregarded sequence variant information did not, however, lead to significant results. When performing population-based analyses, aggregate tests may fail to identify novel interactions due to variant-specific changes. Variant-specific analyses can result in de novo RRI identification.
|
25
|
Omoru OB, Pereira F, Janga SC, Manzourolajdad A. A Putative long-range RNA-RNA interaction between ORF8 and Spike of SARS-CoV-2. PLoS One 2022; 17:e0260331. [PMID: 36048827 PMCID: PMC9436084 DOI: 10.1371/journal.pone.0260331] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/04/2021] [Accepted: 06/22/2022] [Indexed: 12/15/2022] Open
Abstract
SARS-CoV-2 has affected people worldwide as the causative agent of COVID-19. The virus is related to the highly lethal SARS-CoV-1 responsible for the 2002-2003 SARS outbreak in Asia. Research is ongoing to understand why both viruses have different spreading capacities and mortality rates. Like other beta coronaviruses, RNA-RNA interactions occur between different parts of the viral genomic RNA, resulting in discontinuous transcription and production of various sub-genomic RNAs. These sub-genomic RNAs are then translated into other viral proteins. In this work, we performed a comparative analysis for novel long-range RNA-RNA interactions that may involve the Spike region. Comparing in-silico fragment-based predictions between reference sequences of SARS-CoV-1 and SARS-CoV-2 revealed several predictions amongst which a thermodynamically stable long-range RNA-RNA interaction between (23660-23703 Spike) and (28025-28060 ORF8) unique to SARS-CoV-2 was observed. The patterns of sequence variation using data gathered worldwide further supported the predicted stability of the sub-interacting region (23679-23690 Spike) and (28031-28042 ORF8). Such RNA-RNA interactions can potentially impact viral life cycle including sub-genomic RNA production rates.
Affiliation(s)
- Okiemute Beatrice Omoru
- Department of Biohealth Informatics, School of Informatics and Computing, Indiana University Purdue University, Indianapolis, IN, United States of America
- Filipe Pereira
- Centre for Functional Ecology, Department of Life Sciences, University of Coimbra, Coimbra, Portugal
- IDENTIFICA Genetic Testing, Maia, Portugal
- Sarath Chandra Janga
- Department of Biohealth Informatics, School of Informatics and Computing, Indiana University Purdue University, Indianapolis, IN, United States of America
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Medical Research and Library Building, Indianapolis, Indiana, United States of America
- Centre for Computational Biology and Bioinformatics, Indiana University School of Medicine, 5021 Health Information and Translational Sciences (HITS), Indianapolis, Indiana, United States of America
- Amirhossein Manzourolajdad
- Department of Biohealth Informatics, School of Informatics and Computing, Indiana University Purdue University, Indianapolis, IN, United States of America
- Department of Computer Science, Colgate University, Hamilton, NY, United States of America
|
26
|
Westhof E. Data, data, burning deep, in the forests of the net. Biochem Biophys Res Commun 2022; 633:42-44. [DOI: 10.1016/j.bbrc.2022.09.030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Accepted: 09/07/2022] [Indexed: 11/28/2022]
|
27
|
Gray M, Chester S, Jabbari H. KnotAli: informed energy minimization through the use of evolutionary information. BMC Bioinformatics 2022; 23:159. [PMID: 35505276 PMCID: PMC9063079 DOI: 10.1186/s12859-022-04673-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/08/2021] [Accepted: 04/05/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Improving the prediction of structures, especially those containing pseudoknots (structures with crossing base pairs) is an ongoing challenge. Homology-based methods utilize structural similarities within a family to predict the structure. However, their prediction is limited to the consensus structure, and by the quality of the alignment. Minimum free energy (MFE) based methods, on the other hand, do not rely on familial information and can predict structures of novel RNA molecules. Their prediction normally suffers from inaccuracies due to their underlying energy parameters. RESULTS: We present a new method for prediction of RNA pseudoknotted secondary structures that combines the strengths of MFE prediction and alignment-based methods. KnotAli takes a multiple RNA sequence alignment as input and uses covariation and thermodynamic energy minimization to predict possibly pseudoknotted secondary structures for each individual sequence in the alignment. We compared KnotAli's performance to that of three other alignment-based programs, two that can handle pseudoknotted structures and one control, on a large data set of 3034 RNA sequences with varying lengths and levels of sequence conservation from 10 families with pseudoknotted and pseudoknot-free reference structures. We produced sequence alignments for each family using two well-known sequence aligners (MUSCLE and MAFFT). CONCLUSIONS: We found KnotAli's performance to be superior in 6 of the 10 families for MUSCLE and 7 of the 10 for MAFFT. While both KnotAli and Cacofold use background noise correction strategies, we found KnotAli's predictions to be less dependent on the alignment quality. KnotAli can be found online at the Zenodo image: https://doi.org/10.5281/zenodo.5794719.
Affiliation(s)
- Mateo Gray
- Department of Computer Science, University of Victoria, Victoria, Canada
- Sean Chester
- Department of Computer Science, University of Victoria, Victoria, Canada
- Hosna Jabbari
- Department of Computer Science, University of Victoria, Victoria, Canada; Institute on Aging and Lifelong Health, University of Victoria, Victoria, Canada
|
28
|
Abstract
Noncoding RNAs with secondary structures play important roles in CRISPR-Cas systems. Many of these structures likely remain undiscovered. We used a large-scale comparative genomics approach to predict 156 novel candidate structured RNAs from 36,111 CRISPR-Cas systems. A number of these were found to overlap with coding genes, including palindromic candidates that overlapped with a variety of Cas genes in type I and III systems. Among these 156 candidates, we identified 46 new models of CRISPR direct repeats and 1 tracrRNA. This tracrRNA model occasionally overlapped with predicted cas9 coding regions, emphasizing the importance of expanding our search windows for novel structure RNAs in coding regions. We also demonstrated that the antirepeat sequence in this tracrRNA model can be used to accurately assign thousands of predicted CRISPR arrays to type II-C systems. This study highlights the importance of unbiased identification of candidate structured RNAs across CRISPR-Cas systems.
Affiliation(s)
- Brayon J. Fremin
- Department of Energy, Joint Genome Institute, Berkeley, CA, USA
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Nikos C. Kyrpides
- Department of Energy, Joint Genome Institute, Berkeley, CA, USA
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
|
29
|
Huff AL, Jaffee EM, Zaidi N. Messenger RNA vaccines for cancer immunotherapy: progress promotes promise. J Clin Invest 2022; 132:e156211. [PMID: 35289317 PMCID: PMC8920340 DOI: 10.1172/jci156211] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Indexed: 12/20/2022] Open
Abstract
The COVID-19 pandemic has elevated mRNA vaccines to global recognition due to their unprecedented success rate in protecting against a deadly virus. This international success is underscored by the remarkable versatility, favorable immunogenicity, and overall safety of the mRNA platform in diverse populations. Although mRNA vaccines have been studied in preclinical models and patients with cancer for almost three decades, development has been slow. The recent technological advances responsible for the COVID-19 vaccines have potential implications for successfully adapting this vaccine platform for cancer therapeutics. Here we discuss the lessons learned along with the chemical, biologic, and immunologic adaptations needed to optimize mRNA technology to successfully treat cancers.
Affiliation(s)
- Amanda L. Huff
- Department of Oncology
- The Sidney Kimmel Comprehensive Cancer Center
- The Skip Viragh Center for Pancreatic Cancer Research and Clinical Care
- The Bloomberg-Kimmel Institute for Cancer Immunotherapy, and
- The Cancer Convergence Institute, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Elizabeth M. Jaffee
- Department of Oncology
- The Sidney Kimmel Comprehensive Cancer Center
- The Skip Viragh Center for Pancreatic Cancer Research and Clinical Care
- The Bloomberg-Kimmel Institute for Cancer Immunotherapy, and
- The Cancer Convergence Institute, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Neeha Zaidi
- Department of Oncology
- The Sidney Kimmel Comprehensive Cancer Center
- The Skip Viragh Center for Pancreatic Cancer Research and Clinical Care
- The Bloomberg-Kimmel Institute for Cancer Immunotherapy, and
- The Cancer Convergence Institute, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
|
30
|
Tompkins VS, Rouse WB, O’Leary CA, Andrews RJ, Moss WN. Analyses of human cancer driver genes uncovers evolutionarily conserved RNA structural elements involved in posttranscriptional control. PLoS One 2022; 17:e0264025. [PMID: 35213597 PMCID: PMC8880891 DOI: 10.1371/journal.pone.0264025] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/10/2021] [Accepted: 02/01/2022] [Indexed: 12/02/2022] Open
Abstract
Experimental breakthroughs have provided unprecedented insights into the genes involved in cancer. The identification of such cancer driver genes is a major step in gaining a fuller understanding of oncogenesis and provides novel lists of potential therapeutic targets. A key area that requires additional study is the posttranscriptional control mechanisms at work in cancer driver genes. This is important not only for basic insights into the biology of cancer, but also to advance new therapeutic modalities that target RNA—an emerging field with great promise toward the treatment of various cancers. In the current study we performed an in silico analysis on the transcripts associated with 800 cancer driver genes (10,390 unique transcripts) that identified 179,190 secondary structural motifs with evidence of evolutionarily ordered structures with unusual thermodynamic stability. Narrowing to one transcript per gene, 35,426 predicted structures were subjected to phylogenetic comparisons of sequence and structural conservation. This identified 7,001 RNA secondary structures embedded in transcripts with evidence of covariation between paired sites, supporting structure models and suggesting functional significance. A select set of seven structures were tested in vitro for their ability to regulate gene expression; all were found to have significant effects. These results indicate potentially widespread roles for RNA structure in posttranscriptional control of human cancer driver genes.
Affiliation(s)
- Van S. Tompkins
- Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, United States of America
- Warren B. Rouse
- Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, United States of America
- Collin A. O’Leary
- Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, United States of America
- Ryan J. Andrews
- Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, United States of America
- Walter N. Moss
- Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA, United States of America
|
31
|
Seemann SE, Mirza AH, Bang-Berthelsen CH, Garde C, Christensen-Dalsgaard M, Workman CT, Pociot F, Tommerup N, Gorodkin J, Ruzzo WL. OUP accepted manuscript. Nucleic Acids Res 2022; 50:2452-2463. [PMID: 35188540 PMCID: PMC8934657 DOI: 10.1093/nar/gkac067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2021] [Revised: 01/07/2022] [Accepted: 01/25/2022] [Indexed: 12/01/2022] Open
Abstract
Accelerated evolution of any portion of the genome is of significant interest, potentially signaling positive selection of phenotypic traits and adaptation. Accelerated evolution remains understudied for structured RNAs, despite the fact that an RNA’s structure is often key to its function. RNA structures are typically characterized by compensatory (structure-preserving) basepair changes that are unexpected given the underlying sequence variation, i.e., they have evolved through negative selection on structure. We address the question of how fast the primary sequence of an RNA can change through evolution while conserving its structure. Specifically, we consider predicted and known structures in vertebrate genomes. After careful control of false discovery rates, we obtain 13 de novo structures (and three known Rfam structures) that we predict to have rapidly evolving sequences—defined as structures where the primary sequences of human and mouse have diverged at least twice as fast (1.5 times for Rfam) as nearby neutrally evolving sequences. Two of the three known structures function in translation inhibition related to infection and immune response. We conclude that rapid sequence divergence does not preclude RNA structure conservation in vertebrates, although these events are relatively rare.
Affiliation(s)
- Aashiq H Mirza
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- Steno Diabetes Center Copenhagen, Gentofte, Denmark
- Claus H Bang-Berthelsen
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- National Food Institute, Technical University of Denmark, Kgs. Lyngby, Denmark
- Christian Garde
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- Christopher T Workman
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- Center for Biological Sequence Analysis, Technical University of Denmark, Denmark
- Flemming Pociot
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- Steno Diabetes Center Copenhagen, Gentofte, Denmark
- Niels Tommerup
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- Department of Cellular and Molecular Medicine (ICMM), University of Copenhagen, Denmark
- Jan Gorodkin
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- Department of Veterinary and Animal Sciences, University of Copenhagen, Denmark
- Walter L Ruzzo
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- Computer Science and Engineering and Genome Sciences, University of Washington, USA
- Fred Hutchinson Cancer Research Center, Seattle, USA
|
32
|
Vicens Q, Kieft JS. Shared properties and singularities of exoribonuclease-resistant RNAs in viruses. Comput Struct Biotechnol J 2021; 19:4373-4380. [PMID: 34471487 PMCID: PMC8374639 DOI: 10.1016/j.csbj.2021.07.024] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/29/2021] [Revised: 07/21/2021] [Accepted: 07/23/2021] [Indexed: 11/29/2022] Open
Abstract
What viral RNA genomes lack in size, they make up for in intricacy. Elaborate RNA structures embedded in viral genomes can hijack essential cellular mechanisms aiding virus propagation. Exoribonuclease-resistant RNAs (xrRNAs) are an emerging class of viral elements, which resist degradation by host cellular exoribonucleases to produce viral RNAs with diverse roles during infection. Detailed three-dimensional structural studies of xrRNAs from flaviviruses and a subset of plant viruses led to a mechanistic model in which xrRNAs block enzymatic digestion using a ring-like structure that encircles the 5' end of the resistant structure. In this mini-review, we describe the state of our understanding of the phylogenetic distribution of xrRNAs, their structures, and their conformational dynamics. Because xrRNAs have now been found in several major superfamilies of RNA viruses, they may represent a more widely used strategy than currently appreciated. Could xrRNAs represent a 'molecular clock' that would help us understand virus evolution and pathogenicity? The more we study xrRNAs in viruses, the closer we get to finding xrRNAs within cellular RNAs.
Affiliation(s)
- Quentin Vicens
- Department of Biochemistry and Molecular Genetics, University of Colorado Denver School of Medicine, Aurora, CO 80045, USA
- RNA BioScience Initiative, University of Colorado Denver School of Medicine, Aurora, CO 80045, USA
- Jeffrey S. Kieft
- Department of Biochemistry and Molecular Genetics, University of Colorado Denver School of Medicine, Aurora, CO 80045, USA
- RNA BioScience Initiative, University of Colorado Denver School of Medicine, Aurora, CO 80045, USA
|
33
|
Gao W, Jones TA, Rivas E. Discovery of 17 conserved structural RNAs in fungi. Nucleic Acids Res 2021; 49:6128-6143. [PMID: 34086938 PMCID: PMC8216456 DOI: 10.1093/nar/gkab355] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/03/2021] [Revised: 03/25/2021] [Accepted: 04/21/2021] [Indexed: 11/13/2022] Open
Abstract
Many non-coding RNAs with known functions are structurally conserved: their intramolecular secondary and tertiary interactions are maintained across evolutionary time. Consequently, the presence of conserved structure in multiple sequence alignments can be used to identify candidate functional non-coding RNAs. Here, we present a bioinformatics method that couples iterative homology search with covariation analysis to assess whether a genomic region has evidence of conserved RNA structure. We used this method to examine all unannotated regions of five well-studied fungal genomes (Saccharomyces cerevisiae, Candida albicans, Neurospora crassa, Aspergillus fumigatus, and Schizosaccharomyces pombe). We identified 17 novel structurally conserved non-coding RNA candidates, which include four H/ACA box small nucleolar RNAs, four intergenic RNAs and nine RNA structures located within the introns and untranslated regions (UTRs) of mRNAs. For the two structures in the 3' UTRs of the metabolic genes GLY1 and MET13, we performed experiments that provide evidence against them being eukaryotic riboswitches.
Affiliation(s)
- William Gao
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, USA
- Thomas A Jones
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, USA
- Howard Hughes Medical Institute, Harvard University, Cambridge, USA
- Elena Rivas
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, USA
|
34
|
Chen SC, Olsthoorn RCL, Yu CH. Structural phylogenetic analysis reveals lineage-specific RNA repetitive structural motifs in all coronaviruses and associated variations in SARS-CoV-2. Virus Evol 2021; 7:veab021. [PMID: 34141447 PMCID: PMC8206606 DOI: 10.1093/ve/veab021] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/23/2022] Open
Abstract
In many single-stranded (ss) RNA viruses, the cis-acting packaging signal that confers selectivity in genome packaging usually encompasses short structured RNA repeats. These structural units, termed repetitive structural motifs (RSMs), potentially mediate capsid assembly by specific RNA–protein interactions. However, general knowledge of the conservation and/or the diversity of RSMs in the positive-sense ssRNA coronaviruses (CoVs) is limited. By performing structural phylogenetic analysis, we identified a variety of RSMs in nearly all CoV genomic RNAs, which are exclusively located in the 5′-untranslated regions (UTRs) and/or in the inter-domain regions of poly-protein 1ab coding sequences in a lineage-specific manner. In all alpha- and beta-CoVs, except for Embecovirus spp., two to four copies of 5′-gUUYCGUc-3′ RSMs displaying conserved hexa-loop sequences were generally identified in Stem-loop 5 (SL5) located in the 5′-UTRs of genomic RNAs. In Embecovirus spp., however, two to eight copies of 5′-agc-3′/guAAu RSMs were found in the coding regions of non-structural protein (NSP) 3 and/or NSP15 in open reading frame (ORF) 1ab. In gamma- and delta-CoVs, other types of RSMs were found in several clustered structural elements in 5′-UTRs and/or ORF1ab. The identification of RSM-encompassing structural elements in all CoVs suggests that these RNA elements play fundamental roles in the life cycle of CoVs. In the recently emerged SARS-CoV-2, beta-CoV-specific RSMs are also found in its SL5, displaying two copies of 5′-gUUUCGUc-3′ motifs. However, multiple sequence alignment reveals that the majority of SARS-CoV-2 possesses a variant RSM harboring SL5b C241U, and intriguingly, several variations in the coding sequences of viral proteins, such as Nsp12 P323L, S protein D614G, and N protein R203K-G204R, are concurrently found with such variant RSM.
In conclusion, the comprehensive exploration for RSMs reveals phylogenetic insights into the RNA structural elements in CoVs as a whole and provides a new perspective on variations currently found in SARS-CoV-2.
Affiliation(s)
- Shih-Cheng Chen
- Department of Biochemistry and Molecular Biology, College of Medicine, National Cheng-Kung University, No.1, University Road, Tainan City 701, Taiwan
- René C L Olsthoorn
- Department of Supramolecular Biomaterials Chemistry, Leiden Institute of Chemistry, Gorlaeus Laboratories, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
- Chien-Hung Yu
- Department of Biochemistry and Molecular Biology, College of Medicine, National Cheng-Kung University, No.1, University Road, Tainan City 701, Taiwan
35
|
Andrews RJ, O’Leary CA, Tompkins VS, Peterson JM, Haniff H, Williams C, Disney MD, Moss WN. A map of the SARS-CoV-2 RNA structurome. NAR Genom Bioinform 2021; 3:lqab043. [PMID: 34046592 PMCID: PMC8140738 DOI: 10.1093/nargab/lqab043] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 02/17/2021] [Revised: 04/06/2021] [Accepted: 04/28/2021] [Indexed: 12/11/2022] Open
Abstract
SARS-CoV-2 has exploded throughout the human population. To facilitate efforts to gain insights into SARS-CoV-2 biology and to target the virus therapeutically, it is essential to have a roadmap of likely functional regions embedded in its RNA genome. In this report, we used a bioinformatics approach, ScanFold, to deduce the local RNA structural landscape of the SARS-CoV-2 genome with the highest likelihood of being functional. We recapitulate previously-known elements of RNA structure and provide a model for the folding of an essential frameshift signal. Our results find that SARS-CoV-2 is greatly enriched in unusually stable and likely evolutionarily ordered RNA structure, which provides a large reservoir of potential drug targets for RNA-binding small molecules. Results are enhanced via the re-analyses of publicly-available genome-wide biochemical structure probing datasets that are broadly in agreement with our models. Additionally, ScanFold was updated to incorporate experimental data as constraints in the analysis to facilitate comparisons between ScanFold and other RNA modelling approaches. Ultimately, ScanFold was able to identify eight highly structured/conserved motifs in SARS-CoV-2 that agree with experimental data, without explicitly using these data. All results are made available via a public database (the RNAStructuromeDB: https://structurome.bb.iastate.edu/sars-cov-2) and model comparisons are readily viewable at https://structurome.bb.iastate.edu/sars-cov-2-global-model-comparisons.
Affiliation(s)
- Ryan J Andrews
- Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA 50011, USA
- Collin A O’Leary
- Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA 50011, USA
- Van S Tompkins
- Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA 50011, USA
- Jake M Peterson
- Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA 50011, USA
- Hafeez S Haniff
- Department of Chemistry, The Scripps Research Institute, Jupiter, FL 33458, USA
- Matthew D Disney
- Department of Chemistry, The Scripps Research Institute, Jupiter, FL 33458, USA
- Walter N Moss
- Roy J. Carver Department of Biophysics, Biochemistry and Molecular Biology, Iowa State University, Ames, IA 50011, USA
36
|
Rivas E. Evolutionary conservation of RNA sequence and structure. Wiley Interdiscip Rev RNA 2021; 12:e1649. [PMID: 33754485 PMCID: PMC8250186 DOI: 10.1002/wrna.1649] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 12/17/2020] [Revised: 02/24/2021] [Accepted: 02/25/2021] [Indexed: 12/22/2022]
Abstract
An RNA structure prediction from a single‐sequence RNA folding program is not evidence for an RNA whose structure is important for function. Random sequences have plausible and complex predicted structures not easily distinguishable from those of structural RNAs. How to tell when an RNA has a conserved structure is a question that requires looking at the evolutionary signature left by the conserved RNA. This question is important not just for long noncoding RNAs which usually lack an identified function, but also for RNA binding protein motifs which can be single stranded RNAs or structures. Here we review recent advances using sequence and structural analysis to determine when RNA structure is conserved or not. Although covariation measures assess structural RNA conservation, one must distinguish covariation due to RNA structure from covariation due to independent phylogenetic substitutions. We review a statistical test to measure false positives expected under the null hypothesis of phylogenetic covariation alone (specificity). We also review a complementary test that measures power, that is, expected covariation derived from sequence variation alone (sensitivity). Power in the absence of covariation signals the absence of a conserved RNA structure. We analyze artifacts that falsely identify conserved RNA structure such as the misuse of programs that do not assess significance, the use of inappropriate statistics confounded by signals other than covariation, or misalignments that induce spurious covariation. Among artifacts that obscure the signal of a conserved RNA structure, we discuss the inclusion of pseudogenes in alignments which increase power but destroy covariation. This article is categorized under: RNA Structure and Dynamics > RNA Structure, Dynamics and Chemistry; RNA Evolution and Genomics > Computational Analyses of RNA; RNA Evolution and Genomics > RNA and Ribonucleoprotein Evolution.
Affiliation(s)
- Elena Rivas
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, USA
37
Singh J, Paliwal K, Zhang T, Singh J, Litfin T, Zhou Y. Improved RNA Secondary Structure and Tertiary Base-pairing Prediction Using Evolutionary Profile, Mutational Coupling and Two-dimensional Transfer Learning. Bioinformatics 2021; 37:2589-2600. [PMID: 33704363 DOI: 10.1093/bioinformatics/btab165] [Received: 12/01/2020] [Revised: 02/05/2021] [Accepted: 03/08/2021] [Indexed: 11/12/2022]
Abstract
MOTIVATION: The recent discovery of numerous non-coding RNAs (long non-coding RNAs, in particular) has transformed our perception of the roles of RNAs in living organisms. Our ability to understand them, however, is hampered by our inability to solve their secondary and tertiary structures at high resolution efficiently with existing experimental techniques. Computational prediction of RNA secondary structure, on the other hand, has recently received much-needed improvement through deep learning on a large set of approximate data, followed by transfer learning with gold-standard base-pairing structures from high-resolution 3-D structures. Here, we expand this single-sequence-based learning to the use of evolutionary profiles and mutational coupling. RESULTS: The new method allows large improvement not only in canonical base-pairs (RNA secondary structures) but even more so in base-pairing associated with tertiary interactions such as pseudoknots, noncanonical and lone base-pairs. In particular, it is highly accurate for RNAs with more than 1000 homologous sequences, achieving >0.8 F1-score (harmonic mean of sensitivity and precision) for 14/16 RNAs tested. The method can also significantly improve base-pairing prediction by incorporating artificial but functional homologous sequences generated from deep mutational scanning without any modification. The fully automatic method (publicly available as server and standalone software) should provide the scientific community with a powerful new tool to capture not only the secondary structure but also tertiary base-pairing information for building three-dimensional models. It also highlights the future of accurately solving the base-pairing structure by using a large number of natural and/or artificial homologous sequences. AVAILABILITY: A standalone version of SPOT-RNA2 is available at https://github.com/jaswindersingh2/SPOT-RNA2. Direct prediction can also be made at https://sparks-lab.org/server/spot-rna2/. The datasets used in this research can also be downloaded from the GitHub repository and the web server mentioned above.
Affiliation(s)
- Jaswinder Singh
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia
- Kuldip Paliwal
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia
- Tongchuan Zhang
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr. Southport, QLD 4222, Australia
- Jaspreet Singh
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia
- Thomas Litfin
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr. Southport, QLD 4222, Australia
- Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Dr. Southport, QLD 4222, Australia
38
Kalvari I, Nawrocki EP, Ontiveros-Palacios N, Argasinska J, Lamkiewicz K, Marz M, Griffiths-Jones S, Toffano-Nioche C, Gautheret D, Weinberg Z, Rivas E, Eddy SR, Finn RD, Bateman A, Petrov AI. Rfam 14: expanded coverage of metagenomic, viral and microRNA families. Nucleic Acids Res 2021; 49:D192-D200. [PMID: 33211869 PMCID: PMC7779021 DOI: 10.1093/nar/gkaa1047] [Received: 09/15/2020] [Revised: 10/14/2020] [Accepted: 10/21/2020] [Indexed: 12/15/2022]
Abstract
Rfam is a database of RNA families where each of the 3444 families is represented by a multiple sequence alignment of known RNA sequences and a covariance model that can be used to search for additional members of the family. Recent developments have involved expert collaborations to improve the quality and coverage of Rfam data, focusing on microRNAs, viral and bacterial RNAs. We have completed the first phase of synchronising microRNA families in Rfam and miRBase, creating 356 new Rfam families and updating 40. We established a procedure for comprehensive annotation of viral RNA families starting with Flavivirus and Coronaviridae RNAs. We have also increased the coverage of bacterial and metagenome-based RNA families from the ZWD database. These developments have enabled a significant growth of the database, with the addition of 759 new families in Rfam 14. To facilitate further community contribution to Rfam, expert users are now able to build and submit new families using the newly developed Rfam Cloud family curation system. New Rfam website features include a new sequence similarity search powered by RNAcentral, as well as search and visualisation of families with pseudoknots. Rfam is freely available at https://rfam.org.
Affiliation(s)
- Ioanna Kalvari
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Eric P Nawrocki
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Nancy Ontiveros-Palacios
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Joanna Argasinska
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Kevin Lamkiewicz
- RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, Leutragraben 1, 07743 Jena, Germany; European Virus Bioinformatics Center, Leutragraben 1, 07743 Jena, Germany
- Manja Marz
- RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, Leutragraben 1, 07743 Jena, Germany; European Virus Bioinformatics Center, Leutragraben 1, 07743 Jena, Germany
- Sam Griffiths-Jones
- Faculty of Biology, Medicine and Health, University of Manchester, Oxford Road, Manchester, M13 9PT, UK
- Claire Toffano-Nioche
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
- Daniel Gautheret
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
- Zasha Weinberg
- Bioinformatics Group, Department of Computer Science and Interdisciplinary Centre for Bioinformatics, Leipzig University, 04107 Leipzig, Germany
- Elena Rivas
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA
- Sean R Eddy
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA; Howard Hughes Medical Institute, Harvard University, Cambridge, MA 02138, USA; John A. Paulson School of Engineering and Applied Science, Harvard University, Cambridge, MA 02138, USA
- Robert D Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Anton I Petrov
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
39
Mestre MR, González-Delgado A, Gutiérrez-Rus LI, Martínez-Abarca F, Toro N. Systematic prediction of genes functionally associated with bacterial retrons and classification of the encoded tripartite systems. Nucleic Acids Res 2020; 48:12632-12647. [PMID: 33275130 PMCID: PMC7736814 DOI: 10.1093/nar/gkaa1149] [Received: 08/11/2020] [Revised: 11/05/2020] [Accepted: 11/10/2020] [Indexed: 02/06/2023]
Abstract
Bacterial retrons consist of a reverse transcriptase (RT) and a contiguous non-coding RNA (ncRNA) gene. One third of annotated retrons carry additional open reading frames (ORFs), the contribution and significance of which in retron biology remain to be determined. In this study we developed a computational pipeline for the systematic prediction of genes specifically associated with retron RTs, based on a previously reported large dataset representative of the diversity of prokaryotic RTs. We found that retrons generally comprise a tripartite system composed of the ncRNA, the RT and an additional protein or RT-fused domain with diverse enzymatic functions. These retron systems are highly modular, and their components have coevolved to different extents. Based on the additional module, we classified retrons into 13 types, some of which include additional variants. Our findings provide a basis for future studies on the biological function of retrons and for expanding their biotechnological applications.
Affiliation(s)
- Mario Rodríguez Mestre
- Structure, Dynamics and Function of Rhizobacterial Genomes, Grupo de Ecología Genética de la Rizosfera, Department of Soil Microbiology and Symbiotic Systems, Estación Experimental del Zaidín, Consejo Superior de Investigaciones Científicas, C/ Profesor Albareda 1, 18008 Granada, Spain
- Alejandro González-Delgado
- Structure, Dynamics and Function of Rhizobacterial Genomes, Grupo de Ecología Genética de la Rizosfera, Department of Soil Microbiology and Symbiotic Systems, Estación Experimental del Zaidín, Consejo Superior de Investigaciones Científicas, C/ Profesor Albareda 1, 18008 Granada, Spain
- Luis I Gutiérrez-Rus
- Departamento de Química Física, Facultad de Ciencias, Universidad de Granada, 18071 Granada, Spain
- Francisco Martínez-Abarca
- Structure, Dynamics and Function of Rhizobacterial Genomes, Grupo de Ecología Genética de la Rizosfera, Department of Soil Microbiology and Symbiotic Systems, Estación Experimental del Zaidín, Consejo Superior de Investigaciones Científicas, C/ Profesor Albareda 1, 18008 Granada, Spain
- Nicolás Toro
- Structure, Dynamics and Function of Rhizobacterial Genomes, Grupo de Ecología Genética de la Rizosfera, Department of Soil Microbiology and Symbiotic Systems, Estación Experimental del Zaidín, Consejo Superior de Investigaciones Científicas, C/ Profesor Albareda 1, 18008 Granada, Spain
40
Wilburn GW, Eddy SR. Remote homology search with hidden Potts models. PLoS Comput Biol 2020; 16:e1008085. [PMID: 33253143 PMCID: PMC7728182 DOI: 10.1371/journal.pcbi.1008085] [Received: 06/17/2020] [Revised: 12/10/2020] [Accepted: 10/27/2020] [Indexed: 12/03/2022]
Abstract
Most methods for biological sequence homology search and alignment work with primary sequence alone, neglecting higher-order correlations. Recently, statistical physics models called Potts models have been used to infer all-by-all pairwise correlations between sites in deep multiple sequence alignments, and these pairwise couplings have improved 3D structure predictions. Here we extend the use of Potts models from structure prediction to sequence alignment and homology search by developing what we call a hidden Potts model (HPM) that merges a Potts emission process to a generative probability model of insertion and deletion. Because an HPM is incompatible with efficient dynamic programming alignment algorithms, we develop an approximate algorithm based on importance sampling, using simpler probabilistic models as proposal distributions. We test an HPM implementation on RNA structure homology search benchmarks, where we can compare directly to exact alignment methods that capture nested RNA base-pairing correlations (stochastic context-free grammars). HPMs perform promisingly in these proof of principle experiments.
Affiliation(s)
- Grey W. Wilburn
- Department of Physics, Harvard University, Cambridge, Massachusetts, United States of America
- Sean R. Eddy
- Howard Hughes Medical Institute, Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
- John A Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, United States of America