1
Tong Y, Childs-Disney JL, Disney MD. Targeting RNA with small molecules, from RNA structures to precision medicines: IUPHAR review: 40. Br J Pharmacol 2024; 181:4152-4173. [PMID: 39224931 DOI: 10.1111/bph.17308] [Received: 04/30/2024] [Revised: 06/10/2024] [Accepted: 07/09/2024] [Indexed: 09/04/2024]
Abstract
RNA plays important roles in regulating both health and disease biology in all kingdoms of life. Notably, RNA can form intricate three-dimensional structures, and their biological functions are dependent on these structures. Targeting the structured regions of RNA with small molecules has gained increasing attention over the past decade, because it provides both chemical probes to study fundamental biological processes and lead medicines for diseases with unmet medical needs. Recent advances in RNA structure prediction and determination and RNA biology have accelerated the rational design and development of RNA-targeted small molecules to modulate disease pathology. However, challenges remain in advancing RNA-targeted small molecules towards clinical applications. This review summarizes strategies to study RNA structures, to identify small molecules recognizing these structures, and to augment the functionality of RNA-binding small molecules. We focus on recent advances in developing RNA-targeted small molecules as potential therapeutics in a variety of diseases, encompassing different modes of action and targeting strategies. Furthermore, we present the current gaps between early-stage discovery of RNA-binding small molecules and their clinical applications, as well as a roadmap to overcome these challenges in the near future.
Affiliation(s)
- Yuquan Tong
- Department of Chemistry, The Scripps Research Institute, Jupiter, Florida, USA
- Department of Chemistry, The Herbert Wertheim UF Scripps Institute for Biomedical Innovation & Technology, Jupiter, Florida, USA
- Jessica L Childs-Disney
- Department of Chemistry, The Herbert Wertheim UF Scripps Institute for Biomedical Innovation & Technology, Jupiter, Florida, USA
- Matthew D Disney
- Department of Chemistry, The Scripps Research Institute, Jupiter, Florida, USA
- Department of Chemistry, The Herbert Wertheim UF Scripps Institute for Biomedical Innovation & Technology, Jupiter, Florida, USA
2
Zhao Y, Oono K, Takizawa H, Kotera M. GenerRNA: A generative pre-trained language model for de novo RNA design. PLoS One 2024; 19:e0310814. [PMID: 39352899 PMCID: PMC11444397 DOI: 10.1371/journal.pone.0310814] [Received: 05/16/2024] [Accepted: 09/08/2024] [Indexed: 10/04/2024]
Abstract
The design of RNA plays a crucial role in developing RNA vaccines, nucleic acid therapeutics, and innovative biotechnological tools. However, existing techniques frequently lack versatility across various tasks and are dependent on pre-defined secondary structure or other prior knowledge. To address these limitations, we introduce GenerRNA, a Transformer-based model inspired by the success of large language models (LLMs) in protein and molecule generation. GenerRNA is pre-trained on large-scale RNA sequences and capable of generating novel RNA sequences with stable secondary structures, while ensuring distinctiveness from existing sequences, thereby expanding our exploration of the RNA space. Moreover, GenerRNA can be fine-tuned on smaller, specialized datasets for specific subtasks, enabling the generation of RNAs with desired functionalities or properties without requiring any prior knowledge input. As a demonstration, we fine-tuned GenerRNA and successfully generated novel RNA sequences exhibiting high affinity for target proteins. Our work is the first application of a generative language model to RNA generation, presenting an innovative approach to RNA design.
3
Allan MF, Aruda J, Plung JS, Grote SL, des Taillades YJM, de Lajarte AA, Bathe M, Rouskin S. Discovery and Quantification of Long-Range RNA Base Pairs in Coronavirus Genomes with SEARCH-MaP and SEISMIC-RNA. RESEARCH SQUARE 2024:rs.3.rs-4814547. [PMID: 39149495 PMCID: PMC11326378 DOI: 10.21203/rs.3.rs-4814547/v1] [Indexed: 08/17/2024]
Abstract
RNA molecules perform a diversity of essential functions for which their linear sequences must fold into higher-order structures. Techniques including crystallography and cryogenic electron microscopy have revealed 3D structures of ribosomal, transfer, and other well-structured RNAs; while chemical probing with sequencing facilitates secondary structure modeling of any RNAs of interest, even within cells. Ongoing efforts continue increasing the accuracy, resolution, and ability to distinguish coexisting alternative structures. However, no method can discover and quantify alternative structures with base pairs spanning arbitrarily long distances - an obstacle for studying viral, messenger, and long noncoding RNAs, which may form long-range base pairs. Here, we introduce the method of Structure Ensemble Ablation by Reverse Complement Hybridization with Mutational Profiling (SEARCH-MaP) and software for Structure Ensemble Inference by Sequencing, Mutation Identification, and Clustering of RNA (SEISMIC-RNA). We use SEARCH-MaP and SEISMIC-RNA to discover that the frameshift stimulating element of SARS coronavirus 2 base-pairs with another element 1 kilobase downstream in nearly half of RNA molecules, and that this structure competes with a pseudoknot that stimulates ribosomal frameshifting. Moreover, we identify long-range base pairs involving the frameshift stimulating element in other coronaviruses including SARS coronavirus 1 and transmissible gastroenteritis virus, and model the full genomic secondary structure of the latter. These findings suggest that long-range base pairs are common in coronaviruses and may regulate ribosomal frameshifting, which is essential for viral RNA synthesis. We anticipate that SEARCH-MaP will enable solving many RNA structure ensembles that have eluded characterization, thereby enhancing our general understanding of RNA structures and their functions. 
SEISMIC-RNA, software for analyzing mutational profiling data at any scale, could power future studies on RNA structure and is available on GitHub and the Python Package Index.
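The abstract describes SEISMIC-RNA's clustering only at a high level. The sketch below illustrates the general idea and is not the SEISMIC-RNA code or API. It fits a mixture of independent Bernoulli distributions to binary mutation calls (one row per read, one column per position) by expectation-maximization. Each cluster ends up with its own per-position mutation rates and its own population fraction. Several details are assumptions: the function name, the random initialization, the synthetic rates, and the absence of any bias corrections that real mutational-profiling pipelines may apply.

```python
import math
import random

def bernoulli_mixture_em(reads, k, n_iter=50, seed=0, eps=1e-6):
    """Fit a k-cluster mixture of independent Bernoulli distributions by EM.

    reads: list of equal-length 0/1 lists (1 = mutation observed at a position).
    Returns (proportions, rates); rates[c][j] is cluster c's mutation rate at j.
    """
    rng = random.Random(seed)
    n_pos = len(reads[0])
    props = [1.0 / k] * k
    # Random starting rates break the symmetry between clusters.
    rates = [[rng.uniform(0.05, 0.5) for _ in range(n_pos)] for _ in range(k)]
    for _ in range(n_iter):
        log_hit = [[math.log(r) for r in row] for row in rates]
        log_miss = [[math.log1p(-r) for r in row] for row in rates]
        # E-step: posterior probability (responsibility) of each cluster per read.
        resp = []
        for read in reads:
            logs = []
            for c in range(k):
                ll = math.log(props[c])
                for x, lh, lm in zip(read, log_hit[c], log_miss[c]):
                    ll += lh if x else lm
                logs.append(ll)
            top = max(logs)
            w = [math.exp(v - top) for v in logs]
            total = sum(w)
            resp.append([v / total for v in w])
        # M-step: re-estimate cluster proportions and per-position rates.
        for c in range(k):
            weight = sum(r[c] for r in resp)
            props[c] = weight / len(reads)
            for j in range(n_pos):
                hits = sum(r[c] * read[j] for r, read in zip(resp, reads))
                rates[c][j] = min(max(hits / weight, eps), 1 - eps)
    return props, rates

# Synthetic example: 60% of molecules mutate mostly in the first half and
# 40% mostly in the second half (rates chosen for illustration only).
gen = random.Random(1)
true_rates = [[0.3] * 6 + [0.01] * 6, [0.01] * 6 + [0.3] * 6]
reads = []
for _ in range(2000):
    cluster = 0 if gen.random() < 0.6 else 1
    reads.append([1 if gen.random() < p else 0 for p in true_rates[cluster]])
props, rates = bernoulli_mixture_em(reads, k=2)
print(sorted(round(p, 2) for p in props))
```

Cluster labels are arbitrary in a mixture model, so the proportions are sorted before printing.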
Affiliation(s)
- Matthew F. Allan
- Department of Microbiology, Harvard Medical School, Boston, Massachusetts, USA 02115
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA 02139
- Computational and Systems Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA 02139
- Justin Aruda
- Department of Microbiology, Harvard Medical School, Boston, Massachusetts, USA 02115
- Harvard Program in Biological and Biomedical Sciences, Division of Medical Sciences, Harvard Medical School, Boston, MA, USA 02115
- Jesse S. Plung
- Department of Microbiology, Harvard Medical School, Boston, Massachusetts, USA 02115
- Harvard Program in Virology, Division of Medical Sciences, Harvard Medical School, Boston, MA, USA 02115
- Scott L. Grote
- Department of Microbiology, Harvard Medical School, Boston, Massachusetts, USA 02115
- Albéric A. de Lajarte
- Department of Microbiology, Harvard Medical School, Boston, Massachusetts, USA 02115
- Mark Bathe
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA 02139
- Silvi Rouskin
- Department of Microbiology, Harvard Medical School, Boston, Massachusetts, USA 02115
4
Allan MF, Aruda J, Plung JS, Grote SL, Martin des Taillades YJ, de Lajarte AA, Bathe M, Rouskin S. Discovery and Quantification of Long-Range RNA Base Pairs in Coronavirus Genomes with SEARCH-MaP and SEISMIC-RNA. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.04.29.591762. [PMID: 38746332 PMCID: PMC11092567 DOI: 10.1101/2024.04.29.591762] [Indexed: 05/16/2024]
5
Calvanese F, Lambert CN, Nghe P, Zamponi F, Weigt M. Towards parsimonious generative modeling of RNA families. Nucleic Acids Res 2024; 52:5465-5477. [PMID: 38661206 PMCID: PMC11162787 DOI: 10.1093/nar/gkae289] [Received: 10/14/2023] [Revised: 03/05/2024] [Accepted: 04/05/2024] [Indexed: 04/26/2024]
Abstract
Generative probabilistic models are emerging as a new paradigm in data-driven, evolution-informed design of biomolecular sequences. This paper introduces a novel approach, called Edge Activation Direct Coupling Analysis (eaDCA), tailored to the characteristics of RNA sequences, with a strong emphasis on simplicity, efficiency, and interpretability. eaDCA explicitly constructs sparse coevolutionary models for RNA families, achieving performance levels comparable to more complex methods while utilizing a significantly lower number of parameters. Our approach demonstrates efficiency in generating artificial RNA sequences that closely resemble their natural counterparts in both statistical analyses and SHAPE-MaP experiments, and in predicting the effect of mutations. Notably, eaDCA provides a unique feature: estimating the number of potential functional sequences within a given RNA family. For example, in the case of cyclic di-AMP riboswitches (RF00379), our analysis suggests the existence of approximately 10³⁹ functional nucleotide sequences. While huge compared to the known <4000 natural sequences, this number represents only a tiny fraction of the vast pool of nearly 10⁸² possible nucleotide sequences of the same length (136 nucleotides). These results underscore the promise of sparse and interpretable generative models, such as eaDCA, in enhancing our understanding of the expansive RNA sequence space.
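The size of the sequence space quoted above follows from counting. A 136-nucleotide RNA built from a four-letter alphabet has 4^136 possible sequences. This short check reproduces only that arithmetic and the resulting fraction. The exponent 39 for the functional-sequence count is taken from the abstract, not computed here.

```python
import math

length = 136
# log10 of the number of possible sequences, 4**length
log10_total = length * math.log10(4)
# Exponent of the eaDCA estimate of functional sequences (~10^39), from the abstract
log10_functional = 39

print(f"4^{length} is about 10^{log10_total:.1f}")  # 10^81.9
print(f"functional fraction is about 10^{log10_functional - log10_total:.1f}")  # 10^-42.9
```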
Affiliation(s)
- Francesco Calvanese
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative – LCQB, Paris, France
- Laboratoire de Biophysique et Evolution, UMR CNRS-ESPCI 8231 Chimie Biologie Innovation, PSL University, Paris, France
- Camille N Lambert
- Laboratoire de Biophysique et Evolution, UMR CNRS-ESPCI 8231 Chimie Biologie Innovation, PSL University, Paris, France
- Philippe Nghe
- Laboratoire de Biophysique et Evolution, UMR CNRS-ESPCI 8231 Chimie Biologie Innovation, PSL University, Paris, France
- Francesco Zamponi
- Dipartimento di Fisica, Sapienza Università di Roma, Rome, Italy
- Laboratoire de Physique de l’Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, Paris, France
- Martin Weigt
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative – LCQB, Paris, France
6
von Löhneysen S, Spicher T, Varenyk Y, Yao HT, Lorenz R, Hofacker I, Stadler PF. Phylogenetic and Chemical Probing Information as Soft Constraints in RNA Secondary Structure Prediction. J Comput Biol 2024; 31:549-563. [PMID: 38935442 DOI: 10.1089/cmb.2024.0519] [Indexed: 06/29/2024]
Abstract
Extrinsic, experimental information can be incorporated into thermodynamics-based RNA folding algorithms in the form of pseudo-energies. Evolutionary conservation of RNA secondary structure elements is detectable in alignments of phylogenetically related sequences and provides evidence for the presence of certain base pairs that can also be converted into pseudo-energy contributions. We show that the centroid base pairs computed from a consensus folding model such as RNAalifold result in a substantial improvement of the prediction accuracy for single sequences. Evidence for specific base pairs turns out to be more informative than a position-wise profile for the conservation of the pairing status. A comparison with chemical probing data, furthermore, strongly suggests that phylogenetic base pairing data are more informative than position-specific data on (un)pairedness as obtained from chemical probing experiments. In this context we demonstrate, in addition, that the conversion of signal from probing data into pseudo-energies is possible using thermodynamic structure predictions as a reference instead of known RNA structures.
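The abstract does not spell out how probing signal becomes a pseudo-energy. One widely used conversion (Deigan et al., 2009) is linear in the logarithm of the reactivity: ΔG(i) = m·ln(r_i + 1) + b. It is typically applied to nucleotides stacked in helices. The sketch below computes that term for each nucleotide; it is not the procedure of this paper. The slope m = 2.6 and intercept b = -0.8 kcal/mol are values often quoted for SHAPE data, and here they serve only as assumed defaults. The handling of missing and negative values is likewise an assumption.

```python
import math

def shape_pseudo_energies(reactivities, m=2.6, b=-0.8):
    """Per-nucleotide pseudo-energies (kcal/mol) using dG = m * ln(r + 1) + b.

    None marks missing data and contributes 0; negative reactivities are
    clipped to 0 before conversion (both are assumed conventions).
    """
    energies = []
    for r in reactivities:
        if r is None:
            energies.append(0.0)
        else:
            energies.append(m * math.log(max(r, 0.0) + 1.0) + b)
    return energies

# Low reactivity favors pairing (negative term); high reactivity penalizes it.
print([round(e, 2) for e in shape_pseudo_energies([0.0, 0.5, 2.0, None])])
# [-0.8, 0.25, 2.06, 0.0]
```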
Affiliation(s)
- Sarah von Löhneysen
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Leipzig, Germany
- Thomas Spicher
- Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria
- UniVie Doctoral School Computer Science (DoCS), University of Vienna, Vienna, Austria
- Yuliia Varenyk
- Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria
- Vienna BioCenter PhD Program, Doctoral School of the University of Vienna and Medical University of Vienna, Vienna, Austria
- Hua-Ting Yao
- Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria
- Ronny Lorenz
- Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria
- Ivo Hofacker
- Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria
- Peter F Stadler
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Leipzig, Germany
- Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria
- Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
- Facultad de Ciencias, Universidad Nacional de Colombia, Bogotá, Colombia
- Santa Fe Institute, Santa Fe, New Mexico, USA
7
Kovachka S, Tong Y, Childs-Disney JL, Disney MD. Heterobifunctional small molecules to modulate RNA function. Trends Pharmacol Sci 2024; 45:449-463. [PMID: 38641489 DOI: 10.1016/j.tips.2024.03.006] [Received: 03/02/2024] [Revised: 03/27/2024] [Accepted: 03/27/2024] [Indexed: 04/21/2024]
Abstract
RNA has diverse cellular functionality, including regulating gene expression, protein translation, and cellular response to stimuli, due to its intricate structures. Over the past decade, small molecules have been discovered that target functional structures within cellular RNAs and modulate their function. Simple binding, however, is often insufficient, resulting in low or even no biological activity. To overcome this challenge, heterobifunctional compounds have been developed that can covalently bind to the RNA target, alter RNA sequence, or induce its cleavage. Herein, we review the recent progress in the field of RNA-targeted heterobifunctional compounds using representative case studies. We identify critical gaps and limitations and propose a strategic pathway for future developments of RNA-targeted molecules with augmented functionalities.
Affiliation(s)
- Sandra Kovachka
- Department of Chemistry, The Herbert Wertheim UF Scripps Institute for Biomedical Innovation and Technology, 130 Scripps Way, Jupiter, FL 33458, USA
- Yuquan Tong
- Department of Chemistry, The Herbert Wertheim UF Scripps Institute for Biomedical Innovation and Technology, 130 Scripps Way, Jupiter, FL 33458, USA; The Scripps Research Institute, 130 Scripps Way, Jupiter, FL 33458, USA
- Jessica L Childs-Disney
- Department of Chemistry, The Herbert Wertheim UF Scripps Institute for Biomedical Innovation and Technology, 130 Scripps Way, Jupiter, FL 33458, USA
- Matthew D Disney
- Department of Chemistry, The Herbert Wertheim UF Scripps Institute for Biomedical Innovation and Technology, 130 Scripps Way, Jupiter, FL 33458, USA; The Scripps Research Institute, 130 Scripps Way, Jupiter, FL 33458, USA
8
Sumi S, Hamada M, Saito H. Deep generative design of RNA family sequences. Nat Methods 2024; 21:435-443. [PMID: 38238559 DOI: 10.1038/s41592-023-02148-8] [Received: 05/01/2023] [Accepted: 12/07/2023] [Indexed: 03/13/2024]
Abstract
RNA engineering has immense potential to drive innovation in biotechnology and medicine. Despite its importance, a versatile platform for the automated design of functional RNA is still lacking. Here, we propose RNA family sequence generator (RfamGen), a deep generative model that designs RNA family sequences in a data-efficient manner by explicitly incorporating alignment and consensus secondary structure information. RfamGen can generate novel and functional RNA family sequences by sampling points from a semantically rich and continuous representation. We have experimentally demonstrated the versatility of RfamGen using diverse RNA families. Furthermore, we confirmed the high success rate of RfamGen in designing functional ribozymes through a quantitative massively parallel assay. Notably, RfamGen successfully generates artificial sequences with higher activity than natural sequences. Overall, RfamGen significantly improves our ability to design functional RNA and opens up new potential for generative RNA engineering in synthetic biology.
Affiliation(s)
- Shunsuke Sumi
- Department of Life Science Frontiers, Center for iPS Cell Research and Application (CiRA), Kyoto University, Kyoto, Japan
- Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Graduate School of Advanced Science and Engineering, Waseda University, Tokyo, Japan
- Michiaki Hamada
- Graduate School of Advanced Science and Engineering, Waseda University, Tokyo, Japan
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Graduate School of Medicine, Nippon Medical School, Tokyo, Japan
- Hirohide Saito
- Department of Life Science Frontiers, Center for iPS Cell Research and Application (CiRA), Kyoto University, Kyoto, Japan
- Graduate School of Medicine, Kyoto University, Kyoto, Japan
9
Soares LW, King CG, Fernando CM, Roth A, Breaker RR. Genetic disruption of the bacterial raiA motif noncoding RNA causes defects in sporulation and aggregation. Proc Natl Acad Sci U S A 2024; 121:e2318008121. [PMID: 38306478 PMCID: PMC10861870 DOI: 10.1073/pnas.2318008121] [Received: 10/16/2023] [Accepted: 12/02/2023] [Indexed: 02/04/2024]
Abstract
Several structured noncoding RNAs in bacteria are essential contributors to fundamental cellular processes. Thus, discoveries of additional ncRNA classes provide opportunities to uncover and explore biochemical mechanisms relevant to other major and potentially ancient processes. A candidate structured ncRNA named the "raiA motif" has been found via bioinformatic analyses in over 2,500 bacterial species. The gene coding for the RNA typically resides between the raiA and comFC genes of many species of Bacillota and Actinomycetota. Structural probing of the raiA motif RNA from the Gram-positive anaerobe Clostridium acetobutylicum confirms key features of its sophisticated secondary structure model. Expression analysis of raiA motif RNA reveals that the RNA is constitutively produced but reaches peak abundance during the transition from exponential growth to stationary phase. The raiA motif RNA becomes the fourth most abundant RNA in C. acetobutylicum, excluding ribosomal RNAs and transfer RNAs. Genetic disruption of the raiA motif RNA causes cells to exhibit substantially decreased spore formation and diminished ability to aggregate. Restoration of normal cellular function in this knock-out strain is achieved by expression of a raiA motif gene from a plasmid. These results demonstrate that raiA motif RNAs normally participate in major cell differentiation processes by operating as a trans-acting factor.
Affiliation(s)
- Lucas W. Soares
- Department of Microbial Pathogenesis, Yale University, New Haven, CT 06536
- Christopher G. King
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06511-8103
- Chrishan M. Fernando
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06511-8103
- Adam Roth
- HHMI, Yale University, New Haven, CT 06511-8103
- Ronald R. Breaker
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06511-8103
- HHMI, Yale University, New Haven, CT 06511-8103
- Department of Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT 06511-8103
10
Gupta S, Pal D. Utilizing RNA-seq Data to Infer Bacterial Transcription Termination Sites and Validate Predictions. Methods Mol Biol 2024; 2812:345-365. [PMID: 39068372 DOI: 10.1007/978-1-0716-3886-6_19] [Indexed: 07/30/2024]
Abstract
The transcription termination process is an important part of the gene expression process in the cell. It has been studied extensively, but many aspects of the mechanism are not well understood. The widespread availability of experimental RNA-seq data from high-throughput experiments provides a unique opportunity to infer the end of the transcription units genome wide. Such data are available for both Rho-dependent and Rho-independent termination pathways that drive transcription termination in bacteria. Our book chapter gives an overview of the current knowledge of Rho-independent transcription termination mechanisms and the prediction approaches currently deployed to infer the termination sites. Thereafter, we describe our method that uses clusters of hairpins to detect Rho-independent transcription termination sites. These clusters are groups of hairpins that lie less than 15 bp from each other and together are capable of enforcing the termination process. The idea of a group of hairpins being extensively used for transcription termination is new, and results show that at least 52% of the total cases are of this type, while in the remaining cases, a single strong hairpin is capable of driving transcription termination. The reads derived from the RNA-seq data for corresponding bacteria have been used to validate the predicted sites. The predictions that match these RNA-seq derived sites have higher confidence, and we find almost 98% of the predicted sites, including alternate termination sites, to match the RNA-seq data. We discuss the features of predicted hairpins in detail for a better understanding of the Rho-independent transcription termination mechanism in bacteria. We also explain how users can use the tools developed by us to do transcription terminator predictions and design their experiments through genome-level visualization of the transcription termination sites from the precomputed INTERPIN database.
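The cluster criterion described above (hairpins lying less than 15 bp from each other) amounts to an interval-merging pass over hairpin coordinates. The sketch below is illustrative and is not the INTERPIN implementation. It assumes that hairpins are given as inclusive (start, end) positions, and it measures the gap as the number of bases strictly between neighbours. These choices and the function name are assumptions.

```python
def cluster_hairpins(hairpins, max_gap=15):
    """Group hairpins separated by fewer than max_gap bases.

    hairpins: iterable of (start, end) positions, both ends inclusive.
    Returns a list of clusters, each a sorted list of (start, end) tuples.
    """
    clusters = []
    cluster_end = None
    for start, end in sorted(hairpins):
        # Bases strictly between this hairpin and the current cluster's last position.
        if cluster_end is not None and start - cluster_end - 1 < max_gap:
            clusters[-1].append((start, end))
            cluster_end = max(cluster_end, end)
        else:
            clusters.append([(start, end)])
            cluster_end = end
    return clusters

found = cluster_hairpins([(300, 330), (100, 130), (140, 165)])
print(found)  # [[(100, 130), (140, 165)], [(300, 330)]]
print(sum(len(c) > 1 for c in found), "multi-hairpin cluster(s)")  # 1 multi-hairpin cluster(s)
```

Tracking the running end of each cluster, rather than only the previous hairpin's end, keeps the grouping correct when hairpins overlap.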
Affiliation(s)
- Swati Gupta
- Department of Computational and Data Sciences, Indian Institute of Science, Bengaluru, Karnataka, India
- Debnath Pal
- Department of Computational and Data Sciences, Indian Institute of Science, Bengaluru, Karnataka, India
11
Fiedler L, Middendorf M, Bernt M. Fully automated annotation of mitochondrial genomes using a cluster-based approach with de Bruijn graphs. Front Genet 2023; 14:1250907. [PMID: 37636259 PMCID: PMC10448254 DOI: 10.3389/fgene.2023.1250907] [Received: 06/30/2023] [Accepted: 07/24/2023] [Indexed: 08/29/2023]
Abstract
A wide range of scientific fields, such as forensics, anthropology, medicine, and molecular evolution, benefits from the analysis of mitogenomic data. With the development of new sequencing technologies, the amount of mitochondrial sequence data to be analyzed has increased exponentially over the last few years. The accurate annotation of mitochondrial DNA is a prerequisite for any mitogenomic comparative analysis. To keep pace with the growth of the available mitochondrial sequence data, highly efficient automatic computational methods are therefore needed. Automatic annotation methods are typically based on databases that contain information about already annotated (and often pre-curated) mitogenomes of different species. However, the existing approaches have several shortcomings: 1) they do not scale well with the size of the database; 2) they do not allow for a fast (and easy) update of the database; and 3) they can only be applied to a relatively small taxonomic subset of all species. Here, we present a novel approach that has none of the shortcomings (1), (2), and (3). The reference database of mitogenomes is represented as a richly annotated de Bruijn graph. To generate gene predictions for a new user-supplied mitogenome, the method utilizes a clustering routine that uses the mapping information of the provided sequence to this graph. The method is implemented in a software package called DeGeCI (De Bruijn graph Gene Cluster Identification). For a large set of mitogenomes, for which expert-curated annotations are available, DeGeCI generates gene predictions of high conformity. In a comparative evaluation with MITOS2, a state-of-the-art annotation tool for mitochondrial genomes, DeGeCI shows better database scalability while still matching MITOS2 in terms of result quality and providing a fully automated means to update the underlying database.
Moreover, unlike MITOS2, DeGeCI can be run in parallel on several processors to make use of modern multi-processor systems.
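The sketch below is a much-simplified illustration of annotating a query against an annotated de Bruijn graph, and it uses only the graph's k-mer nodes (edges are omitted). It works in three steps:

- each reference k-mer that lies entirely inside a gene is labelled with that gene;
- query k-mers are looked up in that index;
- consecutive hits with the same label are merged into predicted gene regions.

The tiny reference, the gene names, k = 5, and the majority-label rule are hypothetical. DeGeCI's actual graph construction and clustering routine are not reproduced here.

```python
from collections import Counter, defaultdict

def build_index(references, k):
    """Map each reference k-mer (a de Bruijn graph node) to gene-label counts.

    references: list of (sequence, [(gene, start, end), ...]) with 0-based,
    half-open gene coordinates. Only k-mers fully inside a gene are labeled.
    """
    index = defaultdict(Counter)
    for seq, genes in references:
        for gene, start, end in genes:
            for i in range(start, end - k + 1):
                index[seq[i:i + k]][gene] += 1
    return index

def annotate(query, index, k):
    """Label query k-mers by majority vote and merge overlapping runs.

    Returns a list of (gene, start, end) regions, 0-based and half-open.
    """
    regions = []
    for i in range(len(query) - k + 1):
        counts = index.get(query[i:i + k])  # .get does not insert into a defaultdict
        if not counts:
            continue
        gene = counts.most_common(1)[0][0]
        if regions and regions[-1][0] == gene and i <= regions[-1][2]:
            regions[-1] = (gene, regions[-1][1], i + k)  # extend current region
        else:
            regions.append((gene, i, i + k))
    return regions

ref = "ATGCGTACGTTAGCCGATAGGCTTAACG"  # hypothetical 28-nt reference
index = build_index([(ref, [("geneA", 0, 12), ("geneB", 14, 28)])], k=5)
print(annotate("CC" + ref, index, k=5))
# [('geneA', 2, 14), ('geneB', 16, 30)]
```

Because the index is a plain k-mer lookup, adding a new annotated mitogenome only requires inserting its k-mers. This loosely mirrors the easy database updates claimed in the abstract, though the real tool's update mechanism may differ.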
Affiliation(s)
- Lisa Fiedler
- Department of Computer Science, Leipzig University, Leipzig, Germany
- Martin Middendorf
- Department of Computer Science, Leipzig University, Leipzig, Germany
- Matthias Bernt
- Helmholtz Centre for Environmental Research—UFZ, Leipzig, Germany
12
Umu SU, Paynter VM, Trondsen H, Buschmann T, Rounge TB, Peterson KJ, Fromm B. Accurate microRNA annotation of animal genomes using trained covariance models of curated microRNA complements in MirMachine. CELL GENOMICS 2023; 3:100348. [PMID: 37601971 PMCID: PMC10435380 DOI: 10.1016/j.xgen.2023.100348] [Received: 12/06/2022] [Revised: 03/15/2023] [Accepted: 05/26/2023] [Indexed: 08/22/2023]
Abstract
The annotation of microRNAs depends on the availability of transcriptomics data and expert knowledge. This has led to a gap between the availability of novel genomes and high-quality microRNA complements. Using >16,000 microRNAs from the manually curated microRNA gene database MirGeneDB, we generated trained covariance models for all conserved microRNA families. These models are available in our tool MirMachine, which annotates conserved microRNAs within genomes. We successfully applied MirMachine to a range of animal species, including those with large genomes and genome duplications and extinct species, where small RNA sequencing is hard to achieve. We further describe a microRNA score of expected microRNAs that can be used to assess the completeness of genome assemblies. MirMachine closes a long-persisting gap in the microRNA field by facilitating automated genome annotation pipelines and deeper studies into the evolution of genome regulation, even in extinct organisms.
Affiliation(s)
- Sinan Uğur Umu
- Department of Pathology, Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Vanessa M. Paynter
- The Arctic University Museum of Norway, UiT - The Arctic University of Norway, Tromsø, Norway
- Håvard Trondsen
- Department of Pathology, Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Trine B. Rounge
- Department of Research, Cancer Registry of Norway, Oslo, Norway
- Centre for Bioinformatics, Department of Pharmacy, University of Oslo, Oslo, Norway
- Kevin J. Peterson
- Department of Biological Sciences, Dartmouth College, Hanover, NH, USA
- Bastian Fromm
- The Arctic University Museum of Norway, UiT - The Arctic University of Norway, Tromsø, Norway
13
Dupont MJ, Major F. D-ORB: A Web Server to Extract Structural Features of Related But Unaligned RNA Sequences. J Mol Biol 2023; 435:168181. [PMID: 37468182 DOI: 10.1016/j.jmb.2023.168181] [Received: 12/04/2022] [Revised: 06/02/2023] [Accepted: 06/06/2023] [Indexed: 07/21/2023]
Abstract
Identifying the common structural elements of functionally related RNA sequences (a family) is usually based on an alignment of the sequences, which is often subject to human bias and may not be accurate. The resulting covariance model (CM) provides probabilities for each base to covary with another, which provides evolutionary support for the formation of double helical regions and possibly pseudoknots. The coexistence of alternative folds in RNA, resulting from its dynamic nature, may cause motifs to be missed by the CM. To overcome this limitation, we present D-ORB, a system of algorithms that identifies overrepresented motifs in the secondary conformational landscapes of a family when compared to those of unrelated sequences. The algorithms are bundled into an easy-to-use website allowing users to submit a family, and optionally provide unrelated sequences. D-ORB produces a non-pseudoknotted secondary structure based on the overrepresented motifs, a deep neural network classifier and two decision trees. When used to model an Rfam family, D-ORB fits overrepresented motifs in the corresponding Rfam structure; more than a hundred Rfam families have been modeled. The statistical approach behind D-ORB derives the structural composition of an RNA family, making it a valuable tool for analyzing and modeling the family. Its easy-to-use interface and advanced algorithms make it an essential resource for researchers studying RNA structure. D-ORB is available at https://d-orb.major.iric.ca/.
Affiliation(s)
- Mathieu J Dupont
- Department of Computer Science and Operations Research, and the Institute for Research in Immunology and Cancer, Université de Montréal, Montreal, Quebec H3C 3J7, Canada
- François Major
- Department of Computer Science and Operations Research, and the Institute for Research in Immunology and Cancer, Université de Montréal, Montreal, Quebec H3C 3J7, Canada. https://twitter.com/francois_major
14
Langschied F, Leisegang MS, Brandes RP, Ebersberger I. ncOrtho: efficient and reliable identification of miRNA orthologs. Nucleic Acids Res 2023; 51:e71. [PMID: 37260093 PMCID: PMC10359484 DOI: 10.1093/nar/gkad467] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/06/2022] [Revised: 05/04/2023] [Accepted: 05/30/2023] [Indexed: 06/02/2023]
Abstract
MicroRNAs (miRNAs) are post-transcriptional regulators that fine-tune gene expression via translational repression or degradation of their target mRNAs. Despite their functional relevance, frameworks for the scalable and accurate detection of miRNA orthologs are missing. Consequently, there is still no comprehensive picture of how miRNAs and their associated regulatory networks have evolved. Here we present ncOrtho, a synteny informed pipeline for the targeted search of miRNA orthologs in unannotated genome sequences. ncOrtho matches miRNA annotations from multi-tissue transcriptomes in precision, while scaling to the analysis of hundreds of custom-selected species. The presence-absence pattern of orthologs to 266 human miRNA families across 402 vertebrate species reveals four bursts of miRNA acquisition, of which the most recent event occurred in the last common ancestor of higher primates. miRNA families are rarely modified or lost, but notable exceptions for both events exist. miRNA co-ortholog numbers faithfully indicate lineage-specific whole genome duplications, and miRNAs are powerful markers for phylogenomic analyses. Their exceptionally low genetic diversity makes them suitable to resolve clades where the phylogenetic signal is blurred by incomplete lineage sorting of ancestral alleles. In summary, ncOrtho makes it possible to routinely include miRNAs in evolutionary analyses that were thus far reserved for protein-coding genes.
Affiliation(s)
- Felix Langschied
- Applied Bioinformatics Group, Institute of Cell Biology and Neuroscience, Goethe University, Frankfurt, Germany
- Matthias S Leisegang
- Institute for Cardiovascular Physiology, Goethe University, Frankfurt, Germany
- German Center of Cardiovascular Research (DZHK), Partner site RheinMain, Frankfurt, Germany
- Ralf P Brandes
- Institute for Cardiovascular Physiology, Goethe University, Frankfurt, Germany
- German Center of Cardiovascular Research (DZHK), Partner site RheinMain, Frankfurt, Germany
- Ingo Ebersberger
- Applied Bioinformatics Group, Institute of Cell Biology and Neuroscience, Goethe University, Frankfurt, Germany
- Senckenberg Biodiversity and Climate Research Centre (S-BIK-F), Frankfurt am Main, Germany
- LOEWE Centre for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany
15
Sato K, Hamada M. Recent trends in RNA informatics: a review of machine learning and deep learning for RNA secondary structure prediction and RNA drug discovery. Brief Bioinform 2023; 24:bbad186. [PMID: 37232359 PMCID: PMC10359090 DOI: 10.1093/bib/bbad186] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 01/16/2023] [Revised: 04/24/2023] [Accepted: 04/25/2023] [Indexed: 05/27/2023]
Abstract
Computational analysis of RNA sequences constitutes a crucial step in the field of RNA biology. As in other domains of the life sciences, the incorporation of artificial intelligence and machine learning techniques into RNA sequence analysis has gained significant traction in recent years. Historically, thermodynamics-based methods were widely employed for the prediction of RNA secondary structures; however, machine learning-based approaches have demonstrated remarkable advancements in recent years, enabling more accurate predictions. Consequently, the precision of sequence analysis pertaining to RNA secondary structures, such as RNA-protein interactions, has also been enhanced, making a substantial contribution to the field of RNA biology. Additionally, artificial intelligence and machine learning are also introducing technical innovations in the analysis of RNA-small molecule interactions for RNA-targeted drug discovery and in the design of RNA aptamers, where RNA serves as its own ligand. This review will highlight recent trends in the prediction of RNA secondary structure, RNA aptamers and RNA drug discovery using machine learning, deep learning and related technologies, and will also discuss potential future avenues in the field of RNA informatics.
Affiliation(s)
- Kengo Sato
- School of System Design and Technology, Tokyo Denki University, 5 Senju Asahi-cho, Adachi-ku, Tokyo 120-8551, Japan
- Michiaki Hamada
- Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), 3-4-1, Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
- Graduate School of Medicine, Nippon Medical School, 1-1-5, Sendagi, Bunkyo-ku, Tokyo 113-8602, Japan
16
Gao W, Yang A, Rivas E. Thirteen dubious ways to detect conserved structural RNAs. IUBMB Life 2023; 75:471-492. [PMID: 36495545 PMCID: PMC11234323 DOI: 10.1002/iub.2694] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 10/03/2022] [Accepted: 10/24/2022] [Indexed: 12/14/2022]
Abstract
Covariation induced by compensatory base substitutions in RNA alignments is a great way to deduce conserved RNA structure, in principle. In practice, success depends on many factors, importantly the quality and depth of the alignment and the choice of covariation statistic. Measuring covariation between pairs of aligned positions is easy. However, using covariation to infer evolutionarily conserved RNA structure is complicated by other extraneous sources of covariation such as that resulting from homologous sequences having evolved from a common ancestor. In order to provide evidence of evolutionarily conserved RNA structure, a method to distinguish covariation due to sources other than RNA structure is necessary. Moreover, there are several sorts of artifactually generated covariation signals that can further confound the analysis. Additionally, some covariation signal is difficult to detect due to incomplete comparative data. Here, we investigate and critically discuss the practice of inferring conserved RNA structure by comparative sequence analysis. We provide new methods on how to approach and decide which of the numerous long non-coding RNAs (lncRNAs) have biologically relevant structures.
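The abstract notes that measuring covariation between two aligned positions is the easy part. The sketch below computes one classic pairwise statistic, mutual information (MI) between two alignment columns. It is not the statistic the authors use. Plain MI applies no correction for phylogenetic relatedness, which is exactly the confounder the abstract warns about. Skipping rows with a gap in either column is also an assumption made for brevity.

```python
import math
from collections import Counter


def column_mutual_information(col_i, col_j, gap="-"):
    """Mutual information (bits) between two alignment columns.

    Rows with a gap in either column are skipped. No correction for phylogeny
    or finite sample size is applied.
    """
    pairs = [(a, b) for a, b in zip(col_i, col_j) if a != gap and b != gap]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi


if __name__ == "__main__":
    # Each row is one sequence; compensatory changes keep every pair Watson-Crick.
    print(column_mutual_information("GCAU", "CGUA"))  # 2.0
    # Invariant columns carry no covariation signal, even if they base-pair.
    print(column_mutual_information("GGGG", "CCCC"))  # 0.0
```

The second call shows why incomplete comparative data can hide a structure. A helix that never varies produces zero covariation, however real the pairing is.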
Affiliation(s)
- William Gao
- Department of Genetics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Ann Yang
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, USA
- Elena Rivas
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, USA
17
Gomes RMODS, da Silva KJG, Ferreira LC, Arantes TD, Theodoro RC. Distribution and Polymorphisms of Group I Introns in Mitochondrial Genes from Cryptococcus neoformans and Cryptococcus gattii. J Fungi (Basel) 2023; 9:629. [PMID: 37367565 DOI: 10.3390/jof9060629] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Revised: 05/19/2023] [Accepted: 05/25/2023] [Indexed: 06/28/2023]
Abstract
The species complexes Cryptococcus neoformans and Cryptococcus gattii are the causative agents of cryptococcosis. Virulence and susceptibility to antifungals may vary within each species according to the fungal genotype. Therefore, specific and easily accessible molecular markers are required to distinguish cryptic species and/or genotypes. Group I introns are potential markers for this purpose because they are polymorphic concerning their presence and sequence. Therefore, in this study, we evaluated the presence of group I introns in the mitochondrial genes cob and cox1 in different Cryptococcus isolates. Additionally, the origin, distribution, and evolution of these introns were investigated by phylogenetic analyses, including previously sequenced introns for the mtLSU gene. Approximately 80.5% of the 36 sequenced introns presented homing endonucleases, and phylogenetic analyses revealed that introns occupying the same insertion site form monophyletic clades. This suggests that they likely share a common ancestor that invaded the site prior to species divergence. There was only one case of heterologous invasion, probably through horizontal transfer to C. decagattii (VGIV genotype) from another fungal species. Our results showed that the C. neoformans complex has fewer introns compared to C. gattii. Additionally, there is significant polymorphism in the presence and size of these elements, both among and within genotypes. As a result, it is impossible to differentiate the cryptic species using a single intron. However, it was possible to differentiate among genotypes within each species complex, by combining PCRs of mtLSU and cox1 introns, for C. neoformans species, and mtLSU and cob introns for C. gattii species.
Affiliation(s)
- Leonardo Capistrano Ferreira
- Institute of Tropical Medicine, Universidade Federal do Rio Grande do Norte, Natal 59064-741, RN, Brazil
- Department of Biochemistry, Center of Bioscience, Universidade Federal do Rio Grande do Norte, Natal 59064-741, RN, Brazil
- Thales Domingos Arantes
- Institute of Tropical Pathology and Public Health, Universidade Federal de Goiás, Goiânia 74605-050, GO, Brazil
- Raquel Cordeiro Theodoro
- Institute of Tropical Medicine, Universidade Federal do Rio Grande do Norte, Natal 59064-741, RN, Brazil
- Department of Cell Biology and Genetics, Center of Bioscience, Universidade Federal do Rio Grande do Norte, Natal 59064-741, RN, Brazil
18
How does precursor RNA structure influence RNA processing and gene expression? Biosci Rep 2023; 43:232489. [PMID: 36689327 PMCID: PMC9977717 DOI: 10.1042/bsr20220149] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/20/2022] [Revised: 01/17/2023] [Accepted: 01/23/2023] [Indexed: 01/24/2023]
Abstract
RNA is a fundamental biomolecule that has many purposes within cells. Due to its single-stranded and flexible nature, RNA naturally folds into complex and dynamic structures. Recent technological and computational advances have produced an explosion of RNA structural data. Many RNA structures have regulatory and functional properties. Studying the structure of nascent RNAs is particularly challenging due to their low abundance and long length, but their structures are important because they can influence RNA processing. Precursor RNA processing is a nexus of pathways that determines mature isoform composition and that controls gene expression. In this review, we examine what is known about human nascent RNA structure and the influence of RNA structure on processing of precursor RNAs. These known structures provide examples of how other nascent RNAs may be structured and show how novel RNA structures may influence RNA processing including splicing and polyadenylation. RNA structures can be targeted therapeutically to treat disease.
19
RNA Secondary Structure Prediction Based on Energy Models. Methods Mol Biol 2023; 2586:89-105. [PMID: 36705900 DOI: 10.1007/978-1-0716-2768-6_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2023]
Abstract
This chapter introduces RNA secondary structure prediction based on the nearest neighbor energy model, which is one of the most popular frameworks for modeling RNA secondary structure without pseudoknots. We discuss the parameterization and parameter determination by experimental and machine learning-based approaches, as well as an integrated approach in which each compensates for the other's shortcomings. Then, folding algorithms for the minimum free energy and the maximum expected accuracy structures using dynamic programming are introduced. Finally, we compare the prediction accuracy of the methods described so far on benchmark datasets.
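The chapter's folding algorithms fill dynamic-programming tables over subsequences. The sketch below shows that recursion shape using the much simpler Nussinov base-pair maximization instead of the nearest neighbor energy model. It counts nested Watson-Crick and G-U pairs rather than summing stacking and loop free energies. It is therefore a teaching stand-in and will not reproduce MFE predictions. The minimum hairpin-loop length of 3 is an assumption.

```python
def nussinov(seq, min_loop=3):
    """Nussinov base-pair maximization (NOT the nearest neighbor energy model).

    Returns (max_pairs, dot_bracket) for a pseudoknot-free secondary structure in
    which every hairpin loop encloses at least `min_loop` unpaired bases.
    """
    allowed = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    # dp[i][j] = max number of nested pairs in seq[i..j]; spans <= min_loop stay 0.
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: j is unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with some k
                if (seq[k], seq[j]) in allowed:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + dp[k + 1][j - 1] + 1)
            dp[i][j] = best

    # Traceback: recover one optimal structure from the filled table.
    struct = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in allowed:
                left = dp[i][k - 1] if k > i else 0
                if left + dp[k + 1][j - 1] + 1 == dp[i][j]:
                    struct[k], struct[j] = "(", ")"
                    stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return (dp[0][n - 1] if n else 0), "".join(struct)


if __name__ == "__main__":
    print(nussinov("GGGAAAUCC"))  # (3, '(((...)))')
```

An energy-based MFE algorithm keeps the same "j unpaired or j pairs with k" case split. It replaces the +1 per pair with nearest neighbor loop and stacking energies and minimizes instead of maximizing.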
20
Bumunang EW, McAllister TA, Polo RO, Ateba CN, Stanford K, Schlechte J, Walker M, MacLean K, Niu YD. Genomic Profiling of Non-O157 Shiga Toxigenic Escherichia coli-Infecting Bacteriophages from South Africa. Phage (New Rochelle) 2022; 3:221-230. [PMID: 36793886 PMCID: PMC9917312 DOI: 10.1089/phage.2022.0003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
Background Non-O157 Shiga toxigenic Escherichia coli (STEC) are one of the most important food and waterborne pathogens worldwide. Although bacteriophages (phages) have been used for the biocontrol of these pathogens, a comprehensive understanding of the genetic characteristics and lifestyle of potentially effective candidate phages is lacking. Materials and Methods In this study, 10 non-O157-infecting phages previously isolated from feedlot cattle and dairy farms in the North-West province of South Africa were sequenced, and their genomes were analyzed. Results Comparative genomics and proteomics revealed that the phages were closely related to other E. coli-infecting Tunaviruses, Seuratviruses, Carltongylesviruses, Tequatroviruses, and Mosigviruses from the National Center for Biotechnology Information GenBank database. Phages lacked integrases associated with a lysogenic cycle and genes associated with antibiotic resistance and Shiga toxins. Conclusions Comparative genomic analysis identified a diversity of unique non-O157-infecting phages, which could be used to mitigate the abundance of various non-O157 STEC serogroups without safety concerns.
Affiliation(s)
- Emmanuel W. Bumunang
- Department of Ecosystem and Public Health, Faculty of Veterinary Medicine, University of Calgary, Calgary, Canada
- Tim A. McAllister
- Agriculture and Agri-Food Canada, Lethbridge Research and Development Centre, Lethbridge, Canada
- Rodrigo Ortega Polo
- Agriculture and Agri-Food Canada, Lethbridge Research and Development Centre, Lethbridge, Canada
- Collins N. Ateba
- Department of Microbiology, Faculty of Natural and Agricultural Sciences, North-West University, Mmabatho, South Africa
- Kim Stanford
- Department of Biological Sciences, University of Lethbridge, Lethbridge, Canada
- Jared Schlechte
- Department of Ecosystem and Public Health, Faculty of Veterinary Medicine, University of Calgary, Calgary, Canada
- Matthew Walker
- Canadian Science Centre for Human and Animal Health, Public Health Agency of Canada, Winnipeg, Canada
- Kellie MacLean
- Cumming School of Medicine, Faculty of Science, University of Calgary, Calgary, Canada
- Yan D. Niu
- Department of Ecosystem and Public Health, Faculty of Veterinary Medicine, University of Calgary, Calgary, Canada
21
Yazaki E, Yabuki A, Nishimura Y, Shiratori T, Hashimoto T, Inagaki Y. Microheliella maris possesses the most gene-rich mitochondrial genome in Diaphoretickes. Front Ecol Evol 2022. [DOI: 10.3389/fevo.2022.1030570] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
The mitochondrial genomes are very diverse, but their evolutionary history is unclear due to the lack of efforts to sequence those of protists (unicellular eukaryotes), which cover a major part of the eukaryotic tree. Cryptista comprises cryptophytes, goniomonads, kathablepharids, and Palpitomonas bilix, and their mitochondrial genomes (mt-genomes) are characterized by various gene contents, particularly the presence/absence of an ancestral (bacterial) system for the cytochrome c maturation system. To shed light on mt-genome evolution in Cryptista, we report the complete mt-genome of Microheliella maris, which was recently revealed to branch at the root of Cryptista. The M. maris mt-genome was reconstructed as a circular mapping chromosome of 61.2 kbp with a pair of inverted repeats (12.9 kbp) and appeared to be the most gene-rich among the mt-genomes of the members of Diaphoretickes (a mega-scale eukaryotic assembly including Archaeplastida, Cryptista, Haptista, and SAR) studied so far, carrying 53 protein-coding genes. With this newly sequenced mt-genome, we inferred and discussed the evolution of the mt-genome in Cryptista and Diaphoretickes.
22
Childs-Disney JL, Yang X, Gibaut QMR, Tong Y, Batey RT, Disney MD. Targeting RNA structures with small molecules. Nat Rev Drug Discov 2022; 21:736-762. [PMID: 35941229 PMCID: PMC9360655 DOI: 10.1038/s41573-022-00521-4] [Citation(s) in RCA: 200] [Impact Index Per Article: 100.0] [Accepted: 06/17/2022] [Indexed: 01/07/2023]
Abstract
RNA adopts 3D structures that confer varied functional roles in human biology and dysfunction in disease. Approaches to therapeutically target RNA structures with small molecules are being actively pursued, aided by key advances in the field including the development of computational tools that predict evolutionarily conserved RNA structures, as well as strategies that expand modes of action and facilitate interactions with cellular machinery. Existing RNA-targeted small molecules use a range of mechanisms, including directing splicing by acting as molecular glues with cellular proteins (such as branaplam and the FDA-approved risdiplam), inhibiting translation of undruggable proteins and deactivating functional structures in noncoding RNAs. Here, we describe strategies to identify, validate and optimize small molecules that target the functional transcriptome, laying out a roadmap to advance these agents into the next decade.
Affiliation(s)
- Xueyi Yang
- Department of Chemistry, Scripps Research, Jupiter, FL, USA
- Yuquan Tong
- Department of Chemistry, Scripps Research, Jupiter, FL, USA
- Robert T Batey
- Department of Biochemistry, University of Colorado, Boulder, CO, USA
23
Kutschera LS, Wolfinger MT. Evolutionary traits of Tick-borne encephalitis virus: Pervasive non-coding RNA structure conservation and molecular epidemiology. Virus Evol 2022; 8:veac051. [PMID: 35822110 PMCID: PMC9272599 DOI: 10.1093/ve/veac051] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 12/16/2021] [Revised: 04/14/2022] [Accepted: 06/09/2022] [Indexed: 12/17/2022]
Abstract
Tick-borne encephalitis virus (TBEV) is the aetiological agent of tick-borne encephalitis, an infectious disease of the central nervous system that is often associated with severe sequelae in humans. While TBEV is typically classified into three subtypes, recent evidence suggests a more varied range of TBEV subtypes and lineages that differ substantially in the architecture of their 3ʹ untranslated region (3ʹUTR). Building on comparative genomic approaches and thermodynamic modelling, we characterize the TBEV UTR structureome diversity and propose a unified picture of pervasive non-coding RNA structure conservation. Moreover, we provide an updated phylogeny of TBEV, building on more than 220 publicly available complete genomes, and investigate the molecular epidemiology and phylodynamics with Nextstrain, a web-based visualization framework for real-time pathogen evolution.
Affiliation(s)
- Lena S Kutschera
- Department of Theoretical Chemistry, University of Vienna, Währinger Straße 17, Vienna 1090, Austria
- Michael T Wolfinger
- Department of Theoretical Chemistry, University of Vienna, Währinger Straße 17, Vienna 1090, Austria
24
Ross CJ, Ulitsky I. Discovering functional motifs in long noncoding RNAs. Wiley Interdiscip Rev RNA 2022; 13:e1708. [PMID: 34981665 DOI: 10.1002/wrna.1708] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 09/01/2021] [Revised: 11/19/2021] [Accepted: 12/04/2021] [Indexed: 12/27/2022]
Abstract
Long noncoding RNAs (lncRNAs) are products of pervasive transcription that closely resemble messenger RNAs on the molecular level, yet function through largely unknown modes of action. The current model is that the function of lncRNAs often relies on specific, typically short, conserved elements, connected by linkers in which specific sequences and/or structures are less important. This notion has fueled the development of both computational and experimental methods focused on the discovery of functional elements within lncRNA genes, based on diverse signals such as evolutionary conservation, predicted structural elements, or the ability to rescue loss-of-function phenotypes. In this review, we outline the main challenges that the different methods need to overcome, describe the recently developed approaches, and discuss their respective limitations. This article is categorized under: RNA Evolution and Genomics > Computational Analyses of RNA; RNA Interactions with Proteins and Other Molecules > Protein-RNA Interactions: Functional Implications; Regulatory RNAs/RNAi/Riboswitches > Regulatory RNAs.
Affiliation(s)
- Caroline Jane Ross
- Biological Regulation and Molecular Neuroscience, Weizmann Institute of Science, Rehovot, Israel
- Igor Ulitsky
- Biological Regulation and Molecular Neuroscience, Weizmann Institute of Science, Rehovot, Israel
25
Li Y, Baptista RP, Mei X, Kissinger JC. Small and intermediate size structural RNAs in the unicellular parasite Cryptosporidium parvum as revealed by sRNA-seq and comparative genomics. Microb Genom 2022; 8. [PMID: 35536609 PMCID: PMC9465071 DOI: 10.1099/mgen.0.000821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/03/2022]
Abstract
Small and intermediate-size noncoding RNAs (sRNAs and is-ncRNAs) have been shown to play important regulatory roles in the development of several eukaryotic organisms. However, they have not been thoroughly explored in Cryptosporidium parvum, an obligate zoonotic protist parasite responsible for the diarrhoeal disease cryptosporidiosis. Using Illumina sequencing of a small RNA library, a systematic identification of novel small and is-ncRNAs was performed in C. parvum excysted sporozoites. A total of 79 novel is-ncRNA candidates, including antisense, intergenic and intronic is-ncRNAs, were identified, including 7 new small nucleolar RNAs (snoRNAs). Expression of select novel is-ncRNAs was confirmed by RT-PCR. Phylogenetic conservation was analysed using covariance models (CMs) in related Cryptosporidium and apicomplexan parasite genome sequences. A potential new type of small ncRNA derived from tRNA fragments was observed. Overall, a deep profiling analysis of novel is-ncRNAs in C. parvum and related species revealed structural features and conservation of these novel is-ncRNAs. Covariance models can be used to detect is-ncRNA genes in other closely related parasites. These findings provide important new sequences for additional functional characterization of novel is-ncRNAs in the protist pathogen C. parvum.
Affiliation(s)
- Yiran Li
- Institute of Bioinformatics, University of Georgia, Athens, GA, USA
- Rodrigo P Baptista
- Institute of Bioinformatics, University of Georgia, Athens, GA, USA
- Center for Tropical and Emerging Global Diseases, University of Georgia, Athens, GA, USA
- Present address: Houston Methodist Research Institute, Houston, TX, USA
- Xiaohan Mei
- Department of Physiology and Pharmacology, University of Georgia, Athens, GA, USA
- Jessica C Kissinger
- Institute of Bioinformatics, University of Georgia, Athens, GA, USA
- Center for Tropical and Emerging Global Diseases, University of Georgia, Athens, GA, USA
- Department of Genetics, University of Georgia, Athens, GA, USA
26
Morandi E, van Hemert MJ, Incarnato D. SHAPE-guided RNA structure homology search and motif discovery. Nat Commun 2022; 13:1722. [PMID: 35361788 PMCID: PMC8971488 DOI: 10.1038/s41467-022-29398-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/10/2021] [Accepted: 03/11/2022] [Indexed: 01/13/2023]
Abstract
The rapidly growing popularity of RNA structure probing methods is leading to increasingly large amounts of available RNA structure information. This demands the development of efficient tools for the identification of RNAs sharing regions of structural similarity by direct comparison of their reactivity profiles, hence enabling the discovery of conserved structural features. We here introduce SHAPEwarp, a largely sequence-agnostic SHAPE-guided algorithm for the identification of structurally-similar regions in RNA molecules. Analysis of Dengue, Zika and coronavirus genomes recapitulates known regulatory RNA structures and identifies novel highly-conserved structural elements. This work represents a preliminary step towards the model-free search and identification of shared and conserved RNA structural features within transcriptomes.
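The core idea of comparing RNAs by their reactivity profiles rather than their sequences can be illustrated with a much-simplified stand-in. The sketch below is not the SHAPEwarp algorithm. It slides a query profile along a target profile and reports the offset with the smallest mean absolute difference. Ungapped windows, equal weighting of positions and the absence of missing reactivity values are all assumptions made for brevity.

```python
def best_matching_window(query, target):
    """Return (offset, mean_abs_diff) of the target window closest to `query`.

    Both inputs are per-nucleotide reactivity profiles (lists of floats) with no
    missing values; windows are ungapped and the same length as the query.
    """
    m, n = len(query), len(target)
    if m == 0 or m > n:
        raise ValueError("query must be non-empty and no longer than target")
    best_offset, best_dist = None, float("inf")
    for offset in range(n - m + 1):
        window = target[offset:offset + m]
        dist = sum(abs(q - t) for q, t in zip(query, window)) / m
        if dist < best_dist:
            best_offset, best_dist = offset, dist
    return best_offset, best_dist


if __name__ == "__main__":
    query = [0.1, 0.9, 0.8, 0.1]
    target = [0.5, 0.5, 0.1, 0.9, 0.8, 0.1, 0.4]
    print(best_matching_window(query, target))  # (2, 0.0)
```

Because only reactivities are compared, two regions with different sequences but similar pairing patterns can still score as close. That is the sense in which such a search is sequence-agnostic.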
Affiliation(s)
- Edoardo Morandi
- Department of Molecular Genetics, Groningen Biomolecular Sciences and Biotechnology Institute (GBB), University of Groningen, Groningen, The Netherlands
- Martijn J van Hemert
- Department of Medical Microbiology, Molecular Virology Laboratory, Leiden University Medical Center, Leiden, The Netherlands
- Danny Incarnato
- Department of Molecular Genetics, Groningen Biomolecular Sciences and Biotechnology Institute (GBB), University of Groningen, Groningen, The Netherlands
27
Shi H, Jing X. Efficient Generation of RNA Secondary Structure Prediction Algorithm Under PAR Framework. Front Plant Sci 2022; 12:830042. [PMID: 35126440 PMCID: PMC8813866 DOI: 10.3389/fpls.2021.830042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2021] [Accepted: 12/16/2021] [Indexed: 06/14/2023]
Abstract
Prediction of RNA secondary structure is an important part of bioinformatics and genomics research. Mastering RNA secondary structure can help us better analyze protein synthesis, cell differentiation, metabolism and genetic processes, and thus reveal the genetic laws of organisms. Comparative sequence analysis, support vector machines, the centroid method and other RNA secondary structure prediction algorithms often rely on dynamic programming, and these algorithms involve large time and space consumption and complex data structures. In this article, the domain of RNA secondary structure prediction algorithms based on dynamic programming (DP-SSP) is analyzed in depth, and the domain features are modeled. Following the generative programming method, the DP-SSP algorithm components are interactively designed. With the support of the PAR platform, the DP-SSP algorithm component library is formally realized. Finally, concrete algorithms are generated through component assembly, which improves the efficiency and reliability of algorithm development.
Affiliation(s)
- Haihe Shi
- School of Computer and Information Engineering, Jiangxi Normal University, Nanchang, China
28
Hita A, Brocart G, Fernandez A, Rehmsmeier M, Alemany A, Schvartzman S. MGcount: a total RNA-seq quantification tool to address multi-mapping and multi-overlapping alignments ambiguity in non-coding transcripts. BMC Bioinformatics 2022; 23:39. [PMID: 35030988 PMCID: PMC8760670 DOI: 10.1186/s12859-021-04544-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/30/2021] [Accepted: 12/20/2021] [Indexed: 01/08/2023]
Abstract
BACKGROUND Total-RNA sequencing (total-RNA-seq) allows the simultaneous study of both the coding and the non-coding transcriptome. Yet, computational pipelines have traditionally focused on particular biotypes, making assumptions that are not fulfilled by total-RNA-seq datasets. Transcripts from distinct RNA biotypes vary in length, biogenesis, and function, can overlap in a genomic region, and may be present in the genome with a high copy number. Consequently, reads from total-RNA-seq libraries may cause ambiguous genomic alignments, demanding flexible quantification approaches. RESULTS Here we present Multi-Graph count (MGcount), a total-RNA-seq quantification tool combining two strategies for handling ambiguous alignments. First, MGcount assigns reads hierarchically to small-RNA and long-RNA features to account for length disparity when transcripts overlap in the same genomic position. Next, MGcount aggregates RNA products with similar sequences where reads systematically multi-map using a graph-based approach. MGcount outputs a transcriptomic count matrix compatible with RNA-sequencing downstream analysis pipelines, with both bulk and single-cell resolution, and the graphs that model repeated transcript structures for different biotypes. The software can be used as a Python module or as a single-file executable program. CONCLUSIONS MGcount is a flexible total-RNA-seq quantification tool that successfully integrates reads that align to multiple genomic locations or that overlap with multiple gene features. Its approach is suitable for the simultaneous estimation of protein-coding, long non-coding and small non-coding transcript concentration, in both precursor and processed forms. Both source code and compiled software are available at https://github.com/hitaandrea/MGcount .
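The hierarchical first step can be sketched as follows. The sketch assumes reads and features are plain genomic intervals and that small-RNA features are checked before long-RNA ones. The `Feature` type, the "small"/"long" tier labels and the handling of same-tier ties are hypothetical and are not MGcount's API. In particular, joining tied feature names into one key is only a naive stand-in for MGcount's graph-based aggregation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    tier: str   # hypothetical label: "small" or "long"


def overlaps(read, feature):
    """True if a (chrom, start, end) read interval overlaps the feature."""
    chrom, start, end = read
    return chrom == feature.chrom and start < feature.end and feature.start < end


def assign_hierarchically(reads, features, order=("small", "long")):
    """Assign each read to the first tier in `order` with an overlapping feature.

    Reads hitting several features in the same tier are counted under a
    combined "A|B" key. Returns (counts, number_of_unassigned_reads).
    """
    counts = Counter()
    unassigned = 0
    for read in reads:
        for tier in order:
            hits = sorted(f.name for f in features if f.tier == tier and overlaps(read, f))
            if hits:
                counts["|".join(hits)] += 1
                break
        else:  # no tier produced a hit
            unassigned += 1
    return counts, unassigned


if __name__ == "__main__":
    features = [
        Feature("miR-x", "chr1", 100, 122, "small"),     # small RNA inside a host lncRNA
        Feature("lnc-host", "chr1", 0, 5000, "long"),
        Feature("gene-y", "chr2", 0, 1000, "long"),
    ]
    reads = [("chr1", 105, 125), ("chr1", 2000, 2050), ("chr2", 10, 60), ("chr3", 0, 50)]
    counts, unassigned = assign_hierarchically(reads, features)
    print(dict(counts), unassigned)
```

The first read overlaps both the miRNA and its host lncRNA. Checking the small-RNA tier first sends it to `miR-x`, which mirrors the length-disparity problem the abstract describes.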
Affiliation(s)
- Andrea Hita
- Epigenetics unit, Diagenode s.a., Liège, Belgium
- Department of Biology, Humboldt-Universität zu Berlin, Berlin, Germany
- Ana Fernandez
- Epigenetics unit, Diagenode s.a., Liège, Belgium
- Department of Biology, Humboldt-Universität zu Berlin, Berlin, Germany
- Marc Rehmsmeier
- Department of Biology, Humboldt-Universität zu Berlin, Berlin, Germany
- Anna Alemany
- Department of Anatomy and Embryology, Leiden University Medical Centre, Leiden, The Netherlands

29
Tagashira M, Asai K. ConsAlifold: considering RNA structural alignments improves prediction accuracy of RNA consensus secondary structures. Bioinformatics 2022; 38:710-719. [PMID: 34694364 DOI: 10.1093/bioinformatics/btab738] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/22/2020] [Revised: 08/24/2021] [Accepted: 10/20/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION By detecting homology among RNAs, the probabilistic consideration of RNA structural alignments has improved prediction accuracy for important RNA prediction problems. Predicting an RNA consensus secondary structure from an RNA sequence alignment is a fundamental research objective because, for detecting conserved base pairs among RNA homologs, it is more convenient than predicting an RNA structural alignment. RESULTS We developed and implemented ConsAlifold, a dynamic programming-based method that predicts the consensus secondary structure of an RNA sequence alignment while taking RNA structural alignments into account. ConsAlifold achieves moderate running time and the best prediction accuracy for RNA consensus secondary structures among available prediction methods. AVAILABILITY AND IMPLEMENTATION ConsAlifold, data and Python scripts for generating both figures and tables are freely available at https://github.com/heartsh/consalifold. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
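To make "predicting a consensus secondary structure from an alignment" concrete, here is a minimal Nussinov-style dynamic program over alignment columns. This is a deliberately simpler technique swapped in for illustration, not ConsAlifold's algorithm: it treats a column pair as pairable when at least a hypothetical fraction (`min_fraction`) of sequences form a Watson-Crick or G-U pair there, maximises the number of such nested pairs, and ignores energies, covariation scores and structural alignments entirely.

```python
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pairable(alignment, k, j, min_fraction):
    """True if columns k and j form a canonical pair in enough sequences (gaps never pair)."""
    hits = sum((seq[k], seq[j]) in CANONICAL for seq in alignment)
    return hits / len(alignment) >= min_fraction


def consensus_structure(alignment, min_fraction=0.75, min_loop=3):
    """Dot-bracket consensus that maximises the number of conserved column pairs."""
    n = len(alignment[0])

    def ok(k, j):
        return pairable(alignment, k, j, min_fraction)

    best = [[0] * n for _ in range(n)]  # best[i][j]: max pairs within columns i..j
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: column j is unpaired
            for k in range(i, j - min_loop):  # case 2: column j pairs with column k
                if ok(k, j):
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + best[k + 1][j - 1] + 1)
            best[i][j] = score

    # Traceback with an explicit stack to avoid deep recursion on long alignments.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            left = best[i][k - 1] if k > i else 0
            if ok(k, j) and best[i][j] == left + best[k + 1][j - 1] + 1:
                structure[k], structure[j] = "(", ")"
                if k > i:
                    stack.append((i, k - 1))
                stack.append((k + 1, j - 1))
                break
    return "".join(structure)


if __name__ == "__main__":
    # Invented three-sequence alignment with compensatory changes in the stem.
    alignment = ["GGGAAAACCC", "GCGAAAACGC", "AGGAAAACCU"]
    print(consensus_structure(alignment))  # (((....)))
```

The toy alignment shows why alignment-level folding helps: the stem is supported in every sequence even though the paired nucleotides differ between sequences.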
Affiliation(s)
- Masaki Tagashira
- Department of Computational Biology and Medical Sciences, University of Tokyo, Chiba 277-8561, Japan
- Artificial Intelligence Research Center, AIST, Tokyo 135-0064, Japan
- Kiyoshi Asai
- Department of Computational Biology and Medical Sciences, University of Tokyo, Chiba 277-8561, Japan
- Artificial Intelligence Research Center, AIST, Tokyo 135-0064, Japan

30
Seemann SE, Mirza AH, Bang-Berthelsen CH, Garde C, Christensen-Dalsgaard M, Workman CT, Pociot F, Tommerup N, Gorodkin J, Ruzzo WL. Does rapid sequence divergence preclude RNA structure conservation in vertebrates? Nucleic Acids Res 2022; 50:2452-2463. [PMID: 35188540 PMCID: PMC8934657 DOI: 10.1093/nar/gkac067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2021] [Revised: 01/07/2022] [Accepted: 01/25/2022] [Indexed: 12/01/2022]
Abstract
Accelerated evolution of any portion of the genome is of significant interest, potentially signaling positive selection of phenotypic traits and adaptation. Accelerated evolution remains understudied for structured RNAs, even though an RNA’s structure is often key to its function. RNA structures are typically characterized by compensatory (structure-preserving) base-pair changes that are unexpected given the underlying sequence variation, i.e., they have evolved through negative selection on structure. We address the question of how fast the primary sequence of an RNA can change through evolution while conserving its structure. Specifically, we consider predicted and known structures in vertebrate genomes. After careful control of false discovery rates, we obtain 13 de novo structures (and three known Rfam structures) that we predict to have rapidly evolving sequences, defined as structures whose human and mouse primary sequences have diverged at least twice as fast (1.5 times for Rfam) as nearby neutrally evolving sequences. Two of the three known structures function in translation inhibition related to infection and immune response. We conclude that rapid sequence divergence does not preclude RNA structure conservation in vertebrates, although such events are relatively rare.
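The rate criterion above (structure divergence at least twice the nearby neutral rate, 1.5 times for Rfam) reduces to a simple ratio. The sketch below uses raw p-distances between aligned human and mouse sequences as a crude, assumed stand-in for the substitution-rate estimates a real analysis would use; the function names and toy sequences are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Fraction of mismatching sites between two aligned sequences, skipping gap columns."""
    sites = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not sites:
        raise ValueError("no ungapped sites to compare")
    return sum(a != b for a, b in sites) / len(sites)


def divergence_ratio(structure_pair, neutral_pair, threshold=2.0):
    """Return (ratio, is_rapid): structure divergence relative to nearby neutral sequence.

    threshold=2.0 mirrors the de novo criterion quoted above; 1.5 would mirror the Rfam one.
    """
    neutral = p_distance(*neutral_pair)
    if neutral == 0:
        raise ValueError("neutral region shows no divergence; ratio undefined")
    ratio = p_distance(*structure_pair) / neutral
    return ratio, ratio >= threshold


if __name__ == "__main__":
    human_mouse_structure = ("ACGUACGUAC", "AGGUUCGAAC")  # 3 mismatches in 10 sites
    human_mouse_neutral = ("ACGUACGUAC", "ACGUACGUAA")    # 1 mismatch in 10 sites
    ratio, rapid = divergence_ratio(human_mouse_structure, human_mouse_neutral)
    print(round(ratio, 2), rapid)  # 3.0 True
```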
Affiliation(s)
- Aashiq H Mirza
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- Steno Diabetes Center Copenhagen, Gentofte, Denmark
- Claus H Bang-Berthelsen
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- National Food Institute, Technical University of Denmark, Kgs. Lyngby, Denmark
- Christian Garde
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- Christopher T Workman
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- Center for Biological Sequence Analysis, Technical University of Denmark, Denmark
- Flemming Pociot
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- Steno Diabetes Center Copenhagen, Gentofte, Denmark
- Niels Tommerup
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- Department of Cellular and Molecular Medicine (ICMM), University of Copenhagen, Denmark
- Jan Gorodkin
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- Department of Veterinary and Animal Sciences, University of Copenhagen, Denmark
- Walter L Ruzzo
- Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- Computer Science and Engineering and Genome Sciences, University of Washington, USA
- Fred Hutchinson Cancer Research Center, Seattle, USA

31
Li S, Zhang H, Zhang L, Liu K, Liu B, Mathews DH, Huang L. LinearTurboFold: Linear-time global prediction of conserved structures for RNA homologs with applications to SARS-CoV-2. Proc Natl Acad Sci U S A 2021; 118:e2116269118. [PMID: 34887342 PMCID: PMC8719904 DOI: 10.1073/pnas.2116269118] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Accepted: 11/05/2021] [Indexed: 12/26/2022]
Abstract
The constant emergence of COVID-19 variants reduces the effectiveness of existing vaccines and test kits. Therefore, it is critical to identify conserved structures in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes as potential targets for variant-proof diagnostics and therapeutics. However, the algorithms to predict these conserved structures, which simultaneously fold and align multiple RNA homologs, scale at best cubically with sequence length and are thus infeasible for coronaviruses, which possess the longest genomes (∼30,000 nt) among RNA viruses. As a result, existing efforts on modeling SARS-CoV-2 structures resort to single-sequence folding as well as local folding methods with short window sizes, which inevitably neglect long-range interactions that are crucial in RNA functions. Here we present LinearTurboFold, an efficient algorithm for folding RNA homologs that scales linearly with sequence length, enabling unprecedented global structural analysis on SARS-CoV-2. Surprisingly, on a group of SARS-CoV-2 and SARS-related genomes, LinearTurboFold's purely in silico prediction not only is close to experimentally guided models for local structures, but also goes far beyond them by capturing the end-to-end pairs between 5' and 3' untranslated regions (UTRs) (∼29,800 nt apart) that match perfectly with a purely experimental work. Furthermore, LinearTurboFold identifies undiscovered conserved structures and conserved accessible regions as potential targets for designing efficient and mutation-insensitive small-molecule drugs, antisense oligonucleotides, small interfering RNAs (siRNAs), CRISPR-Cas13 guide RNAs, and RT-PCR primers. LinearTurboFold is a general technique that can also be applied to other RNA viruses and full-length genome studies and will be a useful tool in fighting the current and future pandemics.
Affiliation(s)
- Sizhen Li
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR 97331
- He Zhang
- Baidu Research, Sunnyvale, CA 94089
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR 97331
- Liang Zhang
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR 97331
- Baidu Research, Sunnyvale, CA 94089
- Kaibo Liu
- Baidu Research, Sunnyvale, CA 94089
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR 97331
- David H Mathews
- Department of Biochemistry & Biophysics, University of Rochester Medical Center, Rochester, NY 14642
- Center for RNA Biology, University of Rochester Medical Center, Rochester, NY 14642
- Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY 14642
- Liang Huang
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR 97331
- Baidu Research, Sunnyvale, CA 94089

32
Li S, Zhang H, Zhang L, Liu K, Liu B, Mathews DH, Huang L. LinearTurboFold: Linear-Time Global Prediction of Conserved Structures for RNA Homologs with Applications to SARS-CoV-2. bioRxiv 2021:2020.11.23.393488. [PMID: 34816262 PMCID: PMC8609897 DOI: 10.1101/2020.11.23.393488] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/20/2022]
Abstract
The constant emergence of COVID-19 variants reduces the effectiveness of existing vaccines and test kits. Therefore, it is critical to identify conserved structures in SARS-CoV-2 genomes as potential targets for variant-proof diagnostics and therapeutics. However, the algorithms to predict these conserved structures, which simultaneously fold and align multiple RNA homologs, scale at best cubically with sequence length and are thus infeasible for coronaviruses, which possess the longest genomes (∼30,000 nt) among RNA viruses. As a result, existing efforts to model SARS-CoV-2 structures resort to single-sequence folding as well as local folding methods with short window sizes, which inevitably neglect long-range interactions that are crucial in RNA functions. Here we present LinearTurboFold, an efficient algorithm for folding RNA homologs that scales linearly with sequence length, enabling unprecedented global structural analysis on SARS-CoV-2. Surprisingly, on a group of SARS-CoV-2 and SARS-related genomes, LinearTurboFold's purely in silico prediction not only is close to experimentally guided models for local structures, but also goes far beyond them by capturing the end-to-end pairs between 5' and 3' UTRs (∼29,800 nt apart) that match perfectly with a purely experimental work. Furthermore, LinearTurboFold identifies novel conserved structures and conserved accessible regions as potential targets for designing efficient and mutation-insensitive small-molecule drugs, antisense oligonucleotides, siRNAs, CRISPR-Cas13 guide RNAs and RT-PCR primers. LinearTurboFold is a general technique that can also be applied to other RNA viruses and full-length genome studies, and will be a useful tool in fighting the current and future pandemics. SIGNIFICANCE STATEMENT Conserved RNA structures are critical for designing diagnostic and therapeutic tools for many diseases including COVID-19. 
However, existing algorithms are much too slow to model the global structures of full-length RNA viral genomes. We present LinearTurboFold, a linear-time algorithm that is orders of magnitude faster, making it the first method to simultaneously fold and align whole genomes of SARS-CoV-2 variants, the longest known RNA virus (∼30 kilobases). Our work enables unprecedented global structural analysis and captures long-range interactions that are out of reach for existing algorithms but crucial for RNA functions. LinearTurboFold is a general technique for full-length genome studies and can help fight the current and future pandemics.
Affiliation(s)
- Sizhen Li
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR
- He Zhang
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR
- Baidu Research, Sunnyvale, CA
- Liang Zhang
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR
- Baidu Research, Sunnyvale, CA
- Kaibo Liu
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR
- Baidu Research, Sunnyvale, CA
- David H. Mathews
- Department of Biochemistry & Biophysics, Center for RNA Biology, and Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY
- Liang Huang
- School of Electrical Engineering & Computer Science, Oregon State University, Corvallis, OR
- Baidu Research, Sunnyvale, CA

33
Chan PP, Lin BY, Mak AJ, Lowe TM. tRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic Acids Res 2021; 49:9077-9096. [PMID: 34417604 PMCID: PMC8450103 DOI: 10.1093/nar/gkab688] [Citation(s) in RCA: 569] [Impact Index Per Article: 189.7] [Received: 06/16/2021] [Revised: 07/23/2021] [Accepted: 07/27/2021] [Indexed: 12/17/2022]
Abstract
tRNAscan-SE has been widely used for transfer RNA (tRNA) gene prediction for over twenty years, having been developed just as the first genomes were decoded. With the massive increase in the quantity and phylogenetic diversity of genomes, the accurate detection and functional prediction of tRNAs have become more challenging. Utilizing a vastly larger training set, we created nearly one hundred specialized isotype- and clade-specific models, greatly improving tRNAscan-SE’s ability to identify and classify both typical and atypical tRNAs. We employ a new comparative multi-model strategy in which predicted tRNAs are scored against a full set of isotype-specific covariance models, allowing functional prediction based on both the anticodon and the highest-scoring isotype model. Comparative model scoring has also enhanced the program's ability to detect tRNA-derived SINEs and other likely pseudogenes. For the first time, tRNAscan-SE also includes fast and highly accurate detection of mitochondrial tRNAs using newly developed models. Overall, tRNA detection sensitivity and specificity are improved for all isotypes, particularly those utilizing specialized models for selenocysteine and the three subtypes of tRNA genes encoding a CAU anticodon. These enhancements will provide researchers with more accurate and detailed tRNA annotation for a wider variety of tRNAs, and may direct attention to tRNAs with novel traits.
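The multi-model idea, assigning an isotype both from the anticodon and from the best-scoring isotype-specific model and then comparing the two calls, can be sketched as below. This is not tRNAscan-SE's code: the covariance-model scores are hypothetical inputs (producing them requires an actual covariance-model search, not reproduced here), and model labels such as `Ile2` are illustrative. The anticodon lookup uses the standard genetic code.

```python
BASES = "TCAG"
# NCBI standard genetic code (translation table 1); codons ordered TTT, TTC, TTA, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln",
    "E": "Glu", "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys",
    "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val",
    "*": "Stop",  # stop-codon anticodons (e.g. SeC or suppressor tRNAs) need special handling
}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def isotype_from_anticodon(anticodon):
    """Decode a 5'->3' anticodon via its reverse-complement codon."""
    dna = anticodon.upper().replace("U", "T")
    codon = "".join(COMPLEMENT[base] for base in reversed(dna))
    return THREE_LETTER[CODON_TABLE[codon]]


def classify_trna(anticodon, model_scores):
    """Compare anticodon-based and best-model-based isotype calls.

    model_scores maps isotype-model label -> covariance-model score (hypothetical inputs).
    Returns (anticodon_isotype, best_model, disagree).
    """
    by_anticodon = isotype_from_anticodon(anticodon)
    best_model = max(model_scores, key=model_scores.get)
    return by_anticodon, best_model, by_anticodon != best_model


if __name__ == "__main__":
    # Invented scores: a CAU anticodon reads as Met, but an Ile2-like model scores highest.
    print(classify_trna("CAU", {"Met": 85.2, "Ile2": 91.7, "fMet": 70.1}))
    # ('Met', 'Ile2', True)
```

The CAU example reflects why the abstract singles out CAU-anticodon genes: the anticodon alone cannot separate the subtypes, so disagreement with the best-scoring model is the informative signal.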
Affiliation(s)
- Patricia P Chan
- Department of Biomolecular Engineering, Baskin School of Engineering, University of California, Santa Cruz, CA 95064, USA
- Brian Y Lin
- Department of Biomolecular Engineering, Baskin School of Engineering, University of California, Santa Cruz, CA 95064, USA
- Allysia J Mak
- Department of Biomolecular Engineering, Baskin School of Engineering, University of California, Santa Cruz, CA 95064, USA
- Todd M Lowe
- Department of Biomolecular Engineering, Baskin School of Engineering, University of California, Santa Cruz, CA 95064, USA

34
Radecki P, Uppuluri R, Aviran S. Rapid structure-function insights via hairpin-centric analysis of big RNA structure probing datasets. NAR Genom Bioinform 2021; 3:lqab073. [PMID: 34447931 PMCID: PMC8384053 DOI: 10.1093/nargab/lqab073] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/16/2021] [Revised: 07/14/2021] [Accepted: 08/03/2021] [Indexed: 12/23/2022]
Abstract
The functions of RNA are often tied to its structure, hence analyzing structure is of significant interest when studying cellular processes. Recently, large-scale structure probing (SP) studies have enabled assessment of global structure-function relationships via standard data summarizations or local folding. Here, we approach structure quantification from a hairpin-centric perspective where putative hairpins are identified in SP datasets and used as a means to capture local structural effects. This has the advantage of rapid processing of big (e.g. transcriptome-wide) data as RNA folding is circumvented, yet it captures more information than simple data summarizations. We reformulate a statistical learning algorithm we previously developed to significantly improve precision of hairpin detection, then introduce a novel nucleotide-wise measure, termed the hairpin-derived structure level (HDSL), which captures local structuredness by accounting for the presence of likely hairpin elements. Applying HDSL to data from recent studies recapitulates, strengthens and expands on their findings which were obtained by more comprehensive folding algorithms, yet our analyses are orders of magnitude faster. These results demonstrate that hairpin detection is a promising avenue for global and rapid structure-function analysis, furthering our understanding of RNA biology and the principal features which drive biological insights from SP data.
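As a rough illustration of a nucleotide-wise structuredness score derived from detected hairpins, the sketch below gives each position the highest probability of any hairpin covering it and then smooths with a sliding-window mean. This is a hypothetical simplification, not the published HDSL definition; the `(start, end, probability)` hairpin format, the function name and the window size are assumptions.

```python
def hairpin_structure_level(length, hairpins, window=5):
    """Hypothetical per-nucleotide structuredness score from detected hairpins.

    hairpins: iterable of (start, end, probability), 0-based inclusive coordinates.
    Each position first takes the highest probability of any hairpin covering it;
    the result is then smoothed with a centred sliding-window mean (truncated at the ends).
    """
    coverage = [0.0] * length
    for start, end, prob in hairpins:
        for pos in range(max(start, 0), min(end, length - 1) + 1):
            coverage[pos] = max(coverage[pos], prob)
    half = window // 2
    levels = []
    for pos in range(length):
        lo, hi = max(0, pos - half), min(length, pos + half + 1)
        levels.append(sum(coverage[lo:hi]) / (hi - lo))
    return levels


if __name__ == "__main__":
    # Two invented hairpins on a 12-nt stretch; window=1 disables smoothing.
    print(hairpin_structure_level(12, [(1, 4, 0.9), (6, 10, 0.6)], window=1))
```

Because no folding is performed, the cost is linear in transcript length times hairpin count, which is the kind of speed advantage the abstract attributes to the hairpin-centric view.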
Affiliation(s)
- Pierce Radecki
- Biomedical Engineering Department and Genome Center, University of California at Davis, Davis, CA 95616, USA
- Rahul Uppuluri
- Biomedical Engineering Department and Genome Center, University of California at Davis, Davis, CA 95616, USA
- Sharon Aviran
- Biomedical Engineering Department and Genome Center, University of California at Davis, Davis, CA 95616, USA

35
Evolution and Phylogeny of MicroRNAs - Protocols, Pitfalls, and Problems. Methods Mol Biol 2021. [PMID: 34432281 DOI: 10.1007/978-1-0716-1170-8_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/17/2023]
Abstract
MicroRNAs are important regulators in many eukaryotic lineages. Typical miRNAs are about 22 nt long and are processed from precursors that form a characteristic hairpin structure. Once they appear in a genome, miRNAs are among the best-conserved elements in both animal and plant genomes. Functionally, they play a particularly important role in development. In contrast to protein-coding genes, miRNAs frequently emerge de novo. The genomes of animals and plants harbor hundreds of mutually unrelated families of homologous miRNAs that tend to persist throughout evolution. The evolution of the genomic miRNA complement correlates closely with important morphological innovations. In addition, miRNAs have been used as valuable characters in phylogenetic studies. An accurate and comprehensive annotation of miRNAs is required as a basis for understanding their impact on phenotypic evolution. Since experimental data on miRNA expression are limited to relatively few species and are subject to unavoidable ascertainment biases, miRNA sequencing must be complemented by homology-based annotation methods. This chapter reviews state-of-the-art workflows for homology-based miRNA annotation, with an emphasis on their limitations and open problems.

36
High-throughput dissection of the thermodynamic and conformational properties of a ubiquitous class of RNA tertiary contact motifs. Proc Natl Acad Sci U S A 2021; 118:e2109085118. [PMID: 34373334 DOI: 10.1073/pnas.2109085118] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/18/2022]
Abstract
Despite RNA's diverse secondary and tertiary structures and its complex conformational changes, nature uses a limited set of structural "motifs" (helices, junctions, and tertiary contact modules) to build diverse functional RNAs. Thus, in-depth descriptions of a relatively small universe of RNA motifs may lead to predictive models of RNA tertiary conformational landscapes. Motifs may have different properties depending on sequence and secondary structure, giving rise to subclasses that expand the universe of RNA building blocks. Yet we know very little about motif subclasses, given the challenges of mapping conformational properties in high throughput. Previously, we used "RNA on a massively parallel array" (RNA-MaP), a quantitative, high-throughput technique, to study thousands of helices and two-way junctions. Here, we adapt RNA-MaP to study the thermodynamic and conformational properties of tetraloop/tetraloop receptor (TL/TLR) tertiary contact motifs, analyzing 1,493 TLR sequences from different classes. Clustering analyses revealed variability in TL specificity, stability, and conformational behavior. Nevertheless, natural GAAA/11ntR TL/TLRs, while varying in tertiary stability by ∼2.5 kcal/mol, exhibited conserved TL specificity and conformational properties. Thus, RNAs may tune stability without altering the overall structure of these TL/TLRs. Furthermore, their stability correlated with natural frequency, suggesting thermodynamics as the dominant selection pressure. In contrast, other TL/TLRs displayed heterogeneous conformational behavior and do not appear to be under strong thermodynamic selection. Our results build toward a generalizable model of RNA-folding thermodynamics based on the properties of isolated motifs, and our characterized TL/TLR library can be used to engineer RNAs with predictable thermodynamic and conformational behavior.
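A ∼2.5 kcal/mol spread in tertiary stability is easier to interpret as a ratio of equilibrium constants, using ΔΔG = RT ln(K₁/K₂), so K₁/K₂ = exp(ΔΔG/RT). The sketch below assumes 25 °C; the temperature used in the RNA-MaP measurements may differ, which would shift the result somewhat. The function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def equilibrium_fold_change(ddg_kcal_per_mol, temp_celsius=25.0):
    """Ratio of equilibrium constants implied by a free-energy difference: exp(ddG / RT)."""
    rt = R_KCAL * (temp_celsius + 273.15)
    return math.exp(ddg_kcal_per_mol / rt)


if __name__ == "__main__":
    print(round(equilibrium_fold_change(2.5)))  # 68
```

Under this assumption, the natural GAAA/11ntR variants span roughly a 68-fold range in tertiary-contact equilibrium constant while keeping the same specificity and conformation.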

37
Schäffer AA, McVeigh R, Robbertse B, Schoch CL, Johnston A, Underwood BA, Karsch-Mizrachi I, Nawrocki EP. Ribovore: ribosomal RNA sequence analysis for GenBank submissions and database curation. BMC Bioinformatics 2021; 22:400. [PMID: 34384346 PMCID: PMC8359073 DOI: 10.1186/s12859-021-04316-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/12/2021] [Accepted: 08/03/2021] [Indexed: 02/01/2023]
Abstract
Background The DNA sequences encoding ribosomal RNA genes (rRNAs) are commonly used as markers to identify species, including in metagenomic samples that may combine many organismal communities. The 16S small subunit ribosomal RNA (SSU rRNA) gene is typically used to identify bacterial and archaeal species. The nuclear 18S SSU rRNA gene and the 28S large subunit (LSU) rRNA gene have been used as DNA barcodes and for phylogenetic studies in different eukaryotic taxonomic groups. Because of their popularity, the National Center for Biotechnology Information (NCBI) receives a disproportionate number of rRNA sequence submissions and BLAST queries. These sequences vary in quality, length, origin (nuclear, mitochondrial, plastid), and source organism, and can represent any region of the ribosomal cistron. Results To improve the timely verification of quality, origin and locus boundaries, we developed Ribovore, a software package for the analysis of rRNA sequences. The ribotyper and ribosensor programs are used to validate incoming sequences of bacterial and archaeal SSU rRNA. The ribodbmaker program is used to create high-quality datasets of rRNAs from different taxonomic groups. Key algorithmic steps include comparing candidate sequences against rRNA sequence profile hidden Markov models (HMMs) and against covariance models of rRNA sequence and secondary-structure conservation, as well as other tests. Nine freely available blastn rRNA databases created and maintained with Ribovore are used to check incoming GenBank submissions and by the blastn browser interface at NCBI. Since 2018, Ribovore has been used to analyze more than 50 million prokaryotic SSU rRNA sequences submitted to GenBank, and to select at least 10,435 fungal rRNA RefSeq records from type material of 8350 taxa. Conclusion Ribovore combines single-sequence and profile-based methods to improve GenBank processing and analysis of rRNA sequences. 
It is a standalone, portable, and extensible software package for the alignment, classification and validation of rRNA sequences. Researchers planning on submitting SSU rRNA sequences to GenBank are encouraged to download and use Ribovore to analyze their sequences prior to submission to determine which sequences are likely to be automatically accepted into GenBank.
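The validation step described for ribotyper (compare each sequence against a panel of rRNA models, then accept or flag it) can be sketched as a best-model classification followed by simple pass/fail checks. Everything here is hypothetical: the model names, the per-nucleotide score threshold and the top-two margin are illustrative values, not Ribovore's actual criteria, flags or output format.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    best_model: str
    passed: bool
    reasons: list


def classify_rrna(seq_len, model_scores, expected_model,
                  min_bits_per_nt=0.5, min_margin=100.0):
    """Hypothetical pass/fail check against a panel of rRNA model scores.

    model_scores maps model name -> score of this sequence against that model.
    Fails if the best model is not the expected one, if the best score is low
    relative to sequence length, or if the top two models are too close to call.
    """
    ranked = sorted(model_scores.items(), key=lambda kv: kv[1], reverse=True)
    best_model, best_score = ranked[0]
    reasons = []
    if best_model != expected_model:
        reasons.append("unexpected classification")
    if best_score / seq_len < min_bits_per_nt:
        reasons.append("low score")
    if len(ranked) > 1 and best_score - ranked[1][1] < min_margin:
        reasons.append("ambiguous top two models")
    return Verdict(best_model, not reasons, reasons)


if __name__ == "__main__":
    scores = {"SSU.Bacteria": 1500.0, "SSU.Archaea": 900.0}  # invented scores
    print(classify_rrna(1400, scores, expected_model="SSU.Bacteria"))
```

Recording every failed check, rather than stopping at the first, mirrors the goal stated above of telling submitters why a sequence is unlikely to be accepted automatically.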
Affiliation(s)
- Alejandro A Schäffer
- Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Richard McVeigh
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Barbara Robbertse
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Conrad L Schoch
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Anjanette Johnston
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Beverly A Underwood
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Ilene Karsch-Mizrachi
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
- Eric P Nawrocki
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA

38
Fajkus P, Kilar A, Nelson ADL, Holá M, Peška V, Goffová I, Fojtová M, Zachová D, Fulnečková J, Fajkus J. Evolution of plant telomerase RNAs: farther to the past, deeper to the roots. Nucleic Acids Res 2021; 49:7680-7694. [PMID: 34181710 PMCID: PMC8287931 DOI: 10.1093/nar/gkab545] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 04/13/2021] [Revised: 06/01/2021] [Accepted: 06/10/2021] [Indexed: 01/10/2023]
Abstract
The enormous sequence heterogeneity of telomerase RNA (TR) subunits has thus far complicated their characterization in a wider phylogenetic range. Our recent finding that land plant TRs, like known ciliate TRs, are transcribed by RNA polymerase III under the control of the type 3 promoter allowed us to design a novel strategy to characterize TRs in early diverging Viridiplantae taxa, as well as in ciliates and other Diaphoretickes lineages. Starting with the characterization of the upstream sequence element of the type 3 promoter, which is conserved in a number of small nuclear RNAs, and using the expected minimum TR template region as search features, we identified candidate TRs in selected Diaphoretickes genomes. Homologous TRs were then used to build covariance models to identify TRs in more distant species. Transcripts of the identified TRs were confirmed by transcriptomic data, RT-PCR and Northern hybridization. A templating role for one of our candidates was validated in Physcomitrium patens. Analysis of secondary structure demonstrated a deep conservation of motifs (pseudoknot and template boundary element) observed in all published TRs. These results elucidate the evolution of the earliest eukaryotic TRs, linking the common origin of TRs across Diaphoretickes, and underlying evolutionary transitions in telomere repeats.
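The first search step described above, combining a predicted template region with an upstream promoter element, can be sketched as a forward-strand scan. This is a hypothetical illustration, not the authors' pipeline: the upstream motif passed in is a placeholder rather than the real USE consensus, the `extra` overhang that defines the minimum template is an assumption, only one strand is scanned, and the study's subsequent covariance-model searches are not shown. The example uses the Arabidopsis-type plant telomere repeat TTTAGGG.

```python
import re

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def candidate_tr_loci(genome, use_motif, telomere_repeat, extra=3, max_distance=300):
    """Return forward-strand start positions of putative TR template regions.

    The template is taken as the reverse complement of one telomere repeat plus
    `extra` nucleotides (an assumed minimum); a hit is kept only if the regex
    `use_motif` (a placeholder for a USE-like promoter element) occurs within
    `max_distance` nt upstream.
    """
    seq = genome.upper()
    template = revcomp(telomere_repeat + telomere_repeat[:extra])
    hits = []
    for match in re.finditer(f"(?={re.escape(template)})", seq):  # overlapping matches
        start = match.start()
        upstream = seq[max(0, start - max_distance):start]
        if re.search(use_motif, upstream):
            hits.append(start)
    return hits


if __name__ == "__main__":
    # Toy sequence: placeholder motif, spacer, then the template for TTTAGGG repeats.
    toy = "GGG" + "ACGTACGT" + "C" * 20 + "AAACCCTAAA" + "GG"
    print(candidate_tr_loci(toy, use_motif="ACGTACGT", telomere_repeat="TTTAGGG"))  # [31]
```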
Affiliation(s)
- Petr Fajkus
- Department of Cell Biology and Radiobiology, Institute of Biophysics of the Czech Academy of Sciences, Brno CZ-61265, Czech Republic
- Mendel Centre for Plant Genomics and Proteomics, CEITEC Masaryk University, Brno CZ-62500, Czech Republic
- Agata Kilar
- Mendel Centre for Plant Genomics and Proteomics, CEITEC Masaryk University, Brno CZ-62500, Czech Republic
- Laboratory of Functional Genomics and Proteomics, NCBR, Faculty of Science, Masaryk University, Brno CZ-61137, Czech Republic
- Marcela Holá
- Institute of Experimental Botany of the Czech Academy of Sciences, Prague CZ-16000, Czech Republic
- Vratislav Peška
- Department of Cell Biology and Radiobiology, Institute of Biophysics of the Czech Academy of Sciences, Brno CZ-61265, Czech Republic
- Ivana Goffová
- Mendel Centre for Plant Genomics and Proteomics, CEITEC Masaryk University, Brno CZ-62500, Czech Republic
- Laboratory of Functional Genomics and Proteomics, NCBR, Faculty of Science, Masaryk University, Brno CZ-61137, Czech Republic
- Miloslava Fojtová
- Mendel Centre for Plant Genomics and Proteomics, CEITEC Masaryk University, Brno CZ-62500, Czech Republic
- Laboratory of Functional Genomics and Proteomics, NCBR, Faculty of Science, Masaryk University, Brno CZ-61137, Czech Republic
- Dagmar Zachová
- Mendel Centre for Plant Genomics and Proteomics, CEITEC Masaryk University, Brno CZ-62500, Czech Republic
- Jana Fulnečková
- Department of Cell Biology and Radiobiology, Institute of Biophysics of the Czech Academy of Sciences, Brno CZ-61265, Czech Republic
- Jiří Fajkus
- Department of Cell Biology and Radiobiology, Institute of Biophysics of the Czech Academy of Sciences, Brno CZ-61265, Czech Republic
- Mendel Centre for Plant Genomics and Proteomics, CEITEC Masaryk University, Brno CZ-62500, Czech Republic
- Laboratory of Functional Genomics and Proteomics, NCBR, Faculty of Science, Masaryk University, Brno CZ-61137, Czech Republic

39
Escamilla-Gutiérrez A, Ribas-Aparicio RM, Córdova-Espinoza MG, Castelán-Vega JA. In silico strategies for modeling RNA aptamers and predicting binding sites of their molecular targets. Nucleosides Nucleotides Nucleic Acids 2021; 40:798-807. [PMID: 34323642 DOI: 10.1080/15257770.2021.1951754] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Indexed: 12/15/2022]
Abstract
RNA aptamers are single-stranded nucleic acids of 20-100 nucleotides with high sensitivity and specificity for particular molecular targets. Aptamers can be produced and selected in vitro using the SELEX method; however, this procedure requires considerable time and cost. In this context, bioinformatics tools play an important role in reducing the time and cost associated with the development and production of aptamers. In this article, we propose bioinformatics strategies for modeling two RNA aptamers, the ATP-binding RNA aptamer and the iSpinach aptamer, and for analyzing their interactions with their molecular targets. For this purpose, the tertiary structures of the aptamers were modeled with two servers (SimRNA and RNAComposer), and the AutoDock Vina and rDock programs were used to dock their respective ligands. The predictions developed with these methods could be used for in silico design of RNA aptamers through a simple and accessible methodology. Supplemental data for this article are available online at https://doi.org/10.1080/15257770.2021.1951754 .
Affiliation(s)
- Alejandro Escamilla-Gutiérrez
- Laboratorio de Producción y Control de Biológicos, Departamento de Microbiología, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Mexico City, Mexico.,Hospital General "Dr. Gaudencio González Garza," Centro Médico Nacional "La Raza," Unidad Médica de Alta Especialidad, Instituto Mexicano del Seguro Social, Mexico City, Mexico
- Rosa María Ribas-Aparicio
- Laboratorio de Producción y Control de Biológicos, Departamento de Microbiología, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Mexico City, Mexico
- María Guadalupe Córdova-Espinoza
- Laboratorio de Bacteriología Médica, Departamento de Microbiología, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Mexico City, Mexico.,Laboratory of Immunology, Escuela Militar de Graduados de Sanidad, Mexico City, Mexico
- Juan Arturo Castelán-Vega
- Laboratorio de Producción y Control de Biológicos, Departamento de Microbiología, Escuela Nacional de Ciencias Biológicas, Instituto Politécnico Nacional, Mexico City, Mexico
40
Megarioti AH, Kouvelis VN. The Coevolution of Fungal Mitochondrial Introns and Their Homing Endonucleases (GIY-YIG and LAGLIDADG). Genome Biol Evol 2021; 12:1337-1354. [PMID: 32585032 PMCID: PMC7487136 DOI: 10.1093/gbe/evaa126] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Accepted: 06/17/2020] [Indexed: 12/21/2022]
Abstract
Fungal mitochondrial (mt) genomes exhibit great diversity in size, which is partially attributed to their variable intergenic regions and, most importantly, to the presence of introns within their genes. These introns belong to group I or group II, and both groups are self-splicing. The majority of them carry genes encoding homing endonucleases of either the LAGLIDADG or the GIY-YIG family. In this study, it was found that these intronic homing endonuclease genes (HEGs) may originate from mt free-standing open reading frames, which can still be found today as “living fossils” in species belonging to the Early Diverging Fungi. A total of 487 HEG-carrying introns, located in the publicly available mt genomes of representative species from orders across all fungal phyla, were analyzed. Their distribution in the mt genes, their insertion target sequences, and phylogenetic analyses of the HEGs showed that these introns and their HEGs form a composite structure in which both selfish elements coevolved. The invasion of the ancestral free-standing HEGs into the introns occurred through a perpetual mechanism, termed in this study the “aenaon” hypothesis, which is based on recombination, transposition, and horizontal gene transfer events throughout evolution. HEGs clustered phylogenetically primarily according to their intron hosts and secondarily according to the mt genes carrying the introns and their HEGs. The resulting evolutionary models revealed an “intron-early” evolution that was enriched by “intron-late” events through many independent recombination events resulting from both vertical and horizontal gene transfer.
Affiliation(s)
- Amalia H Megarioti
- Department of Genetics and Biotechnology, Faculty of Biology, National and Kapodistrian University of Athens, Greece
- Vassili N Kouvelis
- Department of Genetics and Biotechnology, Faculty of Biology, National and Kapodistrian University of Athens, Greece
41
Abstract
Alignments of discrete objects can be constructed in a very general setting as super-objects from which the constituent objects are recovered by means of projections. Here, we focus on contact maps, i.e. undirected graphs with an ordered set of vertices, which serve as natural discretizations of RNA and protein structures. In the general case, the alignment problem for vertex-ordered graphs is NP-complete. In the special case of RNA secondary structures, i.e. crossing-free matchings, however, the alignments have a recursive structure, and the alignment problem can then be solved in polynomial time by a variant of the Sankoff algorithm. Moreover, tree or forest alignments of RNA secondary structures can be understood as alignments of ordered edge sets.
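A minimal illustrative sketch (not taken from the paper) of the special case named above: a contact map on positions 0..n-1 is an RNA secondary structure exactly when its arcs form a matching (each position takes part in at most one contact) and no two arcs cross. Both conditions can be checked with one left-to-right stack scan, which reflects the same nesting property that gives the alignment its recursive structure. The function names `is_secondary_structure` and `to_dot_bracket` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Arc = Tuple[int, int]


def is_secondary_structure(arcs: Sequence[Arc], n: int) -> bool:
    """Return True if `arcs` on positions 0..n-1 form a crossing-free matching."""
    partner: List[Optional[int]] = [None] * n
    for a, b in arcs:
        i, j = min(a, b), max(a, b)
        # Matching condition: no self-loops, each position paired at most once.
        if i == j or partner[i] is not None or partner[j] is not None:
            return False
        partner[i], partner[j] = j, i
    # Crossing-free condition: arcs must close in last-opened, first-closed order.
    stack: List[int] = []
    for pos in range(n):
        p = partner[pos]
        if p is None:
            continue
        if p > pos:
            stack.append(pos)  # an arc opens here
        elif not stack or stack[-1] != p:
            return False  # closes an arc that is not the innermost one: crossing
        else:
            stack.pop()
    return True


def to_dot_bracket(arcs: Sequence[Arc], n: int) -> str:
    """Render a crossing-free matching in dot-bracket notation."""
    s = ["."] * n
    for a, b in arcs:
        s[min(a, b)], s[max(a, b)] = "(", ")"
    return "".join(s)


print(is_secondary_structure([(0, 5), (1, 4)], 8), to_dot_bracket([(0, 5), (1, 4)], 8))
print(is_secondary_structure([(0, 5), (2, 7)], 8))  # arcs interleave: 0 < 2 < 5 < 7
```

The first call prints `True ((..))..` and the second prints `False`, because the arcs (0, 5) and (2, 7) interleave and therefore describe a general vertex-ordered graph (a pseudoknot) rather than a secondary structure.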
Affiliation(s)
- Peter F Stadler
- Bioinformatics Group, Department of Computer Science and Interdisciplinary Centre for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107 Leipzig, Germany.,German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Competence Centre for Scalable Data Services and Solutions Dresden-Leipzig, Leipzig Research Centre for Civilization Diseases, and Centre for Biotechnology and Biomedicine at Leipzig University, Universität Leipzig, Leipzig, Germany.,Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, 04103 Leipzig, Germany.,Institute for Theoretical Chemistry, University of Vienna, Währingerstrasse 17, 1090 Wien, Austria.,Facultad de Ciencias, Universidad Nacional de Colombia, Bogotá, Colombia.,Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA
42
Liu B, Thippabhotla S, Zhang J, Zhong C. DRAGoM: Classification and Quantification of Noncoding RNA in Metagenomic Data. Front Genet 2021; 12:669495. [PMID: 34025724 PMCID: PMC8131839 DOI: 10.3389/fgene.2021.669495] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/18/2021] [Accepted: 03/23/2021] [Indexed: 12/21/2022]
Abstract
Noncoding RNAs (ncRNAs) play important regulatory and functional roles in microorganisms, such as regulation of gene expression, signaling, protein synthesis, and RNA processing. Hence, their classification and quantification are central to understanding the function of the microbial community. However, most current metagenomic sequencing technologies generate short reads, which may cover only part of an ncRNA's secondary structure and thus complicate ncRNA homology detection. Meanwhile, de novo assembly of metagenomic sequencing data remains challenging for complex communities. To tackle these challenges, we developed a novel algorithm called DRAGoM (Detection of RNA using Assembly Graph from Metagenomic data). DRAGoM first constructs a hybrid graph by merging an assembly string graph and an assembly de Bruijn graph. It then classifies paths in the hybrid graph, and their constituent reads, into different ncRNA families based on both sequence and structural homology. Our benchmark experiments show that DRAGoM can improve performance and robustness over traditional approaches in the classification and quantification of a wide class of ncRNA families.
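For readers unfamiliar with the assembly graphs mentioned above, the following is a generic sketch of a de Bruijn graph built from short reads: nodes are (k-1)-mers, and every k-mer in a read contributes an edge from its prefix to its suffix. It is not DRAGoM's implementation; DRAGoM additionally merges such a graph with a string graph and classifies paths by sequence and structural homology, none of which is shown here. The function name, the toy reads, and the choice of k are illustrative assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def de_bruijn_graph(reads: Iterable[str], k: int) -> Dict[str, Counter]:
    """Map each (k-1)-mer to a Counter of successor (k-1)-mers.

    Edge counts record how often each k-mer was observed across reads,
    a crude coverage signal of the kind abundance estimates can build on.
    Reads shorter than k contribute no edges.
    """
    graph: Dict[str, Counter] = defaultdict(Counter)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]][kmer[1:]] += 1
    return dict(graph)


reads = ["ACGUA", "CGUAC", "GUACG"]  # hypothetical overlapping short reads
for node, successors in sorted(de_bruijn_graph(reads, k=3).items()):
    print(node, dict(successors))
```

Overlapping reads collapse onto shared nodes, so fragments that each cover only part of a structured RNA can still be connected through the graph; here the three reads yield the cycle AC → CG → GU → UA → AC.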
Affiliation(s)
- Ben Liu
- Department of Electrical Engineering and Computer Science, The University of Kansas, Lawrence, KS, United States
- Sirisha Thippabhotla
- Department of Electrical Engineering and Computer Science, The University of Kansas, Lawrence, KS, United States
- Jun Zhang
- Division of Medical Oncology, Department of Internal Medicine, University of Kansas Medical Center, Kansas City, KS, United States.,Department of Cancer Biology, University of Kansas Medical Center, Kansas City, KS, United States
- Cuncong Zhong
- Department of Electrical Engineering and Computer Science, The University of Kansas, Lawrence, KS, United States.,Bioengineering Program, The University of Kansas, Lawrence, KS, United States.,Center for Computational Biology, The University of Kansas, Lawrence, KS, United States
43
Dyrka W, Gąsior-Głogowska M, Szefczyk M, Szulc N. Searching for universal model of amyloid signaling motifs using probabilistic context-free grammars. BMC Bioinformatics 2021; 22:222. [PMID: 33926372 PMCID: PMC8086366 DOI: 10.1186/s12859-021-04139-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/09/2020] [Accepted: 04/19/2021] [Indexed: 11/16/2022]
Abstract
Background: Amyloid signaling motifs are a class of protein motifs which share basic structural and functional features despite the lack of clear sequence homology. They are hard to detect in large sequence databases either with the alignment-based profile methods (due to short length and diversity) or with generic amyloid- and prion-finding tools (due to insufficient discriminative power). We propose to address the challenge with a machine learning grammatical model capable of generalizing over diverse collections of unaligned yet related motifs.
Results: First, we introduce and test improvements to our probabilistic context-free grammar framework for protein sequences that allow for inferring more sophisticated models achieving high sensitivity at low false positive rates. Then, we infer universal grammars for a collection of recently identified bacterial amyloid signaling motifs and demonstrate that the method is capable of generalizing by successfully searching for related motifs in fungi. The results are compared to available alternative methods. Finally, we conduct spectroscopy and staining analyses of selected peptides to verify their structural and functional relationship.
Conclusions: While the profile HMMs remain the method of choice for modeling homologous sets of sequences, PCFGs seem more suitable for building meta-family descriptors and extrapolating beyond the seed sample.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12859-021-04139-y.
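The kind of model discussed here can be illustrated with the inside (probabilistic CYK) algorithm, which sums the probabilities of all parse trees of a sequence under a PCFG in Chomsky normal form. The toy grammar below is a hypothetical example, not one inferred in the study; it generates the strings a^n b^n, whose nested a...b pairing a finite-state model such as a profile HMM cannot represent for unbounded n.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy PCFG in Chomsky normal form generating a^n b^n (n >= 1).
# Binary rules: lhs -> (left, right, probability); probabilities per lhs sum to 1.
BINARY: Dict[str, List[Tuple[str, str, float]]] = {
    "S": [("A", "X", 0.5), ("A", "B", 0.5)],
    "X": [("S", "B", 1.0)],
}
# Lexical rules: lhs -> (terminal, probability).
LEXICAL: Dict[str, List[Tuple[str, float]]] = {
    "A": [("a", 1.0)],
    "B": [("b", 1.0)],
}


def inside_probability(seq: str, start: str = "S") -> float:
    """P(start =>* seq), summed over all parse trees, in O(n^3 * |rules|) time."""
    n = len(seq)
    if n == 0:
        return 0.0
    # inside[(i, j, A)] = probability that nonterminal A derives seq[i..j].
    inside: Dict[Tuple[int, int, str], float] = defaultdict(float)
    for i, sym in enumerate(seq):
        for lhs, rules in LEXICAL.items():
            for terminal, p in rules:
                if terminal == sym:
                    inside[(i, i, lhs)] += p
    # Fill longer spans from shorter ones; children always have shorter spans.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for lhs, rules in BINARY.items():
                for left, right, p in rules:
                    for k in range(i, j):  # split point between the two children
                        inside[(i, j, lhs)] += (
                            p * inside[(i, k, left)] * inside[(k + 1, j, right)]
                        )
    return inside[(0, n - 1, start)]


for s in ["ab", "aabb", "aab"]:
    print(s, inside_probability(s))  # 0.5, 0.25, 0.0
```

A grammar inferred from real motifs would have many more nonterminals and rules, but scoring a candidate sequence during a database search follows this same dynamic-programming pattern.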
Affiliation(s)
- Witold Dyrka
- Wydział Podstawowych Problemów Techniki, Katedra Inżynierii Biomedycznej, Politechnika Wrocławska, Wrocław, Poland.
- Marlena Gąsior-Głogowska
- Wydział Podstawowych Problemów Techniki, Katedra Inżynierii Biomedycznej, Politechnika Wrocławska, Wrocław, Poland
- Monika Szefczyk
- Wydział Chemiczny, Katedra Chemii Bioorganicznej, Politechnika Wrocławska, Wrocław, Poland
- Natalia Szulc
- Wydział Podstawowych Problemów Techniki, Katedra Inżynierii Biomedycznej, Politechnika Wrocławska, Wrocław, Poland
44
Conserved long-range base pairings are associated with pre-mRNA processing of human genes. Nat Commun 2021; 12:2300. [PMID: 33863890 PMCID: PMC8052449 DOI: 10.1038/s41467-021-22549-7] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 05/13/2020] [Accepted: 03/20/2021] [Indexed: 02/07/2023]
Abstract
The ability of nucleic acids to form double-stranded structures is essential for all living systems on Earth. Current knowledge of functional RNA structures is focused on locally occurring base pairs. However, crosslinking and proximity ligation experiments have demonstrated that long-range RNA structures are highly abundant. Here, we present the most complete catalog to date of pairs of conserved complementary regions (PCCRs) in human protein-coding genes. PCCRs tend to occur within introns, suppress intervening exons, and obstruct cryptic and inactive splice sites. The double-stranded structure of PCCRs is supported by decreased icSHAPE nucleotide accessibility, a high abundance of RNA editing sites, and the frequent occurrence of forked eCLIP peaks. Introns with PCCRs show a distinct splicing pattern in response to RNAPII slowdown, suggesting that splicing is widely affected by co-transcriptional RNA folding. The enrichment of 3'-ends within PCCRs raises the intriguing hypothesis that coupling between RNA folding and splicing could mediate co-transcriptional suppression of premature pre-mRNA cleavage and polyadenylation.
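As a naive illustration of what "complementary regions" means computationally, the sketch below scans two RNA segments (for example, two introns of the same gene) for windows of length k that can form a perfect antiparallel duplex, allowing Watson-Crick and G-U wobble pairs. This is a toy under stated assumptions, not the authors' pipeline: their catalog also relied on evolutionary conservation, which this brute-force scan ignores. The function name, the example sequences, and the value of k are hypothetical.

```python
from typing import List, Tuple

# Allowed base pairs: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def complementary_windows(left: str, right: str, k: int) -> List[Tuple[int, int]]:
    """Return (i, j) such that left[i:i+k] pairs antiparallel with right[j:j+k].

    Antiparallel pairing means left[i + t] pairs with right[j + k - 1 - t].
    Runs in O(len(left) * len(right) * k): fine for a toy, far too slow genome-wide.
    """
    hits = []
    for i in range(len(left) - k + 1):
        for j in range(len(right) - k + 1):
            if all((left[i + t], right[j + k - 1 - t]) in PAIRS for t in range(k)):
                hits.append((i, j))
    return hits


print(complementary_windows("UUGACUAA", "CCAGUCGG", k=4))  # [(0, 4), (1, 3), (2, 2)]
```

The three overlapping hits are not independent: they are the 4-nt windows of one 6-bp duplex between left[0:6] and right[2:8] (including two G-U pairs), so a practical scan would merge such diagonal runs into a single candidate region.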
45
Velandia-Huerto CA, Fallmann J, Stadler PF. miRNAture-Computational Detection of microRNA Candidates. Genes (Basel) 2021; 12:348. [PMID: 33673400 PMCID: PMC7996739 DOI: 10.3390/genes12030348] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/01/2021] [Revised: 02/19/2021] [Accepted: 02/20/2021] [Indexed: 12/16/2022]
Abstract
Homology-based annotation of short RNAs, including microRNAs, is a difficult problem because their inherently small size limits the available information. Highly sensitive methods, including blast, nhmmer, or cmsearch runs with parameters optimized to increase sensitivity, inevitably lead to large numbers of false positives, which can be detected only by detailed analysis of features typical of an RNA family and/or analysis of conservation patterns in structure-annotated multiple sequence alignments. The miRNAture pipeline implements a workflow specific to animal microRNAs that automates the homology search and validation steps. The miRNAture pipeline yields very good results for a large number of "typical" miRBase families. However, it also highlights difficulties with atypical cases, in particular microRNAs derived from repetitive elements and microRNAs with unusual, branched precursor structures and atypical locations of the mature product, which require specific curation by domain experts.
Affiliation(s)
- Cristian A. Velandia-Huerto
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, D-04107 Leipzig, Germany
- Jörg Fallmann
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, D-04107 Leipzig, Germany
- Peter F. Stadler
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, D-04107 Leipzig, Germany
- Max Planck Institute for Mathematics in the Sciences, D-04103 Leipzig, Germany
- Institute for Theoretical Chemistry, University of Vienna, A-1090 Wien, Austria
- Facultad de Ciencias, Universidad Nacional de Colombia, CO-111321 Bogotá, Colombia
- Santa Fe Institute, Santa Fe, NM 87501, USA
46
Zhu H, Li J, Li Y, Zheng Z, Guan H, Wang H, Tao K, Liu J, Wang Y, Zhang W, Li C, Li J, Jia L, Bai W, Hu D. Glucocorticoid counteracts cellular mechanoresponses by LINC01569-dependent glucocorticoid receptor-mediated mRNA decay. Sci Adv 2021; 7:eabd9923. [PMID: 33627425 PMCID: PMC7904261 DOI: 10.1126/sciadv.abd9923] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 07/30/2020] [Accepted: 01/08/2021] [Indexed: 05/05/2023]
Abstract
Mechanical stimuli on cells and mechanotransduction are essential in many biological and pathological processes. Glucocorticoid is an important hormone whose roles and mechanisms in cellular mechanotransduction remain unknown. Here, we report that glucocorticoid counteracted cellular mechanoresponses in a manner dependent on a novel long noncoding RNA (lncRNA), LINC01569. Further, LINC01569 mediated glucocorticoid effects on mechanotransduction by destabilizing the messenger RNAs (mRNAs) of mechanosensors, including early growth response protein 1 (EGR1), Cbp/P300-interacting transactivator 2 (CITED2), and bone morphogenetic protein 7 (BMP7), through glucocorticoid receptor-mediated mRNA decay (GMD). Mechanistically, LINC01569 directly bound to the GMD factor Y-box-binding protein 1 (YBX1). The LINC01569-YBX1 complex was then guided to the mRNAs of EGR1, CITED2, and BMP7 through specific LINC01569-mRNA interactions, thereby contributing to the successful assembly of the GMD complex and triggering GMD. Our results uncover roles of glucocorticoid in cellular mechanotransduction and a novel lncRNA-dependent GMD machinery, and provide a potential strategy for early intervention in mechanical disorder-associated diseases.
Affiliation(s)
- Huayu Zhu
- Department of Burns and Cutaneous Surgery, Xijing Hospital, Fourth Military Medical University, Xi'an, Shaanxi 710032, China
- Jun Li
- Department of Burns and Cutaneous Surgery, Xijing Hospital, Fourth Military Medical University, Xi'an, Shaanxi 710032, China
- Yize Li
- Department of Clinical Oncology, Xijing Hospital, Fourth Military Medical University, Xi'an, Shaanxi 710032, China
- Zhao Zheng
- Department of Burns and Cutaneous Surgery, Xijing Hospital, Fourth Military Medical University, Xi'an, Shaanxi 710032, China
- Hao Guan
- Department of Burns and Cutaneous Surgery, Xijing Hospital, Fourth Military Medical University, Xi'an, Shaanxi 710032, China
- Hongtao Wang
- Department of Burns and Cutaneous Surgery, Xijing Hospital, Fourth Military Medical University, Xi'an, Shaanxi 710032, China
- Ke Tao
- Department of Burns and Cutaneous Surgery, Xijing Hospital, Fourth Military Medical University, Xi'an, Shaanxi 710032, China
- Jiaqi Liu
- Department of Burns and Cutaneous Surgery, Xijing Hospital, Fourth Military Medical University, Xi'an, Shaanxi 710032, China
- Yunchuan Wang
- Department of Burns and Cutaneous Surgery, Xijing Hospital, Fourth Military Medical University, Xi'an, Shaanxi 710032, China
- Wanfu Zhang
- Department of Burns and Cutaneous Surgery, Xijing Hospital, Fourth Military Medical University, Xi'an, Shaanxi 710032, China
- Chao Li
- Department of Burns and Cutaneous Surgery, Xijing Hospital, Fourth Military Medical University, Xi'an, Shaanxi 710032, China
- Jie Li
- Department of Burns and Cutaneous Surgery, Xijing Hospital, Fourth Military Medical University, Xi'an, Shaanxi 710032, China
- Lintao Jia
- State Key Laboratory of Cancer Biology, Department of Biochemistry and Molecular Biology, Fourth Military Medical University, Xi'an, Shaanxi 710032, China.
- Wendong Bai
- Department of Endocrinology, Xijing Hospital, Fourth Military Medical University, Xi'an, Shaanxi 710032, China.
- Department of Clinical Laboratory Center, Xinjiang Command General Hospital of Chinese People's Liberation Army, Urumqi, Xinjiang 830000, China
- Dahai Hu
- Department of Burns and Cutaneous Surgery, Xijing Hospital, Fourth Military Medical University, Xi'an, Shaanxi 710032, China.
47
Sellés Vidal L, Ayala R, Stan GB, Ledesma-Amaro R. rfaRm: An R client-side interface to facilitate the analysis of the Rfam database of RNA families. PLoS One 2021; 16:e0245280. [PMID: 33449976 PMCID: PMC7810343 DOI: 10.1371/journal.pone.0245280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2020] [Accepted: 12/27/2020] [Indexed: 11/19/2022]
Abstract
rfaRm is an R package providing a client-side interface for the Rfam database of non-coding RNA and other structured RNA elements. The package facilitates the search of the Rfam database by keywords or sequences, as well as the retrieval of all available information about specific Rfam families, such as member sequences, multiple sequence alignments, secondary structures and covariance models. By providing such programmatic access to the Rfam database, rfaRm enables genomic workflows to incorporate information about non-coding RNA, whose potential cannot be fully exploited just through interactive access to the database. The features of rfaRm are demonstrated by using it to analyze the SARS-CoV-2 genome as an example case.
Affiliation(s)
- Lara Sellés Vidal
- Department of Bioengineering, Faculty of Engineering, Imperial College London, London, United Kingdom
- Rafael Ayala
- Department of Infectious Disease, Faculty of Medicine, Imperial College London, London, United Kingdom
- Guy-Bart Stan
- Department of Bioengineering, Faculty of Engineering, Imperial College London, London, United Kingdom
- Rodrigo Ledesma-Amaro
- Department of Bioengineering, Faculty of Engineering, Imperial College London, London, United Kingdom
48
Wilburn GW, Eddy SR. Remote homology search with hidden Potts models. PLoS Comput Biol 2020; 16:e1008085. [PMID: 33253143 PMCID: PMC7728182 DOI: 10.1371/journal.pcbi.1008085] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 06/17/2020] [Revised: 12/10/2020] [Accepted: 10/27/2020] [Indexed: 12/03/2022]
Abstract
Most methods for biological sequence homology search and alignment work with primary sequence alone, neglecting higher-order correlations. Recently, statistical physics models called Potts models have been used to infer all-by-all pairwise correlations between sites in deep multiple sequence alignments, and these pairwise couplings have improved 3D structure predictions. Here we extend the use of Potts models from structure prediction to sequence alignment and homology search by developing what we call a hidden Potts model (HPM), which merges a Potts emission process with a generative probability model of insertion and deletion. Because an HPM is incompatible with efficient dynamic programming alignment algorithms, we develop an approximate algorithm based on importance sampling, using simpler probabilistic models as proposal distributions. We test an HPM implementation on RNA structure homology search benchmarks, where we can compare directly to exact alignment methods that capture nested RNA base-pairing correlations (stochastic context-free grammars). HPMs perform promisingly in these proof-of-principle experiments.
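To make the Potts model concrete: under one common sign convention (an assumption here, since conventions vary), a sequence s of length L has energy E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j) and probability proportional to exp(-E(s)), so the couplings J score pairs of sites jointly, which is exactly the higher-order correlation that primary-sequence methods neglect. The sketch below evaluates this for a hypothetical two-site model whose coupling rewards Watson-Crick pairs. Normalizing by enumerating all |alphabet|^L sequences is feasible only at toy lengths, and the sketch omits the insertion/deletion process that distinguishes a hidden Potts model.

```python
import math
from itertools import product
from typing import Dict, List, Sequence, Tuple

Fields = List[Dict[str, float]]  # h[i][a]
Couplings = Dict[Tuple[int, int], Dict[Tuple[str, str], float]]  # J[(i, j)][(a, b)]


def potts_energy(seq: Sequence[str], h: Fields, J: Couplings) -> float:
    """E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j); missing entries count as 0."""
    energy = 0.0
    for i, a in enumerate(seq):
        energy -= h[i].get(a, 0.0)
    for (i, j), table in J.items():
        energy -= table.get((seq[i], seq[j]), 0.0)
    return energy


def potts_probability(seq: Sequence[str], h: Fields, J: Couplings,
                      alphabet: str = "ACGU") -> float:
    """Exact P(s) = exp(-E(s)) / Z by brute-force enumeration (toy lengths only)."""
    z = sum(math.exp(-potts_energy(s, h, J))
            for s in product(alphabet, repeat=len(seq)))
    return math.exp(-potts_energy(seq, h, J)) / z


# Hypothetical two-site model: no fields, coupling of 1.0 for Watson-Crick pairs.
h: Fields = [{}, {}]
J: Couplings = {(0, 1): {p: 1.0 for p in [("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")]}}
print(potts_energy("GC", h, J), potts_energy("GA", h, J))  # -1.0 0.0
print(potts_probability("GC", h, J), potts_probability("GA", h, J))
```

Because the fields are zero, neither site alone prefers any nucleotide, yet "GC" is about e times more probable than "GA"; a model without the J term would assign both the same score.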
Affiliation(s)
- Grey W. Wilburn
- Department of Physics, Harvard University, Cambridge, Massachusetts, United States of America
- Sean R. Eddy
- Howard Hughes Medical Institute, Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, United States of America
- John A Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, United States of America
49
Discoveries of Exoribonuclease-Resistant Structures of Insect-Specific Flaviviruses Isolated in Zambia. Viruses 2020; 12:1017. [PMID: 32933075 PMCID: PMC7551683 DOI: 10.3390/v12091017] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/15/2020] [Revised: 09/08/2020] [Accepted: 09/08/2020] [Indexed: 12/13/2022]
Abstract
To monitor arthropod-borne virus transmission in mosquitoes, we attempted to detect and isolate viruses from 3304 wild-caught female mosquitoes in the Livingstone (Southern Province) and Mongu (Western Province) regions of Zambia in 2017. A pan-flavivirus RT-PCR assay was performed to identify flavivirus genomes in total RNA extracted from mosquito lysates, followed by virus isolation and full-genome sequence analysis using next-generation sequencing and rapid amplification of cDNA ends. From Culex spp. mosquitoes, we isolated a newly identified Barkedji virus (BJV Zambia) (10,899 nt) and a novel flavivirus, tentatively termed Barkedji-like virus (BJLV) (10,885 nt), which shared 96% and 75% nucleotide identity, respectively, with BJV isolated in Israel. These viruses could replicate in C6/36 cells but not in mammalian or avian cell lines. In parallel, a comparative genomics screening was conducted to study the evolutionary traits of the 5'- and 3'-untranslated regions (UTRs) of the isolated viruses. Bioinformatic analyses of the secondary structures in the UTRs of both viruses revealed that the 5'-UTRs exhibit canonical stem-loop structures, while the 3'-UTRs contain structural homologs of exoribonuclease-resistant RNAs (xrRNAs), SL-III, dumbbell, and terminal stem-loop (3'SL) structures. The ability of the predicted xrRNA structures to stop RNA degradation by the Xrn1 exoribonuclease was further confirmed by an in vitro Xrn1 resistance assay.
50
RNA-centric approaches to study RNA-protein interactions in vitro and in silico. Methods 2020; 178:11-18. [DOI: 10.1016/j.ymeth.2019.09.011] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 03/19/2019] [Revised: 09/10/2019] [Accepted: 09/10/2019] [Indexed: 01/17/2023]