1
Zhao Y, Oono K, Takizawa H, Kotera M. GenerRNA: A generative pre-trained language model for de novo RNA design. PLoS One 2024; 19:e0310814. [PMID: 39352899 PMCID: PMC11444397 DOI: 10.1371/journal.pone.0310814] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2024] [Accepted: 09/08/2024] [Indexed: 10/04/2024] Open
Abstract
The design of RNA plays a crucial role in developing RNA vaccines, nucleic acid therapeutics, and innovative biotechnological tools. However, existing techniques frequently lack versatility across various tasks and are dependent on pre-defined secondary structure or other prior knowledge. To address these limitations, we introduce GenerRNA, a Transformer-based model inspired by the success of large language models (LLMs) in protein and molecule generation. GenerRNA is pre-trained on large-scale RNA sequences and capable of generating novel RNA sequences with stable secondary structures, while ensuring distinctiveness from existing sequences, thereby expanding our exploration of the RNA space. Moreover, GenerRNA can be fine-tuned on smaller, specialized datasets for specific subtasks, enabling the generation of RNAs with desired functionalities or properties without requiring any prior knowledge input. As a demonstration, we fine-tuned GenerRNA and successfully generated novel RNA sequences exhibiting high affinity for target proteins. Our work is the first application of a generative language model to RNA generation, presenting an innovative approach to RNA design.
2
Cao X, Zhang Y, Ding Y, Wan Y. Identification of RNA structures and their roles in RNA functions. Nat Rev Mol Cell Biol 2024; 25:784-801. [PMID: 38926530 DOI: 10.1038/s41580-024-00748-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/28/2024] [Indexed: 06/28/2024]
Abstract
The development of high-throughput RNA structure profiling methods in the past decade has greatly facilitated our ability to map and characterize different aspects of RNA structures transcriptome-wide in cell populations, single cells and single molecules. The resulting high-resolution data have provided insights into the static and dynamic nature of RNA structures, revealing their complexity as they perform their respective functions in the cell. In this Review, we discuss recent technical advances in the determination of RNA structures, and the roles of RNA structures in RNA biogenesis and functions, including in transcription, processing, translation, degradation, localization and RNA structure-dependent condensates. We also discuss the current understanding of how RNA structures could guide drug design for treating genetic diseases and battling pathogenic viruses, and highlight existing challenges and future directions in RNA structure research.
Affiliation(s)
- Xinang Cao
  - Stem Cell and Regenerative Biology, Genome Institute of Singapore, Singapore, Singapore
- Yueying Zhang
  - Department of Cell and Developmental Biology, John Innes Centre, Norwich, UK
- Yiliang Ding
  - Department of Cell and Developmental Biology, John Innes Centre, Norwich, UK
- Yue Wan
  - Stem Cell and Regenerative Biology, Genome Institute of Singapore, Singapore, Singapore
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
3
Boon WX, Sia BZ, Ng CH. Prediction of the effects of the top 10 synonymous mutations from 26645 SARS-CoV-2 genomes of early pandemic phase. F1000Res 2024; 10:1053. [PMID: 39268187 PMCID: PMC11391198 DOI: 10.12688/f1000research.72896.3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/11/2024] [Indexed: 09/15/2024] Open
Abstract
Background: The emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has led to a global pandemic since December 2019. SARS-CoV-2 is a single-stranded RNA virus, which mutates at a high rate. Many studies have examined nonsynonymous mutations, which change protein sequences. However, few have examined the effects of SARS-CoV-2 synonymous mutations, which may affect viral fitness. This study aims to predict the effect of synonymous mutations on the SARS-CoV-2 genome. Methods: A total of 26645 SARS-CoV-2 genomic sequences retrieved from the Global Initiative on Sharing All Influenza Data (GISAID) database were aligned using MAFFT. Then, the mutations and their respective frequencies were identified. Multiple RNA secondary structure prediction tools, namely RNAfold, IPknot++ and MXfold2, were applied to predict the effect of the mutations on RNA secondary structure, and their base pair probabilities were estimated using MutaRNA. Relative synonymous codon usage (RSCU) analysis was also performed to measure the codon usage bias (CUB) of SARS-CoV-2. Results: A total of 150 synonymous mutations were identified. The synonymous mutation with the highest frequency is the C3037U mutation in nsp3 of ORF1a. Of the 10 highest-frequency synonymous mutations, C913U, C3037U, U16176C and C18877U show pronounced changes between wild type and mutant in all 3 RNA secondary structure prediction tools, suggesting these mutations may have some biological impact on viral fitness. These four mutations also show changes in base pair probabilities. All mutations except U16176C change the codon to a more preferred codon, which may result in higher translation efficiency. Conclusion: Synonymous mutations in the SARS-CoV-2 genome may affect RNA secondary structure, changing base pair probabilities and possibly resulting in a higher translation rate. However, lab experiments are required to validate the results obtained from prediction analysis.
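The RSCU analysis mentioned in this abstract can be illustrated with a minimal sketch. It uses the standard definition (a codon's observed count divided by the count expected if all synonymous codons for that amino acid were used equally). The codon table below is a deliberately tiny, hypothetical subset, and the function name `rscu` is an assumption for illustration, not the authors' code.

```python
from collections import Counter, defaultdict

# Toy subset of the standard genetic code (RNA alphabet), for illustration only.
# A real analysis would cover all sense codons.
CODON_TO_AA = {
    "GCU": "Ala", "GCC": "Ala", "GCA": "Ala", "GCG": "Ala",
    "AAA": "Lys", "AAG": "Lys",
}


def rscu(sequence):
    """Return RSCU per codon: observed count / (family total / family size)."""
    usable = len(sequence) - len(sequence) % 3
    codons = [sequence[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TO_AA)

    # Group synonymous codons by the amino acid they encode.
    families = defaultdict(list)
    for codon, amino_acid in CODON_TO_AA.items():
        families[amino_acid].append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from the sequence; RSCU undefined
        expected = total / len(family)
        for codon in family:
            values[codon] = counts[codon] / expected
    return values


# Hypothetical coding sequence: GCU GCU GCC AAA AAG AAG
example = rscu("GCUGCUGCCAAAAAGAAG")
```

An RSCU above 1 marks a codon used more often than expected under uniform synonymous usage, which is the sense in which a mutation can move a site toward a "more preferred" codon.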
Affiliation(s)
- Wan Xin Boon
  - Faculty of Information Science and Technology, Multimedia University, Bukit Beruang, Melaka, 75450, Malaysia
- Boon Zhan Sia
  - Faculty of Information Science and Technology, Multimedia University, Bukit Beruang, Melaka, 75450, Malaysia
- Chong Han Ng
  - Faculty of Information Science and Technology, Multimedia University, Bukit Beruang, Melaka, 75450, Malaysia
4
Creux C, Zehraoui F, Radvanyi F, Tahi F. Comparison and benchmark of deep learning methods for non-coding RNA classification. PLoS Comput Biol 2024; 20:e1012446. [PMID: 39264986 PMCID: PMC11421803 DOI: 10.1371/journal.pcbi.1012446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2024] [Revised: 09/24/2024] [Accepted: 08/30/2024] [Indexed: 09/14/2024] Open
Abstract
The involvement of non-coding RNAs in biological processes and diseases has made the exploration of their functions crucial. Most non-coding RNAs have yet to be studied, creating the need for methods that can rapidly classify large sets of non-coding RNAs into functional groups, or classes. In recent years, the success of deep learning in various domains led to its application to non-coding RNA classification. Multiple novel architectures have been developed, but these advancements are not covered by current literature reviews. We present an exhaustive comparison of the different methods proposed in the state-of-the-art and describe their associated datasets. Moreover, the literature lacks objective benchmarks. We perform experiments to fairly evaluate the performance of various tools for non-coding RNA classification on popular datasets. The robustness of methods to non-functional sequences and sequence boundary noise is explored. We also measure computation time and CO2 emissions. With regard to these results, we assess the relevance of the different architectural choices and provide recommendations to consider in future methods.
Affiliation(s)
- Constance Creux
  - Université Paris-Saclay, Univ Evry, IBISC, Evry-Courcouronnes, France
  - Molecular Oncology, PSL Research University, CNRS, UMR, Institut Curie, Paris, France
- Farida Zehraoui
  - Université Paris-Saclay, Univ Evry, IBISC, Evry-Courcouronnes, France
- François Radvanyi
  - Molecular Oncology, PSL Research University, CNRS, UMR, Institut Curie, Paris, France
- Fariza Tahi
  - Université Paris-Saclay, Univ Evry, IBISC, Evry-Courcouronnes, France
5
Allan MF, Aruda J, Plung JS, Grote SL, des Taillades YJM, de Lajarte AA, Bathe M, Rouskin S. Discovery and Quantification of Long-Range RNA Base Pairs in Coronavirus Genomes with SEARCH-MaP and SEISMIC-RNA. Research Square 2024:rs.3.rs-4814547. [PMID: 39149495 PMCID: PMC11326378 DOI: 10.21203/rs.3.rs-4814547/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/17/2024]
Abstract
RNA molecules perform a diversity of essential functions for which their linear sequences must fold into higher-order structures. Techniques including crystallography and cryogenic electron microscopy have revealed 3D structures of ribosomal, transfer, and other well-structured RNAs; while chemical probing with sequencing facilitates secondary structure modeling of any RNAs of interest, even within cells. Ongoing efforts continue increasing the accuracy, resolution, and ability to distinguish coexisting alternative structures. However, no method can discover and quantify alternative structures with base pairs spanning arbitrarily long distances - an obstacle for studying viral, messenger, and long noncoding RNAs, which may form long-range base pairs. Here, we introduce the method of Structure Ensemble Ablation by Reverse Complement Hybridization with Mutational Profiling (SEARCH-MaP) and software for Structure Ensemble Inference by Sequencing, Mutation Identification, and Clustering of RNA (SEISMIC-RNA). We use SEARCH-MaP and SEISMIC-RNA to discover that the frameshift stimulating element of SARS coronavirus 2 base-pairs with another element 1 kilobase downstream in nearly half of RNA molecules, and that this structure competes with a pseudoknot that stimulates ribosomal frameshifting. Moreover, we identify long-range base pairs involving the frameshift stimulating element in other coronaviruses including SARS coronavirus 1 and transmissible gastroenteritis virus, and model the full genomic secondary structure of the latter. These findings suggest that long-range base pairs are common in coronaviruses and may regulate ribosomal frameshifting, which is essential for viral RNA synthesis. We anticipate that SEARCH-MaP will enable solving many RNA structure ensembles that have eluded characterization, thereby enhancing our general understanding of RNA structures and their functions. 
SEISMIC-RNA, software for analyzing mutational profiling data at any scale, could power future studies on RNA structure and is available on GitHub and the Python Package Index.
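The "Reverse Complement Hybridization" in the method's name can be pictured with a short sketch. It assumes, for illustration only, that the hybridizing oligonucleotide is the plain Watson-Crick reverse complement of the targeted region, written in the RNA alphabet. The abstract does not describe probe chemistry or design, so the function names, coordinates and toy sequence here are hypothetical.

```python
# Watson-Crick complements only; G-U wobble pairs are deliberately ignored.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def antisense_for_region(rna, start, end):
    """Antisense sequence for rna[start:end] (0-based, end-exclusive).

    Hypothetical helper: hybridizing this sequence to the region would
    prevent the region from pairing with other parts of the molecule.
    """
    return reverse_complement(rna[start:end])


toy_rna = "AUGGCUACGUUAGC"  # made-up sequence, not from any genome
probe = antisense_for_region(toy_rna, 0, 6)
```

Comparing mutational profiles with and without such a probe is what would reveal which distant positions lose their pairing partner. That comparison is the part handled by the experiment and software described above, and it is not sketched here.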
Affiliation(s)
- Matthew F. Allan
  - Department of Microbiology, Harvard Medical School, Boston, Massachusetts, USA 02115
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA 02139
  - Computational and Systems Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA 02139
- Justin Aruda
  - Department of Microbiology, Harvard Medical School, Boston, Massachusetts, USA 02115
  - Harvard Program in Biological and Biomedical Sciences, Division of Medical Sciences, Harvard Medical School, Boston, MA, USA 02115
- Jesse S. Plung
  - Department of Microbiology, Harvard Medical School, Boston, Massachusetts, USA 02115
  - Harvard Program in Virology, Division of Medical Sciences, Harvard Medical School, Boston, MA, USA 02115
- Scott L. Grote
  - Department of Microbiology, Harvard Medical School, Boston, Massachusetts, USA 02115
- Albéric A. de Lajarte
  - Department of Microbiology, Harvard Medical School, Boston, Massachusetts, USA 02115
- Mark Bathe
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA 02139
- Silvi Rouskin
  - Department of Microbiology, Harvard Medical School, Boston, Massachusetts, USA 02115
6
Allan MF, Aruda J, Plung JS, Grote SL, Martin des Taillades YJ, de Lajarte AA, Bathe M, Rouskin S. Discovery and Quantification of Long-Range RNA Base Pairs in Coronavirus Genomes with SEARCH-MaP and SEISMIC-RNA. bioRxiv 2024:2024.04.29.591762. [PMID: 38746332 PMCID: PMC11092567 DOI: 10.1101/2024.04.29.591762] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
7
Paremskaia AI, Kogan AA, Murashkina A, Naumova DA, Satish A, Abramov IS, Feoktistova SG, Mityaeva ON, Deviatkin AA, Volchkov PY. Codon-optimization in gene therapy: promises, prospects and challenges. Front Bioeng Biotechnol 2024; 12:1371596. [PMID: 38605988 PMCID: PMC11007035 DOI: 10.3389/fbioe.2024.1371596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2024] [Accepted: 03/19/2024] [Indexed: 04/13/2024] Open
Abstract
Codon optimization has evolved to enhance protein expression efficiency by exploiting the genetic code's redundancy, allowing for multiple codon options for a single amino acid. Initially observed in E. coli, optimal codon usage correlates with high gene expression, which has propelled applications expanding from basic research to biopharmaceuticals and vaccine development. The method is especially valuable for adjusting immune responses in gene therapies and has the potential to create tissue-specific therapies. However, challenges persist, such as the risk of unintended effects on protein function and the complexity of evaluating optimization effectiveness. Despite these issues, codon optimization is crucial in advancing gene therapeutics. This study provides a comprehensive review of the current metrics for codon-optimization, and its practical usage in research and clinical applications, in the context of gene therapy.
Affiliation(s)
- Anastasiia Iu Paremskaia
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, Moscow, Russia
- Anna A. Kogan
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, Moscow, Russia
- Anastasiia Murashkina
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, Moscow, Russia
- Daria A. Naumova
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, Moscow, Russia
- Anakha Satish
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, Moscow, Russia
- Ivan S. Abramov
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, Moscow, Russia
  - The MCSC named after A. S. Loginov, Moscow, Russia
- Sofya G. Feoktistova
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, Moscow, Russia
- Olga N. Mityaeva
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, Moscow, Russia
- Andrei A. Deviatkin
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, Moscow, Russia
- Pavel Yu Volchkov
  - Federal Research Center for Innovator and Emerging Biomedical and Pharmaceutical Technologies, Moscow, Russia
  - The MCSC named after A. S. Loginov, Moscow, Russia
8
Krishnan SR, Roy A, Gromiha MM. Reliable method for predicting the binding affinity of RNA-small molecule interactions using machine learning. Brief Bioinform 2024; 25:bbae002. [PMID: 38261341 PMCID: PMC10805179 DOI: 10.1093/bib/bbae002] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/24/2023] [Revised: 12/21/2023] [Accepted: 12/24/2023] [Indexed: 01/24/2024] Open
Abstract
Ribonucleic acids (RNAs) play important roles in cellular regulation. Consequently, dysregulation of both coding and non-coding RNAs has been implicated in several disease conditions in the human body. In this regard, a growing interest has been observed to probe into the potential of RNAs to act as drug targets in disease conditions. To accelerate this search for disease-associated novel RNA targets and their small molecular inhibitors, machine learning models for binding affinity prediction were developed specific to six RNA subtypes namely, aptamers, miRNAs, repeats, ribosomal RNAs, riboswitches and viral RNAs. We found that differences in RNA sequence composition, flexibility and polar nature of RNA-binding ligands are important for predicting the binding affinity. Our method showed an average Pearson correlation (r) of 0.83 and a mean absolute error of 0.66 upon evaluation using the jack-knife test, indicating their reliability despite the low amount of data available for several RNA subtypes. Further, the models were validated with external blind test datasets, which outperform other existing quantitative structure-activity relationship (QSAR) models. We have developed a web server to host the models, RNA-Small molecule binding Affinity Predictor, which is freely available at: https://web.iitm.ac.in/bioinfo2/RSAPred/.
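The two reported metrics, Pearson correlation and mean absolute error under a jack-knife (leave-one-out) test, can be sketched as follows. This is a generic illustration, not the authors' pipeline: the one-feature least-squares model, the function names and the toy numbers are all assumptions standing in for the actual descriptors and regression models.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def jackknife_predict(xs, ys):
    """Predict each point from a model trained on all the other points."""
    predictions = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        predictions.append(slope * xs[i] + intercept)
    return predictions


# Hypothetical descriptor values and binding affinities (made-up numbers).
feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
affinity = [5.1, 5.9, 7.2, 7.8, 9.1, 9.9]
held_out = jackknife_predict(feature, affinity)
r = pearson_r(affinity, held_out)
mae = mean_absolute_error(affinity, held_out)
```

Because every prediction comes from a model that never saw the held-out point, the procedure uses all available data for training while still giving an out-of-sample estimate, which is why it suits the small per-subtype datasets the abstract mentions.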
Affiliation(s)
- Sowmya R Krishnan
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
  - TCS Research (Life Sciences division), Tata Consultancy Services, Hyderabad 500081, India
- Arijit Roy
  - TCS Research (Life Sciences division), Tata Consultancy Services, Hyderabad 500081, India
- M Michael Gromiha
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
  - International Research Frontiers Initiative, School of Computing, Tokyo Institute of Technology, Yokohama 226-8501, Japan
  - Department of Computer Science, National University of Singapore, Singapore 117543
9
Jamialahmadi H, Khalili-Tanha G, Nazari E, Rezaei-Tavirani M. Artificial intelligence and bioinformatics: a journey from traditional techniques to smart approaches. Gastroenterology and Hepatology from Bed to Bench 2024; 17:241-252. [PMID: 39308539 PMCID: PMC11413381 DOI: 10.22037/ghfbb.v17i3.2977] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2024] [Accepted: 05/11/2024] [Indexed: 09/25/2024]
Abstract
The incorporation of AI models into bioinformatics has brought about a revolutionary era in the analysis and interpretation of biological data. This mini-review offers a succinct overview of the indispensable role AI plays in the convergence of computational techniques and biological research. The search strategy followed PRISMA guidelines, encompassing databases such as PubMed, Embase, and Google Scholar to include studies published between 2018 and 2024, utilizing specific keywords. We explored the diverse applications of AI methodologies, including machine learning (ML), deep learning (DL), and natural language processing (NLP), across various domains of bioinformatics. These domains encompass genome sequencing, protein structure prediction, drug discovery, systems biology, personalized medicine, imaging, signal processing, and text mining. AI algorithms have exhibited remarkable efficacy in tackling intricate biological challenges, spanning from genome sequencing to protein structure prediction, and from drug discovery to personalized medicine. In conclusion, this study scrutinizes the evolving landscape of AI-driven tools and algorithms, emphasizing their pivotal role in expediting research, facilitating data interpretation, and catalyzing innovations in biomedical sciences.
Affiliation(s)
- Hamid Jamialahmadi
  - Department of Medical Genetics and Molecular Medicine, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
  - These authors equally contributed to this study as the first authors.
- Ghazaleh Khalili-Tanha
  - Department of Medical Genetics and Molecular Medicine, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
  - These authors equally contributed to this study as the first authors.
- Elham Nazari
  - Proteomics Research Center, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mostafa Rezaei-Tavirani
  - Proteomics Research Center, Faculty of Paramedical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
10
Liu N, Dong W, Yang H, Li JH, Chiu TY. Application of artificial scaffold systems in microbial metabolic engineering. Front Bioeng Biotechnol 2023; 11:1328141. [PMID: 38188488 PMCID: PMC10771841 DOI: 10.3389/fbioe.2023.1328141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Accepted: 12/12/2023] [Indexed: 01/09/2024] Open
Abstract
In nature, metabolic pathways are often organized into complex structures such as multienzyme complexes, enzyme molecular scaffolds, or reaction microcompartments. These structures help facilitate multi-step metabolic reactions. However, engineered metabolic pathways in microbial cell factories do not possess inherent metabolic regulatory mechanisms, which can result in metabolic imbalance. Taking inspiration from nature, scientists have successfully developed synthetic scaffolds to enhance the performance of engineered metabolic pathways in microbial cell factories. By recruiting enzymes, synthetic scaffolds facilitate the formation of multi-enzyme complexes, leading to the modulation of enzyme spatial distribution, increased enzyme activity, and a reduction in the loss of intermediate products and the toxicity associated with harmful intermediates within cells. In recent years, scaffolds based on proteins, nucleic acids, and various organelles have been developed and employed to facilitate multiple metabolic pathways. Despite varying degrees of success, synthetic scaffolds still encounter numerous challenges. The objective of this review is to provide a comprehensive introduction to these synthetic scaffolds and discuss their latest research advancements and challenges.
Affiliation(s)
- Nana Liu
  - College of Pharmaceutical Science, Zhejiang University of Technology, Hangzhou, China
  - HIM-BGI Omics Center, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences (CAS), Hangzhou, China
- Wei Dong
  - HIM-BGI Omics Center, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences (CAS), Hangzhou, China
- Huanming Yang
  - HIM-BGI Omics Center, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences (CAS), Hangzhou, China
- Jing-Hua Li
  - College of Pharmaceutical Science, Zhejiang University of Technology, Hangzhou, China
- Tsan-Yu Chiu
  - College of Pharmaceutical Science, Zhejiang University of Technology, Hangzhou, China
  - HIM-BGI Omics Center, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences (CAS), Hangzhou, China
11
Agarwal R, T RR, Smith JC. Comparative Assessment of Pose Prediction Accuracy in RNA-Ligand Docking. J Chem Inf Model 2023; 63:7444-7452. [PMID: 37972310 DOI: 10.1021/acs.jcim.3c01533] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 11/19/2023]
Abstract
Structure-based virtual high-throughput screening is used in early-stage drug discovery. Over the years, docking protocols and scoring functions for protein-ligand complexes have evolved to improve the accuracy in the computation of binding strengths and poses. In the past decade, RNA has also emerged as a target class for new small-molecule drugs. However, most ligand docking programs have been validated and tested for proteins and not RNA. Here, we test the docking power (pose prediction accuracy) of three state-of-the-art docking protocols on 173 RNA-small molecule crystal structures. The programs are AutoDock4 (AD4) and AutoDock Vina (Vina), which were designed for protein targets, and rDock, which was designed for both protein and nucleic acid targets. AD4 performed relatively poorly. For RNA targets for which a crystal structure of a bound ligand used to limit the docking search space is available and for which the goal is to identify new molecules for the same pocket, rDock performs slightly better than Vina, with success rates of 48% and 63%, respectively. However, in the more common type of early-stage drug discovery setting, in which no structure of a ligand-target complex is known and for which a larger search space is defined, rDock performed similarly to Vina, with a low success rate of ∼27%. Vina was found to have bias for ligands with certain physicochemical properties, whereas rDock performs similarly for all ligand properties. Thus, for projects where no ligand-protein structure already exists, Vina and rDock are both applicable. However, the relatively poor performance of all methods relative to protein-target docking illustrates a need for further methods refinement.
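The abstract reports pose-prediction success rates without stating the success criterion in this listing. A common convention in docking benchmarks counts a pose as successful when its root-mean-square deviation (RMSD) from the crystal pose is at most 2 Å. The sketch below assumes that convention, with matched atom ordering and no symmetry correction. The 2.0 Å default, the function names and the coordinates are illustrative assumptions, not values taken from the paper.

```python
import math


def rmsd(pose_a, pose_b):
    """RMSD between two poses given as equal-length lists of (x, y, z) atoms.

    Assumes atoms are listed in the same order in both poses and that the
    poses share one coordinate frame (no superposition is performed).
    """
    if not pose_a or len(pose_a) != len(pose_b):
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(pose_a, pose_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(pose_a))


def success_rate(pairs, threshold=2.0):
    """Fraction of (predicted, crystal) pose pairs with RMSD <= threshold (in Å)."""
    hits = sum(1 for predicted, crystal in pairs if rmsd(predicted, crystal) <= threshold)
    return hits / len(pairs)


# Made-up two-atom ligands: one near-native prediction, one far off.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
near = [(0.3, 0.0, 0.0), (1.8, 0.0, 0.0)]
far = [(6.0, 0.0, 0.0), (7.5, 0.0, 0.0)]
rate = success_rate([(near, crystal), (far, crystal)])
```

Under this definition the success rate depends directly on the chosen threshold and on which pose is scored (for example, only the top-ranked one), so percentages from different benchmarks are comparable only when those choices match.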
Affiliation(s)
- Rupesh Agarwal
  - UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831-6309, United States
  - Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, Tennessee 37996-1939, United States
- Rajitha Rajeshwar T
  - UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831-6309, United States
  - Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, Tennessee 37996-1939, United States
- Jeremy C Smith
  - UT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831-6309, United States
  - Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, Tennessee 37996-1939, United States
12
Tieng FYF, Abdullah-Zawawi MR, Md Shahri NAA, Mohamed-Hussein ZA, Lee LH, Mutalib NSA. A Hitchhiker's guide to RNA-RNA structure and interaction prediction tools. Brief Bioinform 2023; 25:bbad421. [PMID: 38040490 PMCID: PMC10753535 DOI: 10.1093/bib/bbad421] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/31/2023] [Revised: 10/16/2023] [Accepted: 10/26/2023] [Indexed: 12/03/2023] Open
Abstract
RNA biology has risen to prominence after the remarkable discovery of diverse functions of noncoding RNA (ncRNA). Most untranslated transcripts exert their regulatory functions by forming RNA-RNA complexes via base pairing with complementary sequences in other RNAs. The interplay between RNAs is essential, as it serves various functional roles in human cells, including genetic translation, RNA splicing, editing, ribosomal RNA maturation, RNA degradation and the regulation of metabolic pathways/riboswitches. Moreover, the pervasive transcription of the human genome allows for the discovery of novel genomic functions via RNA interactome investigation. The advancement of experimental procedures has resulted in an explosion of documented data, necessitating the development of efficient and precise computational tools and algorithms. This review provides an extensive update on RNA-RNA interaction (RRI) analysis via thermodynamic- and comparative-based RNA secondary structure prediction (RSP) and RNA-RNA interaction prediction (RIP) tools and their general functions. We also highlight the current knowledge of RRIs and the limitations of RNA interactome mapping via experimental data. Then, the gap between RSP and RIP, the importance of RNA homologues, the relationship between pseudoknots, and RNA folding thermodynamics are discussed. It is hoped that these emerging prediction tools will deepen the understanding of RNA-associated interactions in human diseases and hasten treatment processes.
Affiliation(s)
- Francis Yew Fu Tieng
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia (UKM), Kuala Lumpur 56000, Malaysia
- Nur Alyaa Afifah Md Shahri
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia (UKM), Kuala Lumpur 56000, Malaysia
- Zeti-Azura Mohamed-Hussein
  - Institute of Systems Biology (INBIOSIS), UKM, Selangor 43600, Malaysia
  - Department of Applied Physics, Faculty of Science and Technology, UKM, Selangor 43600, Malaysia
- Learn-Han Lee
  - Sunway Microbiomics Centre, School of Medical and Life Sciences, Sunway University, Sunway City 47500, Malaysia
  - Novel Bacteria and Drug Discovery Research Group, Microbiome and Bioresource Research Strength, Jeffrey Cheah School of Medicine and Health Sciences, Monash University of Malaysia, Selangor 47500, Malaysia
- Nurul-Syakima Ab Mutalib
  - UKM Medical Molecular Biology Institute (UMBI), Universiti Kebangsaan Malaysia (UKM), Kuala Lumpur 56000, Malaysia
  - Novel Bacteria and Drug Discovery Research Group, Microbiome and Bioresource Research Strength, Jeffrey Cheah School of Medicine and Health Sciences, Monash University of Malaysia, Selangor 47500, Malaysia
  - Faculty of Health Sciences, UKM, Kuala Lumpur 50300, Malaysia
|
13
|
Wang Y, Zhang H, Xu Z, Zhang S, Guo R. TransUFold: Unlocking the structural complexity of short and long RNA with pseudoknots. Math Biosci Eng 2023; 20:19320-19340. [PMID: 38052602 DOI: 10.3934/mbe.2023854] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/07/2023]
Abstract
RNA secondary structure is a blueprint for understanding RNA function and 3D structure, which makes it a crucial foundation for RNA research. However, pseudoknots cannot be accurately predicted by conventional prediction methods based on free energy minimization, which results in a performance bottleneck. To this end, we propose a deep learning-based method called TransUFold that is trained directly on RNA data annotated with structure information. It employs an encoder-decoder network architecture, named Vision Transformer, to extract long-range interactions in RNA sequences and utilizes convolutions with lateral connections to supplement short-range interactions. A post-processing program then constrains the model's output to produce realistic and effective RNA secondary structures, including pseudoknots. After training on benchmark datasets, TransUFold outperforms other methods on test data from the same family. Additionally, it achieves better results on longer sequences of up to 1600 nt, demonstrating the strong performance of Vision Transformer in extracting long-range interactions in RNA sequences. Finally, our analysis indicates that TransUFold produces effective pseudoknot structures in long sequences. As more high-quality RNA structures become available, deep learning-based prediction methods like Vision Transformer can exhibit better performance.
Affiliation(s)
- Yunxiang Wang
  - School of Cyber Security and Computer, Hebei University, Baoding, Hebei, China
- Hong Zhang
  - School of Cyber Security and Computer, Hebei University, Baoding, Hebei, China
- Zhenchao Xu
  - School of Cyber Security and Computer, Hebei University, Baoding, Hebei, China
- Shouhua Zhang
  - Information Technology and Electrical Engineering, University of Oulu, Oulu, Finland
- Rui Guo
  - College of Life Sciences, Institute of Life Science and Green Development, Hebei University, Baoding, China
|
14
|
Hara K, Iwano N, Fukunaga T, Hamada M. DeepRaccess: high-speed RNA accessibility prediction using deep learning. Front Bioinform 2023; 3:1275787. [PMID: 37881622 PMCID: PMC10597636 DOI: 10.3389/fbinf.2023.1275787] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Accepted: 09/29/2023] [Indexed: 10/27/2023] Open
Abstract
RNA accessibility is a useful RNA secondary structural feature for predicting RNA-RNA interactions and translation efficiency in prokaryotes. However, conventional accessibility calculation tools, such as Raccess, are computationally expensive and require considerable time to perform transcriptome-scale analysis. In this study, we developed DeepRaccess, which predicts RNA accessibility using deep learning. DeepRaccess was trained to take artificial RNA sequences as input and to predict the accessibility of these sequences as calculated by Raccess. Simulation and empirical dataset analyses showed that the accessibility predicted by DeepRaccess was highly correlated with the accessibility calculated by Raccess. In addition, we confirmed that DeepRaccess could predict protein abundance in E. coli with moderate accuracy from the sequences around the start codon. We also demonstrated that DeepRaccess achieved a speed-up of tens to hundreds of times in a GPU environment. The source code and the trained models of DeepRaccess are freely available at https://github.com/hmdlab/DeepRaccess.
Affiliation(s)
- Kaisei Hara
  - Department of Electrical Engineering and Bioscience, Graduate School of Advanced Science and Engineering, Waseda University, Tokyo, Japan
  - Computational Bio Big-Data Open Innovation Laboratory, AIST-Waseda University, Tokyo, Japan
- Natsuki Iwano
  - Department of Electrical Engineering and Bioscience, Graduate School of Advanced Science and Engineering, Waseda University, Tokyo, Japan
- Tsukasa Fukunaga
  - Waseda Institute for Advanced Study, Waseda University, Tokyo, Japan
- Michiaki Hamada
  - Department of Electrical Engineering and Bioscience, Graduate School of Advanced Science and Engineering, Waseda University, Tokyo, Japan
  - Computational Bio Big-Data Open Innovation Laboratory, AIST-Waseda University, Tokyo, Japan
  - Graduate School of Medicine, Nippon Medical School, Tokyo, Japan
|
15
|
Zarnack K, Eyras E. 'Artificial intelligence and machine learning in RNA biology'. Brief Bioinform 2023; 24:bbad415. [PMID: 37965807 PMCID: PMC10646484 DOI: 10.1093/bib/bbad415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Accepted: 10/26/2023] [Indexed: 11/16/2023] Open
Affiliation(s)
- Kathi Zarnack
  - Buchmann Institute for Molecular Life Sciences (BMLS), Goethe University Frankfurt, Max-von-Laue-Str. 15, 60438 Frankfurt a.M., Germany
  - Institute of Molecular Biosciences, Goethe University Frankfurt, Max-von-Laue-Str. 15, 60438 Frankfurt a.M., Germany
- Eduardo Eyras
  - EMBL Australia Partner Laboratory Network at the Australian National University, Canberra, Australia
  - The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, Australia
  - The Centre for Computational Biomedical Sciences, The John Curtin School of Medical Research, Australian National University, Canberra, Australia
|