1
Wang J, Fan Y, Hong L, Hu Z, Li Y. Deep learning for RNA structure prediction. Curr Opin Struct Biol 2025; 91:102991. [PMID: 39933218 DOI: 10.1016/j.sbi.2025.102991] [Received: 09/02/2024] [Revised: 11/27/2024] [Accepted: 01/04/2025] [Indexed: 02/13/2025]
Abstract
Predicting RNA structures from sequences with computational approaches is of vital importance in RNA biology considering the high costs of experimental determination. AI methods have revolutionized this field in recent years, enabling RNA structure prediction with increasingly higher accuracy and efficiency. With an increase in the number of models proposed for this task, this review presents a timely summary of the applications of AI, particularly deep learning, in RNA structure prediction, highlighting their methodology advances as well as the challenges and opportunities for further work in this field.
Affiliation(s)
- Jiuming Wang
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
- Yimin Fan
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
- Liang Hong
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
- Zhihang Hu
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
- Yu Li
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China.
2
Chaturvedi M, Rashid MA, Paliwal KK. RNA structure prediction using deep learning - A comprehensive review. Comput Biol Med 2025; 188:109845. [PMID: 39983363 DOI: 10.1016/j.compbiomed.2025.109845] [Received: 07/25/2024] [Revised: 02/09/2025] [Accepted: 02/10/2025] [Indexed: 02/23/2025]
Abstract
In computational biology, accurate RNA structure prediction offers several benefits, including facilitating a better understanding of RNA functions and RNA-based drug design. Implementing deep learning techniques for RNA structure prediction has led to tremendous progress in this field, resulting in significant improvements in prediction accuracy. This comprehensive review aims to provide an overview of the diverse strategies employed in predicting RNA secondary structures, emphasizing deep learning methods. The article categorizes the discussion into three main dimensions: feature extraction methods, existing state-of-the-art learning model architectures, and prediction approaches. We present a comparative analysis of various techniques and models highlighting their strengths and weaknesses. Finally, we identify gaps in the literature, discuss current challenges, and suggest future approaches to enhance model performance and applicability in RNA structure prediction tasks. This review provides a deeper insight into the subject and paves the way for further progress in this dynamic intersection of life sciences and artificial intelligence.
Affiliation(s)
- Mayank Chaturvedi
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD, 4111, Australia.
- Mahmood A Rashid
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD, 4111, Australia.
- Kuldip K Paliwal
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD, 4111, Australia.
3
Liu X, Wang S, Sun Y, Liao Y, Jiang G, Sun BY, Yu J, Zhao D. Unlocking the potential of circular RNA vaccines: a bioinformatics and computational biology perspective. EBioMedicine 2025; 114:105638. [PMID: 40112741 DOI: 10.1016/j.ebiom.2025.105638] [Received: 10/22/2024] [Revised: 02/23/2025] [Accepted: 02/24/2025] [Indexed: 03/22/2025] Open
Abstract
Bioinformatics has significantly advanced RNA-based therapeutics, particularly circular RNAs (circRNAs), which outperform mRNA vaccines by offering superior stability, sustained expression, and enhanced immunogenicity due to their covalently closed structure. This review highlights how bioinformatics and computational biology optimise circRNA vaccine design, elucidating internal ribosome entry site (IRES) selection, open reading frame (ORF) optimisation, codon usage, RNA secondary structure prediction, and delivery system development. While circRNA vaccines may not always surpass traditional vaccines in stability, their production efficiency and therapeutic efficacy can be enhanced through computational strategies. The discussion also addresses challenges and future prospects, emphasizing the need for innovative solutions to overcome current limitations and advance circRNA vaccine applications.
Affiliation(s)
- Xuyuan Liu
- Department of Biomedical Informatics, School of Basic Medical Sciences, State Key Laboratory of Vascular Homeostasis and Remodeling, Peking University, Beijing 100191, China
- Siqi Wang
- Department of Biomedical Informatics, School of Basic Medical Sciences, State Key Laboratory of Vascular Homeostasis and Remodeling, Peking University, Beijing 100191, China
- Yunan Sun
- Department of Biomedical Informatics, School of Basic Medical Sciences, State Key Laboratory of Vascular Homeostasis and Remodeling, Peking University, Beijing 100191, China
- Yunxi Liao
- Department of Biomedical Informatics, School of Basic Medical Sciences, State Key Laboratory of Vascular Homeostasis and Remodeling, Peking University, Beijing 100191, China
- Guangzhen Jiang
- Division of Life Sciences and Medicine, School of Life Sciences, University of Science and Technology of China, Hefei, Anhui 230027, China; Guangzhou National Laboratory, Bio-Island, Guangzhou, Guangdong 510005, China
- Bryan-Yu Sun
- Department of Biomedical Informatics, School of Basic Medical Sciences, State Key Laboratory of Vascular Homeostasis and Remodeling, Peking University, Beijing 100191, China
- Jingyou Yu
- Guangzhou National Laboratory, Bio-Island, Guangzhou, Guangdong 510005, China; State Key Laboratory of Respiratory Disease, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, Guangdong, China.
- Dongyu Zhao
- Department of Biomedical Informatics, School of Basic Medical Sciences, State Key Laboratory of Vascular Homeostasis and Remodeling, Peking University, Beijing 100191, China.
4
Wei B, Ma E, Tang S, Cadang L, Collins V, Gorman S, Chen B, Huang R, Wang J, Ma M, Zhang K. Real-Time Monitoring of Higher-Order Structure of RNAs by Temperature-Course Size Exclusion Chromatography and Microfluidic Modulation Spectroscopy. Anal Chem 2025; 97:5632-5642. [PMID: 40014844 DOI: 10.1021/acs.analchem.4c06343] [Indexed: 03/01/2025]
Abstract
Recently, there has been emerging interest in the characterization of the higher order structure (HOS) of oligonucleotide therapeutics because of its potential impact on function. However, many existing experimental and computational methods face challenges with respect to throughput, cost, and resolution for large ribonucleic acids (RNAs). In this study, we present the use of two orthogonal analytical methods, size-exclusion chromatography (SEC) and microfluidic modulation spectroscopy (MMS), which are used to investigate conformational changes of two 100-mer single guide RNAs (sgRNAs) with complex HOS and aggregation species profiles. SEC, coupled with multiangle light scattering (MALS), mass spectrometry (MS), and isothermal MMS revealed various forms of aggregation and potential interactions. We also developed temperature-course SEC and thermal ramping MMS methods to monitor real-time HOS changes from room temperature to the RNA melting point. Through the experiments, we observed two discrete steps of thermally induced dissociation of RNA aggregates, namely higher order aggregates (HOA) dissociation and dimer dissociation. Temperature-course SEC allows for thermodynamic analysis of the enthalpy and entropy of the reaction. We also identified two spectral regions in infrared (IR) spectra with thermal ramping MMS, 1665 cm⁻¹ and between 1700 and 1720 cm⁻¹, which closely correlated to the Watson-Crick base pairing and the related HOS change in RNA. The combination of SEC and MMS offers a comprehensive biophysical characterization toolkit for RNA HOS under native conditions, providing valuable insights for candidate optimization and formulation screening in the development of RNA therapeutics.
Affiliation(s)
- Bingchuan Wei
- Synthetic Molecule Pharmaceutical Science, Genentech Inc., 1 DNA Way, South San Francisco, California 94080, United States
- Eugene Ma
- RedShift BioAnalytics, Inc., 80 Central Street, Boxborough, Massachusetts 01719, United States
- Shijia Tang
- Synthetic Molecule Pharmaceutical Science, Genentech Inc., 1 DNA Way, South San Francisco, California 94080, United States
- Lance Cadang
- Synthetic Molecule Pharmaceutical Science, Genentech Inc., 1 DNA Way, South San Francisco, California 94080, United States
- Valerie Collins
- RedShift BioAnalytics, Inc., 80 Central Street, Boxborough, Massachusetts 01719, United States
- Scott Gorman
- RedShift BioAnalytics, Inc., 80 Central Street, Boxborough, Massachusetts 01719, United States
- Bifan Chen
- Synthetic Molecule Pharmaceutical Science, Genentech Inc., 1 DNA Way, South San Francisco, California 94080, United States
- Richard Huang
- RedShift BioAnalytics, Inc., 80 Central Street, Boxborough, Massachusetts 01719, United States
- Jenny Wang
- Synthetic Molecule Pharmaceutical Science, Genentech Inc., 1 DNA Way, South San Francisco, California 94080, United States
- Maria Ma
- RedShift BioAnalytics, Inc., 80 Central Street, Boxborough, Massachusetts 01719, United States
- Kelly Zhang
- Synthetic Molecule Pharmaceutical Science, Genentech Inc., 1 DNA Way, South San Francisco, California 94080, United States
5
Fu J, Li H, Kang Y, Zhu H, Huang T, Li Z. DRFormer: A Benchmark Model for RNA Sequence Downstream Tasks. Genes (Basel) 2025; 16:284. [PMID: 40149436 PMCID: PMC11942477 DOI: 10.3390/genes16030284] [Received: 02/04/2025] [Revised: 02/24/2025] [Accepted: 02/25/2025] [Indexed: 03/29/2025] Open
Abstract
Background/Objectives: RNA research is critical for understanding gene regulation, disease mechanisms, and therapeutic development. Constructing effective RNA benchmark models for accurate downstream analysis has become a significant research challenge. The objective of this study is to propose a robust benchmark model, DRFormer, for RNA sequence downstream tasks. Methods: The DRFormer model utilizes RNA sequences to construct novel vision features based on secondary structure and sequence distance. These features are pre-trained using the SWIN model to develop a SWIN-RNA submodel. This submodel is then integrated with an RNA sequence model to construct a multimodal model for downstream analysis. Results: We conducted experiments on various RNA downstream tasks. In the sequence classification task, the MCC reached 94.4%, surpassing the state-of-the-art RNAErnie model by 1.2%. In the protein-RNA interaction prediction, DRFormer achieved an MCC of 0.492, outperforming advanced models like BERT-RBP and PrismNet. In RNA secondary structure prediction, the F1 score was 0.690, exceeding the widely used SPOT-RNA model by 1%. Additionally, generalization experiments on DNA tasks yielded satisfactory results. Conclusions: DRFormer is the first RNA sequence downstream analysis model that leverages structural features to construct a vision model and integrates sequence and vision models in a multimodal manner. This approach yields excellent prediction and analysis results, making it a valuable contribution to RNA research.
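The abstract above reports results as MCC and F1 without defining them. As a reading aid, here is a minimal sketch of the standard binary definitions of both metrics, computed from confusion-matrix counts. How the authors aggregated scores across classes or base pairs is not stated in the abstract, so the function names and the example counts below are illustrative assumptions only.

```python
import math


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Binary F1: harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Binary Matthews correlation coefficient, in [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts, not taken from the paper.
example_f1 = f1_score(tp=8, fp=2, fn=2)
example_mcc = mcc(tp=50, tn=50, fp=0, fn=0)
```

Unlike F1, MCC also uses the true-negative count, which is one reason it is often preferred on imbalanced classification tasks.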
Affiliation(s)
- Jianqi Fu
- School of Information Engineering, Huzhou University, Huzhou 313000, China
- Haohao Li
- College of Science, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Yanlei Kang
- School of Information Engineering, Huzhou University, Huzhou 313000, China
- Hancan Zhu
- School of Mathematics, Physics and Information, Shaoxing University, Shaoxing 312000, China
- Tiren Huang
- College of Science, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Zhong Li
- School of Information Engineering, Huzhou University, Huzhou 313000, China
6
Zilberzwige-Tal S, Altae-Tran H, Kannan S, Wilkinson ME, Vo SCDT, Strebinger D, Edmonds KK, Yao CCJ, Mears KS, Shmakov SA, Makarova KS, Macrae RK, Koonin EV, Zhang F. Reprogrammable RNA-targeting CRISPR systems evolved from RNA toxin-antitoxins. Cell 2025:S0092-8674(25)00103-5. [PMID: 39970912 DOI: 10.1016/j.cell.2025.01.034] [Received: 02/12/2024] [Revised: 12/12/2024] [Accepted: 01/24/2025] [Indexed: 02/21/2025]
Abstract
Despite ongoing efforts to study CRISPR systems, the evolutionary origins giving rise to reprogrammable RNA-guided mechanisms remain poorly understood. Here, we describe an integrated sequence/structure evolutionary tracing approach to identify the ancestors of the RNA-targeting CRISPR-Cas13 system. We find that Cas13 likely evolved from AbiF, which is encoded by an abortive infection-linked gene that is stably associated with a conserved non-coding RNA (ncRNA). We further characterize a miniature Cas13, classified here as Cas13e, which serves as an evolutionary intermediate between AbiF and other known Cas13s. Despite this relationship, we show that their functions substantially differ. Whereas Cas13e is an RNA-guided RNA-targeting system, AbiF is a toxin-antitoxin (TA) system with an RNA antitoxin. We solve the structure of AbiF using cryoelectron microscopy (cryo-EM), revealing basic structural alterations that set Cas13s apart from AbiF. Finally, we map the key structural changes that enabled a non-guided TA system to evolve into an RNA-guided CRISPR system.
Affiliation(s)
- Shai Zilberzwige-Tal
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; McGovern Institute for Brain Research at MIT, Cambridge, MA 02139, USA; Department of Brain and Cognitive Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Han Altae-Tran
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; McGovern Institute for Brain Research at MIT, Cambridge, MA 02139, USA; Department of Brain and Cognitive Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
- Soumya Kannan
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; McGovern Institute for Brain Research at MIT, Cambridge, MA 02139, USA; Department of Brain and Cognitive Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Max E Wilkinson
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; McGovern Institute for Brain Research at MIT, Cambridge, MA 02139, USA; Department of Brain and Cognitive Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Samuel Chau-Duy-Tam Vo
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; McGovern Institute for Brain Research at MIT, Cambridge, MA 02139, USA; Department of Brain and Cognitive Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Daniel Strebinger
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; McGovern Institute for Brain Research at MIT, Cambridge, MA 02139, USA; Department of Brain and Cognitive Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- KeHuan K Edmonds
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; McGovern Institute for Brain Research at MIT, Cambridge, MA 02139, USA; Department of Brain and Cognitive Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Chun-Chen Jerry Yao
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; McGovern Institute for Brain Research at MIT, Cambridge, MA 02139, USA; Department of Brain and Cognitive Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Molecular Cellular Biology, Harvard University, Cambridge, MA, USA
- Kepler S Mears
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; McGovern Institute for Brain Research at MIT, Cambridge, MA 02139, USA; Department of Brain and Cognitive Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Sergey A Shmakov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Kira S Makarova
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Rhiannon K Macrae
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; McGovern Institute for Brain Research at MIT, Cambridge, MA 02139, USA; Department of Brain and Cognitive Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Feng Zhang
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; McGovern Institute for Brain Research at MIT, Cambridge, MA 02139, USA; Department of Brain and Cognitive Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
7
Wang R, Schlick T. How Large is the Universe of RNA-Like Motifs? A Clustering Analysis of RNA Graph Motifs Using Topological Descriptors. arXiv 2025:arXiv:2501.04258v1. [PMID: 39867422 PMCID: PMC11760235] [Indexed: 01/28/2025]
Abstract
Identifying novel and functional RNA structures remains a significant challenge in RNA motif design and is crucial for developing RNA-based therapeutics. Here we introduce a computational topology-based approach with unsupervised machine-learning algorithms to estimate the database size and content of RNA-like graph topologies. Specifically, we apply graph theory enumeration to generate all 110,667 possible 2D dual graphs for vertex numbers ranging from 2 to 9. Among them, only 0.11% (121 dual graphs) correspond to approximately 200,000 known RNA atomic fragments/substructures (collected in 2021) using the RNA-as-Graphs (RAG) mapping method. The remaining 99.89% of the dual graphs may be RNA-like or non-RNA-like. To determine which dual graphs in the 99.89% hypothetical set are more likely to be associated with RNA structures, we apply computational topology descriptors using the Persistent Spectral Graphs (PSG) method to characterize each graph using 19 PSG-based features and use clustering algorithms that partition all possible dual graphs into two clusters. The cluster with the higher percentage of known dual graphs for RNA is defined as the "RNA-like" cluster, while the other is considered as "non-RNA-like". The distance of each dual graph to the center of the RNA-like cluster represents the likelihood of it belonging to RNA structures. From validation, our PSG-based RNA-like cluster includes 97.3% of the 121 known RNA dual graphs, suggesting good performance. Furthermore, 46.017% of the hypothetical RNAs are predicted to be RNA-like. Among the top 15 graphs identified as high-likelihood candidates for novel RNA motifs, 4 were confirmed from the RNA dataset collected in 2022. Significantly, we observe that all the top 15 RNA-like dual graphs can be separated into multiple subgraphs, whereas the top 15 non-RNA-like dual graphs tend not to have any subgraphs (subgraphs preserve pseudoknots and junctions). 
Moreover, a significant topological difference between top RNA-like and non-RNA-like graphs is evident when comparing their topological features (e.g. Betti-0 and Betti-1 numbers). These findings provide valuable insights into the size of the RNA motif universe and RNA design strategies, offering a novel framework for predicting RNA graph topologies and guiding the discovery of novel RNA motifs, perhaps anti-viral therapeutics by subgraph assembly.
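The cluster-labelling procedure described in this abstract can be sketched as follows: partition feature vectors into two clusters, call the cluster with the higher fraction of known RNA graphs "RNA-like", and score each graph by its distance to that cluster's center. This is only an illustration under stated assumptions. The abstract says "clustering algorithms" without naming one, so the plain two-center k-means used here is a stand-in, and the 2-D toy vectors replace the 19 PSG-based features. All names and data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points: List[Sequence[float]]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def assign(points, centers) -> List[int]:
    """Label each point with the index (0 or 1) of its nearest center."""
    return [0 if dist(p, centers[0]) <= dist(p, centers[1]) else 1 for p in points]


def two_means(points, iters: int = 20) -> Tuple[List[int], List[List[float]]]:
    """Two-cluster k-means with a deterministic start (assumed algorithm)."""
    first = list(points[0])
    far = list(max(points, key=lambda p: dist(p, first)))
    centers = [first, far]
    for _ in range(iters):
        labels = assign(points, centers)
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = centroid(members)
    return assign(points, centers), centers


def rna_like_cluster(labels: List[int], known: List[bool]) -> int:
    """Index of the cluster with the higher fraction of known RNA graphs."""
    fractions = []
    for k in (0, 1):
        flags = [kn for lab, kn in zip(labels, known) if lab == k]
        fractions.append(sum(flags) / len(flags) if flags else 0.0)
    return 0 if fractions[0] >= fractions[1] else 1


# Toy 2-D features and "known RNA" flags, invented for this sketch.
features = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
known = [True, True, False, False, False, False]

labels, centers = two_means(features)
rna_like = rna_like_cluster(labels, known)
# Smaller distance to the RNA-like center means a more RNA-like graph.
scores = [dist(p, centers[rna_like]) for p in features]
```

In this toy run the third point is not a known RNA graph but sits close to the RNA-like center, which is the kind of hypothetical graph the abstract flags as a candidate novel motif.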
Affiliation(s)
- Rui Wang
- Simons Center for Computational Physical Chemistry, New York University, New York, NY 10003, USA
- Tamar Schlick
- Simons Center for Computational Physical Chemistry, New York University, New York, NY 10003, USA
- Department of Chemistry, New York University, New York, NY 10003, USA
- Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA
- New York University-East China Normal University Center for Computational Chemistry, New York University Shanghai, Shanghai 200122, China
8
Jin L, Zhou Y, Zhang S, Chen SJ. mRNA vaccine sequence and structure design and optimization: Advances and challenges. J Biol Chem 2025; 301:108015. [PMID: 39608721 PMCID: PMC11728972 DOI: 10.1016/j.jbc.2024.108015] [Received: 09/26/2024] [Revised: 11/13/2024] [Accepted: 11/16/2024] [Indexed: 11/30/2024] Open
Abstract
Messenger RNA (mRNA) vaccines have emerged as a powerful tool against communicable diseases and cancers, as demonstrated by their huge success during the coronavirus disease 2019 (COVID-19) pandemic. Despite the outstanding achievements, mRNA vaccines still face challenges such as stringent storage requirements, insufficient antigen expression, and unexpected immune responses. Since the intrinsic properties of mRNA molecules significantly impact vaccine performance, optimizing mRNA design is crucial in preclinical development. In this review, we outline four key principles for optimal mRNA sequence design: enhancing ribosome loading and translation efficiency through untranslated region (UTR) optimization, improving translation efficiency via codon optimization, increasing structural stability by refining global RNA sequence and extending in-cell lifetime and expression fidelity by adjusting local RNA structures. We also explore recent advancements in computational models for designing and optimizing mRNA vaccine sequences following these principles. By integrating current mRNA knowledge, addressing challenges, and examining advanced computational methods, this review aims to promote the application of computational approaches in mRNA vaccine development and inspire novel solutions to existing obstacles.
Affiliation(s)
- Lei Jin
- Department of Physics and Astronomy, University of Missouri, Columbia, Missouri, USA
- Yuanzhe Zhou
- Department of Physics and Astronomy, University of Missouri, Columbia, Missouri, USA
- Sicheng Zhang
- Department of Physics and Astronomy, University of Missouri, Columbia, Missouri, USA
- Shi-Jie Chen
- Department of Physics and Astronomy, University of Missouri, Columbia, Missouri, USA; Department of Biochemistry, MU Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri, USA.
9
Wang M, Zhang W, Li C, Liu C, He X, Zhang Z, Cheng G. Association of R3HDM1 variants with growth and meat quality traits in Qinchuan cattle and its role in lipid accumulation. Gene 2024; 939:149177. [PMID: 39681147 DOI: 10.1016/j.gene.2024.149177] [Received: 09/04/2024] [Revised: 12/12/2024] [Accepted: 12/13/2024] [Indexed: 12/18/2024]
Abstract
The R3H domain containing 1 (R3HDM1) gene has emerged as a candidate influencing residual feed intake and beef yield. Despite this, the genetic variation of R3HDM1 and its effects on beef cattle remain unexplored. This study identified four single nucleotide polymorphisms (SNPs) in the R3HDM1 gene of Qinchuan cattle, with the g.61695680 T > C SNP significantly associated with chest depth and backfat thickness. The g.61695680 T > C synonymous mutation significantly altered the RNA secondary structure and stability of R3HDM1. RNA interference experiments demonstrated that R3HDM1 knockdown reduced adipogenesis and lipid accumulation in bovine preadipocytes by modulating key adipogenic factors such as CEBPβ (P < 0.05), ACCα (P < 0.05), and ATGL (P < 0.01). These findings suggest that the g.61695680 T > C variants within R3HDM1 could serve as valuable molecular markers for selecting improved Qinchuan cattle, thus enhancing genetic selection strategies for beef production.
Affiliation(s)
- Miaoli Wang
- College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China.
- Wentao Zhang
- College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China.
- Chuang Li
- College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China.
- Chenyang Liu
- College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China.
- Xiaoping He
- College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China.
- Ziyi Zhang
- College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China.
- Gong Cheng
- College of Animal Science and Technology, Northwest A&F University, Yangling 712100, China; National Beef Cattle Improvement Centre, Yangling 712100, China.
10
Cornwell-Arquitt RL, Nigh R, Hathaway MT, Yesselman JD, Hendrix DA. Analysis of natural structures and chemical mapping data reveals local stability compensation in RNA. bioRxiv 2024:2024.12.11.627843. [PMID: 39713387 PMCID: PMC11661157 DOI: 10.1101/2024.12.11.627843] [Indexed: 12/24/2024]
Abstract
RNA molecules adopt complex structures that perform essential biological functions across all forms of life, making them promising candidates for therapeutic applications. However, our ability to design new RNA structures remains limited by an incomplete understanding of their folding principles. While global metrics such as the minimum free energy are widely used, they are at odds with naturally occurring structures and incompatible with established design rules. Here, we introduce local stability compensation (LSC), a principle that RNA folding is governed by the local balance between destabilizing loops and their stabilizing adjacent stems, challenging the focus on global energetic optimization. Analysis of over 100,000 RNA structures revealed that LSC signatures are particularly pronounced in bulges and their adjacent stems, with distinct patterns across different RNA families that align with their biological functions. To validate LSC experimentally, we systematically analyzed thousands of RNA variants using DMS chemical mapping. Our results demonstrate that stem reactivity correlates strongly with LSC (R² = 0.458 for hairpin loops) and that structural perturbations affect folding primarily within ~6 nucleotides from the loop. These findings establish LSC as a fundamental principle that could enhance the rational design of functional RNAs.
Affiliation(s)
- Riley Nigh
- Department of Biochemistry, University of Nebraska-Lincoln
- Michael T. Hathaway
- Department of Biochemistry and Biophysics, Oregon State University, Corvallis, Oregon, 97333, USA
- Department of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon, 97333, USA
- Current affiliation: DocuSign Inc
- David A. Hendrix
- Department of Biochemistry and Biophysics, Oregon State University, Corvallis, Oregon, 97333, USA
- Department of Electrical Engineering and Computer Science, Oregon State University, Corvallis, Oregon, 97333, USA
11
Shulgina Y, Trinidad MI, Langeberg CJ, Nisonoff H, Chithrananda S, Skopintsev P, Nissley AJ, Patel J, Boger RS, Shi H, Yoon PH, Doherty EE, Pande T, Iyer AM, Doudna JA, Cate JHD. RNA language models predict mutations that improve RNA function. Nat Commun 2024; 15:10627. [PMID: 39638800 PMCID: PMC11621547 DOI: 10.1038/s41467-024-54812-y] [Received: 05/18/2024] [Accepted: 11/20/2024] [Indexed: 12/07/2024] Open
Abstract
Structured RNA lies at the heart of many central biological processes, from gene expression to catalysis. RNA structure prediction is not yet possible due to a lack of high-quality reference data associated with organismal phenotypes that could inform RNA function. We present GARNET (Gtdb Acquired RNa with Environmental Temperatures), a new database for RNA structural and functional analysis anchored to the Genome Taxonomy Database (GTDB). GARNET links RNA sequences to experimental and predicted optimal growth temperatures of GTDB reference organisms. Using GARNET, we develop sequence- and structure-aware RNA generative models, with overlapping triplet tokenization providing optimal encoding for a GPT-like model. Leveraging hyperthermophilic RNAs in GARNET and these RNA generative models, we identify mutations in ribosomal RNA that confer increased thermostability to the Escherichia coli ribosome. The GTDB-derived data and deep learning models presented here provide a foundation for understanding the connections between RNA sequence, structure, and function.
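The overlapping triplet tokenization mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes a window of three nucleotides moved one position at a time and a 64-entry vocabulary over A, C, G and U. The function names, the T-to-U normalization, and the id ordering are hypothetical choices made for illustration and are not taken from the paper.

```python
from itertools import product


def build_vocab(alphabet="ACGU"):
    """Map each of the 4**3 possible triplets to an integer id (hypothetical ordering)."""
    return {"".join(t): i for i, t in enumerate(product(alphabet, repeat=3))}


def overlapping_triplets(seq):
    """Slide a width-3 window over the sequence with stride 1 (assumed stride)."""
    seq = seq.upper().replace("T", "U")  # treat DNA-style input as RNA
    return [seq[i:i + 3] for i in range(len(seq) - 2)]


def encode(seq, vocab):
    """Convert a sequence to token ids; symbols outside ACGU raise KeyError."""
    return [vocab[t] for t in overlapping_triplets(seq)]


vocab = build_vocab()
tokens = overlapping_triplets("GGGAAACCC")
ids = encode("GGGAAACCC", vocab)
print(tokens)
print(ids)
```

With a stride of one, a sequence of length n yields n - 2 tokens, so each interior nucleotide appears in three consecutive tokens. Ambiguity codes such as N are not handled in this sketch.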
Affiliation(s)
- Yekaterina Shulgina
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
- California Institute for Quantitative Biosciences, University of California, Berkeley, CA, USA
| | - Marena I Trinidad
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Howard Hughes Medical Institute, University of California, Berkeley, CA, USA
| | - Conner J Langeberg
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
- California Institute for Quantitative Biosciences, University of California, Berkeley, CA, USA
| | - Hunter Nisonoff
- Center for Computational Biology, University of California, Berkeley, CA, USA
| | - Seyone Chithrananda
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
| | - Petr Skopintsev
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
- California Institute for Quantitative Biosciences, University of California, Berkeley, CA, USA
| | - Amos J Nissley
- Department of Chemistry, University of California, Berkeley, CA, USA
| | - Jaymin Patel
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
| | - Ron S Boger
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Biophysics Graduate Program, University of California, Berkeley, CA, USA
| | - Honglue Shi
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Howard Hughes Medical Institute, University of California, Berkeley, CA, USA
| | - Peter H Yoon
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
| | - Erin E Doherty
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
- California Institute for Quantitative Biosciences, University of California, Berkeley, CA, USA
| | - Tara Pande
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
| | - Aditya M Iyer
- Department of Physics, University of California, Berkeley, CA, USA
| | - Jennifer A Doudna
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
- California Institute for Quantitative Biosciences, University of California, Berkeley, CA, USA
- Howard Hughes Medical Institute, University of California, Berkeley, CA, USA
- Department of Chemistry, University of California, Berkeley, CA, USA
- MBIB Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Gladstone Institutes, University of California, San Francisco, CA, USA
| | - Jamie H D Cate
- Innovative Genomics Institute, University of California, Berkeley, CA, USA.
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA.
- California Institute for Quantitative Biosciences, University of California, Berkeley, CA, USA.
- Department of Chemistry, University of California, Berkeley, CA, USA.
- MBIB Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
|
12
|
Shen T, Hu Z, Sun S, Liu D, Wong F, Wang J, Chen J, Wang Y, Hong L, Xiao J, Zheng L, Krishnamoorthi T, King I, Wang S, Yin P, Collins JJ, Li Y. Accurate RNA 3D structure prediction using a language model-based deep learning approach. Nat Methods 2024; 21:2287-2298. [PMID: 39572716 PMCID: PMC11621015 DOI: 10.1038/s41592-024-02487-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Accepted: 09/25/2024] [Indexed: 12/07/2024]
Abstract
Accurate prediction of RNA three-dimensional (3D) structures remains an unsolved challenge. Determining RNA 3D structures is crucial for understanding their functions and informing RNA-targeting drug development and synthetic biology design. The structural flexibility of RNA, which leads to the scarcity of experimentally determined data, complicates computational prediction efforts. Here we present RhoFold+, an RNA language model-based deep learning method that accurately predicts 3D structures of single-chain RNAs from sequences. By integrating an RNA language model pretrained on ~23.7 million RNA sequences and leveraging techniques to address data scarcity, RhoFold+ offers a fully automated end-to-end pipeline for RNA 3D structure prediction. Retrospective evaluations on RNA-Puzzles and CASP15 natural RNA targets demonstrate the superiority of RhoFold+ over existing methods, including human expert groups. Its efficacy and generalizability are further validated through cross-family and cross-type assessments, as well as time-censored benchmarks. Additionally, RhoFold+ predicts RNA secondary structures and interhelical angles, providing empirically verifiable features that broaden its applicability to RNA structure and function studies.
Affiliation(s)
- Tao Shen
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
- Shanghai Zelixir Biotech Company Ltd, Shanghai, China
- Shenzhen Institute of Advanced Technology, Shenzhen, China
| | - Zhihang Hu
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Siqi Sun
- Research Institute of Intelligent Complex Systems, Fudan University, Shanghai, China.
- Shanghai Artificial Intelligence Laboratory, Shanghai, China.
| | - Di Liu
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA.
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA.
- Center for Molecular Design and Biomimetics at the Biodesign Institute, Arizona State University, Tempe, AZ, USA.
- School of Molecular Sciences, Arizona State University, Tempe, AZ, USA.
| | - Felix Wong
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Synthetic Biology Center, Massachusetts Institute of Technology, Cambridge, MA, USA
- Integrated Biosciences, Redwood City, CA, USA
| | - Jiuming Wang
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
- OneAIM Ltd, Hong Kong SAR, China
| | - Jiayang Chen
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Yixuan Wang
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Liang Hong
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Jin Xiao
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Liangzhen Zheng
- Shanghai Zelixir Biotech Company Ltd, Shanghai, China
- Shenzhen Institute of Advanced Technology, Shenzhen, China
| | - Tejas Krishnamoorthi
- School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ, USA
| | - Irwin King
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Sheng Wang
- Shanghai Zelixir Biotech Company Ltd, Shanghai, China.
- Shenzhen Institute of Advanced Technology, Shenzhen, China.
| | - Peng Yin
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA.
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA.
| | - James J Collins
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA.
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Synthetic Biology Center, Massachusetts Institute of Technology, Cambridge, MA, USA.
| | - Yu Li
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China.
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA.
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- The CUHK Shenzhen Research Institute, Shenzhen, China.
|
13
|
Yang Z, Ji S, Liu L, Liu S, Wang B, Ma Y, Cao X. Promotion of TLR7-MyD88-dependent inflammation and autoimmunity in mice through stem-loop changes in Lnc-Atg16l1. Nat Commun 2024; 15:10224. [PMID: 39587108 PMCID: PMC11589596 DOI: 10.1038/s41467-024-54674-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Accepted: 11/18/2024] [Indexed: 11/27/2024] [Open Access]
Abstract
Uncontrolled TLR signaling can cause inflammatory immunopathology and trigger autoimmune diseases. For example, TLR7 promotes the pathogenesis of systemic lupus erythematosus (SLE). However, whether RNA structural changes affect signaling by nucleic acid-sensing TLRs and impact disease progression is unclear. Here, using iCLIP-seq, we identify a TLR7-binding long non-coding RNA, Lnc-Atg16l1, and find that it promotes signaling by TLR7 and other MyD88-dependent TLRs in various types of immune cells. Depletion of Lnc-Atg16l1 attenuates the development of TLR7-linked autoimmune phenotypes in a mouse SLE model. Mechanistically, we find that Lnc-Atg16l1 binds to TLR7 at bases near U84 and to MyD88 at bases around A129. Analysis of Lnc-Atg16l1 in situ structures shows that, acting as a molecular scaffold, it strengthens the interaction between the TIR domain of TLR7 and MyD88 through specific stem-loop structural changes after TLR7 activation, thereby promoting TLR7 downstream signaling. We thus describe a mechanism by which a host RNA regulates innate signaling and autoimmune disease through its structural changes. These findings provide insights into the structure-dependent pro-inflammatory function of self RNA and suggest a potential target for TLR-related autoimmune disorders.
Affiliation(s)
- Zongheng Yang
- Department of Immunology, Center for Immunotherapy, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, China
| | - Shuchen Ji
- Department of Immunology, Center for Immunotherapy, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, China
| | - Lun Liu
- Department of Immunology, Center for Immunotherapy, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, China
| | - Shuo Liu
- Department of Immunology, Center for Immunotherapy, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, China
| | - Bingjing Wang
- Department of Immunology, Center for Immunotherapy, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, China
| | - Yuanwu Ma
- Institute of Laboratory Animal Science, Chinese Academy of Medical Sciences, Beijing, China
| | - Xuetao Cao
- Department of Immunology, Center for Immunotherapy, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, China.
- Institute of Immunology, College of Life Sciences, Nankai University, Tianjin, China.
- National Key Laboratory of Immunity & Inflammation, Institute of Immunology, Navy Medical University, Shanghai, China.
|
14
|
Wang Z, Feng Y, Tian Q, Liu Z, Yan P, Li X. RNADiffFold: generative RNA secondary structure prediction using discrete diffusion models. Brief Bioinform 2024; 26:bbae618. [PMID: 39581872 PMCID: PMC11586127 DOI: 10.1093/bib/bbae618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2024] [Revised: 10/12/2024] [Accepted: 11/18/2024] [Indexed: 11/26/2024] [Open Access]
Abstract
Ribonucleic acid (RNA) molecules are essential macromolecules that perform diverse biological functions in living beings. Precise prediction of RNA secondary structures is instrumental in deciphering their complex three-dimensional architecture and functionality. Traditional methodologies for RNA structure prediction, including energy-based and learning-based approaches, often depict RNA secondary structures from a static perspective and rely on stringent a priori constraints. Inspired by the success of diffusion models, in this work, we introduce RNADiffFold, a generative approach to RNA secondary structure prediction based on multinomial diffusion. We recast the prediction of contact maps as a task akin to pixel-wise segmentation and accordingly train a denoising model to progressively refine the contact maps starting from a noise-infused state. We also devise a conditioning mechanism that harnesses features extracted from RNA sequences to steer the model toward generating an accurate secondary structure. These features encompass one-hot encoded sequences, probabilistic maps generated from a pre-trained scoring network, and embeddings and attention maps derived from an RNA foundation model. Experimental results on both within- and cross-family datasets demonstrate RNADiffFold's competitive performance compared with current state-of-the-art methods. Additionally, RNADiffFold has shown a notable proficiency in capturing the dynamic aspects of RNA structures, a claim corroborated by its performance on datasets comprising multiple conformations.
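Two of the representations named in this abstract, one-hot encoded sequences and contact maps, can be sketched in a few lines of Python. The sketch below is only an illustration of those data structures, not the RNADiffFold implementation: the function names are hypothetical, the structure is assumed to be given in pseudoknot-free dot-bracket notation, and the diffusion and conditioning steps are not shown.

```python
def one_hot(seq, alphabet="ACGU"):
    """Encode each nucleotide as a 4-element indicator row (assumed A, C, G, U order)."""
    return [[1 if base == a else 0 for a in alphabet] for base in seq.upper()]


def contact_map(dot_bracket):
    """Build a symmetric L x L binary matrix with 1 where positions i and j pair."""
    n = len(dot_bracket)
    cmap = [[0] * n for _ in range(n)]
    stack = []  # indices of opening brackets that are still unpaired
    for j, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: extra ')'")
            i = stack.pop()
            cmap[i][j] = cmap[j][i] = 1
    if stack:
        raise ValueError("unbalanced structure: extra '('")
    return cmap


features = one_hot("GGAAACC")
cmap = contact_map("((...))")
for row in cmap:
    print(row)
```

Viewing the contact map as an L x L grid of binary labels is what makes the comparison with pixel-wise segmentation natural: each cell is a yes-or-no decision about whether two positions pair.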
Affiliation(s)
- Zhen Wang
- Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou 310018, Zhejiang, China
| | - Yizhen Feng
- Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou 310018, Zhejiang, China
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310014, Zhejiang, China
| | - Qingwen Tian
- Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou 310018, Zhejiang, China
- College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310014, Zhejiang, China
| | - Ziqi Liu
- Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou 310018, Zhejiang, China
- Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, Zhejiang, China
| | - Pengju Yan
- Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou 310018, Zhejiang, China
| | - Xiaolin Li
- Hangzhou Institute of Medicine, Chinese Academy of Sciences, Hangzhou 310018, Zhejiang, China
|
15
|
James JS, Dai J, Chew WL, Cai Y. The design and engineering of synthetic genomes. Nat Rev Genet 2024:10.1038/s41576-024-00786-y. [PMID: 39506144 DOI: 10.1038/s41576-024-00786-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/23/2024] [Indexed: 11/08/2024]
Abstract
Synthetic genomics seeks to design and construct entire genomes to mechanistically dissect fundamental questions of genome function and to engineer organisms for diverse applications, including bioproduction of high-value chemicals and biologics, advanced cell therapies, and stress-tolerant crops. Recent progress has been fuelled by advancements in DNA synthesis, assembly, delivery and editing. Computational innovations, such as the use of artificial intelligence to provide prediction of function, also provide increasing capabilities to guide synthetic genome design and construction. However, translating synthetic genome-scale projects from idea to implementation remains highly complex. Here, we aim to streamline this implementation process by comprehensively reviewing the strategies for design, construction, delivery, debugging and tailoring of synthetic genomes as well as their potential applications.
Affiliation(s)
- Joshua S James
- Manchester Institute of Biotechnology, University of Manchester, Manchester, UK
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
| | - Junbiao Dai
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Shenzhen Key Laboratory of Agricultural Synthetic Biology, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Shenzhen Key Laboratory of Synthetic Genomics, Guangdong Provincial Key Laboratory of Synthetic Genomics, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | - Wei Leong Chew
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), Singapore, Republic of Singapore
| | - Yizhi Cai
- Manchester Institute of Biotechnology, University of Manchester, Manchester, UK.
|
16
|
Wong F, He D, Krishnan A, Hong L, Wang AZ, Wang J, Hu Z, Omori S, Li A, Rao J, Yu Q, Jin W, Zhang T, Ilia K, Chen JX, Zheng S, King I, Li Y, Collins JJ. Deep generative design of RNA aptamers using structural predictions. Nat Comput Sci 2024; 4:829-839. [PMID: 39506080 DOI: 10.1038/s43588-024-00720-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2024] [Accepted: 10/07/2024] [Indexed: 11/08/2024]
Abstract
RNAs represent a class of programmable biomolecules capable of performing diverse biological functions. Recent studies have developed accurate RNA three-dimensional structure prediction methods, which may enable new RNAs to be designed in a structure-guided manner. Here, we develop a structure-to-sequence deep learning platform for the de novo generative design of RNA aptamers. We show that our approach can design RNA aptamers that are predicted to be structurally similar, yet sequence dissimilar, to known light-up aptamers that fluoresce in the presence of small molecules. We experimentally validate several generated RNA aptamers to have fluorescent activity, show that these aptamers can be optimized for activity in silico, and find that they exhibit a mechanism of fluorescence similar to that of known light-up aptamers. Our results demonstrate how structural predictions can guide the targeted and resource-efficient design of new RNA sequences.
Affiliation(s)
- Felix Wong
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Institute for Medical Engineering & Science and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Integrated Biosciences, Redwood City, CA, USA
| | - Dongchen He
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
- Shanghai Artificial Intelligence Laboratory, Shanghai, China
| | - Aarti Krishnan
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Institute for Medical Engineering & Science and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Liang Hong
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Alexander Z Wang
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Institute for Medical Engineering & Science and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Jiuming Wang
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Zhihang Hu
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Satotaka Omori
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Integrated Biosciences, Redwood City, CA, USA
| | - Alicia Li
- Integrated Biosciences, Redwood City, CA, USA
| | - Jiahua Rao
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Qinze Yu
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Wengong Jin
- Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
| | - Tianqing Zhang
- Institute for Medical Engineering & Science and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Katherine Ilia
- Institute for Medical Engineering & Science and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Jack X Chen
- Institute for Medical Engineering & Science and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Shuangjia Zheng
- Global Institute of Future Technology, Shanghai Jiao Tong University, Shanghai, China
| | - Irwin King
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China
| | - Yu Li
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Institute for Medical Engineering & Science and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China.
- The CUHK Shenzhen Research Institute, Shenzhen, China.
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA.
| | - James J Collins
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Institute for Medical Engineering & Science and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA.
|
17
|
Hanson WA, Romero Agosto GA, Rouskin S. Viral RNA Interactome: The Ultimate Researcher's Guide to RNA-Protein Interactions. Viruses 2024; 16:1702. [PMID: 39599817 PMCID: PMC11599142 DOI: 10.3390/v16111702] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2024] [Revised: 10/18/2024] [Accepted: 10/25/2024] [Indexed: 11/29/2024] [Open Access]
Abstract
RNA molecules in the cell are bound by a multitude of RNA-binding proteins (RBPs) with a variety of regulatory consequences. Often, interactions with these RNA-binding proteins are facilitated by the complex secondary and tertiary structures of RNA molecules. Viral RNAs especially are known to be heavily structured and interact with many RBPs, with roles including genome packaging, immune evasion, enhancing replication and transcription, and increasing translation efficiency. As such, the RNA-protein interactome represents a critical facet of the viral replication cycle. Characterization of these interactions is necessary for the development of novel therapeutics targeted at the disruption of essential replication cycle events. In this review, we aim to summarize the various roles of RNA structures in shaping the RNA-protein interactome, the regulatory roles of these interactions, as well as up-to-date methods developed for the characterization of the interactome and directions for novel, RNA-directed therapeutics.
Affiliation(s)
| | | | - Silvi Rouskin
- Department of Microbiology, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA; (W.A.H.); (G.A.R.A.)
|
18
|
Cao X, Zhang Y, Ding Y, Wan Y. Identification of RNA structures and their roles in RNA functions. Nat Rev Mol Cell Biol 2024; 25:784-801. [PMID: 38926530 DOI: 10.1038/s41580-024-00748-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/28/2024] [Indexed: 06/28/2024]
Abstract
The development of high-throughput RNA structure profiling methods in the past decade has greatly facilitated our ability to map and characterize different aspects of RNA structures transcriptome-wide in cell populations, single cells and single molecules. The resulting high-resolution data have provided insights into the static and dynamic nature of RNA structures, revealing their complexity as they perform their respective functions in the cell. In this Review, we discuss recent technical advances in the determination of RNA structures, and the roles of RNA structures in RNA biogenesis and functions, including in transcription, processing, translation, degradation, localization and RNA structure-dependent condensates. We also discuss the current understanding of how RNA structures could guide drug design for treating genetic diseases and battling pathogenic viruses, and highlight existing challenges and future directions in RNA structure research.
Affiliation(s)
- Xinang Cao
- Stem Cell and Regenerative Biology, Genome Institute of Singapore, Singapore, Singapore
| | - Yueying Zhang
- Department of Cell and Developmental Biology, John Innes Centre, Norwich, UK
| | - Yiliang Ding
- Department of Cell and Developmental Biology, John Innes Centre, Norwich, UK.
| | - Yue Wan
- Stem Cell and Regenerative Biology, Genome Institute of Singapore, Singapore, Singapore.
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
|
19
|
Li Q, Hu Z, Wang Y, Li L, Fan Y, King I, Jia G, Wang S, Song L, Li Y. Progress and opportunities of foundation models in bioinformatics. Brief Bioinform 2024; 25:bbae548. [PMID: 39461902 PMCID: PMC11512649 DOI: 10.1093/bib/bbae548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2024] [Revised: 08/20/2024] [Accepted: 10/12/2024] [Indexed: 10/29/2024] [Open Access]
Abstract
Bioinformatics has undergone a paradigm shift in artificial intelligence (AI), particularly through foundation models (FMs), which address longstanding challenges in bioinformatics such as limited annotated data and data noise. These AI techniques have demonstrated remarkable efficacy across various downstream validation tasks, effectively representing diverse biological entities and heralding a new era in computational biology. The primary goal of this survey is to conduct a general investigation and summary of FMs in bioinformatics, tracing their evolutionary trajectory, current research landscape, and methodological frameworks. Our primary focus is on elucidating the application of FMs to specific biological problems, offering insights to guide the research community in choosing appropriate FMs for tasks like sequence analysis, structure prediction, and function annotation. Each section delves into the intricacies of the targeted challenges, contrasting the architectures and advancements of FMs with conventional methods and showcasing their utility across different biological domains. Further, this review scrutinizes the hurdles and constraints encountered by FMs in biology, including issues of data noise, model interpretability, and potential biases. This analysis provides a theoretical groundwork for understanding the circumstances under which certain FMs may exhibit suboptimal performance. Lastly, we outline prospective pathways and methodologies for the future development of FMs in biological research, facilitating ongoing innovation in the field. This comprehensive examination not only serves as an academic reference but also as a roadmap for forthcoming explorations and applications of FMs in biology.
Affiliation(s)
- Qing Li
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, 999077, China
| | - Zhihang Hu
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, 999077, China
| | - Yixuan Wang
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, 999077, China
| | - Lei Li
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, 999077, China
| | - Yimin Fan
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, 999077, China
| | - Irwin King
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, 999077, China
| | - Gengjie Jia
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangdong, 518120, China
| | - Sheng Wang
- Shanghai Zelixir Biotech Company Ltd., Shanghai, 200030, China
- Shenzhen Institute of Advanced Technology, Xueyuan Avenue, Shenzhen University Town, Nanshan District, Shenzhen, Guangdong, 518055, China
| | - Le Song
- BioMap, Zhongguancun Life Science Park, Haidian District, Beijing, 100085, China
| | - Yu Li
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, 999077, China
|
20
|
Fallah A, Havaei SA, Sedighian H, Kachuei R, Fooladi AAI. Prediction of aptamer affinity using an artificial intelligence approach. J Mater Chem B 2024; 12:8825-8842. [PMID: 39158322 DOI: 10.1039/d4tb00909f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/20/2024]
Abstract
Aptamers are oligonucleotide sequences that bind specific target molecules, much as monoclonal antibodies do. They can be selected by systematic evolution of ligands by exponential enrichment (SELEX), are modifiable, and can be synthesized. Although the SELEX approach has improved considerably, identifying aptamers experimentally is often challenging and time-consuming. Structure-based methods are the most widely used in the computer-aided design and development of aptamers, and numerous web-based platforms have been proposed for predicting the secondary structures and 3D configurations of RNAs and DNAs. Molecular docking and molecular dynamics (MD), which are commonly used for structure-based compound selection against proteins, are also suitable for aptamer selection. Artificial intelligence (AI), in turn, may be able to quickly identify candidate aptamers from a large number of sequences. Sophisticated machine learning and deep learning (DL) models have demonstrated efficacy in predicting the binding properties between ligands and targets during drug discovery; as such, they may provide a reliable and precise method for predicting the binding of aptamers to targets. This work examines advancements in AI pipelines and strategies for predicting aptamer binding ability, such as machine and deep learning, as well as structure-based approaches, molecular dynamics and molecular docking simulation methods.
Affiliation(s)
- Arezoo Fallah
- Department of Bacteriology and Virology, Faculty of Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
| | - Seyed Asghar Havaei
- Department of Microbiology, School of Medicine, Isfahan University of Medical Sciences, Isfahan, Iran.
| | - Hamid Sedighian
- Applied Microbiology Research Center, Biomedicine Technologies Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran.
| | - Reza Kachuei
- Molecular Biology Research Center, Biomedicine Technologies Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran
| | - Abbas Ali Imani Fooladi
- Applied Microbiology Research Center, Biomedicine Technologies Institute, Baqiyatallah University of Medical Sciences, Tehran, Iran.
| |
|
21
|
Shulgina Y, Trinidad MI, Langeberg CJ, Nisonoff H, Chithrananda S, Skopintsev P, Nissley AJ, Patel J, Boger RS, Shi H, Yoon PH, Doherty EE, Pande T, Iyer AM, Doudna JA, Cate JHD. RNA language models predict mutations that improve RNA function. bioRxiv 2024:2024.04.05.588317. [PMID: 38617247 PMCID: PMC11014562 DOI: 10.1101/2024.04.05.588317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/16/2024]
Abstract
Structured RNA lies at the heart of many central biological processes, from gene expression to catalysis. While advances in deep learning enable the prediction of accurate protein structural models, RNA structure prediction is not possible at present due to a lack of abundant high-quality reference data [1]. Furthermore, available sequence data are generally not associated with organismal phenotypes that could inform RNA function [2-4]. We created GARNET (Gtdb Acquired RNa with Environmental Temperatures), a new database for RNA structural and functional analysis anchored to the Genome Taxonomy Database (GTDB) [5]. GARNET links RNA sequences derived from GTDB genomes to experimental and predicted optimal growth temperatures of GTDB reference organisms. This enables construction of deep and diverse RNA sequence alignments to be used for machine learning. Using GARNET, we define the minimal requirements for a sequence- and structure-aware RNA generative model. We also develop a GPT-like language model for RNA in which overlapping triplet tokenization provides optimal encoding. Leveraging hyperthermophilic RNAs in GARNET and these RNA generative models, we identified mutations in ribosomal RNA that confer increased thermostability to the Escherichia coli ribosome. The GTDB-derived data and deep learning models presented here provide a foundation for understanding the connections between RNA sequence, structure, and function.
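The overlapping triplet tokenization mentioned in this abstract can be illustrated with a minimal sketch. It assumes a stride of one nucleotide and a 64-entry vocabulary over A, C, G and U plus one unknown token. The function names, the vocabulary ordering and the handling of T and ambiguous bases are hypothetical choices made for illustration, not details taken from the preprint.

```python
from itertools import product

def overlapping_triplets(seq: str) -> list[str]:
    """Split an RNA sequence into overlapping 3-mers (window 3, stride 1)."""
    seq = seq.upper().replace("T", "U")  # assumption: treat DNA input as RNA
    return [seq[i:i + 3] for i in range(len(seq) - 2)]

# Hypothetical vocabulary: all 64 triplets over A, C, G, U in lexicographic order.
VOCAB = {"".join(p): i for i, p in enumerate(product("ACGU", repeat=3))}
UNK_ID = len(VOCAB)  # assumption: one shared id for triplets with ambiguous bases

def encode(seq: str) -> list[int]:
    """Map each overlapping triplet to its integer token id."""
    return [VOCAB.get(tok, UNK_ID) for tok in overlapping_triplets(seq)]

tokens = overlapping_triplets("GGCAUC")
ids = encode("GGCAUC")
print(tokens)
print(ids)
```

A sequence of length n yields n - 2 tokens under this scheme, so neighbouring tokens share two nucleotides.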
Affiliation(s)
- Yekaterina Shulgina
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
- California Institute for Quantitative Biosciences, University of California, Berkeley, CA, USA
| | - Marena I Trinidad
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Howard Hughes Medical Institute, University of California, Berkeley, CA, USA
| | - Conner J Langeberg
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
- California Institute for Quantitative Biosciences, University of California, Berkeley, CA, USA
| | - Hunter Nisonoff
- Center for Computational Biology, University of California, Berkeley, CA, United States
| | - Seyone Chithrananda
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
| | - Petr Skopintsev
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
- California Institute for Quantitative Biosciences, University of California, Berkeley, CA, USA
| | - Amos J Nissley
- Department of Chemistry, University of California, Berkeley, CA, USA
| | - Jaymin Patel
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
| | - Ron S Boger
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Biophysics Graduate Program, University of California, Berkeley, CA, USA
| | - Honglue Shi
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Howard Hughes Medical Institute, University of California, Berkeley, CA, USA
| | - Peter H Yoon
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
- Department of Chemistry, University of California, Berkeley, CA, USA
| | - Erin E Doherty
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
- California Institute for Quantitative Biosciences, University of California, Berkeley, CA, USA
| | - Tara Pande
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA
| | - Aditya M Iyer
- Department of Physics, University of California, Berkeley, CA, USA
| | - Jennifer A Doudna
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
- California Institute for Quantitative Biosciences, University of California, Berkeley, CA, USA
- Howard Hughes Medical Institute, University of California, Berkeley, CA, USA
- Department of Chemistry, University of California, Berkeley, CA, USA
- MBIB Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Gladstone Institutes, University of California, San Francisco, CA, USA
| | - Jamie H D Cate
- Innovative Genomics Institute, University of California, Berkeley, CA, USA
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
- California Institute for Quantitative Biosciences, University of California, Berkeley, CA, USA
- Department of Chemistry, University of California, Berkeley, CA, USA
- MBIB Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| |
|
22
|
Qi F, Chen J, Chen Y, Sun J, Lin Y, Chen Z, Kapranov P. Evaluating Performance of Different RNA Secondary Structure Prediction Programs Using Self-cleaving Ribozymes. Genomics Proteomics Bioinformatics 2024; 22:qzae043. [PMID: 39317944 DOI: 10.1093/gpbjnl/qzae043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Revised: 03/02/2024] [Accepted: 06/05/2024] [Indexed: 09/26/2024]
Abstract
Accurate identification of the correct, biologically relevant RNA structures is critical to understanding various aspects of RNA biology since proper folding represents the key to the functionality of all types of RNA molecules and plays pivotal roles in many essential biological processes. Thus, a plethora of approaches have been developed to predict, identify, or solve RNA structures based on various computational, molecular, genetic, chemical, or physicochemical strategies. Purely computational approaches hold distinct advantages over all other strategies in terms of the ease of implementation, time, speed, cost, and throughput, but they strongly underperform in terms of accuracy that significantly limits their broader application. Nonetheless, the advantages of these methods led to a steady development of multiple in silico RNA secondary structure prediction approaches including recent deep learning-based programs. Here, we compared the accuracy of predictions of biologically relevant secondary structures of dozens of self-cleaving ribozyme sequences using seven in silico RNA folding prediction tools with tasks of varying complexity. We found that while many programs performed well in relatively simple tasks, their performance varied significantly in more complex RNA folding problems. However, in general, a modern deep learning method outperformed the other programs in the complex tasks in predicting the RNA secondary structures, at least based on the specific class of sequences tested, suggesting that it may represent the future of RNA structure prediction algorithms.
Affiliation(s)
- Fei Qi
- State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen 361102, China
- Institute of Genomics, School of Medicine, Huaqiao University, Xiamen 361021, China
| | - Junjie Chen
- Institute of Genomics, School of Medicine, Huaqiao University, Xiamen 361021, China
| | - Yue Chen
- Institute of Genomics, School of Medicine, Huaqiao University, Xiamen 361021, China
| | - Jianfeng Sun
- Botnar Research Centre, University of Oxford, Oxford, OX3 7LD, United Kingdom
| | - Yiting Lin
- Institute of Genomics, School of Medicine, Huaqiao University, Xiamen 361021, China
| | - Zipeng Chen
- Institute of Genomics, School of Medicine, Huaqiao University, Xiamen 361021, China
| | - Philipp Kapranov
- State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen 361102, China
| |
|
23
|
Zhang S, Li J, Chen SJ. Machine learning in RNA structure prediction: Advances and challenges. Biophys J 2024; 123:2647-2657. [PMID: 38297836 PMCID: PMC11393687 DOI: 10.1016/j.bpj.2024.01.026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 01/08/2024] [Accepted: 01/24/2024] [Indexed: 02/02/2024]
Abstract
RNA molecules play a crucial role in various biological processes, with their functionality closely tied to their structures. The remarkable advancements in machine learning techniques for protein structure prediction have shown promise in the field of RNA structure prediction. In this perspective, we discuss the advances and challenges encountered in constructing machine learning-based models for RNA structure prediction. We explore topics including model building strategies, specific challenges involved in predicting RNA secondary (2D) and tertiary (3D) structures, and approaches to these challenges. In addition, we highlight the advantages and challenges of constructing RNA language models. Given the rapid advances of machine learning techniques, we anticipate that machine learning-based models will serve as important tools for predicting RNA structures, thereby enriching our understanding of RNA structures and their corresponding functions.
Affiliation(s)
- Sicheng Zhang
- Department of Physics and Institute of Data Science and Informatics, University of Missouri, Columbia, Missouri
| | - Jun Li
- Department of Physics and Institute of Data Science and Informatics, University of Missouri, Columbia, Missouri
| | - Shi-Jie Chen
- Department of Physics and Institute of Data Science and Informatics, University of Missouri, Columbia, Missouri; Department of Biochemistry, University of Missouri, Columbia, Missouri.
| |
|
24
|
Akmal Shukri AM, Wang SM, Feng C, Chia SL, Mohd Nawi SFA, Citartan M. In silico selection of aptamers against SARS-CoV-2. Analyst 2024. [PMID: 39221970 DOI: 10.1039/d4an00812j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/04/2024]
Abstract
Aptamers are molecular recognition elements that have been extensively deployed in a wide array of applications ranging from diagnostics to therapeutics. Due to their unique properties as compared to antibodies, aptamers were also largely isolated during the COVID-19 pandemic for multiple purposes. Typically generated by conventional SELEX, the inherent drawbacks of the process including the time-consuming, cumbersome and resource-intensive nature catalysed the move to adopt in silico approaches to isolate aptamers. Impressive performances of these in silico-derived aptamers in their respective assays have been documented thus far, bearing testimony to the huge potential of the in silico approaches, akin to the traditional SELEX in isolating aptamers. In this study, we provide an overview of the in silico selection of aptamers against SARS-CoV-2 by providing insights into the basic steps involved, which comprise the selection of the initial single-stranded nucleic acids, determination of the secondary and tertiary structures and in silico approaches that include both rigid docking and molecular dynamics simulations. The different approaches involving aptamers against SARS-CoV-2 were illuminated and the need to verify these aptamers by experimental validation was also emphasized. Cognizant of the need to continuously improve aptamers, the strategies embraced thus far for post-in silico selection modifications were enumerated. Shedding light on the steps involved in the in silico selection can set the stage for further improvisation to augment the functionalities of the aptamers in the future.
Affiliation(s)
- Amir Muhaimin Akmal Shukri
- Advanced Medical & Dental Institute (AMDI), Universiti Sains Malaysia, Bertam, 13200 Kepala Batas, Penang, Malaysia.
- Institute of Medical Molecular Biotechnology (IMMB), Faculty of Medicine, Universiti Teknologi MARA, Sungai Buloh Campus, Selangor, Malaysia
| | - Seok Mui Wang
- Institute of Medical Molecular Biotechnology (IMMB), Faculty of Medicine, Universiti Teknologi MARA, Sungai Buloh Campus, Selangor, Malaysia
- Department of Medical Microbiology and Parasitology, Faculty of Medicine, Universiti Teknologi MARA, Sungai Buloh Campus, Selangor, Malaysia.
- Institute of Pathology, Laboratory and Forensic Medicine (I-PPerForM), Universiti Teknologi MARA, Sungai Buloh Campus, Selangor, Malaysia
- Non-Destructive Biomedical and Pharmaceutical Research Center, Smart Manufacturing Research Institute (SMRI), Universiti Teknologi MARA, Puncak Alam Campus, Selangor, Malaysia
| | - Chaoli Feng
- Advanced Medical & Dental Institute (AMDI), Universiti Sains Malaysia, Bertam, 13200 Kepala Batas, Penang, Malaysia.
| | - Suet Lin Chia
- Department of Microbiology, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, UPM Serdang, Selangor, Malaysia
- UPM-MAKNA Cancer Research Laboratory, Institute of Bioscience, Universiti Putra Malaysia, UPM Serdang, Selangor, Malaysia
- Malaysia Genome and Vaccine Institute, National Institutes of Biotechnology Malaysia, Jalan Bangi, Kajang, Selangor, Malaysia
| | - Siti Farah Alwani Mohd Nawi
- Department of Medical Microbiology and Parasitology, Faculty of Medicine, Universiti Teknologi MARA, Sungai Buloh Campus, Selangor, Malaysia.
| | - Marimuthu Citartan
- Advanced Medical & Dental Institute (AMDI), Universiti Sains Malaysia, Bertam, 13200 Kepala Batas, Penang, Malaysia.
| |
|
25
|
Ferrer Florensa A, Almagro Armenteros J, Nielsen H, Aarestrup F, Clausen P. SpanSeq: similarity-based sequence data splitting method for improved development and assessment of deep learning projects. NAR Genom Bioinform 2024; 6:lqae106. [PMID: 39157582 PMCID: PMC11327874 DOI: 10.1093/nargab/lqae106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Revised: 07/26/2024] [Accepted: 08/05/2024] [Indexed: 08/20/2024]
Abstract
The use of deep learning models in computational biology has increased massively in recent years, and it is expected to continue with the current advances in the fields such as Natural Language Processing. These models, although able to draw complex relations between input and target, are also inclined to learn noisy deviations from the pool of data used during their development. In order to assess their performance on unseen data (their capacity to generalize), it is common to split the available data randomly into development (train/validation) and test sets. This procedure, although standard, has been shown to produce dubious assessments of generalization due to the existing similarity between samples in the databases used. In this work, we present SpanSeq, a database partition method for machine learning that can scale to most biological sequences (genes, proteins and genomes) in order to avoid data leakage between sets. We also explore the effect of not restraining similarity between sets by reproducing the development of two state-of-the-art models on bioinformatics, not only confirming the consequences of randomly splitting databases on the model assessment, but expanding those repercussions to the model development. SpanSeq is available at https://github.com/genomicepidemiology/SpanSeq.
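The idea of restraining similarity between partitions can be sketched as follows. This is not SpanSeq's algorithm, which the abstract does not describe. It is a toy stand-in that assumes k-mer Jaccard similarity, single-linkage clustering and a greedy assignment of whole clusters to partitions. All function names and the 0.5 threshold are hypothetical.

```python
from itertools import combinations

def kmers(seq, k=3):
    """Set of k-mers in a sequence (assumed similarity features)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two k-mer sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_by_similarity(seqs, threshold=0.5, k=3):
    """Single-linkage clusters: link any pair with Jaccard >= threshold."""
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    profiles = [kmers(s, k) for s in seqs]
    for i, j in combinations(range(len(seqs)), 2):
        if jaccard(profiles[i], profiles[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

def split_clusters(clusters, n_parts=2):
    """Assign whole clusters, largest first, to the currently smallest partition."""
    parts = [[] for _ in range(n_parts)]
    for cluster in sorted(clusters, key=len, reverse=True):
        min(parts, key=len).extend(cluster)
    return parts

seqs = ["ACGUACGUAC", "ACGUACGUAA", "GGGCCCGGGC", "GGGCCCGGGA", "UUUUAAAAUU"]
clusters = cluster_by_similarity(seqs)
parts = split_clusters(clusters, n_parts=2)
print(clusters)
print(parts)
```

Because clusters are never divided, two sequences above the similarity threshold cannot end up on opposite sides of a train/test boundary, which is the leakage a purely random split allows. The all-pairs comparison is quadratic in the number of sequences, so this sketch only suits small examples.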
Affiliation(s)
- Alfred Ferrer Florensa
- Research Group for Genomic Epidemiology, DTU National Food Institute, Technical University of Denmark, Anker Engelunds Vej 1, 2800 Kongens Lyngby, Denmark
| | - Jose Juan Almagro Armenteros
- Informatics and Predictive Sciences Research, Bristol Myers Squibb Company, Calle Isaac Newton 4, 41092 Sevilla, Spain
| | - Henrik Nielsen
- Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, Anker Engelunds Vej 1, 2800 Kongens Lyngby, Denmark
| | - Frank Møller Aarestrup
- Research Group for Genomic Epidemiology, DTU National Food Institute, Technical University of Denmark, Anker Engelunds Vej 1, 2800 Kongens Lyngby, Denmark
| | - Philip Thomas Lanken Conradsen Clausen
- Research Group for Genomic Epidemiology, DTU National Food Institute, Technical University of Denmark, Anker Engelunds Vej 1, 2800 Kongens Lyngby, Denmark
| |
|
26
|
Hwang G, Kwon M, Seo D, Kim DH, Lee D, Lee K, Kim E, Kang M, Ryu JH. ASOptimizer: Optimizing antisense oligonucleotides through deep learning for IDO1 gene regulation. Mol Ther Nucleic Acids 2024; 35:102186. [PMID: 38706632 PMCID: PMC11066473 DOI: 10.1016/j.omtn.2024.102186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Accepted: 04/03/2024] [Indexed: 05/07/2024]
Abstract
Recent studies have highlighted the effectiveness of using antisense oligonucleotides (ASOs) for cellular RNA regulation, including targets that are considered undruggable; however, manually designing optimal ASO sequences can be labor intensive and time consuming, which potentially limits their broader application. To address this challenge, we introduce a platform, the ASOptimizer, a deep-learning-based framework that efficiently designs ASOs at a low cost. This platform not only selects the most efficient mRNA target sites but also optimizes the chemical modifications for enhanced performance. Indoleamine 2,3-dioxygenase 1 (IDO1) promotes cancer survival by depleting tryptophan and producing kynurenine, leading to immunosuppression through the aryl-hydrocarbon receptor (Ahr) pathway within the tumor microenvironment. We used ASOptimizer to identify ASOs that target IDO1 mRNA as potential cancer therapeutics. Our methodology consists of two stages: sequence engineering and chemical engineering. During the sequence-engineering stage, we optimized and predicted ASO sequences that could target IDO1 mRNA efficiently. In the chemical-engineering stage, we further refined these ASOs to enhance their inhibitory activity while reducing their potential cytotoxicity. In conclusion, our research demonstrates the potential of ASOptimizer for identifying ASOs with improved efficacy and safety.
Affiliation(s)
- Gyeongjo Hwang
- Spidercore Inc, 17, Techno 4-ro, Yuseong-gu, Daejeon 34013, South Korea
| | - Mincheol Kwon
- BIORCHESTRA Co., Ltd., 17, Techno 4-ro, Yuseong-gu, Daejeon 34013, South Korea
| | - Dongjin Seo
- Spidercore Inc, 17, Techno 4-ro, Yuseong-gu, Daejeon 34013, South Korea
| | - Dae Hoon Kim
- BIORCHESTRA Co., Ltd., 17, Techno 4-ro, Yuseong-gu, Daejeon 34013, South Korea
| | - Daehwan Lee
- Spidercore Inc, 17, Techno 4-ro, Yuseong-gu, Daejeon 34013, South Korea
| | - Kiwon Lee
- Spidercore Inc, 17, Techno 4-ro, Yuseong-gu, Daejeon 34013, South Korea
| | - Eunyoung Kim
- BIORCHESTRA Co., Ltd., 17, Techno 4-ro, Yuseong-gu, Daejeon 34013, South Korea
| | - Mingeun Kang
- Spidercore Inc, 17, Techno 4-ro, Yuseong-gu, Daejeon 34013, South Korea
- Department of Electrical Engineering, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, South Korea
| | - Jin-Hyeob Ryu
- BIORCHESTRA Co., Ltd., 17, Techno 4-ro, Yuseong-gu, Daejeon 34013, South Korea
- BIORCHESTRA US., Inc., 1 Kendall Square, Building 200, Suite 2-103, Cambridge, MA 02139, USA
| |
|
27
|
Dubovichenko MV, Batsa M, Bobkov G, Vlasov G, El-Deeb A, Kolpashchikov D. Multivalent DNAzyme agents for cleaving folded RNA. Nucleic Acids Res 2024; 52:5866-5879. [PMID: 38661191 PMCID: PMC11162777 DOI: 10.1093/nar/gkae295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Revised: 04/03/2024] [Accepted: 04/16/2024] [Indexed: 04/26/2024]
Abstract
Multivalent recognition and binding of biological molecules is a natural phenomenon that increases the binding stability (avidity) without decreasing the recognition specificity. In this study, we took advantage of this phenomenon to increase the efficiency and maintain high specificity of RNA cleavage by DNAzymes (Dz). We designed a series of DNA constructs containing two Dz agents, named here bivalent Dz devices (BDD). One BDD increased the cleavage efficiency of a folded RNA fragment up to 17-fold in comparison with the Dz of a conventional design. Such an increase was achieved due to both the improved RNA binding and the increased probability of RNA cleavage by the two catalytic cores. By moderating the degree of Dz agent association in BDD, we achieved excellent selectivity in differentiating single-base mismatched RNA, while maintaining relatively high cleavage rates. Furthermore, a trivalent Dz demonstrated an even greater efficiency than the BDD in cleaving folded RNA. The data suggests that the cooperative action of several RNA-cleaving units can significantly improve the efficiency and maintain high specificity of RNA cleavage, which is important for the development of Dz-based gene knockdown agents.
Affiliation(s)
- Mikhail V Dubovichenko
- Laboratory of Frontier Nucleic Acid Technologies in Gene Therapy of Cancer, SCAMT Institute, ITMO University, Saint-Petersburg, 191002, Russia
| | - Michael Batsa
- Laboratory of Frontier Nucleic Acid Technologies in Gene Therapy of Cancer, SCAMT Institute, ITMO University, Saint-Petersburg, 191002, Russia
| | - Gleb A Bobkov
- Laboratory of Frontier Nucleic Acid Technologies in Gene Therapy of Cancer, SCAMT Institute, ITMO University, Saint-Petersburg, 191002, Russia
| | - Gleb S Vlasov
- Laboratory of Frontier Nucleic Acid Technologies in Gene Therapy of Cancer, SCAMT Institute, ITMO University, Saint-Petersburg, 191002, Russia
| | - Ahmed A El-Deeb
- Laboratory of Frontier Nucleic Acid Technologies in Gene Therapy of Cancer, SCAMT Institute, ITMO University, Saint-Petersburg, 191002, Russia
| | - Dmitry M Kolpashchikov
- Laboratory of Frontier Nucleic Acid Technologies in Gene Therapy of Cancer, SCAMT Institute, ITMO University, Saint-Petersburg, 191002, Russia
- Chemistry Department, University of Central Florida, Orlando, FL 32816, USA
- Burnett School of Biomedical Sciences, University of Central Florida, Orlando, FL 32816, USA
- National Center for Forensic Science, University of Central Florida, Orlando, FL, 32816, USA
| |
|
28
|
Yu M, Zhou X, Chen D, Jiao Y, Han G, Tao F. HacA, a key transcription factor for the unfolded protein response, is required for fungal development, aflatoxin biosynthesis and pathogenicity of Aspergillus flavus. Int J Food Microbiol 2024; 417:110693. [PMID: 38653122 DOI: 10.1016/j.ijfoodmicro.2024.110693] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2023] [Revised: 03/16/2024] [Accepted: 04/02/2024] [Indexed: 04/25/2024]
Abstract
Aspergillus flavus is a fungus notorious for contaminating food and feed with aflatoxins. As a saprophytic fungus, it secretes large amounts of enzymes to access nutrients, making endoplasmic reticulum (ER) homeostasis important for protein folding and secretion. The role of HacA, a key transcription factor in the unfolded protein response pathway, remains poorly understood in A. flavus. In this study, the hacA gene in A. flavus was knocked out. Results showed that the absence of hacA led to decreased pathogenicity of the strain, as it failed to colonize intact maize kernels. This may be due to retarded vegetative growth, especially the abnormal development of swollen tips and shorter hyphal septa. Deletion of hacA also hindered conidiogenesis and sclerotial development. Notably, the mutant strain failed to produce aflatoxin B1. Moreover, compared to the wild type, the mutant strain showed increased sensitivity to ER stress inducers such as dithiothreitol (DTT) and to heat stress. It also displayed heightened sensitivity to other environmental stresses, including cell wall, osmotic, and pH stresses. Further transcriptomic analysis revealed the involvement of hacA in numerous biological processes, including filamentous growth, asexual reproduction, mycotoxin biosynthetic process, signal transduction, budding cell apical bud growth, invasive filamentous growth, and response to stimulus. Taken together, HacA plays a vital role in fungal development, pathogenicity and aflatoxin biosynthesis. This highlights the potential of targeting hacA as a novel approach for early prevention of A. flavus contamination.
Affiliation(s)
- Min Yu
- School of Life Sciences, Anhui Agricultural University, Hefei 230036, China
| | - Xiaoling Zhou
- School of Life Sciences, Anhui Agricultural University, Hefei 230036, China
| | - Dongyue Chen
- School of Life Sciences, Anhui Agricultural University, Hefei 230036, China
| | - Yuan Jiao
- School of Life Sciences, Anhui Agricultural University, Hefei 230036, China
| | - Guomin Han
- School of Life Sciences, Anhui Agricultural University, Hefei 230036, China; National Engineering Laboratory of Crop Stress Resistance Breeding, Anhui Agricultural University, Hefei 230036, China
| | - Fang Tao
- School of Life Sciences, Anhui Agricultural University, Hefei 230036, China.
| |
|
29
|
Yang S, Kim SH, Yang E, Kang M, Joo JY. Molecular insights into regulatory RNAs in the cellular machinery. Exp Mol Med 2024; 56:1235-1249. [PMID: 38871819 PMCID: PMC11263585 DOI: 10.1038/s12276-024-01239-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 12/18/2023] [Revised: 02/27/2024] [Accepted: 03/05/2024] [Indexed: 06/15/2024]
Abstract
It is apparent that various functional units within the cellular machinery are derived from RNAs. The evolution of sequencing techniques has resulted in significant insights into approaches for transcriptome studies. Organisms utilize RNA to govern cellular systems, and a heterogeneous class of RNAs is involved in regulatory functions. In particular, regulatory RNAs are increasingly recognized to participate in intricately functioning machinery across almost all levels of biological systems. These systems include those mediating chromatin arrangement, transcription, suborganelle stabilization, and posttranscriptional modifications. Any class of RNA exhibiting regulatory activity can be termed a class of regulatory RNA and is typically represented by noncoding RNAs, which constitute a substantial portion of the genome. These RNAs function based on the principle of structural changes through cis and/or trans regulation to facilitate mutual RNA‒RNA, RNA‒DNA, and RNA‒protein interactions. It has not been clearly elucidated whether regulatory RNAs identified through deep sequencing actually function in the anticipated mechanisms. This review addresses the dominant properties of regulatory RNAs at various layers of the cellular machinery and covers regulatory activities, structural dynamics, modifications, associated molecules, and further challenges related to therapeutics and deep learning.
Affiliation(s)
- Sumin Yang
- Department of Pharmacy, College of Pharmacy, Hanyang University, Ansan, Gyeonggi-do, 15588, Republic of Korea
| | - Sung-Hyun Kim
- Department of Pharmacy, College of Pharmacy, Hanyang University, Ansan, Gyeonggi-do, 15588, Republic of Korea
| | - Eunjeong Yang
- Department of Pharmacy, College of Pharmacy, Hanyang University, Ansan, Gyeonggi-do, 15588, Republic of Korea
| | - Mingon Kang
- Department of Computer Science, University of Nevada, Las Vegas, NV, 89154, USA
| | - Jae-Yeol Joo
- Department of Pharmacy, College of Pharmacy, Hanyang University, Ansan, Gyeonggi-do, 15588, Republic of Korea.
| |
|
30
|
Bernard C, Postic G, Ghannay S, Tahi F. State-of-the-RNArt: benchmarking current methods for RNA 3D structure prediction. NAR Genom Bioinform 2024; 6:lqae048. [PMID: 38745991 PMCID: PMC11091930 DOI: 10.1093/nargab/lqae048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Revised: 04/05/2024] [Accepted: 05/08/2024] [Indexed: 05/16/2024]
Abstract
RNAs are essential molecules involved in numerous biological functions. Understanding RNA functions requires knowledge of their 3D structures. Computational methods have been developed for over two decades to predict 3D conformations from RNA sequences. These methods have been widely used and are usually categorised as either ab initio or template-based. Their performance still leaves room for improvement. Recently, the rise of deep learning has changed the outlook for novel approaches. Deep learning methods are promising, but their adaptation to RNA 3D structure prediction remains difficult. In this paper, we give a brief review of the ab initio, template-based and novel deep learning approaches. We highlight the different available tools and provide a benchmark of nine methods using the RNA-Puzzles dataset. We provide an online dashboard that shows the predictions made by the benchmarked methods, freely available on the EvryRNA platform: https://evryrna.ibisc.univ-evry.fr/evryrna/state_of_the_rnart/.
Affiliation(s)
- Clément Bernard
- Université Paris-Saclay, Univ. Evry, IBISC, 91020 Evry-Courcouronnes, France
- LISN - CNRS/Université Paris-Saclay, 91400 Orsay, France
| | - Guillaume Postic
- Université Paris-Saclay, Univ. Evry, IBISC, 91020 Evry-Courcouronnes, France
| | - Sahar Ghannay
- LISN - CNRS/Université Paris-Saclay, 91400 Orsay, France
| | - Fariza Tahi
- Université Paris-Saclay, Univ. Evry, IBISC, 91020 Evry-Courcouronnes, France
| |
|
31
|
Bugnon LA, Di Persia L, Gerard M, Raad J, Prochetto S, Fenoy E, Chorostecki U, Ariel F, Stegmayer G, Milone DH. sincFold: end-to-end learning of short- and long-range interactions in RNA secondary structure. Brief Bioinform 2024; 25:bbae271. [PMID: 38855913 PMCID: PMC11163250 DOI: 10.1093/bib/bbae271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Revised: 05/03/2024] [Accepted: 05/24/2024] [Indexed: 06/11/2024]
Abstract
MOTIVATION: Coding and noncoding RNA molecules participate in many important biological processes. Noncoding RNAs fold into well-defined secondary structures to exert their functions. However, the computational prediction of the secondary structure from a raw RNA sequence is a long-standing unsolved problem, which after decades of almost unchanged performance has now re-emerged due to deep learning. Traditional RNA secondary structure prediction algorithms have been mostly based on thermodynamic models and dynamic programming for free energy minimization. More recently, deep learning methods have shown competitive performance compared with the classical ones, but there is still a wide margin for improvement. RESULTS: In this work we present sincFold, an end-to-end deep learning approach that predicts the nucleotide contact matrix using only the RNA sequence as input. The model is based on 1D and 2D residual neural networks that can learn short- and long-range interaction patterns. We show that structures can be accurately predicted with minimal physical assumptions. Extensive experiments were conducted on several benchmark datasets, considering sequence homology and cross-family validation. sincFold was compared with classical methods and recent deep learning models, showing that it can outperform the state-of-the-art methods.
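For readers unfamiliar with the contact-matrix representation mentioned in this abstract, the sketch below shows how a secondary structure written in dot-bracket notation maps to a symmetric 0/1 matrix. It is only an illustration of the data structure, not sincFold's code; the function name `dot_bracket_to_contacts` is hypothetical, and plain dot-bracket notation can express nested pairs only, so pseudoknots are out of scope here.

```python
def dot_bracket_to_contacts(structure):
    """Convert a dot-bracket string into a symmetric 0/1 contact matrix.

    Entry [i][j] is 1 when positions i and j form a base pair.
    Illustrative helper only (hypothetical name, nested pairs only).
    """
    n = len(structure)
    contacts = [[0] * n for _ in range(n)]
    stack = []  # positions of opening brackets that are still unmatched
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {j}")
            i = stack.pop()
            contacts[i][j] = 1
            contacts[j][i] = 1  # the matrix is symmetric by construction
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {j}")
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return contacts


# A six-nucleotide hairpin: positions 0-5 and 1-4 are paired.
example = dot_bracket_to_contacts("((..))")
```

A model that predicts such a matrix directly is not tied to the nested-pair restriction of the dot-bracket input used in this toy conversion.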
Affiliation(s)
- Leandro A Bugnon
- Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
| | - Leandro Di Persia
- Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
| | - Matias Gerard
- Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
| | - Jonathan Raad
- Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
| | - Santiago Prochetto
- Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
- Instituto de Agrobiotecnología del Litoral, CONICET-UNL, CCT-Santa Fe, Ruta Nacional N° 168 Km 0, s/n, Paraje el Pozo, 3000, Santa Fe, Argentina
| | - Emilio Fenoy
- Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
| | - Uciel Chorostecki
- Faculty of Medicine and Health Sciences, Universitat Internacional de Catalunya, Barcelona, Spain
| | - Federico Ariel
- Instituto de Agrobiotecnología del Litoral, CONICET-UNL, CCT-Santa Fe, Ruta Nacional N° 168 Km 0, s/n, Paraje el Pozo, 3000, Santa Fe, Argentina
| | - Georgina Stegmayer
- Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
| | - Diego H Milone
- Research Institute for Signals, Systems and Computational Intelligence, sinc(i), FICH-UNL, CONICET, Ciudad Universitaria UNL, 3000, Santa Fe, Argentina
| |
|
32
|
Yang TH. DEBFold: Computational Identification of RNA Secondary Structures for Sequences across Structural Families Using Deep Learning. J Chem Inf Model 2024; 64:3756-3766. [PMID: 38648189 PMCID: PMC11094721 DOI: 10.1021/acs.jcim.4c00458] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Revised: 04/09/2024] [Accepted: 04/09/2024] [Indexed: 04/25/2024]
Abstract
It is now known that RNAs play more active roles in cellular pathways beyond simply serving as transcription templates. These biological mechanisms might be mediated by higher RNA stereo conformations, triggering the need to understand RNA secondary structures first. However, experimental protocols for solving RNA structures are unavailable for large-scale investigation due to their high costs and time-consuming nature. Various computational tools were thus developed to predict the RNA secondary structures from sequences. Recently, deep networks have been investigated to help predict RNA structures directly from their sequences. However, existing deep-learning-based tools are more or less suffering from model overfitting due to their complicated problem formulation and defective model training processes, limiting their applications across sequences from different structural families. In this research, we designed a two-stage RNA structure prediction strategy called DEBFold (deep ensemble boosting and folding) based on convolution encoding/decoding and self-attention mechanisms to enhance the existing thermodynamic structure models. Moreover, the model training process followed rigorous steps to achieve an acceptable prediction generalization. On the family-wise reserved test sets and the PDB-derived test set, DEBFold achieves better structure prediction performance over traditional tools and existing deep-learning methods. In summary, we obtained a cutting-edge deep-learning-based structure prediction tool with supreme across-family generalization performance. The DEBFold tool can be accessed at https://cobis.bme.ncku.edu.tw/DEBFold/.
Affiliation(s)
- Tzu-Hsien Yang
- Department of Biomedical Engineering, National Cheng Kung University, No.1, University Road, Tainan 701, Taiwan
- Medical Device Innovation Center, National Cheng Kung University, No.1, University Road, Tainan 701, Taiwan
| |
|
33
|
Chen K, Litfin T, Singh J, Zhan J, Zhou Y. MARS and RNAcmap3: The Master Database of All Possible RNA Sequences Integrated with RNAcmap for RNA Homology Search. Genomics, Proteomics & Bioinformatics 2024; 22:qzae018. [PMID: 38872612 DOI: 10.1093/gpbjnl/qzae018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Revised: 09/24/2023] [Accepted: 10/31/2023] [Indexed: 06/15/2024]
Abstract
The recent success of AlphaFold2 in protein structure prediction relied heavily on co-evolutionary information derived from homologous protein sequences found in the huge, integrated database of protein sequences (Big Fantastic Database). In contrast, the existing nucleotide databases were not consolidated to facilitate wider and deeper homology search. Here, we built a comprehensive database by incorporating the non-coding RNA (ncRNA) sequences from RNAcentral, the transcriptome assembly and metagenome assembly from metagenomics RAST (MG-RAST), the genomic sequences from Genome Warehouse (GWH), and the genomic sequences from MGnify, in addition to the nucleotide (nt) database and its subsets in the National Center for Biotechnology Information (NCBI). The resulting Master database of All possible RNA sequences (MARS) is 20-fold larger than NCBI's nt database or 60-fold larger than RNAcentral. The new dataset along with a new split-search strategy allows a substantial improvement in homology search over existing state-of-the-art techniques. It also yields more accurate and more sensitive multiple sequence alignments (MSAs) than manually curated MSAs from Rfam for the majority of structured RNAs mapped to Rfam. The results indicate that MARS coupled with the fully automatic homology search tool RNAcmap will be useful for improved structural and functional inference of ncRNAs and RNA language models based on MSAs. MARS is accessible at https://ngdc.cncb.ac.cn/omix/release/OMIX003037, and RNAcmap3 is accessible at http://zhouyq-lab.szbl.ac.cn/download/.
Affiliation(s)
- Ke Chen
- Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, China
- Peking University Shenzhen Graduate School, Shenzhen 518055, China
- University of Science and Technology of China, Hefei 230026, China
- Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou 215123, China
| | - Thomas Litfin
- Institute for Glycomics, Griffith University, Southport, QLD 4222, Australia
| | - Jaswinder Singh
- Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, China
| | - Jian Zhan
- Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, China
| | - Yaoqi Zhou
- Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518055, China
- Peking University Shenzhen Graduate School, Shenzhen 518055, China
- Institute for Glycomics, Griffith University, Southport, QLD 4222, Australia
| |
|
34
|
Li X, Qu W, Yan J, Tan J. RPI-EDLCN: An Ensemble Deep Learning Framework Based on Capsule Network for ncRNA-Protein Interaction Prediction. J Chem Inf Model 2024; 64:2221-2235. [PMID: 37158609 DOI: 10.1021/acs.jcim.3c00377] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 05/10/2023]
Abstract
Noncoding RNAs (ncRNAs) play crucial roles in many cellular life activities by interacting with proteins. Identification of ncRNA-protein interactions (ncRPIs) is key to understanding the function of ncRNAs. Although a number of computational methods for predicting ncRPIs have been developed, the problem of predicting ncRPIs remains challenging. It has always been the focus of ncRPIs research to select suitable feature extraction methods and develop a deep learning architecture with better recognition performance. In this work, we proposed an ensemble deep learning framework, RPI-EDLCN, based on a capsule network (CapsuleNet) to predict ncRPIs. In terms of feature input, we extracted the sequence features, secondary structure sequence features, motif information, and physicochemical properties of ncRNA/protein. The sequence and secondary structure sequence features of ncRNA/protein are encoded by the conjoint k-mer method and then input into an ensemble deep learning model based on CapsuleNet by combining the motif information and physicochemical properties. In this model, the encoding features are processed by convolution neural network (CNN), deep neural network (DNN), and stacked autoencoder (SAE). Then the advanced features obtained from the processing are input into the CapsuleNet for further feature learning. Compared with other state-of-the-art methods under 5-fold cross-validation, the performance of RPI-EDLCN is the best, and the accuracy of RPI-EDLCN on RPI1807, RPI2241, and NPInter v2.0 data sets was 93.8%, 88.2%, and 91.9%, respectively. The results of the independent test indicated that RPI-EDLCN can effectively predict potential ncRPIs in different organisms. In addition, RPI-EDLCN successfully predicted hub ncRNAs and proteins in Mus musculus ncRNA-protein networks. Overall, our model can be used as an effective tool to predict ncRPIs and provides some useful guidance for future biological studies.
Affiliation(s)
- Xiaoyi Li
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China
| | - Wenyan Qu
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China
| | - Jing Yan
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China
| | - Jianjun Tan
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing University of Technology, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing 100124, China
| |
|
35
|
Chakraborty C, Bhattacharya M, Sharma AR, Chatterjee S, Agoramoorthy G, Lee SS. Structural Landscape of nsp Coding Genomic Regions of SARS-CoV-2-ssRNA Genome: A Structural Genomics Approach Toward Identification of Druggable Genome, Ligand-Binding Pockets, and Structure-Based Druggability. Mol Biotechnol 2024; 66:641-662. [PMID: 36463562 PMCID: PMC9735222 DOI: 10.1007/s12033-022-00605-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2022] [Accepted: 11/07/2022] [Indexed: 12/05/2022]
Abstract
SARS-CoV-2 has a single-stranded RNA genome (+ssRNA) and synthesizes structural and non-structural proteins (nsps). All 16 nsps are synthesized from the ORF1a and ORF1b regions, which are associated with different life cycle processes, including replication. The ORF1a region encodes nsp1 to 11, and ORF1b encodes nsp12 to 16. In this paper, we have predicted the secondary structure conformations, entropy and mountain plots, RNA secondary structure in a linear fashion, and 3D structure of the nsp coding genes of the SARS-CoV-2 genome. We have also analyzed the A, T, G, C, A+T, and G+C contents and the GC-profiling of these genes, showing that the GC content ranges from 34.23 to 48.52%. We observed that the GC-profile value of the nsp coding genomic regions (about 0.375) was lower than that of the whole genome (about 0.38). Additionally, druggable pockets were identified from the secondary structure-guided 3D structural conformations. To generate secondary structures for all the nsp coding genes (nsp 1-16), we used a recent deep learning-based tool along with conventional algorithms (centroid and MFE-based), and we found stem-loop, multi-branch loop, pseudoknot, and bulge structural components, among others. The 3D model shows bound and unbound forms, branched structures, duplex structures, three-way junctions, four-way junctions, etc. Finally, we identified binding pockets of nsp coding genes, which will serve as a fundamental resource for future researchers developing RNA-targeted therapeutics using the druggable genome.
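The base-composition figures quoted in this abstract (A, T, G, C, A+T and G+C percentages) follow from simple counting. The sketch below is a minimal illustration of that arithmetic, not the authors' pipeline; the function name `base_composition` is hypothetical, and treating U as T is an assumption made so that RNA and cDNA spellings give the same result.

```python
from collections import Counter


def base_composition(sequence):
    """Return the percentage of A, T, G, C, A+T and G+C in a sequence.

    Illustrative helper (hypothetical name). U is counted as T, and any
    other symbol (for example N) is ignored in the denominator.
    """
    counts = Counter(sequence.upper().replace("U", "T"))
    total = sum(counts[base] for base in "ATGC")
    if total == 0:
        raise ValueError("sequence contains no A, T/U, G or C")
    percent = {base: 100.0 * counts[base] / total for base in "ATGC"}
    percent["A+T"] = percent["A"] + percent["T"]
    percent["G+C"] = percent["G"] + percent["C"]
    return percent


# Toy sequence, not taken from the SARS-CoV-2 genome.
composition = base_composition("AUGGCC")
```

For the toy input, four of the six bases are G or C, so the G+C value is two thirds of 100.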
Affiliation(s)
- Chiranjib Chakraborty
- Department of Biotechnology, School of Life Science and Biotechnology, Adamas University, Kolkata, West Bengal, 700126, India.
| | - Manojit Bhattacharya
- Department of Zoology, Fakir Mohan University, Vyasa Vihar, Balasore, Odisha, 756020, India
| | - Ashish Ranjan Sharma
- Institute for Skeletal Aging & Orthopaedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon-si, Gangwon-do, 24252, Republic of Korea
| | - Srijan Chatterjee
- Department of Biotechnology, School of Life Science and Biotechnology, Adamas University, Kolkata, West Bengal, 700126, India
| | | | - Sang-Soo Lee
- Institute for Skeletal Aging & Orthopaedic Surgery, Hallym University-Chuncheon Sacred Heart Hospital, Chuncheon-si, Gangwon-do, 24252, Republic of Korea
| |
|
36
|
Szikszai M, Magnus M, Sanghi S, Kadyan S, Bouatta N, Rivas E. RNA3DB: A structurally-dissimilar dataset split for training and benchmarking deep learning models for RNA structure prediction. bioRxiv 2024:2024.01.30.578025. [PMID: 38352531 PMCID: PMC10862857 DOI: 10.1101/2024.01.30.578025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2024]
Abstract
With advances in protein structure prediction thanks to deep learning models like AlphaFold, RNA structure prediction has recently received increased attention from deep learning researchers. RNAs introduce substantial challenges due to the sparser availability and lower structural diversity of the experimentally resolved RNA structures in comparison to protein structures. These challenges are often poorly addressed in the existing literature, where many studies report inflated performance due to using training and testing sets with significant structural overlap. Further, the most recent Critical Assessment of Structure Prediction (CASP15) has shown that deep learning models for RNA structure are currently outperformed by traditional methods. In this paper we present RNA3DB, a dataset of structured RNAs, derived from the Protein Data Bank (PDB), that is designed for training and benchmarking deep learning models. The RNA3DB method arranges the RNA 3D chains into distinct groups (Components) that are non-redundant with regard to both sequence and structure, providing a robust way of dividing training, validation, and testing sets. Any split of these structurally-dissimilar Components is guaranteed to produce test and validation sets that are distinct by sequence and structure from those in the training set. We provide the RNA3DB dataset, a particular train/test split of the RNA3DB Components (in an approximate 70/30 ratio) that will be updated periodically. We also provide the RNA3DB methodology along with the source code, with the goal of creating a reproducible and customizable tool for producing structurally-dissimilar dataset splits for structural RNAs.
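The key property described in this abstract is that whole Components, never individual chains, are assigned to a split, so that train and test cannot share similar structures. The sketch below illustrates that idea with a simple greedy assignment toward an approximate 70/30 ratio. It is an assumption-laden toy, not the RNA3DB implementation: the function name `split_components`, the greedy strategy, and the component names and sizes are all hypothetical.

```python
def split_components(component_sizes, train_fraction=0.7):
    """Assign whole components to train or test, largest first.

    component_sizes maps a component name to its number of chains.
    A component is placed in train only if it still fits under the
    target size; otherwise it goes to test. Hypothetical sketch only.
    """
    total = sum(component_sizes.values())
    target = train_fraction * total
    train, test = [], []
    train_size = 0
    # Sort by decreasing size, then by name, so the result is deterministic.
    ordered = sorted(component_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, size in ordered:
        if train_size + size <= target:
            train.append(name)
            train_size += size
        else:
            test.append(name)
    return train, test


# Made-up component sizes, for illustration only.
sizes = {"c1": 50, "c2": 25, "c3": 15, "c4": 10}
train_set, test_set = split_components(sizes)
```

Because every chain of a component lands on the same side, the disjointness of the two sets holds regardless of which assignment heuristic is used.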
Affiliation(s)
- Marcell Szikszai
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, 02138, MA, USA
| | - Marcin Magnus
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, 02138, MA, USA
| | - Siddhant Sanghi
- Department of Systems Biology, Columbia University, New York, 10027, NY, USA
- College of Biological Sciences, UC Davis, Davis, 95616, CA, USA
| | - Sachin Kadyan
- Department of Systems Biology, Columbia University, New York, 10027, NY, USA
| | - Nazim Bouatta
- Laboratory of Systems Pharmacology, Harvard Medical School, Boston, 02115, MA, USA
| | - Elena Rivas
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, 02138, MA, USA
| |
|
37
|
Gong T, Ju F, Bu D. Accurate prediction of RNA secondary structure including pseudoknots through solving minimum-cost flow with learned potentials. Commun Biol 2024; 7:297. [PMID: 38461362 PMCID: PMC10924946 DOI: 10.1038/s42003-024-05952-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Accepted: 02/21/2024] [Indexed: 03/11/2024] Open
Abstract
Pseudoknots are key structure motifs of RNA and pseudoknotted RNAs play important roles in a variety of biological processes. Here, we present KnotFold, an accurate approach to the prediction of RNA secondary structure including pseudoknots. The key elements of KnotFold include a learned potential function and a minimum-cost flow algorithm to find the secondary structure with the lowest potential. KnotFold learns the potential from the RNAs with known structures using an attention-based neural network, thus avoiding the inaccuracy of hand-crafted energy functions. The specially designed minimum-cost flow algorithm used by KnotFold considers all possible combinations of base pairs and selects from them the optimal combination. The algorithm breaks the restriction of nested base pairs required by the widely used dynamic programming algorithms, thus enabling the identification of pseudoknots. Using 1,009 pseudoknotted RNAs as representatives, we demonstrate the successful application of KnotFold in predicting RNA secondary structures including pseudoknots with accuracy higher than the state-of-the-art approaches. We anticipate that KnotFold, with its superior accuracy, will greatly facilitate the understanding of RNA structures and functionalities.
Affiliation(s)
- Tiansu Gong
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 100190, Beijing, China
- University of Chinese Academy of Sciences, 100190, Beijing, China
| | - Fusong Ju
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 100190, Beijing, China
- University of Chinese Academy of Sciences, 100190, Beijing, China
| | - Dongbo Bu
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 100190, Beijing, China.
- University of Chinese Academy of Sciences, 100190, Beijing, China.
- Central China Artificial Intelligence Research Institute, Henan Academy of Sciences, Zhengzhou, 450046, Henan, China.
| |
|
38
|
Kang J, Wei S, Jia Z, Ma Y, Chen H, Sun C, Xu J, Tao J, Dong Y, Lv W, Tian H, Guo X, Bi S, Zhang C, Jiang Y, Lv H, Zhang M. Effects of genetic variation on the structure of RNA and protein. Proteomics 2024; 24:e2300235. [PMID: 38197532 DOI: 10.1002/pmic.202300235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Revised: 12/15/2023] [Accepted: 12/19/2023] [Indexed: 01/11/2024]
Abstract
Changes in the structure of RNA and protein have an important impact on biological functions and are even important determinants of disease pathogenesis and treatment. Some genetic variations, including copy number variation and single nucleotide variation, can lead to changes in biological function and increased susceptibility to certain diseases by changing the structure of RNA or protein. With the development of structural biology and sequencing technology, a large amount of RNA and protein structure data and genetic variation data has become available to help explain biological processes. Here, we review the effects of genetic variation on the structure of RNAs and proteins and examine their impact on several diseases. An online resource (http://www.onethird-lab.com/gems/) that supports convenient retrieval of common tools has also been built. Finally, the challenges and future development of research on the effects of genetic variation on RNA and protein are discussed.
Affiliation(s)
- Jingxuan Kang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- The Epigenome-Wide Association Study Project, Harbin, China
| | - Siyu Wei
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- The Epigenome-Wide Association Study Project, Harbin, China
| | - Zhe Jia
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- The Epigenome-Wide Association Study Project, Harbin, China
| | - Yingnan Ma
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- The Epigenome-Wide Association Study Project, Harbin, China
| | - Haiyan Chen
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- The Epigenome-Wide Association Study Project, Harbin, China
| | - Chen Sun
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- The Epigenome-Wide Association Study Project, Harbin, China
| | - Jing Xu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- The Epigenome-Wide Association Study Project, Harbin, China
| | - Junxian Tao
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- The Epigenome-Wide Association Study Project, Harbin, China
| | - Yu Dong
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- The Epigenome-Wide Association Study Project, Harbin, China
| | - Wenhua Lv
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Hongsheng Tian
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Xuying Guo
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Shuo Bi
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Chen Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Yongshuai Jiang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- The Epigenome-Wide Association Study Project, Harbin, China
| | - Hongchao Lv
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- The Epigenome-Wide Association Study Project, Harbin, China
| | - Mingming Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- The Epigenome-Wide Association Study Project, Harbin, China
| |
|
39
|
Busaranuvong P, Ammartayakun A, Korkin D, Khosravi-Far R. Graph Convolutional Network for predicting secondary structure of RNA. Research Square 2024:rs.3.rs-3798842. [PMID: 38464300 PMCID: PMC10925402 DOI: 10.21203/rs.3.rs-3798842/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
The prediction of RNA secondary structures is essential for understanding its underlying principles and applications in diverse fields, including molecular diagnostics and RNA-based therapeutic strategies. However, the complexity of the search space presents a challenge. This work proposes a Graph Convolutional Network (GCNfold) for predicting the RNA secondary structure. GCNfold considers an RNA sequence as graph-structured data and predicts posterior base-pairing probabilities given the prior base-pairing probabilities, calculated using McCaskill's partition function. The performance of GCNfold surpasses that of the state-of-the-art folding algorithms, as we have incorporated minimum free energy information into the richly parameterized network, enhancing its robustness in predicting non-homologous RNA secondary structures. A Symmetric Argmax Post-processing algorithm ensures that GCNfold formulates valid structures. To validate our algorithm, we applied it to the SARS-CoV-2 E gene and determined the secondary structure of the E-gene across the Betacoronavirus subgenera.
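The abstract names a Symmetric Argmax Post-processing step but does not describe it. The sketch below shows one plausible reading, offered purely as an assumption: the base-pairing probability matrix is symmetrized, and a pair (i, j) is kept only when i and j are each other's best partner and the averaged probability exceeds a threshold. The function name `symmetric_argmax_pairs`, the threshold of 0.5, and the example matrix are all hypothetical and should not be taken as GCNfold's actual algorithm.

```python
def symmetric_argmax_pairs(probs, threshold=0.5):
    """Pick base pairs by mutual argmax on a symmetrized probability matrix.

    probs is a square list of lists with pairing probabilities.
    Each position ends up in at most one pair, so the output is a valid
    matching. This is an assumed interpretation, not the published method.
    """
    n = len(probs)
    # Average the two directions so that sym[i][j] == sym[j][i].
    sym = [[(probs[i][j] + probs[j][i]) / 2 for j in range(n)] for i in range(n)]
    best = []
    for i in range(n):
        partners = [j for j in range(n) if j != i]
        best.append(max(partners, key=lambda j: sym[i][j]) if partners else None)
    pairs = []
    for i in range(n):
        j = best[i]
        # Keep the pair once (i < j), only if the choice is mutual and confident.
        if j is not None and i < j and best[j] == i and sym[i][j] > threshold:
            pairs.append((i, j))
    return pairs


# Toy 4x4 matrix: a strong 0-3 pair and a weak 1-2 pair.
toy = [
    [0.0, 0.0, 0.0, 0.9],
    [0.0, 0.0, 0.4, 0.0],
    [0.0, 0.4, 0.0, 0.0],
    [0.7, 0.0, 0.0, 0.0],
]
selected = symmetric_argmax_pairs(toy)
```

In the toy matrix the 0-3 pair averages to roughly 0.8 and is kept, while the 1-2 pair is mutual but falls below the threshold and is dropped.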
Affiliation(s)
- Palawat Busaranuvong
- Department of Data Science, Worcester Polytechnic Institute, Worcester, 01609, Massachusetts, USA
- InnoTech Precision Medicine, Boston, 02130, Massachusetts, USA
| | - Aukkawut Ammartayakun
- Department of Data Science, Worcester Polytechnic Institute, Worcester, 01609, Massachusetts, USA
| | - Dmitry Korkin
- Department of Computer Science, Worcester Polytechnic Institute, Worcester, 01609, Massachusetts, USA
| | | |
|
40
|
Ramakers J, Blum CF, König S, Harmeling S, Kollmann M. De novo prediction of RNA 3D structures with deep generative models. PLoS One 2024; 19:e0297105. [PMID: 38358972 PMCID: PMC10868834 DOI: 10.1371/journal.pone.0297105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Accepted: 12/24/2023] [Indexed: 02/17/2024] Open
Abstract
We present a Deep Learning approach to predict 3D folding structures of RNAs from their nucleic acid sequence. Our approach combines an autoregressive Deep Generative Model, Monte Carlo Tree Search, and a score model to find and rank the most likely folding structures for a given RNA sequence. We show that RNA de novo structure prediction by deep learning is possible at atom resolution, despite the low number of experimentally measured structures that can be used for training. We confirm the predictive power of our approach by achieving competitive results in a retrospective evaluation of the RNA-Puzzles prediction challenges, without using structural contact information from multiple sequence alignments or additional data from chemical probing experiments. Blind predictions for recent RNA-Puzzle challenges under the name "Dfold" further support the competitive performance of our approach.
Affiliation(s)
- Julius Ramakers
- Department of Computer Science, Heinrich-Heine-Universität Düsseldorf, Düsseldorf, Germany
| | | | - Sabrina König
- Department of Computer Science, Heinrich-Heine-Universität Düsseldorf, Düsseldorf, Germany
| | - Stefan Harmeling
- Department of Computer Science, Technical University Dortmund, Dortmund, Germany
| | - Markus Kollmann
- Department of Computer Science, Heinrich-Heine-Universität Düsseldorf, Düsseldorf, Germany
| |
|
41
|
Rinaldi S, Moroni E, Rozza R, Magistrato A. Frontiers and Challenges of Computing ncRNAs Biogenesis, Function and Modulation. J Chem Theory Comput 2024; 20:993-1018. [PMID: 38287883 DOI: 10.1021/acs.jctc.3c01239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/31/2024]
Abstract
Non-coding RNAs (ncRNAs) are generated from non-protein-coding DNA sequences, which constitute 98-99% of the human genome. Non-coding RNAs encompass diverse functional classes, including microRNAs, small interfering RNAs, PIWI-interacting RNAs, small nuclear RNAs, small nucleolar RNAs, and long non-coding RNAs. With critical involvement in gene expression and regulation across various biological and physiopathological contexts, such as neuronal disorders, immune responses, cardiovascular diseases, and cancer, non-coding RNAs are emerging as disease biomarkers and therapeutic targets. In this review, after providing an overview of non-coding RNAs' role in cell homeostasis, we illustrate the potential and the challenges of state-of-the-art computational methods exploited to study non-coding RNA biogenesis, function, and modulation. Modulation can be achieved by directly targeting ncRNAs with small molecules or by altering their expression by targeting the cellular engines underlying their biosynthesis. Drawing from applications, some taken from our own work, we showcase the significance and role of computer simulations in uncovering fundamental facets of ncRNA mechanisms and modulation. This information may set the basis to advance gene modulation tools and therapeutic strategies to address unmet medical needs.
Affiliation(s)
- Silvia Rinaldi
- National Research Council of Italy (CNR) - Institute of Chemistry of OrganoMetallic Compounds (ICCOM), c/o Area di Ricerca CNR di Firenze Via Madonna del Piano 10, 50019 Sesto Fiorentino, Florence, Italy
| | - Elisabetta Moroni
- National Research Council of Italy (CNR) - Institute of Chemical Sciences and Technologies (SCITEC), via Mario Bianco 9, 20131 Milano, Italy
| | - Riccardo Rozza
- National Research Council of Italy (CNR) - Institute of Material Foundry (IOM) c/o International School for Advanced Studies (SISSA), Via Bonomea, 265, 34136 Trieste, Italy
| | - Alessandra Magistrato
- National Research Council of Italy (CNR) - Institute of Material Foundry (IOM) c/o International School for Advanced Studies (SISSA), Via Bonomea, 265, 34136 Trieste, Italy
| |
|
42
|
Lu S, Tang Y, Yin S, Sun L. RNA structure: implications in viral infections and neurodegenerative diseases. Advanced Biotechnology 2024; 2:3. [PMID: 39883271 PMCID: PMC11740852 DOI: 10.1007/s44307-024-00010-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/11/2023] [Revised: 01/17/2024] [Accepted: 01/18/2024] [Indexed: 01/31/2025]
Abstract
RNA is an intermediary between DNA and protein, a catalyst of biochemical reactions, and a regulator of genes and transcripts. RNA structures are essential for these complex functions. Recent years have witnessed rapid advancements in RNA secondary structure probing techniques. These technological strides have provided comprehensive insights into RNA structures, which have significantly contributed to our understanding of diverse cellular regulatory processes, including gene regulation, epigenetic regulation, and post-transcriptional regulation. Meanwhile, they have facilitated the creation of therapeutic tools for tackling human diseases. Beyond their therapeutic applications, RNA structure probing methods also offer a promising avenue for exploring the mechanisms of human diseases, potentially providing the key to overcoming existing research constraints and obtaining the in-depth information necessary for a deeper understanding of disease mechanisms.
Affiliation(s)
- Suiru Lu
- Shandong Provincial Key Laboratory of Animal Cell and Developmental Biology, School of Life Sciences, Shandong University, Qingdao, 266237, China
- Taishan College, Shandong University, Qingdao, 266237, China
| | - Yongkang Tang
- Shandong Provincial Key Laboratory of Animal Cell and Developmental Biology, School of Life Sciences, Shandong University, Qingdao, 266237, China
| | - Shaozhen Yin
- Shandong Provincial Key Laboratory of Animal Cell and Developmental Biology, School of Life Sciences, Shandong University, Qingdao, 266237, China
| | - Lei Sun
- Pingyuan Laboratory, Xinxiang, Henan, 453007, China.
- Shandong Provincial Key Laboratory of Animal Cell and Developmental Biology, School of Life Sciences, Shandong University, Qingdao, 266237, China.
- Taishan College, Shandong University, Qingdao, 266237, China.
- State Key Laboratory of Microbial Technology, Shandong University, Qingdao, 266237, China.
| |
|
43
|
Zhang Y, Lang M, Jiang J, Gao Z, Xu F, Litfin T, Chen K, Singh J, Huang X, Song G, Tian Y, Zhan J, Chen J, Zhou Y. Multiple sequence alignment-based RNA language model and its application to structural inference. Nucleic Acids Res 2024; 52:e3. [PMID: 37941140 PMCID: PMC10783488 DOI: 10.1093/nar/gkad1031] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/10/2023] [Accepted: 10/21/2023] [Indexed: 11/10/2023] Open
Abstract
Compared with proteins, DNA and RNA are more difficult languages to interpret because four-letter coded DNA/RNA sequences have less information content than 20-letter coded protein sequences. While BERT (Bidirectional Encoder Representations from Transformers)-like language models have been developed for RNA, they are ineffective at capturing the evolutionary information from homologous sequences because unlike proteins, RNA sequences are less conserved. Here, we have developed an unsupervised multiple sequence alignment-based RNA language model (RNA-MSM) by utilizing homologous sequences from an automatic pipeline, RNAcmap, as it can provide significantly more homologous sequences than manually annotated Rfam. We demonstrate that the resulting unsupervised, two-dimensional attention maps and one-dimensional embeddings from RNA-MSM contain structural information. In fact, they can be directly mapped with high accuracy to 2D base pairing probabilities and 1D solvent accessibilities, respectively. Further fine-tuning led to significantly improved performance on these two downstream tasks compared with existing state-of-the-art techniques including SPOT-RNA2 and RNAsnap2. By comparison, RNA-FM, a BERT-based RNA language model, performs worse than one-hot encoding with its embedding in base pair and solvent-accessible surface area prediction. We anticipate that the pre-trained RNA-MSM model can be fine-tuned on many other tasks related to RNA structure and function.
Affiliation(s)
- Yikun Zhang
- School of Electronic and Computer Engineering, Peking University, Shenzhen 518055, China
- AI for Science (AI4S)-Preferred Program, Peking University Shenzhen Graduate School, Shenzhen 518055, China
| | - Mei Lang
- Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518107, China
| | - Jiuhong Jiang
- Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518107, China
| | - Zhiqiang Gao
- Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China
- Peng Cheng Laboratory, Shenzhen 518066, China
| | - Fan Xu
- Peng Cheng Laboratory, Shenzhen 518066, China
| | - Thomas Litfin
- Institute for Glycomics, Griffith University, Parklands Dr, Southport, QLD 4215, Australia
| | - Ke Chen
- Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518107, China
| | - Jaswinder Singh
- Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518107, China
| | | | - Guoli Song
- Peng Cheng Laboratory, Shenzhen 518066, China
| | | | - Jian Zhan
- Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518107, China
| | - Jie Chen
- School of Electronic and Computer Engineering, Peking University, Shenzhen 518055, China
- Peng Cheng Laboratory, Shenzhen 518066, China
| | - Yaoqi Zhou
- Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518107, China
- Institute for Glycomics, Griffith University, Parklands Dr, Southport, QLD 4215, Australia
| |
|
44
|
Dueñas Rey A, Del Pozo Valero M, Bouckaert M, Wood KA, Van den Broeck F, Daich Varela M, Thomas HB, Van Heetvelde M, De Bruyne M, Van de Sompele S, Bauwens M, Lenaerts H, Mahieu Q, Josifova D, Rivolta C, O'Keefe RT, Ellingford J, Webster AR, Arno G, Ayuso C, De Zaeytijd J, Leroy BP, De Baere E, Coppieters F. Combining a prioritization strategy and functional studies nominates 5'UTR variants underlying inherited retinal disease. Genome Med 2024; 16:7. [PMID: 38184646 PMCID: PMC10771650 DOI: 10.1186/s13073-023-01277-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Accepted: 12/15/2023] [Indexed: 01/08/2024] Open
Abstract
BACKGROUND 5' untranslated regions (5'UTRs) are essential modulators of protein translation. Predicting the impact of 5'UTR variants is challenging and rarely performed in routine diagnostics. Here, we present a combined approach of a comprehensive prioritization strategy and functional assays to evaluate 5'UTR variation in two large cohorts of patients with inherited retinal diseases (IRDs). METHODS We performed an isoform-level re-analysis of retinal RNA-seq data to identify the protein-coding transcripts of 378 IRD genes with highest expression in retina. We evaluated the coverage of their 5'UTRs by different whole exome sequencing (WES) kits. The selected 5'UTRs were analyzed in whole genome sequencing (WGS) and WES data from IRD sub-cohorts from the 100,000 Genomes Project (n = 2397 WGS) and an in-house database (n = 1682 WES), respectively. Identified variants were annotated for 5'UTR-relevant features and classified into seven categories based on their predicted functional consequence. We developed a variant prioritization strategy by integrating population frequency, specific criteria for each category, and family and phenotypic data. A selection of candidate variants underwent functional validation using diverse approaches. RESULTS Isoform-level re-quantification of retinal gene expression revealed 76 IRD genes with a non-canonical retina-enriched isoform, of which 20 display a fully distinct 5'UTR compared to that of their canonical isoform. Depending on the probe design, 3-20% of IRD genes have 5'UTRs fully captured by WES. After analyzing these regions in both cohorts, we prioritized 11 (likely) pathogenic variants in 10 genes (ARL3, MERTK, NDP, NMNAT1, NPHP4, PAX6, PRPF31, PRPF4, RDH12, RD3), of which 7 were novel. Functional analyses further supported the pathogenicity of three variants. Mis-splicing was demonstrated for the PRPF31:c.-9+1G>T variant. The MERTK:c.-125G>A variant, overlapping a transcriptional start site, was shown to significantly reduce both luciferase mRNA levels and activity. The RDH12:c.-123C>T variant was found in cis with the hypomorphic RDH12:c.701G>A (p.Arg234His) variant in 11 patients. This 5'UTR variant, predicted to introduce an upstream open reading frame, was shown to result in reduced RDH12 protein but unaltered mRNA levels. CONCLUSIONS This study demonstrates the importance of 5'UTR variants implicated in IRDs and provides a systematic approach for 5'UTR annotation and validation that is applicable to other inherited diseases.
Affiliation(s)
- Alfredo Dueñas Rey
- Center for Medical Genetics Ghent (CMGG), Ghent University Hospital, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, Ghent, 9000, Belgium
| | - Marta Del Pozo Valero
- Center for Medical Genetics Ghent (CMGG), Ghent University Hospital, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, Ghent, 9000, Belgium
- Department of Genetics, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz, University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), Madrid, Spain
| | - Manon Bouckaert
- Center for Medical Genetics Ghent (CMGG), Ghent University Hospital, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, Ghent, 9000, Belgium
| | - Katherine A Wood
- Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicines and Health, University of Manchester, Manchester, UK
| | - Filip Van den Broeck
- Department of Ophthalmology, Ghent University Hospital, Ghent, Belgium
- Department of Head & Skin, Ghent University, Ghent, Belgium
| | - Malena Daich Varela
- UCL Institute of Ophthalmology, University College London, London, UK
- Moorfields Eye Hospital, London, UK
| | - Huw B Thomas
- Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicines and Health, University of Manchester, Manchester, UK
| | - Mattias Van Heetvelde
- Center for Medical Genetics Ghent (CMGG), Ghent University Hospital, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, Ghent, 9000, Belgium
| | - Marieke De Bruyne
- Center for Medical Genetics Ghent (CMGG), Ghent University Hospital, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, Ghent, 9000, Belgium
| | - Stijn Van de Sompele
- Center for Medical Genetics Ghent (CMGG), Ghent University Hospital, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, Ghent, 9000, Belgium
| | - Miriam Bauwens
- Center for Medical Genetics Ghent (CMGG), Ghent University Hospital, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, Ghent, 9000, Belgium
| | - Hanne Lenaerts
- Center for Medical Genetics Ghent (CMGG), Ghent University Hospital, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, Ghent, 9000, Belgium
| | - Quinten Mahieu
- Center for Medical Genetics Ghent (CMGG), Ghent University Hospital, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, Ghent, 9000, Belgium
| | | | - Carlo Rivolta
- Department of Ophthalmology, University of Basel, Basel, Switzerland
- Institute of Molecular and Clinical Ophthalmology Basel (IOB), Basel, Switzerland
- Department of Genetics and Genome Biology, University of Leicester, Leicester, UK
| | - Raymond T O'Keefe
- Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicines and Health, University of Manchester, Manchester, UK
| | - Jamie Ellingford
- Division of Evolution, Infection and Genomics, School of Biological Sciences, Faculty of Biology, Medicines and Health, University of Manchester, Manchester, UK
- Genomics England, London, UK
- Manchester Centre for Genomic Medicine, St Mary's Hospital, Manchester University NHS Foundation Trust, Manchester, UK
| | - Andrew R Webster
- UCL Institute of Ophthalmology, University College London, London, UK
- Moorfields Eye Hospital, London, UK
| | - Gavin Arno
- UCL Institute of Ophthalmology, University College London, London, UK
- Moorfields Eye Hospital, London, UK
| | - Carmen Ayuso
- Department of Genetics, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz, University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), Madrid, Spain
- Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III, Madrid, Spain
| | - Julie De Zaeytijd
- Department of Ophthalmology, Ghent University Hospital, Ghent, Belgium
- Department of Head & Skin, Ghent University, Ghent, Belgium
| | - Bart P Leroy
- Center for Medical Genetics Ghent (CMGG), Ghent University Hospital, Ghent, Belgium
- Department of Ophthalmology, Ghent University Hospital, Ghent, Belgium
- Department of Head & Skin, Ghent University, Ghent, Belgium
- Division of Ophthalmology, Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Elfride De Baere
- Center for Medical Genetics Ghent (CMGG), Ghent University Hospital, Ghent, Belgium
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, Ghent, 9000, Belgium
| | - Frauke Coppieters
- Center for Medical Genetics Ghent (CMGG), Ghent University Hospital, Ghent, Belgium.
- Department of Biomolecular Medicine, Ghent University, Corneel Heymanslaan 10, Ghent, 9000, Belgium.
- Department of Pharmaceutics, Ghent University, Ghent, Belgium.
| |
|
45
|
Nedorezova DD, Dubovichenko MV, Kalnin AJ, Nour MAY, Eldeeb AA, Ashmarova AI, Kurbanov GF, Kolpashchikov DM. Cleaving Folded RNA with DNAzyme Agents. Chembiochem 2024; 25:e202300637. [PMID: 37870555 DOI: 10.1002/cbic.202300637] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2023] [Revised: 10/17/2023] [Accepted: 10/23/2023] [Indexed: 10/24/2023]
Abstract
Cleavage of biological mRNA by DNAzymes (Dz) has been proposed as a variation of oligonucleotide gene therapy (OGT). The design of Dz-based OGT agents includes computational prediction of two RNA-binding arms with low affinity (melting temperatures (Tm) close to the reaction temperature of 37 °C) to avoid product inhibition and maintain high specificity. However, RNA cleavage might be limited by the RNA binding step, especially if the RNA is folded into secondary structures. This calls for two high-affinity RNA-binding arms. In this study, we optimized 10-23 Dz-based OGT agents for cleavage of three RNA targets with different folding energies under multiple turnover conditions in 2 mM Mg2+ at 37 °C. Unexpectedly, one optimized Dz had each RNA-binding arm with a Tm ≥60 °C, without suffering from product inhibition or low selectivity. This phenomenon was explained by the folding of the RNA cleavage products into stable secondary structures. This result suggests that Dz with long (high-affinity) RNA-binding arms should not be excluded from the candidate pool for OGT agents. Rather, analysis of the cleavage products' folding should be included in Dz selection algorithms. The Dz optimization workflow should include testing with folded rather than linear RNA substrates.
Affiliation(s)
- Daria D Nedorezova
- Laboratory of molecular robotics and biosensor systems, Laboratory of Frontier nucleic acid technologies in gene therapy of cancer, SCAMT Institute, ITMO University, St. Petersburg, 191002, Russian Federation
| | - Mikhail V Dubovichenko
- Laboratory of molecular robotics and biosensor systems, Laboratory of Frontier nucleic acid technologies in gene therapy of cancer, SCAMT Institute, ITMO University, St. Petersburg, 191002, Russian Federation
| | - Arseniy J Kalnin
- Laboratory of molecular robotics and biosensor systems, Laboratory of Frontier nucleic acid technologies in gene therapy of cancer, SCAMT Institute, ITMO University, St. Petersburg, 191002, Russian Federation
| | - Moustapha A Y Nour
- Laboratory of molecular robotics and biosensor systems, Laboratory of Frontier nucleic acid technologies in gene therapy of cancer, SCAMT Institute, ITMO University, St. Petersburg, 191002, Russian Federation
| | - Ahmed A Eldeeb
- Laboratory of molecular robotics and biosensor systems, Laboratory of Frontier nucleic acid technologies in gene therapy of cancer, SCAMT Institute, ITMO University, St. Petersburg, 191002, Russian Federation
| | - Anna I Ashmarova
- Laboratory of molecular robotics and biosensor systems, Laboratory of Frontier nucleic acid technologies in gene therapy of cancer, SCAMT Institute, ITMO University, St. Petersburg, 191002, Russian Federation
| | - Gabdulla F Kurbanov
- Laboratory of molecular robotics and biosensor systems, Laboratory of Frontier nucleic acid technologies in gene therapy of cancer, SCAMT Institute, ITMO University, St. Petersburg, 191002, Russian Federation
| | - Dmitry M Kolpashchikov
- Laboratory of molecular robotics and biosensor systems, Laboratory of Frontier nucleic acid technologies in gene therapy of cancer, SCAMT Institute, ITMO University, St. Petersburg, 191002, Russian Federation
- Chemistry Department, University of Central Florida, Orlando, FL 32816-2366, USA
- Burnett School of Biomedical Sciences, University of Central Florida, Orlando, FL 32816, USA
| |
|
46
|
Ma J, Tsuboi T. Efficient Prediction Model of mRNA End-to-End Distance and Conformation: Three-Dimensional RNA Illustration Program (TRIP). Methods Mol Biol 2024; 2784:191-200. [PMID: 38502487 DOI: 10.1007/978-1-0716-3766-1_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/21/2024]
Abstract
The secondary and tertiary structures of RNA play a vital role in the regulation of biological reactions. These structures have been experimentally studied through in vivo and in vitro analyses, and in silico models have become increasingly accurate in predicting them. Recent technologies have diversified RNA structure predictions, from the earliest thermodynamic and molecular dynamics-based RNA structure predictions to deep learning-based conformation predictions in the past decade. While most research on RNA structure prediction has focused on short non-coding RNAs, there has been limited research on predicting the conformation of longer mRNAs. Our study introduces a computer simulation model called the Three-dimensional RNA Illustration Program (TRIP). TRIP is based on single-chain models and angle restriction of each bead component from previously reported single-molecule fluorescence in situ hybridization (smFISH) experiments. TRIP is a fast and efficient application that only requires up to three inputs to acquire outputs. It can also provide a rough visualization of the 3D conformation of RNA, making it a valuable tool for predicting RNA end-to-end distance.
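The idea of a bead chain with a restricted angle between consecutive segments can be illustrated with a minimal sketch. This is not TRIP's actual algorithm or interface: the function names, the uniform sampling of the bending angle, and the parameters `n_beads`, `bond_length`, and `max_angle_deg` are all assumptions made for illustration only.

```python
import math
import random


def _cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _perpendicular_basis(u):
    """Return unit vectors v, w such that (u, v, w) is orthonormal."""
    # Use a coordinate axis that is not nearly parallel to u.
    axis = (1.0, 0.0, 0.0) if abs(u[0]) < 0.9 else (0.0, 1.0, 0.0)
    v = _cross(u, axis)
    norm = math.sqrt(sum(c * c for c in v))
    v = tuple(c / norm for c in v)
    return v, _cross(u, v)


def end_to_end_distance(n_beads, bond_length, max_angle_deg, rng):
    """Grow one bead chain and return the distance from first to last bead.

    Hypothetical model: each new segment deviates from the previous one by
    an angle drawn uniformly from [0, max_angle_deg], in a random direction.
    """
    max_angle = math.radians(max_angle_deg)
    pos = [0.0, 0.0, 0.0]
    direction = (0.0, 0.0, 1.0)
    for bond in range(n_beads - 1):
        if bond > 0:  # the first segment keeps the initial direction
            theta = rng.uniform(0.0, max_angle)
            phi = rng.uniform(0.0, 2.0 * math.pi)
            v, w = _perpendicular_basis(direction)
            direction = tuple(
                math.cos(theta) * d
                + math.sin(theta) * (math.cos(phi) * a + math.sin(phi) * b)
                for d, a, b in zip(direction, v, w)
            )
        pos = [p + bond_length * d for p, d in zip(pos, direction)]
    return math.sqrt(sum(p * p for p in pos))


def mean_end_to_end(n_chains, n_beads, bond_length, max_angle_deg, seed=0):
    """Average end-to-end distance over n_chains independently grown chains."""
    rng = random.Random(seed)
    total = sum(end_to_end_distance(n_beads, bond_length, max_angle_deg, rng)
                for _ in range(n_chains))
    return total / n_chains
```

In this sketch, a `max_angle_deg` of 0 gives a fully extended chain whose end-to-end distance equals the contour length, and larger allowed angles give more compact conformations on average.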
Affiliation(s)
- Jiayun Ma
- Institute of Biopharmaceutical and Health Engineering, Tsinghua Shenzhen International Graduate School, University Town of Shenzhen, Shenzhen, Guangdong, China
| | - Tatsuhisa Tsuboi
- Institute of Biopharmaceutical and Health Engineering, Tsinghua Shenzhen International Graduate School, University Town of Shenzhen, Shenzhen, Guangdong, China.
| |
|
47
|
Rocca R, Grillone K, Citriniti EL, Gualtieri G, Artese A, Tagliaferri P, Tassone P, Alcaro S. Targeting non-coding RNAs: Perspectives and challenges of in-silico approaches. Eur J Med Chem 2023; 261:115850. [PMID: 37839343 DOI: 10.1016/j.ejmech.2023.115850] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/25/2023] [Revised: 09/08/2023] [Accepted: 09/29/2023] [Indexed: 10/17/2023]
Abstract
The growing information currently available on the central role of non-coding RNAs (ncRNAs), including microRNAs (miRNAs) and long non-coding RNAs (lncRNAs), in chronic and degenerative human diseases makes them attractive therapeutic targets. RNAs carry out different functional roles in human biology and are deeply deregulated in several diseases. So far, different attempts to therapeutically target 3D RNA structures with small molecules have been reported. In this scenario, the development of computational tools suitable for describing RNA structures and their potential interactions with small molecules is attracting increasing interest. Here, we describe the most suitable strategies to study ncRNAs through computational tools. We focus on methods capable of predicting 2D and 3D ncRNA structures. Furthermore, we describe computational tools to identify, design and optimize small molecule ncRNA binders. This review aims to outline the state of the art and perspectives of computational methods for ncRNAs over the past decade.
Affiliation(s)
- Roberta Rocca
- Department of Health Science, Magna Graecia University, Catanzaro, Italy; Net4Science srl, Academic Spinoff, Magna Græcia University, Catanzaro, Italy
| | - Katia Grillone
- Department of Experimental and Clinical Medicine, Magna Græcia University, Catanzaro, Italy
| | | | | | - Anna Artese
- Department of Health Science, Magna Graecia University, Catanzaro, Italy; Net4Science srl, Academic Spinoff, Magna Græcia University, Catanzaro, Italy.
| | | | - Pierfrancesco Tassone
- Department of Experimental and Clinical Medicine, Magna Græcia University, Catanzaro, Italy
| | - Stefano Alcaro
- Department of Health Science, Magna Graecia University, Catanzaro, Italy; Net4Science srl, Academic Spinoff, Magna Græcia University, Catanzaro, Italy
| |
|
48
|
Nasaev SS, Mukanov AR, Kuznetsov II, Veselovsky AV. AliNA - a deep learning program for RNA secondary structure prediction. Mol Inform 2023; 42:e202300113. [PMID: 37710142 DOI: 10.1002/minf.202300113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2023] [Revised: 09/13/2023] [Accepted: 09/14/2023] [Indexed: 09/16/2023]
Abstract
Numerous natural RNA variants that participate in different cellular processes have now been discovered, along with artificial RNAs, e.g., aptamers and riboswitches. Investigating their functions, their mechanisms of influence on cells, and their interactions with targets requires prediction of RNA secondary structures. The classic thermodynamics-based prediction algorithms do not consider the specificity of biological folding, and the deep learning methods that were designed to resolve this issue suffer from the problems of homology-based methods. Herein, we present a method for RNA secondary structure prediction based on deep learning - AliNA (ALIgned Nucleic Acids). Our method successfully predicts secondary structures for RNA families that are non-homologous to the training data, thanks to the use of data augmentation techniques. Augmentation extends existing datasets with easily accessible simulated data. The proposed method shows high prediction quality across different benchmarks, including those with pseudoknots. The method is freely available on GitHub (https://github.com/Arty40m/AliNA).
Affiliation(s)
- Shamsudin S Nasaev
- Institute of Biomedical Chemistry, 10, Pogodinskaya str., 119121, Moscow, Russia
| | - Artem R Mukanov
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, 18, Kremlyovskaya str., 420008, Kazan, Russia
| | - Ivan I Kuznetsov
- Moscow University of Finance and Law, 10 block 1, Serpuhovsky val str., 115191, Moscow, Russia
| | | |
|
49
|
Qiu J, Li L, Sun J, Peng J, Shi P, Zhang R, Dong Y, Lam K, Lo FPW, Xiao B, Yuan W, Wang N, Xu D, Lo B. Large AI Models in Health Informatics: Applications, Challenges, and the Future. IEEE J Biomed Health Inform 2023; 27:6074-6087. [PMID: 37738186 DOI: 10.1109/jbhi.2023.3316750] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 09/24/2023]
Abstract
Large AI models, or foundation models, are models recently emerging with massive scales both parameter-wise and data-wise, the magnitudes of which can reach beyond billions. Once pretrained, large AI models demonstrate impressive performance in various downstream tasks. A prime example is ChatGPT, whose capability has compelled people's imagination about the far-reaching influence that large AI models can have and their potential to transform different domains of our lives. In health informatics, the advent of large AI models has brought new paradigms for the design of methodologies. The scale of multi-modal data in the biomedical and health domain has been ever-expanding especially since the community embraced the era of deep learning, which provides the ground to develop, validate, and advance large AI models for breakthroughs in health-related areas. This article presents a comprehensive review of large AI models, from background to their applications. We identify seven key sectors in which large AI models are applicable and might have substantial influence, including: 1) bioinformatics; 2) medical diagnosis; 3) medical imaging; 4) medical informatics; 5) medical education; 6) public health; and 7) medical robotics. We examine their challenges, followed by a critical discussion about potential future directions and pitfalls of large AI models in transforming the field of health informatics.
|
50
|
Binet T, Padiolleau-Lefèvre S, Octave S, Avalle B, Maffucci I. Comparative Study of Single-stranded Oligonucleotides Secondary Structure Prediction Tools. BMC Bioinformatics 2023; 24:422. [PMID: 37940855 PMCID: PMC10634105 DOI: 10.1186/s12859-023-05532-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Accepted: 10/13/2023] [Indexed: 11/10/2023] Open
Abstract
BACKGROUND Single-stranded nucleic acids (ssNAs) have important biological roles and a high biotechnological potential linked to their ability to bind to numerous molecular targets. This depends on the different spatial conformations they can assume. The first level of ssNA spatial organisation corresponds to their base-pairing pattern, i.e. their secondary structure. Many computational tools have been developed to predict ssNA secondary structures, making the choice of the appropriate tool difficult, and an up-to-date guide on the limits and applicability of current secondary structure prediction tools is missing. Therefore, we performed a comparative study of the performance of 9 freely available tools (mfold, RNAfold, CentroidFold, CONTRAfold, MC-Fold, LinearFold, UFold, SPOT-RNA, and MXfold2) on a dataset of 538 ssNAs with known experimental secondary structure. RESULTS The minimum free energy-based tools, namely mfold and RNAfold, and some tools based on artificial intelligence, namely CONTRAfold and MXfold2, provided the best results, with [Formula: see text] of exact predictions, whilst MC-Fold seemed to be the worst performing tool, with only [Formula: see text] of exact predictions. In addition, UFold and SPOT-RNA are the only options for pseudoknot prediction. Including 5-10 suboptimal solutions in the analysis of the mfold and RNAfold results further improved the performance of these tools. Nevertheless, we could observe issues in predicting particular motifs, such as multi-way junctions and mini-dumbbells, or ssNAs whose structure has been determined in complex with a protein. In addition, our benchmark shows that further effort is needed for ssDNA secondary structure prediction. CONCLUSIONS In general, mfold, RNAfold, and MXfold2 currently seem to be the best choice for ssNA secondary structure prediction, although they still show some limits linked to specific structural motifs. Nevertheless, current trends suggest that artificial intelligence has a high potential to overcome these remaining issues; for example, the recently developed UFold and SPOT-RNA have a high success rate in predicting pseudoknots.
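A benchmark of this kind compares each predicted secondary structure with the experimental one. The sketch below shows one common way to do so with dot-bracket strings. It assumes that an "exact prediction" means the predicted set of base pairs equals the reference set; the study's own scoring procedure may differ. The function names are hypothetical, and the extra bracket types `[]`, `{}`, `<>` follow the usual convention for pseudoknotted pairs.

```python
def pairs_from_dot_bracket(structure):
    """Convert a dot-bracket string into a set of (i, j) base-pair indices."""
    openers = {"(": ")", "[": "]", "{": "}", "<": ">"}
    closers = {close: open_ for open_, close in openers.items()}
    stacks = {open_: [] for open_ in openers}
    pairs = set()
    for i, ch in enumerate(structure):
        if ch in openers:
            stacks[ch].append(i)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unmatched '{ch}' at position {i}")
            pairs.add((stack.pop(), i))
        # any other character (e.g. '.') is treated as unpaired
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return pairs


def compare_structures(predicted, reference):
    """Return exact-match flag plus base-pair precision, recall and F1."""
    if len(predicted) != len(reference):
        raise ValueError("structures must have the same length")
    pred = pairs_from_dot_bracket(predicted)
    ref = pairs_from_dot_bracket(reference)
    true_pos = len(pred & ref)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"exact": pred == ref, "precision": precision,
            "recall": recall, "f1": f1}
```

For example, `compare_structures("(....)", "((..))")` reports a non-exact prediction with precision 1.0 and recall 0.5, since the single predicted pair is correct but one reference pair is missing.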
Affiliation(s)
- Thomas Binet
- Université de technologie de Compiègne, UPJV, CNRS, Enzyme and Cell Engineering, Centre de recherche Royallieu - CS 60 319, 60203, Compiègne Cedex, France
| | - Séverine Padiolleau-Lefèvre
- Université de technologie de Compiègne, UPJV, CNRS, Enzyme and Cell Engineering, Centre de recherche Royallieu - CS 60 319, 60203, Compiègne Cedex, France
| | - Stéphane Octave
- Université de technologie de Compiègne, UPJV, CNRS, Enzyme and Cell Engineering, Centre de recherche Royallieu - CS 60 319, 60203, Compiègne Cedex, France
| | - Bérangère Avalle
- Université de technologie de Compiègne, UPJV, CNRS, Enzyme and Cell Engineering, Centre de recherche Royallieu - CS 60 319, 60203, Compiègne Cedex, France.
| | - Irene Maffucci
- Université de technologie de Compiègne, UPJV, CNRS, Enzyme and Cell Engineering, Centre de recherche Royallieu - CS 60 319, 60203, Compiègne Cedex, France.
| |
|