1. Angelo M, Bhargava Y, Aoki ST. A primer for junior trainees: Recognition of RNA modifications by RNA-binding proteins. Biochem Mol Biol Educ 2024. [PMID: 39037148 DOI: 10.1002/bmb.21854] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Revised: 06/19/2024] [Accepted: 07/12/2024] [Indexed: 07/23/2024]
Abstract
The complexity of RNA cannot be fully expressed with the canonical A, C, G, and U alphabet. To date, over 170 distinct chemical modifications to RNA have been discovered in living systems. RNA modifications can profoundly impact the cellular outcomes of messenger RNAs (mRNAs), transfer and ribosomal RNAs, and noncoding RNAs. Additionally, aberrant RNA modifications are associated with human disease. RNA modifications are a rising topic within the fields of biochemistry and molecular biology. The role of RNA modifications in gene regulation, disease pathogenesis, and therapeutic applications increasingly captures the attention of the scientific community. This review aims to provide undergraduates, junior trainees, and educators with an appreciation for the significance of RNA modifications in eukaryotic organisms, alongside the skills required to identify and analyze fundamental RNA-protein interactions. The pumilio RNA-binding protein and YT521-B homology (YTH) family of modified RNA-binding proteins serve as examples to highlight the fundamental biochemical interactions that underlie the specific recognition of both unmodified and modified ribonucleotides, respectively. By instilling these foundational, textbook concepts through practical examples, this review contributes an analytical toolkit that facilitates engagement with RNA modifications research at large.
Affiliation(s)
- Murphy Angelo
- Department of Biochemistry and Molecular Biology, School of Medicine, Indiana University Purdue University Indianapolis, Indianapolis, Indiana, USA
- Yash Bhargava
- Department of Biochemistry and Molecular Biology, School of Medicine, Indiana University Purdue University Indianapolis, Indianapolis, Indiana, USA
- Scott Takeo Aoki
- Department of Biochemistry and Molecular Biology, School of Medicine, Indiana University Purdue University Indianapolis, Indianapolis, Indiana, USA
2. Hwang H, Jeon H, Yeo N, Baek D. Big data and deep learning for RNA biology. Exp Mol Med 2024; 56:1293-1321. [PMID: 38871816 PMCID: PMC11263376 DOI: 10.1038/s12276-024-01243-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/31/2024] [Revised: 02/27/2024] [Accepted: 03/05/2024] [Indexed: 06/15/2024] Open
Abstract
The exponential growth of big data in RNA biology (RB) has led to the development of deep learning (DL) models that have driven crucial discoveries. As constantly evidenced by DL studies in other fields, the successful implementation of DL in RB depends heavily on the effective utilization of large-scale datasets from public databases. In achieving this goal, data encoding methods, learning algorithms, and techniques that align well with biological domain knowledge have played pivotal roles. In this review, we provide guiding principles for applying these DL concepts to various problems in RB by demonstrating successful examples and associated methodologies. We also discuss the remaining challenges in developing DL models for RB and suggest strategies to overcome these challenges. Overall, this review aims to illuminate the compelling potential of DL for RB and ways to apply this powerful technology to investigate the intriguing biology of RNA more effectively.
Affiliation(s)
- Hyeonseo Hwang
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Hyeonseong Jeon
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Genome4me Inc., Seoul, Republic of Korea
- Nagyeong Yeo
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Daehyun Baek
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Genome4me Inc., Seoul, Republic of Korea
3. Song M, Zhao J, Zhang C, Jia C, Yang J, Zhao H, Zhai J, Lei B, Tao S, Chen S, Su R, Ma C. PEA-m6A: an ensemble learning framework for accurately predicting N6-methyladenosine modifications in plants. Plant Physiol 2024; 195:1200-1213. [PMID: 38428981 DOI: 10.1093/plphys/kiae120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Revised: 01/11/2024] [Accepted: 02/01/2024] [Indexed: 03/03/2024]
Abstract
N6-methyladenosine (m6A), the most prevalent modification in eukaryotic mRNAs, is involved in gene expression regulation and many RNA metabolism processes. Accurate prediction of m6A modification is important for understanding its molecular mechanisms in different biological contexts. However, most existing models have a limited range of application and are species-centric. Here we present PEA-m6A, a unified, modularized and parameterized framework that can streamline m6A-Seq data analysis for predicting m6A-modified regions in plant genomes. The PEA-m6A framework builds ensemble learning-based m6A prediction models with statistic-based and deep learning-driven features, achieving superior performance with an improvement of 6.7% to 23.3% in the area under the precision-recall curve compared with the state-of-the-art regional-scale m6A predictor WeakRM in 12 plant species. In particular, PEA-m6A can leverage knowledge from pretrained models via transfer learning, which improves the prediction accuracy of m6A modifications in small-sample training tasks. PEA-m6A also has a strong capability for generalization, making it suitable for within- and cross-species m6A prediction. Overall, this study presents a promising m6A prediction tool, PEA-m6A, with outstanding performance in terms of its accuracy, flexibility, transferability, and generalization ability. PEA-m6A has been packaged using Galaxy and Docker technologies for ease of use and is publicly available at https://github.com/cma2015/PEA-m6A.
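The comparison metric named in this abstract, the area under the precision-recall curve, can be illustrated with a small sketch. The code below is not taken from PEA-m6A; it computes average precision, one common step-wise estimate of that area, on made-up scores and labels. The function name `average_precision` and the toy data are assumptions introduced for illustration only.

```python
def average_precision(scores, labels):
    """Step-wise estimate of the area under the precision-recall curve.

    scores: predicted scores, higher means more likely m6A-modified.
    labels: 1 for a truly modified region, 0 otherwise (same order as scores).
    Assumes no tied scores; ties would need an explicit convention.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank regions from the highest to the lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_pos = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            # Precision at the rank where this positive is recovered.
            total += true_pos / rank
    return total / n_pos


# Hypothetical toy data: five candidate regions, three truly modified.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.2]
toy_labels = [1, 0, 1, 1, 0]
toy_ap = average_precision(toy_scores, toy_labels)
```

On this toy ranking the positives are recovered at ranks 1, 3 and 4, so the estimate is the mean of 1/1, 2/3 and 3/4. Because precision depends on how rare the positive class is, this metric is often preferred over AUROC when modified regions are a small fraction of all candidates.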
Affiliation(s)
- Minggui Song
- State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi 712100, China
- Jiawen Zhao
- State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi 712100, China
- Chujun Zhang
- State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi 712100, China
- Chengchao Jia
- State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
- Jing Yang
- State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi 712100, China
- Haonan Zhao
- State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
- Jingjing Zhai
- State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
- Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853, USA
- Beilei Lei
- State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi 712100, China
- Shiheng Tao
- State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
- Siqi Chen
- School of Computer Software, College of Intelligence and Computing, Tianjin University, Tianjin 300072, China
- Ran Su
- School of Computer Software, College of Intelligence and Computing, Tianjin University, Tianjin 300072, China
- Chuang Ma
- State Key Laboratory of Crop Stress Resistance and High-Efficiency Production, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, Shaanxi 712100, China
- Key Laboratory of Biology and Genetics Improvement of Maize in Arid Area of Northwest Region, Ministry of Agriculture, Northwest A&F University, Yangling, Shaanxi 712100, China
4. Li J, Sun F, He K, Zhang L, Meng J, Huang D, Zhang Y. Detection and Quantification of 5moU RNA Modification from Direct RNA Sequencing Data. Curr Genomics 2024; 25:212-225. [PMID: 39086998 PMCID: PMC11288159 DOI: 10.2174/0113892029288843240402042529] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Revised: 02/27/2024] [Accepted: 03/08/2024] [Indexed: 08/02/2024] Open
Abstract
Background: Chemically modified therapeutic mRNAs have gained momentum recently. In addition to commonly used modifications (e.g., pseudouridine), 5-methoxyuridine (5moU) is considered a promising substitution for uridine in therapeutic mRNAs. Accurate identification of 5moU would be crucial for the study and quality control of relevant in vitro-transcribed (IVT) mRNAs. However, current methods lack quantitative approaches for detecting this modification. Using Oxford Nanopore direct RNA sequencing, in this study we present NanoML-5moU, a machine-learning framework designed specifically for the read-level detection and quantification of 5moU modification in IVT data.
Materials and Methods: Nanopore direct RNA sequencing data from both 5moU-modified and unmodified control samples were collected. Subsequently, signal event characteristics (mean and median current intensities, standard deviations, and dwell times) were analyzed and modeled. Classical machine learning algorithms, notably the Support Vector Machine (SVM), Random Forest (RF), and XGBoost, were then employed to discern 5moU modifications within NNUNN (where N represents A, C, U, or G) 5-mers.
Results: Combining the signal event attributes of each constituent base of the NNUNN 5-mers with the XGBoost algorithm performed well (maximum AUROC of 0.9567 on the "AGTTC" reference 5-mer dataset and minimum AUROC of 0.8113 on the "TGTGC" reference 5-mer dataset). This markedly exceeded the prevailing background error comparison model (ELIGOs, AUC 0.751 for site-level prediction). The model's performance was further validated on a series of curated datasets with customized modification ratios designed to emulate broader data patterns, demonstrating its general applicability in quality control of IVT mRNA vaccines. The NanoML-5moU framework is publicly available on GitHub (https://github.com/JiayiLi21/NanoML-5moU).
Conclusion: NanoML-5moU enables accurate read-level profiling of 5moU modification with nanopore direct RNA sequencing and is a tool specialized in unveiling signal patterns in in vitro-transcribed (IVT) mRNAs.
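The feature layout described in this abstract (per-base signal event statistics over NNUNN 5-mers) can be sketched as follows. This is not the NanoML-5moU code. The function names, the choice of population standard deviation, and the flat 20-value layout (5 bases × 4 statistics) are assumptions made for illustration. The 5-mers are written with T in place of U, following the reference 5-mers quoted in the abstract ("AGTTC", "TGTGC").

```python
from itertools import product
from statistics import mean, median, pstdev

BASES = "ACGT"  # reference alphabet; U is written as T, as in "AGTTC"


def central_u_kmers():
    """Enumerate all NNUNN 5-mers (central base fixed, 4**4 = 256 contexts)."""
    return ["".join(p[:2]) + "T" + "".join(p[2:]) for p in product(BASES, repeat=4)]


def event_features(currents, dwell_time):
    """Summarize one signal event: mean, median, std of current, plus dwell time."""
    return [mean(currents), median(currents), pstdev(currents), dwell_time]


def kmer_feature_vector(events):
    """Concatenate per-base event features for one read over one 5-mer.

    events: list of five (currents, dwell_time) pairs, one per base.
    Returns a flat list of 20 values (hypothetical layout).
    """
    if len(events) != 5:
        raise ValueError("expected one event per base of the 5-mer")
    vector = []
    for currents, dwell_time in events:
        vector.extend(event_features(currents, dwell_time))
    return vector


# Hypothetical toy read: identical made-up events at all five positions.
toy_events = [([1.0, 2.0, 3.0], 0.01)] * 5
toy_vector = kmer_feature_vector(toy_events)
```

Vectors of this kind, labeled by whether the read came from the 5moU-modified or the unmodified control sample, are the sort of input a classifier such as SVM, Random Forest, or XGBoost would be trained on, one model per 5-mer context.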
Affiliation(s)
- Jiayi Li
- Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou, 215123, China
- Feiyang Sun
- Department of Computer Science, Xi’an Jiaotong-Liverpool University, Suzhou, 215123, China
- Kunyang He
- Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou, 215123, China
- Lin Zhang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Jia Meng
- Department of Biological Science, Xi’an Jiaotong-Liverpool University, Suzhou, 215123, China
- Daiyun Huang
- Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou, 215123, China
- Yuxin Zhang
- Department of Biological Science, Xi’an Jiaotong-Liverpool University, Suzhou, 215123, China
5. Maestri S, Furlan M, Mulroney L, Coscujuela Tarrero L, Ugolini C, Dalla Pozza F, Leonardi T, Birney E, Nicassio F, Pelizzola M. Benchmarking of computational methods for m6A profiling with Nanopore direct RNA sequencing. Brief Bioinform 2024; 25:bbae001. [PMID: 38279646 PMCID: PMC10818168 DOI: 10.1093/bib/bbae001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 10/27/2023] [Accepted: 12/28/2023] [Indexed: 01/28/2024] Open
Abstract
N6-methyladenosine (m6A) is the most abundant internal eukaryotic mRNA modification, and is involved in the regulation of various biological processes. Direct Nanopore sequencing of native RNA (dRNA-seq) has emerged as a leading approach for its identification. Several software tools have been published for m6A detection, and there is a strong need for independent studies benchmarking their performance on data from different species and against various reference datasets. Moreover, a computational workflow is needed to streamline the execution of tools whose installation and execution remain complicated. We developed NanOlympicsMod, a Nextflow pipeline exploiting containerized technology for comparing 14 tools for m6A detection on dRNA-seq data. NanOlympicsMod was tested on dRNA-seq data generated from in vitro (un)modified synthetic oligos. The m6A hits returned by each tool were compared to the m6A positions known by design of the oligos. In addition, NanOlympicsMod was used on dRNA-seq datasets from wild-type and m6A-depleted yeast, mouse and human, and each tool's hits were compared to reference m6A sets generated by leading orthogonal methods. The performance of the tools markedly differed across datasets, and methods adopting different approaches showed different preferences in terms of precision and recall. Changing the stringency cut-offs allowed for tuning the precision-recall trade-off towards user preferences. Finally, we determined that the precision and recall of the tools are markedly influenced by sequencing depth, and that additional sequencing would likely reveal additional m6A sites. Thanks to the possibility of including novel tools, NanOlympicsMod will streamline the benchmarking of m6A detection tools on dRNA-seq data, improving future RNA modification characterization.
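The precision-recall trade-off mentioned in this abstract can be made concrete with a small sketch. The code below is not part of NanOlympicsMod. It assumes a hypothetical tool that reports one score per candidate site and a reference set of true m6A sites, and it shows how moving the stringency cut-off shifts precision and recall. The site names, scores, and the convention of reporting precision as 1.0 when no site is called are all illustrative assumptions.

```python
def precision_recall(scores, truth, cutoff):
    """Precision and recall of sites called at or above a score cut-off.

    scores: dict mapping site identifier -> tool score (higher = more confident).
    truth: set of site identifiers in the reference m6A set.
    """
    called = {site for site, score in scores.items() if score >= cutoff}
    true_pos = len(called & truth)
    # Convention (an assumption): with no calls, precision is reported as 1.0.
    precision = true_pos / len(called) if called else 1.0
    recall = true_pos / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical toy data: five candidate sites, three in the reference set.
toy_scores = {"s1": 0.95, "s2": 0.80, "s3": 0.60, "s4": 0.40, "s5": 0.20}
toy_truth = {"s1", "s2", "s4"}

# Sweep from a strict to a lenient cut-off.
toy_curve = {cutoff: precision_recall(toy_scores, toy_truth, cutoff)
             for cutoff in (0.9, 0.5, 0.3)}
```

In this toy example the strict cut-off (0.9) calls a single site correctly (precision 1.0, recall 1/3), whereas the lenient cut-off (0.3) recovers every reference site at the cost of one false call (precision 0.75, recall 1.0).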
Affiliation(s)
- Simone Maestri
- Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia (IIT), Milan, Italy
- Mattia Furlan
- Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia (IIT), Milan, Italy
- Logan Mulroney
- Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia (IIT), Milan, Italy
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridgeshire, UK
- Epigenetics and Neurobiology Unit, European Molecular Biology Laboratory (EMBL), Rome, Italy
- Lucia Coscujuela Tarrero
- Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia (IIT), Milan, Italy
- Camilla Ugolini
- Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia (IIT), Milan, Italy
- Fabio Dalla Pozza
- Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia (IIT), Milan, Italy
- Tommaso Leonardi
- Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia (IIT), Milan, Italy
- Ewan Birney
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridgeshire, UK
- Francesco Nicassio
- Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia (IIT), Milan, Italy
- Mattia Pelizzola
- Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia (IIT), Milan, Italy
- Department of Biotechnology and Biosciences, University of Milano-Bicocca, Milan, Italy
6. Cerneckis J, Ming GL, Song H, He C, Shi Y. The rise of epitranscriptomics: recent developments and future directions. Trends Pharmacol Sci 2024; 45:24-38. [PMID: 38103979 PMCID: PMC10843569 DOI: 10.1016/j.tips.2023.11.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2023] [Revised: 11/09/2023] [Accepted: 11/09/2023] [Indexed: 12/19/2023]
Abstract
The epitranscriptomics field has undergone tremendous growth since the discovery that the RNA N6-methyladenosine (m6A) modification is reversible and is distributed throughout the transcriptome. Efforts to map RNA modifications transcriptome-wide and reshape the epitranscriptome in disease settings have facilitated mechanistic understanding and drug discovery in the field. In this review we discuss recent advancements in RNA modification detection methods and consider how these developments can be applied to gain novel insights into the epitranscriptome. We also highlight drug discovery efforts aimed at developing epitranscriptomic therapeutics for cancer and other diseases. Finally, we consider engineering of the epitranscriptome as an emerging direction to investigate RNA modifications and their causal effects on RNA processing at high specificity.
Affiliation(s)
- Jonas Cerneckis
- Department of Neurodegenerative Diseases, Beckman Research Institute of City of Hope, Duarte, CA 91010, USA; Irell & Manella Graduate School of Biological Sciences, Beckman Research Institute of City of Hope, Duarte, CA 91010, USA
- Guo-Li Ming
- Department of Neuroscience and Mahoney Institute for Neurosciences, Department of Cell and Developmental Biology, Department of Psychiatry, Institute for Regenerative Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Hongjun Song
- Department of Neuroscience and Mahoney Institute for Neurosciences, Department of Cell and Developmental Biology, the Epigenetics Institute, Institute for Regenerative Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Chuan He
- Department of Chemistry, Department of Biochemistry and Molecular Biology, Institute for Biophysical Dynamics, Howard Hughes Medical Institute, the University of Chicago, Chicago, IL 60637, USA
- Yanhong Shi
- Department of Neurodegenerative Diseases, Beckman Research Institute of City of Hope, Duarte, CA 91010, USA; Irell & Manella Graduate School of Biological Sciences, Beckman Research Institute of City of Hope, Duarte, CA 91010, USA
7. Zarnack K, Eyras E. 'Artificial intelligence and machine learning in RNA biology'. Brief Bioinform 2023; 24:bbad415. [PMID: 37965807 PMCID: PMC10646484 DOI: 10.1093/bib/bbad415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Accepted: 10/26/2023] [Indexed: 11/16/2023] Open
Affiliation(s)
- Kathi Zarnack
- Buchmann Institute for Molecular Life Sciences (BMLS), Goethe University Frankfurt, Max-von-Laue-Str. 15, 60438 Frankfurt a.M., Germany
- Institute of Molecular Biosciences, Goethe University Frankfurt, Max-von-Laue-Str. 15, 60438 Frankfurt a.M., Germany
- Eduardo Eyras
- EMBL Australia Partner Laboratory Network at the Australian National University, Canberra, Australia
- The Shine-Dalgarno Centre for RNA Innovation, The John Curtin School of Medical Research, Australian National University, Canberra, Australia
- The Centre for Computational Biomedical Sciences, The John Curtin School of Medical Research, Australian National University, Canberra, Australia