1
Hassan D, Ariyur A, Daulatabad SV, Mir Q, Janga SC. Nm-Nano: a machine learning framework for transcriptome-wide single-molecule mapping of 2´-O-methylation (Nm) sites in nanopore direct RNA sequencing datasets. RNA Biol 2024; 21:1-15. [PMID: 38758523 PMCID: PMC11110688 DOI: 10.1080/15476286.2024.2352192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/01/2024] [Indexed: 05/18/2024]
Abstract
2´-O-methylation (Nm) is one of the most abundant modifications found in both mRNAs and noncoding RNAs. It contributes to many biological processes, such as the normal functioning of tRNA, the protection of mRNA against degradation by the decapping and exoribonuclease (DXO) protein, and the biogenesis and specificity of rRNA. Recent advancements in single-molecule sequencing techniques for long-read RNA sequencing data offered by Oxford Nanopore Technologies have enabled the direct detection of RNA modifications from sequencing data. In this study, we propose a bio-computational framework, Nm-Nano, for predicting the presence of Nm sites in direct RNA sequencing data generated from two human cell lines. The Nm-Nano framework integrates two supervised machine learning (ML) models for predicting Nm sites: Extreme Gradient Boosting (XGBoost) and Random Forest (RF) with K-mer embedding. Evaluation on benchmark datasets from direct RNA sequencing of HeLa and HEK293 cell lines demonstrates high accuracy (99% with XGBoost and 92% with RF) in identifying Nm sites. Deploying Nm-Nano on HeLa and HEK293 cell lines reveals genes that are frequently modified with Nm. In HeLa cell lines, 125 genes are identified as frequently Nm-modified, showing enrichment in 30 ontologies related to immune response and cellular processes. In HEK293 cell lines, 61 genes are identified as frequently Nm-modified, with enrichment in processes like glycolysis and protein localization. These findings underscore the diverse regulatory roles of Nm modifications in metabolic pathways, protein degradation, and cellular processes. The source code of Nm-Nano can be freely accessed at https://github.com/Janga-Lab/Nm-Nano.
Affiliation(s)
- Doaa Hassan
  - Department of Biohealth Informatics, Luddy School of Informatics, Computing, and Engineering, Indiana University Indianapolis (IUI), Indianapolis, Indiana, USA
  - Computers and Systems Department, National Telecommunication Institute, Cairo, Egypt
- Aditya Ariyur
  - Department of Biohealth Informatics, Luddy School of Informatics, Computing, and Engineering, Indiana University Indianapolis (IUI), Indianapolis, Indiana, USA
- Swapna Vidhur Daulatabad
  - Department of Biohealth Informatics, Luddy School of Informatics, Computing, and Engineering, Indiana University Indianapolis (IUI), Indianapolis, Indiana, USA
- Quoseena Mir
  - Department of Biohealth Informatics, Luddy School of Informatics, Computing, and Engineering, Indiana University Indianapolis (IUI), Indianapolis, Indiana, USA
- Sarath Chandra Janga
  - Department of Biohealth Informatics, Luddy School of Informatics, Computing, and Engineering, Indiana University Indianapolis (IUI), Indianapolis, Indiana, USA
  - Centre for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana
2
Li H, Chen L, Huang Z, Luo X, Li H, Ren J, Xie Y. DeepOMe: A Web Server for the Prediction of 2'-O-Me Sites Based on the Hybrid CNN and BLSTM Architecture. Front Cell Dev Biol 2021; 9:686894. [PMID: 34055810 PMCID: PMC8160107 DOI: 10.3389/fcell.2021.686894] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/28/2021] [Accepted: 04/23/2021] [Indexed: 11/13/2022]
Abstract
2'-O-methylations (2'-O-Me or Nm) are among the most important layers of regulatory control over gene expression. With increasing attention focused on the characteristics, mechanisms, and influences of 2'-O-Me, a revolutionary technique termed Nm-seq was established, allowing the identification of precise 2'-O-Me sites in RNA sequences with high sensitivity. However, owing to the cost and complexity of this new method, large-scale detection and in-depth study of 2'-O-Me remain largely limited. Therefore, a novel computational method that identifies 2'-O-Me sites with adequate reliability is urgently needed. To address this issue, we proposed a hybrid deep-learning algorithm named DeepOMe that combines Convolutional Neural Networks (CNN) and Bidirectional Long Short-term Memory (BLSTM) to accurately predict 2'-O-Me sites in the human transcriptome. Under 4-, 6-, 8-, and 10-fold cross-validation, the proposed model achieved high performance (AUC close to 0.998 and AUPR close to 0.880). When tested on the independent data set, DeepOMe was substantially superior to NmSEER V2.0. To facilitate the use of DeepOMe, a user-friendly web server was constructed, which can be freely accessed at http://deepome.renlab.org.
Affiliation(s)
- Hongyu Li
  - School of Life Sciences, Precision Medicine Institute, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Li Chen
  - School of Life Sciences, Precision Medicine Institute, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Zaoli Huang
  - School of Life Sciences, Precision Medicine Institute, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Xiaotong Luo
  - School of Life Sciences, Precision Medicine Institute, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Huiqin Li
  - School of Life Sciences, Precision Medicine Institute, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Jian Ren
  - School of Life Sciences, Precision Medicine Institute, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Yubin Xie
  - School of Life Sciences, Precision Medicine Institute, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
3
Zhou Y, Cui Q, Zhou Y. NmSEER V2.0: a prediction tool for 2'-O-methylation sites based on random forest and multi-encoding combination. BMC Bioinformatics 2019; 20:690. [PMID: 31874624 PMCID: PMC6929462 DOI: 10.1186/s12859-019-3265-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 12/03/2022]
Abstract
Background: 2′-O-methylation (2′-O-me or Nm) is a post-transcriptional RNA methylation at the 2′-hydroxyl group, which is common in mRNAs and various non-coding RNAs. Previous studies revealed the significance of Nm in multiple biological processes. With Nm receiving increasing attention, a revolutionary technique termed Nm-seq was developed to profile Nm sites, mainly in mRNA, with single-nucleotide resolution and high sensitivity. In a recent work, supported by the Nm-seq data, we reported an in silico method for predicting Nm sites that relies on nucleotide sequence information, and established an online server named NmSEER. More recently, a higher-confidence dataset produced by refined Nm-seq became available. Therefore, in this work, we redesigned the prediction model to achieve more robust performance on the new data.
Results: We redesigned the prediction model from two perspectives: the machine learning algorithm and the combination of multiple encoding schemes. After optimization by 5-fold cross-validation tests and evaluation on an independent test set, random forest was selected as the most robust algorithm. Meanwhile, one-hot encoding, together with position-specific dinucleotide sequence profile and K-nucleotide frequency encoding, were collectively applied to build the final predictor.
Conclusions: The updated predictor, named NmSEER V2.0, achieves accurate prediction performance (AUROC = 0.862) and has been deployed on a brand-new server, which is freely available at http://www.rnanut.net/nmseer-v2/.
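As an illustration only, two of the sequence encodings named in the Results (one-hot and K-nucleotide frequency) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the NmSEER V2.0 implementation: the function names `one_hot` and `knf`, the `ACGU` alphabet ordering, and the example window are all hypothetical, and the position-specific dinucleotide sequence profile is omitted.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"  # assumed RNA alphabet and column order (hypothetical choice)


def one_hot(seq):
    """Flat 0/1 vector with four entries per nucleotide; unknown bases map to all zeros."""
    vec = []
    for base in seq:
        vec.extend(1 if base == a else 0 for a in ALPHABET)
    return vec


def knf(seq, k):
    """K-nucleotide frequency: share of each possible k-mer among all overlapping windows."""
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    total = len(windows)
    if total == 0:
        # Sequence shorter than k: return an all-zero vector of the right length.
        return [0.0] * (len(ALPHABET) ** k)
    counts = Counter(windows)
    return [counts["".join(kmer)] / total for kmer in product(ALPHABET, repeat=k)]


# Hypothetical sequence window around a candidate site.
window = "GAUCA"
features = one_hot(window) + knf(window, 2)
```

A feature vector built this way (here 20 one-hot entries plus 16 dinucleotide frequencies) is the kind of input a random forest classifier could be trained on; the window length and the value of K used by the authors are not stated in the abstract.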
Affiliation(s)
- Yiran Zhou
  - Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Center for Noncoding RNA Medicine, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing, 100191, China
- Qinghua Cui
  - Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Center for Noncoding RNA Medicine, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing, 100191, China
  - Center of Bioinformatics, Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Yuan Zhou
  - Department of Biomedical Informatics, Department of Physiology and Pathophysiology, Center for Noncoding RNA Medicine, MOE Key Lab of Cardiovascular Sciences, School of Basic Medical Sciences, Peking University, 38 Xueyuan Rd, Beijing, 100191, China
4
Glabonjat RA, Ehgartner J, Duncan EG, Raber G, Jensen KB, Krikowa F, Maher WA, Francesconi KA. Arsenolipid biosynthesis by the unicellular alga Dunaliella tertiolecta is influenced by As/P ratio in culture experiments. Metallomics 2018; 10:145-153. [DOI: 10.1039/c7mt00249a] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 11/21/2022]
Abstract
Culture experiments exposing unicellular algae to varying arsenate/phosphate regimes and determining their arsenometallomes by HPLC–MS show the interconnection of arsenolipids and water-soluble arsenicals.
Affiliation(s)
- Ronald A. Glabonjat
  - Institute of Chemistry, NAWI Graz, University of Graz, Universitaetsplatz 1, 8010 Graz
- Josef Ehgartner
  - Institute of Chemistry, NAWI Graz, University of Graz, Universitaetsplatz 1, 8010 Graz
- Elliott G. Duncan
  - Ecochemistry Laboratory, Institute for Applied Ecology, University of Canberra, University Drive, Bruce
- Georg Raber
  - Institute of Chemistry, NAWI Graz, University of Graz, Universitaetsplatz 1, 8010 Graz
- Kenneth B. Jensen
  - Institute of Chemistry, NAWI Graz, University of Graz, Universitaetsplatz 1, 8010 Graz
- Frank Krikowa
  - Ecochemistry Laboratory, Institute for Applied Ecology, University of Canberra, University Drive, Bruce
- William A. Maher
  - Ecochemistry Laboratory, Institute for Applied Ecology, University of Canberra, University Drive, Bruce
5
Incarnato D, Anselmi F, Morandi E, Neri F, Maldotti M, Rapelli S, Parlato C, Basile G, Oliviero S. High-throughput single-base resolution mapping of RNA 2΄-O-methylated residues. Nucleic Acids Res 2017; 45:1433-1441. [PMID: 28180324 PMCID: PMC5388417 DOI: 10.1093/nar/gkw810] [Citation(s) in RCA: 83] [Impact Index Per Article: 11.9] [Received: 12/01/2015] [Revised: 08/09/2016] [Accepted: 09/03/2016] [Indexed: 01/28/2023]
Abstract
Functional characterization of the transcriptome requires tools for the systematic investigation of RNA post-transcriptional modifications. 2΄-O-methylation (2΄-OMe) of the ribose moiety is one of the most abundant post-transcriptional modifications of RNA, although its systematic analysis is difficult due to the lack of reliable high-throughput mapping methods. We describe here a novel high-throughput approach, named 2OMe-seq, that enables fast and accurate mapping at single-base resolution, and relative quantitation, of 2΄-OMe modified residues. We compare our method to other state-of-the-art approaches, and show that it achieves higher sensitivity and specificity. By applying 2OMe-seq to HeLa cells, we show that it is able to recover the majority of the annotated 2΄-OMe sites on ribosomal RNA. By performing knockdown of the Fibrillarin methyltransferase in mouse embryonic stem cells (ESCs), we show the ability of 2OMe-seq to capture variations in 2΄-O-methylation level. Moreover, using 2OMe-seq data, we report the discovery of 12 previously unannotated 2΄-OMe sites across 18S and 28S rRNAs, 11 of which are conserved in both human and mouse cells, and assign the respective snoRNAs for all sites. Our approach expands the repertoire of methods for transcriptome-wide mapping of RNA post-transcriptional modifications, and promises to provide novel insights into the role of this modification.
Affiliation(s)
- Danny Incarnato
  - Dipartimento di Scienze della Vita e Biologia dei Sistemi, Università di Torino, Via Accademia Albertina 13, Torino, Italy
  - Human Genetics Foundation (HuGeF), via Nizza 52, Torino, Italy
- Francesca Anselmi
  - Dipartimento di Scienze della Vita e Biologia dei Sistemi, Università di Torino, Via Accademia Albertina 13, Torino, Italy
  - Human Genetics Foundation (HuGeF), via Nizza 52, Torino, Italy
- Edoardo Morandi
  - Dipartimento di Scienze della Vita e Biologia dei Sistemi, Università di Torino, Via Accademia Albertina 13, Torino, Italy
  - Human Genetics Foundation (HuGeF), via Nizza 52, Torino, Italy
- Francesco Neri
  - Human Genetics Foundation (HuGeF), via Nizza 52, Torino, Italy
- Mara Maldotti
  - Dipartimento di Scienze della Vita e Biologia dei Sistemi, Università di Torino, Via Accademia Albertina 13, Torino, Italy
  - Human Genetics Foundation (HuGeF), via Nizza 52, Torino, Italy
- Stefania Rapelli
  - Dipartimento di Scienze della Vita e Biologia dei Sistemi, Università di Torino, Via Accademia Albertina 13, Torino, Italy
  - Human Genetics Foundation (HuGeF), via Nizza 52, Torino, Italy
- Giulia Basile
  - Human Genetics Foundation (HuGeF), via Nizza 52, Torino, Italy
- Salvatore Oliviero
  - Dipartimento di Scienze della Vita e Biologia dei Sistemi, Università di Torino, Via Accademia Albertina 13, Torino, Italy
  - Human Genetics Foundation (HuGeF), via Nizza 52, Torino, Italy
6
Structural studies of RNA-protein complexes: A hybrid approach involving hydrodynamics, scattering, and computational methods. Methods 2016; 118-119:146-162. [PMID: 27939506 DOI: 10.1016/j.ymeth.2016.12.002] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 09/29/2016] [Revised: 12/01/2016] [Accepted: 12/05/2016] [Indexed: 01/01/2023]
Abstract
The diverse functional cellular roles played by ribonucleic acids (RNA) have emphasized the need to develop rapid and accurate methodologies to elucidate the relationship between the structure and function of RNA. Structural biology tools such as X-ray crystallography and Nuclear Magnetic Resonance are highly useful methods to obtain atomic-level resolution models of macromolecules. However, both methods have sample, time, and technical limitations that prevent their application to a number of macromolecules of interest. An emerging alternative to high-resolution structural techniques is to employ a hybrid approach that combines low-resolution shape information about macromolecules and their complexes from experimental hydrodynamic (e.g., analytical ultracentrifugation) and solution scattering measurements (e.g., solution X-ray or neutron scattering) with computational modeling to obtain atomic-level models. While promising, scattering methods rely on aggregation-free, monodisperse preparations, and therefore the careful development of a quality control pipeline is fundamental to an unbiased and reliable structural determination. This review article describes hydrodynamic techniques that are highly valuable for homogeneity studies, scattering techniques useful to study the low-resolution shape, and strategies for computational modeling to obtain high-resolution 3D structural models of RNAs, proteins, and RNA-protein complexes.
7
Lee KW, Bogenhagen DF. Assignment of 2'-O-methyltransferases to modification sites on the mammalian mitochondrial large subunit 16 S ribosomal RNA (rRNA). J Biol Chem 2014; 289:24936-42. [PMID: 25074936 DOI: 10.1074/jbc.c114.581868] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.9] [Indexed: 12/11/2022]
Abstract
Advances in proteomics and large scale studies of potential mitochondrial proteins have led to the identification of many novel mitochondrial proteins in need of further characterization. Among these novel proteins are three mammalian rRNA methyltransferase family members RNMTL1, MRM1, and MRM2. MRM1 and MRM2 have bacterial and yeast homologs, whereas RNMTL1 appears to have evolved later in higher eukaryotes. We recently confirmed the localization of the three proteins to mitochondria, specifically in the vicinity of mtDNA nucleoids. In this study, we took advantage of the ability of 2'-O-ribose modification to block site-specific cleavage of RNA by DNAzymes to show that MRM1, MRM2, and RNMTL1 are responsible for modification of human large subunit rRNA at residues G(1145), U(1369), and G(1370), respectively.
Affiliation(s)
- Ken-Wing Lee
  - Department of Pharmacological Sciences, Stony Brook University, Stony Brook, New York 11794-8651
- Daniel F Bogenhagen
  - Department of Pharmacological Sciences, Stony Brook University, Stony Brook, New York 11794-8651