1. Wang R, Chung CR, Lee TY. Interpretable Multi-Scale Deep Learning for RNA Methylation Analysis across Multiple Species. Int J Mol Sci 2024; 25:2869. [PMID: 38474116 DOI: 10.3390/ijms25052869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Revised: 02/19/2024] [Accepted: 02/28/2024] [Indexed: 03/14/2024]
Abstract
RNA modification plays a crucial role in cellular regulation. However, traditional high-throughput sequencing methods for elucidating their functional mechanisms are time-consuming and labor-intensive, despite extensive research. Moreover, existing methods often limit their focus to specific species, neglecting the simultaneous exploration of RNA modifications across diverse species. Therefore, a versatile computational approach is necessary for interpretable analysis of RNA modifications across species. A multi-scale biological language-based deep learning model is proposed for interpretable, sequential-level prediction of diverse RNA modifications. Benchmark comparisons across species demonstrate the model's superiority in predicting various RNA methylation types over current state-of-the-art methods. The cross-species validation and attention weight visualization also highlight the model's capability to capture sequential and functional semantics from genomic backgrounds. Our analysis of RNA modifications helps us find the potential existence of "biological grammars" in each modification type, which could be effective for mapping methylation-related sequential patterns and understanding the underlying biological mechanisms of RNA modifications.
Affiliation(s)
- Rulan Wang
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Chia-Ru Chung
  - Department of Computer Science and Information Engineering, National Central University, Taoyuan 320317, Taiwan
- Tzong-Yi Lee
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 30010, Taiwan
2. Khanal J, Kandel J, Tayara H, Chong KT. CapsNh-Kcr: Capsule network-based prediction of lysine crotonylation sites in human non-histone proteins. Comput Struct Biotechnol J 2022; 21:120-127. [PMID: 36544479 PMCID: PMC9735261 DOI: 10.1016/j.csbj.2022.11.056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2022] [Revised: 11/10/2022] [Accepted: 11/26/2022] [Indexed: 12/04/2022]
Abstract
Lysine crotonylation (Kcr) is one of the most important post-translational modifications (PTMs) that is widely detected in both histone and non-histone proteins. In fact, Kcr is reported to be involved in various biological processes, such as metabolism and cell differentiation. However, the available experimental methods for Kcr site identification are laborious and costly. To effectively replace existing experimental approaches, some computational methods have been developed in the last few years. The available computational methods still lack some important aspects, as they can only identify Kcr sites on either histone-only or combined histone and non-histone proteins. Although a tool was developed to identify Kcr sites on non-histone proteins only, its performance is inadequate and the exploration of hidden Kcr patterns (motifs) has been completely ignored, which might be significant for detailed Kcr studies. Therefore, algorithms that can more effectively predict Kcr sites on non-histone proteins with their biological meaning need to be designed. Accordingly, we developed a novel deep learning (capsule network)-based model, named CapsNh-Kcr, for Kcr site prediction, particularly focusing on non-histone proteins. Based on the independent results, the proposed model achieves an AUC of 0.9120, which is approximately 6% higher than that of the previous nhKcr model in the prediction of Kcr sites on non-histone proteins. Further, we revealed, for the first time, that the proposed model can represent obvious motif distribution across Kcr sites in non-histone proteins. The source code (in Python) is publicly available at https://github.com/Jhabindra-bioinfo/CapsNh-Kcr.
Affiliation(s)
- Jhabindra Khanal
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Jeevan Kandel
  - Graduate School of Integrated Energy-AI, Jeonbuk National University, Jeonju 54896, South Korea
- Hilal Tayara
  - School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, South Korea (corresponding author)
- Kil To Chong
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea (corresponding author)
  - Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, South Korea
3. Que-Salinas U, Martinez-Peon D, Reyes-Figueroa AD, Ibarra I, Scheckhuber CQ. On the Prediction of In Vitro Arginine Glycation of Short Peptides Using Artificial Neural Networks. Sensors (Basel, Switzerland) 2022; 22:5237. [PMID: 35890916 PMCID: PMC9324327 DOI: 10.3390/s22145237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2022] [Revised: 07/08/2022] [Accepted: 07/11/2022] [Indexed: 06/15/2023]
Abstract
One of the hallmarks of diabetes is an increased modification of cellular proteins. The most prominent type of modification stems from the reaction of methylglyoxal with arginine and lysine residues, leading to structural and functional impairments of target proteins. For lysine glycation, several algorithms allow a prediction of occurrence, thus making it possible to pinpoint likely targets. However, to our knowledge, no approaches have been published for predicting the likelihood of arginine glycation. There are indications that arginine and not lysine is the most prominent target for the toxic dialdehyde. One of the reasons why there is no arginine glycation predictor is the limited availability of quantitative data. Here, we used a recently published high-quality dataset of arginine modification probabilities to employ an artificial neural network strategy. Despite the limited data availability, our approach predicts the exact value of the glycation probability of an arginine-containing peptide with an accuracy of about 75%, without setting a threshold for deciding whether a given arginine is modified. This contribution suggests a solution for predicting arginine glycation of short peptides.
Affiliation(s)
- Ulices Que-Salinas
  - Centro de Ciencias de la Tierra, Universidad Veracruzana, Xalapa 91090, VER, Mexico
- Dulce Martinez-Peon
  - Department of Electrical and Electronic Engineering, National Technological Institute of Mexico/IT, Monterrey 67170, NL, Mexico
- Angel D. Reyes-Figueroa
  - Consejo Nacional de Ciencia y Tecnología, Av. Insurgentes Sur 1582, Col. Crédito Constructor, Benito Juárez, Mexico City 03940, DF, Mexico
  - Centro de Investigación en Matemáticas Unidad Monterrey, Parque de Investigación e Innovación Tecnológica (PIIT), Av. Alianza Centro No. 502, Apodaca 66628, NL, Mexico
- Ivonne Ibarra
  - Independent Researcher, Monterrey 66620, NL, Mexico
- Christian Quintus Scheckhuber
  - Departamento de Bioingeniería, Escuela de Ingeniería y Ciencias, Tecnologico de Monterrey, Ave. Eugenio Garza Sada 2501, Monterrey 64849, NL, Mexico
4. Ma R, Li S, Li W, Yao L, Huang HD, Lee TY. KinasePhos 3.0: Redesign and Expansion of the Prediction on Kinase-specific Phosphorylation Sites. Genomics, Proteomics & Bioinformatics 2022:S1672-0229(22)00081-X. [PMID: 35781048 PMCID: PMC10373160 DOI: 10.1016/j.gpb.2022.06.004] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/21/2021] [Revised: 05/30/2022] [Accepted: 06/27/2022] [Indexed: 06/04/2023]
Abstract
The purpose of this work is to enhance KinasePhos, a machine learning-based kinase-specific phosphorylation site prediction tool. Experimentally verified kinase-specific phosphorylation data were collected from PhosphoSitePlus, UniProtKB, the Group-based Prediction System 5.0, and Phospho.ELM. In total, 41,421 experimentally verified kinase-specific phosphorylation sites were identified. A total of 1380 unique kinases were identified, including 753 with existing classification information from KinBase and the remaining 627 annotated by building a phylogenetic tree. Based on this kinase classification, a total of 771 predictive models were built at the individual, family, and group levels, using at least 15 experimentally verified substrate sites in positive training datasets. The improved models demonstrated their effectiveness compared with other prediction tools. For example, the prediction of sites phosphorylated by the protein kinase B, casein kinase 2, and protein kinase A families had accuracies of 94.5%, 92.5%, and 90.0%, respectively. The average prediction accuracy for all 771 models was 87.2%. For enhancing interpretability, the SHapley Additive exPlanations (SHAP) method was employed to assess feature importance. The web interface of KinasePhos 3.0 has been redesigned to provide comprehensive annotations of kinase-specific phosphorylation sites on multiple proteins. Additionally, considering the large scale of phosphoproteomic data, a downloadable prediction tool is available at https://awi.cuhk.edu.cn/KinasePhos/download.html or https://github.com/tom-209/KinasePhos-3.0-executable-file.
Affiliation(s)
- Renfei Ma
  - Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
  - School of Life Sciences, University of Science and Technology of China, Hefei 230027, China
- Shangfu Li
  - Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Wenshuo Li
  - School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China
- Lantian Yao
  - School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China
- Hsien-Da Huang
  - Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
  - School of Life and Health Sciences, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Tzong-Yi Lee
  - Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
  - School of Life and Health Sciences, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
5. de Brevern AG, Rebehmed J. Current status of PTMs structural databases: applications, limitations and prospects. Amino Acids 2022; 54:575-590. [PMID: 35020020 DOI: 10.1007/s00726-021-03119-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 08/26/2021] [Accepted: 12/20/2021] [Indexed: 12/11/2022]
Abstract
Protein 3D structures, determined by their amino acid sequences, are the support of major crucial biological functions. Post-translational modifications (PTMs) play an essential role in regulating these functions by altering the physicochemical properties of proteins. By virtue of their importance, several PTM databases have been developed and released in recent decades, but very few of these databases incorporate real 3D structural data. Since PTMs influence the function of the protein and their aberrant states are frequently implicated in human diseases, providing structural insights to understand the influence and dynamics of PTMs is crucial for unraveling the underlying processes. This review is dedicated to the current status of databases providing 3D structural data on PTM sites in proteins. Some of these databases are general, covering multiple types of PTMs in different organisms, while others are specific to one particular type of PTM, class of proteins or organism. The importance of these databases is illustrated with two major types of in silico applications: predicting PTM sites in proteins using machine learning approaches and investigating protein structure-function relationships involving PTMs. Finally, these databases suffer from multiple problems and care must be taken when analyzing the PTM data.
Affiliation(s)
- Alexandre G de Brevern
  - Université de Paris, INSERM, UMR_S 1134, DSIMB, 75739, Paris, France
  - Université de la Réunion, INSERM, UMR_S 1134, DSIMB, 97715, Saint-Denis de La Réunion, France
  - Laboratoire d'Excellence GR-Ex, 75739, Paris, France
- Joseph Rebehmed
  - Department of Computer Science and Mathematics, Lebanese American University, Beirut, Lebanon
6. Subba P, Prasad TSK. Protein Crotonylation Expert Review: A New Lens to Take Post-Translational Modifications and Cell Biology to New Heights. OMICS: A Journal of Integrative Biology 2021; 25:617-625. [PMID: 34582706 DOI: 10.1089/omi.2021.0132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Genome regulation, temporal and spatial variations in cell function, continues to puzzle and interest life scientists who aim to unravel the molecular basis of human health and disease, not to mention plant biology and ecosystem diversity. Despite important advances in epigenomics and protein post-translational modifications over the past decade, there is a need for new conceptual lenses to understand biological mechanisms that can help unravel the fundamental regulatory questions in genomes and the cell. To these ends, lys crotonylation (Kcr) is a reversible protein modification catalyzed by protein crotonyl transferases and decrotonylases. First identified on histones, Kcr regulates cellular processes at the chromatin level. Research thus far has revealed that Kcr marks promoter sites of active genes and potential enhancers. Eventually, Kcr on a number of nonhistone proteins was reported. The abundance of Kcr on ribosomal and myofilament proteins indicates its functional roles in protein synthesis and muscle contraction. Kcr has also been associated with pluripotency, spermiogenesis, and DNA repair. In plants, large-scale mass spectrometry-based experiments validated the roles of Kcr in photosynthesis. In this expert review, we present the latest thinking and findings on lys crotonylation with an eye to regulation of cell biology. We discuss the enrichment techniques, putative biological functions, and challenges associated with studying this protein modification with vast biological implications. Finally, we reflect on the future outlook about the broader relevance of Kcr in animals, microbes, and plant species.
Affiliation(s)
- Pratigya Subba
  - Center for Systems Biology and Molecular Medicine, Yenepoya Research Centre, Yenepoya (Deemed to be University), Mangalore, India