1
|
Wang X, Zhang Z, Liu C. iACP-DFSRA: Identification of Anticancer Peptides Based on a Dual-channel Fusion Strategy of ResCNN and Attention. J Mol Biol 2024; 436:168810. [PMID: 39362624 DOI: 10.1016/j.jmb.2024.168810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2024] [Revised: 09/10/2024] [Accepted: 09/27/2024] [Indexed: 10/05/2024]
Abstract
Anticancer peptides (ACPs) have been widely applied in the treatment of cancer owing to good safety, rational side effects, and high selectivity. However, the number of ACPs that have been experimentally validated is limited as identification of ACPs is extremely expensive. Hence, accurate and cost-effective identification methods for ACPs are urgently needed. In this work, we proposed a deep learning-based model, named iACP-DFSRA, for ACPs identification. Specifically, we adopted two kinds of sequence embedding technologies, ProtBert_BFD pre-training language model and handcrafted features to encode protein sequences. Then, the LightGBM was used for feature selection, and the selected features were input into ResCNN and Attention mechanism, respectively, to extract local and global features. Finally, the concatenate features were deeply fused by using the Attention mechanism to allow key features to be paid more attention to by the model and make predictions by fully connected layer. The results of 10-fold cross-validation demonstrated that the iACP-DFSRA model delivered improved results in most metrics with Sp of 94.15%, Sn of 95.32%, Acc of 94.74% and MCC of 89.48% compared to the latest AACFlow model. Indeed, the iACP-DFSRA model is the only model with Acc > 90% and MCC > 80% on this independent test dataset. Furthermore, we have further demonstrated the superiority of our model on additional datasets. In addition, t-SNE and SHAP interpretation analysis demonstrated that it is crucial to use two channels for feature extraction and use the Attention mechanism for deep fusion, which helps the iACP-DFSRA to predict ACPs more effectively.
Collapse
Affiliation(s)
- Xin Wang
- School of Science, Dalian Maritime University, Dalian 116026, China.
| | - Zimeng Zhang
- School of Science, Dalian Maritime University, Dalian 116026, China
| | - Chang Liu
- School of Science, Dalian Maritime University, Dalian 116026, China
| |
Collapse
|
2
|
Qin Z, Ren H, Zhao P, Wang K, Liu H, Miao C, Du Y, Li J, Wu L, Chen Z. Current computational tools for protein lysine acylation site prediction. Brief Bioinform 2024; 25:bbae469. [PMID: 39316944 PMCID: PMC11421846 DOI: 10.1093/bib/bbae469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2024] [Revised: 08/20/2024] [Accepted: 09/07/2024] [Indexed: 09/26/2024] Open
Abstract
As a main subtype of post-translational modification (PTM), protein lysine acylations (PLAs) play crucial roles in regulating diverse functions of proteins. With recent advancements in proteomics technology, the identification of PTM is becoming a data-rich field. A large amount of experimentally verified data is urgently required to be translated into valuable biological insights. With computational approaches, PLA can be accurately detected across the whole proteome, even for organisms with small-scale datasets. Herein, a comprehensive summary of 166 in silico PLA prediction methods is presented, including a single type of PLA site and multiple types of PLA sites. This recapitulation covers important aspects that are critical for the development of a robust predictor, including data collection and preparation, sample selection, feature representation, classification algorithm design, model evaluation, and method availability. Notably, we discuss the application of protein language models and transfer learning to solve the small-sample learning issue. We also highlight the prediction methods developed for functionally relevant PLA sites and species/substrate/cell-type-specific PLA sites. In conclusion, this systematic review could potentially facilitate the development of novel PLA predictors and offer useful insights to researchers from various disciplines.
Collapse
Affiliation(s)
- Zhaohui Qin
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
| | - Haoran Ren
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
| | - Pei Zhao
- State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences (CAAS), Anyang 455000, China
| | - Kaiyuan Wang
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
| | - Huixia Liu
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
| | - Chunbo Miao
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
| | - Yanxiu Du
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
| | - Junzhou Li
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
| | - Liuji Wu
- National Key Laboratory of Wheat and Maize Crop Science, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
| | - Zhen Chen
- Collaborative Innovation Center of Henan Grain Crops, Henan Key Laboratory of Rice Molecular Breeding and High Efficiency Production, College of Agronomy, Henan Agricultural University, Zhengzhou 450046, China
| |
Collapse
|
3
|
Hu F, Gao J, Zheng J, Kwoh C, Jia C. N-GlycoPred: A hybrid deep learning model for accurate identification of N-glycosylation sites. Methods 2024; 227:48-57. [PMID: 38734394 DOI: 10.1016/j.ymeth.2024.05.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2024] [Revised: 04/16/2024] [Accepted: 05/03/2024] [Indexed: 05/13/2024] Open
Abstract
Studies have shown that protein glycosylation in cells reflects the real-time dynamics of biological processes, and the occurrence and development of many diseases are closely related to protein glycosylation. Abnormal protein glycosylation can be used as a potential diagnostic and prognostic marker of a disease, as well as a therapeutic target and a new breakthrough point for exploring pathogenesis. To address the issue of significant differences in the prediction results of previous models for different species, we constructed a hybrid deep learning model N-GlycoPred on the basis of dual-layer convolution, a paired attention mechanism and BiLSTM for accurate identification of N-glycosylation sites. By adopting one-hot encoding or the AAindex, we specifically selected the optimum combination of features and deep learning frameworks for human and mouse to refine the models. Based on six independent test datasets, our N-GlycoPred model achieved an average AUC of 0.9553, which is 0.23% higher than MusiteDeep. The comparison results indicate that our model can serve as a powerful tool for N-glycosylation site prescreening for biological researchers.
Collapse
Affiliation(s)
- Fengzhu Hu
- School of Science, Dalian Maritime University, Dalian 116026, China
| | - Jie Gao
- School of Science, Dalian Maritime University, Dalian 116026, China
| | - Jia Zheng
- School of Science, Dalian Maritime University, Dalian 116026, China
| | - Cheekeong Kwoh
- School of Computer Science and Engineering, Nanyang Technological University, Singapore
| | - Cangzhi Jia
- School of Science, Dalian Maritime University, Dalian 116026, China.
| |
Collapse
|
4
|
Gao J, Zhao Y, Chen C, Ning Q. MVNN-HNHC:A multi-view neural network for identification of human non-histone crotonylation sites. Anal Biochem 2024; 687:115426. [PMID: 38141798 DOI: 10.1016/j.ab.2023.115426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2023] [Revised: 11/21/2023] [Accepted: 12/06/2023] [Indexed: 12/25/2023]
Abstract
Crotonylation on lysine sites in human non-histone proteins plays a crucial role in biology activities. However, because traditional experimental methods for crotonylation site identification are time-consuming and labor-intensive, computational prediction methods have become increasingly popular in recent years. Despite its significance, crotonylation site prediction has received less attention in non-histone proteins than in histones. In this study, we proposed a Multi-View Neural Network for identification of Human Non-Histone Crotonylation sites, named MVNN-HNHC. MVNN-HNHC integrated multi-view encoding features and adaptive encoding features through multi-channel neural network to deeply learn about attribute differences between crotonylation sites and non-crotonylation sites from various aspects. In MVNN-HNHC, convolutional neural networks can obtain local information from these features, and bidirectional long short term memory networks were utilized to extract sequence information. Then, we employ the attention mechanism to fuse the outputs of various feature extraction modules. Finally, the fully connection network acted as the classifier to predict whether a lysine site was crotonylation site or non-crotonylation site. Performance metrics on independent test set, including sensitivity, specificity, accuracy, Matthews correlation coefficient, and area under the curve (AUC) values reach 80.06 %, 75.77 %, 77.06 %, 0.5203, and 0.7792, respectively. To verify the effectiveness of this method, we carry out a series of experiments and the results show that MVNN-HNHC is an effective tool for predicting crotonylation sites in non-histone proteins. The data and code are available on https://github.com/xbbxhbc/junjun0612.git.
Collapse
Affiliation(s)
- Jun Gao
- Department of Information Science and Technology, Dalian Maritime University, Dalian, 116026, China
| | - Yaomiao Zhao
- Department of Information Science and Technology, Dalian Maritime University, Dalian, 116026, China
| | - Chen Chen
- Naval Architecture and Ocean Engineering College, Dalian Maritime University, Dalian, 116026, China.
| | - Qiao Ning
- Department of Information Science and Technology, Dalian Maritime University, Dalian, 116026, China.
| |
Collapse
|
5
|
Stastna M. Post-translational modifications of proteins in cardiovascular diseases examined by proteomic approaches. FEBS J 2024. [PMID: 38440918 DOI: 10.1111/febs.17108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2023] [Revised: 01/22/2024] [Accepted: 02/20/2024] [Indexed: 03/06/2024]
Abstract
Over 400 different types of post-translational modifications (PTMs) have been reported and over 200 various types of PTMs have been discovered using mass spectrometry (MS)-based proteomics. MS-based proteomics has proven to be a powerful method capable of global PTM mapping with the identification of modified proteins/peptides, the localization of PTM sites and PTM quantitation. PTMs play regulatory roles in protein functions, activities and interactions in various heart related diseases, such as ischemia/reperfusion injury, cardiomyopathy and heart failure. The recognition of PTMs that are specific to cardiovascular pathology and the clarification of the mechanisms underlying these PTMs at molecular levels are crucial for discovery of novel biomarkers and application in a clinical setting. With sensitive MS instrumentation and novel biostatistical methods for precise processing of the data, low-abundance PTMs can be successfully detected and the beneficial or unfavorable effects of specific PTMs on cardiac function can be determined. Moreover, computational proteomic strategies that can predict PTM sites based on MS data have gained an increasing interest and can contribute to characterization of PTM profiles in cardiovascular disorders. More recently, machine learning- and deep learning-based methods have been employed to predict the locations of PTMs and explore PTM crosstalk. In this review article, the types of PTMs are briefly overviewed, approaches for PTM identification/quantitation in MS-based proteomics are discussed and recently published proteomic studies on PTMs associated with cardiovascular diseases are included.
Collapse
Affiliation(s)
- Miroslava Stastna
- Institute of Analytical Chemistry of the Czech Academy of Sciences, Brno, Czech Republic
| |
Collapse
|
6
|
Ertelt M, Mulligan VK, Maguire JB, Lyskov S, Moretti R, Schiffner T, Meiler J, Schoeder CT. Combining machine learning with structure-based protein design to predict and engineer post-translational modifications of proteins. PLoS Comput Biol 2024; 20:e1011939. [PMID: 38484014 DOI: 10.1371/journal.pcbi.1011939] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2023] [Revised: 03/26/2024] [Accepted: 02/20/2024] [Indexed: 03/27/2024] Open
Abstract
Post-translational modifications (PTMs) of proteins play a vital role in their function and stability. These modifications influence protein folding, signaling, protein-protein interactions, enzyme activity, binding affinity, aggregation, degradation, and much more. To date, over 400 types of PTMs have been described, representing chemical diversity well beyond the genetically encoded amino acids. Such modifications pose a challenge to the successful design of proteins, but also represent a major opportunity to diversify the protein engineering toolbox. To this end, we first trained artificial neural networks (ANNs) to predict eighteen of the most abundant PTMs, including protein glycosylation, phosphorylation, methylation, and deamidation. In a second step, these models were implemented inside the computational protein modeling suite Rosetta, which allows flexible combination with existing protocols to model the modified sites and understand their impact on protein stability as well as function. Lastly, we developed a new design protocol that either maximizes or minimizes the predicted probability of a particular site being modified. We find that this combination of ANN prediction and structure-based design can enable the modification of existing, as well as the introduction of novel, PTMs. The potential applications of our work include, but are not limited to, glycan masking of epitopes, strengthening protein-protein interactions through phosphorylation, as well as protecting proteins from deamidation liabilities. These applications are especially important for the design of new protein therapeutics where PTMs can drastically change the therapeutic properties of a protein. Our work adds novel tools to Rosetta's protein engineering toolbox that allow for the rational design of PTMs.
Collapse
Affiliation(s)
- Moritz Ertelt
- Institute for Drug Discovery, Leipzig University Medical Faculty, Leipzig, Germany
- Center for Scalable Data Analytics and Artificial Intelligence ScaDS.AI, Dresden/Leipzig, Germany
| | - Vikram Khipple Mulligan
- Center for Computational Biology, Flatiron Institute, New York, New York, United States of America
| | - Jack B Maguire
- Program in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
| | - Sergey Lyskov
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
| | - Rocco Moretti
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee, United States of America
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Torben Schiffner
- Institute for Drug Discovery, Leipzig University Medical Faculty, Leipzig, Germany
| | - Jens Meiler
- Institute for Drug Discovery, Leipzig University Medical Faculty, Leipzig, Germany
- Center for Scalable Data Analytics and Artificial Intelligence ScaDS.AI, Dresden/Leipzig, Germany
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee, United States of America
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Clara T Schoeder
- Institute for Drug Discovery, Leipzig University Medical Faculty, Leipzig, Germany
- Center for Scalable Data Analytics and Artificial Intelligence ScaDS.AI, Dresden/Leipzig, Germany
| |
Collapse
|
7
|
Emami N, Ferdousi R. HormoNet: a deep learning approach for hormone-drug interaction prediction. BMC Bioinformatics 2024; 25:87. [PMID: 38418979 PMCID: PMC10903040 DOI: 10.1186/s12859-024-05708-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2023] [Accepted: 02/16/2024] [Indexed: 03/02/2024] Open
Abstract
Several experimental evidences have shown that the human endogenous hormones can interact with drugs in many ways and affect drug efficacy. The hormone drug interactions (HDI) are essential for drug treatment and precision medicine; therefore, it is essential to understand the hormone-drug associations. Here, we present HormoNet to predict the HDI pairs and their risk level by integrating features derived from hormone and drug target proteins. To the best of our knowledge, this is one of the first attempts to employ deep learning approach for prediction of HDI prediction. Amino acid composition and pseudo amino acid composition were applied to represent target information using 30 physicochemical and conformational properties of the proteins. To handle the imbalance problem in the data, we applied synthetic minority over-sampling technique technique. Additionally, we constructed novel datasets for HDI prediction and the risk level of their interaction. HormoNet achieved high performance on our constructed hormone-drug benchmark datasets. The results provide insights into the understanding of the relationship between hormone and a drug, and indicate the potential benefit of reducing risk levels of interactions in designing more effective therapies for patients in drug treatments. Our benchmark datasets and the source codes for HormoNet are available in: https://github.com/EmamiNeda/HormoNet .
Collapse
Affiliation(s)
- Neda Emami
- Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran.
| | - Reza Ferdousi
- Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran
| |
Collapse
|
8
|
Jiang Y, Yan R, Wang X. PlantNh-Kcr: a deep learning model for predicting non-histone crotonylation sites in plants. PLANT METHODS 2024; 20:28. [PMID: 38360730 PMCID: PMC10870457 DOI: 10.1186/s13007-024-01157-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/06/2023] [Accepted: 02/07/2024] [Indexed: 02/17/2024]
Abstract
BACKGROUND Lysine crotonylation (Kcr) is a crucial protein post-translational modification found in histone and non-histone proteins. It plays a pivotal role in regulating diverse biological processes in both animals and plants, including gene transcription and replication, cell metabolism and differentiation, as well as photosynthesis. Despite the significance of Kcr, detection of Kcr sites through biological experiments is often time-consuming, expensive, and only a fraction of crotonylated peptides can be identified. This reality highlights the need for efficient and rapid prediction of Kcr sites through computational methods. Currently, several machine learning models exist for predicting Kcr sites in humans, yet models tailored for plants are rare. Furthermore, no downloadable Kcr site predictors or datasets have been developed specifically for plants. To address this gap, it is imperative to integrate existing Kcr sites detected in plant experiments and establish a dedicated computational model for plants. RESULTS Most plant Kcr sites are located on non-histones. In this study, we collected non-histone Kcr sites from five plants, including wheat, tabacum, rice, peanut, and papaya. We then conducted a comprehensive analysis of the amino acid distribution surrounding these sites. To develop a predictive model for plant non-histone Kcr sites, we combined a convolutional neural network (CNN), a bidirectional long short-term memory network (BiLSTM), and attention mechanism to build a deep learning model called PlantNh-Kcr. On both five-fold cross-validation and independent tests, PlantNh-Kcr outperformed multiple conventional machine learning models and other deep learning models. Furthermore, we conducted an analysis of species-specific effect on the PlantNh-Kcr model and found that a general model trained using data from multiple species outperforms species-specific models. CONCLUSION PlantNh-Kcr represents a valuable tool for predicting plant non-histone Kcr sites. We expect that this model will aid in addressing key challenges and tasks in the study of plant crotonylation sites.
Collapse
Affiliation(s)
- Yanming Jiang
- College of Mathematics and Computer Sciences, Shanxi Normal University, Taiyuan, 030031, China
| | - Renxiang Yan
- The Key Laboratory of Marine Enzyme Engineering of Fujian Province, Fuzhou University, Fuzhou, 350002, China
- College of Biological Science and Engineering, Fuzhou University, Fuzhou, 350002, China
| | - Xiaofeng Wang
- College of Mathematics and Computer Sciences, Shanxi Normal University, Taiyuan, 030031, China.
| |
Collapse
|
9
|
Hu F, Li W, Li Y, Hou C, Ma J, Jia C. O-GlcNAcPRED-DL: Prediction of Protein O-GlcNAcylation Sites Based on an Ensemble Model of Deep Learning. J Proteome Res 2024; 23:95-106. [PMID: 38054441 DOI: 10.1021/acs.jproteome.3c00458] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/07/2023]
Abstract
O-linked β-N-acetylglucosamine (O-GlcNAc) is a post-translational modification (i.e., O-GlcNAcylation) on serine/threonine residues of proteins, regulating a plethora of physiological and pathological events. As a dynamic process, O-GlcNAc functions in a site-specific manner. However, the experimental identification of the O-GlcNAc sites remains challenging in many scenarios. Herein, by leveraging the recent progress in cataloguing experimentally identified O-GlcNAc sites and advanced deep learning approaches, we establish an ensemble model, O-GlcNAcPRED-DL, a deep learning-based tool, for the prediction of O-GlcNAc sites. In brief, to make a benchmark O-GlcNAc data set, we extracted the information on O-GlcNAc from the recently constructed database O-GlcNAcAtlas, which contains thousands of experimentally identified and curated O-GlcNAc sites on proteins from multiple species. To overcome the imbalance between positive and negative data sets, we selected five groups of negative data sets in humans and mice to construct an ensemble predictor based on connection of a convolutional neural network and bidirectional long short-term memory. By taking into account three types of sequence information, we constructed four network frameworks, with the systematically optimized parameters used for the models. The thorough comparison analysis on two independent data sets of humans and mice and six independent data sets from other species demonstrated remarkably increased sensitivity and accuracy of the O-GlcNAcPRED-DL models, outperforming other existing tools. Moreover, a user-friendly Web server for O-GlcNAcPRED-DL has been constructed, which is freely available at http://oglcnac.org/pred_dl.
Collapse
Affiliation(s)
- Fengzhu Hu
- School of Science, Dalian Maritime University, Dalian 116026, China
| | - Weiyu Li
- Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, District of Columbia 20007, United States
| | - Yaoxiang Li
- Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, District of Columbia 20007, United States
| | - Chunyan Hou
- Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, District of Columbia 20007, United States
| | - Junfeng Ma
- Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, Washington, District of Columbia 20007, United States
| | - Cangzhi Jia
- School of Science, Dalian Maritime University, Dalian 116026, China
| |
Collapse
|
10
|
Sharma S, Sarkar O, Ghosh R. Exploring the Role of Unconventional Post-Translational Modifications in Cancer Diagnostics and Therapy. Curr Protein Pept Sci 2024; 25:780-796. [PMID: 38910429 DOI: 10.2174/0113892037274615240528113148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2024] [Revised: 04/22/2024] [Accepted: 04/29/2024] [Indexed: 06/25/2024]
Abstract
Unconventional Post-Translational Modifications (PTMs) have gained increasing attention as crucial players in cancer development and progression. Understanding the role of unconventional PTMs in cancer has the potential to revolutionize cancer diagnosis, prognosis, and therapeutic interventions. These modifications, which include O-GlcNAcylation, glutathionylation, crotonylation, including hundreds of others, have been implicated in the dysregulation of critical cellular processes and signaling pathways in cancer cells. This review paper aims to provide a comprehensive analysis of unconventional PTMs in cancer as diagnostic markers and therapeutic targets. The paper includes reviewing the current knowledge on the functional significance of various conventional and unconventional PTMs in cancer biology. Furthermore, the paper highlights the advancements in analytical techniques, such as biochemical analyses, mass spectrometry and bioinformatic tools etc., that have enabled the detection and characterization of unconventional PTMs in cancer. These techniques have contributed to the identification of specific PTMs associated with cancer subtypes. The potential use of Unconventional PTMs as biomarkers will further help in better diagnosis and aid in discovering potent therapeutics. The knowledge about the role of Unconventional PTMs in a vast and rapidly expanding field will help in detection and targeted therapy of cancer.
Collapse
Affiliation(s)
- Sayan Sharma
- Department of Biotechnology, Amity University Kolkata, AIBNK, Kolkata, West Bengal, India
| | - Oindrila Sarkar
- Department of Biotechnology, Amity University Kolkata, AIBNK, Kolkata, West Bengal, India
| | - Rajgourab Ghosh
- Department of Biotechnology, Amity University Kolkata, AIBNK, Kolkata, West Bengal, India
| |
Collapse
|
11
|
Chen L, Chen Y. RMTLysPTM: recognizing multiple types of lysine PTM sites by deep analysis on sequences. Brief Bioinform 2023; 25:bbad450. [PMID: 38066710 PMCID: PMC10783864 DOI: 10.1093/bib/bbad450] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2023] [Revised: 10/24/2023] [Accepted: 11/15/2023] [Indexed: 12/18/2023] Open
Abstract
Post-translational modification (PTM) occurs after a protein is translated from ribonucleic acid. It is an important living creature life phenomenon because it is implicated in almost all cellular processes. Identification of PTM sites from a given protein sequence is a hot topic in bioinformatics. Lots of computational methods have been proposed, and they provide good performance. However, most previous methods can only tackle one PTM type. Few methods consider multiple PTM types. In this study, a multi-label classification model, named RMTLysPTM, was developed to recognize four types of lysine (K) PTM sites, including acetylation, crotonylation, methylation and succinylation. The surrounding sites of a lysine site were selected to constitute a peptide segment, representing the lysine at the center. Deep analysis was conducted to count the distribution of 2-residues with fixed location across the four types of lysine PTM sites. By aggregating the distribution information of 2-residues in one peptide segment, the peptide segment was encoded by informative features. Furthermore, a prediction engine that can precisely capture the traits of the above representations was designed to recognize the types of lysine PTM sites. The cross-validation results on two datasets (Qiu and CPLM training datasets) suggested that the model had extremely high performance and RMTLysPTM had strong generalization ability by testing it on protein Q16778 and CPLM testing datasets. The model was found to be generally superior to all previous models and those using popular methods and features. A web server was set up for RMTLysPTM, and it can be accessed at http://119.3.127.138/.
Collapse
Affiliation(s)
- Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, People’s Republic of China
| | - Yuwei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai 201306, People’s Republic of China
| |
Collapse
|
12
|
Chen HX, Wang XC, Hou HT, Wang J, Yang Q, Chen YL, Chen HZ, He GW. Lysine crotonylation of SERCA2a correlates to cardiac dysfunction and arrhythmia in Sirt1 cardiac-specific knockout mice. Int J Biol Macromol 2023; 242:125151. [PMID: 37270127 DOI: 10.1016/j.ijbiomac.2023.125151] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/14/2022] [Revised: 03/08/2023] [Accepted: 05/27/2023] [Indexed: 06/05/2023]
Abstract
Protein post-translational modifications (PTMs) are important regulators of protein functions and produce proteome complexity. SIRT1 has NAD+-dependent deacylation of acyl-lysine residues. The present study aimed to explore the correlation between lysine crotonylation (Kcr) on cardiac function and rhythm in Sirt1 cardiac-specific knockout (ScKO) mice and related mechanism. Quantitative proteomics and bioinformatics analysis of Kcr were performed in the heart tissue of ScKO mice established with a tamoxifen-inducible Cre-loxP system. The expression and enzyme activity of crotonylated protein were assessed by western blot, co-immunoprecipitation, and cell biology experiment. Echocardiography and electrophysiology were performed to investigate the influence of decrotonylation on cardiac function and rhythm in ScKO mice. The Kcr of SERCA2a was significantly increased on Lys120 (1.973 folds). The activity of SERCA2a decreased due to lower binding energy of crotonylated SERCA2a and ATP. Changes in expression of PPAR-related proteins suggest abnormal energy metabolism in the heart. ScKO mice had cardiac hypertrophy, impaired cardiac function, and abnormal ultrastructure and electrophysiological activities. We conclude that knockout of SIRT1 alters the ultrastructure of cardiac myocytes, induces cardiac hypertrophy and dysfunction, causes arrhythmia, and changes energy metabolism by regulating Kcr of SERCA2a. These findings provide new insight into the role of PTMs in heart diseases.
Collapse
Affiliation(s)
- Huan-Xin Chen
- The Institute of Cardiovascular Diseases & Department of Cardiovascular Surgery, TEDA International Cardiovascular Hospital, Tianjin University & Chinese Academy of Medical Sciences, Tianjin 300457, China
| | - Xiang-Chong Wang
- The Institute of Cardiovascular Diseases & Department of Cardiovascular Surgery, TEDA International Cardiovascular Hospital, Tianjin University & Chinese Academy of Medical Sciences, Tianjin 300457, China
| | - Hai-Tao Hou
- The Institute of Cardiovascular Diseases & Department of Cardiovascular Surgery, TEDA International Cardiovascular Hospital, Tianjin University & Chinese Academy of Medical Sciences, Tianjin 300457, China
| | - Jun Wang
- The Institute of Cardiovascular Diseases & Department of Cardiovascular Surgery, TEDA International Cardiovascular Hospital, Tianjin University & Chinese Academy of Medical Sciences, Tianjin 300457, China
| | - Qin Yang
- The Institute of Cardiovascular Diseases & Department of Cardiovascular Surgery, TEDA International Cardiovascular Hospital, Tianjin University & Chinese Academy of Medical Sciences, Tianjin 300457, China
| | - Yuan-Lu Chen
- Department of Electrophysiology, TEDA International Cardiovascular Hospital, Tianjin University & Chinese Academy of Medical Sciences, Tianjin 300457, China
| | - Hou-Zao Chen
- State Key Laboratory of Medical Molecular Biology, Department of Biochemistry and Molecular Biology, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100005, China
| | - Guo-Wei He
- The Institute of Cardiovascular Diseases & Department of Cardiovascular Surgery, TEDA International Cardiovascular Hospital, Tianjin University & Chinese Academy of Medical Sciences, Tianjin 300457, China; Department of Surgery, Oregon Health & Science University, Portland, OR 97239-3098, USA.
| |
Collapse
|
13
|
Wang R, Chung CR, Huang HD, Lee TY. Identification of species-specific RNA N6-methyladinosine modification sites from RNA sequences. Brief Bioinform 2023; 24:7008797. [PMID: 36715277 DOI: 10.1093/bib/bbac573] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2022] [Revised: 11/11/2022] [Accepted: 11/24/2022] [Indexed: 01/31/2023] Open
Abstract
N6-methyladinosine (m6A) modification is the most abundant co-transcriptional modification in eukaryotic RNA and plays important roles in cellular regulation. Traditional high-throughput sequencing experiments used to explore functional mechanisms are time-consuming and labor-intensive, and most of the proposed methods focused on limited species types. To further understand the relevant biological mechanisms among different species with the same RNA modification, it is necessary to develop a computational scheme that can be applied to different species. To achieve this, we proposed an attention-based deep learning method, adaptive-m6A, which consists of convolutional neural network, bi-directional long short-term memory and an attention mechanism, to identify m6A sites in multiple species. In addition, three conventional machine learning (ML) methods, including support vector machine, random forest and logistic regression classifiers, were considered in this work. In addition to the performance of ML methods for multi-species prediction, the optimal performance of adaptive-m6A yielded an accuracy of 0.9832 and the area under the receiver operating characteristic curve of 0.98. Moreover, the motif analysis and cross-validation among different species were conducted to test the robustness of one model towards multiple species, which helped improve our understanding about the sequence characteristics and biological functions of RNA modifications in different species.
Collapse
Affiliation(s)
- Rulan Wang
- School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, Longgang District, 51872, Shenzhen, P.R. China
| | - Chia-Ru Chung
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, Longgang District, 51872, Shenzhen, P.R. China
- School of Life Sciences, University of Science and Technology of China, 230026, Hefei, Anhui, P.R. China
| | - Hsien-Da Huang
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, Longgang District, 51872, Shenzhen, P.R. China
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, Longgang District, 51872, Shenzhen, P.R. China
| | - Tzong-Yi Lee
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, Longgang District, 51872, Shenzhen, P.R. China
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, Longgang District, 51872, Shenzhen, P.R. China
| |
Collapse
|
14
|
Shang S, Liu J, Hua F. Protein acylation: mechanisms, biological functions and therapeutic targets. Signal Transduct Target Ther 2022; 7:396. [PMID: 36577755 PMCID: PMC9797573 DOI: 10.1038/s41392-022-01245-y] [Citation(s) in RCA: 45] [Impact Index Per Article: 22.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/07/2022] [Revised: 09/27/2022] [Accepted: 11/06/2022] [Indexed: 12/30/2022] Open
Abstract
Metabolic reprogramming is involved in the pathogenesis of not only cancers but also neurodegenerative diseases, cardiovascular diseases, and infectious diseases. With the progress of metabonomics and proteomics, metabolites have been found to affect protein acylations through providing acyl groups or changing the activities of acyltransferases or deacylases. Reciprocally, protein acylation is involved in key cellular processes relevant to physiology and diseases, such as protein stability, protein subcellular localization, enzyme activity, transcriptional activity, protein-protein interactions and protein-DNA interactions. Herein, we summarize the functional diversity and mechanisms of eight kinds of nonhistone protein acylations in the physiological processes and progression of several diseases. We also highlight the recent progress in the development of inhibitors for acyltransferase, deacylase, and acylation reader proteins for their potential applications in drug discovery.
Collapse
Affiliation(s)
- Shuang Shang
- grid.506261.60000 0001 0706 7839CAMS Key Laboratory of Molecular Mechanism and Target Discovery of Metabolic Disorder and Tumorigenesis, State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences & Peking Union Medical College, 100050 Beijing, P.R. China
| | - Jing Liu
- grid.506261.60000 0001 0706 7839CAMS Key Laboratory of Molecular Mechanism and Target Discovery of Metabolic Disorder and Tumorigenesis, State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences & Peking Union Medical College, 100050 Beijing, P.R. China
| | - Fang Hua
- grid.506261.60000 0001 0706 7839CAMS Key Laboratory of Molecular Mechanism and Target Discovery of Metabolic Disorder and Tumorigenesis, State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences & Peking Union Medical College, 100050 Beijing, P.R. China
| |
Collapse
|
15
|
Wang H, Li H, Gao W, Xie J. PrUb-EL: A hybrid framework based on deep learning for identifying ubiquitination sites in Arabidopsis thaliana using ensemble learning strategy. Anal Biochem 2022; 658:114935. [PMID: 36206844 DOI: 10.1016/j.ab.2022.114935] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2022] [Revised: 09/25/2022] [Accepted: 09/26/2022] [Indexed: 12/30/2022]
Abstract
Identification of ubiquitination sites is central to many biological experiments. Ubiquitination is a kind of post-translational protein modification (PTM). It is a key mechanism for increasing protein diversity and plays a vital role in regulating cell function. In recent years, many models have been developed to predict ubiquitination sites in humans, mice and yeast. However, few studies have predicted ubiquitination sites in Arabidopsis thaliana. In view of this, a deep network model named PrUb-EL is proposed to predict ubiquitination sites in Arabidopsis thaliana. Firstly, six features based on the protein sequence are extracted with amino acid index database (AAindex), dipeptide deviates from the expected mean (DDE), dipeptide composition (DPC), blocks substitution matrix (BLOSUM62), enhanced amino acid composition (EAAC) and binary encoding. Secondly, the synthetic minority over-sampling technique (SMOTE) is utilized to process the imbalanced data set. Then a new classifier named DG is presented, which includes Dense block, Residual block and Gated recurrent unit (GRU) block. Finally, each of six feature extraction methods is integrated into the DG model, and the ensemble learning strategy is used to gain the final prediction result. Experimental results show that PrUb-EL has good predictive ability with the accuracy (ACC) and area under the ROC curve (auROC) values of 91.00% and 97.70% using 5-fold cross-validation, respectively. Note that the values of ACC and auROC are 88.58% and 96.09% in the independent test, respectively. Compared with previous studies, our model has significantly improved performance thus it is an excellent method for identifying ubiquitination sites in Arabidopsis thaliana. The datasets and code used for the article are available at https://github.com/Tom-Wangy/PreUb-EL.git.
Collapse
Affiliation(s)
- Houqiang Wang
- School of Mathematics and Statistics, Xidian University, Xi'an, 710071, PR China
| | - Hong Li
- School of Mathematics and Statistics, Xidian University, Xi'an, 710071, PR China.
| | - Weifeng Gao
- School of Mathematics and Statistics, Xidian University, Xi'an, 710071, PR China
| | - Jin Xie
- School of Mathematics and Statistics, Xidian University, Xi'an, 710071, PR China
| |
Collapse
|
16
|
Kha QH, Tran TO, Nguyen TTD, Nguyen VN, Than K, Le NQK. An interpretable deep learning model for classifying adaptor protein complexes from sequence information. Methods 2022; 207:90-96. [PMID: 36174933 DOI: 10.1016/j.ymeth.2022.09.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2022] [Revised: 08/19/2022] [Accepted: 09/22/2022] [Indexed: 11/15/2022] Open
Abstract
Adaptor proteins (APs) are a family of proteins that aids in intracellular membrane trafficking, and their impairments or defects are closely related to various disorders. Traditional methods to identify and classify APs require time and complex techniques, which were then advanced by machine learning and computational approaches to facilitate the APs recognition task. However, most studies focused on recognizing separate ones in the APs family or the APs in general with non-APs, lacking one comprehensive strategy to distinguish the complexes of AP subtypes. Herein, we proposed a novel method to implement one novel task as discriminating the AP complexes in the APs family, utilizing an interpretable deep neural network architecture on sequence-based encoding features. This work also introduced a benchmark data set of AP complexes originating from the UniProt and GeneOntology databases. To assess the robustness of our proposed method, we compared our performance to various machine learning algorithms and feature extraction strategies. Furthermore, the interpretation of the model's prediction performance was implemented using t-distributed stochastic neighbor embedding (t-SNE), uniform manifold approximation and projection (UMAP), and SHapley Additive exPlanations (SHAP) analysis to show the distribution of AP complexes on optimal features. The promising performance of our architecture can assist scientists not only in AP complexes distinction but also in general protein sequences. Moreover, we have also made our work publicly on GitHub https://github.com/khanhlee/adaptor-dnn.
Collapse
Affiliation(s)
- Quang-Hien Kha
- International Master/Ph.D. Program in Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
| | - Thi-Oanh Tran
- International Ph.D. Program for Cell Therapy and Regeneration Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
| | - Trinh-Trung-Duong Nguyen
- Personalised Medicine Cluster, Department of Drug Design and Pharmacology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Van-Nui Nguyen
- University of Information and Communication Technology, Thai Nguyen University, Thai Nguyen, Viet Nam
| | - Khoat Than
- School of Information and Communication Technology, Hanoi University of Science and Technology, Viet Nam
| | - Nguyen Quoc Khanh Le
- Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei 106, Taiwan; Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei 106, Taiwan; Translational Imaging Research Center, Taipei Medical University Hospital, Taipei 110, Taiwan.
| |
Collapse
|
17
|
Akmal MA, Hassan MA, Muhammad S, Khurshid KS, Mohamed A. An analytical study on the identification of N-linked glycosylation sites using machine learning model. PeerJ Comput Sci 2022; 8:e1069. [PMID: 36262138 PMCID: PMC9575850 DOI: 10.7717/peerj-cs.1069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2022] [Accepted: 07/25/2022] [Indexed: 06/16/2023]
Abstract
N-linked is the most common type of glycosylation which plays a significant role in identifying various diseases such as type I diabetes and cancer and helps in drug development. Most of the proteins cannot perform their biological and psychological functionalities without undergoing such modification. Therefore, it is essential to identify such sites by computational techniques because of experimental limitations. This study aims to analyze and synthesize the progress to discover N-linked places using machine learning methods. It also explores the performance of currently available tools to predict such sites. Almost seventy research articles published in recognized journals of the N-linked glycosylation field have shortlisted after the rigorous filtering process. The findings of the studies have been reported based on multiple aspects: publication channel, feature set construction method, training algorithm, and performance evaluation. Moreover, a literature survey has developed a taxonomy of N-linked sequence identification. Our study focuses on the performance evaluation criteria, and the importance of N-linked glycosylation motivates us to discover resources that use computational methods instead of the experimental method due to its limitations.
Collapse
Affiliation(s)
- Muhammad Aizaz Akmal
- Department of Computer Science, University of Engineering and Technology, KSK, Lahore, Punjab, Pakistan
| | - Muhammad Awais Hassan
- Department of Computer Science, University of Engineering and Technology, Lahore, Punjab, Pakistan
| | - Shoaib Muhammad
- Department of Computer Science, University of Engineering and Technology, Lahore, Punjab, Pakistan
| | - Khaldoon S. Khurshid
- Department of Computer Science, University of Engineering and Technology, Lahore, Punjab, Pakistan
| | | |
Collapse
|
18
|
Mini-review: Recent advances in post-translational modification site prediction based on deep learning. Comput Struct Biotechnol J 2022; 20:3522-3532. [PMID: 35860402 PMCID: PMC9284371 DOI: 10.1016/j.csbj.2022.06.045] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2022] [Revised: 06/21/2022] [Accepted: 06/21/2022] [Indexed: 11/23/2022] Open
Abstract
Post-translational modifications (PTMs) are closely linked to numerous diseases, playing a significant role in regulating protein structures, activities, and functions. Therefore, the identification of PTMs is crucial for understanding the mechanisms of cell biology and diseases therapy. Compared to traditional machine learning methods, the deep learning approaches for PTM prediction provide accurate and rapid screening, guiding the downstream wet experiments to leverage the screened information for focused studies. In this paper, we reviewed the recent works in deep learning to identify phosphorylation, acetylation, ubiquitination, and other PTM types. In addition, we summarized PTM databases and discussed future directions with critical insights.
Collapse
Key Words
- AAindex, Amino acid index
- ATP, Adenosine triphosphate
- AUC, Area under curve
- Ac, Acetylation
- BE, Binary encoding
- BLOSUM, Blocks substitution matrix
- Bi-LSTM, Bidirectional LSTM
- CKSAAP, Composition of k-spaced amino acid Pairs
- CNN, Convolutional neural network
- CNNOH, CNN with the one-hot encoding
- CNNWE, CNN with the word-embedding encoding
- CNNrgb, CNN red green blue
- CV, Cross-validation
- DC-CNN, Densely connected convolutional neural network
- DL, Deep learning
- DNNs, Deep neural networks
- Deep learning
- E. coli, Escherichia coli
- EBGW, Encoding based on grouped weight
- EGAAC, Enhanced grouped amino acids content
- IG, Information gain
- K, Lysine
- KNN, k nearest neighbor
- LASSO, Least absolute shrinkage and selection operator
- LSTM, Long short-term memory
- LSTMWE, LSTM with the word-embedding encoding
- M.musculus, Mus musculus
- MDC, Modular densely connected convolutional networks
- MDCAN, Multilane dense convolutional attention network
- ML, Machine learning
- MLP, Multilayer perceptron
- MMI, Multivariate mutual information
- Machine learning
- Mass spectrometry
- NMBroto, Normalized Moreau-Broto autocorrelation
- P, Proline
- PSP, PhosphoSitePlus
- PSSM, Position-specific scoring matrix
- PTM, Post-translational modifications
- Ph, Phosphorylation
- Post-translational modification
- Prediction
- PseAAC, Pseudo-amino acid composition
- R, Arginine
- RF, Random forest
- RNN, Recurrent neural network
- ROC, Receiver operating characteristic
- S, Serine
- S. typhimurium, Salmonella typhimurium
- S.cerevisiae, Saccharomyces cerevisiae
- SE, Squeeze and excitation
- SEV, Split to Equal Validation
- ST, Source and target
- SUMO, Small ubiquitin-like modifier
- SVM, Support vector machines
- T, Threonine
- Ub, Ubiquitination
- Y, Tyrosine
- ZSL, Zero-shot learning
Collapse
|
19
|
Li F, Yin J, Lu M, Yang Q, Zeng Z, Zhang B, Li Z, Qiu Y, Dai H, Chen Y, Zhu F. ConSIG: consistent discovery of molecular signature from OMIC data. Brief Bioinform 2022; 23:6618243. [PMID: 35758241 DOI: 10.1093/bib/bbac253] [Citation(s) in RCA: 49] [Impact Index Per Article: 24.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2022] [Revised: 05/09/2022] [Accepted: 05/31/2022] [Indexed: 12/12/2022] Open
Abstract
The discovery of proper molecular signature from OMIC data is indispensable for determining biological state, physiological condition, disease etiology, and therapeutic response. However, the identified signature is reported to be highly inconsistent, and there is little overlap among the signatures identified from different biological datasets. Such inconsistency raises doubts about the reliability of reported signatures and significantly hampers its biological and clinical applications. Herein, an online tool, ConSIG, was constructed to realize consistent discovery of gene/protein signature from any uploaded transcriptomic/proteomic data. This tool is unique in a) integrating a novel strategy capable of significantly enhancing the consistency of signature discovery, b) determining the optimal signature by collective assessment, and c) confirming the biological relevance by enriching the disease/gene ontology. With the increasingly accumulated concerns about signature consistency and biological relevance, this online tool is expected to be used as an essential complement to other existing tools for OMIC-based signature discovery. ConSIG is freely accessible to all users without login requirement at https://idrblab.org/consig/.
Collapse
Affiliation(s)
- Fengcheng Li
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Jiayi Yin
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Mingkun Lu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Qingxia Yang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Zhenyu Zeng
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Bing Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Zhaorong Li
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| | - Yunqing Qiu
- State Key Laboratory for Diagnosis and Treatment of Infectious Disease, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital, Zhejiang University, 79 QingChun Road, Hangzhou, Zhejiang 310000, China
| | - Haibin Dai
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
| | - Yuzong Chen
- State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, The Graduate School at Shenzhen, Tsinghua University, Shenzhen 518055, China.,Qian Xuesen Collaborative Research Center of Astrochemistry and Space Life Sciences, Institute of Drug Discovery Technology, Ningbo University, Ningbo 315211, China
| | - Feng Zhu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China.,Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
| |
Collapse
|
20
|
Deep Learning-Based Advances In Protein Posttranslational Modification Site and Protein Cleavage Prediction. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2022; 2499:285-322. [PMID: 35696087 DOI: 10.1007/978-1-0716-2317-6_15] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
Abstract
Posttranslational modification (PTM ) is a ubiquitous phenomenon in both eukaryotes and prokaryotes which gives rise to enormous proteomic diversity. PTM mostly comes in two flavors: covalent modification to polypeptide chain and proteolytic cleavage. Understanding and characterization of PTM is a fundamental step toward understanding the underpinning of biology. Recent advances in experimental approaches, mainly mass-spectrometry-based approaches, have immensely helped in obtaining and characterizing PTMs. However, experimental approaches are not enough to understand and characterize more than 450 different types of PTMs and complementary computational approaches are becoming popular. Recently, due to the various advancements in the field of Deep Learning (DL), along with the explosion of applications of DL to various fields, the field of computational prediction of PTM has also witnessed the development of a plethora of deep learning (DL)-based approaches. In this book chapter, we first review some recent DL-based approaches in the field of PTM site prediction. In addition, we also review the recent advances in the not-so-studied PTM , that is, proteolytic cleavage predictions. We describe advances in PTM prediction by highlighting the Deep learning architecture, feature encoding, novelty of the approaches, and availability of the tools/approaches. Finally, we provide an outlook and possible future research directions for DL-based approaches for PTM prediction.
Collapse
|
21
|
Liu Y, Liu Y, Wang GA, Cheng Y, Bi S, Zhu X. BERT-Kgly: A Bidirectional Encoder Representations From Transformers (BERT)-Based Model for Predicting Lysine Glycation Site for Homo sapiens. FRONTIERS IN BIOINFORMATICS 2022; 2:834153. [PMID: 36304324 PMCID: PMC9580886 DOI: 10.3389/fbinf.2022.834153] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2021] [Accepted: 01/20/2022] [Indexed: 12/21/2022] Open
Abstract
As one of the most important posttranslational modifications (PTMs), protein lysine glycation changes the characteristics of the proteins and leads to the dysfunction of the proteins, which may cause diseases. Accurately detecting the glycation sites is of great benefit for understanding the biological function and potential mechanism of glycation in the treatment of diseases. However, experimental methods are expensive and time-consuming for lysine glycation site identification. Instead, computational methods, with their higher efficiency and lower cost, could be an important supplement to the experimental methods. In this study, we proposed a novel predictor, BERT-Kgly, for protein lysine glycation site prediction, which was developed by extracting embedding features of protein segments from pretrained Bidirectional Encoder Representations from Transformers (BERT) models. Three pretrained BERT models were explored to get the embeddings with optimal representability, and three downstream deep networks were employed to build our models. Our results showed that the model based on embeddings extracted from the BERT model pretrained on 556,603 protein sequences of UniProt outperforms other models. In addition, an independent test set was used to evaluate and compare our model with other existing methods, which indicated that our model was superior to other existing models.
Collapse
Affiliation(s)
| | | | | | | | - Shoudong Bi
- *Correspondence: Shoudong Bi, ; Xiaolei Zhu,
| | - Xiaolei Zhu
- *Correspondence: Shoudong Bi, ; Xiaolei Zhu,
| |
Collapse
|
22
|
Wang R, Wang Z, Li Z, Lee TY. Residue-Residue Contact Can Be a Potential Feature for the Prediction of Lysine Crotonylation Sites. Front Genet 2022; 12:788467. [PMID: 35058968 PMCID: PMC8764140 DOI: 10.3389/fgene.2021.788467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2021] [Accepted: 11/23/2021] [Indexed: 11/13/2022] Open
Abstract
Lysine crotonylation (Kcr) is involved in plenty of activities in the human body. Various technologies have been developed for Kcr prediction. Sequence-based features are typically adopted in existing methods, in which only linearly neighboring amino acid composition was considered. However, modified Kcr sites are neighbored by not only the linear-neighboring amino acid but also those spatially surrounding residues around the target site. In this paper, we have used residue-residue contact as a new feature for Kcr prediction, in which features encoded with not only linearly surrounding residues but also those spatially nearby the target site. Then, the spatial-surrounding residue was used as a new scheme for feature encoding for the first time, named residue-residue composition (RRC) and residue-residue pair composition (RRPC), which were used in supervised learning classification for Kcr prediction. As the result suggests, RRC and RRPC have achieved the best performance of RRC at an accuracy of 0.77 and an area under curve (AUC) value of 0.78, RRPC at an accuracy of 0.74, and an AUC value of 0.80. In order to show that the spatial feature is of a competitively high significance as other sequence-based features, feature selection was carried on those sequence-based features together with feature RRPC. In addition, different ranges of the surrounding amino acid compositions' radii were used for comparison of the performance. After result assessment, RRC and RRPC features have shown competitively outstanding performance as others or in some cases even around 0.20 higher in accuracy or 0.3 higher in AUC values compared with sequence-based features.
Collapse
Affiliation(s)
- Rulan Wang
- School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, China
| | - Zhuo Wang
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, China
| | - Zhongyan Li
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, China.,School of Life and Health Sciences, The Chinese University of Hong Kong, Shenzhen, China
| | - Tzong-Yi Lee
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, China.,School of Life and Health Sciences, The Chinese University of Hong Kong, Shenzhen, China
| |
Collapse
|
23
|
Dou L, Zhang Z, Xu L, Zou Q. iKcr_CNN: A novel computational tool for imbalance classification of human nonhistone crotonylation sites based on convolutional neural networks with focal loss. Comput Struct Biotechnol J 2022; 20:3268-3279. [PMID: 35832615 PMCID: PMC9251780 DOI: 10.1016/j.csbj.2022.06.032] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/20/2022] [Revised: 06/13/2022] [Accepted: 06/13/2022] [Indexed: 11/26/2022] Open
Abstract
Lysine crotonylation (Kcr) is a newly discovered protein post-translational modification and has been proved to be widely involved in various biological processes and human diseases. Thus, the accurate and fast identification of this modification became the preliminary task in investigating the related biological functions. Due to the long duration, high cost and intensity of traditional high-throughput experimental techniques, constructing bioinformatics predictors based on machine learning algorithms is treated as a most popular solution. Although dozens of predictors have been reported to identify Kcr sites, only two, nhKcr and DeepKcrot, focused on human nonhistone protein sequences. Moreover, due to the imbalance nature of data distribution, associated detection performance is severely biased towards the major negative samples and remains much room for improvement. In this research, we developed a convolutional neural network framework, dubbed iKcr_CNN, to identify the human nonhistone Kcr modification. To overcome the imbalance issue (Kcr: 15,274; non-Kcr: 74,018 with imbalance ratio: 1:4), we applied the focal loss function instead of the standard cross-entropy as the indicator to optimize the model, which not only assigns different weights to samples belonging to different categories but also distinguishes easy- and hard-classified samples. Ultimately, the obtained model presents more balanced prediction scores between real-world positive and negative samples than existing tools. The user-friendly web server is accessible at ikcrcnn.webmalab.cn/, and the involved Python scripts can be conveniently downloaded at github.com/lijundou/iKcr_CNN/. The proposed model may serve as an efficient tool to assist academicians with their experimental researches.
Collapse
|
24
|
Tng SS, Le NQK, Yeh HY, Chua MCH. Improved Prediction Model of Protein Lysine Crotonylation Sites Using Bidirectional Recurrent Neural Networks. J Proteome Res 2021; 21:265-273. [PMID: 34812044 DOI: 10.1021/acs.jproteome.1c00848] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/23/2023]
Abstract
Histone lysine crotonylation (Kcr) is a post-translational modification of histone proteins that is involved in the regulation of gene transcription, acute and chronic kidney injury, spermatogenesis, depression, cancer, and so forth. The identification of Kcr sites in proteins is important for characterizing and regulating primary biological mechanisms. The use of computational approaches such as machine learning and deep learning algorithms have emerged in recent years as the traditional wet-lab experiments are time-consuming and costly. We propose as part of this study a deep learning model based on a recurrent neural network (RNN) termed as Sohoko-Kcr for the prediction of Kcr sites. Through the embedded encoding of the peptide sequences, we investigate the efficiency of RNN-based models such as long short-term memory (LSTM), bidirectional LSTM (BiLSTM), and bidirectional gated recurrent unit (BiGRU) networks using cross-validation and independent tests. We also established the comparison between Sohoko-Kcr and other published tools to verify the efficiency of our model based on 3-fold, 5-fold, and 10-fold cross-validations using independent set tests. The results then show that the BiGRU model has consistently displayed outstanding performance and computational efficiency. Based on the proposed model, a webserver called Sohoko-Kcr was deployed for free use and is accessible at https://sohoko-research-9uu23.ondigitalocean.app.
Collapse
Affiliation(s)
- Sian Soo Tng
- Institute of Systems Science, National University of Singapore, 29 Heng Mui Keng Terrace, Singapore 119620, Singapore
| | - Nguyen Quoc Khanh Le
- Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei 106, Taiwan.,Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei 106, Taiwan.,Translational Imaging Research Center, Taipei Medical University Hospital, Taipei 110, Taiwan
| | - Hui-Yuan Yeh
- Medical Humanities Research Cluster, School of Humanities, Nanyang Technological University, 48 Nanyang Avenue, Singapore 639818, Singapore
| | - Matthew Chin Heng Chua
- Institute of Systems Science, National University of Singapore, 29 Heng Mui Keng Terrace, Singapore 119620, Singapore
| |
Collapse
|
25
|
Subba P, Prasad TSK. Protein Crotonylation Expert Review: A New Lens to Take Post-Translational Modifications and Cell Biology to New Heights. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2021; 25:617-625. [PMID: 34582706 DOI: 10.1089/omi.2021.0132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]
Abstract
Genome regulation, temporal and spatial variations in cell function, continues to puzzle and interest life scientists who aim to unravel the molecular basis of human health and disease, not to mention plant biology and ecosystem diversity. Despite important advances in epigenomics and protein post-translational modifications over the past decade, there is a need for new conceptual lenses to understand biological mechanisms that can help unravel the fundamental regulatory questions in genomes and the cell. To these ends, lys crotonylation (Kcr) is a reversible protein modification catalyzed by protein crotonyl transferases and decrotonylases. First identified on histones, Kcr regulates cellular processes at the chromatin level. Research thus far has revealed that Kcr marks promoter sites of active genes and potential enhancers. Eventually, Kcr on a number of nonhistone proteins was reported. The abundance of Kcr on ribosomal and myofilament proteins indicates its functional roles in protein synthesis and muscle contraction. Kcr has also been associated with pluripotency, spermiogenesis, and DNA repair. In plants, large-scale mass spectrometry-based experiments validated the roles of Kcr in photosynthesis. In this expert review, we present the latest thinking and findings on lys crotonylation with an eye to regulation of cell biology. We discuss the enrichment techniques, putative biological functions, and challenges associated with studying this protein modification with vast biological implications. Finally, we reflect on the future outlook about the broader relevance of Kcr in animals, microbes, and plant species.
Collapse
Affiliation(s)
- Pratigya Subba
- Center for Systems Biology and Molecular Medicine, Yenepoya Research Centre, Yenepoya (Deemed to be University), Mangalore, India
| | | |
Collapse
|