1
|
Hao B, Chen K, Zhai L, Liu M, Liu B, Tan M. Substrate and Functional Diversity of Protein Lysine Post-translational Modifications. Genomics, Proteomics & Bioinformatics 2024; 22:qzae019. [PMID: 38862432 DOI: 10.1093/gpbjnl/qzae019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2023] [Revised: 11/11/2023] [Accepted: 01/08/2024] [Indexed: 06/13/2024]
Abstract
Lysine post-translational modifications (PTMs) are widespread and versatile protein PTMs that are involved in diverse biological processes by regulating the fundamental functions of histone and non-histone proteins. Dysregulation of lysine PTMs is implicated in many diseases, and targeting lysine PTM regulatory factors, including writers, erasers, and readers, has become an effective strategy for disease therapy. The continuing development of mass spectrometry (MS) technologies coupled with antibody-based affinity enrichment technologies greatly promotes the discovery and decoding of PTMs. The global characterization of lysine PTMs is crucial for deciphering the regulatory networks, molecular functions, and mechanisms of action of lysine PTMs. In this review, we focus on lysine PTMs, and provide a summary of the regulatory enzymes of diverse lysine PTMs and the proteomics advances in lysine PTMs by MS technologies. We also discuss the types and biological functions of lysine PTM crosstalks on histone and non-histone proteins and current druggable targets of lysine PTM regulatory factors for disease therapy.
Affiliation(s)
- Bingbing Hao
- State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Tianjian Laboratory of Advanced Biomedical Sciences, Institute of Advanced Biomedical Sciences, Zhengzhou University, Zhengzhou 450001, China
| | - Kaifeng Chen
- State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- University of Chinese Academy of Sciences, Beijing 100049, China
| | - Linhui Zhai
- State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Zhongshan Institute for Drug Discovery, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Zhongshan 528400, China
- State Key Laboratory of Pharmaceutical Biotechnology, Nanjing University, Nanjing 210023, China
| | - Muyin Liu
- Department of Cardiology, Shanghai Institute of Cardiovascular Diseases, Zhongshan Hospital, Fudan University, Shanghai 200032, China
| | - Bin Liu
- Jiangsu Key Laboratory of Marine Pharmaceutical Compound Screening, College of Pharmacy, Jiangsu Ocean University, Lianyungang 222005, China
| | - Minjia Tan
- State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Zhongshan Institute for Drug Discovery, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Zhongshan 528400, China
| |
|
2
|
Liu X, Zhu B, Dai XW, Xu ZA, Li R, Qian Y, Lu YP, Zhang W, Liu Y, Zheng J. GBDT_KgluSite: An improved computational prediction model for lysine glutarylation sites based on feature fusion and GBDT classifier. BMC Genomics 2023; 24:765. [PMID: 38082413 PMCID: PMC10712101 DOI: 10.1186/s12864-023-09834-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2023] [Accepted: 11/23/2023] [Indexed: 12/18/2023] Open
Abstract
BACKGROUND Lysine glutarylation (Kglu) is one of the most important post-translational modifications (PTMs) and plays significant roles in various cellular functions, including metabolism, mitochondrial processes, and translation. Therefore, accurate identification of Kglu sites is important for elucidating protein molecular function. Because traditional biological experiments are time-consuming and expensive, computational Kglu site prediction is gaining more and more attention. RESULTS In this paper, we proposed GBDT_KgluSite, a novel Kglu site prediction model based on GBDT and appropriate feature combinations, which achieved satisfactory performance. Specifically, seven features including sequence-based features, physicochemical property-based features, structural-based features, and evolutionary-derived features were used to characterize proteins. NearMiss-3 and Elastic Net were applied to address data imbalance and feature redundancy issues, respectively. The experimental results show that GBDT_KgluSite has good robustness and generalization ability, with accuracy and AUC values of 93.73% and 98.14% on five-fold cross-validation and 90.11% and 96.75% on the independent test dataset, respectively. CONCLUSION GBDT_KgluSite is an effective computational method for identifying Kglu sites in protein sequences. It has good stability and generalization ability and could be useful for the identification of new Kglu sites in the future. The relevant code and dataset are available at https://github.com/flyinsky6/GBDT_KgluSite.
Affiliation(s)
- Xin Liu
- School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China.
| | - Bao Zhu
- Cancer Institute, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Jiangsu Center for the Collaboration and Innovation of Cancer Biotherapy, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
| | - Xia-Wei Dai
- School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
| | - Zhi-Ao Xu
- School of Life Sciences, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
| | - Rui Li
- School of Life Sciences, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
| | - Yuting Qian
- Jiangsu Center for the Collaboration and Innovation of Cancer Biotherapy, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
| | - Ya-Ping Lu
- School of Humanities and Arts, China University of Mining and Technology, Xuzhou, Jiangsu, 221116, China
| | - Wenqing Zhang
- School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
| | - Yong Liu
- Cancer Institute, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China.
- Jiangsu Center for the Collaboration and Innovation of Cancer Biotherapy, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China.
| | - Junnian Zheng
- Jiangsu Center for the Collaboration and Innovation of Cancer Biotherapy, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China.
- Center of Clinical Oncology, The Affiliated Hospital of Xuzhou Medical University, Xuzhou, Jiangsu, 221002, China.
| |
|
3
|
Kumari S, Gupta R, Ambasta RK, Kumar P. Emerging trends in post-translational modification: Shedding light on Glioblastoma multiforme. Biochim Biophys Acta Rev Cancer 2023; 1878:188999. [PMID: 37858622 DOI: 10.1016/j.bbcan.2023.188999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 10/06/2023] [Accepted: 10/06/2023] [Indexed: 10/21/2023]
Abstract
Recent multi-omics studies, including proteomics, transcriptomics, genomics, and metabolomics, have revealed the critical role of post-translational modifications (PTMs) in the progression and pathogenesis of Glioblastoma multiforme (GBM). Further, PTMs alter oncogenic signaling events and offer a novel avenue in GBM therapeutics research through PTM enzymes as potential biomarkers for drug targeting. In addition, PTMs are critical regulators of chromatin architecture, gene expression, and the tumor microenvironment (TME), which play a crucial role in tumorigenesis. Moreover, the implementation of artificial intelligence and machine learning algorithms enhances GBM therapeutics research through the identification of novel PTM enzymes and residues. Herein, we briefly explain the mechanism of protein modifications in GBM etiology and in altering the biology of GBM cells through chromatin remodeling, modulation of the TME, and signaling pathways. In addition, we highlight the importance of PTM enzymes as therapeutic biomarkers and the role of artificial intelligence and machine learning in protein PTM prediction.
Affiliation(s)
- Smita Kumari
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, India
| | - Rohan Gupta
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, India; School of Medicine, University of South Carolina, Columbia, SC, United States of America
| | - Rashmi K Ambasta
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, India; Department of Biotechnology and Microbiology, SRM University, Sonepat, Haryana, India.
| | - Pravir Kumar
- Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, India.
| |
|
4
|
Wang X, Ding Z, Wang R, Lin X. Deepro-Glu: combination of convolutional neural network and Bi-LSTM models using ProtBert and handcrafted features to identify lysine glutarylation sites. Brief Bioinform 2023; 24:6991122. [PMID: 36653898 DOI: 10.1093/bib/bbac631] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 10/23/2022] [Revised: 12/11/2022] [Accepted: 12/28/2022] [Indexed: 01/20/2023] Open
Abstract
Lysine glutarylation (Kglu) is a newly discovered post-translational modification of proteins with important roles in mitochondrial functions, oxidative damage, etc. The established biological experimental methods to identify glutarylation sites are often time-consuming and costly. Therefore, there is an urgent need to develop computational methods for efficient and accurate identification of glutarylation sites. Most of the existing computational methods only utilize handcrafted features to construct the prediction model and do not consider the positive impact of the pre-trained protein language model on the prediction performance. Based on this, we develop an ensemble deep-learning predictor, Deepro-Glu, that combines a convolutional neural network and a bidirectional long short-term memory network, using deep learning features and traditional handcrafted features to predict lysine glutarylation sites. The deep learning features are generated from the pre-trained protein language model called ProtBert, and the handcrafted features consist of sequence-based features, physicochemical property-based features and evolution information-based features. Furthermore, the attention mechanism is used to efficiently integrate the deep learning features and the handcrafted features by learning the appropriate attention weights. 10-fold cross-validation and independent tests demonstrate that Deepro-Glu achieves performance competitive with or superior to that of the state-of-the-art methods. The source codes and data are publicly available at https://github.com/xwanggroup/Deepro-Glu.
Affiliation(s)
- Xiao Wang
- School of Computer and Communication Engineering, Zhengzhou University of Light Industry, No. 136, Science Avenue, 450002, Zhengzhou, China
| | - Zhaoyuan Ding
- School of Computer and Communication Engineering, Zhengzhou University of Light Industry, No. 136, Science Avenue, 450002, Zhengzhou, China
| | - Rong Wang
- School of Computer and Communication Engineering, Zhengzhou University of Light Industry, No. 136, Science Avenue, 450002, Zhengzhou, China
| | - Xi Lin
- Institute of Artificial Intelligence, Xiamen University, No.4221, Xiang'an South Road, 361000, Xiamen, China
| |
|
5
|
Manavi F, Sharma A, Sharma R, Tsunoda T, Shatabda S, Dehzangi I. CNN-Pred: Prediction of single-stranded and double-stranded DNA-binding protein using convolutional neural networks. Gene X 2023; 853:147045. [PMID: 36503892 DOI: 10.1016/j.gene.2022.147045] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 07/21/2022] [Revised: 10/10/2022] [Accepted: 11/08/2022] [Indexed: 11/27/2022] Open
Abstract
DNA-binding proteins play a vital role in biological activity including DNA replication, DNA packing, and DNA repair. DNA-binding proteins can be classified into single-stranded DNA-binding proteins (SSBs) or double-stranded DNA-binding proteins (DSBs). Determining whether a protein is a DSB or an SSB helps determine the protein's function. Therefore, many studies have been conducted in recent years to accurately identify DSBs and SSBs. Despite all the efforts made so far, DSB and SSB prediction performance remains limited. In this study, we propose a new method called CNN-Pred to accurately predict DSBs and SSBs. To build CNN-Pred, we first extract evolutionary-based features in the form of mono-gram and bi-gram profiles using the position-specific scoring matrix (PSSM). We then apply a 1D convolutional neural network (CNN) as the classifier to our extracted features. Our results demonstrate that CNN-Pred can enhance DSB and SSB prediction accuracies by more than 4% on the independent test compared to previous studies found in the literature. CNN-Pred as a standalone tool and all its source codes are publicly available at: https://github.com/MLBC-lab/CNN-Pred.
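The mono-gram and bi-gram PSSM profiles mentioned in this abstract can be illustrated with a small sketch. The abstract does not give the formulas, so the definitions below are assumptions based on the commonly used ones: the mono-gram profile averages each PSSM column over the sequence, and the bi-gram profile sums the products of scores at consecutive positions for every pair of columns. The function names and the toy two-column matrix are hypothetical; a real PSSM has 20 columns, giving 20 mono-gram and 400 bi-gram features.

```python
from typing import List

Matrix = List[List[float]]  # one row per residue, one column per amino acid type


def monogram(pssm: Matrix) -> List[float]:
    """Average each PSSM column over the sequence length (assumed normalization)."""
    length = len(pssm)
    width = len(pssm[0])
    return [sum(row[j] for row in pssm) / length for j in range(width)]


def bigram(pssm: Matrix) -> List[float]:
    """B[m][n] = sum over i of P[i][m] * P[i+1][n], flattened row-major."""
    width = len(pssm[0])
    feats = [0.0] * (width * width)
    # Walk over consecutive residue pairs (i, i+1).
    for cur, nxt in zip(pssm, pssm[1:]):
        for m in range(width):
            for n in range(width):
                feats[m * width + n] += cur[m] * nxt[n]
    return feats


# Toy 3-residue, 2-column matrix purely for illustration.
toy = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
features = monogram(toy) + bigram(toy)
```

Because both profiles have a fixed length regardless of sequence length, they can be fed to a fixed-input classifier such as a 1D CNN.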
Affiliation(s)
- Farnoush Manavi
- Computer Science and Engineering and Information Technology Department, Shiraz University, Shiraz, Iran
| | - Alok Sharma
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama 230-0045, Japan; Institute for Integrated and Intelligent Systems, Griffith University, Nathan, Brisbane, QLD 4111, Australia
| | - Ronesh Sharma
- School of Electrical and Electronics Engineering, Fiji National University, Suva, Fiji
| | - Tatsuhiko Tsunoda
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama 230-0045, Japan; Laboratory for Medical Science Mathematics, Department of Biological Sciences, School of Science, The University of Tokyo, Tokyo 113-0033, Japan; Laboratory for Medical Science Mathematics, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo 113-0033, Japan
| | - Swakkhar Shatabda
- Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh
| | - Iman Dehzangi
- Department of Computer Science, Rutgers University, Camden, NJ, USA; Center for Computational and Integrative Biology, Rutgers University, Camden, USA
| |
|
6
|
Jia J, Sun M, Wu G, Qiu W. DeepDN_iGlu: prediction of lysine glutarylation sites based on attention residual learning method and DenseNet. Mathematical Biosciences and Engineering: MBE 2023; 20:2815-2830. [PMID: 36899559 DOI: 10.3934/mbe.2023132] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 06/18/2023]
Abstract
Protein post-translational modification (PTM) is key to orchestrating various biological processes and functions and occurs widely in the proteins of animals and plants. Glutarylation is a type of post-translational modification that occurs at active ε-amino groups of specific lysine residues in proteins and is associated with various human diseases, including diabetes, cancer, and glutaric aciduria type I. Therefore, the prediction of glutarylation sites is particularly important. This study developed a new deep learning-based prediction model for glutarylation sites, named DeepDN_iGlu, by adopting an attention residual learning method and DenseNet. The focal loss function is used in place of the traditional cross-entropy loss function to address the substantial imbalance between the numbers of positive and negative samples. With the straightforward one-hot encoding method, DeepDN_iGlu achieves Sensitivity (Sn), Specificity (Sp), Accuracy (ACC), Matthews Correlation Coefficient (MCC), and Area Under Curve (AUC) of 89.29%, 61.97%, 65.15%, 0.33 and 0.80, respectively, on the independent test set, indicating greater potential for glutarylation site prediction. To the best of the authors' knowledge, this is the first time that DenseNet has been used for the prediction of glutarylation sites. DeepDN_iGlu has been deployed as a web server (https://bioinfo.wugenqiang.top/~smw/DeepDN_iGlu/) that is available to make glutarylation site prediction data more accessible.
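To make the role of the focal loss concrete, here is a minimal sketch of the standard binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). It is not taken from the DeepDN_iGlu code: the function name and the default values gamma = 2.0 and alpha = 0.25 are assumptions for illustration, and the abstract does not state which settings the authors used.

```python
import math
from typing import Sequence


def binary_focal_loss(
    y_true: Sequence[int],
    p_pred: Sequence[float],
    gamma: float = 2.0,   # assumed focusing parameter
    alpha: float = 0.25,  # assumed weight for the positive class
    eps: float = 1e-12,
) -> float:
    """Mean binary focal loss over a batch of predicted positive-class probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)           # avoid log(0)
        p_t = p if y == 1 else 1.0 - p            # probability of the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t)^gamma shrinks the contribution of well-classified samples.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(y_true)


easy = binary_focal_loss([1], [0.9])
hard = binary_focal_loss([1], [0.6])
```

With gamma = 0 the modulating factor disappears and the expression reduces to a class-weighted cross-entropy, which is why the focal loss is usually described as a generalization of it.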
Affiliation(s)
- Jianhua Jia
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333403, China
| | - Mingwei Sun
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333403, China
| | - Genqiang Wu
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333403, China
| | - Wangren Qiu
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333403, China
| |
|
7
|
Naseer S, Ali RF, Khan YD, Dominic PDD. iGluK-Deep: computational identification of lysine glutarylation sites using deep neural networks with general pseudo amino acid compositions. J Biomol Struct Dyn 2022; 40:11691-11704. [PMID: 34396935 DOI: 10.1080/07391102.2021.1962738] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Indexed: 12/24/2022]
Abstract
Lysine glutarylation is a post-translational modification that plays an important regulatory role in a variety of physiological and enzymatic processes, including mitochondrial functions and metabolic processes, in both eukaryotic and prokaryotic cells. This post-translational modification influences chromatin structure and thereby results in global regulation of transcription, defects in cell-cycle progression, DNA damage repair, and telomere silencing. To better understand the mechanism of lysine glutarylation, its identification in a protein is necessary; however, experimental methods are time-consuming and labor-intensive. Herein, we propose a new computational approach that supplements experimental methods by identifying lysine glutarylation sites using deep neural networks and Chou's Pseudo Amino Acid Composition (PseAAC). We employed well-known deep neural networks for feature representation learning and classification of peptide sequences. Our approach uses raw pseudo amino acid compositions and removes the need to separately perform costly and cumbersome feature extraction and selection. Among the developed deep learning-based predictors, the standard neural network-based predictor achieved the highest scores in terms of accuracy and all other performance evaluation measures, and it outperforms the majority of previously reported predictors without requiring an expensive feature extraction process. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Sheraz Naseer
- Department of Computer Science, University of Management and Technology, Lahore, Pakistan
| | - Rao Faizan Ali
- Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, Perak Darul Ridzuan, Malaysia
| | - Yaser Daanial Khan
- Department of Computer Science, University of Management and Technology, Lahore, Pakistan
| | - P D D Dominic
- Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, Perak Darul Ridzuan, Malaysia
| |
|
8
|
Asim MN, Fazeel A, Ibrahim MA, Dengel A, Ahmed S. MP-VHPPI: Meta predictor for viral host protein-protein interaction prediction in multiple hosts and viruses. Front Med (Lausanne) 2022; 9:1025887. [PMID: 36465911 PMCID: PMC9709337 DOI: 10.3389/fmed.2022.1025887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Accepted: 10/17/2022] [Indexed: 09/19/2023] Open
Abstract
Viral-host protein-protein interaction (VHPPI) prediction is essential to decoding molecular mechanisms of viral pathogens and host immunity processes that eventually help to control the propagation of viral diseases and to design optimized therapeutics. Multiple AI-based predictors have been developed to predict diverse VHPPIs across a wide range of viruses and hosts; however, these predictors perform well only for specific types of hosts and viruses. The prime objective of this research is to develop a robust meta predictor (MP-VHPPI) capable of more accurately predicting VHPPI across multiple hosts and viruses. The proposed meta predictor makes use of two well-known encoding methods, Amphiphilic Pseudo-Amino Acid Composition (APAAC) and Quasi-sequence (QS) Order, that capture amino acid sequence order and distributional information to generate the numerical representation of complete viral-host raw protein sequences. A feature agglomeration method is utilized to transform the original feature space into a more informative feature space. Random forest (RF) and Extra tree (ET) classifiers are trained on the optimized feature space of the APAAC and QS order encodings separately and on the combination of both encodings. The predictions of both classifiers are then fed to a Support Vector Machine (SVM) classifier that makes the final predictions. The proposed meta predictor is evaluated over 7 different benchmark datasets, where it outperforms existing VHPPI predictors by an average margin of 3.07, 6.07, 2.95, and 2.85% in terms of accuracy, Matthews correlation coefficient, precision, and sensitivity, respectively. To facilitate the scientific community, the MP-VHPPI web server is available at https://sds_genetic_analysis.opendfki.de/MP-VHPPI/.
Affiliation(s)
- Muhammad Nabeel Asim
- Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern, Germany
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, Germany
| | - Ahtisham Fazeel
- Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern, Germany
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, Germany
| | - Muhammad Ali Ibrahim
- Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern, Germany
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, Germany
| | - Andreas Dengel
- Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern, Germany
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, Germany
| | - Sheraz Ahmed
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, Germany
| |
|
9
|
Sohrawordi M, Hossain MA, Hasan MAM. PLP_FS: prediction of lysine phosphoglycerylation sites in protein using support vector machine and fusion of multiple F_Score feature selection. Brief Bioinform 2022; 23:6655632. [PMID: 35929355 DOI: 10.1093/bib/bbac306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2022] [Revised: 07/05/2022] [Accepted: 07/06/2022] [Indexed: 11/14/2022] Open
Abstract
A newly discovered post-translational modification (PTM), phosphoglycerylation, has shown its essential role in the construction and functional properties of proteins and in dangerous human diseases. Hence, understanding the molecular mechanism behind the phosphoglycerylation process is urgently needed to develop drugs for related diseases. However, accurately identifying phosphoglycerylation sites in a protein sequence in the laboratory is a difficult and challenging task. Hence, an efficient computational model is greatly sought for this purpose. Only a small number of computational models are currently available for identifying phosphoglycerylation sites, and their prediction capability has not reached a satisfactory level. Therefore, an effective predictor named PLP_FS has been designed and constructed to identify phosphoglycerylation sites in this study. For training, an optimal feature set was obtained by fusing multiple F_Score feature selection techniques applied to the features generated by three types of sequence-based feature extraction methods, and this set was used to fit a support vector machine classifier as the prediction model. In addition, the k-neighbor near cleaning and SMOTE methods were implemented to balance the benchmark dataset. In 10-fold cross-validation, the suggested model obtained an accuracy of 99.22%, a sensitivity of 98.17% and a specificity of 99.75%, which are better than those of other currently available predictors for identifying phosphoglycerylation sites.
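As an illustration of F-score-based feature ranking, the sketch below implements the widely used single-feature F-score: the squared deviations of the positive-class and negative-class means from the overall mean, divided by the sum of the two within-class sample variances. This is an assumption about the building block only; the abstract does not specify how PLP_FS fuses multiple F_Score selections, and that fusion is not reproduced here. The function names and toy data are hypothetical.

```python
from statistics import mean, variance
from typing import List, Sequence


def f_score(pos: Sequence[float], neg: Sequence[float]) -> float:
    """F-score of one feature given its values in the positive and negative class."""
    overall = mean(list(pos) + list(neg))
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # sample variances (n - 1 denominator)
    if within == 0:
        # Constant within both classes: perfectly separating or uninformative.
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(X: List[List[float]], y: List[int]) -> List[int]:
    """Return feature indices sorted from highest to lowest F-score."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(f_score(pos, neg))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[1, 1], [2, 5], [3, 3], [7, 2], [8, 4], [9, 3]]
y = [1, 1, 1, 0, 0, 0]
order = rank_features(X, y)
```

A selection step would then keep the top-ranked features before fitting the classifier; how many to keep is a tuning choice not stated in the abstract.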
Affiliation(s)
- Md Sohrawordi
- Dept. of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
- Dept. of Computer Science and Engineering, Hajee Mohammad Danesh Science and Technology University, Dinajpur, Bangladesh
| | - Md Ali Hossain
- Dept. of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
| | - Md Al Mehedi Hasan
- Dept. of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
| |
|
10
|
Phukan H, Sarma A, Rex DA, Rai AB, Prasad TS, Madanan MG. Unique Posttranslational Modification Sites of Acetylation, Citrullination, Glutarylation, and Phosphorylation Are Found to Be Specific to the Proteins Partitioned in the Triton X-114 Fractions of Leptospira. ACS Omega 2022; 7:18569-18576. [PMID: 35694507 PMCID: PMC9178745 DOI: 10.1021/acsomega.2c01245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2022] [Accepted: 05/13/2022] [Indexed: 06/15/2023]
Abstract
Posttranslational modifications (PTMs) are decisive factors in the structure, function, and localization of proteins in prokaryotic and eukaryotic organisms. However, prokaryotic organisms lack subcellular organelles, and protein localization based on subcellular locations like cytoplasm, inner membrane, periplasm, and outer membrane can be taken into account for functional characterization. We have identified 131 acetylated, 1182 citrullinated, 72 glutarylated, 5 palmitoylated, and 139 phosphorylated proteins from Triton X-114 fractionated proteins of Leptospira, the pathogen of the re-emerging zoonotic disease leptospirosis. In total, 74.7% of proteins were found exclusively in different Triton X-114 fractions. Additionally, 21.9% of proteins in multiple fractions had one or more PTMs specific to different Triton X-114 fractions. Altogether, 96.6% of proteins showed exclusiveness to different Triton X-114 fractions either due to the presence of the entire protein or with a specific PTM type or position. Further, the PTM distribution within Triton X-114 fractions showed higher acetylation in aqueous, glutarylation in detergent, phosphorylation in pellet, and citrullination in wash fractions representing cytoplasmic, outer membrane, inner membrane, and extracellular locations, respectively. Identification of PTMs in proteins with respect to the subcellular localization will help to characterize candidate proteins before developing novel drugs and vaccines rationally to combat leptospirosis.
Affiliation(s)
- Homen Phukan
- ICMR-Regional Medical Research Centre, Port Blair 744103, Andaman and Nicobar Islands, India
| | - Abhijit Sarma
- ICMR-Regional Medical Research Centre, Port Blair 744103, Andaman and Nicobar Islands, India
| | - Devasahayam Arokia Balaya Rex
- Center for Systems Biology and Molecular Medicine, Yenepoya Research Centre, Yenepoya (Deemed to be University), Mangaluru 575018, Karnataka, India
| | - Akhila Balakrishna Rai
- Center for Systems Biology and Molecular Medicine, Yenepoya Research Centre, Yenepoya (Deemed to be University), Mangaluru 575018, Karnataka, India
| | | | | |
|
11
|
Taherzadeh G, Campbell M, Zhou Y. Computational Prediction of N- and O-Linked Glycosylation Sites for Human and Mouse Proteins. Methods Mol Biol 2022; 2499:177-186. [PMID: 35696081 DOI: 10.1007/978-1-0716-2317-6_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Protein glycosylation is one of the most complex posttranslational modifications (PTMs) and plays a fundamental role in protein function. Identification and annotation of glycosylation sites using experimental approaches are challenging and time-consuming. Hence, there is a demand to build fast and efficient computational methods to address this problem. Here, we present the SPRINT-Gly framework containing the largest dataset and a prediction model of glycosylation sites for a given protein sequence. In this framework, we construct a large dataset containing N- and O-linked glycosylation sites of human and mouse proteins, collected from different sources. We then introduce the SPRINT-Gly method to predict putative N- and O-linked sites. SPRINT-Gly is a machine learning-based approach consisting of a number of trained predictive models for glycosylation sites in human and mouse proteins, separately. The method is built by incorporating sequence-based, predicted structural, and physicochemical information of the neighboring residues of each N- and O-linked glycosylation site and by training deep neural network and support vector machine classifiers. SPRINT-Gly outperformed other existing methods by achieving 18% and 50% higher Matthews correlation coefficient for N- and O-linked glycosylation site prediction, respectively. SPRINT-Gly is publicly available as an online and stand-alone predictor at https://sparks-lab.org/server/sprint-gly/.
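Since several abstracts in this list report the Matthews correlation coefficient (MCC) alongside sensitivity and specificity, a short sketch of how these are computed from a binary confusion matrix may help. The function names and the example counts are hypothetical and are not taken from any of the cited studies.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true sites that are predicted as sites (recall)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-sites that are predicted as non-sites."""
    return tn / (tn + fp) if (tn + fp) else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical imbalanced test set: 20 true sites, 180 non-sites.
tp, fn, tn, fp = 15, 5, 150, 30
report = (sensitivity(tp, fn), specificity(tn, fp), mcc(tp, tn, fp, fn))
```

MCC uses all four cells of the confusion matrix, so it stays informative on imbalanced site-prediction datasets where accuracy alone can look high for a trivial classifier.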
Affiliation(s)
- Ghazaleh Taherzadeh
- Department of Mathematics and Computer Science, Wilkes University, Wilkes-Barre, PA, USA.
| | - Matthew Campbell
- Institute for Glycomics, Griffith University, Southport, QLD, Australia
| | - Yaoqi Zhou
- Institute for Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen, China
| |
|
12
|
Dehzangi I, Sharma A, Shatabda S. iProtGly-SS: A Tool to Accurately Predict Protein Glycation Site Using Structural-Based Features. Methods Mol Biol 2022; 2499:125-134. [PMID: 35696077 DOI: 10.1007/978-1-0716-2317-6_5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/15/2023]
Abstract
Posttranslational modification (PTM) is an important biological mechanism that promotes functional diversity among proteins. So far, a wide range of PTMs has been identified. Among them, glycation is considered one of the most important. Glycation is associated with neurological disorders, including Parkinson's and Alzheimer's diseases. It has also been shown to be responsible for other diseases, including vascular complications of diabetes mellitus. Despite the efforts made so far, the prediction performance of computational methods for glycation sites remains limited. Here we present a newly developed machine learning tool called iProtGly-SS that utilizes sequential and structural information together with a Support Vector Machine (SVM) classifier to enhance lysine glycation site prediction accuracy. The performance of iProtGly-SS was investigated using the three most popular benchmarks for this task. Our results demonstrate that iProtGly-SS achieves prediction accuracies of 81.61%, 93.62%, and 92.95% on these benchmarks, which are significantly better than the results reported in previous studies. iProtGly-SS is implemented as a web-based tool that is publicly available at http://brl.uiu.ac.bd/iprotgly-ss/ .
Affiliation(s)
- Iman Dehzangi
- Department of Computer Science, Rutgers University, Camden, NJ, USA.
- Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, USA.
- Alok Sharma
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, QLD, Australia.
- Department of Medical Science Mathematics, Tokyo Medical and Dental University (TMDU), Tokyo, Japan.
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan.
- Swakkhar Shatabda
- Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh.
|
13
|
Sohrawordi M, Hossain MA. Prediction of lysine formylation sites using support vector machine based on the sample selection from majority classes and synthetic minority over-sampling techniques. Biochimie 2021; 192:125-135. [PMID: 34627982 DOI: 10.1016/j.biochi.2021.10.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/15/2021] [Revised: 10/03/2021] [Accepted: 10/05/2021] [Indexed: 12/22/2022]
Abstract
Lysine formylation is a recently discovered post-translational modification (PTM) of considerable interest that is generally found on core and linker histone proteins of prokaryotes and eukaryotes and plays important roles in the regulation of various cellular mechanisms. Hence, properly identifying formylation sites in proteins is important for understanding the molecular mechanism of formylation and for developing drugs for related diseases. As experimental identification of formylation sites using traditional processes is expensive and time consuming, a simple and fast computational model for accurately predicting lysine formylation sites is highly desirable. In this study, a computational model named PLF_SVM is designed and proposed, using binary encoding (BE), amino acid composition (AAC), reverse position relative incidence matrix (RPRIM), position relative incidence matrix (PRIM), and position specific amino acid propensity (PSAAP) feature generation methods to predict formylated and non-formylated lysine sites. In addition, the Synthetic Minority Oversampling Technique (SMOTE) and a proposed sample selection strategy named EnSVM are applied to handle the imbalanced training dataset problem. Thereafter, the optimal number of features is selected by the F-score method to train the model. Finally, PLF_SVM outperforms the state-of-the-art approaches in validation and independent tests, with accuracies of 98.61% and 98.77%, respectively. A user-friendly web tool for identifying formylation sites is also available at https://plf-svm.herokuapp.com/. Therefore, the proposed method may serve as a helpful guideline for the analysis and prediction of formylated lysine and for understanding the process of cellular regulation.
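Two of the feature generation methods named in this abstract, binary encoding (BE) and amino acid composition (AAC), can be illustrated on a sequence window centred on a lysine. The sketch below is a generic illustration and not the PLF_SVM code: the function names, the flank size, the `X` padding character, and the example sequence are all assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def extract_lysine_windows(sequence, flank=3, pad="X"):
    """Return (1-based position, window) for every lysine (K).

    Each window holds `flank` residues on both sides of the lysine;
    sequence ends are padded with `pad` (an assumed convention).
    """
    padded = pad * flank + sequence + pad * flank
    return [
        (i + 1, padded[i:i + 2 * flank + 1])
        for i, residue in enumerate(sequence)
        if residue == "K"
    ]


def binary_encoding(window):
    """One-hot encode each residue as 20 bits; padding stays all-zero."""
    vector = []
    for residue in window:
        one_hot = [0] * len(AMINO_ACIDS)
        index = AMINO_ACIDS.find(residue)
        if index >= 0:
            one_hot[index] = 1
        vector.extend(one_hot)
    return vector


def amino_acid_composition(window):
    """Fraction of each standard residue in the window, ignoring padding."""
    residues = [r for r in window if r in AMINO_ACIDS]
    total = len(residues)
    return [residues.count(a) / total if total else 0.0 for a in AMINO_ACIDS]


if __name__ == "__main__":
    # Hypothetical peptide, used only to show the shapes of the features.
    for position, window in extract_lysine_windows("MSGRGKQGGKAR"):
        features = binary_encoding(window) + amino_acid_composition(window)
        print(position, window, len(features))
```

Feature vectors of this kind would then be passed to oversampling, feature selection, and a classifier; those steps are not sketched here.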
Affiliation(s)
- Md Sohrawordi
- Dept. of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh; Dept. of Computer Science and Engineering, Hajee Mohammad Danesh Science and Technology University, Dinajpur, Bangladesh.
- Md Ali Hossain
- Dept. of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
|
14
|
Sharma A, Lysenko A, Boroevich KA, Vans E, Tsunoda T. DeepFeature: feature selection in nonimage data using convolutional neural network. Brief Bioinform 2021; 22:6343526. [PMID: 34368836 PMCID: PMC8575039 DOI: 10.1093/bib/bbab297] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 05/11/2021] [Revised: 06/30/2021] [Accepted: 07/14/2021] [Indexed: 12/14/2022]
Abstract
Artificial intelligence methods offer exciting new capabilities for the discovery of biological mechanisms from raw data because they are able to detect vastly more complex patterns of association that cannot be captured by classical statistical tests. Among these methods, deep neural networks are currently among the most advanced approaches and, in particular, convolutional neural networks (CNNs) have been shown to perform excellently on a variety of difficult tasks. Despite that, the application of this type of network to high-dimensional omics data and, most importantly, the meaningful interpretation of the results returned from such models in a biomedical context remain open problems. Here we present an approach applying a CNN to nonimage data for feature selection. Our pipeline, DeepFeature, can both transform omics data into a form that is optimal for fitting a CNN model and return sets of the most important genes used internally for computing predictions. Within the framework, the Snowfall compression algorithm is introduced to fit more elements into the fixed pixel framework, and region accumulation and an element decoder are developed to find elements or genes from the class activation maps. In comparative tests on a cancer type prediction task, DeepFeature simultaneously achieved superior predictive performance and a better ability to discover key pathways and biological processes meaningful for this context. Capabilities offered by the proposed framework can enable the effective use of powerful deep learning methods to facilitate the discovery of causal mechanisms in high-dimensional biomedical data.
Affiliation(s)
- Alok Sharma
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama 230-0045, Japan
- Artem Lysenko
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama 230-0045, Japan
- Keith A Boroevich
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama 230-0045, Japan
- Edwin Vans
- STEMP, University of the South Pacific, Suva, Fiji
- Tatsuhiko Tsunoda
- Laboratory for Medical Science Mathematics, Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo 113-0033, Japan
|
15
|
Xie L, Xiao Y, Meng F, Li Y, Shi Z, Qian K. Functions and Mechanisms of Lysine Glutarylation in Eukaryotes. Front Cell Dev Biol 2021; 9:667684. [PMID: 34249920 PMCID: PMC8264553 DOI: 10.3389/fcell.2021.667684] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/14/2021] [Accepted: 06/01/2021] [Indexed: 01/22/2023]
Abstract
Lysine glutarylation (Kglu) is a newly discovered post-translational modification (PTM) that is considered to be reversible, dynamic, and conserved in prokaryotes and eukaryotes. Recent developments in the identification of Kglu by mass spectrometry have shown that Kglu is mainly involved in the regulation of metabolism, oxidative damage, and chromatin dynamics, and is associated with various diseases. In this review, we first summarize the history of glutarylation research and the biochemical processes of glutarylation and deglutarylation. We then focus on its pathophysiological roles in conditions such as glutaric acidemia type 1 and asthenospermia. Finally, the current computational tools for predicting glutarylation sites are discussed. These emerging findings point to new functions for lysine glutarylation and related enzymes, and also highlight the mechanisms by which glutarylation regulates diverse cellular processes.
Affiliation(s)
- Longxiang Xie
- Institute of Biomedical Informatics, Cell Signal Transduction Laboratory, Bioinformatics Center, Henan Provincial Engineering Center for Tumor Molecular Medicine, School of Basic Medical Sciences, Huaihe Hospital, Henan University, Kaifeng, China
- Yafei Xiao
- Institute of Biomedical Informatics, Cell Signal Transduction Laboratory, Bioinformatics Center, Henan Provincial Engineering Center for Tumor Molecular Medicine, School of Basic Medical Sciences, Huaihe Hospital, Henan University, Kaifeng, China
- Fucheng Meng
- Institute of Biomedical Informatics, Cell Signal Transduction Laboratory, Bioinformatics Center, Henan Provincial Engineering Center for Tumor Molecular Medicine, School of Basic Medical Sciences, Huaihe Hospital, Henan University, Kaifeng, China
- Yongqiang Li
- Institute of Biomedical Informatics, Cell Signal Transduction Laboratory, Bioinformatics Center, Henan Provincial Engineering Center for Tumor Molecular Medicine, School of Basic Medical Sciences, Huaihe Hospital, Henan University, Kaifeng, China
- Zhenyu Shi
- Institute of Biomedical Informatics, Cell Signal Transduction Laboratory, Bioinformatics Center, Henan Provincial Engineering Center for Tumor Molecular Medicine, School of Basic Medical Sciences, Huaihe Hospital, Henan University, Kaifeng, China
- Keli Qian
- Infection Control Department, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
|
16
|
Faulkner S, Maksimovic I, David Y. A chemical field guide to histone nonenzymatic modifications. Curr Opin Chem Biol 2021; 63:180-187. [PMID: 34157651 DOI: 10.1016/j.cbpa.2021.05.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/15/2021] [Revised: 04/07/2021] [Accepted: 05/03/2021] [Indexed: 12/29/2022]
Abstract
Histone nonenzymatic covalent modifications (NECMs) have recently emerged as an understudied class of posttranslational modifications that regulate chromatin structure and function. These NECMs alter the surface topology of histone proteins and their interactions with DNA and chromatin regulators, and they compete with enzymatic posttranslational modifications for modification sites. NECM formation depends on the chemical compatibility between a reactive molecule and its target site, in addition to their relative stoichiometries. Here we survey the chemical reactions and conditions that govern the addition of NECMs onto histones as a manual to guide the identification of new physiologically relevant chemical adducts. Characterizing NECMs on chromatin is critical to attaining a comprehensive understanding of this new chapter of the so-called "histone code".
Affiliation(s)
- Sarah Faulkner
- Chemical Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, United States
- Igor Maksimovic
- Chemical Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, United States; Tri-Institutional PhD Program in Chemical Biology, New York, NY 10065, United States
- Yael David
- Chemical Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, United States; Tri-Institutional PhD Program in Chemical Biology, New York, NY 10065, United States; Department of Pharmacology, Weill Cornell Medicine, New York, NY 10065, United States; Department of Physiology, Biophysics and Systems Biology, Weill Cornell Medicine, New York, NY 10065, United States.
|