1
Liu X, Zhu B, Dai XW, Xu ZA, Li R, Qian Y, Lu YP, Zhang W, Liu Y, Zheng J. GBDT_KgluSite: An improved computational prediction model for lysine glutarylation sites based on feature fusion and GBDT classifier. BMC Genomics 2023; 24:765. [PMID: 38082413 PMCID: PMC10712101 DOI: 10.1186/s12864-023-09834-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2023] [Accepted: 11/23/2023] [Indexed: 12/18/2023] Open
Abstract
BACKGROUND Lysine glutarylation (Kglu) is one of the most important post-translational modifications (PTMs) and plays significant roles in various cellular functions, including metabolism, mitochondrial processes, and translation. Therefore, accurate identification of Kglu sites is important for elucidating protein molecular function. Because traditional biological experiments are time-consuming and expensive, computational Kglu site prediction is gaining increasing attention. RESULTS In this paper, we propose GBDT_KgluSite, a novel Kglu site prediction model based on a gradient boosting decision tree (GBDT) classifier and appropriate feature combinations, which achieved satisfactory performance. Specifically, seven features, including sequence-based features, physicochemical property-based features, structural-based features, and evolutionary-derived features, were used to characterize proteins. NearMiss-3 and Elastic Net were applied to address data imbalance and feature redundancy, respectively. The experimental results show that GBDT_KgluSite has good robustness and generalization ability, with accuracy and AUC values of 93.73% and 98.14% on five-fold cross-validation and 90.11% and 96.75% on the independent test dataset, respectively. CONCLUSION GBDT_KgluSite is an effective computational method for identifying Kglu sites in protein sequences. It has good stability and generalization ability and could be useful for the identification of new Kglu sites in the future. The relevant code and dataset are available at https://github.com/flyinsky6/GBDT_KgluSite .
Affiliation(s)
- Xin Liu
  - School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Bao Zhu
  - Cancer Institute, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
  - Jiangsu Center for the Collaboration and Innovation of Cancer Biotherapy, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Xia-Wei Dai
  - School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Zhi-Ao Xu
  - School of Life Sciences, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Rui Li
  - School of Life Sciences, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Yuting Qian
  - Jiangsu Center for the Collaboration and Innovation of Cancer Biotherapy, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Ya-Ping Lu
  - School of Humanities and Arts, China University of Mining and Technology, Xuzhou, Jiangsu, 221116, China
- Wenqing Zhang
  - School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Yong Liu
  - Cancer Institute, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
  - Jiangsu Center for the Collaboration and Innovation of Cancer Biotherapy, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Junnian Zheng
  - Jiangsu Center for the Collaboration and Innovation of Cancer Biotherapy, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
  - Center of Clinical Oncology, The Affiliated Hospital of Xuzhou Medical University, Xuzhou, Jiangsu, 221002, China
2
Jia J, Sun M, Wu G, Qiu W. DeepDN_iGlu: prediction of lysine glutarylation sites based on attention residual learning method and DenseNet. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023; 20:2815-2830. [PMID: 36899559 DOI: 10.3934/mbe.2023132] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 06/18/2023]
Abstract
As a key factor in orchestrating various biological processes and functions, protein post-translational modification (PTM) is widely involved in the mechanisms of protein function in animals and plants. Glutarylation is a type of post-translational modification that occurs at active ε-amino groups of specific lysine residues in proteins and is associated with various human diseases, including diabetes, cancer, and glutaric aciduria type I. Therefore, the prediction of glutarylation sites is particularly important. This study developed a new deep learning-based prediction model for glutarylation sites, named DeepDN_iGlu, by adopting an attention residual learning method and DenseNet. The focal loss function is used in place of the traditional cross-entropy loss function to address the substantial imbalance in the number of positive and negative samples. With the straightforward one-hot encoding method, DeepDN_iGlu shows greater potential for glutarylation site prediction, with Sensitivity (Sn), Specificity (Sp), Accuracy (ACC), Matthews Correlation Coefficient (MCC), and Area Under Curve (AUC) of 89.29%, 61.97%, 65.15%, 0.33 and 0.80, respectively, on the independent test set. To the best of the authors' knowledge, this is the first time that DenseNet has been used for the prediction of glutarylation sites. DeepDN_iGlu has been deployed as a web server (https://bioinfo.wugenqiang.top/~smw/DeepDN_iGlu/) to make glutarylation site prediction data more accessible.
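The focal loss mentioned in this abstract can be illustrated with a minimal Python sketch. The abstract does not report the loss hyperparameters, so `gamma=2.0` and `alpha=0.25` are assumed illustrative defaults, and the function name `binary_focal_loss` is hypothetical rather than taken from the DeepDN_iGlu code.

```python
import math

def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for a single sample.

    p: predicted probability of the positive class, y: true label (0 or 1).
    gamma and alpha are assumed values, not those used by DeepDN_iGlu.
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)          # keep log() finite
    p_t = p if y == 1 else 1.0 - p           # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma down-weights samples that are already classified well
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confident correct prediction contributes far less than a confident wrong one.
easy = binary_focal_loss(0.9, 1)
hard = binary_focal_loss(0.1, 1)
```

With `gamma=0` the modulating factor disappears and the loss reduces to class-weighted cross-entropy; larger values of `gamma` shrink the contribution of easy samples, which is the property that makes the loss attractive for imbalanced positive and negative sets.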
Affiliation(s)
- Jianhua Jia
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333403, China
- Mingwei Sun
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333403, China
- Genqiang Wu
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333403, China
- Wangren Qiu
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333403, China
3
Naseer S, Ali RF, Khan YD, Dominic PDD. iGluK-Deep: computational identification of lysine glutarylation sites using deep neural networks with general pseudo amino acid compositions. J Biomol Struct Dyn 2022; 40:11691-11704. [PMID: 34396935 DOI: 10.1080/07391102.2021.1962738] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Indexed: 12/24/2022]
Abstract
Lysine glutarylation is a post-translational modification which plays an important regulatory role in a variety of physiological and enzymatic processes, including mitochondrial functions and metabolic processes, in both eukaryotic and prokaryotic cells. This post-translational modification influences chromatin structure and thereby results in global regulation of transcription, defects in cell-cycle progression, DNA damage repair, and telomere silencing. To better understand the mechanism of lysine glutarylation, its identification in a protein is necessary; however, experimental methods are time-consuming and labor-intensive. Herein, we propose a new computational prediction approach to supplement experimental methods for identifying lysine glutarylation sites using deep neural networks and Chou's Pseudo Amino Acid Composition (PseAAC). We employed well-known deep neural networks for feature representation learning and classification of peptide sequences. Our approach uses raw pseudo amino acid compositions and removes the need to separately perform costly and cumbersome feature extraction and selection. Among the developed deep learning-based predictors, the standard neural network-based predictor achieved the highest scores in terms of accuracy and all other performance evaluation measures, and it outperformed the majority of previously reported predictors without requiring an expensive feature extraction process. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Sheraz Naseer
  - Department of Computer Science, University of Management and Technology, Lahore, Pakistan
- Rao Faizan Ali
  - Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, Perak Darul Ridzuan, Malaysia
- Yaser Daanial Khan
  - Department of Computer Science, University of Management and Technology, Lahore, Pakistan
- P D D Dominic
  - Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, Perak Darul Ridzuan, Malaysia
4
Ning Q, Qi Z, Wang Y, Deng A, Chen C. FCCCSR_Glu: a semi-supervised learning model based on FCCCSR algorithm for prediction of glutarylation sites. Brief Bioinform 2022; 23:6720406. [PMID: 36168700 DOI: 10.1093/bib/bbac421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2022] [Revised: 08/15/2022] [Accepted: 08/30/2022] [Indexed: 12/14/2022] Open
Abstract
Glutarylation is a post-translational modification which plays an irreplaceable role in various functions of the cell. Therefore, it is very important to accurately identify the glutarylation substrates and its corresponding glutarylation sites. In recent years, many computational methods of glutarylation sites have emerged one after another, but there are still many limitations, among which noisy data and the class imbalance problem caused by the uncertainty of non-glutarylation sites are great challenges. In this study, we propose a new semi-supervised learning algorithm, named FCCCSR, to identify reliable non-glutarylation lysine sites from unlabeled samples as negative samples. FCCCSR first finds core objects from positive samples according to reverse nearest neighbor information, and then clusters core objects based on natural neighbor structure. Finally, reliable negative samples are selected according to clustering result. With FCCCSR algorithm, we propose a new method named FCCCSR_Glu for glutarylation sites identification. In this study, multi-view features are extracted and fused to describe peptides, including amino acid composition, BLOSUM62, amino acid factors and composition of k-spaced amino acid pairs. Then, reliable negative samples selected by FCCCSR and positive samples are combined to establish models and XGBoost optimized by differential evolution algorithm is used as the classifier. On the independent testing dataset, FCCCSR_Glu achieves 85.18%, 98.36%, 94.31% and 0.8651 in sensitivity, specificity, accuracy and Matthew's Correlation Coefficient, respectively, which is superior to state-of-the-art methods in predicting glutarylation sites. Therefore, FCCCSR_Glu can be a useful tool for glutarylation sites prediction and FCCCSR algorithm can effectively select reliable negative samples from unlabeled samples. The data and code are available on https://github.com/xbbxhbc/FCCCSR_Glu.git.
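One of the feature views named in this abstract, the composition of k-spaced amino acid pairs (CKSAAP), can be sketched in a few lines of Python. This is a generic illustration and not the FCCCSR_Glu implementation: the maximum gap `k_max=2`, the function name `cksaap`, and the choice to normalise by the number of valid pairs are all assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order so feature positions are stable.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

def cksaap(peptide, k_max=2):
    """Return 400 * (k_max + 1) pair frequencies for gaps k = 0..k_max.

    For gap k, residues at positions i and i + k + 1 form a pair.
    Pairs containing non-standard symbols (for example padding 'X') are skipped,
    and counts are divided by the number of valid pairs (an assumed convention).
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        for i in range(len(peptide) - k - 1):
            pair = peptide[i] + peptide[i + k + 1]
            if pair in counts:
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features

vec = cksaap("ACAC", k_max=1)  # toy peptide, two gap sizes
```

Each gap size contributes a 400-dimensional block, so the vector length grows linearly with `k_max`; in a multi-view setup such a block would be concatenated with the other encodings before classification.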
Affiliation(s)
- Qiao Ning
  - Department of Information Science and Technology, Dalian Maritime University, Lingshui Street, 116026, Dalian, China
- Zedong Qi
  - Department of Information Science and Technology, Dalian Maritime University, Lingshui Street, 116026, Dalian, China
- Yue Wang
  - Department of Information Science and Technology, Dalian Maritime University, Lingshui Street, 116026, Dalian, China
- Ansheng Deng
  - Department of Information Science and Technology, Dalian Maritime University, Lingshui Street, 116026, Dalian, China
- Chen Chen
  - Naval Architecture and Ocean Engineering college, Dalian Maritime University, Lingshui Street, 116026, Dalian, China
5
Ning Q, Zhao X, Ma Z. A Novel Method for Identification of Glutarylation Sites Combining Borderline-SMOTE With Tomek Links Technique in Imbalanced Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:2632-2641. [PMID: 34236968 DOI: 10.1109/tcbb.2021.3095482] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 06/13/2023]
Abstract
Glutarylation is a type of post-translational modification that occurs on lysine residues. It plays an irreplaceable role in various cellular functions. Therefore, identification of glutarylation sites is significant for understanding the molecular mechanism of glutarylation. In this study, we proposed a method named DEXGB_Glu to identify lysine glutarylation sites using XGBoost as classifier which was optimized by differential evolution algorithm. Aiming at the imbalance between positive samples and negative samples, Borderline-SMOTE method was employed to synthesize positive samples, increasing their amount equal to negative samples. Then, Tomek links technique was applied to filter out noise data. Analysis of this method and its results showed that differential evolution algorithm obviously improved the performance and the combination of Borderline-SMOTE and Tomek links effectively solved the imbalance between positive samples and negative samples. Finally, the performance of this method was much better than other methods in prediction of glutarylation sites. The data and code are available on https://github.com/ningq669/DEXGB_Glu.
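The Tomek links cleaning step described in this abstract can be sketched as follows. A Tomek link is a pair of samples from opposite classes that are each other's nearest neighbour; such pairs are treated as borderline or noisy. The sketch below only detects the links with a brute-force search. It does not implement Borderline-SMOTE, the function names are hypothetical, and which member of a link to discard is left open because the abstract does not say.

```python
import math

def nearest_neighbor(i, points):
    """Index of the point closest to points[i] (Euclidean distance), or None."""
    best, best_dist = None, math.inf
    for j, q in enumerate(points):
        if j == i:
            continue
        d = math.dist(points[i], q)
        if d < best_dist:
            best, best_dist = j, d
    return best

def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that are mutual nearest neighbours
    and carry different class labels."""
    nn = [nearest_neighbor(i, points) for i in range(len(points))]
    links = []
    for i, j in enumerate(nn):
        if j is not None and i < j and nn[j] == i and labels[i] != labels[j]:
            links.append((i, j))
    return links

# Toy 1-D data: samples 2 and 3 sit next to each other but belong to different classes.
toy_points = [(0.0,), (0.1,), (1.0,), (1.05,), (3.0,)]
toy_labels = [0, 0, 0, 1, 1]
links = tomek_links(toy_points, toy_labels)
```

The brute-force search is quadratic in the number of samples, which is fine for a demonstration; a real pipeline would more likely rely on an established resampling library than on this loop.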
6
Liu CM, Ta VD, Le NQK, Tadesse DA, Shi C. Deep Neural Network Framework Based on Word Embedding for Protein Glutarylation Sites Prediction. Life (Basel) 2022; 12:life12081213. [PMID: 36013392 PMCID: PMC9410500 DOI: 10.3390/life12081213] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/27/2022] [Revised: 08/03/2022] [Accepted: 08/05/2022] [Indexed: 04/08/2023] Open
Abstract
In recent years, much research has found that dysregulation of glutarylation is associated with many human diseases, such as diabetes, cancer, and glutaric aciduria type I. Therefore, glutarylation identification and characterization are essential tasks for determining modification-specific proteomics. This study aims to propose a novel deep neural network framework based on word embedding techniques for glutarylation sites prediction. Multiple deep neural network models are implemented to evaluate the performance of glutarylation sites prediction. Furthermore, an extensive experimental comparison of word embedding techniques is conducted to utilize the most efficient method for improving protein sequence data representation. The results suggest that the proposed deep neural networks not only improve protein sequence representation but also work effectively in glutarylation sites prediction by obtaining a higher accuracy and confidence rate compared to the previous work. Moreover, embedding techniques were proven to be more productive than the pre-trained word embedding techniques for glutarylation sequence representation. Our proposed method has significantly outperformed all traditional performance metrics compared to the advanced integrated vector support, with accuracy, specificity, sensitivity, and correlation coefficient of 0.79, 0.89, 0.59, and 0.51, respectively. It shows the potential to detect new glutarylation sites and uncover the relationships between glutarylation and well-known lysine modification.
Affiliation(s)
- Chuan-Ming Liu
  - Department of Computer Science and Information Engineering, National Taipei University of Technology (Taipei Tech), Taipei City 106, Taiwan
  - Correspondence: (C.-M.L.); (C.S.); Tel.: +886-2-2771-2171 (ext. 4251) (C.-M.L.)
- Van-Dai Ta
  - Samsung Display Vietnam (SDV), Yen Phong Industrial Park, Bac Ninh 16000, Vietnam
- Nguyen Quoc Khanh Le
  - Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei City 106, Taiwan
- Chongyang Shi
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing 102488, China
7
Sohrawordi M, Hossain MA, Hasan MAM. PLP_FS: prediction of lysine phosphoglycerylation sites in protein using support vector machine and fusion of multiple F_Score feature selection. Brief Bioinform 2022; 23:6655632. [PMID: 35929355 DOI: 10.1093/bib/bbac306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2022] [Revised: 07/05/2022] [Accepted: 07/06/2022] [Indexed: 11/14/2022] Open
Abstract
A newly identified post-translational modification (PTM), phosphoglycerylation, has been shown to play an essential role in the structure and functional properties of proteins and in serious human diseases. Hence, it is urgent to understand the molecular mechanism behind the phosphoglycerylation process in order to develop drugs for related diseases. However, accurately identifying phosphoglycerylation sites from a protein sequence in the laboratory is a difficult and challenging task, so an efficient computational model is greatly sought for this purpose. A small number of computational models are currently available for identifying phosphoglycerylation sites, but their prediction capability has not reached a satisfactory level. Therefore, an effective predictor named PLP_FS has been designed and constructed to identify phosphoglycerylation sites in this study. For training, an optimal feature set was obtained by fusing multiple F_Score feature selection techniques applied to the features generated by three types of sequence-based feature extraction methods, and a support vector machine classifier was fitted to build the prediction model. In addition, the k-neighbor near cleaning and SMOTE methods were used to balance the benchmark dataset. In 10-fold cross-validation, the suggested model obtained an accuracy of 99.22%, a sensitivity of 98.17% and a specificity of 99.75%, which are better than those of other currently available predictors for identifying phosphoglycerylation sites.
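The abstract does not describe how the multiple F_Score techniques are fused, so the following Python sketch only illustrates the basic two-class F-score commonly used for feature ranking: the squared distances of the class means from the overall mean, divided by the sum of the within-class sample variances. The function names and toy data are hypothetical, and this is not presented as the PLP_FS procedure.

```python
import math
from statistics import mean, variance

def f_score(pos, neg):
    """F-score of one feature given its values in the positive and negative class.

    Each class needs at least two values because sample variance is used.
    """
    overall = mean(pos + neg)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)
    if within == 0:
        # Zero variance in both classes: useless if the means coincide, perfect otherwise.
        return 0.0 if between == 0 else math.inf
    return between / within

def rank_features(x_pos, x_neg):
    """Rank feature indices by descending F-score; rows are samples, columns are features."""
    n_features = len(x_pos[0])
    scores = [
        f_score([row[j] for row in x_pos], [row[j] for row in x_neg])
        for j in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores

# Toy data: feature 0 separates the classes, feature 1 does not.
x_pos = [[1.0, 5.0], [2.0, 5.5], [3.0, 4.5]]
x_neg = [[4.0, 5.0], [5.0, 4.5], [6.0, 5.5]]
order, scores = rank_features(x_pos, x_neg)
```

A selection step would then keep the top-ranked indices from `order` and pass only those columns to the classifier.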
Affiliation(s)
- Md Sohrawordi
  - Dept. of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
  - Dept. of Computer Science and Engineering, Hajee Mohammad Danesh Science and Technology University, Dinajpur, Bangladesh
- Md Ali Hossain
  - Dept. of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
- Md Al Mehedi Hasan
  - Dept. of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
8
Indriani F, Mahmudah KR, Purnama B, Satou K. ProtTrans-Glutar: Incorporating Features From Pre-trained Transformer-Based Models for Predicting Glutarylation Sites. Front Genet 2022; 13:885929. [PMID: 35711929 PMCID: PMC9194472 DOI: 10.3389/fgene.2022.885929] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/28/2022] [Accepted: 04/26/2022] [Indexed: 11/16/2022] Open
Abstract
Lysine glutarylation is a post-translational modification (PTM) that plays a regulatory role in various physiological and biological processes. Identifying glutarylated peptides using proteomic techniques is expensive and time-consuming. Therefore, developing computational models and predictors can prove useful for rapid identification of glutarylation. In this study, we propose a model called ProtTrans-Glutar to classify a protein sequence into positive or negative glutarylation site by combining traditional sequence-based features with features derived from a pre-trained transformer-based protein model. The features of the model were constructed by combining several feature sets, namely the distribution feature (from composition/transition/distribution encoding), enhanced amino acid composition (EAAC), and features derived from the ProtT5-XL-UniRef50 model. Combined with random under-sampling and XGBoost classification method, our model obtained recall, specificity, and AUC scores of 0.7864, 0.6286, and 0.7075 respectively on an independent test set. The recall and AUC scores were notably higher than those of the previous glutarylation prediction models using the same dataset. This high recall score suggests that our method has the potential to identify new glutarylation sites and facilitate further research on the glutarylation process.
Affiliation(s)
- Fatma Indriani
  - Graduate School of Natural Science and Technology, Kanazawa University, Kanazawa, Japan
  - Department of Computer Science, Lambung Mangkurat University, Banjarmasin, Indonesia
- Kunti Robiatul Mahmudah
  - Department of Postgraduate of Mathematics Education, Universitas Ahmad Dahlan, Yogyakarta, Indonesia
- Bedy Purnama
  - School of Computing, Telkom University, Bandung, Indonesia
- Kenji Satou
  - Institute of Science and Engineering, Kanazawa University, Kanazawa, Japan
9
Rex DB, Patil AH, Modi PK, Kandiyil MK, Kasaragod S, Pinto SM, Tanneru N, Sijwali PS, Prasad TSK. Dissecting Plasmodium yoelii Pathobiology: Proteomic Approaches for Decoding Novel Translational and Post-Translational Modifications. ACS OMEGA 2022; 7:8246-8257. [PMID: 35309442 PMCID: PMC8928344 DOI: 10.1021/acsomega.1c03892] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2021] [Accepted: 02/21/2022] [Indexed: 06/14/2023]
Abstract
Malaria is a vector-borne disease. It is caused by Plasmodium parasites. Plasmodium yoelii is a rodent model parasite, primarily used for studying parasite development in liver cells and vectors. To better understand parasite biology, we carried out a high-throughput-based proteomic analysis of P. yoelii. From the same mass spectrometry (MS)/MS data set, we also captured several post-translational modified peptides by following a bioinformatics analysis without any prior enrichment. Further, we carried out a proteogenomic analysis, which resulted in improvements to some of the existing gene models along with the identification of several novel genes. Analysis of proteome and post-translational modifications (PTMs) together resulted in the identification of 3124 proteins. The identified PTMs were found to be enriched in mitochondrial metabolic pathways. Subsequent bioinformatics analysis provided an insight into proteins associated with metabolic regulatory mechanisms. Among these, the tricarboxylic acid (TCA) cycle and the isoprenoid synthesis pathway are found to be essential for parasite survival and drug resistance. The proteogenomic analysis discovered 43 novel protein-coding genes. The availability of an in-depth proteomic landscape of a malaria pathogen model will likely facilitate further molecular-level investigations on pre-erythrocytic stages of malaria.
Affiliation(s)
- Devasahayam Arokia Balaya Rex
  - Center for Systems Biology and Molecular Medicine, Yenepoya Research Centre, Yenepoya (Deemed to be University), Mangalore 575018, India
- Arun H. Patil
  - Center for Systems Biology and Molecular Medicine, Yenepoya Research Centre, Yenepoya (Deemed to be University), Mangalore 575018, India
- Prashant Kumar Modi
  - Center for Systems Biology and Molecular Medicine, Yenepoya Research Centre, Yenepoya (Deemed to be University), Mangalore 575018, India
- Mrudula Kinarulla Kandiyil
  - Center for Systems Biology and Molecular Medicine, Yenepoya Research Centre, Yenepoya (Deemed to be University), Mangalore 575018, India
- Sandeep Kasaragod
  - Center for Systems Biology and Molecular Medicine, Yenepoya Research Centre, Yenepoya (Deemed to be University), Mangalore 575018, India
- Sneha M. Pinto
  - Center for Systems Biology and Molecular Medicine, Yenepoya Research Centre, Yenepoya (Deemed to be University), Mangalore 575018, India
- Nandita Tanneru
  - CSIR-Centre for Cellular and Molecular Biology, Hyderabad 500007, Telangana, India
- Puran Singh Sijwali
  - CSIR-Centre for Cellular and Molecular Biology, Hyderabad 500007, Telangana, India
  - Academy of Scientific and Innovative Research, Ghaziabad 201002, Uttar Pradesh, India
10
Huang KY, Tseng YJ, Kao HJ, Chen CH, Yang HH, Weng SL. Identification of subtypes of anticancer peptides based on sequential features and physicochemical properties. Sci Rep 2021; 11:13594. [PMID: 34193950 PMCID: PMC8245499 DOI: 10.1038/s41598-021-93124-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 11/10/2020] [Accepted: 06/08/2021] [Indexed: 11/25/2022] Open
Abstract
Anticancer peptides (ACPs) are a class of bioactive peptides that could be used as a novel type of anticancer drug with several advantages over chemistry-based drugs, including high specificity, strong tumor penetration capacity, and low toxicity to normal cells. As the number of experimentally verified bioactive peptides has increased significantly, various in silico approaches are imperative for investigating the characteristics of ACPs. However, methods for investigating the differences in physicochemical properties of ACPs are lacking. In this study, we compared the N- and C-terminal amino acid composition of each peptide and defined three major subtypes of ACPs based on the distribution of positively charged residues. For the first time, we were motivated to develop a two-step machine learning model for identification of the subtypes of ACPs, which classifies the input data into the corresponding group before applying the classifier. Further, to improve the predictive power, hybrid feature sets were considered for prediction. Evaluation by five-fold cross-validation showed that the two-step model trained with sequence-based features and physicochemical properties was most effective in discriminating between ACPs and non-ACPs. The two-step model trained with the hybrid features performed well, with a sensitivity of 86.75%, a specificity of 85.75%, an accuracy of 86.08%, and a Matthews Correlation Coefficient (MCC) value of 0.703. Furthermore, the model also performed consistently well on the independent testing set, with a sensitivity of 77.6%, a specificity of 94.74%, an accuracy of 88.99% and an MCC value of 0.75. Finally, the two-step model has been implemented as a web-based tool, namely iDACP, which is now freely available at http://mer.hc.mmh.org.tw/iDACP/ .
Affiliation(s)
- Kai-Yao Huang
  - Department of Medical Research, Hsinchu Mackay Memorial Hospital, Hsinchu City, 300, Taiwan
  - Department of Medicine, Mackay Medical College, New Taipei City, 252, Taiwan
- Yi-Jhan Tseng
  - Department of Medical Research, Hsinchu Mackay Memorial Hospital, Hsinchu City, 300, Taiwan
- Hui-Ju Kao
  - Department of Medical Research, Hsinchu Mackay Memorial Hospital, Hsinchu City, 300, Taiwan
- Chia-Hung Chen
  - Department of Medical Research, Hsinchu Mackay Memorial Hospital, Hsinchu City, 300, Taiwan
- Hsiao-Hsiang Yang
  - Department of Medical Research, Hsinchu Mackay Memorial Hospital, Hsinchu City, 300, Taiwan
- Shun-Long Weng
  - Department of Medicine, Mackay Medical College, New Taipei City, 252, Taiwan
  - Department of Obstetrics and Gynecology, Hsinchu Mackay Memorial Hospital, Hsinchu City, 300, Taiwan
  - Mackay Junior College of Medicine, Medicine, Nursing and Management College, Taipei City, 112, Taiwan
11
Xie L, Xiao Y, Meng F, Li Y, Shi Z, Qian K. Functions and Mechanisms of Lysine Glutarylation in Eukaryotes. Front Cell Dev Biol 2021; 9:667684. [PMID: 34249920 PMCID: PMC8264553 DOI: 10.3389/fcell.2021.667684] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/14/2021] [Accepted: 06/01/2021] [Indexed: 01/22/2023] Open
Abstract
Lysine glutarylation (Kglu) is a newly discovered post-translational modification (PTM), which is considered to be reversible, dynamic, and conserved in prokaryotes and eukaryotes. Recent developments in the identification of Kglu by mass spectrometry have shown that Kglu is mainly involved in the regulation of metabolism, oxidative damage, chromatin dynamics and is associated with various diseases. In this review, we firstly summarize the development history of glutarylation, the biochemical processes of glutarylation and deglutarylation. Then we focus on the pathophysiological functions such as glutaric acidemia 1, asthenospermia, etc. Finally, the current computational tools for predicting glutarylation sites are discussed. These emerging findings point to new functions for lysine glutarylation and related enzymes, and also highlight the mechanisms by which glutarylation regulates diverse cellular processes.
Affiliation(s)
- Longxiang Xie
  - Institute of Biomedical Informatics, Cell Signal Transduction Laboratory, Bioinformatics Center, Henan Provincial Engineering Center for Tumor Molecular Medicine, School of Basic Medical Sciences, Huaihe Hospital, Henan University, Kaifeng, China
- Yafei Xiao
  - Institute of Biomedical Informatics, Cell Signal Transduction Laboratory, Bioinformatics Center, Henan Provincial Engineering Center for Tumor Molecular Medicine, School of Basic Medical Sciences, Huaihe Hospital, Henan University, Kaifeng, China
- Fucheng Meng
  - Institute of Biomedical Informatics, Cell Signal Transduction Laboratory, Bioinformatics Center, Henan Provincial Engineering Center for Tumor Molecular Medicine, School of Basic Medical Sciences, Huaihe Hospital, Henan University, Kaifeng, China
- Yongqiang Li
  - Institute of Biomedical Informatics, Cell Signal Transduction Laboratory, Bioinformatics Center, Henan Provincial Engineering Center for Tumor Molecular Medicine, School of Basic Medical Sciences, Huaihe Hospital, Henan University, Kaifeng, China
- Zhenyu Shi
  - Institute of Biomedical Informatics, Cell Signal Transduction Laboratory, Bioinformatics Center, Henan Provincial Engineering Center for Tumor Molecular Medicine, School of Basic Medical Sciences, Huaihe Hospital, Henan University, Kaifeng, China
- Keli Qian
  - Infection Control Department, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
12
Dou L, Yang F, Xu L, Zou Q. A comprehensive review of the imbalance classification of protein post-translational modifications. Brief Bioinform 2021; 22:6217722. [PMID: 33834199 DOI: 10.1093/bib/bbab089] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 01/10/2021] [Revised: 02/17/2021] [Accepted: 02/24/2021] [Indexed: 12/13/2022] Open
Abstract
Post-translational modifications (PTMs) play significant roles in regulating protein structure, activity and function, and they are closely involved in various pathologies. Therefore, the identification of associated PTMs is the foundation of in-depth research on related biological mechanisms, disease treatments and drug design. Due to the high cost and time consumption of high-throughput sequencing techniques, developing machine learning-based predictors has been considered an effective approach to rapidly recognize potential modified sites. However, the imbalanced distribution of true and false PTM sites, namely, the data imbalance problem, largely affects the reliability and application of prediction tools. In this article, we conduct a systematic survey of the research progress in imbalanced PTM classification. First, we describe the modeling process in detail and outline useful data imbalance solutions. Then, we summarize the recently proposed bioinformatics tools based on imbalanced PTM data and build a convenient website, ImClassi_PTMs (available at lab.malab.cn/∼dlj/ImbClassi_PTMs/), where researchers can browse them. Moreover, we analyze the challenges of current computational predictors and propose some suggestions to improve the efficiency of imbalance learning. We hope that this work will provide comprehensive knowledge of imbalanced PTM recognition and contribute to advanced predictors in the future.
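To make the data imbalance problem described in this abstract concrete, the following minimal Python sketch computes accuracy, sensitivity, specificity and the Matthews correlation coefficient (MCC) from confusion counts. The counts (100 modified sites, 900 unmodified sites) and the function name `confusion_metrics` are hypothetical illustrations and are not taken from the review.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, specificity and MCC from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Guard against empty classes so the sketch never divides by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention here, MCC is reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical imbalanced data set: 100 true PTM sites, 900 non-sites.
# A predictor that labels every site as negative still looks accurate.
always_negative = confusion_metrics(tp=0, fp=0, tn=900, fn=100)

# A predictor that actually finds most true sites has lower accuracy
# but a much higher MCC.
useful_model = confusion_metrics(tp=80, fp=90, tn=810, fn=20)

print(always_negative)
print(useful_model)
```

In this hypothetical case the always-negative predictor has the higher accuracy but an MCC of zero, which is why imbalance-aware metrics are usually reported alongside accuracy.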
Affiliation(s)
- Lijun Dou
- University of Electronic Science and Technology of China and the Shenzhen Polytechnic, China
| | - Fenglong Yang
- University of Electronic Science and Technology of China and the Shenzhen Polytechnic, China
| | - Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
|
13
|
Huang KY, Hung FY, Kao HJ, Lau HH, Weng SL. iDPGK: characterization and identification of lysine phosphoglycerylation sites based on sequence-based features. BMC Bioinformatics 2020; 21:568. [PMID: 33297954 PMCID: PMC7727188 DOI: 10.1186/s12859-020-03916-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/10/2020] [Accepted: 11/30/2020] [Indexed: 11/24/2022] Open
Abstract
Background Protein phosphoglycerylation, the addition of 1,3-bisphosphoglyceric acid (1,3-BPG) to a lysine residue of a protein to form 3-phosphoglyceryl-lysine, is a reversible and non-enzymatic post-translational modification (PTM) that plays a regulatory role in glucose metabolism and the glycolytic process. As the number of experimentally verified phosphoglycerylated sites has increased significantly, statistical or machine learning methods are imperative for investigating the characteristics of phosphoglycerylation sites. Currently, research into phosphoglycerylation is very limited, and only a few resources are available for the computational identification of phosphoglycerylation sites.
Results We present a bioinformatics investigation of phosphoglycerylation sites based on sequence-based features. The TwoSampleLogo analysis reveals that the regions surrounding the phosphoglycerylation sites contain a relatively high abundance of positively charged amino acids, especially in the upstream flanking region. Additionally, according to the results of PTM-Logo, non-polar and aliphatic amino acids are more abundant around phosphoglycerylated lysines, which may play a functional role in discriminating between phosphoglycerylation and non-phosphoglycerylation sites. Many types of features were adopted to build the prediction model on the training dataset, including amino acid composition, amino acid pair composition, positional weighted matrix and position-specific scoring matrix. Further, to improve the predictive power, the top features ranked by F-score were selected as the final combination for classification, and the predictive models were trained using DT, RF and SVM classifiers. Evaluation by five-fold cross-validation showed that the selected features were most effective in discriminating between phosphoglycerylated and non-phosphoglycerylated sites.
Conclusion The SVM model trained with the selected sequence-based features performed well, with a sensitivity of 77.5%, a specificity of 73.6%, an accuracy of 74.9%, and a Matthews Correlation Coefficient value of 0.49. Furthermore, the model performed consistently on the independent testing set, yielding a sensitivity of 75.7% and a specificity of 64.9%. Finally, the model has been implemented as a web-based system, namely iDPGK, which is now freely available at http://mer.hc.mmh.org.tw/iDPGK/.
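The abstract mentions ranking features by F-score before classification. As a rough sketch, the snippet below uses the two-class F-score commonly used for feature ranking (between-class separation divided by the sum of within-class sample variances). It is an assumption that iDPGK uses exactly this definition, and the function name and toy values are hypothetical.

```python
def f_score(positive_values, negative_values):
    """Two-class F-score of a single feature (higher = more discriminative).

    Assumes each class has at least two samples so the sample variance exists.
    """
    n_pos, n_neg = len(positive_values), len(negative_values)
    mean_pos = sum(positive_values) / n_pos
    mean_neg = sum(negative_values) / n_neg
    mean_all = (sum(positive_values) + sum(negative_values)) / (n_pos + n_neg)

    # Numerator: how far each class mean lies from the overall mean.
    between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2

    # Denominator: sum of the unbiased within-class variances.
    var_pos = sum((x - mean_pos) ** 2 for x in positive_values) / (n_pos - 1)
    var_neg = sum((x - mean_neg) ** 2 for x in negative_values) / (n_neg - 1)
    within = var_pos + var_neg

    if within == 0:
        # Constant within both classes: perfectly separating or useless.
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(positive_rows, negative_rows):
    """Return feature indices sorted from highest to lowest F-score."""
    n_features = len(positive_rows[0])
    scores = [
        f_score([row[j] for row in positive_rows],
                [row[j] for row in negative_rows])
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
positives = [[1.0, 5.0], [2.0, 6.0], [3.0, 4.0]]
negatives = [[7.0, 5.0], [8.0, 4.0], [9.0, 6.0]]
print(rank_features(positives, negatives))
```

The top-ranked indices would then be passed to the classifier; how many to keep is a tuning choice that the abstract does not specify.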
Affiliation(s)
- Kai-Yao Huang
- Department of Medical Research, Hsinchu Mackay Memorial Hospital, Hsinchu City 300, Taiwan; Department of Medicine, Mackay Medical College, New Taipei City 252, Taiwan
| | - Fang-Yu Hung
- Department of Obstetrics and Gynecology, Hsinchu Mackay Memorial Hospital, Hsinchu City 300, Taiwan
| | - Hui-Ju Kao
- Department of Medical Research, Hsinchu Mackay Memorial Hospital, Hsinchu City 300, Taiwan
| | - Hui-Hsuan Lau
- Department of Medicine, Mackay Medical College, New Taipei City 252, Taiwan; Department of Obstetrics and Gynecology, Hsinchu Mackay Memorial Hospital, Hsinchu City 300, Taiwan; Department of Obstetrics and Gynecology, Mackay Memorial Hospital, Taipei City 104, Taiwan
| | - Shun-Long Weng
- Department of Medicine, Mackay Medical College, New Taipei City 252, Taiwan; Department of Obstetrics and Gynecology, Hsinchu Mackay Memorial Hospital, Hsinchu City 300, Taiwan; Mackay Junior College of Medicine, Medicine, Nursing and Management College, Taipei City 112, Taiwan
|
14
|
Wang R, Wang Z, Wang H, Pang Y, Lee TY. Characterization and identification of lysine crotonylation sites based on machine learning method on both plant and mammalian. Sci Rep 2020; 10:20447. [PMID: 33235255 PMCID: PMC7686339 DOI: 10.1038/s41598-020-77173-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/14/2020] [Accepted: 11/03/2020] [Indexed: 12/14/2022] Open
Abstract
Lysine crotonylation (Kcr) is a type of protein post-translational modification (PTM) that plays important roles in a variety of cellular regulatory processes. Several methods have been proposed for the identification of crotonylation sites. However, most of these methods predict well only on either histone or non-histone proteins. Therefore, this work aims to give a more balanced performance across species; here, plant (non-histone) and mammalian (histone) data are considered. SVM (support vector machine) and RF (random forest) classifiers were employed in this study. According to the cross-validation results, the RF classifier based on the EGAAC attribute achieved the best predictive performance, which is competitive with existing methods while being more robust on imbalanced datasets. Moreover, an independent test was carried out to compare this study with existing methods based on the same features or the same classifier. The SVM and RF classifiers achieved their best performances with 92% sensitivity, 88% specificity, 90% accuracy, and an MCC of 0.80 in the mammalian dataset, and 77% sensitivity, 83% specificity, 70% accuracy and an MCC of 0.54 in a relatively small mammalian dataset and a large-scale plant dataset, respectively. In addition, cross-species independent testing was carried out, which confirmed the species diversity between plant and mammalian data.
Affiliation(s)
- Rulan Wang
- School of Science and Engineering, The Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, Guangdong, People's Republic of China
| | - Zhuo Wang
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, Guangdong, People's Republic of China; School of Life Sciences, University of Science and Technology of China, Hefei, 230026, Anhui, People's Republic of China
| | - Hongfei Wang
- Department of Orthopaedics and Traumatology, The University of Hong Kong, Pok Fu Lam, Hong Kong
| | - Yuxuan Pang
- School of Science and Engineering, The Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, Guangdong, People's Republic of China
| | - Tzong-Yi Lee
- School of Life and Health Sciences, The Chinese University of Hong Kong (Shenzhen), Shenzhen, 518172, Guangdong, People's Republic of China.
|
15
|
Liu X, Wang L, Li J, Hu J, Zhang X. Mal-Prec: computational prediction of protein Malonylation sites via machine learning based feature integration: Malonylation site prediction. BMC Genomics 2020; 21:812. [PMID: 33225896 PMCID: PMC7682087 DOI: 10.1186/s12864-020-07166-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/23/2020] [Accepted: 10/20/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Malonylation is a recently discovered post-translational modification that is associated with a variety of diseases such as Type 2 Diabetes Mellitus and different types of cancers. Compared with experimental identification of malonylation sites, computational methods are time-efficient and comparatively low in cost. RESULTS In this study, we proposed a novel computational model called Mal-Prec (Malonylation Prediction) for malonylation site prediction through the combination of Principal Component Analysis and Support Vector Machine. One-hot encoding, physio-chemical properties, and composition of k-spaced amino acid pairs were first used to extract sequence features. PCA was then applied to select optimal feature subsets while SVM was adopted to predict malonylation sites. Five-fold cross-validation results showed that Mal-Prec can achieve better prediction performance compared with other approaches. The AUC (area under the receiver operating characteristic curve) reached 96.47% and 90.72% on 5-fold cross-validation and independent data sets, respectively. CONCLUSION Mal-Prec is a computationally reliable method for identifying malonylation sites in protein sequences. It outperforms existing prediction tools and can serve as a useful tool for identifying and discovering novel malonylation sites in human proteins. Mal-Prec is coded in MATLAB and is publicly available at https://github.com/flyinsky6/Mal-Prec , together with the data sets used in this study.
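As an illustration of the one-hot encoding step named in this abstract, here is a minimal Python sketch. Mal-Prec itself is written in MATLAB; the alphabet order, the function name `one_hot_encode`, and the choice to map unknown or padding characters to an all-zero block are assumptions made for this example only.

```python
# The 20 standard amino acids in alphabetical one-letter order (assumed order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(peptide):
    """Encode a peptide window as a flat binary vector of length 20 * len(peptide).

    Each residue becomes a 20-dimensional block with a single 1 at the
    position of that amino acid. Characters outside the 20-letter alphabet
    (for example a padding symbol) are encoded as an all-zero block.
    """
    vector = []
    for residue in peptide:
        block = [0] * len(AMINO_ACIDS)
        index = AA_INDEX.get(residue)
        if index is not None:
            block[index] = 1
        vector.extend(block)
    return vector


# Hypothetical three-residue window centred on a lysine (K).
encoded = one_hot_encode("AKC")
print(len(encoded), sum(encoded))
```

In a pipeline like the one described, vectors of this kind would be concatenated with the other feature groups before dimensionality reduction and classification.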
Affiliation(s)
- Xin Liu
- Department of Bioinformatics, School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, 221004 Jiangsu China
| | - Liang Wang
- Department of Bioinformatics, School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, 221004 Jiangsu China
- Jiangsu Key Laboratory of New Drug Research and Clinical Pharmacy, School of Pharmacy, Xuzhou Medical University, Xuzhou, 221000 Jiangsu China
| | - Jian Li
- School of Public Health and Tropical Medicine, Tulane University, New Orleans, LA 70118 USA
| | - Junfeng Hu
- Department of Bioinformatics, School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, 221004 Jiangsu China
| | - Xiao Zhang
- Department of Bioinformatics, School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, 221004 Jiangsu China
|
16
|
Dou L, Li X, Zhang L, Xiang H, Xu L. iGlu_AdaBoost: Identification of Lysine Glutarylation Using the AdaBoost Classifier. J Proteome Res 2020; 20:191-201. [PMID: 33090794 DOI: 10.1021/acs.jproteome.0c00314] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 02/01/2023]
Abstract
Lysine glutarylation is a newly reported post-translational modification (PTM) that plays significant roles in regulating metabolic and mitochondrial processes. Accurate identification of protein glutarylation is the primary task to better investigate molecular functions and various applications. Because traditional biological sequencing techniques are time-consuming and expensive and protein data are growing explosively, building precise computational models to rapidly identify glutarylation is a popular and feasible solution. In this work, we proposed a novel AdaBoost-based predictor called iGlu_AdaBoost to distinguish glutarylation and non-glutarylation sequences. Here, the top 37 features were chosen from a total of 1768 combined features using Chi2 followed by incremental feature selection (IFS) to build the model, including 188D, the composition of k-spaced amino acid pairs (CKSAAP), and enhanced amino acid composition (EAAC). With the help of the hybrid-sampling method SMOTE-Tomek, the AdaBoost algorithm achieved satisfactory recall, specificity, and AUC values of 87.48%, 72.49%, and 0.89 over 10-fold cross validation as well as 72.73%, 71.92%, and 0.63 over independent test, respectively. Further feature analysis inferred that the positively charged amino acids R and K play critical roles in glutarylation recognition. Our model showed good generalization ability and consistent prediction results for positive and negative samples, which is comparable to four published tools. The proposed predictor is an efficient tool to find potential glutarylation sites and provides helpful suggestions for further research on glutarylation mechanisms and related disease treatments.
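One of the feature groups named in this abstract, the composition of k-spaced amino acid pairs (CKSAAP), can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' code: the maximum gap `k_max`, the alphabet order, and the decision to skip pairs that contain non-standard characters are assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, e.g. "AA", "AC", ..., "YY".
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(peptide, k_max=2):
    """Return CKSAAP features for gaps k = 0 .. k_max (400 values per gap).

    For each gap k, residues at positions i and i + k + 1 form a pair.
    Pair counts are normalized by the number of valid pairs found, which
    equals len(peptide) - k - 1 when the peptide has only standard residues.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_pairs = 0
        for i in range(len(peptide) - k - 1):
            pair = peptide[i] + peptide[i + k + 1]
            if pair in counts:  # skip pairs with padding or unknown residues
                counts[pair] += 1
                n_pairs += 1
        features.extend(counts[p] / n_pairs if n_pairs else 0.0 for p in PAIRS)
    return features


# Hypothetical short peptide; real site windows are typically much longer.
example = cksaap("AKAK", k_max=1)
print(len(example))
```

Each gap contributes 400 values, so the feature count grows linearly with `k_max`, which is one reason a selection step such as the Chi2 and IFS procedure mentioned in the abstract is useful.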
Affiliation(s)
- Lijun Dou
- School of Automotive and Transportation Engineering, Shenzhen Polytechnic, Shenzhen 518055, China; Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Xiaoling Li
- Department of Oncology, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin 150000, China
| | - Lichao Zhang
- School of Intelligent Manufacturing and Equipment, Shenzhen Institute of Information Technology, Shenzhen 518172, China
| | - Huaikun Xiang
- School of Automotive and Transportation Engineering, Shenzhen Polytechnic, Shenzhen 518055, China
| | - Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen 518055, China
|
17
|
Arafat ME, Ahmad MW, Shovan S, Dehzangi A, Dipta SR, Hasan MAM, Taherzadeh G, Shatabda S, Sharma A. Accurately Predicting Glutarylation Sites Using Sequential Bi-Peptide-Based Evolutionary Features. Genes (Basel) 2020; 11:E1023. [PMID: 32878321 PMCID: PMC7565944 DOI: 10.3390/genes11091023] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 07/01/2020] [Revised: 08/19/2020] [Accepted: 08/27/2020] [Indexed: 02/07/2023] Open
Abstract
Post-translational modification (PTM) is defined as the alteration of protein sequence upon interaction with different macromolecules after the translation process. Glutarylation is considered one of the most important PTMs and is associated with a wide range of cellular functions, including metabolism, translation, and specified separate subcellular localizations. During the past few years, a wide range of computational approaches has been proposed to predict glutarylation sites. However, despite all the efforts made so far, the prediction performance for glutarylation sites has remained limited. One of the main challenges in tackling this problem is to extract features with significant discriminatory information. To address this issue, we propose a new machine learning method called BiPepGlut that uses a bi-peptide-based evolutionary method for feature extraction. To build this model, we also use the Extra-Trees (ET) classifier for classification, which, to the best of our knowledge, has never been used for this task. Our results demonstrate that BiPepGlut significantly outperforms previously proposed models for this problem. BiPepGlut achieves 92.0%, 84.8%, 95.6%, 0.82, and 0.88 in accuracy, sensitivity, specificity, Matthews correlation coefficient, and F1-score, respectively. BiPepGlut is implemented as a publicly available online predictor.
Affiliation(s)
- Md. Easin Arafat
- Department of Computer Science and Engineering, United International University, Dhaka 1212, Bangladesh
| | - Md. Wakil Ahmad
- Department of Computer Science and Engineering, United International University, Dhaka 1212, Bangladesh
| | - S.M. Shovan
- Department of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi 6204, Bangladesh
| | - Abdollah Dehzangi
- Department of Computer Science, Rutgers University, Camden, NJ 08102, USA
- Center for Computational and Integrative Biology, Rutgers University, Camden, NJ 08102, USA
| | - Shubhashis Roy Dipta
- Department of Computer Science and Engineering, United International University, Dhaka 1212, Bangladesh
| | - Md. Al Mehedi Hasan
- Department of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi 6204, Bangladesh
| | - Ghazaleh Taherzadeh
- Institute for Bioscience and Biotechnology Research, University of Maryland, College Park, MD 20742, USA
| | - Swakkhar Shatabda
- Department of Computer Science and Engineering, United International University, Dhaka 1212, Bangladesh
| | - Alok Sharma
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, QLD 4111, Australia
- Department of Medical Science Mathematics, Tokyo Medical and Dental University (TMDU), Tokyo 113-8510, Japan
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan
- School of Engineering and Physics, Faculty of Science Technology and Environment, University of the South Pacific, Suva, Fiji
|
18
|
Ju Z, Wang SY. Computational Identification of Lysine Glutarylation Sites Using Positive-Unlabeled Learning. Curr Genomics 2020; 21:204-211. [PMID: 33071614 PMCID: PMC7521029 DOI: 10.2174/1389202921666200511072327] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 12/25/2019] [Revised: 04/12/2020] [Accepted: 04/13/2020] [Indexed: 12/27/2022] Open
Abstract
Background
As a new type of protein acylation modification, lysine glutarylation has been found to play a crucial role in metabolic processes and mitochondrial functions. To further explore the biological mechanisms and functions of glutarylation, it is significant to predict the potential glutarylation sites. In the existing glutarylation site predictors, experimentally verified glutarylation sites are treated as positive samples and non-verified lysine sites as the negative samples to train predictors. However, the non-verified lysine sites may contain some glutarylation sites which have not been experimentally identified yet.
Methods
In this study, experimentally verified glutarylation sites are treated as the positive samples, whereas the remaining non-verified lysine sites are treated as unlabeled samples. A bioinformatics tool named PUL-GLU was developed to identify glutarylation sites using a positive-unlabeled learning algorithm.
Results
Experimental results show that PUL-GLU significantly outperforms the current glutarylation site predictors. Therefore, PUL-GLU can be a powerful tool for accurate identification of protein glutarylation sites.
Conclusion
A user-friendly web-server for PUL-GLU is available at http://bioinform.cn/pul_glu/.
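The positive-unlabeled setting described in this abstract can be illustrated with a generic two-step heuristic. The abstract does not state which algorithm PUL-GLU uses, so the Python sketch below is not that method: it swaps in a simple scheme that first picks "reliable negatives" as the unlabeled samples farthest from the positive centroid and then classifies by nearest centroid. All function names, the `negative_fraction` parameter, and the toy vectors are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def fit_pu_centroids(positives, unlabeled, negative_fraction=0.5):
    """Two-step PU heuristic (illustrative only).

    Step 1: rank unlabeled samples by distance from the positive centroid
            and keep the farthest fraction as reliable negatives.
    Step 2: summarize positives and reliable negatives by their centroids.
    """
    pos_center = centroid(positives)
    ranked = sorted(unlabeled,
                    key=lambda v: math.dist(v, pos_center),
                    reverse=True)
    n_negatives = max(1, int(len(ranked) * negative_fraction))
    neg_center = centroid(ranked[:n_negatives])
    return pos_center, neg_center


def predict_positive(sample, pos_center, neg_center):
    """Nearest-centroid decision: True if the sample is closer to the positives."""
    return math.dist(sample, pos_center) <= math.dist(sample, neg_center)


# Toy 2-D features (hypothetical). Two of the unlabeled samples look like
# positives, which mimics unverified sites that are in fact modified.
positives = [[1.0, 1.0], [1.2, 0.8]]
unlabeled = [[1.1, 0.9], [5.0, 5.0], [6.0, 5.0], [0.9, 1.1]]

pos_center, neg_center = fit_pu_centroids(positives, unlabeled)
print(predict_positive([1.0, 1.2], pos_center, neg_center))
print(predict_positive([5.2, 4.9], pos_center, neg_center))
```

The point of the sketch is the data handling: unlabeled sites are never assumed to be negative wholesale, so hidden positives among them do not contaminate the negative class as strongly.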
Affiliation(s)
- Zhe Ju
- College of Science, Shenyang Aerospace University, Shenyang 110136, P.R. China
| | - Shi-Yun Wang
- College of Science, Shenyang Aerospace University, Shenyang 110136, P.R. China
|
19
|
Ahmad S, Gromiha MM, Raghava GPS, Schönbach C, Ranganathan S. APBioNet's annual International Conference on Bioinformatics (InCoB) returns to India in 2018. BMC Genomics 2019; 19:266. [PMID: 30999857 PMCID: PMC7402400 DOI: 10.1186/s12864-019-5582-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/12/2023] Open
Abstract
InCoB, one of the largest annual bioinformatics conferences in the Asia-Pacific region since its launch in 2002, returned to New Delhi, India after 12 years, with a conference attendance of 314 delegates. The 2018 conference had sessions on Big Data and Algorithms, Next Generation Sequencing and Omics Science, Structure, Function and Interactions, Disease and Drug Discovery and Plant and Agricultural Bioinformatics. The conference also featured an industry track as well as panel discussions on Women in Bioinformatics and Democratization vs. Quality control in academic publishing. Asia Pacific Bioinformatics Interaction & Networking Society (APbians) was launched as an APBionet Special Interest Group. Of the 52 oral presentations made, 22 were accepted in supplemental issues of BMC Bioinformatics, BMC Genomics or BMC Medical Genomics and are briefly reviewed here. Next year’s InCoB will be held in Jakarta, Indonesia from September 10–12, 2019.
Affiliation(s)
- Shandar Ahmad
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, 110 067, India
| | - Michael M Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamilnadu, 600 036, India
| | - Gajendra P S Raghava
- Centre for Computational Biology, Indraprastha Institute of Information Technology, Okhla Industrial Estate, Phase III, New Delhi, 110020, India
| | - Christian Schönbach
- Department of Biology, School of Science and Technology, Nazarbayev University, Astana, Kazakhstan; International Research Center for Medical Sciences, Graduate School of Medical Sciences, Kumamoto University, Kumamoto, 860-0811, Japan
| | - Shoba Ranganathan
- Department of Molecular Sciences, Macquarie University, Sydney, NSW, 2109, Australia; Transformational Bioinformatics, Health and Biosecurity, Commonwealth Scientific and Industrial Research Organisation, Sydney, Australia
|