1
Zuo Y, Zhang B, Dong Y, He W, Bi Y, Liu X, Zeng X, Deng Z. Glypred: Lysine Glycation Site Prediction via CCU-LightGBM-BiLSTM Framework with Multi-Head Attention Mechanism. J Chem Inf Model 2024; 64:6699-6711. [PMID: 39121059 DOI: 10.1021/acs.jcim.4c01034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/11/2024]
Abstract
Glycation, a type of posttranslational modification, preferentially occurs on lysine and arginine residues, impairing protein functionality and altering characteristics. This process is linked to diseases such as Alzheimer's, diabetes, and atherosclerosis. Traditional wet lab experiments are time-consuming, whereas machine learning has significantly streamlined the prediction of protein glycation sites. Despite promising results, challenges remain, including data imbalance, feature redundancy, and suboptimal classifier performance. This research introduces Glypred, a lysine glycation site prediction model combining ClusterCentroids Undersampling (CCU), LightGBM, and bidirectional long short-term memory network (BiLSTM) methodologies, with an additional multihead attention mechanism integrated into the BiLSTM. To achieve this, the study undertakes several key steps: selecting diverse feature types to capture comprehensive protein information, employing a cluster-based undersampling strategy to balance the data set, using LightGBM for feature selection to enhance model performance, and implementing a bidirectional LSTM network for accurate classification. Together, these approaches ensure that Glypred effectively identifies glycation sites with high accuracy and robustness. For feature encoding, five distinct feature types (AAC, KMER, DR, PWAA, and EBGW) were selected to capture a broad spectrum of protein sequence and biological information. These encoded features were integrated and validated to ensure comprehensive protein information acquisition. To address the issue of highly imbalanced positive and negative samples, various undersampling algorithms, including random undersampling, NearMiss, edited nearest neighbor rule, and CCU, were evaluated. CCU was ultimately chosen to remove redundant nonglycated training data, establishing a balanced data set that enhances the model's accuracy and robustness.
For feature selection, the LightGBM ensemble learning algorithm was employed to reduce feature dimensionality by identifying the most significant features. This approach accelerates model training, enhances generalization capabilities, and ensures good transferability of the model. Finally, a bidirectional long short-term memory network was used as the classifier, with a network structure designed to capture glycation modification site features from both forward and backward directions. To prevent overfitting, appropriate regularization parameters and dropout rates were introduced, achieving efficient classification. Experimental results show that Glypred achieved optimal performance. This model provides new insights for bioinformatics and encourages the application of similar strategies in other fields. A lysine glycation site prediction software tool was also developed using the PyQt5 library, offering researchers an auxiliary screening tool to reduce workload and improve efficiency. The software and data sets are available on GitHub: https://github.com/ZBYnb/Glypred.
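As a rough illustration of the feature-encoding step named in this abstract, the sketch below computes amino acid composition (AAC) for a peptide window around a candidate lysine. It is a minimal sketch and not code from the Glypred repository: the function name `aac_encode`, the alphabet ordering in `AMINO_ACIDS`, and the example window are assumptions made for illustration.

```python
from collections import Counter

# Assumed ordering of the 20 standard amino acids (alphabetical by one-letter code).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac_encode(window):
    """Return the fraction of each standard amino acid in a peptide window.

    Characters outside the 20-letter alphabet (for example padding symbols)
    are ignored, so the returned fractions sum to 1 for any non-empty window.
    """
    residues = [r for r in window.upper() if r in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(residues)
    total = len(residues)
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical 9-residue window; not taken from any data set.
features = aac_encode("AKKGLKAAK")
```

AAC discards residue order, which is presumably why the abstract pairs it with position-aware encodings such as PWAA.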
Affiliation(s)
- Yun Zuo
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214000, China
- Bangyi Zhang
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214000, China
- Yinkang Dong
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214000, China
- Wenying He
  - School of Artificial Intelligence, Hebei University of Technology, Tianjin 300130, China
- Yue Bi
  - Department of Biochemistry and Molecular Biology and Biomedicine Discovery Institute, Monash University, Clayton 3800, Australia
- Xiangrong Liu
  - Department of Computer Science and Technology, National Institute for Data Science in Health and Medicine, Xiamen Key Laboratory of Intelligent Storage and Computing, Xiamen University, Xiamen 361005, China
- Xiangxiang Zeng
  - School of Information Science and Engineering, Hunan University, Changsha 410012, China
- Zhaohong Deng
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214000, China
2
Vizuete AFK, Fróes F, Seady M, Hansen F, Ligabue-Braun R, Gonçalves CA, Souza DO. A Mechanism of Action of Metformin in the Brain: Prevention of Methylglyoxal-Induced Glutamatergic Impairment in Acute Hippocampal Slices. Mol Neurobiol 2024; 61:3223-3239. [PMID: 37980327 DOI: 10.1007/s12035-023-03774-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Accepted: 11/05/2023] [Indexed: 11/20/2023]
Abstract
Metformin, a biguanide compound (N-1,1-dimethylbiguanide), is widely prescribed for diabetes mellitus type 2 (T2D) treatment. It also presents a plethora of properties, such as anti-oxidant, anti-inflammatory, anti-apoptosis, anti-tumorigenic, and anti-AGE formation activity. However, the precise mechanism of action of metformin in the central nervous system (CNS) needs to be clarified. Herein, we investigated the neuroprotective role of metformin in acute hippocampal slices exposed to methylglyoxal (MG), a highly reactive dicarbonyl compound and a key molecule in T2D developmental pathophysiology. Metformin protected acute hippocampal slices from MG-induced glutamatergic neurotoxicity and neuroinflammation by reducing IL-1β synthesis and secretion and RAGE protein expression. The drug also improved astrocyte function, particularly with regard to the glutamatergic system, increasing glutamate uptake. Moreover, we observed a direct effect of metformin on glutamate transporters, where the compound prevented glycation, by facilitating enzymatic phosphorylation close to Lys residues, suggesting a new neuroprotective role of metformin via PKC ζ in preventing dysfunction in glutamatergic system induced by MG.
Affiliation(s)
- Adriana Fernanda K Vizuete
  - Laboratory of Calcium-Binding Proteins in the CNS, Department of Biochemistry, Institute of Basic Health Sciences, Universidade Federal Do Rio Grande Do Sul (UFRGS), Porto Alegre, RS, Brazil
  - Post Graduate Program in Biochemistry, Institute of Basic Health Sciences, UFRGS, Porto Alegre, RS, Brazil
  - Department of Biochemistry, Institute of Basic Health Sciences, UFRGS, Ramiro Barcelos, 2600-Anexo, Porto Alegre, RS, 90035-003, Brazil
- Fernanda Fróes
  - Laboratory of Calcium-Binding Proteins in the CNS, Department of Biochemistry, Institute of Basic Health Sciences, Universidade Federal Do Rio Grande Do Sul (UFRGS), Porto Alegre, RS, Brazil
  - Post Graduate Program in Biochemistry, Institute of Basic Health Sciences, UFRGS, Porto Alegre, RS, Brazil
- Marina Seady
  - Laboratory of Calcium-Binding Proteins in the CNS, Department of Biochemistry, Institute of Basic Health Sciences, Universidade Federal Do Rio Grande Do Sul (UFRGS), Porto Alegre, RS, Brazil
  - Post Graduate Program in Biochemistry, Institute of Basic Health Sciences, UFRGS, Porto Alegre, RS, Brazil
- Fernanda Hansen
  - Department of Nutrition, Health Sciences Center, Federal University of Santa Catarina, University Campus, Trindade, Florianópolis, Santa Catarina, 88040-900, Brazil
- Rodrigo Ligabue-Braun
  - Department of Pharmacosciences, Federal University of Health Sciences of Porto Alegre (UFCSPA), Avenida Sarmento Leite 245, Porto Alegre, 90050-130, Brazil
- Carlos-Alberto Gonçalves
  - Laboratory of Calcium-Binding Proteins in the CNS, Department of Biochemistry, Institute of Basic Health Sciences, Universidade Federal Do Rio Grande Do Sul (UFRGS), Porto Alegre, RS, Brazil
  - Post Graduate Program in Biochemistry, Institute of Basic Health Sciences, UFRGS, Porto Alegre, RS, Brazil
  - Department of Biochemistry, Institute of Basic Health Sciences, UFRGS, Ramiro Barcelos, 2600-Anexo, Porto Alegre, RS, 90035-003, Brazil
- Diogo O Souza
  - Post Graduate Program in Biochemistry, Institute of Basic Health Sciences, UFRGS, Porto Alegre, RS, Brazil
  - Department of Biochemistry, Institute of Basic Health Sciences, UFRGS, Ramiro Barcelos, 2600-Anexo, Porto Alegre, RS, 90035-003, Brazil
3
Jia J, Wu G, Li M. iGly-IDN: Identifying Lysine Glycation Sites in Proteins Based on Improved DenseNet. J Comput Biol 2024; 31:161-174. [PMID: 38016151 DOI: 10.1089/cmb.2023.0112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2023]
Abstract
Lysine glycation is one of the most significant protein post-translational modifications, which changes the properties of the proteins and causes them to be dysfunctional. Accurately identifying glycation sites helps to understand the biological function and potential mechanism of glycation in disease treatments. Nonetheless, the experimental methods are ordinarily inefficient and costly, so effective computational methods need to be developed. In this study, we proposed a new model called iGly-IDN based on the improved densely connected convolutional networks (DenseNet). First, one-hot encoding was adopted to obtain the original feature maps. Afterward, the improved DenseNet was adopted to capture feature information with the importance degrees during the feature learning. According to the experimental results, accuracy (Acc) reaches 66%, and the Matthews correlation coefficient reaches 0.33 on the independent testing data set, which indicates that iGly-IDN can provide more effective glycation site identification than the current predictors.
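For readers unfamiliar with the two metrics quoted in this abstract, the following minimal sketch computes accuracy and the Matthews correlation coefficient (MCC) from binary confusion-matrix counts. The function names and the example counts are made up for illustration and do not reproduce the iGly-IDN results.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, chosen only to exercise the formulas.
acc_value = accuracy(tp=40, tn=30, fp=20, fn=10)
mcc_value = mcc(tp=40, tn=30, fp=20, fn=10)
```

Because MCC uses all four cells of the confusion matrix, it can stay low even when accuracy looks acceptable, which is why the two are usually reported together for imbalanced site-prediction data.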
Affiliation(s)
- Jianhua Jia
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, China
- Genqiang Wu
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, China
  - College of Modern Economics and Management, Jiangxi University of Finance and Economics, Nanchang, China
- Meifang Li
  - School of Computer Information Engineering, Nanchang Institute of Technology, Nanchang, China
4
Moises JE, Regl C, Hinterholzer A, Huber CG, Schubert M. Unambiguous Identification of Glucose-Induced Glycation in mAbs and other Proteins by NMR Spectroscopy. Pharm Res 2023; 40:1341-1353. [PMID: 36510116 PMCID: PMC10338404 DOI: 10.1007/s11095-022-03454-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/12/2022] [Accepted: 11/30/2022] [Indexed: 12/14/2022]
Abstract
OBJECTIVE Glycation is a non-enzymatic and spontaneous post-translational modification (PTM) generated by the reaction between reducing sugars and primary amine groups within proteins. Because glycation can alter the properties of proteins, it is a critical quality attribute of therapeutic monoclonal antibodies (mAbs) and should therefore be carefully monitored. The most abundant product of glycation is formed by glucose and lysine side chains resulting in fructoselysine after Amadori rearrangement. In proteomics, which routinely uses a combination of chromatography and mass spectrometry to analyze PTMs, there is no straightforward way to distinguish between glycation products of a reducing monosaccharide and an additional hexose within a glycan, since both lead to a mass difference of 162 Da. METHODS To verify that the observed mass change is indeed a glycation product, we developed an approach based on 2D NMR spectroscopy and full-length protein samples denatured using high concentrations of deuterated urea. RESULTS The dominating β-pyranose form of the Amadori product shows a characteristic chemical shift correlation pattern in 1H-13C HSQC spectra suited to identify glucose-induced glycation. The same pattern was observed in spectra of a variety of artificially glycated proteins, including two mAbs, as well as natural proteins. CONCLUSION Based on this unique correlation pattern, 2D NMR spectroscopy can be used to unambiguously identify glucose-induced glycation in any protein of interest. We provide a robust method that is orthogonal to MS-based methods and can also be used for cross-validation.
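The 162 Da ambiguity mentioned in this abstract can be checked with simple arithmetic: condensing glucose (C6H12O6) onto an amine releases one water, so the net addition has the same elemental composition (C6H10O5) as a hexose residue inside a glycan. The sketch below is an illustrative calculation using standard monoisotopic atomic masses; the helper name `formula_mass` is an assumption, not part of any cited method.

```python
# Standard monoisotopic atomic masses in daltons.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}


def formula_mass(formula):
    """Sum monoisotopic masses for a formula given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


glucose = {"C": 6, "H": 12, "O": 6}
water = {"H": 2, "O": 1}

# Net mass added by glycation: glucose attached with loss of one water.
glycation_shift = formula_mass(glucose) - formula_mass(water)

# Mass of one hexose residue (C6H10O5) within a glycan chain.
hexose_residue = formula_mass({"C": 6, "H": 10, "O": 5})
```

Both values come out to roughly 162.05 Da, so mass alone cannot separate the two cases, which is the motivation the abstract gives for an orthogonal NMR readout.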
Affiliation(s)
- Jennifer E Moises
  - Department of Biosciences and Medical Biology, University of Salzburg, Hellbrunner Strasse 34, 5020, Salzburg, Austria
- Christof Regl
  - Department of Biosciences and Medical Biology, University of Salzburg, Hellbrunner Strasse 34, 5020, Salzburg, Austria
  - Christian Doppler Laboratory for Innovative Tools for Biosimilar Characterization, University of Salzburg, Hellbrunner Strasse 34, 5020, Salzburg, Austria
- Arthur Hinterholzer
  - Department of Biosciences and Medical Biology, University of Salzburg, Hellbrunner Strasse 34, 5020, Salzburg, Austria
  - Christian Doppler Laboratory for Innovative Tools for Biosimilar Characterization, University of Salzburg, Hellbrunner Strasse 34, 5020, Salzburg, Austria
- Christian G Huber
  - Department of Biosciences and Medical Biology, University of Salzburg, Hellbrunner Strasse 34, 5020, Salzburg, Austria
  - Christian Doppler Laboratory for Innovative Tools for Biosimilar Characterization, University of Salzburg, Hellbrunner Strasse 34, 5020, Salzburg, Austria
- Mario Schubert
  - Department of Biosciences and Medical Biology, University of Salzburg, Hellbrunner Strasse 34, 5020, Salzburg, Austria
  - Christian Doppler Laboratory for Innovative Tools for Biosimilar Characterization, University of Salzburg, Hellbrunner Strasse 34, 5020, Salzburg, Austria
5
Que-Salinas U, Martinez-Peon D, Reyes-Figueroa AD, Ibarra I, Scheckhuber CQ. On the Prediction of In Vitro Arginine Glycation of Short Peptides Using Artificial Neural Networks. Sensors (Basel, Switzerland) 2022; 22:5237. [PMID: 35890916 PMCID: PMC9324327 DOI: 10.3390/s22145237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2022] [Revised: 07/08/2022] [Accepted: 07/11/2022] [Indexed: 06/15/2023]
Abstract
One of the hallmarks of diabetes is an increased modification of cellular proteins. The most prominent type of modification stems from the reaction of methylglyoxal with arginine and lysine residues, leading to structural and functional impairments of target proteins. For lysine glycation, several algorithms allow a prediction of occurrence; thus, making it possible to pinpoint likely targets. However, according to our knowledge, no approaches have been published for predicting the likelihood of arginine glycation. There are indications that arginine and not lysine is the most prominent target for the toxic dialdehyde. One of the reasons why there is no arginine glycation predictor is the limited availability of quantitative data. Here, we used a recently published high-quality dataset of arginine modification probabilities to employ an artificial neural network strategy. Despite the limited data availability, our results achieve an accuracy of about 75% of correctly predicting the exact value of the glycation probability of an arginine-containing peptide without setting thresholds upon whether it is decided if a given arginine is modified or not. This contribution suggests a solution for predicting arginine glycation of short peptides.
Affiliation(s)
- Ulices Que-Salinas
  - Centro de Ciencias de la Tierra, Universidad Veracruzana, Xalapa 91090, VER, Mexico
- Dulce Martinez-Peon
  - Department of Electrical and Electronic Engineering, National Technological Institute of Mexico/IT, Monterrey 67170, NL, Mexico
- Angel D. Reyes-Figueroa
  - Consejo Nacional de Ciencia y Tecnología, Av. Insurgentes Sur 1582, Col. Crédito Constructor, Benito Juárez, Mexico City 03940, DF, Mexico
  - Centro de Investigación en Matemáticas Unidad Monterrey, Parque de Investigación e Innovación Tecnológica (PIIT), Av. Alianza Centro No. 502, Apodaca 66628, NL, Mexico
- Ivonne Ibarra
  - Independent Researcher, Monterrey 66620, NL, Mexico
- Christian Quintus Scheckhuber
  - Departamento de Bioingeniería, Escuela de Ingeniería y Ciencias, Tecnologico de Monterrey, Ave. Eugenio Garza Sada 2501, Monterrey 64849, NL, Mexico
6
Suresh SA, Ethiraj S, Rajnish KN. A systematic review of recent trends in research on therapeutically significant L-asparaginase and acute lymphoblastic leukemia. Mol Biol Rep 2022; 49:11281-11287. [PMID: 35816224 DOI: 10.1007/s11033-022-07688-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/03/2022] [Accepted: 06/08/2022] [Indexed: 12/01/2022]
Abstract
L-asparaginases are mostly obtained from bacterial sources for their application in therapy and the food industry. Bacterial L-asparaginases are employed in the treatment of Acute Lymphoblastic Leukemia (ALL) and its subtypes, a type of blood and bone marrow cancer that results in the overproduction of immature blood cells. The enzyme also plays a role in the food industry in reducing the acrylamide formed during baking, roasting, and frying starchy foods. This importance of the enzyme makes the isolation of novel sources a constant research interest. Presently, L-asparaginases from E. coli (native and PEGylated forms) and Dickeya chrysanthemi (Erwinia chrysanthemi) are in the treatment regime. In therapy, the intrinsic glutaminase activity of the enzyme is a major drawback, as patients in treatment experience side effects like fever, skin rashes, anaphylaxis, pancreatitis, steatosis in the liver, and many complications. Its significance in the food industry in mitigating acrylamide is also a major reason. Acrylamide, a potent carcinogen, is formed when starchy foods are treated at higher temperatures. Acrylamide content in food was analyzed and pre-treatment was considered a valuable option. Immobilization of the enzyme is an advancing and promising technique for delivering the enzyme more effectively than in free form. The concept of machine learning by employing the Artificial Network and Genetic Algorithm has paved the way to optimize the production of L-asparaginase from its sources. Gene-editing tools are gaining momentum in the study of several diseases and this review focuses on the CRISPR-Cas9 gene-editing tool in ALL.
Affiliation(s)
- K N Rajnish
  - SRM Institute of Science and Technology, Chennai, Tamil Nadu, India
7
Liu Y, Liu Y, Wang GA, Cheng Y, Bi S, Zhu X. BERT-Kgly: A Bidirectional Encoder Representations From Transformers (BERT)-Based Model for Predicting Lysine Glycation Site for Homo sapiens. Frontiers in Bioinformatics 2022; 2:834153. [PMID: 36304324 PMCID: PMC9580886 DOI: 10.3389/fbinf.2022.834153] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 12/13/2021] [Accepted: 01/20/2022] [Indexed: 12/21/2022]
Abstract
As one of the most important posttranslational modifications (PTMs), protein lysine glycation changes the characteristics of the proteins and leads to the dysfunction of the proteins, which may cause diseases. Accurately detecting the glycation sites is of great benefit for understanding the biological function and potential mechanism of glycation in the treatment of diseases. However, experimental methods are expensive and time-consuming for lysine glycation site identification. Instead, computational methods, with their higher efficiency and lower cost, could be an important supplement to the experimental methods. In this study, we proposed a novel predictor, BERT-Kgly, for protein lysine glycation site prediction, which was developed by extracting embedding features of protein segments from pretrained Bidirectional Encoder Representations from Transformers (BERT) models. Three pretrained BERT models were explored to get the embeddings with optimal representability, and three downstream deep networks were employed to build our models. Our results showed that the model based on embeddings extracted from the BERT model pretrained on 556,603 protein sequences of UniProt outperforms other models. In addition, an independent test set was used to evaluate and compare our model with other existing methods, which indicated that our model was superior to other existing models.
Affiliation(s)
- Shoudong Bi
  - *Correspondence: Shoudong Bi, ; Xiaolei Zhu,
- Xiaolei Zhu
  - *Correspondence: Shoudong Bi, ; Xiaolei Zhu,
8
Dehzangi I, Sharma A, Shatabda S. iProtGly-SS: A Tool to Accurately Predict Protein Glycation Site Using Structural-Based Features. Methods Mol Biol 2022; 2499:125-134. [PMID: 35696077 DOI: 10.1007/978-1-0716-2317-6_5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/15/2023]
Abstract
Posttranslational modification (PTM) is an important biological mechanism to promote functional diversity among the proteins. So far, a wide range of PTMs has been identified. Among them, glycation is considered one of the most important PTMs. Glycation is associated with different neurological disorders, including Parkinson's and Alzheimer's diseases. It is also shown to be responsible for different diseases, including vascular complications of diabetes mellitus. Despite all the efforts made so far, the prediction performance of glycation sites using computational methods remains limited. Here we present a newly developed machine learning tool called iProtGly-SS that utilizes sequential and structural information as well as a Support Vector Machine (SVM) classifier to enhance lysine glycation site prediction accuracy. The performance of iProtGly-SS was investigated using the three most popular benchmarks used for this task. Our results demonstrate that iProtGly-SS is able to achieve 81.61%, 93.62%, and 92.95% prediction accuracies on these benchmarks, which are significantly better than the results reported in previous studies. iProtGly-SS is implemented as a web-based tool which is publicly available at http://brl.uiu.ac.bd/iprotgly-ss/.
Affiliation(s)
- Iman Dehzangi
  - Department of Computer Science, Rutgers University, Camden, NJ, USA
  - Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, USA
- Alok Sharma
  - Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, QLD, Australia
  - Department of Medical Science Mathematics, Tokyo Medical and Dental University (TMDU), Tokyo, Japan
  - Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
- Swakkhar Shatabda
  - Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh
9
βLact-Pred: A Predictor Developed for Identification of Beta-Lactamases Using Statistical Moments and PseAAC via 5-Step Rule. Computational Intelligence and Neuroscience 2021; 2021:8974265. [PMID: 34956358 PMCID: PMC8709780 DOI: 10.1155/2021/8974265] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/22/2021] [Accepted: 11/22/2021] [Indexed: 12/02/2022]
Abstract
Beta-lactamase (β-lactamase) produced by different bacteria confers resistance against β-lactam-containing drugs. The gene encoding β-lactamase is plasmid-borne and can easily be transferred from one bacterium to another during conjugation. By such transformations, the recipient also acquires resistance against the drugs of the β-lactam family. β-Lactam antibiotics are of vital significance in the clinical treatment of serious conditions such as soft tissue infections, gonorrhoea, skin infections, urinary tract infections, and bronchitis. Herein, we report a prediction classifier named βLact-Pred for the identification of β-lactamase proteins. The computational model uses the primary amino acid sequence structure as its input. Various metrics are derived from the primary structure to form a feature vector. Experimentally determined data of positive and negative beta-lactamases are collected and transformed into feature vectors. An operating algorithm based on the artificial neural network is used by integrating the position relative features and sequence statistical moments in PseAAC for training the neural networks. The results for the proposed computational model were validated by employing several types of approach, i.e., self-consistency testing, jackknife testing, cross-validation, and independent testing. The overall accuracy of the predictor for self-consistency, jackknife testing, cross-validation, and independent testing is 99.76%, 96.07%, 94.20%, and 91.65%, respectively, for the proposed model. The experimental results demonstrated that the proposed predictor “βLact-Pred” has surpassed results from the existing methods.
10
Islam MKB, Rahman J, Hasan MAM, Ahmad S. predForm-Site: Formylation site prediction by incorporating multiple features and resolving data imbalance. Comput Biol Chem 2021; 94:107553. [PMID: 34384997 DOI: 10.1016/j.compbiolchem.2021.107553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2020] [Revised: 06/22/2021] [Accepted: 07/28/2021] [Indexed: 10/20/2022]
Abstract
Formylation is one of the newly discovered post-translational modifications on lysine residues and is responsible for different kinds of diseases. In this work, a novel predictor, named predForm-Site, has been developed to predict formylation sites with higher accuracy. We have integrated multiple sequence features for developing a more informative representation of formylation sites. Moreover, the decision function of the underlying classifier has been optimized on the skewed formylation dataset during prediction model training for prediction quality improvement. On the dataset used by the LFPred and Formator predictors, predForm-Site achieved 99.5% sensitivity, 99.8% specificity and 99.8% overall accuracy with AUC of 0.999 in the jackknife test. In the independent test, it has also achieved more than 97% sensitivity and 99% specificity. Similarly, in benchmarking with the recent method CKSAAP_FormSite, the proposed predictor significantly outperformed it in all the measures, particularly sensitivity by around 20%, specificity by nearly 30% and overall accuracy by more than 22%. These experimental results show that the proposed predForm-Site can be used as a complementary tool for the fast exploration of formylation sites. For convenience of the scientific community, predForm-Site has been deployed as an online tool, accessible at http://103.99.176.239:8080/predForm-Site.
Affiliation(s)
- Md Khaled Ben Islam
  - Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia
  - Department of Computer Science & Engineering, Pabna University of Science and Technology, Pabna, Bangladesh
- Julia Rahman
  - Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia
  - Department of Computer Science & Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
- Md Al Mehedi Hasan
  - Department of Computer Science & Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
- Shamim Ahmad
  - Department of Computer Science & Engineering, Rajshahi University, Rajshahi, Bangladesh
11
McEwen JM, Fraser S, Guir ALS, Dave J, Scheck RA. Synergistic sequence contributions bias glycation outcomes. Nat Commun 2021; 12:3316. [PMID: 34083524 PMCID: PMC8175500 DOI: 10.1038/s41467-021-23625-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/28/2020] [Accepted: 05/06/2021] [Indexed: 12/30/2022]
Abstract
The methylglyoxal-derived hydroimidazolone isomer, MGH-1, is an abundant advanced glycation end-product (AGE) associated with disease and age-related disorders. As AGE formation occurs spontaneously and without an enzyme, it remains unknown why certain sites on distinct proteins become modified with specific AGEs. Here, we use a combinatorial peptide library to determine the chemical features that favor MGH-1. When properly positioned, tyrosine is found to play an active mechanistic role that facilitates MGH-1 formation. This work offers mechanistic insight connecting multiple AGEs, including MGH-1 and carboxyethylarginine (CEA), and reconciles the role of negative charge in influencing glycation outcomes. Further, this study provides clear evidence that glycation outcomes can be influenced through long- or medium-range cooperative interactions. This work demonstrates that these chemical features also predictably template selective glycation on full-length protein targets expressed in mammalian cells. This information is vital for developing methods that control glycation in living cells and will enable the study of glycation as a functional post-translational modification. Advanced glycation end-products (AGEs), such as methylglyoxal-derived hydroimidazolone isomer (MGH-1), are associated with disease and age-related disorders, and occur spontaneously, so it is unclear why specific protein sites become modified with specific AGEs. Here, the authors use a combinatorial peptide library to determine the chemical features that favour MGH-1 formation for short peptides and demonstrate a key role of tyrosine in this process.
Affiliation(s)
- Sasha Fraser
  - Department of Chemistry, Tufts University, Medford, MA, USA
- Jaydev Dave
  - Department of Chemistry, Tufts University, Medford, MA, USA
12
Dou L, Yang F, Xu L, Zou Q. A comprehensive review of the imbalance classification of protein post-translational modifications. Brief Bioinform 2021; 22:6217722. [PMID: 33834199 DOI: 10.1093/bib/bbab089] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 01/10/2021] [Revised: 02/17/2021] [Accepted: 02/24/2021] [Indexed: 12/13/2022]
Abstract
Post-translational modifications (PTMs) play significant roles in regulating protein structure, activity and function, and they are closely involved in various pathologies. Therefore, the identification of associated PTMs is the foundation of in-depth research on related biological mechanisms, disease treatments and drug design. Due to the high cost and time consumption of high-throughput sequencing techniques, developing machine learning-based predictors has been considered an effective approach to rapidly recognize potential modified sites. However, the imbalanced distribution of true and false PTM sites, namely, the data imbalance problem, largely affects the reliability and application of prediction tools. In this article, we conduct a systematic survey of the research progress in imbalanced PTM classification. First, we describe the modeling process in detail and outline useful data imbalance solutions. Then, we summarize the recently proposed bioinformatics tools based on imbalanced PTM data and simultaneously build a convenient website, ImClassi_PTMs (available at lab.malab.cn/~dlj/ImbClassi_PTMs/), for researchers to browse. Moreover, we analyze the challenges of current computational predictors and propose some suggestions to improve the efficiency of imbalance learning. We hope that this work will provide comprehensive knowledge of imbalanced PTM recognition and contribute to advanced predictors in the future.
Affiliation(s)
- Lijun Dou
- University of Electronic Science and Technology of China and the Shenzhen Polytechnic, China
| | - Fenglong Yang
- University of Electronic Science and Technology of China and the Shenzhen Polytechnic, China
| | - Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| |
|
13
|
Yang Y, Wang H, Li W, Wang X, Wei S, Liu Y, Xu Y. Prediction and analysis of multiple protein lysine modified sites based on conditional wasserstein generative adversarial networks. BMC Bioinformatics 2021; 22:171. [PMID: 33789579 PMCID: PMC8010967 DOI: 10.1186/s12859-021-04101-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 08/28/2020] [Accepted: 03/23/2021] [Indexed: 01/05/2023]
Abstract
BACKGROUND Protein post-translational modification (PTM) is key to investigating the mechanisms of protein function. With the rapid development of proteomics technology, a large amount of protein sequence data has been generated, which highlights the importance of the in-depth study and analysis of PTMs in proteins. METHOD We proposed a new multi-classification machine learning pipeline, MultiLyGAN, to identify seven types of lysine modified sites. Using eight different sequential and five structural construction methods, 1497 valid features remained after filtering by Pearson correlation coefficient. To solve the data imbalance problem, Conditional Generative Adversarial Network (CGAN) and Conditional Wasserstein Generative Adversarial Network (CWGAN), two influential deep generative methods, were leveraged and compared to generate new samples for the types with fewer samples. Finally, the random forest algorithm was utilized to predict seven categories. RESULTS In the tenfold cross-validation, accuracy (Acc) and Matthews correlation coefficient (MCC) were 0.8589 and 0.8376, respectively. In the independent test, Acc and MCC were 0.8549 and 0.8330, respectively. The results indicated that CWGAN better solved the existing data imbalance and stabilized the training error. Additionally, an accumulated feature importance analysis reported that CKSAAP, PWM and structural features were the three most important feature-encoding schemes. MultiLyGAN can be found at https://github.com/Lab-Xu/MultiLyGAN. CONCLUSIONS The CWGAN greatly improved the predictive performance in all experiments. Features derived from CKSAAP, PWM and structure schemes are the most informative and made the greatest contribution to the prediction of PTM.
Affiliation(s)
- Yingxi Yang
- Department of Information and Computer Science, University of Science and Technology Beijing, Beijing, 100083, China
| | - Hui Wang
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100080, China
| | - Wen Li
- Department of Information and Computer Science, University of Science and Technology Beijing, Beijing, 100083, China
| | - Xiaobo Wang
- Department of Information and Computer Science, University of Science and Technology Beijing, Beijing, 100083, China
| | - Shizhao Wei
- No. 15 Research Institute, China Electronics Technology Group Corporation, Beijing, 100083, China
| | - Yulong Liu
- No. 15 Research Institute, China Electronics Technology Group Corporation, Beijing, 100083, China
| | - Yan Xu
- Department of Information and Computer Science, University of Science and Technology Beijing, Beijing, 100083, China.
| |
|
14
|
Yao Y, Zhao X, Ning Q, Zhou J. ABC-Gly: Identifying Protein Lysine Glycation Sites with Artificial Bee Colony Algorithm. Curr Proteomics 2021. [DOI: 10.2174/1570164617666191227120136] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/22/2022]
Abstract
Background:
Glycation is a nonenzymatic post-translational modification process by attaching a sugar molecule to a protein or lipid molecule. It may impair the function and change the characteristic of the proteins which may lead to some metabolic diseases. In order to understand the underlying molecular mechanisms of glycation, computational prediction methods have been developed because of their convenience and high speed. However, a more effective computational tool is still a challenging task in computational biology.
Methods:
In this study, we showed an accurate identification tool named ABC-Gly for predicting lysine glycation sites. At first, we utilized three informative features, including position-specific amino acid propensity, secondary structure and the composition of k-spaced amino acid pairs to encode the peptides. Moreover, to sufficiently exploit discriminative features thus can improve the prediction and generalization ability of the model, we developed a two-step feature selection, which combined the Fisher score and an improved binary artificial bee colony algorithm based on the support vector machine. Finally, based on the optimal feature subset, we constructed an effective model by using the Support Vector Machine on the training dataset.
Results:
The performance of the proposed predictor ABC-Gly was measured with the sensitivity of 76.43%, the specificity of 91.10%, the balanced accuracy of 83.76%, the Area Under the receiver operating characteristic Curve (AUC) of 0.9313, a Matthew’s Correlation Coefficient (MCC) of 0.6861 by 10-fold cross-validation on training dataset, and a balanced accuracy of 59.05% on independent dataset. Compared to the state-of-the-art predictors on the training dataset, the proposed predictor achieved significant improvement in the AUC of 0.156 and MCC of 0.336.
Conclusion:
The detailed analysis results indicated that our predictor may serve as a powerful complementary tool to other existing methods for predicting protein lysine glycation. The source code and datasets of the ABC-Gly were provided in the Supplementary File 1.
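As a quick arithmetic check of the figures quoted above, balanced accuracy is the mean of sensitivity and specificity. The sketch below applies that standard definition to the cross-validation values from the abstract; the helper name `balanced_accuracy` is a hypothetical choice, and the second call uses an invented imbalanced scenario purely for illustration.

```python
def balanced_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity, in the same units as the inputs."""
    return (sensitivity + specificity) / 2.0

# Cross-validation values quoted in the abstract, in percent.
ba_cv = balanced_accuracy(76.43, 91.10)
print(f"balanced accuracy (cross-validation) = {ba_cv:.3f}%")

# Invented scenario: a classifier that labels every site as negative has
# sensitivity 0% and specificity 100%, so its balanced accuracy stays at 50%
# no matter how many negatives the dataset contains.
print(f"all-negative classifier = {balanced_accuracy(0.0, 100.0):.1f}%")
```

The first value comes out at 83.765%, which agrees with the reported 83.76% up to rounding. Because the metric weights both classes equally, it is less flattering than plain accuracy on imbalanced site data.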
Affiliation(s)
- Yanqiu Yao
- College of Computer Science and Technology, Changchun Normal University, Changchun, 130032, China
| | - Xiaosa Zhao
- School of Computer Science and Information Technology, Northeast Normal University, Changchun, 130117, China
| | - Qiao Ning
- School of Computer Science and Information Technology, Northeast Normal University, Changchun, 130117, China
| | - Junping Zhou
- School of Computer Science and Information Technology, Northeast Normal University, Changchun, 130117, China
| |
|
15
|
Ju Z, Wang SY. Prediction of Neddylation Sites Using the Composition of k-spaced Amino Acid Pairs and Fuzzy SVM. Curr Bioinform 2020. [DOI: 10.2174/1574893614666191114123453] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/22/2022]
Abstract
Introduction:
Neddylation is the process of ubiquitin-like protein NEDD8 attaching substrate lysine via isopeptide bonds. As a highly dynamic and reversible post-translational modification, lysine neddylation has been found to be involved in various biological processes and closely associated with many diseases.
Objective:
The accurate identification of neddylation sites is necessary to elucidate the underlying molecular mechanisms of neddylation. As traditional experimental methods are often expensive and time-consuming, it is imperative to design computational methods to identify neddylation sites.
Methods:
In this study, a novel predictor named CKSAAP_NeddSite is developed to detect neddylation sites. An effective feature encoding technology, the composition of k-spaced amino acid pairs, is used to encode neddylation sites. And the F-score feature selection method is adopted to remove the redundant features. Moreover, a fuzzy support vector machine algorithm is employed to overcome the class imbalance and noise problem.
Results:
As illustrated by 10-fold cross-validation, CKSAAP_NeddSite achieves an AUC of 0.9848. Independent tests also show that CKSAAP_NeddSite significantly outperforms existing neddylation sites predictor. Therefore, CKSAAP_NeddSite can be a useful bioinformatics tool for the prediction of neddylation sites. Feature analysis shows that some residues around neddylation sites may play an important role in the prediction.
Conclusion:
The results of analysis and prediction could offer useful information for elucidating the molecular mechanisms of neddylation. A user-friendly web-server for CKSAAP_NeddSite is established at 123.206.31.171/CKSAAP_NeddSite.
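The composition of k-spaced amino acid pairs (CKSAAP) encoding named above can be sketched as follows. This is a generic illustration of the encoding, not the authors' implementation: the function name `cksaap`, the default `k_max`, and the choice to normalise by the number of valid pairs (skipping any non-standard characters such as padding symbols) are assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs

def cksaap(peptide, k_max=2):
    """Encode a peptide as pair frequencies for gaps k = 0..k_max.

    For each k, residues at positions i and i + k + 1 form a pair; the
    400 pair counts are divided by the number of valid pairs found.
    The result has 400 * (k_max + 1) entries.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        for i in range(len(peptide) - k - 1):
            pair = peptide[i] + peptide[i + k + 1]
            if pair in counts:  # ignore pairs containing non-standard symbols
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features

# Hypothetical 7-residue window centred on a lysine.
vector = cksaap("AGLKSTA", k_max=2)
print(len(vector))
```

Each additional gap value adds another 400 dimensions, which is one reason a feature selection step such as the F-score is typically applied afterwards.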
Affiliation(s)
- Zhe Ju
- College of Science, Shenyang Aerospace University, Shenyang 110136, China
| | - Shi-Yun Wang
- College of Science, Shenyang Aerospace University, Shenyang 110136, China
| |
|
16
|
Ju Z, Wang SY. Identify Lysine Neddylation Sites Using Bi-profile Bayes Feature Extraction via the Chou's 5-steps Rule and General Pseudo Components. Curr Genomics 2019; 20:592-601. [PMID: 32581647 PMCID: PMC7290059 DOI: 10.2174/1389202921666191223154629] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 07/01/2019] [Revised: 10/19/2019] [Accepted: 11/07/2019] [Indexed: 01/06/2023]
Abstract
Introduction Neddylation is a highly dynamic and reversible post-translational modification. The abnormality of neddylation has previously been shown to be closely related to some human diseases. The detection of neddylation sites is essential for elucidating the regulation mechanisms of protein neddylation. Objective As the detection of lysine neddylation sites by traditional experimental methods is often expensive and time-consuming, it is imperative to design computational methods to identify neddylation sites. Methods In this study, a bioinformatics tool named NeddPred is developed to identify underlying protein neddylation sites. A bi-profile Bayes feature extraction is used to encode neddylation sites and a fuzzy support vector machine model is utilized to overcome the problem of noise and class imbalance in the prediction. Results NeddPred achieved a Matthew's correlation coefficient of 0.7082 and an area under the receiver operating characteristic curve of 0.9769. Independent tests show that NeddPred significantly outperforms the existing lysine neddylation site predictor NeddyPreddy. Conclusion Therefore, NeddPred can be a complement to the existing tools for the prediction of neddylation sites. A user-friendly webserver for NeddPred is accessible at 123.206.31.171/NeddPred/.
Affiliation(s)
- Zhe Ju
- College of Science, Shenyang Aerospace University, Shenyang 110136, P.R. China
| | - Shi-Yun Wang
- College of Science, Shenyang Aerospace University, Shenyang 110136, P.R. China
| |
|
17
|
Chen Z, Liu X, Li F, Li C, Marquez-Lago T, Leier A, Akutsu T, Webb GI, Xu D, Smith AI, Li L, Chou KC, Song J. Large-scale comparative assessment of computational predictors for lysine post-translational modification sites. Brief Bioinform 2019; 20:2267-2290. [PMID: 30285084 PMCID: PMC6954452 DOI: 10.1093/bib/bby089] [Citation(s) in RCA: 78] [Impact Index Per Article: 15.6] [Received: 06/14/2018] [Revised: 08/17/2018] [Accepted: 08/18/2018] [Indexed: 12/22/2022]
Abstract
Lysine post-translational modifications (PTMs) play a crucial role in regulating diverse functions and biological processes of proteins. However, because of the large volumes of sequencing data generated from genome-sequencing projects, systematic identification of different types of lysine PTM substrates and PTM sites in the entire proteome remains a major challenge. In recent years, a number of computational methods for lysine PTM identification have been developed. These methods show high diversity in their core algorithms, features extracted and feature selection techniques and evaluation strategies. There is therefore an urgent need to revisit these methods and summarize their methodologies, to improve and further develop computational techniques to identify and characterize lysine PTMs from the large amounts of sequence data. With this goal in mind, we first provide a comprehensive survey on a large collection of 49 state-of-the-art approaches for lysine PTM prediction. We cover a variety of important aspects that are crucial for the development of successful predictors, including operating algorithms, sequence and structural features, feature selection, model performance evaluation and software utility. We further provide our thoughts on potential strategies to improve the model performance. Second, in order to examine the feasibility of using deep learning for lysine PTM prediction, we propose a novel computational framework, termed MUscADEL (Multiple Scalable Accurate Deep Learner for lysine PTMs), using deep, bidirectional, long short-term memory recurrent neural networks for accurate and systematic mapping of eight major types of lysine PTMs in the human and mouse proteomes. Extensive benchmarking tests show that MUscADEL outperforms current methods for lysine PTM characterization, demonstrating the potential and power of deep learning techniques in protein PTM prediction. 
The web server of MUscADEL, together with all the data sets assembled in this study, is freely available at http://muscadel.erc.monash.edu/. We anticipate that this comprehensive review and the application of deep learning will provide a practical guide and useful insights into PTM prediction and inspire future bioinformatics studies in related fields.
Affiliation(s)
- Zhen Chen
- School of Basic Medical Science, Qingdao University, Dengzhou Road, Qingdao, Shandong, China
| | - Xuhan Liu
- Medicinal Chemistry, Leiden Academic Centre for Drug Research, Einsteinweg, Leiden, The Netherlands
| | - Fuyi Li
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, VIC, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC, Australia
| | - Chen Li
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, VIC, Australia
- Institute of Molecular Systems Biology, ETH Zürich, Auguste-Piccard-Hof, Zürich, Switzerland
| | - Tatiana Marquez-Lago
- Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA
- Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
| | - André Leier
- Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA
- Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
| | - Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto, Japan
| | - Geoffrey I Webb
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
| | - Dakang Xu
- Faculty of Medical Laboratory Science, Ruijin Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Department of Molecular and Translational Science, Faculty of Medicine, Hudson Institute of Medical Research, Monash University, Melbourne, VIC, Australia
| | - Alexander Ian Smith
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, VIC, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC, Australia
| | - Lei Li
- School of Basic Medical Science, Qingdao University, Dengzhou Road, Qingdao, Shandong, China
| | - Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA, USA
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
| | - Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, VIC, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC, Australia
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
| |
|
18
|
Delmar JA, Wang J, Choi SW, Martins JA, Mikhail JP. Machine Learning Enables Accurate Prediction of Asparagine Deamidation Probability and Rate. Mol Ther Methods Clin Dev 2019; 15:264-274. [PMID: 31890727 PMCID: PMC6923510 DOI: 10.1016/j.omtm.2019.09.008] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 09/06/2019] [Accepted: 09/16/2019] [Indexed: 12/20/2022]
Abstract
The spontaneous conversion of asparagine residues to aspartic acid or iso-aspartic acid, via deamidation, is a major pathway of protein degradation and is often seriously disruptive to biological systems. Deamidation has been shown to negatively affect both in vitro stability and in vivo biological function of diverse classes of proteins. During protein therapeutics development, deamidation liabilities that are overlooked necessitate expensive and time-consuming remediation strategies, sometimes leading to termination of the project. In this paper, we apply machine learning to a large (n = 776) liquid chromatography-tandem mass spectrometry (LC-MS/MS) dataset of monoclonal antibody peptides to create computational models for the post-translational modification asparagine deamidation, using the random decision forest method. We show that our categorical model predicts antibody deamidation with nearly 5% increased accuracy and 0.2 MCC over the best currently available models. Surprisingly, our model also paces or outperforms advanced and conventional models on an independent non-antibody dataset. In addition to deamidation probability, we are able to accurately predict deamidation rate (R² = 0.963 and Q² = 0.822), a capability with no peer in current models. This method should enable significant improvement in protein candidate selection, especially in biopharmaceutical development, and can be applied with similar accuracy to enzymes, monoclonal antibodies, next-generation formats, vaccine component antigens, and gene therapy vectors such as adeno-associated virus.
Affiliation(s)
- Jared A Delmar
- Analytical Sciences, Biopharmaceutical Development, AstraZeneca, One MedImmune Way, Gaithersburg, MD 20878, USA
| | - Jihong Wang
- Analytical Sciences, Biopharmaceutical Development, AstraZeneca, One MedImmune Way, Gaithersburg, MD 20878, USA
| | - Seo Woo Choi
- David H. Koch School of Chemical Engineering Practice, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Jason A Martins
- David H. Koch School of Chemical Engineering Practice, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - John P Mikhail
- David H. Koch School of Chemical Engineering Practice, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| |
|
19
|
Sanghvi VR, Leibold J, Mina M, Mohan P, Berishaj M, Li Z, Miele MM, Lailler N, Zhao C, de Stanchina E, Viale A, Akkari L, Lowe SW, Ciriello G, Hendrickson RC, Wendel HG. The Oncogenic Action of NRF2 Depends on De-glycation by Fructosamine-3-Kinase. Cell 2019; 178:807-819.e21. [PMID: 31398338 PMCID: PMC6693658 DOI: 10.1016/j.cell.2019.07.031] [Citation(s) in RCA: 96] [Impact Index Per Article: 19.2] [Received: 02/05/2019] [Revised: 06/23/2019] [Accepted: 07/17/2019] [Indexed: 12/28/2022]
Abstract
The NRF2 transcription factor controls a cell stress program that is implicated in cancer, and there is great interest in targeting NRF2 for therapy. We show that NRF2 activity depends on Fructosamine-3-kinase (FN3K), a kinase that triggers protein de-glycation. In its absence, NRF2 is extensively glycated, unstable, and defective at binding to small MAF proteins and transcriptional activation. Moreover, the development of hepatocellular carcinoma triggered by MYC and Keap1 inactivation depends on FN3K in vivo. N-acetyl cysteine treatment partially rescues the effects of FN3K loss on NRF2-driven tumor phenotypes, indicating a key role for NRF2-mediated redox balance. Mass spectrometry reveals that other proteins undergo FN3K-sensitive glycation, including translation factors, heat shock proteins, and histones. How glycation affects their functions remains to be defined. In summary, our study reveals a surprising role for the glycation of cellular proteins and implicates FN3K as a targetable modulator of NRF2 activity in cancer.
Affiliation(s)
- Viraj R Sanghvi
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Josef Leibold
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Marco Mina
- Department of Computational Biology, University of Lausanne, 1005 Lausanne, Switzerland; Swiss Institute of Bioinformatics (SIB), 1005 Lausanne, Switzerland
| | - Prathibha Mohan
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Marjan Berishaj
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Zhuoning Li
- Microchemistry and Proteomics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Matthew M Miele
- Microchemistry and Proteomics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Nathalie Lailler
- Integrated Genomics Operation, Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Chunying Zhao
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Elisa de Stanchina
- Antitumor Assessment Core and Molecular Pharmacology Department, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Agnes Viale
- Integrated Genomics Operation, Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Leila Akkari
- Oncode Institute, Tumor Biology and Immunology division, the Netherlands Cancer Institute, 1006 BE, Amsterdam, the Netherlands
| | - Scott W Lowe
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Howard Hughes Medical Institute, New York, NY 10065, USA
| | - Giovanni Ciriello
- Department of Computational Biology, University of Lausanne, 1005 Lausanne, Switzerland; Swiss Institute of Bioinformatics (SIB), 1005 Lausanne, Switzerland
| | - Ronald C Hendrickson
- Microchemistry and Proteomics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Molecular Pharmacology Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Hans-Guido Wendel
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.
| |
|
20
|
Feng X, Li J, Li H, Chen H, Li F, Liu Q, You ZH, Zhou F. Age Is Important for the Early-Stage Detection of Breast Cancer on Both Transcriptomic and Methylomic Biomarkers. Front Genet 2019; 10:212. [PMID: 30984234 PMCID: PMC6448048 DOI: 10.3389/fgene.2019.00212] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 11/18/2018] [Accepted: 02/27/2019] [Indexed: 12/27/2022]
Abstract
Patients at different ages have different rates of cell development and metabolisms. As a result, age should be an essential part of how a disease diagnosis model is trained and optimized. Unfortunately, most of the existing studies have not taken age into account. This study demonstrated that disease diagnosis models could be improved by merely applying individual models for patients of different age groups. Both transcriptomes and methylomes of the TCGA breast cancer dataset (TCGA-BRCA) were utilized for the analysis procedure of feature selection and classification. Our experimental data strongly suggested that disease diagnosis modeling should integrate patient age into the whole experimental design.
Affiliation(s)
- Xin Feng
- BioKnow Health Informatics Lab, College of Computer Science and Technology, Jilin University, Changchun, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
| | - Jialiang Li
- BioKnow Health Informatics Lab, College of Software, Jilin University, Changchun, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
| | - Han Li
- BioKnow Health Informatics Lab, College of Software, Jilin University, Changchun, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
| | - Hang Chen
- BioKnow Health Informatics Lab, College of Computer Science and Technology, Jilin University, Changchun, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
| | - Fei Li
- BioKnow Health Informatics Lab, College of Software, Jilin University, Changchun, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
| | - Quewang Liu
- BioKnow Health Informatics Lab, College of Computer Science and Technology, Jilin University, Changchun, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
| | - Zhu-Hong You
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
| | - Fengfeng Zhou
- BioKnow Health Informatics Lab, College of Computer Science and Technology, Jilin University, Changchun, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China; BioKnow Health Informatics Lab, College of Software, Jilin University, Changchun, China
| |
|
21
|
Reddy HM, Sharma A, Dehzangi A, Shigemizu D, Chandra AA, Tsunoda T. GlyStruct: glycation prediction using structural properties of amino acid residues. BMC Bioinformatics 2019; 19:547. [PMID: 30717650 PMCID: PMC7394324 DOI: 10.1186/s12859-018-2547-x] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 05/23/2018] [Accepted: 11/29/2018] [Indexed: 02/06/2023]
Abstract
Background Glycation is one of the post-translational modifications (PTMs) in which sugar molecules and residues in protein sequences are covalently bonded. It has become one of the clinically important PTMs in recent times, being implicated in many chronic and age-related complications. Because it is a non-enzymatic reaction, its prediction is a great challenge due to the lack of significant bias in the sequence motifs. Results We developed a classifier, GlyStruct, based on a support vector machine, to predict glycated and non-glycated lysine residues using structural properties of amino acid residues. The features used were secondary structure, accessible surface area and the local backbone torsion angles. For this work, a benchmark dataset was extracted containing 235 glycated and 303 non-glycated lysine residues. GlyStruct demonstrated improved performance of approximately 10% in comparison to the benchmark method Gly-PseAAC. The sensitivity, specificity, accuracy and Matthews correlation coefficient of GlyStruct were 0.7013, 0.7989, 0.7562, and 0.5065, respectively, for 10-fold cross-validation. Conclusion Glycation has emerged as one of the clinically important PTMs of proteins in recent times. Therefore, the development of computational tools becomes necessary to predict glycation, which could help medical professionals administer drugs and manage patients more effectively. The proposed predictor classifies glycated and non-glycated lysine residues with promising results consistently on various cross-validation schemes and outperforms other state-of-the-art methods. Electronic supplementary material The online version of this article (10.1186/s12859-018-2547-x) contains supplementary material, which is available to authorized users.
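The four metrics quoted above are linked through the confusion matrix, so they can be cross-checked against the stated class sizes (235 glycated, 303 non-glycated). The sketch below does this under a stated assumption: the reported sensitivity and specificity are treated as if they came from a single pooled confusion matrix, which yields fractional counts and only approximates values that were presumably averaged over folds. All function names are hypothetical.

```python
import math

def confusion_from_rates(sensitivity, specificity, n_pos, n_neg):
    """Approximate (TP, FN, TN, FP) implied by the rates and class sizes."""
    tp = sensitivity * n_pos
    tn = specificity * n_neg
    return tp, n_pos - tp, tn, n_neg - tn

def accuracy(tp, fn, tn, fp):
    return (tp + tn) / (tp + fn + tn + fp)

def mcc(tp, fn, tn, fp):
    """Matthews correlation coefficient; 0.0 when a marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Values taken from the abstract.
cm = confusion_from_rates(0.7013, 0.7989, n_pos=235, n_neg=303)
print(f"accuracy = {accuracy(*cm):.4f}")
print(f"MCC      = {mcc(*cm):.4f}")
```

Under this pooling assumption the accuracy comes out near 0.756 and the MCC near 0.50, both close to the reported 0.7562 and 0.5065. The small gap in MCC is what one would expect if the published figure is a per-fold average rather than a pooled value.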
Affiliation(s)
| | - Alok Sharma
- School of Engineering & Physics, University of the South Pacific, Suva, Fiji; Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Tokyo, Japan; Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia; CREST, JST, Tokyo, Japan
| | - Abdollah Dehzangi
- Department of Computer Science, Morgan State University, Baltimore, MD, USA
| | - Daichi Shigemizu
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Tokyo, Japan; CREST, JST, Tokyo, Japan; Division of Genomic Medicine, Medical Genome Center, National Center for Geriatrics and Gerontology, Obu, Aichi, Japan; Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan
| | | | - Tatsuhiko Tsunoda
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Tokyo, Japan; CREST, JST, Tokyo, Japan; Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan
| |
|
22
|
Yu J, Shi S, Zhang F, Chen G, Cao M. PredGly: predicting lysine glycation sites for Homo sapiens based on XGboost feature optimization. Bioinformatics 2018; 35:2749-2756. [DOI: 10.1093/bioinformatics/bty1043] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 09/26/2018] [Revised: 12/13/2018] [Accepted: 12/20/2018] [Indexed: 01/22/2023]
Abstract
Motivation
Protein glycation is a well-known post-translational modification (PTM) that proceeds through a two-step non-enzymatic reaction. Glycation not only impairs the function but also changes the characteristics of proteins, and it is therefore related to many human diseases. Systematically detecting glycation sites remains difficult because glycated residues lack clear sequence patterns. Computational approaches, which can filter candidate sites prior to experimental verification, can greatly increase the efficiency of experimental work. However, the previous lysine glycation prediction method was trained on a small dataset. Hence, the model does not generalize well.
Results
By searching a new database, we collected a large dataset for Homo sapiens. PredGly, a novel software tool developed by combining multiple features, can predict lysine glycation sites for H. sapiens. In addition, XGboost was adopted to optimize feature vectors and to improve the model performance. In a comparison of various classifiers, the support vector machine achieved the best performance. On a new independent test set, PredGly outperformed other glycation tools. This suggests that PredGly can provide more instructive guidance for further experimental research on lysine glycation.
Availability and implementation
https://github.com/yujialinncu/PredGly
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jialin Yu
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang, China
- Shaoping Shi
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang, China
- Fang Zhang
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang, China
- Guodong Chen
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang, China
- Man Cao
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang, China
|
23
|
Dehzangi A, López Y, Taherzadeh G, Sharma A, Tsunoda T. SumSec: Accurate Prediction of Sumoylation Sites Using Predicted Secondary Structure. Molecules 2018; 23:E3260. [PMID: 30544729 PMCID: PMC6320791 DOI: 10.3390/molecules23123260] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 10/28/2018] [Revised: 11/30/2018] [Accepted: 12/05/2018] [Indexed: 12/13/2022]
Abstract
Post-translational modification (PTM) is the modification of amino acids along a protein sequence after translation. These modifications significantly affect protein function, so a comprehensive understanding of the underlying mechanisms of PTMs is critical for studying the biological roles of proteins. Among the wide range of PTMs, sumoylation is one of the most important because of its known cellular functions, which include transcriptional regulation, protein stability, and protein subcellular localization. Despite its importance, determining sumoylation sites experimentally is time-consuming and costly. This has created a strong demand for fast computational methods that can accurately identify sumoylation sites in proteins. In this study, we present SumSec, a new machine learning-based method for predicting sumoylation sites. We used the predicted secondary structure of amino acids to extract two types of structural features from neighboring amino acids along the protein sequence, features that had not previously been used for this task. As a result, our method improves sumoylation site prediction and outperforms previously proposed methods in the literature. SumSec demonstrated high sensitivity (0.91), accuracy (0.94), and MCC (0.88). The prediction accuracy achieved in this study is 21% better than that reported in previous studies. The script and extracted features are publicly available at: https://github.com/YosvanyLopez/SumSec.
Affiliation(s)
- Abdollah Dehzangi
- Department of Computer Science, Morgan State University, Baltimore, MD 21251, USA.
- Yosvany López
- Genesis Institute of Genetic Research, Genesis Healthcare Co., Tokyo 150-6015, Japan.
- Ghazaleh Taherzadeh
- School of Information and Communication Technology, Griffith University, Gold Coast 4222, Australia.
- Alok Sharma
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane 4111, Australia.
- School of Engineering & Physics, University of the South Pacific, Suva, Fiji.
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.
- CREST, JST, Tokyo 102-0076, Japan.
- Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo 113-8510, Japan.
- Tatsuhiko Tsunoda
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.
- CREST, JST, Tokyo 102-0076, Japan.
- Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo 113-8510, Japan.
|
24
|
Yang Y, Wang H, Ding J, Xu Y. iAcet-Sumo: Identification of lysine acetylation and sumoylation sites in proteins by multi-class transformation methods. Comput Biol Med 2018; 100:144-151. [DOI: 10.1016/j.compbiomed.2018.07.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 05/07/2018] [Revised: 06/30/2018] [Accepted: 07/08/2018] [Indexed: 11/16/2022]
|
25
|
Islam MM, Saha S, Rahman MM, Shatabda S, Farid DM, Dehzangi A. iProtGly-SS: Identifying protein glycation sites using sequence and structure based features. Proteins 2018; 86:777-789. [PMID: 29675975 DOI: 10.1002/prot.25511] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 09/27/2017] [Revised: 02/27/2018] [Accepted: 04/14/2018] [Indexed: 12/20/2022]
Abstract
Glycation is a chemical reaction in which a sugar molecule bonds with a protein without the help of enzymes. It is often a cause of disease, so knowledge about glycation is very important. In this paper, we present iProtGly-SS, a protein lysine glycation site identification method based on features extracted from sequence and secondary structural information. In our experiments, the best combination of feature groups was Amino Acid Composition, Secondary Structure Motifs, and Polarity. We trained our model with a support vector machine classifier on an optimal set of features chosen by a group-based forward feature selection technique. On standard benchmark datasets, our method significantly outperforms existing methods for glycation prediction. A web server for iProtGly-SS is publicly available at: http://brl.uiu.ac.bd/iprotgly-ss/.
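As an illustration of the simplest feature group named in this abstract, the sketch below computes an amino acid composition vector for a peptide window. It is a minimal example of the general technique, not the iProtGly-SS implementation: the function name, the residue ordering, and the example window are assumptions made for illustration.

```python
from collections import Counter

# The 20 standard amino acids in a fixed (alphabetical) order.
# The ordering is an arbitrary choice for this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(peptide):
    """Return the fraction of each standard residue in the peptide.

    Non-standard characters (for example padding symbols) are ignored.
    """
    residues = [aa for aa in peptide.upper() if aa in AMINO_ACIDS]
    total = len(residues)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(residues)
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical 9-residue window centred on a lysine (K).
window = "AAGKLKEEK"
features = amino_acid_composition(window)
```

The result is a 20-dimensional vector whose entries sum to 1 for any window that contains at least one standard residue; in a full predictor it would be concatenated with the other feature groups before classification.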
Affiliation(s)
- Md Mofijul Islam
- Department of CSE, University of Dhaka, Dhaka, Bangladesh.
- Department of CSE, United International University, Dhaka, Bangladesh
- Sanjay Saha
- Department of CSE, United International University, Dhaka, Bangladesh
- Swakkhar Shatabda
- Department of CSE, United International University, Dhaka, Bangladesh
- Dewan Md Farid
- Department of CSE, United International University, Dhaka, Bangladesh
- Abdollah Dehzangi
- Department of Computer Science, Morgan State University, Baltimore, MD, 21251, USA
|
26
|
Ren Y, Zhao S, Jiang D, Feng X, Zhang Y, Wei Z, Wang Z, Zhang W, Zhou QF, Li Y, Hou H, Xu Y, Zhou F. Proteomic biomarkers for lung cancer progression. Biomark Med 2018; 12:205-215. [DOI: 10.2217/bmm-2018-0015] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 02/06/2023]
Abstract
Aim: Lung adenocarcinoma (LUAD) and lung squamous-cell carcinoma (LUSC) are two major subtypes of lung cancer and constitute about 70% of all lung cancer cases. Patients' lifespan and quality of life will be significantly improved if they are diagnosed at an early stage and adequately treated. Methods & results: This study comprehensively screened the proteomic datasets of both LUAD and LUSC and proposed classification models for the progression stages of LUAD and LUSC with accuracies of 86.51% and 89.47%, respectively. Discussion & conclusion: A comparative analysis was also carried out on related transcriptomic datasets, which indicates that the proposed biomarkers provide discerning power for accurate stage prediction and will be improved when larger-scale proteomic quantitative technologies become available.
Affiliation(s)
- Yanjiao Ren
- College of Computer Science & Technology, & Key Laboratory of Symbolic Computation & Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, PR China
- Shishun Zhao
- Center for Applied Statistical Research, College of Mathematics, Jilin University, Changchun, Jilin 130012, PR China
- Dandan Jiang
- Center for Applied Statistical Research, College of Mathematics, Jilin University, Changchun, Jilin 130012, PR China
- Xin Feng
- College of Computer Science & Technology, & Key Laboratory of Symbolic Computation & Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, PR China
- Yexian Zhang
- College of Computer Science & Technology, & Key Laboratory of Symbolic Computation & Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, PR China
- Zhipeng Wei
- College of Computer Science & Technology, & Key Laboratory of Symbolic Computation & Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, PR China
- Zhongyu Wang
- College of Computer Science & Technology, & Key Laboratory of Symbolic Computation & Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, PR China
- Wenniu Zhang
- College of Computer Science & Technology, & Key Laboratory of Symbolic Computation & Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, PR China
- Qing F Zhou
- School of Electrical Engineering & Intelligentization, Dongguan University of Technology, Dongguan 523000, PR China
- Yong Li
- Department of Electronic Engineering, Tsinghua University, Beijing 100084, PR China
- Hanxu Hou
- School of Electrical Engineering & Intelligentization, Dongguan University of Technology, Dongguan 523000, PR China
- Ying Xu
- Computational Systems Biology Lab, Department of Biochemistry & Molecular Biology, University of Georgia, Athens, GA 30602, USA
- College of Computer Science & Technology, & College of Public Health, Jilin University, Changchun, Jilin 130012, PR China
- Fengfeng Zhou
- College of Computer Science & Technology, & Key Laboratory of Symbolic Computation & Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, PR China
|
27
|
Zhao X, Zhao X, Bao L, Zhang Y, Dai J, Yin M. Glypre: In Silico Prediction of Protein Glycation Sites by Fusing Multiple Features and Support Vector Machine. Molecules 2017; 22:molecules22111891. [PMID: 29099805 PMCID: PMC6150326 DOI: 10.3390/molecules22111891] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 09/20/2017] [Accepted: 10/26/2017] [Indexed: 12/22/2022]
Abstract
Glycation is a non-enzymatic process, occurring inside or outside the host body, in which a sugar molecule attaches to a protein or lipid molecule. It is an important form of post-translational modification (PTM) that impairs the function and changes the characteristics of proteins, so identifying glycation sites may provide useful guidelines for understanding various biological functions of proteins. In this study, we proposed an accurate prediction tool, named Glypre, for lysine glycation. Firstly, we used multiple informative features to encode the peptides. These features included the position scoring function, secondary structure, AAindex, and the composition of k-spaced amino acid pairs. Secondly, the distribution of distinctive features of the residues surrounding the glycation and non-glycation sites was statistically analysed. Thirdly, based on the distribution of these features, we developed a new predictor by using different optimal window sizes for different properties and a two-step feature selection method, which applied the maximum relevance minimum redundancy method followed by a greedy feature selection procedure. In 10-fold cross-validation, Glypre achieved a sensitivity of 57.47%, a specificity of 90.78%, an accuracy of 79.68%, an area under the receiver-operating characteristic (ROC) curve (AUC) of 0.86, and a Matthews correlation coefficient (MCC) of 0.52. The detailed analysis results showed that our predictor may play a complementary role to other existing methods for identifying protein lysine glycation. The source code and datasets of Glypre are available in the Supplementary File.
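The performance measures quoted in this abstract (sensitivity, specificity, accuracy, MCC) all derive from the four cells of a binary confusion matrix. The sketch below shows the standard formulas; the function name and the example counts are hypothetical and are not taken from the Glypre paper.

```python
import math


def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fn, tn, fp):
    """Compute standard binary classification metrics from raw counts.

    tp/fn: glycated sites predicted correctly / missed.
    tn/fp: non-glycated sites predicted correctly / wrongly flagged.
    """
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "accuracy": _ratio(tp + tn, tp + fn + tn + fp),
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = binary_metrics(tp=40, fn=10, tn=45, fp=5)
```

Unlike accuracy, MCC uses all four cells symmetrically, which is why it is commonly reported alongside sensitivity and specificity when positive and negative sites are imbalanced.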
Affiliation(s)
- Xiaowei Zhao
- School of Computer Science and Information Technology, Northeast Normal University, Changchun 130117, China.
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China.
- Xiaosa Zhao
- School of Computer Science and Information Technology, Northeast Normal University, Changchun 130117, China.
- Lingling Bao
- School of Computer Science and Information Technology, Northeast Normal University, Changchun 130117, China.
- Yonggang Zhang
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China.
- Jiangyan Dai
- School of Computer Engineering, Weifang University, Weifang 261061, China.
- Minghao Yin
- School of Computer Science and Information Technology, Northeast Normal University, Changchun 130117, China.
|
28
|
Ju Z, Sun J, Li Y, Wang L. Predicting lysine glycation sites using bi-profile bayes feature extraction. Comput Biol Chem 2017; 71:98-103. [PMID: 29040908 DOI: 10.1016/j.compbiolchem.2017.10.004] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 05/31/2017] [Revised: 09/14/2017] [Accepted: 10/07/2017] [Indexed: 12/21/2022]
Abstract
Glycation is a nonenzymatic post-translational modification that is involved in various biological processes and closely associated with many metabolic diseases. Accurate identification of glycation sites is important for understanding the underlying molecular mechanisms of glycation. Because traditional experimental methods are often labor-intensive and time-consuming, computational methods for predicting glycation sites are desirable. In this study, a novel predictor named BPB_GlySite is proposed to predict lysine glycation sites using bi-profile Bayes feature extraction and a support vector machine algorithm. In 10-fold cross-validation, BPB_GlySite achieves a satisfactory performance, with a sensitivity of 63.68%, a specificity of 72.60%, an accuracy of 69.63%, and a Matthews correlation coefficient of 0.3499. Experimental results also indicate that BPB_GlySite significantly outperforms three existing glycation site predictors: NetGlycate, PreGly, and Gly-PseAAC. Therefore, BPB_GlySite can be a useful bioinformatics tool for the prediction of glycation sites. A user-friendly web server for BPB_GlySite is available at 123.206.31.171/BPB_GlySite/.
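The abstract does not spell out the encoding, so the sketch below follows the general bi-profile Bayes idea as it is usually described: each peptide is represented by the frequency of its residue at every position, estimated once from the positive training peptides and once from the negative ones, and the two profiles are concatenated. The function names and the toy peptides are hypothetical, and the details may differ from what BPB_GlySite actually does.

```python
from collections import Counter


def position_frequencies(peptides):
    """Estimate per-position residue frequencies from equal-length peptides.

    Returns one dict per position mapping residue -> observed frequency.
    """
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    table = []
    for i in range(length):
        counts = Counter(p[i] for p in peptides)
        table.append({aa: n / len(peptides) for aa, n in counts.items()})
    return table


def bi_profile_encode(peptide, pos_table, neg_table):
    """Encode a peptide as [positive profile] + [negative profile].

    Residues never seen at a position in a training set get frequency 0.0.
    """
    pos_part = [pos_table[i].get(aa, 0.0) for i, aa in enumerate(peptide)]
    neg_part = [neg_table[i].get(aa, 0.0) for i, aa in enumerate(peptide)]
    return pos_part + neg_part


# Toy 3-residue windows centred on lysine (illustrative only).
positives = ["AKG", "AKE", "LKG"]
negatives = ["GKA", "GKL", "EKA"]
pos_table = position_frequencies(positives)
neg_table = position_frequencies(negatives)
vector = bi_profile_encode("AKG", pos_table, neg_table)
```

A window of n residues yields a 2n-dimensional vector, which can then be passed to a classifier such as the support vector machine mentioned in the abstract. The frequency tables should be estimated from training data only, so that cross-validation folds do not leak information from held-out peptides.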
Affiliation(s)
- Zhe Ju
- College of Science, Shenyang Aerospace University, 110136, People's Republic of China.
- Juhe Sun
- College of Science, Shenyang Aerospace University, 110136, People's Republic of China
- Yanjie Li
- College of Science, Shenyang Aerospace University, 110136, People's Republic of China
- Li Wang
- College of Science, Shenyang Aerospace University, 110136, People's Republic of China
|