1
Adejor J, Tumukunde E, Li G, Lin H, Xie R, Wang S. Impact of Lysine Succinylation on the Biology of Fungi. Curr Issues Mol Biol 2024; 46:1020-1046. [PMID: 38392183 PMCID: PMC10888112 DOI: 10.3390/cimb46020065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2023] [Revised: 01/02/2024] [Accepted: 01/03/2024] [Indexed: 02/24/2024] Open
Abstract
Post-translational modifications (PTMs) play a crucial role in protein functionality and the control of various cellular processes and secondary metabolites (SMs) in fungi. Lysine succinylation (Ksuc) is an emerging protein PTM characterized by the addition of a succinyl group to a lysine residue, which induces substantial alteration in the chemical and structural properties of the affected protein. This chemical alteration is reversible, dynamic in nature, and evolutionarily conserved. Recent investigations of numerous proteins that undergo significant succinylation have underscored the potential significance of Ksuc in various biological processes, encompassing normal physiological functions and the development of certain pathological processes and metabolites. This review aims to elucidate the molecular mechanisms underlying Ksuc and its diverse functions in fungi. Both conventional investigation techniques and predictive tools for identifying Ksuc sites were also considered. A more profound comprehension of Ksuc and its impact on the biology of fungi has the potential to unveil new insights into post-translational modification and may pave the way for innovative approaches that can be applied across various clinical contexts in the management of mycotoxins.
Affiliation(s)
- John Adejor
  - Key Laboratory of Pathogenic Fungi and Mycotoxins of Fujian Province, Key Laboratory of Biopesticide and Chemical Biology of Education Ministry, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Elisabeth Tumukunde
  - Key Laboratory of Pathogenic Fungi and Mycotoxins of Fujian Province, Key Laboratory of Biopesticide and Chemical Biology of Education Ministry, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Guoqi Li
  - Key Laboratory of Pathogenic Fungi and Mycotoxins of Fujian Province, Key Laboratory of Biopesticide and Chemical Biology of Education Ministry, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Hong Lin
  - Key Laboratory of Pathogenic Fungi and Mycotoxins of Fujian Province, Key Laboratory of Biopesticide and Chemical Biology of Education Ministry, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Rui Xie
  - Key Laboratory of Pathogenic Fungi and Mycotoxins of Fujian Province, Key Laboratory of Biopesticide and Chemical Biology of Education Ministry, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Shihua Wang
  - Key Laboratory of Pathogenic Fungi and Mycotoxins of Fujian Province, Key Laboratory of Biopesticide and Chemical Biology of Education Ministry, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
2
Ahmed FF, Podder A, Bulbul MF, Hossain MA, Hasan M, Sarkar MAR, Kim D. Investigating the Precise Identification of Citrullination Sites with High-Performance Score Metrics Using a Powerful Computation Predicting Tool. Comb Chem High Throughput Screen 2024; 27:1381-1393. [PMID: 37702240 DOI: 10.2174/1386207326666230912151932] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2023] [Revised: 06/18/2023] [Accepted: 08/02/2023] [Indexed: 09/14/2023]
Abstract
BACKGROUND: To elucidate the detailed mechanisms of citrullination at the molecular level and design drugs applicable to major human diseases, predicting protein citrullination sites (PCSs) is essential. Using experimental approaches to identify PCSs is time-consuming and costly. However, the scope of current PCS predictors is limited. In particular, most predictors in common use for PCS prediction have limited performance scores. OBJECTIVE: This work aims to provide an improved predictor of citrullination sites using a benchmark dataset in a machine learning platform. METHODS: This study presents a reliable citrullination site predictor based on a benchmark dataset containing a 1:1 ratio of positive and negative samples. We classified citrullination sites using the Composition of the K-Spaced Amino Acid Pairs (CKSAAP) and Support Vector Machine (SVM). RESULTS: We developed PCS predictors using integrated machine-learning methods that produced the highest average scores. Using 10-fold cross-validation on test datasets, the True Positive Rate (TPR) was 98.34%, the True Negative Rate (TNR) was 99.44%, the accuracy was 98.89%, the Matthews Correlation Coefficient (MCC) was 98.21%, the Area Under the ROC Curve (AUC) was 0.999, and the partial Area Under the ROC Curve (pAUC) was 0.1968. CONCLUSION: In overall performance, our developed predictor significantly outperforms the current tools on the same benchmark dataset. Moreover, it showed better performance metrics on both test and training datasets. Our developed predictor is promising and can be implemented as a complementary technique for fast and precise identification of citrullination sites.
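The CKSAAP encoding named in the abstract above can be illustrated with a minimal sketch. It counts ordered residue pairs separated by k positions and normalizes by the number of pair positions. The function name `cksaap`, the default `k_max`, and the handling of non-standard residues are assumptions made for illustration and are not taken from the cited paper.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# 400 ordered residue pairs, in a fixed order so feature positions are stable.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(peptide, k_max=2):
    """Return normalized k-spaced pair frequencies for k = 0..k_max.

    Hypothetical helper: the feature vector has 400 * (k_max + 1) entries.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        # Number of positions i for which residue i + k + 1 exists.
        n_positions = len(peptide) - k - 1
        for i in range(max(n_positions, 0)):
            pair = peptide[i] + peptide[i + k + 1]
            if pair in counts:  # ignore gaps and non-standard residues
                counts[pair] += 1
        total = max(n_positions, 1)  # avoid division by zero on short input
        features.extend(counts[p] / total for p in PAIRS)
    return features
```

A vector produced this way could then be passed to any classifier, for example an SVM as the abstract describes; the classifier settings used by the authors are not given in the abstract.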
Affiliation(s)
- Fee Faysal Ahmed
  - Department of Mathematics, Jashore University of Science and Technology, Jashore, 7408, Bangladesh
- Anamika Podder
  - Department of Mathematics, Jashore University of Science and Technology, Jashore, 7408, Bangladesh
- Md Farhad Bulbul
  - Department of Mathematics, Jashore University of Science and Technology, Jashore, 7408, Bangladesh
  - Department of Computer Science & Engineering, Pohang University of Science and Technology (POSTECH), 77 Cheongam, Pohang 37673, Korea
- Md Amzad Hossain
  - Department of Electrical and Electronic Engineering, Jashore University of Science and Technology, Jashore 7408, Bangladesh
- Mahedi Hasan
  - Department of Computer Science and Engineering, Jashore University of Science and Technology, Jashore, 7408, Bangladesh
- Md Abdur Rauf Sarkar
  - Department of Genetic Engineering and Biotechnology, Jashore University of Science and Technology, Jashore 7408, Bangladesh
- Daijin Kim
  - Department of Computer Science & Engineering, Pohang University of Science and Technology (POSTECH), 77 Cheongam, Pohang 37673, Korea
3
Ahmed SS, Rifat ZT, Rahman MS, Rahman MS. Succinylated lysine residue prediction revisited. Brief Bioinform 2023; 24:6865109. [PMID: 36460620 DOI: 10.1093/bib/bbac510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Revised: 09/30/2022] [Accepted: 10/25/2022] [Indexed: 12/04/2022] Open
Abstract
Lysine succinylation is a kind of post-translational modification (PTM) that plays a crucial role in regulating the cellular processes. Aberrant succinylation may cause inflammation, cancers, metabolism diseases and nervous system diseases. The experimental methods to detect succinylation sites are time-consuming and costly. This thus calls for computational models with high efficacy, and attention has been given in the literature to develop such models, albeit with only moderate success in the context of different evaluation metrics. One crucial aspect in this context is the biochemical and physicochemical properties of amino acids, which appear to be useful as features for such computational predictors. However, some of the existing computational models did not use the biochemical and physicochemical properties of amino acids. In contrast, some others used them without considering the inter-dependency among the properties. The combinations of biochemical and physicochemical properties derived through our optimization process achieve better results than the results achieved by combining all the properties. We propose three deep learning architectures: CNN+Bi-LSTM (CBL), Bi-LSTM+CNN (BLC) and their combination (CBL_BLC). We find that CBL_BLC outperforms the other two. Ensembling of different models successfully improves the results. Notably, tuning the threshold of the ensemble classifiers further improves the results. Upon comparing our work with other existing works on two datasets, we successfully achieve better sensitivity and specificity by varying the threshold value.
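The threshold tuning mentioned at the end of the abstract above can be sketched as follows. The example averages the scores of several models and then reports sensitivity and specificity at each candidate threshold. Score averaging is an assumed ensembling rule chosen for illustration; the abstract does not state how the authors combine their models, and all function names here are hypothetical.

```python
def ensemble_scores(model_scores):
    """Average per-sample scores across models (assumed ensembling rule)."""
    return [sum(col) / len(col) for col in zip(*model_scores)]


def sensitivity_specificity(scores, labels, threshold):
    """Treat score >= threshold as a positive (succinylated) prediction."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


def sweep_thresholds(scores, labels, thresholds):
    """Map each threshold to its (sensitivity, specificity) pair."""
    return {t: sensitivity_specificity(scores, labels, t) for t in thresholds}
```

Lowering the threshold trades specificity for sensitivity, which is the effect the abstract refers to when it reports better sensitivity and specificity "by varying the threshold value".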
Affiliation(s)
- Shehab Sarar Ahmed
  - Computer Science and Engineering, Bangladesh University of Engineering and Technology, 1000, Dhaka, Bangladesh
- Zaara Tasnim Rifat
  - Computer Science and Engineering, Bangladesh University of Engineering and Technology, 1000, Dhaka, Bangladesh
- M Saifur Rahman
  - Computer Science and Engineering, Bangladesh University of Engineering and Technology, 1000, Dhaka, Bangladesh
- M Sohel Rahman
  - Computer Science and Engineering, Bangladesh University of Engineering and Technology, 1000, Dhaka, Bangladesh
4
Jia J, Wu G, Li M, Qiu W. pSuc-EDBAM: Predicting lysine succinylation sites in proteins based on ensemble dense blocks and an attention module. BMC Bioinformatics 2022; 23:450. [PMID: 36316638 PMCID: PMC9620660 DOI: 10.1186/s12859-022-05001-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/16/2022] [Accepted: 10/25/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Lysine succinylation is a newly discovered protein post-translational modification. Predicting succinylation sites helps investigate treatments for metabolic diseases. However, biological experimental approaches are costly and inefficient, so it is necessary to develop efficient computational approaches. RESULTS: In this paper, we proposed a novel predictor based on ensemble dense blocks and an attention module, called pSuc-EDBAM, which adopted one-hot encoding to derive the feature maps of protein sequences and generated the low-level feature maps through a 1-D CNN. Afterward, the ensemble dense blocks were used to capture feature information at different levels in the process of feature learning. We also introduced an attention module to evaluate the importance of different features. The experimental results show that Acc reaches 74.25% and MCC reaches 0.2927 on the testing dataset, which suggests that pSuc-EDBAM outperforms the existing predictors. CONCLUSIONS: The experimental results of ten-fold cross-validation on the training dataset and independent test on the testing dataset showed that pSuc-EDBAM outperforms the existing succinylation site predictors and can predict potential succinylation sites effectively. pSuc-EDBAM is feasible and obtains credible predictive results, which may also provide valuable references for other related research. For the convenience of experimental scientists, a user-friendly web server has been established (http://bioinfo.wugenqiang.top/pSuc-EDBAM/), by which the desired results can be easily obtained.
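The one-hot encoding step mentioned in the abstract above can be illustrated with a short sketch that turns the residues around a candidate lysine into a matrix of 0/1 rows. The window half-width `flank`, the padding rule for positions beyond the sequence ends, and the function name are assumptions for illustration; the abstract does not specify them.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_window(sequence, center, flank=15, pad="-"):
    """Encode residues center-flank..center+flank as one-hot rows.

    Hypothetical helper: positions outside the sequence, and any residue
    not among the 20 standard amino acids, become all-zero rows.
    """
    rows = []
    for pos in range(center - flank, center + flank + 1):
        residue = sequence[pos] if 0 <= pos < len(sequence) else pad
        rows.append([1 if residue == aa else 0 for aa in AMINO_ACIDS])
    return rows
```

The resulting (2 * flank + 1) x 20 matrix is the kind of input a 1-D convolutional layer can consume directly.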
Affiliation(s)
- Jianhua Jia
  - Computer Department, Jingdezhen Ceramic University, Jingdezhen, 333403 China
- Genqiang Wu
  - Computer Department, Jingdezhen Ceramic University, Jingdezhen, 333403 China
- Meifang Li
  - Computer Department, Nanchang Institute of Technology, Nanchang, 330044 China
- Wangren Qiu
  - Computer Department, Jingdezhen Ceramic University, Jingdezhen, 333403 China
5
Liu X, Xu LL, Lu YP, Yang T, Gu XY, Wang L, Liu Y. Deep_KsuccSite: A novel deep learning method for the identification of lysine succinylation sites. Front Genet 2022; 13:1007618. [PMID: 36246655 PMCID: PMC9557156 DOI: 10.3389/fgene.2022.1007618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2022] [Accepted: 09/08/2022] [Indexed: 11/13/2022] Open
Abstract
Identification of lysine (symbol Lys or K) succinylation (Ksucc) sites is the basis for disclosing the mechanism and function of lysine succinylation modifications. Traditional experimental methods for Ksucc site identification are often costly and time-consuming. Therefore, it is necessary to construct an efficient computational method to predict the presence of Ksucc sites in protein sequences. In this study, we proposed a novel and effective predictor for the identification of Ksucc sites based on deep learning algorithms, termed Deep_KsuccSite. The predictor adopted Composition, Transition, and Distribution (CTD) Composition (CTDC), Enhanced Grouped Amino Acid Composition (EGAAC), Amphiphilic Pseudo-Amino Acid Composition (APAAC), and Embedding Encoding methods to encode peptides, then constructed three base classifiers using a one-dimensional (1D) convolutional neural network (CNN) and 2D-CNN, and finally utilized a voting method to get the final results. K-fold cross-validation and independent testing showed that Deep_KsuccSite could serve as an effective tool to identify Ksucc sites in protein sequences. In addition, the ablation experiment results based on voting, feature combination, and model architecture showed that Deep_KsuccSite could make full use of the information of different features to construct an effective classifier. Taken together, we developed Deep_KsuccSite in this study, which was based on a deep learning algorithm and could achieve better prediction accuracy than current methods for lysine succinylation sites. The code and dataset involved in this methodological study are permanently available at the URL https://github.com/flyinsky6/Deep_KsuccSite.
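The final voting step described in the abstract above can be sketched as a simple majority (hard) vote over the labels produced by the base classifiers. Whether the authors use hard or soft voting is not stated in the abstract, so hard voting is an assumption here, and `majority_vote` is a hypothetical name.

```python
def majority_vote(predictions):
    """Combine 0/1 labels from several classifiers by majority.

    predictions: one list of labels per classifier, all of equal length.
    A sample is labeled 1 only when more than half of the classifiers say 1.
    """
    voted = []
    for sample_votes in zip(*predictions):
        positives = sum(sample_votes)
        voted.append(1 if positives * 2 > len(sample_votes) else 0)
    return voted
```

With three base classifiers, as in the abstract, ties cannot occur, so the strict "more than half" rule is unambiguous.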
Affiliation(s)
- Xin Liu
  - School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, China
  - *Correspondence: Xin Liu; Liang Wang; Yong Liu
- Lin-Lin Xu
  - School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, China
- Ya-Ping Lu
  - College of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
- Ting Yang
  - School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, China
- Xin-Yu Gu
  - School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, China
- Liang Wang
  - Laboratory Medicine, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Yong Liu
  - Jiangsu Center for the Collaboration and Innovation of Cancer Biotherapy, Cancer Institute, Xuzhou Medical University, Xuzhou, Jiangsu, China
6
Xia Y, Jiang M, Luo Y, Feng G, Jia G, Zhang H, Wang P, Ge R. SuccSPred2.0: A Two-Step Model to Predict Succinylation Sites Based on Multifeature Fusion and Selection Algorithm. J Comput Biol 2022; 29:1085-1094. [PMID: 35714347 DOI: 10.1089/cmb.2022.0109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
Abstract
Protein succinylation is a novel type of post-translational modification identified in the recent decade. Experiments have verified that it plays an important role in biological structure and function. However, wet-lab identification of succinylation sites is time-consuming and laborious. Traditional technology cannot adapt to the rapid growth of biological sequence data sets. In this study, a new computational method named SuccSPred2.0 was proposed to identify succinylation sites in protein sequences based on multifeature fusion and the maximal information coefficient (MIC) method. SuccSPred2.0 was implemented based on a two-step strategy. First, high-dimensional features were reduced by linear discriminant analysis to prevent overfitting. Subsequently, the MIC method was employed to select the important features, which were combined with classifiers to predict succinylation sites. In comparative experiments on 10-fold cross-validation and independent test data sets, SuccSPred2.0 obtained promising improvements. Comparative experiments showed that SuccSPred2.0 was superior to previous tools in identifying succinylation sites in the given proteins.
Affiliation(s)
- Yixiao Xia
  - School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
- Minchao Jiang
  - School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
- Yizhang Luo
  - School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
- Guanwen Feng
  - Xi'an Key Laboratory of Big Data and Intelligent Vision, School of Computer Science and Technology, Xidian University, Xi'an, China
- Gangyong Jia
  - School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
- Hua Zhang
  - School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
- Pu Wang
  - Computer School, Hubei University of Arts and Science, Xiangyang, China
- Ruiquan Ge
  - School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
7
Wang H, Zhao H, Zhang J, Han J, Liu Z. A parallel model of DenseCNN and ordered-neuron LSTM for generic and species-specific succinylation site prediction. Biotechnol Bioeng 2022; 119:1755-1767. [PMID: 35320585 DOI: 10.1002/bit.28091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2021] [Revised: 03/12/2022] [Accepted: 03/19/2022] [Indexed: 11/07/2022]
Abstract
Lysine succinylation (Ksucc) regulates various metabolic processes, participates in vital life processes, and is involved in the occurrence and development of numerous diseases. Accurate recognition of succinylation sites can reveal underlying functional mechanisms and pathogenesis. However, most sites remain undetected. Moreover, a deep learning architecture focusing on generic and species-specific predictions is still lacking. Thus, we proposed a deep learning-based framework named Deep-Ksucc, combining a dense convolutional network (DenseCNN) and ordered-neuron long short-term memory (OnLSTM) in parallel, which took the cascading characteristics of sequence information and physicochemical properties as the input. The results of the generic and species-specific predictions indicated that Deep-Ksucc can identify sequence patterns of different organisms and recognize plenty of succinylation sites. The case study showed that Deep-Ksucc can serve as a reliable tool for biology verification and computer-aided recognition of succinylation sites.
Affiliation(s)
- Huiqing Wang
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan, 030024, China
- Hong Zhao
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan, 030024, China
- Jing Zhang
  - Engineering Training Center, Taiyuan University of Technology, Taiyuan, 030024, China
- Jiale Han
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan, 030024, China
- Zhihao Liu
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan, 030024, China
8
Zhang D, Wang S. A protein succinylation sites prediction method based on the hybrid architecture of LSTM network and CNN. J Bioinform Comput Biol 2022; 20:2250003. [PMID: 35191361 DOI: 10.1142/s0219720022500032] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/18/2022]
Abstract
The succinylation modification of protein participates in the regulation of a variety of cellular processes. Identification of modified substrates with precise sites is the basis for understanding the molecular mechanism and regulation of succinylation. In this work, we picked and chose five superior feature codes: CKSAAP, ACF, BLOSUM62, AAindex, and one-hot, according to their performance in the problem of succinylation sites prediction. Then, LSTM network and CNN were used to construct four models: LSTM-CNN, CNN-LSTM, LSTM, and CNN. The five selected features were, respectively, input into each of these four models for training to compare the four models. Based on the performance of each model, the optimal model among them was chosen to construct a hybrid model DeepSucc that was composed of five sub-modules for integrating heterogeneous information. Under the 10-fold cross-validation, the hybrid model DeepSucc achieves 86.26% accuracy, 84.94% specificity, 87.57% sensitivity, 0.9406 AUC, and 0.7254 MCC. When compared with other prediction tools using an independent test set, DeepSucc outperformed them in sensitivity and MCC. The datasets and source codes can be accessed at https://github.com/1835174863zd/DeepSucc.
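The metrics quoted in the abstract above (accuracy, specificity, sensitivity, and MCC) all follow from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function name and the example counts in the usage note are illustrative and are not taken from the cited paper.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Compute standard binary-classification metrics from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }
```

For instance, `classification_metrics(40, 30, 20, 10)` gives an accuracy of 0.7 but an MCC of only about 0.41, which shows why MCC is often reported alongside accuracy for this kind of predictor.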
Affiliation(s)
- Die Zhang
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming 650504, P. R. China
- Shunfang Wang
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming 650504, P. R. China
9
Tasmia SA, Kibria MK, Tuly KF, Islam MA, Khatun MS, Hasan MM, Mollah MNH. Prediction of serine phosphorylation sites mapping on Schizosaccharomyces Pombe by fusing three encoding schemes with the random forest classifier. Sci Rep 2022; 12:2632. [PMID: 35173235 PMCID: PMC8850546 DOI: 10.1038/s41598-022-06529-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/15/2021] [Accepted: 02/01/2022] [Indexed: 11/08/2022] Open
Abstract
Serine phosphorylation is one type of protein post-translational modification (PTM), which plays an essential role in various cellular processes and disease pathogenesis. Numerous methods are used for the prediction of phosphorylation sites. However, the traditional wet-lab based experimental approaches are time-consuming, laborious, and expensive. In this work, a computational predictor was proposed to predict serine phosphorylation sites mapping on Schizosaccharomyces pombe (SP) by the fusion of three encoding schemes, namely k-spaced amino acid pair composition (CKSAAP), binary, and amino acid composition (AAC), with the random forest (RF) classifier. The proposed method is the first developed to predict serine phosphorylation sites for SP. Both the training and independent test performance scores were used to investigate the success of the proposed RF based fusion prediction model compared to others. We also investigated their performances by 5-fold cross-validation (CV). In all cases, it was observed that the recommended predictor achieves the largest scores of true positive rate (TPR), true negative rate (TNR), accuracy (ACC), Matthews correlation coefficient (MCC), area under the ROC curve (AUC) and partial AUC (pAUC) at false positive rate (FPR) = 0.20. Thus, the prediction performance as discussed in this paper indicates that the proposed approach may be a beneficial and motivating computational resource for predicting serine phosphorylation sites in fungi. The online interface of the software for the proposed prediction model is publicly available at http://mollah-bioinformaticslab-stat.ru.ac.bd/PredSPS/.
Affiliation(s)
- Samme Amena Tasmia
  - Bioinformatics Laboratory, Department of Statistics, University of Rajshahi, Rajshahi, 6205, Bangladesh
- Md Kaderi Kibria
  - Bioinformatics Laboratory, Department of Statistics, University of Rajshahi, Rajshahi, 6205, Bangladesh
- Khanis Farhana Tuly
  - Bioinformatics Laboratory, Department of Statistics, University of Rajshahi, Rajshahi, 6205, Bangladesh
- Md Ariful Islam
  - Bioinformatics Laboratory, Department of Statistics, University of Rajshahi, Rajshahi, 6205, Bangladesh
- Mst Shamima Khatun
  - Department of Microbiology and Immunology, Tulane University School of Medicine, Tulane University, New Orleans, LA, 70112, USA
- Md Mehedi Hasan
  - Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA, 70112, USA
- Md Nurul Haque Mollah
  - Bioinformatics Laboratory, Department of Statistics, University of Rajshahi, Rajshahi, 6205, Bangladesh
10
Iannetta AA, Hicks LM. Maximizing Depth of PTM Coverage: Generating Robust MS Datasets for Computational Prediction Modeling. Methods Mol Biol 2022; 2499:1-41. [PMID: 35696073 DOI: 10.1007/978-1-0716-2317-6_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Post-translational modifications (PTMs) regulate complex biological processes through the modulation of protein activity, stability, and localization. Insights into the specific modification type and localization within a protein sequence can help ascertain functional significance. Computational models are increasingly demonstrated to offer a low-cost, high-throughput method for comprehensive PTM predictions. Algorithms are optimized using existing experimental PTM data, thus accurate prediction performance relies on the creation of robust datasets. Herein, advancements in mass spectrometry-based proteomics technologies to maximize PTM coverage are reviewed. Further, requisite experimental validation approaches for PTM predictions are explored to ensure that follow-up mechanistic studies are focused on accurate modification sites.
Affiliation(s)
- Anthony A Iannetta
  - Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Leslie M Hicks
  - Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
11
pQLyCar: Peptide-based dynamic query-driven sample rescaling strategy for identifying carboxylation sites combined with KNN and SVM. Anal Biochem 2021; 633:114386. [PMID: 34543644 DOI: 10.1016/j.ab.2021.114386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2021] [Revised: 09/02/2021] [Accepted: 09/14/2021] [Indexed: 11/23/2022]
Abstract
Lysine carboxylation is one of the most crucial types of post-translational modification and plays a significant role in catalytic mechanisms. Therefore, it is essential to study lysine carboxylation and explore its biological mechanism. Compared with traditional experimental methods that are labor-intensive and time-consuming, computational methods are much more convenient and faster. Therefore, it is urgent to establish an accurate carboxylation identification model. Herein we proposed a method named pQLyCar for identification of lysine carboxylation using SVM as the classifier. In pQLyCar, a peptide-based dynamic query-driven sample rescaling strategy (pDQD-SR) is proposed to address the class imbalance of training data, which builds a specific prediction model for each query sample. The KNN algorithm calculates the distance between samples according to the original sequences instead of feature vectors. Information entropy is applied to select the optimal size of the sliding window, and various types of sequence- and position-based features are incorporated for construction of the feature space, including residue composition (RC), K-space and position-specific amino acid propensity (PSAAP). Finally, the performance of pQLyCar is measured with a specificity of 96.49% and a sensitivity of 99.59% using the jackknife test method, which indicates that pQLyCar can be a useful tool for prediction of lysine carboxylation sites.
12
Charoenkwan P, Chiangjong W, Hasan MM, Nantasenamat C, Shoombuatong W. Review and comparative analysis of machine learning-based predictors for predicting and analyzing of anti-angiogenic peptides. Curr Med Chem 2021; 29:849-864. [PMID: 34375178 DOI: 10.2174/0929867328666210810145806] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/22/2021] [Revised: 06/17/2021] [Accepted: 06/22/2021] [Indexed: 11/22/2022]
Abstract
Cancer is one of the leading causes of death worldwide and underlying this is angiogenesis that represents one of the hallmarks of cancer. Ongoing effort is already under way in the discovery of anti-angiogenic peptides (AAPs) as a promising therapeutic route by tackling the formation of new blood vessels. As such, the identification of AAPs constitutes a viable path for understanding their mechanistic properties pertinent for the discovery of new anti-cancer drugs. In spite of the abundance of peptide sequences in public databases, experimental efforts in the identification of anti-angiogenic peptides have progressed very slowly owing to its high expenditures and laborious nature. Owing to its inherent ability to make sense of large volumes of data, machine learning (ML) represents a lucrative technique that can be harnessed for peptide-based drug discovery. In this review, we conducted a comprehensive and comparative analysis of ML-based AAP predictors in terms of their employed feature descriptors, ML algorithms, cross-validation methods and prediction performance. Moreover, the common framework of these AAP predictors and their inherent weaknesses are also discussed. Particularly, we explore future perspectives for improving the prediction accuracy and model interpretability, which represents an interesting avenue for overcoming some of the inherent weaknesses of existing AAP predictors. We anticipate that this review would assist researchers in the rapid screening and identification of promising AAPs for clinical use.
Affiliation(s)
- Phasit Charoenkwan
  - Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, Thailand
- Wararat Chiangjong
  - Pediatric Translational Research Unit, Department of Pediatrics, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok 10400, Thailand
- Md Mehedi Hasan
  - Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, United States
- Chanin Nantasenamat
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand
- Watshara Shoombuatong
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand
13
Charoenkwan P, Anuwongcharoen N, Nantasenamat C, Hasan MM, Shoombuatong W. In Silico Approaches for the Prediction and Analysis of Antiviral Peptides: A Review. Curr Pharm Des 2021; 27:2180-2188. [PMID: 33138759 DOI: 10.2174/1381612826666201102105827] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 05/10/2020] [Accepted: 08/20/2020] [Indexed: 11/22/2022]
Abstract
In light of the growing resistance toward current antiviral drugs, efforts to discover novel and effective antiviral therapeutic agents remain a pressing scientific effort. Antiviral peptides (AVPs) represent promising therapeutic agents due to their extraordinary advantages in terms of potency, efficacy and pharmacokinetic properties. The growing volume of newly discovered peptide sequences in the post-genomic era requires computational approaches for timely and accurate identification of AVPs. Machine learning (ML) methods such as random forest and support vector machine represent robust learning algorithms that are instrumental in successful peptide-based drug discovery. Therefore, this review summarizes the current state-of-the-art application of ML methods for identifying AVPs directly from the sequence information. We compare the efficiency of these methods in terms of the underlying characteristics of the dataset used along with feature encoding methods, ML algorithms, cross-validation methods and prediction performance. Finally, guidelines for the development of robust AVP models are also discussed. It is anticipated that this review will serve as a useful guide for the design and development of robust AVP and related therapeutic peptide predictors in the future.
Affiliation(s)
- Phasit Charoenkwan
  - Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand
- Nuttapat Anuwongcharoen
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Chanin Nantasenamat
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Md Mehedi Hasan
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
- Watshara Shoombuatong
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
|
14
|
Wang H, Zhao H, Yan Z, Zhao J, Han J. MDCAN-Lys: A Model for Predicting Succinylation Sites Based on Multilane Dense Convolutional Attention Network. Biomolecules 2021; 11:biom11060872. [PMID: 34208298 PMCID: PMC8231176 DOI: 10.3390/biom11060872] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 04/25/2021] [Revised: 05/30/2021] [Accepted: 06/07/2021] [Indexed: 12/26/2022] Open
Abstract
Lysine succinylation is an important post-translational modification, whose abnormalities are closely related to the occurrence and development of many diseases. Therefore, exploring effective methods to identify succinylation sites is helpful for disease treatment and research of related drugs. However, most existing computational methods for the prediction of succinylation sites are still based on machine learning. With the increasing volume of data and complexity of feature representations, it is necessary to explore effective deep learning methods to recognize succinylation sites. In this paper, we propose a multilane dense convolutional attention network, MDCAN-Lys. MDCAN-Lys extracts sequence information, physicochemical properties of amino acids, and structural properties of proteins using a three-way network, and it constructs feature space. For each sub-network, MDCAN-Lys uses the cascading model of dense convolutional block and convolutional block attention module to capture feature information at different levels and improve the abstraction ability of the network. The experimental results of 10-fold cross-validation and independent testing show that MDCAN-Lys can recognize more succinylation sites, which is consistent with the conclusion of the case study. Thus, it is worthwhile to explore deep learning-based methods for the recognition of succinylation sites.
|
15
|
LSTMCNNsucc: A Bidirectional LSTM and CNN-Based Deep Learning Method for Predicting Lysine Succinylation Sites. BIOMED RESEARCH INTERNATIONAL 2021; 2021:9923112. [PMID: 34159204 PMCID: PMC8188601 DOI: 10.1155/2021/9923112] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 03/15/2021] [Revised: 04/25/2021] [Accepted: 05/03/2021] [Indexed: 11/17/2022]
Abstract
Lysine succinylation is a typical protein post-translational modification and plays a crucial role of regulation in the cellular process. Identifying succinylation sites is fundamental to explore its functions. Although many computational methods were developed to deal with this challenge, few considered semantic relationship between residues. We combined long short-term memory (LSTM) and convolutional neural network (CNN) into a deep learning method for predicting succinylation site. The proposed method obtained a Matthews correlation coefficient of 0.2508 on the independent test, outperforming state of the art methods. We also performed the enrichment analysis of succinylation proteins. The results showed that functions of succinylation were conserved across species but differed to a certain extent with species. On basis of the proposed method, we developed a user-friendly web server for predicting succinylation sites.
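The Matthews correlation coefficient (MCC) reported in this abstract can be computed directly from confusion-matrix counts. The following is a minimal sketch of the standard formula, not the authors' evaluation code; the function name and the example counts are hypothetical.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance level)
    to +1 (perfect prediction).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional fallback when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts for an imbalanced test set
    # (far fewer succinylated than non-succinylated lysines).
    print(matthews_corrcoef(tp=30, tn=810, fp=90, fn=70))
```

Because MCC uses all four cells of the confusion matrix, it stays informative on imbalanced site-prediction data, where accuracy alone can look high even for a weak predictor.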
|
16
|
Dong Y, Li P, Li P, Chen C. First comprehensive analysis of lysine succinylation in paper mulberry (Broussonetia papyrifera). BMC Genomics 2021; 22:255. [PMID: 33838656 PMCID: PMC8035759 DOI: 10.1186/s12864-021-07567-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 12/15/2020] [Accepted: 03/26/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Lysine succinylation is a naturally occurring post-translational modification (PTM) that is ubiquitous in organisms. Lysine succinylation plays important roles in regulating protein structure and function as well as cellular metabolism. Global lysine succinylation at the proteomic level has been identified in a variety of species; however, limited information on lysine succinylation in plant species, especially paper mulberry, is available. Paper mulberry is not only an important plant in traditional Chinese medicine, but it is also a tree species with significant economic value. Paper mulberry is found in the temperate and tropical zones of China. The present study analyzed the effects of lysine succinylation on the growth, development, and physiology of paper mulberry. RESULTS A total of 2097 lysine succinylation sites were identified in 935 proteins associated with the citric acid cycle (TCA cycle), glyoxylic acid and dicarboxylic acid metabolism, ribosomes and oxidative phosphorylation; these pathways play a role in carbon fixation in photosynthetic organisms and may be regulated by lysine succinylation. The modified proteins were distributed in multiple subcellular compartments and were involved in a wide variety of biological processes, such as photosynthesis and the Calvin-Benson cycle. CONCLUSION Lysine-succinylated proteins may play key regulatory roles in metabolism, primarily in photosynthesis and oxidative phosphorylation, as well as in many other cellular processes. In addition to the large number of succinylated proteins associated with photosynthesis and oxidative phosphorylation, some proteins associated with the TCA cycle are succinylated. Our study can serve as a reference for further proteomics studies of the downstream effects of succinylation on the physiology and biochemistry of paper mulberry.
Affiliation(s)
- Yibo Dong
  - College of Animal Science, Guizhou University, Guiyang, 550025, Guizhou, China
  - Department of Plant Protection, Institute of Crop Protection, College of Agriculture, Guizhou University, Guiyang, 550025, Guizhou, China
- Ping Li
  - Institute of Grassland Research, Sichuan Academy of Grassland Science, Chengdu, 610000, Sichuan, China
- Ping Li
  - College of Animal Science, Guizhou University, Guiyang, 550025, Guizhou, China
- Chao Chen
  - College of Animal Science, Guizhou University, Guiyang, 550025, Guizhou, China
|
17
|
Islam MM, Alam MJ, Ahmed FF, Hasan MM, Mollah MNH. Improved Prediction of Protein-Protein Interaction Mapping on Homo Sapiens by Using Amino Acid Sequence Features in a Supervised Learning Framework. Protein Pept Lett 2021; 28:74-83. [PMID: 32520672 DOI: 10.2174/0929866527666200610141258] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/27/2020] [Revised: 05/03/2020] [Accepted: 05/04/2020] [Indexed: 02/07/2023]
Abstract
BACKGROUND Protein-Protein Interaction (PPI) plays a key role in the control of many biological processes including protein function, disease incidence, and therapy design. However, the identification of PPI by wet lab experiment is a challenging task, since it is laborious, time-consuming and expensive. Therefore, computational prediction of PPI is now emphasized before experimental validation, since it is less laborious, faster and cheaper. OBJECTIVE The objective of this study is to develop an improved computational method for PPI prediction mapping on Homo sapiens by using the amino acid sequence features in a supervised learning framework. METHODS The experimentally validated 91 positive-PPI pairs of human protein sequences were collected from IntAct Molecular Interaction Database. Then we constructed three balanced datasets with ratios 1:1, 1:2 and 1:3 of positive and negative PPI samples. Then we partitioned each dataset into training (80%) and independent test (20%) datasets. Again each training dataset was partitioned into four mutually exclusive groups of equal sizes for interchanging each group with independent test group to perform 5-fold cross validation (CV). Then we trained seven candidate classifiers (NN, SVM, LR, NB, KNN, AB and RF) with each ratio case to obtain the better PPI predictor by comparing their performance scores. RESULTS The random forest (RF) based predictor that was trained with 1:2 ratio of positive-PPI and negative-PPI samples based on AAC encoding features provided the most accurate PPI prediction by producing the highest average performance scores of accuracy (93.50%), sensitivity (95.0%), MCC (85.2%), AUC (0.941) and pAUC (0.236) with the 5-fold cross-validation. It also achieved the highest average performance scores of accuracy (92.0%), sensitivity (94.0%), MCC (83.6%), AUC (0.922) and pAUC (0.207) with the independent test datasets in comparison with the other candidate and existing predictors. CONCLUSION The final results strongly suggest that the RF based predictor is a better prediction model of PPI mapping on Homo sapiens.
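The AAC (amino acid composition) encoding named in this abstract turns a protein sequence into a 20-dimensional vector of residue frequencies. A minimal sketch follows; the function names are hypothetical, and representing a protein pair by concatenating the two AAC vectors is an assumption made for illustration, since the abstract does not say how pairs were combined.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order that defines the feature layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(sequence: str) -> list[float]:
    """Return the fraction of each standard amino acid in `sequence`."""
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def pair_features(seq_a: str, seq_b: str) -> list[float]:
    """Assumed pair representation: the two AAC vectors concatenated (40 values)."""
    return aac(seq_a) + aac(seq_b)


if __name__ == "__main__":
    vec = pair_features("MKKLLAGS", "MSTAVLEN")
    print(len(vec))
```

A vector like this can be passed to any of the classifier families the abstract lists; AAC discards residue order, which keeps the feature space small at the cost of positional information.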
Affiliation(s)
- Md Merajul Islam
  - Bioinformatics Laboratory, Department of Statistics, Rajshahi University, Rajshahi-6205, Bangladesh
- Md Jahangir Alam
  - Bioinformatics Laboratory, Department of Statistics, Rajshahi University, Rajshahi-6205, Bangladesh
- Fee Faysal Ahmed
  - Department of Mathematics, Jashore University of Science and Technology, Jashore, Bangladesh
- Md Mehedi Hasan
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Kawazu, Iizuka, Fukuoka, Japan
- Md Nurul Haque Mollah
  - Bioinformatics Laboratory, Department of Statistics, Rajshahi University, Rajshahi-6205, Bangladesh
|
18
|
Auliah FN, Nilamyani AN, Shoombuatong W, Alam MA, Hasan MM, Kurata H. PUP-Fuse: Prediction of Protein Pupylation Sites by Integrating Multiple Sequence Representations. Int J Mol Sci 2021; 22:ijms22042120. [PMID: 33672741 PMCID: PMC7924619 DOI: 10.3390/ijms22042120] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/16/2021] [Revised: 02/12/2021] [Accepted: 02/18/2021] [Indexed: 12/30/2022] Open
Abstract
Pupylation is a type of reversible post-translational modification of proteins, which plays a key role in the cellular function of microbial organisms. Several proteomics methods have been developed for the prediction and analysis of pupylated proteins and pupylation sites. However, the traditional experimental methods are laborious and time-consuming. Hence, computational algorithms that can predict potential pupylation sites using sequence features are highly needed. In this research, a new prediction model, PUP-Fuse, has been developed for pupylation site prediction by integrating multiple sequence representations. We explored five types of feature encoding approaches and three machine learning (ML) algorithms. In the final model, we integrated the successive ML scores using a linear regression model. PUP-Fuse achieved a Matthews correlation coefficient of 0.768 in a 10-fold cross-validation test. It also outperformed existing predictors in an independent test. The web server of PUP-Fuse with curated datasets is freely available.
Affiliation(s)
- Firda Nurul Auliah
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
- Andi Nur Nilamyani
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
- Watshara Shoombuatong
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Md Ashad Alam
  - Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA
- Md Mehedi Hasan
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
  - Japan Society for the Promotion of Science, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo 102-0083, Japan
- Hiroyuki Kurata
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
  - Correspondence:
|
19
|
Tasmia SA, Ahmed FF, Mosharaf P, Hasan M, Mollah NH. An Improved Computational Prediction Model for Lysine Succinylation Sites Mapping on Homo sapiens by Fusing Three Sequence Encoding Schemes with the Random Forest Classifier. Curr Genomics 2021; 22:122-136. [PMID: 34220299 PMCID: PMC8188582 DOI: 10.2174/1389202922666210219114211] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/20/2020] [Revised: 12/13/2020] [Accepted: 01/06/2021] [Indexed: 11/22/2022] Open
Abstract
BACKGROUND Lysine succinylation is one of the reversible protein post-translational modifications (PTMs), which regulate the structure and function of proteins. It plays a significant role in various cellular physiologies, including some diseases of humans as well as many other organisms. The accurate identification of succinylation sites is essential for understanding their various biological functions and for drug development. METHODS In this study, we developed an improved method to predict lysine succinylation sites mapping on Homo sapiens by the fusion of three encoding schemes, namely binary, the composition of k-spaced amino acid pairs (CKSAAP) and amino acid composition (AAC), with the random forest (RF) classifier. The prediction performance of the proposed RF-based fusion model, in comparison with other candidates, was investigated by using 20-fold cross-validation (CV) and two independent test datasets collected from two different sources. RESULTS The CV results showed that the proposed predictor achieves the highest scores of sensitivity (SN) as 0.800, specificity (SP) as 0.902, accuracy (ACC) as 0.919, Matthews correlation coefficient (MCC) as 0.766 and partial AUC (pAUC) as 0.163 at a false-positive rate (FPR) = 0.10 and area under the ROC curve (AUC) as 0.958. It achieved the highest performance scores of SN as 0.811, SP as 0.902, ACC as 0.891, MCC as 0.629 and pAUC as 0.139 and AUC as 0.921 for the independent test protein set-1 and SN as 0.772, SP as 0.901, ACC as 0.836, MCC as 0.677 and pAUC as 0.141 at FPR = 0.10 and AUC as 0.923 for the independent test protein set-2. It also outperformed all the other existing prediction models. CONCLUSION The prediction performances discussed in this article suggest that the proposed method might be a useful and encouraging computational resource for lysine succinylation site prediction in the human population.
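Of the three encodings fused in this abstract, CKSAAP is the least self-explanatory: for each gap k it counts ordered residue pairs separated by exactly k positions within a sequence window. The sketch below illustrates the general scheme and is not the authors' implementation; the function name, the default `k_max=2`, and the choice to normalize by the number of valid pairs (skipping padding characters) are assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order that defines the feature layout.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(window: str, k_max: int = 2) -> list[float]:
    """Composition of k-spaced amino acid pairs for gaps k = 0..k_max.

    Returns 400 * (k_max + 1) frequencies. Pairs containing a non-standard
    character (for example a padding symbol) are ignored.
    """
    features: list[float] = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_pairs = 0
        # Residue i is paired with the residue k positions further along.
        for i in range(len(window) - k - 1):
            pair = window[i] + window[i + k + 1]
            if pair in counts:
                counts[pair] += 1
                n_pairs += 1
        features.extend(counts[p] / n_pairs if n_pairs else 0.0 for p in PAIRS)
    return features


if __name__ == "__main__":
    # Hypothetical 11-residue window centered on a lysine.
    print(len(cksaap("AGLSEKTVDAR")))
```

A fused representation in the spirit of the abstract would concatenate this vector with binary (one-hot) and AAC vectors before training the random forest.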
Affiliation(s)
- Nurul Haque Mollah
  - Address correspondence to this author at the Bioinformatics Lab., Department of Statistics, Rajshahi University, Rajshahi-6205, Bangladesh; E-mail:
|
20
|
Hasan MM, Alam MA, Shoombuatong W, Kurata H. IRC-Fuse: improved and robust prediction of redox-sensitive cysteine by fusing of multiple feature representations. J Comput Aided Mol Des 2021; 35:315-323. [PMID: 33392948 DOI: 10.1007/s10822-020-00368-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/11/2020] [Accepted: 12/06/2020] [Indexed: 12/11/2022]
Abstract
Redox-sensitive cysteine (RSC) thiol contributes to many biological processes. The identification of RSC plays an important role in clarifying some mechanisms of redox-sensitive factors; nonetheless, experimental investigation of RSCs is expensive and time-consuming. Computational approaches that quickly and accurately identify candidate RSCs using sequence information are urgently needed. Herein, an improved and robust computational predictor named IRC-Fuse was developed to identify RSCs by fusing multiple feature representations. To enhance the performance of our model, we integrated the probability scores evaluated by random forest models implementing different encoding schemes. Cross-validation results showed that IRC-Fuse achieved accuracy and AUC of 0.741 and 0.807, respectively. IRC-Fuse outperformed existing methods with improvements of 10% and 13% in accuracy and MCC, respectively, on independent test data. Comparative analysis suggested that IRC-Fuse was more effective and promising than the existing predictors. For the convenience of experimental scientists, the IRC-Fuse online web server was implemented and is publicly accessible at http://kurata14.bio.kyutech.ac.jp/IRC-Fuse/ .
Affiliation(s)
- Md Mehedi Hasan
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan; Japan Society for the Promotion of Science, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo, 102-0083, Japan
- Md Ashad Alam
  - Tulane Center of Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA, 70112, USA
- Watshara Shoombuatong
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Hiroyuki Kurata
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan
21
|
Hasan MM, Khatun MS, Kurata H. iLBE for Computational Identification of Linear B-cell Epitopes by Integrating Sequence and Evolutionary Features. GENOMICS PROTEOMICS & BIOINFORMATICS 2020; 18:593-600. [PMID: 33099033 PMCID: PMC8377379 DOI: 10.1016/j.gpb.2019.04.004] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 08/29/2018] [Revised: 01/13/2019] [Accepted: 04/19/2019] [Indexed: 12/17/2022]
Abstract
Linear B-cell epitopes are critically important for immunological applications, such as vaccine design, immunodiagnostic test, and antibody production, as well as disease diagnosis and therapy. The accurate identification of linear B-cell epitopes remains challenging despite several decades of research. In this work, we have developed a novel predictor, Identification of Linear B-cell Epitope (iLBE), by integrating evolutionary and sequence-based features. The successive feature vectors were optimized by a Wilcoxon-rank sum test. Then the random forest (RF) algorithm using the optimal consecutive feature vectors was applied to predict linear B-cell epitopes. We combined the RF scores by the logistic regression to enhance the prediction accuracy. iLBE yielded an area under curve score of 0.809 on the training dataset and outperformed other prediction models on a comprehensive independent dataset. iLBE is a powerful computational tool to identify the linear B-cell epitopes and would help to develop penetrating diagnostic tests. A web application with curated datasets for iLBE is freely accessible at http://kurata14.bio.kyutech.ac.jp/iLBE/.
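The score-combination step described in this abstract (per-encoding random forest scores merged by logistic regression) reduces, at prediction time, to a weighted sum passed through a sigmoid. The sketch below shows only that final step; the function name, weights and bias are hypothetical placeholders, since in practice they would be fitted on training data and the abstract does not report them.

```python
import math


def fuse_scores(scores: list[float], weights: list[float], bias: float = 0.0) -> float:
    """Combine per-encoding classifier scores into one probability.

    Implements the prediction step of logistic regression:
    sigmoid(bias + sum_i weights[i] * scores[i]).
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Hypothetical scores from three random forest models, one per feature encoding.
    rf_scores = [0.82, 0.64, 0.71]
    # Hypothetical coefficients; real values would come from fitting on held-out scores.
    print(fuse_scores(rf_scores, weights=[1.5, 0.8, 1.1], bias=-1.7))
```

Fitting the combiner on scores the base models produced for data they were not trained on (for example, cross-validation predictions) helps avoid rewarding an overfitted base model.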
Affiliation(s)
- Md Mehedi Hasan
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka 820-8502, Japan
- Mst Shamima Khatun
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka 820-8502, Japan
- Hiroyuki Kurata
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka 820-8502, Japan; Biomedical Informatics R&D Center, Kyushu Institute of Technology, Iizuka, Fukuoka 820-8502, Japan
22
|
Khatun MS, Hasan MM, Shoombuatong W, Kurata H. ProIn-Fuse: improved and robust prediction of proinflammatory peptides by fusing of multiple feature representations. J Comput Aided Mol Des 2020; 34:1229-1236. [DOI: 10.1007/s10822-020-00343-9] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 05/10/2020] [Accepted: 09/16/2020] [Indexed: 12/11/2022]
|
23
|
Khatun MS, Shoombuatong W, Hasan MM, Kurata H. Evolution of Sequence-based Bioinformatics Tools for Protein-protein Interaction Prediction. Curr Genomics 2020; 21:454-463. [PMID: 33093807 PMCID: PMC7536797 DOI: 10.2174/1389202921999200625103936] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 01/17/2020] [Revised: 03/19/2020] [Accepted: 05/27/2020] [Indexed: 12/22/2022] Open
Abstract
Protein-protein interactions (PPIs) are the physical connections between two or more proteins via electrostatic forces or hydrophobic effects. Identification of the PPIs is pivotal, which contributes to many biological processes including protein function, disease incidence, and therapy design. The experimental identification of PPIs via high-throughput technology is time-consuming and expensive. Bioinformatics approaches are expected to solve such restrictions. In this review, our main goal is to provide an inclusive view of the existing sequence-based computational prediction of PPIs. Initially, we briefly introduce the currently available PPI databases and then review the state-of-the-art bioinformatics approaches, working principles, and their performances. Finally, we discuss the caveats and future perspective of the next generation algorithms for the prediction of PPIs.
Affiliation(s)
- Md. Mehedi Hasan
  - Address correspondence to these authors at the Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Japan Society for the Promotion of Science, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo 102-0083, Japan; Tel: +81-948-297-828; E-mail: and Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Biomedical Informatics R&D Center, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Tel: +81-948-297-828; E-mail:
- Hiroyuki Kurata
  - Address correspondence to these authors at the Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Japan Society for the Promotion of Science, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo 102-0083, Japan; Tel: +81-948-297-828; E-mail: and Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Biomedical Informatics R&D Center, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Tel: +81-948-297-828; E-mail:
|
24
|
HybridSucc: A Hybrid-learning Architecture for General and Species-specific Succinylation Site Prediction. GENOMICS PROTEOMICS & BIOINFORMATICS 2020; 18:194-207. [PMID: 32861878 PMCID: PMC7647696 DOI: 10.1016/j.gpb.2019.11.010] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 03/28/2019] [Revised: 09/17/2019] [Accepted: 11/13/2019] [Indexed: 11/21/2022]
Abstract
As an important protein acylation modification, lysine succinylation (Ksucc) is involved in diverse biological processes, and participates in human tumorigenesis. Here, we collected 26,243 non-redundant known Ksucc sites from 13 species as the benchmark data set, combined 10 types of informative features, and implemented a hybrid-learning architecture by integrating deep-learning and conventional machine-learning algorithms into a single framework. We constructed a new tool named HybridSucc, which achieved area under curve (AUC) values of 0.885 and 0.952 for general and human-specific prediction of Ksucc sites, respectively. In comparison, the accuracy of HybridSucc was 17.84%–50.62% better than that of other existing tools. Using HybridSucc, we conducted a proteome-wide prediction and prioritized 370 cancer mutations that change Ksucc states of 218 important proteins, including PKM2, SHMT2, and IDH2. We not only developed a high-profile tool for predicting Ksucc sites, but also generated useful candidates for further experimental consideration. The online service of HybridSucc can be freely accessed for academic research at http://hybridsucc.biocuckoo.org/.
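The AUC values quoted in this abstract have a simple probabilistic reading: AUC is the chance that a randomly chosen true site receives a higher score than a randomly chosen non-site. The sketch below computes it by direct pairwise comparison; it is an illustration of the metric, not code from HybridSucc, and the function name and example scores are hypothetical.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    `labels` holds 1 for positives and 0 for negatives. Ties between a
    positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical predictor scores for four candidate lysines.
    print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.3]))
```

The pairwise form is quadratic in the number of examples, which is fine for illustration; large benchmarks would normally use a sort-based implementation.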
|
25
|
Thapa N, Chaudhari M, McManus S, Roy K, Newman RH, Saigo H, Kc DB. DeepSuccinylSite: a deep learning based approach for protein succinylation site prediction. BMC Bioinformatics 2020; 21:63. [PMID: 32321437 PMCID: PMC7178942 DOI: 10.1186/s12859-020-3342-z] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 11/20/2019] [Accepted: 01/08/2020] [Indexed: 01/15/2023] Open
Abstract
Background Protein succinylation has recently emerged as an important and common post-translation modification (PTM) that occurs on lysine residues. Succinylation is notable both in its size (e.g., at 100 Da, it is one of the larger chemical PTMs) and in its ability to modify the net charge of the modified lysine residue from + 1 to − 1 at physiological pH. The gross local changes that occur in proteins upon succinylation have been shown to correspond with changes in gene activity and to be perturbed by defects in the citric acid cycle. These observations, together with the fact that succinate is generated as a metabolic intermediate during cellular respiration, have led to suggestions that protein succinylation may play a role in the interaction between cellular metabolism and important cellular functions. For instance, succinylation likely represents an important aspect of genomic regulation and repair and may have important consequences in the etiology of a number of disease states. In this study, we developed DeepSuccinylSite, a novel prediction tool that uses deep learning methodology along with embedding to identify succinylation sites in proteins based on their primary structure. Results Using an independent test set of experimentally identified succinylation sites, our method achieved efficiency scores of 79%, 68.7% and 0.48 for sensitivity, specificity and MCC respectively, with an area under the receiver operator characteristic (ROC) curve of 0.8. In side-by-side comparisons with previously described succinylation predictors, DeepSuccinylSite represents a significant improvement in overall accuracy for prediction of succinylation sites. Conclusion Together, these results suggest that our method represents a robust and complementary technique for advanced exploration of protein succinylation.
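Predictors that work from primary structure with an embedding layer, as this abstract describes, typically start by cutting a fixed-length window around each lysine and mapping residues to integer tokens. The sketch below illustrates that preprocessing under stated assumptions: the flank size of 16, the "-" padding symbol, the token numbering and the function names are all hypothetical and are not taken from the paper.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"  # assumed padding symbol for windows that run past the sequence ends
# Token 0 is reserved for padding and unknown residues; residues map to 1..20.
TOKEN = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def lysine_windows(sequence: str, flank: int = 16) -> list[tuple[int, str]]:
    """Return (1-based position, window) for every lysine in `sequence`.

    Each window has length 2 * flank + 1 with the lysine at its center.
    """
    seq = sequence.upper()
    padded = PAD * flank + seq + PAD * flank
    windows = []
    for pos, residue in enumerate(seq):
        if residue == "K":
            # Position `pos` in seq sits at `pos + flank` in the padded string.
            windows.append((pos + 1, padded[pos:pos + 2 * flank + 1]))
    return windows


def encode(window: str) -> list[int]:
    """Map a window to integer tokens suitable as input to an embedding layer."""
    return [TOKEN.get(ch, 0) for ch in window]


if __name__ == "__main__":
    for position, window in lysine_windows("MSKAGLKTE", flank=3):
        print(position, window, encode(window))
```

An embedding layer would then learn a dense vector for each of the 21 token values, in place of fixed one-hot or physicochemical features.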
Affiliation(s)
- Niraj Thapa
  - Department of Computational Science and Engineering, North Carolina A&T State University, Greensboro, NC, USA
- Meenal Chaudhari
  - Department of Computational Science and Engineering, North Carolina A&T State University, Greensboro, NC, USA
- Sean McManus
  - Department of Computational Science and Engineering, North Carolina A&T State University, Greensboro, NC, USA
- Kaushik Roy
  - Department of Computer Science, North Carolina A&T State University, Greensboro, NC, USA
- Robert H Newman
  - Department of Biology, North Carolina A&T State University, Greensboro, NC, USA
- Hiroto Saigo
  - Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan
- Dukka B Kc
  - Electrical Engineering and Computer Science Department, Wichita State University, Wichita, KS, USA
26
|
Rashid MM, Shatabda S, Hasan MM, Kurata H. Recent Development of Machine Learning Methods in Microbial Phosphorylation Sites. Curr Genomics 2020; 21:194-203. [PMID: 33071613 PMCID: PMC7521030 DOI: 10.2174/1389202921666200427210833] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/16/2020] [Revised: 04/12/2020] [Accepted: 04/13/2020] [Indexed: 01/10/2023] Open
Abstract
A variety of protein post-translational modifications has been identified that control many cellular functions. Phosphorylation studies in mycobacterial organisms have shown critical importance in diverse biological processes, such as intercellular communication and cell division. Recent technical advances in high-precision mass spectrometry have determined a large number of microbial phosphorylated proteins and phosphorylation sites throughout the proteome analysis. Identification of phosphorylated proteins with specific modified residues through experimentation is often labor-intensive, costly and time-consuming. All these limitations could be overcome through the application of machine learning (ML) approaches. However, only a limited number of computational phosphorylation site prediction tools have been developed so far. This work aims to present a complete survey of the existing ML-predictors for microbial phosphorylation. We cover a variety of important aspects for developing a successful predictor, including operating ML algorithms, feature selection methods, window size, and software utility. Initially, we review the currently available phosphorylation site databases of the microbiome, the state-of-the-art ML approaches, working principles, and their performances. Lastly, we discuss the limitations and future directions of the computational ML methods for the prediction of phosphorylation.
Affiliation(s)
| | | | - Md. Mehedi Hasan
- Address correspondence to these authors at the Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan, and the Biomedical Informatics R&D Center, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Tel: +81-948-297-828
| | - Hiroyuki Kurata
- Address correspondence to these authors at the Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan, and the Biomedical Informatics R&D Center, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; Tel: +81-948-297-828
|
27
|
Mosharaf MP, Hassan MM, Ahmed FF, Khatun MS, Moni MA, Mollah MNH. Computational prediction of protein ubiquitination sites mapping on Arabidopsis thaliana. Comput Biol Chem 2020; 85:107238. [DOI: 10.1016/j.compbiolchem.2020.107238] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 12/16/2018] [Revised: 01/22/2020] [Accepted: 02/18/2020] [Indexed: 02/06/2023]
|
28
|
Zhu Y, Jia C, Li F, Song J. Inspector: a lysine succinylation predictor based on edited nearest-neighbor undersampling and adaptive synthetic oversampling. Anal Biochem 2020; 593:113592. [DOI: 10.1016/j.ab.2020.113592] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 10/19/2019] [Revised: 01/14/2020] [Accepted: 01/17/2020] [Indexed: 12/13/2022]
|
29
|
Chen Z, Liu X, Li F, Li C, Marquez-Lago T, Leier A, Akutsu T, Webb GI, Xu D, Smith AI, Li L, Chou KC, Song J. Large-scale comparative assessment of computational predictors for lysine post-translational modification sites. Brief Bioinform 2019; 20:2267-2290. [PMID: 30285084 PMCID: PMC6954452 DOI: 10.1093/bib/bby089] [Citation(s) in RCA: 78] [Impact Index Per Article: 15.6] [Received: 06/14/2018] [Revised: 08/17/2018] [Accepted: 08/18/2018] [Indexed: 12/22/2022] Open
Abstract
Lysine post-translational modifications (PTMs) play a crucial role in regulating diverse functions and biological processes of proteins. However, because of the large volumes of sequencing data generated from genome-sequencing projects, systematic identification of different types of lysine PTM substrates and PTM sites in the entire proteome remains a major challenge. In recent years, a number of computational methods for lysine PTM identification have been developed. These methods show high diversity in their core algorithms, features extracted and feature selection techniques and evaluation strategies. There is therefore an urgent need to revisit these methods and summarize their methodologies, to improve and further develop computational techniques to identify and characterize lysine PTMs from the large amounts of sequence data. With this goal in mind, we first provide a comprehensive survey on a large collection of 49 state-of-the-art approaches for lysine PTM prediction. We cover a variety of important aspects that are crucial for the development of successful predictors, including operating algorithms, sequence and structural features, feature selection, model performance evaluation and software utility. We further provide our thoughts on potential strategies to improve the model performance. Second, in order to examine the feasibility of using deep learning for lysine PTM prediction, we propose a novel computational framework, termed MUscADEL (Multiple Scalable Accurate Deep Learner for lysine PTMs), using deep, bidirectional, long short-term memory recurrent neural networks for accurate and systematic mapping of eight major types of lysine PTMs in the human and mouse proteomes. Extensive benchmarking tests show that MUscADEL outperforms current methods for lysine PTM characterization, demonstrating the potential and power of deep learning techniques in protein PTM prediction. 
The web server of MUscADEL, together with all the data sets assembled in this study, is freely available at http://muscadel.erc.monash.edu/. We anticipate that this comprehensive review and the application of deep learning will provide a practical guide and useful insights into PTM prediction and inspire future bioinformatics studies in related fields.
Affiliation(s)
- Zhen Chen
- School of Basic Medical Science, Qingdao University, Dengzhou Road, Qingdao, Shandong, China
| | - Xuhan Liu
- Medicinal Chemistry, Leiden Academic Centre for Drug Research, Einsteinweg, Leiden, The Netherlands
| | - Fuyi Li
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, VIC, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC, Australia
| | - Chen Li
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, VIC, Australia
- Institute of Molecular Systems Biology, ETH Zürich, Auguste-Piccard-Hof, Zürich, Switzerland
| | - Tatiana Marquez-Lago
- Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA
- Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
| | - André Leier
- Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA
- Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
| | - Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto, Japan
| | - Geoffrey I Webb
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
| | - Dakang Xu
- Faculty of Medical Laboratory Science, Ruijin Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Department of Molecular and Translational Science, Faculty of Medicine, Hudson Institute of Medical Research, Monash University, Melbourne, VIC, Australia
| | - Alexander Ian Smith
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, VIC, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC, Australia
| | - Lei Li
- School of Basic Medical Science, Qingdao University, Dengzhou Road, Qingdao, Shandong, China
| | - Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA, USA
- Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
| | - Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, VIC, Australia
- ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, Melbourne, VIC, Australia
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC, Australia
|
30
|
Huang KY, Hsu JBK, Lee TY. Characterization and Identification of Lysine Succinylation Sites based on Deep Learning Method. Sci Rep 2019; 9:16175. [PMID: 31700141 PMCID: PMC6838336 DOI: 10.1038/s41598-019-52552-4] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 07/09/2019] [Accepted: 10/18/2019] [Indexed: 12/14/2022] Open
Abstract
Succinylation is a type of protein post-translational modification (PTM), which can play important roles in a variety of cellular processes. Due to an increasing number of site-specific succinylated peptides obtained from high-throughput mass spectrometry (MS), various tools have been developed for computationally identifying succinylated sites on proteins. However, most of these tools predict succinylation sites based on traditional machine learning methods. Hence, this work aimed to carry out the succinylation site prediction based on a deep learning model. The abundance of MS-verified succinylated peptides enabled the investigation of substrate site specificity of succinylation sites through sequence-based attributes, such as position-specific amino acid composition, the composition of k-spaced amino acid pairs (CKSAAP), and position-specific scoring matrix (PSSM). Additionally, the maximal dependence decomposition (MDD) was adopted to detect the substrate signatures of lysine succinylation sites by dividing all succinylated sequences into several groups with conserved substrate motifs. According to the results of ten-fold cross-validation, the deep learning model trained using PSSM and informative CKSAAP attributes can reach the best predictive performance and also perform better than traditional machine-learning methods. Moreover, an independent testing dataset that truly did not exist in the training dataset was used to compare the proposed method with six existing prediction tools. The testing dataset comprised 218 positive and 2621 negative instances, and the proposed model could yield a promising performance with 84.40% sensitivity, 86.99% specificity, 86.79% accuracy, and an MCC value of 0.489. Finally, the proposed method has been implemented as a web-based prediction tool (CNN-SuccSite), which is now freely accessible at http://csb.cse.yzu.edu.tw/CNN-SuccSite/.
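As a consistency check on the independent-test figures quoted in this abstract, the confusion-matrix counts can be approximately reconstructed from the reported sensitivity and specificity, and the accuracy and Matthews correlation coefficient (MCC) recomputed from them. This is a minimal sketch: the counts are obtained by rounding and are therefore an assumption, and the function names are illustrative rather than part of CNN-SuccSite.

```python
import math


def confusion_from_rates(n_pos, n_neg, sensitivity, specificity):
    """Approximate (TP, FN, TN, FP) from class sizes and reported rates.

    Rounding to whole instances is an assumption; the paper does not list
    the raw counts in the abstract.
    """
    tp = round(sensitivity * n_pos)
    tn = round(specificity * n_neg)
    return tp, n_pos - tp, tn, n_neg - tn


def mcc(tp, fn, tn, fp):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Figures quoted in the abstract: 218 positives, 2621 negatives,
# 84.40% sensitivity, 86.99% specificity.
tp, fn, tn, fp = confusion_from_rates(218, 2621, 0.8440, 0.8699)
accuracy = (tp + tn) / (218 + 2621)
mcc_value = mcc(tp, fn, tn, fp)
```

Under these rounded counts the recomputed accuracy and MCC land close to the reported 86.79% and 0.489. The modest MCC alongside high accuracy reflects the roughly 1:12 class imbalance of the test set.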
Affiliation(s)
- Kai-Yao Huang
- Department of Medical Research, Hsinchu Mackay Memorial Hospital, Hsinchu city, 300, Taiwan
| | - Justin Bo-Kai Hsu
- Department of Medical Research, Taipei Medical University Hospital, Taipei city, 110, Taiwan
| | - Tzong-Yi Lee
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, 518172, China
- School of Life and Health Sciences, The Chinese University of Hong Kong, Shenzhen, 518172, China
|
31
|
Abstract
Protein methylation is an important and reversible post-translational modification that regulates many biological processes in cells. It occurs mainly on lysine and arginine residues and involves many important biological processes, including transcriptional activity, signal transduction, and the regulation of gene expression. Protein methylation and its regulatory enzymes are related to a variety of human diseases, so improved identification of methylation sites is useful for designing drugs for a variety of related diseases. In this review, we systematically summarize and analyze the tools used for the prediction of protein methylation sites on arginine and lysine residues over the last decade.
Affiliation(s)
- Chunyan Ao
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Shunshan Jin
- Department of Neurology, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
| | - Yuan Lin
- Department of System Integration, Sparebanken Vest, Bergen, Norway
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
|
32
|
Khatun S, Hasan M, Kurata H. Efficient computational model for identification of antitubercular peptides by integrating amino acid patterns and properties. FEBS Lett 2019; 593:3029-3039. [PMID: 31297788 DOI: 10.1002/1873-3468.13536] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 05/21/2019] [Revised: 06/25/2019] [Accepted: 07/05/2019] [Indexed: 12/30/2022]
Abstract
Tuberculosis (TB) is a leading killer caused by Mycobacterium tuberculosis. Recently, anti-TB peptides have provided an alternative approach to combat antibiotic tolerance. We have developed an effective computational predictor, identification of antitubercular peptides (iAntiTB), by the integration of multiple feature vectors deriving from the amino acid sequences via random forest (RF) and support vector machine (SVM) classifiers. The iAntiTB combines the RF and SVM scores via linear regression to enhance the prediction accuracy. To make a robust and accurate predictor, we prepared the two datasets with different types of negative samples. The iAntiTB achieved area under the ROC curve values of 0.896 and 0.946 on the training datasets of the first and second datasets, respectively. The iAntiTB outperformed the other existing predictors.
Affiliation(s)
- Shamima Khatun
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka, Japan
| | - Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka, Japan
| | - Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Fukuoka, Japan
- Biomedical Informatics R&D Center, Kyushu Institute of Technology, Iizuka, Fukuoka, Japan
|
33
|
Hasan MM, Rashid MM, Khatun MS, Kurata H. Computational identification of microbial phosphorylation sites by the enhanced characteristics of sequence information. Sci Rep 2019; 9:8258. [PMID: 31164681 PMCID: PMC6547684 DOI: 10.1038/s41598-019-44548-x] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 09/19/2018] [Accepted: 05/20/2019] [Indexed: 11/30/2022] Open
Abstract
Protein phosphorylation on serine (S) and threonine (T) has emerged as a key device in the control of many biological processes. Recently, phosphorylation in microbial organisms has attracted much attention for its critical roles in various cellular processes such as cell growth and cell division. Here a novel machine learning predictor, MPSite (Microbial Phosphorylation Site predictor), was developed to identify microbial phosphorylation sites using the enhanced characteristics of sequence features. The final feature vectors were optimized via a Wilcoxon rank sum test. A random forest classifier was then trained using the optimum features to build the predictor. Benchmarking investigation using 5-fold cross-validation and independent dataset tests showed that MPSite is able to achieve robust performance on S- and T-phosphorylation site prediction. It also outperformed other existing methods on the comprehensive independent datasets. We anticipate that MPSite is a powerful tool for proteome-wide prediction of microbial phosphorylation sites and facilitates hypothesis-driven functional interrogation of phosphorylated proteins. A web application with the curated datasets is freely available at http://kurata14.bio.kyutech.ac.jp/MPSite/.
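The feature-optimization step described in this abstract (ranking features with a Wilcoxon rank sum test before training a classifier) can be sketched as follows. This is an illustrative, standard-library reconstruction and not the MPSite code: the helper names, the use of the normal approximation without tie correction, and the `top_k` cut-off are all assumptions. The random forest stage is omitted.

```python
import math


def midranks(values):
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_z(pos, neg):
    """Wilcoxon rank-sum z-score (normal approximation, no tie correction)."""
    n1, n2 = len(pos), len(neg)
    ranks = midranks(list(pos) + list(neg))
    w = sum(ranks[:n1])  # rank sum of the positive class
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean) / sd


def select_features(X, y, top_k):
    """Indices of the `top_k` features with the largest |z| between classes."""
    scores = []
    for f in range(len(X[0])):
        pos = [row[f] for row, label in zip(X, y) if label == 1]
        neg = [row[f] for row, label in zip(X, y) if label == 0]
        scores.append(abs(rank_sum_z(pos, neg)))
    return sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)[:top_k]


# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[0.9, 0.5], [0.8, 0.1], [0.7, 0.9], [0.2, 0.4], [0.1, 0.8], [0.3, 0.2]]
y = [1, 1, 1, 0, 0, 0]
selected = select_features(X, y, top_k=1)
```

The selected column indices would then be used to subset the feature matrix before fitting the classifier. In practice a library routine such as `scipy.stats.ranksums` would likely replace the hand-written statistic.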
Affiliation(s)
- Md Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan
| | - Md Mamunur Rashid
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan
| | - Mst Shamima Khatun
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan
| | - Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan
- Biomedical Informatics R&D Center, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502, Japan
|
34
|
Ning Q, Ma Z, Zhao X. dForml(KNN)-PseAAC: Detecting formylation sites from protein sequences using K-nearest neighbor algorithm via Chou's 5-step rule and pseudo components. J Theor Biol 2019; 470:43-49. [DOI: 10.1016/j.jtbi.2019.03.011] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 02/27/2019] [Revised: 03/09/2019] [Accepted: 03/13/2019] [Indexed: 10/27/2022]
|
35
|
Khatun MS, Hasan MM, Kurata H. PreAIP: Computational Prediction of Anti-inflammatory Peptides by Integrating Multiple Complementary Features. Front Genet 2019; 10:129. [PMID: 30891059 PMCID: PMC6411759 DOI: 10.3389/fgene.2019.00129] [Citation(s) in RCA: 76] [Impact Index Per Article: 15.2] [Received: 07/25/2018] [Accepted: 02/06/2019] [Indexed: 12/31/2022] Open
Abstract
The treatment of numerous inflammatory diseases and autoimmune disorders by therapeutic peptides has received substantial consideration; however, the exploration of anti-inflammatory peptides via biological experiments is often a time-consuming and expensive task. The development of novel in silico predictors is desired to classify potential anti-inflammatory peptides prior to in vitro investigation. Herein, an accurate predictor, called PreAIP (Predictor of Anti-Inflammatory Peptides), was developed by integrating multiple complementary features. We systematically investigated different types of features including primary sequence, evolutionary and structural information through a random forest classifier. The final PreAIP model achieved an AUC value of 0.833 on the training dataset via a 10-fold cross-validation test, which was better than that of existing models. Moreover, we assessed the performance of PreAIP with an AUC value of 0.840 on a test dataset to demonstrate that the proposed method outperformed the two existing methods. These results indicated that PreAIP is an accurate predictor for identifying AIPs and contributes to the development of AIP therapeutics and biomedical research. The curated datasets and PreAIP are freely available at http://kurata14.bio.kyutech.ac.jp/PreAIP/.
Affiliation(s)
- Mst Shamima Khatun
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Fukuoka, Japan
| | - Md Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Fukuoka, Japan
| | - Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Fukuoka, Japan
- Biomedical Informatics R&D Center, Kyushu Institute of Technology, Fukuoka, Japan
|
36
|
Hasan MM, Khatun MS, Kurata H. Large-Scale Assessment of Bioinformatics Tools for Lysine Succinylation Sites. Cells 2019; 8:cells8020095. [PMID: 30696115 PMCID: PMC6406724 DOI: 10.3390/cells8020095] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 12/26/2018] [Revised: 01/24/2019] [Accepted: 01/24/2019] [Indexed: 12/19/2022] Open
Abstract
Lysine succinylation is a form of posttranslational modification of proteins that plays an essential functional role in every aspect of cell metabolism in both prokaryotes and eukaryotes. Aside from experimental identification of succinylation sites, there has been an intense effort geared towards the development of sequence-based prediction through machine learning, due to its promising and essential properties of being highly accurate, robust and cost-effective. In spite of these advantages, there are several problems that are in need of attention in the design and development of succinylation site predictors. Notwithstanding the many studies on the employment of machine learning approaches, few articles have examined this bioinformatics field in a systematic manner. Thus, we review the advancements regarding the current state-of-the-art prediction models, datasets, and online resources and illustrate the challenges and limitations to present a useful guideline for developing powerful succinylation site prediction tools.
Affiliation(s)
- Md Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan.
| | - Mst Shamima Khatun
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan.
| | - Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan.
- Biomedical Informatics R&D Center, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan.
|
37
|
Hasan MM, Manavalan B, Khatun MS, Kurata H. Prediction of S-nitrosylation sites by integrating support vector machines and random forest. Mol Omics 2019; 15:451-458. [DOI: 10.1039/c9mo00098d] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Indexed: 12/31/2022]
Abstract
Cysteine S-nitrosylation is a type of reversible post-translational modification of proteins, which controls diverse biological processes.
Affiliation(s)
- Md. Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Japan
- Japan Society for the Promotion of Science
| | | | - Mst. Shamima Khatun
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Japan
| | - Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Japan
- Biomedical Informatics R&D Center
|
38
|
Hasan MM, Kurata H. GPSuc: Global Prediction of Generic and Species-specific Succinylation Sites by aggregating multiple sequence features. PLoS One 2018; 13:e0200283. [PMID: 30312302 PMCID: PMC6193575 DOI: 10.1371/journal.pone.0200283] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Received: 03/15/2018] [Accepted: 06/22/2018] [Indexed: 01/09/2023] Open
Abstract
Lysine succinylation is one of the dominant post-translational modifications of proteins and contributes to many biological processes including cell cycle, growth and signal transduction pathways. Identification of succinylation sites is an important step for understanding the function of proteins. The complicated sequence patterns of protein succinylation revealed by proteomic studies highlight the necessity of developing effective species-specific in silico strategies for global prediction of succinylation sites. Here we have developed generic and nine species-specific succinylation site classifiers by aggregating multiple complementary features. We optimized the consecutive features using the Wilcoxon-rank feature selection scheme. The final feature vectors were trained by a random forest (RF) classifier. With an integration of RF scores via logistic regression, the resulting predictor, termed GPSuc, achieved better performance than other existing generic and species-specific succinylation site predictors. Our predictor serves as a valuable resource for revealing the mechanism of succinylation and assisting hypothesis-driven experimental design. To provide a promising performance in large-scale datasets, a web application was developed at http://kurata14.bio.kyutech.ac.jp/GPSuc/.
Affiliation(s)
- Md. Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Kawazu, Iizuka, Fukuoka, Japan
| | - Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Kawazu, Iizuka, Fukuoka, Japan
- Biomedical Informatics R&D Center, Kyushu Institute of Technology, Kawazu, Iizuka, Fukuoka, Japan
|
39
|
Hasan MM, Khatun MS, Mollah MNH, Yong C, Dianjing G. NTyroSite: Computational Identification of Protein Nitrotyrosine Sites Using Sequence Evolutionary Features. Molecules 2018; 23:E1667. [PMID: 29987232 PMCID: PMC6099560 DOI: 10.3390/molecules23071667] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 05/22/2018] [Revised: 06/28/2018] [Accepted: 06/28/2018] [Indexed: 02/06/2023] Open
Abstract
Nitrotyrosine is a product of tyrosine nitration mediated by reactive nitrogen species. As an indicator of cell damage and inflammation, protein nitrotyrosine serves to reveal biological change associated with various diseases or oxidative stress. Accurate identification of nitrotyrosine sites provides an important foundation for further elucidating the mechanism of protein nitrotyrosination. However, experimental identification of nitrotyrosine sites through traditional methods is laborious and expensive. In silico prediction of nitrotyrosine sites based on protein sequence information is thus highly desired. Here, we report a novel predictor, NTyroSite, for accurate prediction of nitrotyrosine sites using sequence evolutionary information. The generated features were optimized using a Wilcoxon rank sum test. A random forest classifier was then trained using these features to build the predictor. The final NTyroSite predictor achieved an area under the receiver operating characteristic curve (AUC) score of 0.904 in a 10-fold cross-validation test. It also significantly outperformed other existing implementations in an independent test. Meanwhile, for a better understanding of our prediction model, the predominant rules and informative features were extracted from the NTyroSite model to explain the prediction results. We expect that the NTyroSite predictor may serve as a useful computational resource for high-throughput nitrotyrosine site prediction. The online interface of the software is publicly available at https://biocomputer.bio.cuhk.edu.hk/NTyroSite/.
Affiliation(s)
- Md Mehedi Hasan
- School of Life Sciences and the State Key Lab of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, Hong Kong.
| | - Mst Shamima Khatun
- Laboratory of Bioinformatics, Department of Statistics, University of Rajshahi, Rajshahi 6205, Bangladesh.
| | - Md Nurul Haque Mollah
- Laboratory of Bioinformatics, Department of Statistics, University of Rajshahi, Rajshahi 6205, Bangladesh.
| | - Cao Yong
- Department of Mechanical Engineering and Automation, Harbin Institute of Technology, Shenzhen Graduate School, Shenzhen 518000, China.
| | - Guo Dianjing
- School of Life Sciences and the State Key Lab of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, Hong Kong.
|
40
|
Hasan MM, Guo D, Kurata H. Computational identification of protein S-sulfenylation sites by incorporating the multiple sequence features information. Mol Biosyst 2017; 13:2545-2550. [DOI: 10.1039/c7mb00491e] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Indexed: 12/13/2022]
Abstract
Cysteine S-sulfenylation is a major type of posttranslational modification that contributes to protein structure and function regulation in many cellular processes.
Affiliation(s)
- Md. Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Japan
| | - Dianjing Guo
- School of Life Sciences and the State Key Lab of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, Hong Kong
| | - Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Iizuka, Japan
- Biomedical Informatics R&D Center
|