1
Kumari S, Gupta R, Ambasta RK, Kumar P. Emerging trends in post-translational modification: Shedding light on Glioblastoma multiforme. Biochim Biophys Acta Rev Cancer 2023; 1878:188999. [PMID: 37858622 DOI: 10.1016/j.bbcan.2023.188999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 10/06/2023] [Accepted: 10/06/2023] [Indexed: 10/21/2023]
Abstract
Recent multi-omics studies, including proteomics, transcriptomics, genomics, and metabolomics, have revealed the critical role of post-translational modifications (PTMs) in the progression and pathogenesis of Glioblastoma multiforme (GBM). Further, PTMs alter oncogenic signaling events and offer a novel avenue in GBM therapeutics research through PTM enzymes as potential biomarkers for drug targeting. In addition, PTMs are critical regulators of chromatin architecture, gene expression, and the tumor microenvironment (TME), which play a crucial role in tumorigenesis. Moreover, the implementation of artificial intelligence and machine learning algorithms enhances GBM therapeutics research through the identification of novel PTM enzymes and residues. Herein, we briefly explain the mechanism of protein modifications in GBM etiology and in altering the biology of GBM cells through chromatin remodeling, modulation of the TME, and signaling pathways. In addition, we highlight the importance of PTM enzymes as therapeutic biomarkers and the role of artificial intelligence and machine learning in protein PTM prediction.
Affiliation(s)
- Smita Kumari
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, India
- Rohan Gupta
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, India; School of Medicine, University of South Carolina, Columbia, SC, United States of America
- Rashmi K Ambasta
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, India; Department of Biotechnology and Microbiology, SRM University, Sonepat, Haryana, India
- Pravir Kumar
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, India
2
Gao W, Xu D, Li H, Du J, Wang G, Li D. Identification of adaptor proteins by incorporating deep learning and PSSM profiles. Methods 2023; 209:10-17. [PMID: 36427763 DOI: 10.1016/j.ymeth.2022.11.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/13/2022] [Revised: 10/25/2022] [Accepted: 11/02/2022] [Indexed: 11/23/2022]
Abstract
Adaptor proteins, also known as signal transduction adaptor proteins, are important proteins in signal transduction pathways and connect signal proteins for signal transduction between cells. Studies have shown that adaptor proteins are closely related to some diseases, such as tumors and diabetes. Therefore, it is very meaningful to construct a model that accurately identifies adaptor proteins. In recent years, many studies have used a position-specific scoring matrix (PSSM) and neural network methods to identify adaptor proteins. However, ordinary neural network models cannot correlate the contextual information in PSSM profiles well, so these studies usually reduce the 20×N (N > 20) PSSM to 20×20 dimensions, which results in the loss of a large amount of protein information. This research proposes an efficient method that combines a one-dimensional convolutional network (1-D CNN) and a bidirectional long short-term memory network (biLSTM) to identify adaptor proteins. The complete PSSM profiles are the input of the model, and the complete information of the protein is retained during the training process. We perform cross-validation during model training and test the performance of the model on an independent test set. On the data set with 1224 adaptor proteins and 11,078 non-adaptor proteins, five indicators, including specificity, sensitivity, accuracy, area under the receiver operating characteristic curve (AUC) and Matthews correlation coefficient (MCC), were employed to evaluate model performance. On the independent test set, the specificity, sensitivity, accuracy and MCC were 0.817, 0.865, 0.823 and 0.465, respectively. These results show that our method is better than the state-of-the-art methods. This study is committed to improving the accuracy of adaptor protein identification and lays a foundation for further research on diseases related to adaptor proteins. This research provides a new idea for the application of deep learning models in bioinformatics and computational biology.
Affiliation(s)
- Wentao Gao
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin 150000, China
- Dali Xu
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin 150000, China
- Hongfei Li
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin 150000, China
- Junping Du
  - Beijing Key Laboratory of Intelligent Telecommunication Software and Multimedia, School of Computer Science, Beijing University of Posts and Telecommunications, Beijing, 100876, China
- Guohua Wang
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin 150000, China
- Dan Li
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin 150000, China
3
Sohrawordi M, Hossain MA, Hasan MAM. PLP_FS: prediction of lysine phosphoglycerylation sites in protein using support vector machine and fusion of multiple F_Score feature selection. Brief Bioinform 2022; 23:6655632. [PMID: 35929355 DOI: 10.1093/bib/bbac306] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/18/2022] [Revised: 07/05/2022] [Accepted: 07/06/2022] [Indexed: 11/14/2022]
Abstract
A recently discovered post-translational modification (PTM), phosphoglycerylation, has been shown to play an essential role in the structure and functional properties of proteins and in serious human diseases. Hence, understanding the molecular mechanism behind phosphoglycerylation is urgently needed to develop drugs for related diseases. However, accurately identifying phosphoglycerylation sites in a protein sequence in the laboratory is a difficult and challenging task, so an efficient computational model is greatly sought for this purpose. The few computational models currently available for identifying phosphoglycerylation sites do not reach a satisfactory level of predictive performance. Therefore, an effective predictor named PLP_FS has been designed and constructed in this study to identify phosphoglycerylation sites. For training, an optimal feature set was obtained by fusing multiple F_Score feature selection techniques applied to the features generated by three types of sequence-based feature extraction methods, and was fitted with a support vector machine classifier to build the prediction model. In addition, the k-neighbor near cleaning and SMOTE methods were implemented to balance the benchmark dataset. In 10-fold cross-validation, the suggested model obtained an accuracy of 99.22%, a sensitivity of 98.17% and a specificity of 99.75%, which are better than those of other currently available predictors for identifying phosphoglycerylation sites.
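As a rough illustration of the F-score ranking this abstract refers to, the sketch below scores each feature with the commonly used two-class F-score (between-class separation of the means divided by the sum of within-class sample variances) and sorts features by it. This is an assumption about the general technique only: the abstract does not specify how PLP_FS fuses multiple F_Score selections, and the function names and toy data here are hypothetical.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """Two-class F-score of one feature from its positive and negative values."""
    overall = mean(pos + neg)
    # Numerator: how far each class mean lies from the overall mean.
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    # Denominator: sample variances (n - 1 denominator) within each class.
    within = variance(pos) + variance(neg)
    return between / within if within > 0 else 0.0


def rank_features(x_pos, x_neg):
    """Return (feature indices sorted by descending F-score, raw scores)."""
    n_features = len(x_pos[0])
    scores = [
        f_score([row[j] for row in x_pos], [row[j] for row in x_neg])
        for j in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
x_pos = [[0.9, 0.5], [0.8, 0.1], [1.0, 0.9]]
x_neg = [[0.1, 0.4], [0.2, 0.2], [0.0, 0.8]]
order, scores = rank_features(x_pos, x_neg)
print(order, scores)
```

A selection step would then keep the top-ranked indices in `order` before training the classifier; how many to keep is a tuning choice not covered here.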
Affiliation(s)
- Md Sohrawordi
  - Dept. of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
  - Dept. of Computer Science and Engineering, Hajee Mohammad Danesh Science and Technology University, Dinajpur, Bangladesh
- Md Ali Hossain
  - Dept. of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
- Md Al Mehedi Hasan
  - Dept. of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
4
Lu W, Shen J, Zhang Y, Wu H, Qian Y, Chen X, Fu Q. Identifying Membrane Protein Types Based on Lifelong Learning With Dynamically Scalable Networks. Front Genet 2022; 12:834488. [PMID: 35371189 PMCID: PMC8964460 DOI: 10.3389/fgene.2021.834488] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/13/2021] [Accepted: 12/21/2021] [Indexed: 11/13/2022]
Abstract
Membrane proteins are an essential part of the body's ability to maintain normal life activities. Further research into membrane proteins, which are present in all aspects of life science research, will help to advance the development of cells and drugs. The current methods for predicting proteins are usually based on machine learning, but further improvements in prediction effectiveness and accuracy are needed. In this paper, we propose a dynamic deep network architecture based on lifelong learning in order to classify membrane proteins more effectively. The model extends the application area of lifelong learning and provides new ideas for multi-class classification problems in bioinformatics. To demonstrate the performance of our model, we conducted experiments on two datasets and compared the results with those of other classification methods. The results show that our model achieves high accuracy (95.3% and 93.5%) on benchmark datasets and is more effective than other methods.
Affiliation(s)
- Weizhong Lu
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China; Suzhou Key Laboratory of Virtual Reality Intelligent Interaction and Application Technology, Suzhou University of Science and Technology, Suzhou, China; Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou, China
- Jiawei Shen
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Yu Zhang
  - Suzhou Industrial Park Institute of Services Outsourcing, Suzhou, China
- Hongjie Wu
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China; Suzhou Key Laboratory of Virtual Reality Intelligent Interaction and Application Technology, Suzhou University of Science and Technology, Suzhou, China
- Yuqing Qian
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Xiaoyi Chen
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Qiming Fu
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
5
Ning Q, Ma Z, Zhao X, Yin M. SSKM_Succ: A Novel Succinylation Sites Prediction Method Incorporating K-Means Clustering With a New Semi-Supervised Learning Algorithm. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:643-652. [PMID: 32750881 DOI: 10.1109/tcbb.2020.3006144] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 06/11/2023]
Abstract
Protein succinylation is a type of post-translational modification (PTM) that occurs on lysine sites and plays a key role in protein conformation regulation and cellular function control. When training a computational method, it is difficult to designate negative samples because of the uncertainty of non-succinylation lysine sites, and if not handled properly, this may affect the performance of computational models dramatically. Therefore, we propose a new semi-supervised learning method to identify reliable non-succinylation lysine sites as negative samples. This method, named SSKM_Succ, also employs K-means clustering to divide the data into 5 clusters. In addition, information on proximal PTMs and three kinds of sequence features (grey pseudo amino acid composition, K-space and position-specific amino acid propensity) are used to represent each protein. Then, we perform a two-step feature selection to remove redundant features and construct the optimization model for each cluster. Finally, a support vector machine is applied to construct a prediction model for each cluster. This method achieves an accuracy of 80.18 percent for succinylation sites on the independent testing dataset. We also compare the result with other existing tools, and the comparison shows that our method is promising for predicting succinylation sites. Through analysis, we further verify that succinylated proteins have potential effects on amino acid degradation and fatty acid metabolism, and speculate that protein succinylation may be closely related to neurodegenerative diseases. The code of SSKM_Succ is available on the web https://github.com/yangyq505/SSKM_Succ.git.
6
RAM-PGK: Prediction of Lysine Phosphoglycerylation Based on Residue Adjacency Matrix. Genes (Basel) 2020; 11:genes11121524. [PMID: 33419274 PMCID: PMC7766696 DOI: 10.3390/genes11121524] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/30/2020] [Revised: 12/15/2020] [Accepted: 12/16/2020] [Indexed: 11/29/2022]
Abstract
Background: Post-translational modification (PTM) is a biological process that is associated with the modification of the proteome, which results in the alteration of normal cell biology and pathogenesis. There have been numerous PTM reports in recent years, of which lysine phosphoglycerylation has emerged as one of the recent developments. The traditional methods of identifying phosphoglycerylated residues, which are experimental procedures such as mass spectrometry, have proven to be time-consuming and cost-inefficient, despite the abundance of proteins being sequenced in this post-genomic era. Due to these drawbacks, computational techniques are being sought to establish an effective identification system for phosphoglycerylated lysine residues. The development of a predictor for phosphoglycerylation is not a first, but it is necessary because the latest predictor falls short in adequately detecting phosphoglycerylated and non-phosphoglycerylated lysine residues. Results: In this work, we introduce a new predictor named RAM-PGK, which uses sequence-based information relating to amino acid residues to predict phosphoglycerylated and non-phosphoglycerylated sites. A benchmark dataset was employed for this purpose, which contained experimentally identified phosphoglycerylated and non-phosphoglycerylated lysine residues. From the dataset, we extracted the residue adjacency matrix pertaining to each lysine residue in the protein sequences and converted these matrices into feature vectors, which are used to build the phosphoglycerylation predictor. Conclusion: RAM-PGK, which is based on sequential features and support vector machine classifiers, has shown a noteworthy improvement in performance in comparison to some of the recent prediction methods. The performance metrics of the RAM-PGK predictor are: 0.5741 sensitivity, 0.6436 specificity, 0.0531 precision, 0.6414 accuracy, and 0.0824 Matthews correlation coefficient.
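The five metrics reported in this abstract all derive from the confusion matrix of a binary classifier. The sketch below shows the standard definitions; the counts passed in at the end are hypothetical values chosen for illustration and are not taken from the paper or its dataset.

```python
import math


def _ratio(num, den):
    # Return 0.0 instead of raising when a class or prediction set is empty.
    return num / den if den else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": _ratio(tp, tp + fn),          # recall on positives
        "specificity": _ratio(tn, tn + fp),          # recall on negatives
        "precision": _ratio(tp, tp + fp),            # positive predictive value
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "mcc": _ratio(tp * tn - fp * fn, mcc_den),   # Matthews correlation
    }


# Hypothetical counts for illustration only:
# 100 modified and 3000 unmodified lysine sites.
m = binary_metrics(tp=60, fp=1050, tn=1950, fn=40)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

With these made-up counts, sensitivity and specificity both sit near 0.6 while precision stays near 0.05, a pattern that arises when negatives greatly outnumber positives; this is one reason MCC is often reported alongside accuracy for PTM-site predictors.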
7
Huang KY, Hung FY, Kao HJ, Lau HH, Weng SL. iDPGK: characterization and identification of lysine phosphoglycerylation sites based on sequence-based features. BMC Bioinformatics 2020; 21:568. [PMID: 33297954 PMCID: PMC7727188 DOI: 10.1186/s12859-020-03916-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/10/2020] [Accepted: 11/30/2020] [Indexed: 11/24/2022]
Abstract
Background: Protein phosphoglycerylation, the addition of a 1,3-bisphosphoglyceric acid (1,3-BPG) to a lysine residue of a protein to form a 3-phosphoglyceryl-lysine, is a reversible and non-enzymatic post-translational modification (PTM) that plays a regulatory role in glucose metabolism and the glycolytic process. As the number of experimentally verified phosphoglycerylated sites has increased significantly, statistical or machine learning methods are imperative for investigating the characteristics of phosphoglycerylation sites. Currently, research into phosphoglycerylation is very limited, and only a few resources are available for the computational identification of phosphoglycerylation sites.
Result: We present a bioinformatics investigation of phosphoglycerylation sites based on sequence-based features. The TwoSampleLogo analysis reveals that the regions surrounding the phosphoglycerylation sites contain a relatively high proportion of positively charged amino acids, especially in the upstream flanking region. Additionally, according to the results of PTM-Logo, non-polar and aliphatic amino acids are more abundant around phosphoglycerylated lysines, which may play a functional role in discriminating between phosphoglycerylation and non-phosphoglycerylation sites. Many types of features were adopted to build the prediction model on the training dataset, including amino acid composition, amino acid pair composition, positional weighted matrix and position-specific scoring matrix. Further, to improve the predictive power, the top features ranked by F-score were taken as the final combination for classification, and the predictive models were trained using DT, RF and SVM classifiers. Evaluation by five-fold cross-validation showed that the selected features were most effective in discriminating between phosphoglycerylated and non-phosphoglycerylated sites. Conclusion: The SVM model trained with the selected sequence-based features performed well, with a sensitivity of 77.5%, a specificity of 73.6%, an accuracy of 74.9%, and a Matthews Correlation Coefficient value of 0.49. Furthermore, the model performs consistently on the independent testing set, yielding a sensitivity of 75.7% and a specificity of 64.9%. Finally, the model has been implemented as a web-based system, namely iDPGK, which is now freely available at http://mer.hc.mmh.org.tw/iDPGK/.
Affiliation(s)
- Kai-Yao Huang
  - Department of Medical Research, Hsinchu Mackay Memorial Hospital, Hsinchu City 300, Taiwan; Department of Medicine, Mackay Medical College, New Taipei City 252, Taiwan
- Fang-Yu Hung
  - Department of Obstetrics and Gynecology, Hsinchu Mackay Memorial Hospital, Hsinchu City 300, Taiwan
- Hui-Ju Kao
  - Department of Medical Research, Hsinchu Mackay Memorial Hospital, Hsinchu City 300, Taiwan
- Hui-Hsuan Lau
  - Department of Medicine, Mackay Medical College, New Taipei City 252, Taiwan; Department of Obstetrics and Gynecology, Hsinchu Mackay Memorial Hospital, Hsinchu City 300, Taiwan; Department of Obstetrics and Gynecology, Mackay Memorial Hospital, Taipei City 104, Taiwan
- Shun-Long Weng
  - Department of Medicine, Mackay Medical College, New Taipei City 252, Taiwan; Department of Obstetrics and Gynecology, Hsinchu Mackay Memorial Hospital, Hsinchu City 300, Taiwan; Mackay Junior College of Medicine, Medicine, Nursing and Management College, Taipei City 112, Taiwan
8
Sharma R, Kumar S, Tsunoda T, Kumarevel T, Sharma A. Single-stranded and double-stranded DNA-binding protein prediction using HMM profiles. Anal Biochem 2020; 612:113954. [PMID: 32946833 DOI: 10.1016/j.ab.2020.113954] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/16/2020] [Revised: 08/26/2020] [Accepted: 09/10/2020] [Indexed: 10/23/2022]
Abstract
BACKGROUND DNA-binding proteins perform important roles in cellular processes and are involved in many biological activities. These proteins include crucial protein-DNA binding domains and can interact with single-stranded or double-stranded DNA, and are accordingly classified as single-stranded DNA-binding proteins (SSBs) or double-stranded DNA-binding proteins (DSBs). Computational prediction of SSBs and DSBs helps in annotating protein functions and understanding protein-binding domains. RESULTS Performance is reported using the DNA-binding protein dataset that was recently introduced by Wang et al. [1]. The proposed method achieved a sensitivity of 0.600, specificity of 0.792, AUC of 0.758, MCC of 0.369, accuracy of 0.744, and F-measure of 0.536 on the independent test set. CONCLUSION The proposed method, which uses hidden Markov model (HMM) profiles for feature extraction, outperformed the benchmark method in the literature and achieved an overall improvement of approximately 3%. The source code and supplementary information of the proposed method are available at https://github.com/roneshsharma/Predict-DNA-binding-proteins/wiki.
Affiliation(s)
- Ronesh Sharma
  - School of Electrical and Electronics Engineering, Fiji National University, Suva, Fiji
- Shiu Kumar
  - School of Electrical and Electronics Engineering, Fiji National University, Suva, Fiji
- Tatsuhiko Tsunoda
  - Laboratory of Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan; Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan; Laboratory of Medical Science Mathematics, Department of Biological Sciences, Graduate School of Science, University of Tokyo, Tokyo, 113-0033, Japan
- Thirumananseri Kumarevel
  - Laboratory for Transcription Structural Biology, RIKEN Center for Biosystems Dynamics Research, 1-7-22 Suehiro, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- Alok Sharma
  - Laboratory of Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan; Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan; School of Engineering and Physics, The University of the South Pacific, Suva, Fiji; Institute for Integrated and Intelligent Systems, Griffith University, Nathan, Brisbane, QLD, Australia
9
Identification of membrane protein types via multivariate information fusion with Hilbert–Schmidt Independence Criterion. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2019.11.103] [Citation(s) in RCA: 88] [Impact Index Per Article: 22.0] [Indexed: 11/24/2022]
10
López Y, Dehzangi A, Reddy HM, Sharma A. C-iSUMO: A sumoylation site predictor that incorporates intrinsic characteristics of amino acid sequences. Comput Biol Chem 2020; 87:107235. [PMID: 32604027 DOI: 10.1016/j.compbiolchem.2020.107235] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/27/2018] [Revised: 12/16/2019] [Accepted: 02/12/2020] [Indexed: 12/13/2022]
Abstract
Post-translational modifications are considered important molecular interactions in protein science. One of these modifications is "sumoylation", whose computational detection has recently become a challenge. In this paper, we propose a new computational predictor which makes use of the sine and cosine of backbone torsion angles and the accessible surface area for predicting sumoylation sites. These features were computed for all the proteins in our benchmark dataset, and a training matrix consisting of sumoylation and non-sumoylation sites was ultimately created. This training matrix was balanced by undersampling the majority class (non-sumoylation sites) using the NearMiss method. Finally, an AdaBoost classifier was used for discriminating between sumoylation and non-sumoylation sites. Our predictor was called "C-iSumo" because of its effective use of circular functions. C-iSumo outperformed another predictor in statistical metrics such as sensitivity (0.734), accuracy (0.746) and Matthews correlation coefficient (0.494).
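To illustrate why the sine and cosine of torsion angles are useful features, the sketch below encodes a (phi, psi) pair as four circular values. Raw angles are discontinuous at ±180°, whereas the sine/cosine encoding places nearly identical conformations close together. The function name and the assumption that angles are given in degrees are hypothetical and not taken from the paper.

```python
import math


def encode_torsion(phi_deg, psi_deg):
    """Map backbone torsion angles (degrees) to [sin phi, cos phi, sin psi, cos psi]."""
    phi, psi = math.radians(phi_deg), math.radians(psi_deg)
    return [math.sin(phi), math.cos(phi), math.sin(psi), math.cos(psi)]


# Two phi values that are only 2 degrees apart on the circle
# but 358 apart as raw numbers.
a = encode_torsion(179.0, 0.0)
b = encode_torsion(-179.0, 0.0)

raw_gap = abs(179.0 - (-179.0))   # misleadingly large
encoded_gap = math.dist(a, b)     # small, reflecting the true similarity
print(raw_gap, encoded_gap)
```

Each residue in a window around a candidate lysine would contribute such a vector, optionally together with its accessible surface area, before class balancing and classification.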
Affiliation(s)
- Yosvany López
  - Genesis Institute of Genetic Research, Genesis Healthcare Co., Tokyo, Japan
- Abdollah Dehzangi
  - Department of Computer Science, Morgan State University, Baltimore, Maryland, USA
- Alok Sharma
  - School of Engineering and Physics, University of the South Pacific, Suva, Fiji; Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan; Institute for Integrated and Intelligent Systems, Griffith University, Queensland, Australia
11
Khanh Le NQ, Nguyen QH, Chen X, Rahardja S, Nguyen BP. Classification of adaptor proteins using recurrent neural networks and PSSM profiles. BMC Genomics 2019; 20:966. [PMID: 31874633 PMCID: PMC6929330 DOI: 10.1186/s12864-019-6335-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 10/24/2019] [Accepted: 11/25/2019] [Indexed: 12/16/2022]
Abstract
BACKGROUND Adaptor proteins are carrier proteins that play a crucial role in signal transduction. They commonly consist of several modular domains, each having its own binding activity and operating by forming complexes with other intracellular-signaling molecules. Many studies have implicated adaptor proteins in a variety of human diseases. Therefore, creating a precise model to predict the function of adaptor proteins is one of the vital tasks in bioinformatics and computational biology. Few computational biology studies have been conducted to predict protein functions, and in most of those studies, position specific scoring matrix (PSSM) profiles were used as the features fed into the neural networks. However, the neural networks could not reach the optimal result because the sequential information in the PSSMs was lost. This study proposes an innovative approach that incorporates recurrent neural networks (RNNs) and PSSM profiles to resolve this problem. RESULTS Compared to other state-of-the-art methods which have been applied successfully to other problems, our method achieves improvements in all of the common measurement metrics. The area under the receiver operating characteristic curve (AUC) for the prediction of adaptor proteins is 0.893 and 0.853 on the cross-validation and independent datasets, respectively. CONCLUSIONS This study opens a research path that can promote the use of RNNs and PSSM profiles in bioinformatics and computational biology. Our approach is reproducible by scientists who aim to improve the performance of different protein function prediction problems. Our source code and datasets are available at https://github.com/ngphubinh/adaptors.
Affiliation(s)
- Nguyen Quoc Khanh Le
  - Professional Master Program in Artificial Intelligence in Medicine, Taipei Medical University, Keelung Road, Da'an District, Taipei City 106, Taiwan (R.O.C.)
- Quang H Nguyen
  - School of Information and Communication Technology, Hanoi University of Science and Technology, 1 Dai Co Viet, Hanoi 100000, Vietnam
- Xuan Chen
  - Beijing Genomics Institute, 21 Hongan 3rd Street, Shenzhen 518083, China
- Susanto Rahardja
  - School of Marine Science and Technology, Northwestern Polytechnical University, 127 West Youyi Road, Xi'an 710072, China
- Binh P Nguyen
  - School of Mathematics and Statistics, Victoria University of Wellington, Gate 7, Kelburn Parade, Wellington 6140, New Zealand
12
Chandra A, Sharma A, Dehzangi A, Shigemizu D, Tsunoda T. Bigram-PGK: phosphoglycerylation prediction using the technique of bigram probabilities of position specific scoring matrix. BMC Mol Cell Biol 2019; 20:57. [PMID: 31856704 PMCID: PMC6923822 DOI: 10.1186/s12860-019-0240-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 11/03/2019] [Accepted: 11/20/2019] [Indexed: 12/20/2022]
Abstract
BACKGROUND The biological process known as post-translational modification (PTM) modifies the proteome, which affects normal cell biology and hence pathogenesis. A number of PTMs have been discovered in recent years, and lysine phosphoglycerylation is one of the fairly recent developments. Even with a large number of proteins being sequenced in the post-genomic era, the identification of phosphoglycerylation remains a big challenge due to factors such as the cost, time consumption and inefficiency of experimental efforts. To overcome this issue, computational techniques have emerged to accurately identify phosphoglycerylated lysine residues. However, the computational techniques proposed so far have limitations in correctly predicting this covalent modification. RESULTS We propose a new predictor in this paper called Bigram-PGK, which uses evolutionary information of amino acids to predict phosphoglycerylated sites. The benchmark dataset, which contains experimentally labelled sites, is employed for this purpose, and profile bigram occurrences are calculated from position specific scoring matrices of amino acids in the protein sequences. The statistical measures of this work, namely sensitivity, specificity, precision, accuracy, Matthews correlation coefficient and area under ROC curve, are reported to be 0.9642, 0.8973, 0.8253, 0.9193, 0.8330 and 0.9306, respectively. CONCLUSIONS The proposed predictor, based on evolutionary information and a support vector machine classifier, has shown great potential to effectively predict phosphoglycerylated and non-phosphoglycerylated lysine residues when compared against the existing predictors. The data and software of this work can be acquired from https://github.com/abelavit/Bigram-PGK.
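The profile bigram feature mentioned in this abstract can be sketched as follows, assuming the commonly used definition in which entry (m, n) sums, over consecutive positions i, the product of the PSSM value for amino acid m at position i and for amino acid n at position i + 1. For a 20-column PSSM this gives a fixed-length vector of 400 features regardless of sequence length. Whether the paper normalizes the PSSM first, or restricts it to a window around each lysine, is not stated in the abstract, so the function below is a hypothetical illustration only.

```python
def profile_bigram(pssm):
    """Flattened bigram matrix of a PSSM given as a list of equal-length rows.

    pssm[i][m] is the score (or probability) of amino acid m at position i.
    Returns a list of n * n features, where n is the number of columns.
    """
    n = len(pssm[0])
    bigram = [[0.0] * n for _ in range(n)]
    # Walk over consecutive position pairs (i, i + 1).
    for cur, nxt in zip(pssm, pssm[1:]):
        for m in range(n):
            for k in range(n):
                bigram[m][k] += cur[m] * nxt[k]
    # Row-major flattening: feature index = m * n + k.
    return [value for row in bigram for value in row]


# Toy 3-column "PSSM" (a real one has 20 columns): the profile moves
# from residue type 0 to type 1 to type 2 along the sequence.
toy = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
feats = profile_bigram(toy)
print(feats)
```

The resulting vectors can be passed directly to a classifier such as a support vector machine, since every site yields the same number of features.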
Affiliation(s)
- Abel Chandra
  - School of Engineering and Physics, Faculty of Science Technology and Environment, University of the South Pacific, Suva, Fiji
- Alok Sharma
  - School of Engineering and Physics, Faculty of Science Technology and Environment, University of the South Pacific, Suva, Fiji; Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, QLD, 4111, Australia; Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan; Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, 230-0045, Japan; CREST, JST, Tokyo, 102-8666, Japan
- Abdollah Dehzangi
  - Department of Computer Science, Morgan State University, Baltimore, MD, USA
- Daichi Shigemizu
  - Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan; Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, 230-0045, Japan; CREST, JST, Tokyo, 102-8666, Japan; Medical Genome Center, National Center for Geriatrics and Gerontology, Obu, Aichi, 474-8511, Japan
- Tatsuhiko Tsunoda
  - Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan; Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, 230-0045, Japan; CREST, JST, Tokyo, 102-8666, Japan; Laboratory for Medical Science Mathematics, Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, 108-8639, Japan
13
Ahmad S, Gromiha MM, Raghava GPS, Schönbach C, Ranganathan S. APBioNet's annual International Conference on Bioinformatics (InCoB) returns to India in 2018. BMC Genomics 2019; 19:266. [PMID: 30999857 PMCID: PMC7402400 DOI: 10.1186/s12864-019-5582-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/12/2023]
Abstract
InCoB, one of the largest annual bioinformatics conferences in the Asia-Pacific region since its launch in 2002, returned to New Delhi, India after 12 years, with a conference attendance of 314 delegates. The 2018 conference had sessions on Big Data and Algorithms; Next Generation Sequencing and Omics Science; Structure, Function and Interactions; Disease and Drug Discovery; and Plant and Agricultural Bioinformatics. The conference also featured an industry track as well as panel discussions on Women in Bioinformatics and Democratization vs. Quality control in academic publishing. Asia Pacific Bioinformatics Interaction & Networking Society (APbians) was launched as an APBioNet Special Interest Group. Of the 52 oral presentations made, 22 were accepted in supplemental issues of BMC Bioinformatics, BMC Genomics or BMC Medical Genomics and are briefly reviewed here. Next year’s InCoB will be held in Jakarta, Indonesia from September 10–12, 2019.
Affiliation(s)
- Shandar Ahmad
  - School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, 110 067, India
- Michael M Gromiha
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamilnadu, 600 036, India
- Gajendra P S Raghava
  - Centre for Computational Biology, Indraprastha Institute of Information Technology, Okhla Industrial Estate, Phase III, New Delhi, 110020, India
- Christian Schönbach
  - Department of Biology, School of Science and Technology, Nazarbayev University, Astana, Kazakhstan; International Research Center for Medical Sciences, Graduate School of Medical Sciences, Kumamoto University, Kumamoto, 860-0811, Japan
- Shoba Ranganathan
  - Department of Molecular Sciences, Macquarie University, Sydney, NSW, 2109, Australia; Transformational Bioinformatics, Health and Biosecurity, Commonwealth Scientific and Industrial Research Organisation, Sydney, Australia