1
Tasmia SA, Kibria MK, Tuly KF, Islam MA, Khatun MS, Hasan MM, Mollah MNH. Prediction of serine phosphorylation sites mapping on Schizosaccharomyces Pombe by fusing three encoding schemes with the random forest classifier. Sci Rep 2022; 12:2632. [PMID: 35173235 PMCID: PMC8850546 DOI: 10.1038/s41598-022-06529-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/15/2021] [Accepted: 02/01/2022] [Indexed: 11/08/2022]
Abstract
Serine phosphorylation is a protein post-translational modification (PTM) that plays an essential role in various cellular processes and disease pathogenesis. Numerous methods are used to predict phosphorylation sites, but traditional wet-lab experimental approaches are time-consuming, laborious, and expensive. In this work, a computational predictor was proposed to predict serine phosphorylation sites mapping on Schizosaccharomyces pombe (SP) by fusing three encoding schemes, namely composition of k-spaced amino acid pairs (CKSAAP), binary, and amino acid composition (AAC), with the random forest (RF) classifier. So far, the proposed method is the first developed to predict serine phosphorylation sites for SP. Both the training and independent test performance scores were used to compare the proposed RF-based fusion prediction model with other candidate models. We also investigated their performance by 5-fold cross-validation (CV). In all cases, the recommended predictor achieved the largest scores of true positive rate (TPR), true negative rate (TNR), accuracy (ACC), Matthews correlation coefficient (MCC), area under the ROC curve (AUC), and partial AUC (pAUC) at a false positive rate (FPR) of 0.20. Thus, the prediction performance discussed in this paper indicates that the proposed approach may be a beneficial and encouraging computational resource for predicting serine phosphorylation sites in fungi. The online interface of the software for the proposed prediction model is publicly available at http://mollah-bioinformaticslab-stat.ru.ac.bd/PredSPS/ .
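The fusion of the three encoding schemes named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the window length, the maximum gap `k_max`, the example sequence, and all function names are assumptions made for illustration, and the random forest step is left out.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(window):
    """Amino acid composition: fraction of each of the 20 standard residues."""
    n = sum(1 for r in window if r in AMINO_ACIDS)
    return [window.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def binary(window):
    """One-hot encode each position; non-standard residues give all zeros."""
    vec = []
    for r in window:
        vec.extend(1.0 if r == a else 0.0 for a in AMINO_ACIDS)
    return vec


def cksaap(window, k_max=2):
    """Composition of k-spaced amino acid pairs for k = 0..k_max (assumed range)."""
    vec = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = len(window) - k - 1  # number of residue pairs separated by k
        for i in range(total):
            pair = window[i] + window[i + k + 1]
            if pair in counts:
                counts[pair] += 1
        vec.extend(counts[p] / total if total > 0 else 0.0 for p in PAIRS)
    return vec


def fused_features(window, k_max=2):
    """Concatenate the three encodings into a single feature vector."""
    return cksaap(window, k_max) + binary(window) + aac(window)


# Hypothetical 17-residue window centred on a serine (index 8).
window = "MKTAYIAKSRQISFVKS"
features = fused_features(window)
print(len(features))  # 3 * 400 (CKSAAP) + 17 * 20 (binary) + 20 (AAC)
```

A vector built this way could then be passed to any classifier; the abstract reports a random forest, whose settings are not given here.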
Affiliation(s)
- Samme Amena Tasmia
  - Bioinformatics Laboratory, Department of Statistics, University of Rajshahi, Rajshahi 6205, Bangladesh
- Md Kaderi Kibria
  - Bioinformatics Laboratory, Department of Statistics, University of Rajshahi, Rajshahi 6205, Bangladesh
- Khanis Farhana Tuly
  - Bioinformatics Laboratory, Department of Statistics, University of Rajshahi, Rajshahi 6205, Bangladesh
- Md Ariful Islam
  - Bioinformatics Laboratory, Department of Statistics, University of Rajshahi, Rajshahi 6205, Bangladesh
- Mst Shamima Khatun
  - Department of Microbiology and Immunology, Tulane University School of Medicine, Tulane University, New Orleans, LA 70112, USA
- Md Mehedi Hasan
  - Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA
- Md Nurul Haque Mollah
  - Bioinformatics Laboratory, Department of Statistics, University of Rajshahi, Rajshahi 6205, Bangladesh
2
Tasmia SA, Ahmed FF, Mosharaf P, Hasan M, Mollah NH. An Improved Computational Prediction Model for Lysine Succinylation Sites Mapping on Homo sapiens by Fusing Three Sequence Encoding Schemes with the Random Forest Classifier. Curr Genomics 2021; 22:122-136. [PMID: 34220299 PMCID: PMC8188582 DOI: 10.2174/1389202922666210219114211] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/20/2020] [Revised: 12/13/2020] [Accepted: 01/06/2021] [Indexed: 11/22/2022]
Abstract
Background: Lysine succinylation is a reversible protein post-translational modification (PTM) that regulates the structure and function of proteins. It plays a significant role in various cellular physiologies, including some diseases of humans as well as many other organisms. Accurate identification of succinylation sites is essential for understanding their biological functions and for drug development.
Methods: In this study, we developed an improved method to predict lysine succinylation sites mapping on Homo sapiens by fusing three encoding schemes, namely binary, composition of k-spaced amino acid pairs (CKSAAP), and amino acid composition (AAC), with the random forest (RF) classifier. The prediction performance of the proposed RF-based fusion model was compared with that of other candidates using 20-fold cross-validation (CV) and two independent test datasets collected from two different sources.
Results: The CV results showed that the proposed predictor achieved the highest scores of sensitivity (SN) as 0.800, specificity (SP) as 0.902, accuracy (ACC) as 0.919, Matthews correlation coefficient (MCC) as 0.766, partial AUC (pAUC) as 0.163 at a false-positive rate (FPR) = 0.10, and area under the ROC curve (AUC) as 0.958. It achieved the highest performance scores of SN as 0.811, SP as 0.902, ACC as 0.891, MCC as 0.629, pAUC as 0.139, and AUC as 0.921 for the independent test protein set-1, and SN as 0.772, SP as 0.901, ACC as 0.836, MCC as 0.677, pAUC as 0.141 at FPR = 0.10, and AUC as 0.923 for the independent test protein set-2. It also outperformed all the other existing prediction models.
Conclusion: The prediction performance discussed in this article suggests that the proposed method might be a useful and encouraging computational resource for lysine succinylation site prediction in humans.
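For readers unfamiliar with the scores named in this abstract, the following Python sketch shows how SN, SP, ACC, and MCC are conventionally computed from binary predictions. The labels and predictions are made-up toy values, not data from the paper, and the function names are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def site_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy, and Matthews correlation coefficient."""
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SN": sn, "SP": sp, "ACC": acc, "MCC": mcc}


# Toy example: 4 modified sites (1) and 6 unmodified sites (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
scores = site_metrics(*confusion_counts(y_true, y_pred))
print(scores)
```

MCC is often preferred over ACC when positive and negative sites are imbalanced, because it uses all four cells of the confusion matrix.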
Affiliation(s)
- Samme Amena Tasmia
  - (1) Bioinformatics Lab., Department of Statistics, Rajshahi University, Rajshahi-6205, Bangladesh; (2) Department of Mathematics, Jashore University of Science and Technology, Jashore, Bangladesh; (3) Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Fukuoka, Japan
- Fee Faysal Ahmed
  - (1) Bioinformatics Lab., Department of Statistics, Rajshahi University, Rajshahi-6205, Bangladesh; (2) Department of Mathematics, Jashore University of Science and Technology, Jashore, Bangladesh; (3) Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Fukuoka, Japan
- Parvez Mosharaf
  - (1) Bioinformatics Lab., Department of Statistics, Rajshahi University, Rajshahi-6205, Bangladesh; (2) Department of Mathematics, Jashore University of Science and Technology, Jashore, Bangladesh; (3) Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Fukuoka, Japan
- Mehedi Hasan
  - (1) Bioinformatics Lab., Department of Statistics, Rajshahi University, Rajshahi-6205, Bangladesh; (2) Department of Mathematics, Jashore University of Science and Technology, Jashore, Bangladesh; (3) Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Fukuoka, Japan
- Nurul Haque Mollah
  - (1) Bioinformatics Lab., Department of Statistics, Rajshahi University, Rajshahi-6205, Bangladesh; (2) Department of Mathematics, Jashore University of Science and Technology, Jashore, Bangladesh; (3) Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Fukuoka, Japan
3
Prediction of Apoptosis Protein Subcellular Localization with Multilayer Sparse Coding and Oversampling Approach. Biomed Res Int 2019; 2019:2436924. [PMID: 30834257 PMCID: PMC6374881 DOI: 10.1155/2019/2436924] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/19/2018] [Revised: 01/04/2019] [Accepted: 01/20/2019] [Indexed: 11/29/2022]
Abstract
The prediction of apoptosis protein subcellular localization plays an important role in understanding the progress of cell proliferation and death. Computational approaches to this problem have recently become very popular, because traditional biological experiments are so costly and time-consuming that they can no longer keep up with the growth rate of sequence data. To improve the prediction accuracy of apoptosis protein subcellular localization, we proposed a sparse coding method combined with a traditional feature extraction algorithm to obtain a sparse representation of apoptosis protein sequences. The method uses multilayer pooling based on dictionaries of different sizes to integrate the processed features, and an oversampling approach to reduce the influence of unbalanced data sets. The extracted features were then input to a support vector machine to predict the subcellular localization of the apoptosis protein. The experimental results obtained by the jackknife test on two benchmark data sets indicate that our method can significantly improve the accuracy of apoptosis protein subcellular localization prediction.
4
Hasan MM, Kurata H. GPSuc: Global Prediction of Generic and Species-specific Succinylation Sites by aggregating multiple sequence features. PLoS One 2018; 13:e0200283. [PMID: 30312302 PMCID: PMC6193575 DOI: 10.1371/journal.pone.0200283] [Citation(s) in RCA: 49] [Impact Index Per Article: 8.2] [Received: 03/15/2018] [Accepted: 06/22/2018] [Indexed: 01/09/2023]
Abstract
Lysine succinylation is one of the dominant post-translational modifications of proteins and contributes to many biological processes, including cell cycle, growth, and signal transduction pathways. Identification of succinylation sites is an important step in understanding the function of proteins. The complicated sequence patterns of protein succinylation revealed by proteomic studies highlight the necessity of developing effective species-specific in silico strategies for global prediction of succinylation sites. Here we have developed generic and nine species-specific succinylation site classifiers by aggregating multiple complementary features. We optimized the consecutive features using the Wilcoxon-rank feature selection scheme. The final feature vectors were used to train a random forest (RF) classifier. With an integration of RF scores via logistic regression, the resulting predictor, termed GPSuc, achieved better performance than other existing generic and species-specific succinylation site predictors. Our predictor serves as a valuable resource for revealing the mechanism of succinylation and assisting hypothesis-driven experimental design. To provide promising performance on large-scale datasets, a web application was developed at http://kurata14.bio.kyutech.ac.jp/GPSuc/.
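The abstract mentions a Wilcoxon-rank feature selection scheme without giving details. One common reading, sketched below in Python as an assumption rather than as the GPSuc procedure, is to rank each feature by the absolute Wilcoxon rank-sum z statistic between positive and negative samples and keep the top-scoring ones. The data, function names, and `top_n` cutoff are hypothetical, and the variance uses the normal approximation without tie correction.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_z(pos, neg):
    """Wilcoxon rank-sum z statistic (normal approximation, no tie correction)."""
    n1, n2 = len(pos), len(neg)
    ranks = average_ranks(list(pos) + list(neg))
    w = sum(ranks[:n1])  # rank sum of the positive group
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean) / sd


def select_features(x_pos, x_neg, top_n):
    """Return indices of the top_n features by |z|, ties broken by index."""
    scored = []
    for j in range(len(x_pos[0])):
        z = rank_sum_z([row[j] for row in x_pos], [row[j] for row in x_neg])
        scored.append((abs(z), j))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scored[:top_n]]


# Toy data: only feature 0 separates the two classes.
x_pos = [[0.9, 0.5, 0.1], [0.8, 0.4, 0.3], [0.7, 0.6, 0.2]]
x_neg = [[0.1, 0.5, 0.2], [0.2, 0.6, 0.1], [0.3, 0.4, 0.3]]
selected = select_features(x_pos, x_neg, top_n=1)
print(selected)
```

In practice a library routine with tie correction would be a safer choice than this hand-rolled statistic; the sketch only makes the ranking idea concrete.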
Affiliation(s)
- Md. Mehedi Hasan
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Kawazu, Iizuka, Fukuoka, Japan
- Hiroyuki Kurata
  - Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Kawazu, Iizuka, Fukuoka, Japan
  - Biomedical Informatics R&D Center, Kyushu Institute of Technology, Kawazu, Iizuka, Fukuoka, Japan
5
Predicting protein submitochondrial locations by incorporating the pseudo-position specific scoring matrix into the general Chou's pseudo-amino acid composition. J Theor Biol 2018; 450:86-103. [DOI: 10.1016/j.jtbi.2018.04.026] [Citation(s) in RCA: 61] [Impact Index Per Article: 10.2] [Received: 02/08/2018] [Revised: 04/10/2018] [Accepted: 04/16/2018] [Indexed: 01/16/2023]
6
Saidijam M, Karimi Dermani F, Sohrabi S, Patching SG. Efflux proteins at the blood-brain barrier: review and bioinformatics analysis. Xenobiotica 2017; 48:506-532. [PMID: 28481715 DOI: 10.1080/00498254.2017.1328148] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 01/06/2023]
Abstract
1. Efflux proteins at the blood-brain barrier provide a mechanism for export of waste products of normal metabolism from the brain and help to maintain brain homeostasis. They also prevent entry into the brain of a wide range of potentially harmful compounds such as drugs and xenobiotics.
2. Conversely, efflux proteins also hinder delivery to the brain and central nervous system of therapeutic drugs used to treat brain tumours and neurological disorders. For bypassing efflux proteins, a comprehensive understanding of their structures, functions and molecular mechanisms is necessary, along with new strategies and technologies for delivery of drugs across the blood-brain barrier.
3. We review efflux proteins at the blood-brain barrier, classified as either ATP-binding cassette (ABC) transporters (P-gp, BCRP, MRPs) or solute carrier (SLC) transporters (OATP1A2, OATP1A4, OATP1C1, OATP2B1, OAT3, EAATs, PMAT/hENT4 and MATE1).
4. This includes information about substrate and inhibitor specificity, structural organisation and mechanism, membrane localisation, regulation of expression and activity, effects of diseases and conditions and the principal technique used for in vivo analysis of efflux protein activity: positron emission tomography (PET).
5. We also performed analyses of evolutionary relationships, membrane topologies and amino acid compositions of the proteins, and linked these to structure and function.
Affiliation(s)
- Massoud Saidijam
  - Department of Molecular Medicine and Genetics, Research Centre for Molecular Medicine, School of Medicine, Hamadan University of Medical Sciences, Hamadan, Iran
- Fatemeh Karimi Dermani
  - Department of Molecular Medicine and Genetics, Research Centre for Molecular Medicine, School of Medicine, Hamadan University of Medical Sciences, Hamadan, Iran
- Sareh Sohrabi
  - Department of Molecular Medicine and Genetics, Research Centre for Molecular Medicine, School of Medicine, Hamadan University of Medical Sciences, Hamadan, Iran
- Simon G Patching
  - School of BioMedical Sciences and the Astbury Centre for Structural Molecular Biology, University of Leeds, Leeds, UK
7
Saidijam M, Azizpour S, Patching SG. Comprehensive analysis of the numbers, lengths and amino acid compositions of transmembrane helices in prokaryotic, eukaryotic and viral integral membrane proteins of high-resolution structure. J Biomol Struct Dyn 2017; 36:443-464. [PMID: 28150531 DOI: 10.1080/07391102.2017.1285725] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 12/21/2022]
Abstract
We report a comprehensive analysis of the numbers, lengths and amino acid compositions of transmembrane helices in 235 high-resolution structures of integral membrane proteins. The properties of 1551 transmembrane helices in the structures were compared with those obtained by analysis of the same amino acid sequences using topology prediction tools. Explanations for the 81 (5.2%) missing or additional transmembrane helices in the prediction results were identified. Main reasons for missing transmembrane helices were mis-identification of N-terminal signal peptides, breaks in α-helix conformation or charged residues in the middle of transmembrane helices and transmembrane helices with unusual amino acid composition. The main reason for additional transmembrane helices was mis-identification of amphipathic helices, extramembrane helices or hairpin re-entrant loops. Transmembrane helix length had an overall median of 24 residues and an average of 24.9 ± 7.0 residues and the most common length was 23 residues. The overall content of residues in transmembrane helices as a percentage of the full proteins had a median of 56.8% and an average of 55.7 ± 16.0%. Amino acid composition was analysed for the full proteins, transmembrane helices and extramembrane regions. Individual proteins or types of proteins with transmembrane helices containing extremes in contents of individual amino acids or combinations of amino acids with similar physicochemical properties were identified and linked to structure and/or function. In addition to overall median and average values, all results were analysed for proteins originating from different types of organism (prokaryotic, eukaryotic, viral) and for subgroups of receptors, channels, transporters and others.
Affiliation(s)
- Massoud Saidijam
  - Department of Molecular Medicine and Genetics, Research Centre for Molecular Medicine, School of Medicine, Hamadan University of Medical Sciences, Hamadan, Iran
- Sonia Azizpour
  - Department of Molecular Medicine and Genetics, Research Centre for Molecular Medicine, School of Medicine, Hamadan University of Medical Sciences, Hamadan, Iran
- Simon G Patching
  - School of BioMedical Sciences and the Astbury Centre for Structural Molecular Biology, University of Leeds, Leeds, UK
8
Characterisation of the DAACS Family Escherichia coli Glutamate/Aspartate-Proton Symporter GltP Using Computational, Chemical, Biochemical and Biophysical Methods. J Membr Biol 2016; 250:145-162. [DOI: 10.1007/s00232-016-9942-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 05/09/2016] [Accepted: 12/09/2016] [Indexed: 10/20/2022]