1
|
Charoenkwan P, Schaduangrat N, Lio P, Moni MA, Chumnanpuen P, Shoombuatong W. iAMAP-SCM: A Novel Computational Tool for Large-Scale Identification of Antimalarial Peptides Using Estimated Propensity Scores of Dipeptides. ACS Omega 2022; 7:41082-41095. [PMID: 36406571 PMCID: PMC9670693 DOI: 10.1021/acsomega.2c04465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2022] [Accepted: 10/20/2022] [Indexed: 06/16/2023]
Abstract
Antimalarial peptides (AMAPs) vary in length, amino acid composition, charge, conformational structure, hydrophobicity, and amphipathicity, which reflects the diversity of their antimalarial mechanisms. Given the major worldwide health problem of antimicrobial resistance, these peptides possess great therapeutic value owing to their low incidence of drug resistance compared to conventional antibiotics. Although well-known experimental methods are able to precisely determine the antimalarial activity of peptides, these methods are still time-consuming and costly. Thus, machine learning (ML)-based methods that are capable of identifying AMAPs rapidly by using only sequence information would be beneficial for the high-throughput identification of AMAPs. In this study, we propose the first computational model (termed iAMAP-SCM) for the large-scale identification and characterization of peptides with antimalarial activity by using only sequence information. Specifically, we employed an interpretable scoring card method (SCM) to develop iAMAP-SCM and estimate propensities of 20 amino acids and 400 dipeptides to be AMAPs in a supervised manner. Experimental results showed that iAMAP-SCM could achieve a maximum accuracy and Matthews correlation coefficient of 0.957 and 0.834, respectively, on the independent test dataset. In addition, SCM-derived propensities of 20 amino acids and selected physicochemical properties were used to provide an understanding of the functional mechanisms of AMAPs. Finally, a user-friendly online computational platform of iAMAP-SCM is publicly available at http://pmlabstack.pythonanywhere.com/iAMAP-SCM. The iAMAP-SCM predictor is anticipated to assist experimental scientists in the high-throughput identification of potential AMAP candidates for the treatment of malaria and other clinical applications.
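The scoring idea behind an SCM-style predictor can be sketched in a few lines. The following is a minimal illustration, assuming the common formulation in which a peptide's score is the composition-weighted sum of dipeptide propensity scores and a sequence is called positive when the score exceeds a threshold. The function names, the toy propensity table, and the threshold are hypothetical and are not taken from iAMAP-SCM.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of the 20 standard amino acids.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
DIPEPTIDE_SET = frozenset(DIPEPTIDES)


def dipeptide_composition(seq):
    """Return the fraction of each of the 400 dipeptides among the
    overlapping residue pairs of seq (non-standard pairs are ignored)."""
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    counts = Counter(p for p in pairs if p in DIPEPTIDE_SET)
    total = sum(counts.values())
    if total == 0:
        return {dp: 0.0 for dp in DIPEPTIDES}
    return {dp: counts[dp] / total for dp in DIPEPTIDES}


def scm_score(seq, propensity):
    """Composition-weighted sum of propensity scores (assumed SCM form).
    Dipeptides missing from the table contribute a score of 0."""
    comp = dipeptide_composition(seq)
    return sum(comp[dp] * propensity.get(dp, 0.0) for dp in DIPEPTIDES)


def predict(seq, propensity, threshold):
    """Label seq as positive when its score exceeds the threshold."""
    return scm_score(seq, propensity) > threshold


# Toy, made-up propensity values purely for illustration.
toy_propensity = {"KK": 900.0, "KL": 700.0}
print(scm_score("KKLK", toy_propensity))
```

Because every dipeptide carries a single number, the table itself can be inspected to see which residue pairs push a sequence toward the positive class, which is the interpretability argument made in the abstract.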
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand
| | - Nalini Schaduangrat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
| | - Pietro Lio
- Department of Computer Science and Technology, University of Cambridge, Cambridgeshire CB3 0FD, U.K.
| | - Mohammad Ali Moni
- Artificial Intelligence & Digital Health, School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, The University of Queensland, St Lucia QLD 4072, Australia
| | - Pramote Chumnanpuen
- Department of Zoology, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand
- Omics Center for Agriculture, Bioresources, Food, and Health, Kasetsart University (OmiKU), Bangkok 10900, Thailand
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
| |
|
2
|
Improved prediction and characterization of blood-brain barrier penetrating peptides using estimated propensity scores of dipeptides. J Comput Aided Mol Des 2022; 36:781-796. [DOI: 10.1007/s10822-022-00476-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2022] [Accepted: 09/15/2022] [Indexed: 11/27/2022]
|
3
|
Charoenkwan P, Kanthawong S, Schaduangrat N, Lio P, Moni MA, Shoombuatong W. SCMRSA: a New Approach for Identifying and Analyzing Anti-MRSA Peptides Using Estimated Propensity Scores of Dipeptides. ACS Omega 2022; 7:32653-32664. [PMID: 36120041 PMCID: PMC9476499 DOI: 10.1021/acsomega.2c04305] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/08/2022] [Accepted: 08/22/2022] [Indexed: 06/15/2023]
Abstract
Staphylococcus aureus is deemed to be one of the major causes of hospital and community-acquired infections, especially in methicillin-resistant S. aureus (MRSA) strains. Because antimicrobial peptides have captured attention as novel drug candidates due to their rapid and broad-spectrum antimicrobial activity, anti-MRSA peptides have emerged as potential therapeutics for the treatment of bacterial infections. Although experimental approaches can precisely identify anti-MRSA peptides, they are usually cost-ineffective and labor-intensive. Therefore, computational approaches that are able to identify and characterize anti-MRSA peptides by using sequence information are highly desirable. In this study, we present the first computational approach (termed SCMRSA) for identifying and characterizing anti-MRSA peptides by using sequence information without the use of 3D structural information. In SCMRSA, we employed an interpretable scoring card method (SCM) coupled with the estimated propensity scores of 400 dipeptides. Comparative experiments indicated that SCMRSA was more effective and could outperform several machine learning-based classifiers with an accuracy of 0.960 and Matthews correlation coefficient of 0.848 on the independent test data set. In addition, we employed the SCMRSA-derived propensity scores to provide a more in-depth explanation regarding the functional mechanisms of anti-MRSA peptides. Finally, in order to serve community-wide use of the proposed SCMRSA, we established a user-friendly webserver which can be accessed online at http://pmlabstack.pythonanywhere.com/SCMRSA. SCMRSA is anticipated to be an open-source and useful tool for screening and identifying novel anti-MRSA peptides for follow-up experimental studies.
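Accuracy and the Matthews correlation coefficient (MCC), the two metrics reported above, are both functions of the binary confusion matrix. A minimal sketch of the standard definitions follows; the counts in the usage line are invented for illustration and are not the SCMRSA test results.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.
    Returns 0.0 when any marginal total is zero (undefined case)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Invented counts: 45 true positives, 45 true negatives, 5 of each error.
print(accuracy(45, 45, 5, 5), mcc(45, 45, 5, 5))
```

MCC is usually reported alongside accuracy because it stays informative when the positive and negative classes differ greatly in size, whereas accuracy alone can look high for a classifier that mostly predicts the majority class.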
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand
| | - Sakawrat Kanthawong
- Department of Microbiology, Faculty of Medicine, Khon Kaen University, Khon Kaen 40002, Thailand
| | - Nalini Schaduangrat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
| | - Pietro Lio
- Department of Computer Science and Technology, University of Cambridge, Cambridge CB3 0FD, U.K.
| | - Mohammad Ali Moni
- Artificial Intelligence & Digital Health, School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, The University of Queensland St Lucia, Queensland 4072, Australia
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
| |
|
4
|
Genome-Wide Profiling of Alternative Splicing and Gene Fusion during Rice Black-Streaked Dwarf Virus Stress in Maize (Zea mays L.). Genes (Basel) 2022; 13:genes13030456. [PMID: 35328010 PMCID: PMC8955601 DOI: 10.3390/genes13030456] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/13/2022] [Revised: 02/28/2022] [Accepted: 02/28/2022] [Indexed: 12/26/2022]
Abstract
Rice black-streaked dwarf virus (RBSDV) causes maize rough dwarf disease (MRDD), which is a viral disease that significantly affects maize yields worldwide. Plants tolerate stress through transcriptional reprogramming at the alternative splicing (AS), transcriptional, and fusion gene (FG) levels. However, it is unclear whether and how AS and FG interfere with transcriptional reprogramming in MRDD. In this study, we performed global profiling of AS and FG on maize response to RBSDV and compared it with transcriptional changes. There are approximately 1.43 to 2.25 AS events per gene in maize infected with RBSDV. GRMZM2G438622 was only detected in four AS modes (A3SS, A5SS, RI, and SE), whereas GRMZM2G059392 showed downregulated expression and four AS events. A total of 106 and 176 FGs were detected at two time points, respectively, including six differentially expressed genes and five differentially spliced genes. The gene GRMZM2G076798 was the only FG that occurred at two time points and was involved in two FG events. Among these, 104 GOs were enriched, indicating that nodulin-, disease resistance-, and chloroplastic-related genes respond to RBSDV stress in maize. These results provide new insights into the mechanisms underlying post-transcriptional and transcriptional regulation of maize response to RBSDV stress.
|
5
|
Charoenkwan P, Chiangjong W, Nantasenamat C, Moni MA, Lio’ P, Manavalan B, Shoombuatong W. SCMTHP: A New Approach for Identifying and Characterizing of Tumor-Homing Peptides Using Estimated Propensity Scores of Amino Acids. Pharmaceutics 2022; 14:pharmaceutics14010122. [PMID: 35057016 PMCID: PMC8779003 DOI: 10.3390/pharmaceutics14010122] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/12/2021] [Revised: 12/16/2021] [Accepted: 12/28/2021] [Indexed: 12/13/2022]
Abstract
Tumor-homing peptides (THPs) are small peptides that can recognize and bind cancer cells specifically. To gain a better understanding of THPs’ functional mechanisms, the accurate identification and characterization of THPs is required. Although some computational methods for in silico THP identification have been proposed, a major drawback is their lack of model interpretability. In this study, we propose a new, simple and easily interpretable computational approach (called SCMTHP) for identifying and analyzing tumor-homing activities of peptides via the use of a scoring card method (SCM). To improve the predictability and interpretability of our predictor, we generated propensity scores of 20 amino acids as THPs. Finally, informative physicochemical properties were used for providing insights on characteristics giving rise to the bioactivity of THPs via the use of SCMTHP-derived propensity scores. Benchmarking experiments from independent test indicated that SCMTHP could achieve comparable performance to state-of-the-art method with accuracies of 0.827 and 0.798, respectively, when evaluated on two benchmark datasets consisting of Main and Small datasets. Furthermore, SCMTHP was found to outperform several well-known machine learning-based classifiers (e.g., decision tree, k-nearest neighbor, multi-layer perceptron, naive Bayes and partial least squares regression) as indicated by both 10-fold cross-validation and independent tests. Finally, the SCMTHP web server was established and made freely available online. SCMTHP is expected to be a useful tool for rapid and accurate identification of THPs and for providing better understanding on THP biophysical and biochemical properties.
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand;
| | - Wararat Chiangjong
- Pediatric Translational Research Unit, Department of Pediatrics, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok 10400, Thailand;
| | - Chanin Nantasenamat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand;
| | - Mohammad Ali Moni
- Artificial Intelligence & Digital Health Data Science, School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, The University of Queensland, St Lucia, QLD 4072, Australia;
| | - Pietro Lio’
- Department of Computer Science and Technology, University of Cambridge, Cambridge CB3 0FD, UK;
| | - Balachandran Manavalan
- Department of Physiology, Ajou University School of Medicine, Suwon 16499, Korea
- Correspondence: (B.M.); (W.S.)
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand;
- Correspondence: (B.M.); (W.S.)
| |
|
6
|
Sangphukieo A, Laomettachit T, Ruengjitchatchawalya M. PhotoModPlus: A web server for photosynthetic protein prediction from genome neighborhood features. PLoS One 2021; 16:e0248682. [PMID: 33730083 PMCID: PMC7968678 DOI: 10.1371/journal.pone.0248682] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2020] [Accepted: 03/03/2021] [Indexed: 11/20/2022]
Abstract
A new web server called PhotoModPlus is presented as a platform for predicting photosynthetic proteins via genome neighborhood networks (GNN) and genome neighborhood-based machine learning. GNN enables users to visualize the overview of the conserved neighboring genes from multiple photosynthetic prokaryotic genomes and provides functional guidance on the query input. In the platform, we also present a new machine learning model utilizing genome neighborhood features for predicting photosynthesis-specific functions based on 24 prokaryotic photosynthesis-related GO terms, namely PhotoModGO. The new model performed better than the sequence-based approaches with an F1 measure of 0.872, based on nested five-fold cross-validation. Finally, we demonstrated the applications of the webserver and the new model in the identification of novel photosynthetic proteins. The server is user-friendly, compatible with all devices, and available at bicep.kmutt.ac.th/photomod.
Affiliation(s)
- Apiwat Sangphukieo
- Bioinformatics and Systems Biology Program, School of Bioresources and Technology, King Mongkut’s University of Technology Thonburi (KMUTT), Bang Khun Thian, Bangkok, Thailand
- School of Information Technology, KMUTT, Thung Khru, Bangkok, Thailand
| | - Teeraphan Laomettachit
- Bioinformatics and Systems Biology Program, School of Bioresources and Technology, King Mongkut’s University of Technology Thonburi (KMUTT), Bang Khun Thian, Bangkok, Thailand
| | - Marasri Ruengjitchatchawalya
- Bioinformatics and Systems Biology Program, School of Bioresources and Technology, King Mongkut’s University of Technology Thonburi (KMUTT), Bang Khun Thian, Bangkok, Thailand
- Biotechnology Program, School of Bioresources and Technology, KMUTT, Bang Khun Thian, Bangkok, Thailand
- Algal Biotechnology Research Group, Pilot Plant Development and Training Institute, KMUTT, Bang Khun Thian, Bangkok, Thailand
| |
|
7
|
Charoenkwan P, Yana J, Nantasenamat C, Hasan MM, Shoombuatong W. iUmami-SCM: A Novel Sequence-Based Predictor for Prediction and Analysis of Umami Peptides Using a Scoring Card Method with Propensity Scores of Dipeptides. J Chem Inf Model 2020; 60:6666-6678. [DOI: 10.1021/acs.jcim.0c00707] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Indexed: 01/27/2023]
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand
| | - Janchai Yana
- Department of Chemistry, Faculty of Science and Technology, Chiang Mai Rajabhat University, Chiang Mai 50300, Thailand
| | - Chanin Nantasenamat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
| | - Md. Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
| |
|
8
|
iBitter-SCM: Identification and characterization of bitter peptides using a scoring card method with propensity scores of dipeptides. Genomics 2020; 112:2813-2822. [DOI: 10.1016/j.ygeno.2020.03.019] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Received: 02/21/2020] [Revised: 03/19/2020] [Accepted: 03/22/2020] [Indexed: 12/21/2022]
|
9
|
Meta-iPVP: a sequence-based meta-predictor for improving the prediction of phage virion proteins using effective feature representation. J Comput Aided Mol Des 2020; 34:1105-1116. [DOI: 10.1007/s10822-020-00323-z] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 05/04/2020] [Accepted: 06/10/2020] [Indexed: 12/11/2022]
|
10
|
Sangphukieo A, Laomettachit T, Ruengjitchatchawalya M. Photosynthetic protein classification using genome neighborhood-based machine learning feature. Sci Rep 2020; 10:7108. [PMID: 32346070 PMCID: PMC7189237 DOI: 10.1038/s41598-020-64053-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/17/2019] [Accepted: 04/07/2020] [Indexed: 11/08/2022]
Abstract
Identification of novel photosynthetic proteins is important for understanding and improving photosynthetic efficiency. Synergistically, genome neighborhood can provide additional useful information to identify photosynthetic proteins. We, therefore, expected that applying a computational approach, particularly machine learning (ML) with the genome neighborhood-based feature, should facilitate the photosynthetic function assignment. Our results revealed a functional relationship between photosynthetic genes and their conserved neighboring genes observed by 'Phylo score', indicating their functions could be inferred from the genome neighborhood profile. Therefore, we created a new method for extracting patterns based on the genome neighborhood network (GNN) and applied them for the photosynthetic protein classification using ML algorithms. A random forest (RF) classifier using genome neighborhood-based features achieved the highest accuracy, up to 87%, in the classification of photosynthetic proteins and also showed better performance (Matthews correlation coefficient = 0.718) than other available tools, including the sequence similarity search (0.447) and an ML-based method (0.361). Furthermore, we demonstrated the ability of our model to identify novel photosynthetic proteins compared to the other methods. Our classifier is available at http://bicep2.kmutt.ac.th/photomod_standalone, https://bit.ly/2S0I2Ox and DockerHub: https://hub.docker.com/r/asangphukieo/photomod.
Affiliation(s)
- Apiwat Sangphukieo
- Bioinformatics and Systems Biology Program, School of Bioresources and Technology, King Mongkut's University of Technology Thonburi (KMUTT), Bang Khun Thian, Bangkok, 10150, Thailand
- School of Information Technology, KMUTT, Bang Mod, Thung Khru, Bangkok, 10140, Thailand
| | - Teeraphan Laomettachit
- Bioinformatics and Systems Biology Program, School of Bioresources and Technology, King Mongkut's University of Technology Thonburi (KMUTT), Bang Khun Thian, Bangkok, 10150, Thailand
| | - Marasri Ruengjitchatchawalya
- Bioinformatics and Systems Biology Program, School of Bioresources and Technology, King Mongkut's University of Technology Thonburi (KMUTT), Bang Khun Thian, Bangkok, 10150, Thailand.
- Biotechnology program, School of Bioresources and Technology, KMUTT, Bang Khun Thian, Bangkok, 10150, Thailand.
- Algal Biotechnology Research Group, Pilot Plant Development and Training Institute (PDTI), KMUTT, Bang Khun Thian, Bangkok, 10150, Thailand.
| |
|
11
|
Charoenkwan P, Kanthawong S, Schaduangrat N, Yana J, Shoombuatong W. PVPred-SCM: Improved Prediction and Analysis of Phage Virion Proteins Using a Scoring Card Method. Cells 2020; 9:E353. [PMID: 32028709 PMCID: PMC7072630 DOI: 10.3390/cells9020353] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 12/26/2019] [Revised: 01/20/2020] [Accepted: 01/27/2020] [Indexed: 12/16/2022]
Abstract
Although existing methods have been successful in predicting phage (or bacteriophage) virion proteins (PVPs) using various types of protein features and complex classifiers, such as support vector machine and naïve Bayes, these two methods do not allow interpretability. However, the characterization and analysis of PVPs might be of great significance to understanding the molecular mechanisms of bacteriophage genetics and the development of antibacterial drugs. Hence, we herein proposed a novel method (PVPred-SCM) based on the scoring card method (SCM) in conjunction with dipeptide composition to identify and characterize PVPs. In PVPred-SCM, the propensity scores of 400 dipeptides were calculated using the statistical discrimination approach. A rigorous independent validation test showed that PVPred-SCM utilizing only dipeptide composition yielded an accuracy of 77.56%, indicating that PVPred-SCM performed well relative to the state-of-the-art method utilizing a number of protein features. Furthermore, the propensity scores of dipeptides were used to provide insights into the biochemical and biophysical properties of PVPs. Upon comparison, it was found that PVPred-SCM was superior to the existing methods considering its simplicity, interpretability, and implementation. Finally, in an effort to facilitate high-throughput prediction of PVPs, we provided a user-friendly web server for identifying the likelihood of whether or not query sequences are PVPs. It is anticipated that PVPred-SCM will become a useful tool, or at least a complement to existing methods, for predicting and analyzing PVPs.
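One plausible reading of the "statistical discrimination approach" is sketched below. It assumes that the initial propensity of a dipeptide is its pooled frequency in the positive class minus its pooled frequency in the negative class, min-max scaled to the range 0 to 1000. The pooling choice, the scaling range, and all names are assumptions made for illustration. The abstract does not spell out these details, and any later refinement of the scores is not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def class_composition(seqs):
    """Pooled dipeptide frequencies over every sequence in one class."""
    counts = [0] * len(DIPEPTIDES)
    for seq in seqs:
        for i in range(len(seq) - 1):
            j = INDEX.get(seq[i:i + 2])
            if j is not None:  # skip pairs with non-standard residues
                counts[j] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def initial_propensity(positives, negatives, low=0.0, high=1000.0):
    """Positive-minus-negative frequency difference, scaled to [low, high]."""
    pos = class_composition(positives)
    neg = class_composition(negatives)
    diff = [p - n for p, n in zip(pos, neg)]
    d_min, d_max = min(diff), max(diff)
    if d_max == d_min:  # no discrimination at all: return the midpoint
        return {dp: (low + high) / 2 for dp in DIPEPTIDES}
    scale = (high - low) / (d_max - d_min)
    return {dp: low + (d - d_min) * scale for dp, d in zip(DIPEPTIDES, diff)}


# Toy sequences, not real PVP data.
scores = initial_propensity(["KKKK"], ["DDDD"])
print(scores["KK"], scores["DD"], scores["AC"])
```

Under this reading, a dipeptide enriched in the positive class ends up near the top of the scale, one enriched in the negative class near the bottom, and uninformative dipeptides near the middle.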
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand;
| | - Sakawrat Kanthawong
- Department of Microbiology, Faculty of Medicine, Khon Kaen University, Khon Kaen 40002, Thailand;
| | - Nalini Schaduangrat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand;
| | - Janchai Yana
- Department of Chemistry, Faculty of Science and Technology, Chiang Mai Rajabhat University, Chiang Mai 50300, Thailand;
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand;
| |
|
12
|
Zhang YZ, Shen HB. Signal-3L 2.0: A Hierarchical Mixture Model for Enhancing Protein Signal Peptide Prediction by Incorporating Residue-Domain Cross-Level Features. J Chem Inf Model 2017; 57:988-999. [PMID: 28298081 DOI: 10.1021/acs.jcim.6b00484] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Indexed: 01/04/2023]
Abstract
Signal peptides play key roles in targeting and translocation of integral membrane proteins and secretory proteins. However, signal peptides present several challenges for automatic prediction methods. One challenge is that it is difficult to discriminate signal peptides from transmembrane helices, as both the H-region of the peptides and the transmembrane helices are hydrophobic. Another is that it is difficult to identify the cleavage site between signal peptides and mature proteins, as cleavage motifs or patterns are still unclear for most proteins. To solve these problems and further enhance automatic signal peptide recognition, we report a new Signal-3L 2.0 predictor. Our new model is constructed with a hierarchical protocol, where it first determines the existence of a signal peptide. For this, we propose a new residue-domain cross-level feature-driven approach, and we demonstrate that protein functional domain information is particularly useful for discriminating between the transmembrane helices and signal peptides as they perform different functions. Next, in order to accurately identify the unique signal peptide cleavage sites along the sequence, we designed a top-down approach where a subset of potential cleavage sites are screened using statistical learning rules, and then a final unique site is selected according to its evolution conservation score. Because this mixed approach utilizes both statistical learning and evolution analysis, it shows a strong capacity for recognizing cleavage sites. Signal-3L 2.0 has been benchmarked on multiple data sets, and the experimental results have demonstrated its accuracy. The online server is available at www.csbio.sjtu.edu.cn/bioinf/Signal-3L/ .
Affiliation(s)
- Yi-Ze Zhang
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, 200240, China
- Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
| | - Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, 200240, China
- Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
| |
|
13
|
Vasylenko T, Liou YF, Chiou PC, Chu HW, Lai YS, Chou YL, Huang HL, Ho SY. SCMBYK: prediction and characterization of bacterial tyrosine-kinases based on propensity scores of dipeptides. BMC Bioinformatics 2016; 17:514. [PMID: 28155663 PMCID: PMC5260027 DOI: 10.1186/s12859-016-1371-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 01/28/2023]
Abstract
BACKGROUND: Bacterial tyrosine-kinases (BY-kinases), which play an important role in numerous cellular processes, are characterized as a separate class of enzymes and share no structural similarity with their eukaryotic counterparts. However, in silico methods for predicting BY-kinases have not been developed yet. Since these enzymes are involved in key regulatory processes, and are promising targets for anti-bacterial drug design, it is desirable to develop a simple and easily interpretable predictor to gain new insights into bacterial tyrosine phosphorylation. This study proposes a novel SCMBYK method for predicting and characterizing BY-kinases. RESULTS: A dataset consisting of 797 BY-kinases and 783 non-BY-kinases was established to design the SCMBYK predictor, which achieved training and test accuracies of 97.55% and 96.73%, respectively. Furthermore, the leave-one-phylum-out method was used to predict specific bacterial phyla hosts of target sequences, gaining 97.39% average test accuracy. After analyzing SCMBYK-derived propensity scores, four characteristics of BY-kinases were determined: 1) BY-kinases tend to be composed of α-helices; 2) the amino-acid content of extracellular regions of BY-kinases is expected to be dominated by residues such as Val, Ile, Phe and Tyr; 3) BY-kinases structurally resemble nuclear proteins; 4) different domains play different roles in triggering BY-kinase activity. CONCLUSIONS: The SCMBYK predictor is an effective method for identification of possible BY-kinases. Furthermore, it can be used as a part of a novel drug repurposing method, which recognizes putative BY-kinases and matches them to approved drugs. Among other results, our analysis revealed that azathioprine could suppress the virulence of M. tuberculosis, and thus be considered as a potential antibiotic for tuberculosis treatment.
Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1371-4) contains supplementary material, which is available to authorized users.
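The leave-one-phylum-out evaluation mentioned in the abstract can be read as grouped cross-validation: all sequences from one phylum are held out as the test set, the model is trained on the rest, and the per-phylum test accuracies are averaged. The sketch below illustrates that generic scheme in plain Python. The function names and the `fit`/`predict` callables are hypothetical placeholders, not the SCMBYK implementation.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) per distinct group."""
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    for g in sorted(by_group):
        test_idx = by_group[g]
        train_idx = [i for i, other in enumerate(groups) if other != g]
        yield g, train_idx, test_idx


def mean_held_out_accuracy(groups, X, y, fit, predict):
    """Average test accuracy over folds, one fold per held-out group.
    fit(X_train, y_train) returns a model; predict(model, x) returns a label."""
    accuracies = []
    for _, train_idx, test_idx in leave_one_group_out(groups):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


# Illustrative phylum labels for five hypothetical sequences.
phyla = ["Firmicutes", "Proteobacteria", "Firmicutes",
         "Actinobacteria", "Proteobacteria"]
for phylum, train_idx, test_idx in leave_one_group_out(phyla):
    print(phylum, train_idx, test_idx)
```

Holding out whole phyla gives a stricter estimate than a random split, since the test sequences come from lineages the model has never seen during training.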
Affiliation(s)
- Tamara Vasylenko
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, 300, Taiwan
| | - Yi-Fan Liou
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, 300, Taiwan
| | - Po-Chin Chiou
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, 300, Taiwan
| | - Hsiao-Wei Chu
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, 300, Taiwan
| | - Yung-Sung Lai
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, 300, Taiwan
| | - Yu-Ling Chou
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, 300, Taiwan
| | - Hui-Ling Huang
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, 300, Taiwan
- College of Biological Science and Technology, National Chiao Tung University, Hsinchu, 300, Taiwan
- Center for Bioinformatics Research, National Chiao Tung University, Hsinchu, Taiwan
| | - Shinn-Ying Ho
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, 300, Taiwan
- College of Biological Science and Technology, National Chiao Tung University, Hsinchu, 300, Taiwan
- Center for Bioinformatics Research, National Chiao Tung University, Hsinchu, Taiwan
| |
|
14
|
Shigemitsu S, Cao W, Terada T, Shimizu K. Development of a prediction system for tail-anchored proteins. BMC Bioinformatics 2016; 17:378. [PMID: 27634135 PMCID: PMC5025589 DOI: 10.1186/s12859-016-1202-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 03/19/2016] [Accepted: 08/24/2016] [Indexed: 11/18/2022]
Abstract
BACKGROUND: "Tail-anchored (TA) proteins" is a collective term for transmembrane proteins with a C-terminal transmembrane domain (TMD) and without an N-terminal signal sequence. TA proteins account for approximately 3-5% of all transmembrane proteins that mediate membrane fusion, regulation of apoptosis, and vesicular transport. The combined use of TMD and signal sequence prediction tools is typically required to predict TA proteins. RESULTS: Here we developed a prediction system named TAPPM that predicts TA proteins solely from target amino acid sequences, based on knowledge of the sequence features of TMDs and the peripheral regions of TA proteins. Manually curated TA proteins were collected from published literature. We constructed hidden Markov models (HMMs) of TA proteins as well as of three different types of transmembrane proteins with similar structures, and compared their likelihoods as TA proteins. CONCLUSIONS: Using the HMMs, we achieved high prediction accuracy, with area under the receiver operating characteristic curve values reaching 0.963. A command line tool written in Python is available at https://github.com/davecao/tappm_cli.
Affiliation(s)
- Shunsuke Shigemitsu
  - Department of Biotechnology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657 Japan
- Wei Cao
  - Department of Biotechnology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657 Japan
- Tohru Terada
  - Agricultural Bioinformatics Research Unit, Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657 Japan
- Kentaro Shimizu
  - Department of Biotechnology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657 Japan
|
15
|
Liou YF, Vasylenko T, Yeh CL, Lin WC, Chiu SH, Charoenkwan P, Shu LS, Ho SY, Huang HL. SCMMTP: identifying and characterizing membrane transport proteins using propensity scores of dipeptides. BMC Genomics 2015; 16 Suppl 12:S6. [PMID: 26677931 PMCID: PMC4682407 DOI: 10.1186/1471-2164-16-s12-s6] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Indexed: 12/17/2022]
Abstract
BACKGROUND: Identifying putative membrane transport proteins (MTPs) and understanding the transport mechanisms involved remain important challenges for the advancement of structural and functional genomics. However, transporter characteristics are mainly derived from MTP crystal structures, and MTPs are hard to crystallize. Therefore, it is desirable to develop bioinformatics tools for the effective large-scale analysis of available sequences to identify and characterize novel transporters. RESULTS: This work proposes a novel method (SCMMTP), based on the scoring card method (SCM) using dipeptide composition, to identify and characterize MTPs from an existing dataset containing 900 MTPs and 660 non-MTPs, which are separated into a training dataset consisting of 1,380 proteins and an independent dataset consisting of 180 proteins. SCMMTP produced estimated propensity scores of amino acids and dipeptides to be MTPs. The SCMMTP training and test accuracies reached 83.81% and 76.11%, respectively. The test accuracy of a support vector machine (SVM), a complicated classification method with a low possibility for biological interpretation, using the position-specific substitution matrix (PSSM) as a protein feature is 80.56%; thus SCMMTP is comparable to SVM-PSSM. To identify MTPs, SCMMTP was applied to three datasets: 1) human transmembrane proteins, 2) a photosynthetic protein dataset, and 3) a human protein database. The finding that MTPs show α-helix-rich structure agrees with previous studies. MTPs use residues with low hydration energy. It is hypothesized that, after filtering substrates, the hydrated water molecules need to be released from the pore regions. CONCLUSIONS: SCMMTP yields estimated propensity scores of amino acids and dipeptides to be MTPs, which can be used to identify novel MTPs and characterize transport mechanisms for use in further experiments. AVAILABILITY: http://iclab.life.nctu.edu.tw/iclab_webtools/SCMMTP/.
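As a rough illustration of scoring a sequence by dipeptide propensities, the sketch below computes a dipeptide composition, derives initial propensity scores from the difference in mean composition between positive and negative training sequences (rescaled to 0-1000), and scores a sequence as the composition-weighted sum of those propensities. The function names, the 0-1000 scale, and the rescaling step are assumptions for illustration, and the sketch omits any optimization of the scores, so it should not be read as the SCMMTP implementation.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of the 20 standard amino acids.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq):
    """Return the relative frequency of each of the 400 dipeptides in seq."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        dipeptide = seq[i:i + 2]
        if dipeptide in counts:  # skip pairs containing non-standard residues
            counts[dipeptide] += 1
            total += 1
    if total == 0:
        return {dp: 0.0 for dp in DIPEPTIDES}
    return {dp: c / total for dp, c in counts.items()}


def mean_composition(seqs):
    """Average the dipeptide composition over a list of sequences."""
    comps = [dipeptide_composition(s) for s in seqs]
    return {dp: sum(c[dp] for c in comps) / len(comps) for dp in DIPEPTIDES}


def initial_propensities(positives, negatives):
    """Assumed initial scores: class difference in composition, scaled to 0-1000."""
    pos, neg = mean_composition(positives), mean_composition(negatives)
    diff = {dp: pos[dp] - neg[dp] for dp in DIPEPTIDES}
    lo, hi = min(diff.values()), max(diff.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all differences are equal
    return {dp: 1000.0 * (d - lo) / span for dp, d in diff.items()}


def scm_score(seq, propensities):
    """Score a sequence as the composition-weighted sum of dipeptide propensities."""
    comp = dipeptide_composition(seq)
    return sum(propensities[dp] * comp[dp] for dp in DIPEPTIDES)
```

In use, a sequence would be predicted as positive when `scm_score` exceeds a threshold chosen on the training data; the threshold is likewise an assumption that must be tuned for each dataset.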
Affiliation(s)
- Yi-Fan Liou
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
- Tamara Vasylenko
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
- Chia-Lun Yeh
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
- Wei-Chun Lin
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
- Shih-Hsiang Chiu
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
- Phasit Charoenkwan
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
- Li-Sun Shu
  - Department of Information Management, Overseas Chinese University, Taichung, Taiwan
- Shinn-Ying Ho
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
  - Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, Taiwan
  - Center for Bioinformatics Research, National Chiao Tung University, Hsinchu City, Taiwan
- Hui-Ling Huang
  - Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
  - Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, Taiwan
  - Center for Bioinformatics Research, National Chiao Tung University, Hsinchu City, Taiwan
|
16
|
Predicting cancerlectins by the optimal g-gap dipeptides. Sci Rep 2015; 5:16964. [PMID: 26648527 PMCID: PMC4673586 DOI: 10.1038/srep16964] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Received: 05/30/2015] [Accepted: 10/22/2015] [Indexed: 12/14/2022]
Abstract
Cancerlectins play a key role in the process of tumor cell differentiation. Fully understanding the function of cancerlectins is therefore important, because it sheds light on future directions for cancer therapy. However, traditional wet-lab experimental methods are costly and time-consuming. It is highly desirable to develop an effective and efficient computational tool to identify cancerlectins. In this study, we developed a sequence-based method to discriminate between cancerlectins and non-cancerlectins. Analysis of variance (ANOVA) was used to choose the optimal feature set derived from the g-gap dipeptide composition. The jackknife cross-validated results showed that the proposed method achieved an accuracy of 75.19%, which is superior to other published methods. For the convenience of other researchers, an online web server, CaLecPred, was established and can be freely accessed at http://lin.uestc.edu.cn/server/CalecPred. We believe that CaLecPred is a powerful tool to study cancerlectins and to guide related experimental validations.
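The g-gap dipeptide composition named in this abstract can be sketched as follows, assuming the common definition in which residues at positions i and i + g + 1 are paired and counts are normalized by the number of such pairs (L - g - 1 for a sequence of length L). The function name `g_gap_dipeptide_composition` is hypothetical, and the ANOVA-based feature selection step is not covered here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def g_gap_dipeptide_composition(seq, g):
    """Return frequencies of residue pairs separated by g intervening residues.

    g = 0 gives the ordinary adjacent dipeptide composition.
    """
    if g < 0:
        raise ValueError("g must be non-negative")
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    n_pairs = len(seq) - g - 1  # number of (i, i + g + 1) positions in seq
    if n_pairs <= 0:
        return {p: 0.0 for p in pairs}
    for i in range(n_pairs):
        key = seq[i] + seq[i + g + 1]
        if key in counts:  # skip pairs containing non-standard residues
            counts[key] += 1
    return {p: c / n_pairs for p, c in counts.items()}
```

Each value of g yields a separate 400-dimensional feature vector, so a feature selection step such as the ANOVA ranking mentioned in the abstract would typically be applied afterwards to pick the most discriminative features.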
|