Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Huang HL, Charoenkwan P, Kao TF, Lee HC, Chang FL, Huang WL, Ho SJ, Shu LS, Chen WL, Ho SY. Prediction and analysis of protein solubility using a novel scoring card method with dipeptide composition. BMC Bioinformatics 2012;13 Suppl 17:S3. [PMID: 23282103 DOI: 10.1186/1471-2105-13-S17-S3] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

For:	Huang HL, Charoenkwan P, Kao TF, Lee HC, Chang FL, Huang WL, Ho SJ, Shu LS, Chen WL, Ho SY. Prediction and analysis of protein solubility using a novel scoring card method with dipeptide composition. BMC Bioinformatics 2012;13 Suppl 17:S3. [PMID: 23282103 DOI: 10.1186/1471-2105-13-S17-S3] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Number

Cited by Other Article(s)

Dutta S, Zunjare RU, Sil A, Mishra DC, Arora A, Gain N, Chand G, Chhabra R, Muthusamy V, Hossain F. Prediction of matrilineal specific patatin-like protein governing in-vivo maternal haploid induction in maize using support vector machine and di-peptide composition. Amino Acids 2024;56:20. [PMID: 38460024 DOI: 10.1007/s00726-023-03368-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2023] [Accepted: 12/05/2023] [Indexed: 03/11/2024]

Rahbar MR, Nezafat N, Morowvat MH, Savardashtaki A, Ghoshoon MB, Mehrabani-Zeinabad K, Ghasemi Y. Targeting Efficient Features of Urate Oxidase to Increase Its Solubility. Appl Biochem Biotechnol 2024:10.1007/s12010-023-04819-w. [PMID: 38308671 DOI: 10.1007/s12010-023-04819-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 12/19/2023] [Indexed: 02/05/2024]

Rojas C, Ballabio D, Consonni V, Suárez-Estrella D, Todeschini R. Classification-based machine learning approaches to predict the taste of molecules: A review. Food Res Int 2023;171:113036. [PMID: 37330849 DOI: 10.1016/j.foodres.2023.113036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2023] [Revised: 05/02/2023] [Accepted: 05/22/2023] [Indexed: 06/19/2023]

Charoenkwan P, Schaduangrat N, Lio P, Moni MA, Chumnanpuen P, Shoombuatong W. iAMAP-SCM: A Novel Computational Tool for Large-Scale Identification of Antimalarial Peptides Using Estimated Propensity Scores of Dipeptides. ACS OMEGA 2022;7:41082-41095. [PMID: 36406571 PMCID: PMC9670693 DOI: 10.1021/acsomega.2c04465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/15/2022] [Accepted: 10/20/2022] [Indexed: 06/16/2023]

Abstract

Antimalarial peptides (AMAPs) varying in length, amino acid composition, charge, conformational structure, hydrophobicity, and amphipathicity reflect their diversity in antimalarial mechanisms. Due to the worldwide major health problem concerning antimicrobial resistance, these peptides possess great therapeutic value owing to their low incidences of drug resistance as compared to conventional antibiotics. Although well-known experimental methods are able to precisely determine the antimalarial activity of peptides, these methods are still time-consuming and costly. Thus, machine learning (ML)-based methods that are capable of identifying AMAPs rapidly by using only sequence information would be beneficial for the high-throughput identification of AMAPs. In this study, we propose the first computational model (termed iAMAP-SCM) for the large-scale identification and characterization of peptides with antimalarial activity by using only sequence information. Specifically, we employed an interpretable scoring card method (SCM) to develop iAMAP-SCM and estimate propensities of 20 amino acids and 400 dipeptides to be AMAPs in a supervised manner. Experimental results showed that iAMAP-SCM could achieve a maximum accuracy and Matthew's coefficient correlation of 0.957 and 0.834, respectively, on the independent test dataset. In addition, SCM-derived propensities of 20 amino acids and selected physicochemical properties were used to provide an understanding of the functional mechanisms of AMAPs. Finally, a user-friendly online computational platform of iAMAP-SCM is publicly available at http://pmlabstack.pythonanywhere.com/iAMAP-SCM. The iAMAP-SCM predictor is anticipated to assist experimental scientists in the high-throughput identification of potential AMAP candidates for the treatment of malaria and other clinical applications.

Collapse

Improved prediction and characterization of blood-brain barrier penetrating peptides using estimated propensity scores of dipeptides. J Comput Aided Mol Des 2022;36:781-796. [DOI: 10.1007/s10822-022-00476-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2022] [Accepted: 09/15/2022] [Indexed: 11/27/2022]

Qing R, Hao S, Smorodina E, Jin D, Zalevsky A, Zhang S. Protein Design: From the Aspect of Water Solubility and Stability. Chem Rev 2022;122:14085-14179. [PMID: 35921495 PMCID: PMC9523718 DOI: 10.1021/acs.chemrev.1c00757] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2021] [Indexed: 12/13/2022]

Charoenkwan P, Kanthawong S, Schaduangrat N, Li’ P, Moni MA, Shoombuatong W. SCMRSA: a New Approach for Identifying and Analyzing Anti-MRSA Peptides Using Estimated Propensity Scores of Dipeptides. ACS OMEGA 2022;7:32653-32664. [PMID: 36120041 PMCID: PMC9476499 DOI: 10.1021/acsomega.2c04305] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/08/2022] [Accepted: 08/22/2022] [Indexed: 06/15/2023]

Dubovski N, Fierro F, Margulis E, Ben Shoshan-Galeczki Y, Peri L, Niv MY. Taste GPCRs and their ligands. PROGRESS IN MOLECULAR BIOLOGY AND TRANSLATIONAL SCIENCE 2022;193:177-193. [PMID: 36357077 DOI: 10.1016/bs.pmbts.2022.06.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/29/2022]

Charoenkwan P, Schaduangrat N, Hasan MM, Moni MA, Lió P, Shoombuatong W. Empirical comparison and analysis of machine learning-based predictors for predicting and analyzing of thermophilic proteins. EXCLI JOURNAL 2022;21:554-570. [PMID: 35651661 PMCID: PMC9150013 DOI: 10.17179/excli2022-4723] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/28/2022] [Accepted: 02/21/2022] [Indexed: 12/15/2022]

Abstract

Thermophilic proteins (TPPs) are critical for basic research and in the food industry due to their ability to maintain a thermodynamically stable fold at extremely high temperatures. Thus, the expeditious identification of novel TPPs through computational models from protein sequences is very desirable. Over the last few decades, a number of computational methods, especially machine learning (ML)-based methods, for in silico prediction of TPPs have been developed. Therefore, it is desirable to revisit these methods and summarize their advantages and disadvantages in order to further develop new computational approaches to achieve more accurate and improved prediction of TPPs. With this goal in mind, we comprehensively investigate a large collection of fourteen state-of-the-art TPP predictors in terms of their dataset size, feature encoding schemes, feature selection strategies, ML algorithms, evaluation strategies and web server/software usability. To the best of our knowledge, this article represents the first comprehensive review on the development of ML-based methods for in silico prediction of TPPs. Among these TPP predictors, they can be classified into two groups according to the interpretability of ML algorithms employed (i.e., computational black-box methods and computational white-box methods). In order to perform the comparative analysis, we conducted a comparative study on several currently available TPP predictors based on two benchmark datasets. Finally, we provide future perspectives for the design and development of new computational models for TPP prediction. We hope that this comprehensive review will facilitate researchers in selecting an appropriate TPP predictor that is the most suitable one to deal with their purposes and provide useful perspectives for the development of more effective and accurate TPP predictors.

Collapse

Charoenkwan P, Schaduangrat N, Mahmud SMH, Thinnukool O, Shoombuatong W. Recent development of machine learning-based methods for the prediction of defensin family and subfamily. EXCLI JOURNAL 2022;21:757-771. [PMID: 35949489 PMCID: PMC9360473 DOI: 10.17179/excli2022-4913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/05/2022] [Accepted: 05/03/2022] [Indexed: 11/05/2022]

Abstract

Nearly all living species comprise of host defense peptides called defensins, that are crucial for innate immunity. These peptides work by activating the immune system which kills the microbes directly or indirectly, thus providing protection to the host. Thus far, numerous preclinical and clinical trials for peptide-based drugs are currently being evaluated. Although, experimental methods can help to precisely identify the defensin peptide family and subfamily, these approaches are often time-consuming and cost-ineffective. On the other hand, machine learning (ML) methods are able to effectively employ protein sequence information without the knowledge of a protein's three-dimensional structure, thus highlighting their predictive ability for the large-scale identification. To date, several ML methods have been developed for the in silico identification of the defensin peptide family and subfamily. Therefore, summarizing the advantages and disadvantages of the existing methods is urgently needed in order to provide useful suggestions for the development and improvement of new computational models for the identification of the defensin peptide family and subfamily. With this goal in mind, we first provide a comprehensive survey on a collection of six state-of-the-art computational approaches for predicting the defensin peptide family and subfamily. Herein, we cover different important aspects, including the dataset quality, feature encoding methods, feature selection schemes, ML algorithms, cross-validation methods and web server availability/usability. Moreover, we provide our thoughts on the limitations of existing methods and future perspectives for improving the prediction performance and model interpretability. The insights and suggestions gained from this review are anticipated to serve as a valuable guidance for researchers for the development of more robust and useful predictors.

Collapse

Kabir M, Nantasenamat C, Kanthawong S, Charoenkwan P, Shoombuatong W. Large-scale comparative review and assessment of computational methods for phage virion proteins identification. EXCLI JOURNAL 2022;21:11-29. [PMID: 35145365 PMCID: PMC8822302 DOI: 10.17179/excli2021-4411] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/12/2021] [Accepted: 11/29/2021] [Indexed: 12/11/2022]

Mckenna A, P N Dubey S. Machine Learning Based Predictive Model for the Analysis of Sequence Activity Relationships Using Protein Spectra and Protein Descriptors. J Biomed Inform 2022;128:104016. [PMID: 35143999 DOI: 10.1016/j.jbi.2022.104016] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2021] [Revised: 12/13/2021] [Accepted: 02/03/2022] [Indexed: 11/26/2022]

Assessment of Therapeutic Antibody Developability by Combinations of In Vitro and In Silico Methods. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2022;2313:57-113. [PMID: 34478132 DOI: 10.1007/978-1-0716-1450-1_4] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]

Abstract

Although antibodies have become the fastest-growing class of therapeutics on the market, it is still challenging to develop them for therapeutic applications, which often require these molecules to withstand stresses that are not present in vivo. We define developability as the likelihood of an antibody candidate with suitable functionality to be developed into a manufacturable, stable, safe, and effective drug that can be formulated to high concentrations while retaining a long shelf life. The implementation of reliable developability assessments from the early stages of antibody discovery enables flagging and deselection of potentially problematic candidates, while focussing available resources on the development of the most promising ones. Currently, however, thorough developability assessment requires multiple in vitro assays, which makes it labor intensive and time consuming to implement at early stages. Furthermore, accurate in vitro analysis at the early stage is compromised by the high number of potential candidates that are often prepared at low quantities and purity. Recent improvements in the performance of computational predictors of developability potential are beginning to change this scenario. Many computational methods only require the knowledge of the amino acid sequences and can be used to identify possible developability issues or to rank available candidates according to a range of biophysical properties. Here, we describe how the implementation of in silico tools into antibody discovery pipelines is increasingly offering time- and cost-effective alternatives to in vitro experimental screening, thus streamlining the drug development process. We discuss in particular the biophysical and biochemical properties that underpin developability potential and their trade-offs, review various in vitro assays to measure such properties or parameters that are predictive of developability, and give an overview of the growing number of in silico tools available to predict properties important for antibody development, including the CamSol method developed in our laboratory.

Collapse

Charoenkwan P, Chiangjong W, Nantasenamat C, Moni MA, Lio’ P, Manavalan B, Shoombuatong W. SCMTHP: A New Approach for Identifying and Characterizing of Tumor-Homing Peptides Using Estimated Propensity Scores of Amino Acids. Pharmaceutics 2022;14:pharmaceutics14010122. [PMID: 35057016 PMCID: PMC8779003 DOI: 10.3390/pharmaceutics14010122] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2021] [Revised: 12/16/2021] [Accepted: 12/28/2021] [Indexed: 12/13/2022] Open

Charoenkwan P, Chotpatiwetchkul W, Lee VS, Nantasenamat C, Shoombuatong W. A novel sequence-based predictor for identifying and characterizing thermophilic proteins using estimated propensity scores of dipeptides. Sci Rep 2021;11:23782. [PMID: 34893688 PMCID: PMC8664844 DOI: 10.1038/s41598-021-03293-w] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2021] [Accepted: 12/01/2021] [Indexed: 02/08/2023] Open

Abstract

Owing to their ability to maintain a thermodynamically stable fold at extremely high temperatures, thermophilic proteins (TTPs) play a critical role in basic research and a variety of applications in the food industry. As a result, the development of computation models for rapidly and accurately identifying novel TTPs from a large number of uncharacterized protein sequences is desirable. In spite of existing computational models that have already been developed for characterizing thermophilic proteins, their performance and interpretability remain unsatisfactory. We present a novel sequence-based thermophilic protein predictor, termed SCMTPP, for improving model predictability and interpretability. First, an up-to-date and high-quality dataset consisting of 1853 TPPs and 3233 non-TPPs was compiled from published literature. Second, the SCMTPP predictor was created by combining the scoring card method (SCM) with estimated propensity scores of g-gap dipeptides. Benchmarking experiments revealed that SCMTPP had a cross-validation accuracy of 0.883, which was comparable to that of a support vector machine-based predictor (0.906-0.910) and 2-17% higher than that of commonly used machine learning models. Furthermore, SCMTPP outperformed the state-of-the-art approach (ThermoPred) on the independent test dataset, with accuracy and MCC of 0.865 and 0.731, respectively. Finally, the SCMTPP-derived propensity scores were used to elucidate the critical physicochemical properties for protein thermostability enhancement. In terms of interpretability and generalizability, comparative results showed that SCMTPP was effective for identifying and characterizing TPPs. We had implemented the proposed predictor as a user-friendly online web server at http://pmlabstack.pythonanywhere.com/SCMTPP in order to allow easy access to the model. SCMTPP is expected to be a powerful tool for facilitating community-wide efforts to identify TPPs on a large scale and guiding experimental characterization of TPPs.

Collapse

Charoenkwan P, Chiangjong W, Hasan MM, Nantasenamat C, Shoombuatong W. Review and comparative analysis of machine learning-based predictors for predicting and analyzing of anti-angiogenic peptides. Curr Med Chem 2021;29:849-864. [PMID: 34375178 DOI: 10.2174/0929867328666210810145806] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2021] [Revised: 06/17/2021] [Accepted: 06/22/2021] [Indexed: 11/22/2022]

Charoenkwan P, Anuwongcharoen N, Nantasenamat C, Hasan MM, Shoombuatong W. In Silico Approaches for the Prediction and Analysis of Antiviral Peptides: A Review. Curr Pharm Des 2021;27:2180-2188. [PMID: 33138759 DOI: 10.2174/1381612826666201102105827] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2020] [Accepted: 08/20/2020] [Indexed: 11/22/2022]

Bhandari BK, Gardner PP, Lim CS. Solubility-Weighted Index: fast and accurate prediction of protein solubility. Bioinformatics 2021;36:4691-4698. [PMID: 32559287 PMCID: PMC7750957 DOI: 10.1093/bioinformatics/btaa578] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2019] [Revised: 05/05/2020] [Accepted: 06/12/2020] [Indexed: 12/14/2022] Open

Adiga R. Benchmarking Datasets from Malaria Cytotoxic T-cell Epitopes Using Machine Learning Approach. Avicenna J Med Biotechnol 2021;13:87-91. [PMID: 34012524 PMCID: PMC8112139 DOI: 10.18502/ajmb.v13i2.5527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022] Open

Chen J, Zheng S, Zhao H, Yang Y. Structure-aware protein solubility prediction from sequence through graph convolutional network and predicted contact map. J Cheminform 2021;13:7. [PMID: 33557952 PMCID: PMC7869490 DOI: 10.1186/s13321-021-00488-1] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2020] [Accepted: 01/20/2021] [Indexed: 11/26/2022] Open

Charoenkwan P, Chiangjong W, Lee VS, Nantasenamat C, Hasan MM, Shoombuatong W. Improved prediction and characterization of anticancer activities of peptides using a novel flexible scoring card method. Sci Rep 2021;11:3017. [PMID: 33542286 PMCID: PMC7862624 DOI: 10.1038/s41598-021-82513-9] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2020] [Accepted: 01/18/2021] [Indexed: 01/30/2023] Open

Abstract

As anticancer peptides (ACPs) have attracted great interest for cancer treatment, several approaches based on machine learning have been proposed for ACP identification. Although existing methods have afforded high prediction accuracies, however such models are using a large number of descriptors together with complex ensemble approaches that consequently leads to low interpretability and thus poses a challenge for biologists and biochemists. Therefore, it is desirable to develop a simple, interpretable and efficient predictor for accurate ACP identification as well as providing the means for the rational design of new anticancer peptides with promising potential for clinical application. Herein, we propose a novel flexible scoring card method (FSCM) making use of propensity scores of local and global sequential information for the development of a sequence-based ACP predictor (named iACP-FSCM) for improving the prediction accuracy and model interpretability. To the best of our knowledge, iACP-FSCM represents the first sequence-based ACP predictor for rationalizing an in-depth understanding into the molecular basis for the enhancement of anticancer activities of peptides via the use of FSCM-derived propensity scores. The independent testing results showed that the iACP-FSCM provided accuracies of 0.825 and 0.910 as evaluated on the main and alternative datasets, respectively. Results from comparative benchmarking demonstrated that iACP-FSCM could outperform seven other existing ACP predictors with marked improvements of 7% and 17% for accuracy and MCC, respectively, on the main dataset. Furthermore, the iACP-FSCM (0.910) achieved very comparable results to that of the state-of-the-art ensemble model AntiCP2.0 (0.920) as evaluated on the alternative dataset. Comparative results demonstrated that iACP-FSCM was the most suitable choice for ACP identification and characterization considering its simplicity, interpretability and generalizability. It is highly anticipated that the iACP-FSCM may be a robust tool for the rapid screening and identification of promising ACPs for clinical use.

Collapse

Charoenkwan P, Kanthawong S, Nantasenamat C, Hasan MM, Shoombuatong W. iAMY-SCM: Improved prediction and analysis of amyloid proteins using a scoring card method with propensity scores of dipeptides. Genomics 2020;113:689-698. [PMID: 33017626 DOI: 10.1016/j.ygeno.2020.09.065] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2020] [Revised: 09/21/2020] [Accepted: 09/30/2020] [Indexed: 01/09/2023]

Abstract

Fast, accurate identification and characterization of amyloid proteins at a large-scale is essential for understating their role in therapeutic intervention strategies. As a matter of fact, there exist only one in silico model for amyloid protein identification using the random forest (RF) model in conjunction with various feature types namely the RFAmy. However, it suffers from low interpretability for biologists. Thus, it is highly desirable to develop a simple and easily interpretable prediction method with robust accuracy as compared to the existing complicated model. In this study, we propose iAMY-SCM, the first scoring card method-based predictor for predicting and analyzing amyloid proteins. Herein, the iAMY-SCM made use of a simple weighted-sum function in conjunction with the propensity scores of dipeptides for the amyloid protein identification. Cross-validation results indicated that iAMY-SCM provided an accuracy of 0.895 that corresponded to 10-22% higher performance than that of widely used machine learning models. Furthermore, iAMY-SCM achieving an accuracy of 0.827 as evaluated by an independent test, which was found to be comparable to that of RFAmy and was approximately 9-13% higher than widely used machine learning models. Furthermore, the analysis of estimated propensity scores of amino acids and dipeptides were performed to provide insights into the biophysical and biochemical properties of amyloid proteins. As such, this demonstrates that the proposed iAMY-SCM is efficient and reliable in terms of simplicity, interpretability and implementation. To facilitate ease of use of the proposed iAMY-SCM, a user-friendly and publicly accessible web server at http://camt.pythonanywhere.com/iAMY-SCM has been established. We anticipate that that iAMY-SCM will be an important tool for facilitating the large-scale prediction and characterization of amyloid protein.

Collapse

iBitter-SCM: Identification and characterization of bitter peptides using a scoring card method with propensity scores of dipeptides. Genomics 2020;112:2813-2822. [DOI: 10.1016/j.ygeno.2020.03.019] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2020] [Revised: 03/19/2020] [Accepted: 03/22/2020] [Indexed: 12/21/2022]

Zhang D, Guan ZX, Zhang ZM, Li SH, Dao FY, Tang H, Lin H. Recent Development of Computational Predicting Bioluminescent Proteins. Curr Pharm Des 2020;25:4264-4273. [PMID: 31696804 DOI: 10.2174/1381612825666191107100758] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2019] [Accepted: 11/04/2019] [Indexed: 12/22/2022]

Meta-iPVP: a sequence-based meta-predictor for improving the prediction of phage virion proteins using effective feature representation. J Comput Aided Mol Des 2020;34:1105-1116. [DOI: 10.1007/s10822-020-00323-z] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2020] [Accepted: 06/10/2020] [Indexed: 12/11/2022]

Wang S, Cao Z, Li M, Yue Y. G-DipC: An Improved Feature Representation Method for Short Sequences to Predict the Type of Cargo in Cell-Penetrating Peptides. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020;17:739-747. [PMID: 31352350 DOI: 10.1109/tcbb.2019.2930993] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]

Charoenkwan P, Kanthawong S, Schaduangrat N, Yana J, Shoombuatong W. PVPred-SCM: Improved Prediction and Analysis of Phage Virion Proteins Using a Scoring Card Method. Cells 2020;9:E353. [PMID: 32028709 PMCID: PMC7072630 DOI: 10.3390/cells9020353] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/26/2019] [Revised: 01/20/2020] [Accepted: 01/27/2020] [Indexed: 12/16/2022] Open

Khurana S, Rawi R, Kunji K, Chuang GY, Bensmail H, Mall R. DeepSol: a deep learning framework for sequence-based protein solubility prediction. Bioinformatics 2019;34:2605-2613. [PMID: 29554211 DOI: 10.1093/bioinformatics/bty166] [Citation(s) in RCA: 97] [Impact Index Per Article: 19.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2017] [Accepted: 03/13/2018] [Indexed: 01/09/2023] Open

THPep: A machine learning-based approach for predicting tumor homing peptides. Comput Biol Chem 2019;80:441-451. [PMID: 31151025 DOI: 10.1016/j.compbiolchem.2019.05.008] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2018] [Revised: 04/18/2019] [Accepted: 05/17/2019] [Indexed: 01/24/2023]

de Marco A, Ferrer-Miralles N, Garcia-Fruitós E, Mitraki A, Peternel S, Rinas U, Trujillo-Roldán MA, Valdez-Cruz NA, Vázquez E, Villaverde A. Bacterial inclusion bodies are industrially exploitable amyloids. FEMS Microbiol Rev 2019;43:53-72. [PMID: 30357330 DOI: 10.1093/femsre/fuy038] [Citation(s) in RCA: 66] [Impact Index Per Article: 13.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2018] [Accepted: 10/23/2018] [Indexed: 12/13/2022] Open

Affiliation(s)

Ario de Marco Laboratory for Environmental and Life Sciences, University of Nova Gorica, Vipavska Cesta 13, 5000 Nova Gorica, Slovenia
Neus Ferrer-Miralles Institut de Biotecnologia i de Biomedicina (IBB), Carrer de la Vall Moronta s/n, Universitat Autònoma de Barcelona, 08193 Cerdanyola del Vallès, Spain.,Departament de Genètica i de Microbiologia, Carrer de la Vall Moronta s/n, Universitat Autònoma de Barcelona, 08193 Cerdanyola del Vallès, Spain.,CIBER de Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN), Carrer de la Vall Moronta s/n, 08193 Cerdanyola del Vallès, Spain
Elena Garcia-Fruitós Department of Ruminant Production, Institut de Recerca i Tecnologia Agroalimentàries (IRTA), Torre Marimon, 08140 Caldes de Montbui, Barcelona, Spain
Anna Mitraki Department of Materials Science and Technology, University of Crete, Vassilika Vouton, 70013 Heraklion, Crete, Greece.,Institute of Electronic Structure and Laser (IESL), Foundation for Research and Technology Hellas (FORTH), N. Plastira 100, Vassilika Vouton, 70013 Heraklion, Crete, Greece
Spela Peternel Lupinica, Alešovceva 6, 1000 Ljubljana, Slovenia
Ursula Rinas Leibniz University of Hannover, Technical Chemistry and Life Science, 30167 Hannover, Germany.,Helmholtz Centre for Infection Research, 38124 Braunschweig, Germany
Mauricio A Trujillo-Roldán Programa de Investigación de Producción de Biomoléculas, Unidad de Bioprocesos, Departamento de Biología Molecular y Biotecnología, Instituto de Investigaciones Biomédicas, Universidad Nacional Autónoma de México, 04510 Ciudad de México, México
Norma A Valdez-Cruz Programa de Investigación de Producción de Biomoléculas, Departamento de Biología Molecular y Biotecnología, Instituto de Investigaciones Biomédicas, Universidad Nacional Autónoma de México, 04510 Ciudad de México, México
Esther Vázquez Institut de Biotecnologia i de Biomedicina (IBB), Carrer de la Vall Moronta s/n, Universitat Autònoma de Barcelona, 08193 Cerdanyola del Vallès, Spain.,Departament de Genètica i de Microbiologia, Carrer de la Vall Moronta s/n, Universitat Autònoma de Barcelona, 08193 Cerdanyola del Vallès, Spain.,CIBER de Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN), Carrer de la Vall Moronta s/n, 08193 Cerdanyola del Vallès, Spain
Antonio Villaverde Institut de Biotecnologia i de Biomedicina (IBB), Carrer de la Vall Moronta s/n, Universitat Autònoma de Barcelona, 08193 Cerdanyola del Vallès, Spain.,Departament de Genètica i de Microbiologia, Carrer de la Vall Moronta s/n, Universitat Autònoma de Barcelona, 08193 Cerdanyola del Vallès, Spain.,CIBER de Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN), Carrer de la Vall Moronta s/n, 08193 Cerdanyola del Vallès, Spain

Collapse

Rawi R, Mall R, Kunji K, Shen CH, Kwong PD, Chuang GY. PaRSnIP: sequence-based protein solubility prediction using gradient boosting machine. Bioinformatics 2019;34:1092-1098. [PMID: 29069295 DOI: 10.1093/bioinformatics/btx662] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2017] [Accepted: 10/17/2017] [Indexed: 11/13/2022] Open

Avwioroko OJ, Anigboro AA, Unachukwu NN, Tonukari NJ. Isolation, identification and in silico analysis of alpha-amylase gene of Aspergillus niger strain CSA35 obtained from cassava undergoing spoilage. Biochem Biophys Rep 2018;14:35-42. [PMID: 29872732 PMCID: PMC5986626 DOI: 10.1016/j.bbrep.2018.03.006] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/20/2018] [Revised: 03/26/2018] [Accepted: 03/27/2018] [Indexed: 11/19/2022] Open

Multispecific Substrate Recognition in a Proton-Dependent Oligopeptide Transporter. Structure 2018;26:467-476.e4. [PMID: 29429879 PMCID: PMC5845931 DOI: 10.1016/j.str.2018.01.005] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2017] [Revised: 11/22/2017] [Accepted: 01/10/2018] [Indexed: 12/27/2022]

Ribarič S. Peptides as Potential Therapeutics for Alzheimer's Disease. Molecules 2018;23:E283. [PMID: 29385735 PMCID: PMC6017258 DOI: 10.3390/molecules23020283] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2018] [Revised: 01/26/2018] [Accepted: 01/28/2018] [Indexed: 12/22/2022] Open

Trainor K, Broom A, Meiering EM. Exploring the relationships between protein sequence, structure and solubility. Curr Opin Struct Biol 2017;42:136-146. [PMID: 28160724 DOI: 10.1016/j.sbi.2017.01.004] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2016] [Revised: 01/10/2017] [Accepted: 01/11/2017] [Indexed: 10/20/2022]

Vasylenko T, Liou YF, Chiou PC, Chu HW, Lai YS, Chou YL, Huang HL, Ho SY. SCMBYK: prediction and characterization of bacterial tyrosine-kinases based on propensity scores of dipeptides. BMC Bioinformatics 2016;17:514. [PMID: 28155663 PMCID: PMC5260027 DOI: 10.1186/s12859-016-1371-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2023] Open

Abstract

Background

Bacterial tyrosine-kinases (BY-kinases), which play an important role in numerous cellular processes, are characterized as a separate class of enzymes and share no structural similarity with their eukaryotic counterparts. However, in silico methods for predicting BY-kinases have not been developed yet. Since these enzymes are involved in key regulatory processes, and are promising targets for anti-bacterial drug design, it is desirable to develop a simple and easily interpretable predictor to gain new insights into bacterial tyrosine phosphorylation. This study proposes a novel SCMBYK method for predicting and characterizing BY-kinases.

Results

A dataset consisting of 797 BY-kinases and 783 non-BY-kinases was established to design the SCMBYK predictor, which achieved training and test accuracies of 97.55 and 96.73%, respectively. Furthermore, the leave-one-phylum-out method was used to predict specific bacterial phyla hosts of target sequences, gaining 97.39% average test accuracy. After analyzing SCMBYK-derived propensity scores, four characteristics of BY-kinases were determined: 1) BY-kinases tend to be composed of α-helices; 2) the amino-acid content of extracellular regions of BY-kinases is expected to be dominated by residues such as Val, Ile, Phe and Tyr; 3) BY-kinases structurally resemble nuclear proteins; 4) different domains play different roles in triggering BY-kinase activity.

Conclusions

The SCMBYK predictor is an effective method for identification of possible BY-kinases. Furthermore, it can be used as a part of a novel drug repurposing method, which recognizes putative BY-kinases and matches them to approved drugs. Among other results, our analysis revealed that azathioprine could suppress the virulence of M. tuberculosis, and thus be considered as a potential antibiotic for tuberculosis treatment.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-016-1371-4) contains supplementary material, which is available to authorized users.

Collapse

Liou YF, Vasylenko T, Yeh CL, Lin WC, Chiu SH, Charoenkwan P, Shu LS, Ho SY, Huang HL. SCMMTP: identifying and characterizing membrane transport proteins using propensity scores of dipeptides. BMC Genomics 2015;16 Suppl 12:S6. [PMID: 26677931 PMCID: PMC4682407 DOI: 10.1186/1471-2164-16-s12-s6] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022] Open

Abstract

BACKGROUND

Identifying putative membrane transport proteins (MTPs) and understanding the transport mechanisms involved remain important challenges for the advancement of structural and functional genomics. However, the transporter characters are mainly acquired from MTP crystal structures which are hard to crystalize. Therefore, it is desirable to develop bioinformatics tools for the effective large-scale analysis of available sequences to identify novel transporters and characterize such transporters.

RESULTS

This work proposes a novel method (SCMMTP) based on the scoring card method (SCM) using dipeptide composition to identify and characterize MTPs from an existing dataset containing 900 MTPs and 660 non-MTPs which are separated into a training dataset consisting 1,380 proteins and an independent dataset consisting 180 proteins. The SCMMTP produced estimating propensity scores for amino acids and dipeptides as MTPs. The SCMMTP training and test accuracy levels respectively reached 83.81% and 76.11%. The test accuracy of support vector machine (SVM) using a complicated classification method with a low possibility for biological interpretation and position-specific substitution matrix (PSSM) as a protein feature is 80.56%, thus SCMMTP is comparable to SVM-PSSM. To identify MTPs, SCMMTP is applied to three datasets including: 1) human transmembrane proteins, 2) a photosynthetic protein dataset, and 3) a human protein database. MTPs showing α-helix rich structure is agreed with previous studies. The MTPs used residues with low hydration energy. It is hypothesized that, after filtering substrates, the hydrated water molecules need to be released from the pore regions.

CONCLUSIONS

SCMMTP yields estimating propensity scores for amino acids and dipeptides as MTPs, which can be used to identify novel MTPs and characterize transport mechanisms for use in further experiments.

AVAILABILITY

http://iclab.life.nctu.edu.tw/iclab_webtools/SCMMTP/.

Collapse

Postsynthetic Domain Assembly with NpuDnaE and SspDnaB Split Inteins. Appl Biochem Biotechnol 2015;177:1137-51. [DOI: 10.1007/s12010-015-1802-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2015] [Accepted: 08/04/2015] [Indexed: 10/23/2022]

SCMPSP: Prediction and characterization of photosynthetic proteins based on a scoring card method. BMC Bioinformatics 2015;16 Suppl 1:S8. [PMID: 25708243 PMCID: PMC4331707 DOI: 10.1186/1471-2105-16-s1-s8] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Photosynthetic proteins (PSPs) greatly differ in their structure and function as they are involved in numerous subprocesses that take place inside an organelle called a chloroplast. Few studies predict PSPs from sequences due to their high variety of sequences and structues. This work aims to predict and characterize PSPs by establishing the datasets of PSP and non-PSP sequences and developing prediction methods.

RESULTS

A novel bioinformatics method of predicting and characterizing PSPs based on scoring card method (SCMPSP) was used. First, a dataset consisting of 649 PSPs was established by using a Gene Ontology term GO:0015979 and 649 non-PSPs from the SwissProt database with sequence identity <= 25%.- Several prediction methods are presented based on support vector machine (SVM), decision tree J48, Bayes, BLAST, and SCM. The SVM method using dipeptide features-performed well and yielded - a test accuracy of 72.31%. The SCMPSP method uses the estimated propensity scores of 400 dipeptides - as PSPs and has a test accuracy of 71.54%, which is comparable to that of the SVM method. The derived propensity scores of 20 amino acids were further used to identify informative physicochemical properties for characterizing PSPs. The analytical results reveal the following four characteristics of PSPs: 1) PSPs favour hydrophobic side chain amino acids; 2) PSPs are composed of the amino acids prone to form helices in membrane environments; 3) PSPs have low interaction with water; and 4) PSPs prefer to be composed of the amino acids of electron-reactive side chains.

CONCLUSIONS

The SCMPSP method not only estimates the propensity of a sequence to be PSPs, it also discovers characteristics that further improve understanding of PSPs. The SCMPSP source code and the datasets used in this study are available at http://iclab.life.nctu.edu.tw/SCMPSP/.

Collapse

Predicting recombinant protein expression experiments using molecular dynamics simulation. Chem Eng Sci 2015. [DOI: 10.1016/j.ces.2014.09.044] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/13/2023]

Liou YF, Charoenkwan P, Srinivasulu Y, Vasylenko T, Lai SC, Lee HC, Chen YH, Huang HL, Ho SY. SCMHBP: prediction and analysis of heme binding proteins using propensity scores of dipeptides. BMC Bioinformatics 2014;15 Suppl 16:S4. [PMID: 25522279 PMCID: PMC4290654 DOI: 10.1186/1471-2105-15-s16-s4] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open

Abstract

Background

Heme binding proteins (HBPs) are metalloproteins that contain a heme ligand (an iron-porphyrin complex) as the prosthetic group. Several computational methods have been proposed to predict heme binding residues and thereby to understand the interactions between heme and its host proteins. However, few in silico methods for identifying HBPs have been proposed.

Results

This work proposes a scoring card method (SCM) based method (named SCMHBP) for predicting and analyzing HBPs from sequences. A balanced dataset of 747 HBPs (selected using a Gene Ontology term GO:0020037) and 747 non-HBPs (selected from 91,414 putative non-HBPs) with an identity of 25% was firstly established. Consequently, a set of scores that quantified the propensity of amino acids and dipeptides to be HBPs is estimated using SCM to maximize the predictive accuracy of SCMHBP. Finally, the informative physicochemical properties of 20 amino acids are identified by utilizing the estimated propensity scores to be used to categorize HBPs. The training and mean test accuracies of SCMHBP applied to three independent test datasets are 85.90% and 71.57%, respectively. SCMHBP performs well relative to comparison with such methods as support vector machine (SVM), decision tree J48, and Bayes classifiers. The putative non-HBPs with high sequence propensity scores are potential HBPs, which can be further validated by experimental confirmation. The propensity scores of individual amino acids and dipeptides are examined to elucidate the interactions between heme and its host proteins. The following characteristics of HBPs are derived from the propensity scores: 1) aromatic side chains are important to the effectiveness of specific HBP functions; 2) a hydrophobic environment is important in the interaction between heme and binding sites; and 3) the whole HBP has low flexibility whereas the heme binding residues are relatively flexible.

Conclusions

SCMHBP yields knowledge that improves our understanding of HBPs rather than merely improves the prediction accuracy in predicting HBPs.

Collapse

Prediction of soluble heterologous protein expression levels inEscherichia colifrom sequence-based features and its potential in biopharmaceutical process development. ACTA ACUST UNITED AC 2014. [DOI: 10.4155/pbp.14.23] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]

Propensity scores for prediction and characterization of bioluminescent proteins from sequences. PLoS One 2014;9:e97158. [PMID: 24828431 PMCID: PMC4020813 DOI: 10.1371/journal.pone.0097158] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2014] [Accepted: 04/15/2014] [Indexed: 01/19/2023] Open

Abstract

Bioluminescent proteins (BLPs) are a class of proteins with various mechanisms of light emission such as bioluminescence and fluorescence from luminous organisms. While valuable for commercial and medical applications, identification of BLPs, including luciferases and fluorescent proteins (FPs), is rather challenging, owing to their high variety of protein sequences. Moreover, characterization of BLPs facilitates mutagenesis analysis to enhance bioluminescence and fluorescence. Therefore, this study proposes a novel methodological approach to estimating the propensity scores of 400 dipeptides and 20 amino acids in order to design two prediction methods and characterize BLPs based on a scoring card method (SCM). The SCMBLP method for predicting BLPs achieves an accuracy of 90.83% for 10-fold cross-validation higher than existing support vector machine based methods and a test accuracy of 82.85%. A dataset consisting of 269 luciferases and 216 FPs is also established to design the SCMLFP prediction method, which achieves training and test accuracies of 97.10% and 96.28%, respectively. Additionally, four informative physicochemical properties of 20 amino acids are identified using the estimated propensity scores to characterize BLPs as follows: 1) high transfer free energy from inside to the protein surface, 2) high occurrence frequency of residues in the transmembrane regions of the protein, 3) large hydrophobicity scale from the native protein structure, and 4) high correlation coefficient (R = 0.921) between the amino acid compositions of BLPs and integral membrane proteins. Further analyzing BLPs reveals that luciferases have a larger value of R (0.937) than FPs (0.635), suggesting that luciferases tend to locate near the cell membrane location rather than FPs for convenient receipt of extracellular ions. Importantly, the propensity scores of dipeptides and amino acids and the identified properties facilitate efforts to predict, characterize, and apply BLPs, including luciferases, photoproteins, and FPs. The web server is available at http://iclab.life.nctu.edu.tw/SCMBLP/index.html.

Collapse

A review of machine learning methods to predict the solubility of overexpressed recombinant proteins in Escherichia coli. BMC Bioinformatics 2014;15:134. [PMID: 24885721 PMCID: PMC4098780 DOI: 10.1186/1471-2105-15-134] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2013] [Accepted: 03/25/2014] [Indexed: 12/14/2022] Open

Abstract

Background

Over the last 20 years in biotechnology, the production of recombinant proteins has been a crucial bioprocess in both biopharmaceutical and research arena in terms of human health, scientific impact and economic volume. Although logical strategies of genetic engineering have been established, protein overexpression is still an art. In particular, heterologous expression is often hindered by low level of production and frequent fail due to opaque reasons. The problem is accentuated because there is no generic solution available to enhance heterologous overexpression. For a given protein, the extent of its solubility can indicate the quality of its function. Over 30% of synthesized proteins are not soluble. In certain experimental circumstances, including temperature, expression host, etc., protein solubility is a feature eventually defined by its sequence. Until now, numerous methods based on machine learning are proposed to predict the solubility of protein merely from its amino acid sequence. In spite of the 20 years of research on the matter, no comprehensive review is available on the published methods.

Results

This paper presents an extensive review of the existing models to predict protein solubility in Escherichia coli recombinant protein overexpression system. The models are investigated and compared regarding the datasets used, features, feature selection methods, machine learning techniques and accuracy of prediction. A discussion on the models is provided at the end.

Conclusions

This study aims to investigate extensively the machine learning based methods to predict recombinant protein solubility, so as to offer a general as well as a detailed understanding for researches in the field. Some of the models present acceptable prediction performances and convenient user interfaces. These models can be considered as valuable tools to predict recombinant protein overexpression results before performing real laboratory experiments, thus saving labour, time and cost.

Collapse

Charoenkwan P, Shoombuatong W, Lee HC, Chaijaruwanich J, Huang HL, Ho SY. SCMCRYS: predicting protein crystallization using an ensemble scoring card method with estimating propensity scores of P-collocated amino acid pairs. PLoS One 2013;8:e72368. [PMID: 24019868 PMCID: PMC3760885 DOI: 10.1371/journal.pone.0072368] [Citation(s) in RCA: 71] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2013] [Accepted: 07/15/2013] [Indexed: 11/19/2022] Open

Abstract

Existing methods for predicting protein crystallization obtain high accuracy using various types of complemented features and complex ensemble classifiers, such as support vector machine (SVM) and Random Forest classifiers. It is desirable to develop a simple and easily interpretable prediction method with informative sequence features to provide insights into protein crystallization. This study proposes an ensemble method, SCMCRYS, to predict protein crystallization, for which each classifier is built by using a scoring card method (SCM) with estimating propensity scores of p-collocated amino acid (AA) pairs (p = 0 for a dipeptide). The SCM classifier determines the crystallization of a sequence according to a weighted-sum score. The weights are the composition of the p-collocated AA pairs, and the propensity scores of these AA pairs are estimated using a statistic with optimization approach. SCMCRYS predicts the crystallization using a simple voting method from a number of SCM classifiers. The experimental results show that the single SCM classifier utilizing dipeptide composition with accuracy of 73.90% is comparable to the best previously-developed SVM-based classifier, SVM_POLY (74.6%), and our proposed SVM-based classifier utilizing the same dipeptide composition (77.55%). The SCMCRYS method with accuracy of 76.1% is comparable to the state-of-the-art ensemble methods PPCpred (76.8%) and RFCRYS (80.0%), which used the SVM and Random Forest classifiers, respectively. This study also investigates mutagenesis analysis based on SCM and the result reveals the hypothesis that the mutagenesis of surface residues Ala and Cys has large and small probabilities of enhancing protein crystallizability considering the estimated scores of crystallizability and solubility, melting point, molecular weight and conformational entropy of amino acids in a generalized condition. The propensity scores of amino acids and dipeptides for estimating the protein crystallizability can aid biologists in designing mutation of surface residues to enhance protein crystallizability. The source code of SCMCRYS is available at http://iclab.life.nctu.edu.tw/SCMCRYS/.

Collapse

Guilloux A, Caudron B, Jestin JL. A method to predict edge strands in beta-sheets from protein sequences. Comput Struct Biotechnol J 2013;7:e201305001. [PMID: 24688737 PMCID: PMC3962219 DOI: 10.5936/csbj.201305001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2013] [Revised: 05/27/2013] [Accepted: 05/30/2013] [Indexed: 12/15/2022] Open

Schönbach C, Tongsima S, Chan J, Brusic V, Tan TW, Ranagathan S. InCoB2012 Conference: from biological data to knowledge to technological breakthroughs. BMC Bioinformatics 2012;13 Suppl 17:S1. [PMID: 23281929 PMCID: PMC3521245 DOI: 10.1186/1471-2105-13-s17-s1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022] Open