Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Fang Y, Guo Y, Feng Y, Li M. Predicting DNA-binding proteins: approached from Chou's pseudo amino acid composition and other specific sequence features. Amino Acids 2007;34:103-9. [PMID: 17624492 DOI: 10.1007/s00726-007-0568-2] [Citation(s) in RCA: 116] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2007] [Accepted: 05/23/2007] [Indexed: 11/25/2022]

For:	Fang Y, Guo Y, Feng Y, Li M. Predicting DNA-binding proteins: approached from Chou's pseudo amino acid composition and other specific sequence features. Amino Acids 2007;34:103-9. [PMID: 17624492 DOI: 10.1007/s00726-007-0568-2] [Citation(s) in RCA: 116] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2007] [Accepted: 05/23/2007] [Indexed: 11/25/2022]

Number

Cited by Other Article(s)

Samant M, Jethva M, Hasija Y. INTERACT-O-FINDER: A Tool for Prediction of DNA-Binding Proteins Using Sequence Features. Int J Pept Res Ther 2014. [DOI: 10.1007/s10989-014-9446-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]

RETRACTED: Identifying halophilic proteins based on random forests with preprocessing of the pseudo-amino acid composition. J Theor Biol 2014;361:175-81. [DOI: 10.1016/j.jtbi.2014.07.017] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2014] [Revised: 07/14/2014] [Accepted: 07/15/2014] [Indexed: 01/07/2023]

Liu B, Xu J, Fan S, Xu R, Zhou J, Wang X. PseDNA-Pro: DNA-Binding Protein Identification by Combining Chou’s PseAAC and Physicochemical Distance Transformation. Mol Inform 2014;34:8-17. [DOI: 10.1002/minf.201400025] [Citation(s) in RCA: 135] [Impact Index Per Article: 12.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/19/2014] [Accepted: 05/27/2014] [Indexed: 11/06/2022]

newDNA-Prot: Prediction of DNA-binding proteins by employing support vector machine and a comprehensive sequence representation. Comput Biol Chem 2014;52:51-9. [PMID: 25240115 DOI: 10.1016/j.compbiolchem.2014.09.002] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/11/2014] [Revised: 09/05/2014] [Accepted: 09/06/2014] [Indexed: 11/21/2022]

Song L, Li D, Zeng X, Wu Y, Guo L, Zou Q. nDNA-Prot: identification of DNA-binding proteins based on unbalanced classification. BMC Bioinformatics 2014;15:298. [PMID: 25196432 PMCID: PMC4165999 DOI: 10.1186/1471-2105-15-298] [Citation(s) in RCA: 127] [Impact Index Per Article: 11.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2014] [Accepted: 09/03/2014] [Indexed: 11/23/2022] Open

Abstract

Background

DNA-binding proteins are vital for the study of cellular processes. In recent genome engineering studies, the identification of proteins with certain functions has become increasingly important and needs to be performed rapidly and efficiently. In previous years, several approaches have been developed to improve the identification of DNA-binding proteins. However, the currently available resources are insufficient to accurately identify these proteins. Because of this, the previous research has been limited by the relatively unbalanced accuracy rate and the low identification success of the current methods.

Results

In this paper, we explored the practicality of modelling DNA binding identification and simultaneously employed an ensemble classifier, and a new predictor (nDNA-Prot) was designed. The presented framework is comprised of two stages: a 188-dimension feature extraction method to obtain the protein structure and an ensemble classifier designated as imDC. Experiments using different datasets showed that our method is more successful than the traditional methods in identifying DNA-binding proteins. The identification was conducted using a feature that selected the minimum Redundancy and Maximum Relevance (mRMR). An accuracy rate of 95.80% and an Area Under the Curve (AUC) value of 0.986 were obtained in a cross validation. A test dataset was tested in our method and resulted in an 86% accuracy, versus a 76% using iDNA-Prot and a 68% accuracy using DNA-Prot.

Conclusions

Our method can help to accurately identify DNA-binding proteins, and the web server is accessible at http://datamining.xmu.edu.cn/~songli/nDNA. In addition, we also predicted possible DNA-binding protein sequences in all of the sequences from the UniProtKB/Swiss-Prot database.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-298) contains supplementary material, which is available to authorized users.

Collapse

Liu B, Xu J, Lan X, Xu R, Zhou J, Wang X, Chou KC. iDNA-Prot|dis: identifying DNA-binding proteins by incorporating amino acid distance-pairs and reduced alphabet profile into the general pseudo amino acid composition. PLoS One 2014;9:e106691. [PMID: 25184541 PMCID: PMC4153653 DOI: 10.1371/journal.pone.0106691] [Citation(s) in RCA: 212] [Impact Index Per Article: 19.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2014] [Accepted: 07/31/2014] [Indexed: 11/18/2022] Open

Nanni L, Lumini A, Brahnam S. An empirical study of different approaches for protein classification. ScientificWorldJournal 2014;2014:236717. [PMID: 25028675 PMCID: PMC4084589 DOI: 10.1155/2014/236717] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2014] [Revised: 05/05/2014] [Accepted: 05/07/2014] [Indexed: 01/05/2023] Open

Amanzadeh E, Mohabatkar H, Biria D. Classification of DNA Minor and Major Grooves Binding Proteins According to the NLSs by Data Analysis Methods. Appl Biochem Biotechnol 2014;174:437-51. [DOI: 10.1007/s12010-014-0926-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2013] [Accepted: 04/17/2014] [Indexed: 11/30/2022]

enDNA-Prot: identification of DNA-binding proteins by applying ensemble learning. BIOMED RESEARCH INTERNATIONAL 2014;2014:294279. [PMID: 24977146 PMCID: PMC4058174 DOI: 10.1155/2014/294279] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/28/2014] [Revised: 05/05/2014] [Accepted: 05/05/2014] [Indexed: 12/03/2022]

Du P, Gu S, Jiao Y. PseAAC-General: fast building various modes of general form of Chou's pseudo-amino acid composition for large-scale protein datasets. Int J Mol Sci 2014;15:3495-506. [PMID: 24577312 PMCID: PMC3975349 DOI: 10.3390/ijms15033495] [Citation(s) in RCA: 211] [Impact Index Per Article: 19.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2014] [Revised: 02/13/2014] [Accepted: 02/14/2014] [Indexed: 11/16/2022] Open

Sequence based prediction of DNA-binding proteins based on hybrid feature selection using random forest and Gaussian naïve Bayes. PLoS One 2014;9:e86703. [PMID: 24475169 PMCID: PMC3901691 DOI: 10.1371/journal.pone.0086703] [Citation(s) in RCA: 115] [Impact Index Per Article: 10.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2013] [Accepted: 12/10/2013] [Indexed: 11/22/2022] Open

Abstract

Developing an efficient method for determination of the DNA-binding proteins, due to their vital roles in gene regulation, is becoming highly desired since it would be invaluable to advance our understanding of protein functions. In this study, we proposed a new method for the prediction of the DNA-binding proteins, by performing the feature rank using random forest and the wrapper-based feature selection using forward best-first search strategy. The features comprise information from primary sequence, predicted secondary structure, predicted relative solvent accessibility, and position specific scoring matrix. The proposed method, called DBPPred, used Gaussian naïve Bayes as the underlying classifier since it outperformed five other classifiers, including decision tree, logistic regression, k-nearest neighbor, support vector machine with polynomial kernel, and support vector machine with radial basis function. As a result, the proposed DBPPred yields the highest average accuracy of 0.791 and average MCC of 0.583 according to the five-fold cross validation with ten runs on the training benchmark dataset PDB594. Subsequently, blind tests on the independent dataset PDB186 by the proposed model trained on the entire PDB594 dataset and by other five existing methods (including iDNA-Prot, DNA-Prot, DNAbinder, DNABIND and DBD-Threader) were performed, resulting in that the proposed DBPPred yielded the highest accuracy of 0.769, MCC of 0.538, and AUC of 0.790. The independent tests performed by the proposed DBPPred on completely a large non-DNA binding protein dataset and two RNA binding protein datasets also showed improved or comparable quality when compared with the relevant prediction methods. Moreover, we observed that majority of the selected features by the proposed method are statistically significantly different between the mean feature values of the DNA-binding and the non DNA-binding proteins. All of the experimental results indicate that the proposed DBPPred can be an alternative perspective predictor for large-scale determination of DNA-binding proteins.

Collapse

Eichner J, Topf F, Dräger A, Wrzodek C, Wanke D, Zell A. TFpredict and SABINE: sequence-based prediction of structural and functional characteristics of transcription factors. PLoS One 2013;8:e82238. [PMID: 24349230 PMCID: PMC3861411 DOI: 10.1371/journal.pone.0082238] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/13/2013] [Accepted: 10/21/2013] [Indexed: 11/18/2022] Open

Predicting DNA binding proteins using support vector machine with hybrid fractal features. J Theor Biol 2013;343:186-92. [PMID: 24189096 DOI: 10.1016/j.jtbi.2013.10.009] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2013] [Revised: 08/12/2013] [Accepted: 10/17/2013] [Indexed: 11/20/2022]

LIN HAO, DING CHEN, YUAN LUFENG, CHEN WEI, DING HUI, LI ZIQIANG, GUO FENGBIAO, HUANG JIAN, RAO NINI. PREDICTING SUBCHLOROPLAST LOCATIONS OF PROTEINS BASED ON THE GENERAL FORM OF CHOU'S PSEUDO AMINO ACID COMPOSITION: APPROACHED FROM OPTIMAL TRIPEPTIDE COMPOSITION. INT J BIOMATH 2013. [DOI: 10.1142/s1793524513500034] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]

Zou C, Gong J, Li H. An improved sequence based prediction protocol for DNA-binding proteins using SVM and comprehensive feature analysis. BMC Bioinformatics 2013;14:90. [PMID: 23497329 PMCID: PMC3602657 DOI: 10.1186/1471-2105-14-90] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2012] [Accepted: 03/04/2013] [Indexed: 11/10/2022] Open

Abstract

Background

DNA-binding proteins (DNA-BPs) play a pivotal role in both eukaryotic and prokaryotic proteomes. There have been several computational methods proposed in the literature to deal with the DNA-BPs, many informative features and properties were used and proved to have significant impact on this problem. However the ultimate goal of Bioinformatics is to be able to predict the DNA-BPs directly from primary sequence.

Results

In this work, the focus is how to transform these informative features into uniform numeric representation appropriately and improve the prediction accuracy of our SVM-based classifier for DNA-BPs. A systematic representation of some selected features known to perform well is investigated here. Firstly, four kinds of protein properties are obtained and used to describe the protein sequence. Secondly, three different feature transformation methods (OCTD, AC and SAA) are adopted to obtain numeric feature vectors from three main levels: Global, Nonlocal and Local of protein sequence and their performances are exhaustively investigated. At last, the mRMR-IFS feature selection method and ensemble learning approach are utilized to determine the best prediction model. Besides, the optimal features selected by mRMR-IFS are illustrated based on the observed results which may provide useful insights for revealing the mechanisms of protein-DNA interactions. For five-fold cross-validation over the DNAdset and DNAaset, we obtained an overall accuracy of 0.940 and 0.811, MCC of 0.881 and 0.614 respectively.

Conclusions

The good results suggest that it can efficiently develop an entirely sequence-based protocol that transforms and integrates informative features from different scales used by SVM to predict DNA-BPs accurately. Moreover, a novel systematic framework for sequence descriptor-based protein function prediction is proposed here.

Collapse

Xiao X, Wang P, Lin WZ, Jia JH, Chou KC. iAMP-2L: a two-level multi-label classifier for identifying antimicrobial peptides and their functional types. Anal Biochem 2013;436:168-77. [PMID: 23395824 DOI: 10.1016/j.ab.2013.01.019] [Citation(s) in RCA: 383] [Impact Index Per Article: 31.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2012] [Revised: 01/10/2013] [Accepted: 01/21/2013] [Indexed: 12/14/2022]

Characterization of structure–antioxidant activity relationship of peptides in free radical systems using QSAR models: Key sequence positions and their amino acid properties. J Theor Biol 2013;318:29-43. [DOI: 10.1016/j.jtbi.2012.10.029] [Citation(s) in RCA: 144] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2012] [Revised: 10/21/2012] [Accepted: 10/22/2012] [Indexed: 11/22/2022]

Chen W, Feng PM, Lin H, Chou KC. iRSpot-PseDNC: identify recombination spots with pseudo dinucleotide composition. Nucleic Acids Res 2013;41:e68. [PMID: 23303794 PMCID: PMC3616736 DOI: 10.1093/nar/gks1450] [Citation(s) in RCA: 480] [Impact Index Per Article: 40.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022] Open

Abstract

Meiotic recombination is an important biological process. As a main driving force of evolution, recombination provides natural new combinations of genetic variations. Rather than randomly occurring across a genome, meiotic recombination takes place in some genomic regions (the so-called ‘hotspots’) with higher frequencies, and in the other regions (the so-called ‘coldspots’) with lower frequencies. Therefore, the information of the hotspots and coldspots would provide useful insights for in-depth studying of the mechanism of recombination and the genome evolution process as well. So far, the recombination regions have been mainly determined by experiments, which are both expensive and time-consuming. With the avalanche of genome sequences generated in the postgenomic age, it is highly desired to develop automated methods for rapidly and effectively identifying the recombination regions. In this study, a predictor, called ‘iRSpot-PseDNC’, was developed for identifying the recombination hotspots and coldspots. In the new predictor, the samples of DNA sequences are formulated by a novel feature vector, the so-called ‘pseudo dinucleotide composition’ (PseDNC), into which six local DNA structural properties, i.e. three angular parameters (twist, tilt and roll) and three translational parameters (shift, slide and rise), are incorporated. It was observed by the rigorous jackknife test that the overall success rate achieved by iRSpot-PseDNC was >82% in identifying recombination spots in Saccharomyces cerevisiae, indicating the new predictor is promising or at least may become a complementary tool to the existing methods in this area. Although the benchmark data set used to train and test the current method was from S. cerevisiae, the basic approaches can also be extended to deal with all the other genomes. Particularly, it has not escaped our notice that the PseDNC approach can be also used to study many other DNA-related problems. As a user-friendly web-server, iRSpot-PseDNC is freely accessible at http://lin.uestc.edu.cn/server/iRSpot-PseDNC.

Collapse

Lin SX, Lapointe J. Theoretical and experimental biology in one<br>—A symposium in honour of Professor Kuo-Chen Chou’s 50th anniversary and Professor Richard Giegé’s 40th anniversary of their scientific careers. ACTA ACUST UNITED AC 2013. [DOI: 10.4236/jbise.2013.64054] [Citation(s) in RCA: 132] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]

Predicting secretory proteins of malaria parasite by incorporating sequence evolution information into pseudo amino acid composition via grey system model. PLoS One 2012. [PMID: 23189138 PMCID: PMC3506597 DOI: 10.1371/journal.pone.0049040] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/31/2023] Open

Abstract

The malaria disease has become a cause of poverty and a major hindrance to economic development. The culprit of the disease is the parasite, which secretes an array of proteins within the host erythrocyte to facilitate its own survival. Accordingly, the secretory proteins of malaria parasite have become a logical target for drug design against malaria. Unfortunately, with the increasing resistance to the drugs thus developed, the situation has become more complicated. To cope with the drug resistance problem, one strategy is to timely identify the secreted proteins by malaria parasite, which can serve as potential drug targets. However, it is both expensive and time-consuming to identify the secretory proteins of malaria parasite by experiments alone. To expedite the process for developing effective drugs against malaria, a computational predictor called "iSMP-Grey" was developed that can be used to identify the secretory proteins of malaria parasite based on the protein sequence information alone. During the prediction process a protein sample was formulated with a 60D (dimensional) feature vector formed by incorporating the sequence evolution information into the general form of PseAAC (pseudo amino acid composition) via a grey system model, which is particularly useful for solving complicated problems that are lack of sufficient information or need to process uncertain information. It was observed by the jackknife test that iSMP-Grey achieved an overall success rate of 94.8%, remarkably higher than those by the existing predictors in this area. As a user-friendly web-server, iSMP-Grey is freely accessible to the public at http://www.jci-bioinfo.cn/iSMP-Grey. Moreover, for the convenience of most experimental scientists, a step-by-step guide is provided on how to use the web-server to get the desired results without the need to follow the complicated mathematical equations involved in this paper.

Collapse

Yu C, Deng M, Cheng SY, Yau SC, He RL, Yau SST. Protein space: a natural method for realizing the nature of protein universe. J Theor Biol 2012;318:197-204. [PMID: 23154188 DOI: 10.1016/j.jtbi.2012.11.005] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2012] [Revised: 11/01/2012] [Accepted: 11/02/2012] [Indexed: 10/27/2022]

A novel statistical measure for sequence comparison on the basis of k-word counts. J Theor Biol 2012;318:91-100. [PMID: 23147229 DOI: 10.1016/j.jtbi.2012.10.035] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2011] [Revised: 10/10/2012] [Accepted: 10/31/2012] [Indexed: 11/24/2022]

Ma X, Guo J, Liu HD, Xie JM, Sun X. Sequence-based prediction of DNA-binding residues in proteins with conservation and correlation information. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2012;9:1766-1775. [PMID: 22868682 DOI: 10.1109/tcbb.2012.106] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/01/2023]

An empirical study on the matrix-based protein representations and their combination with sequence-based approaches. Amino Acids 2012;44:887-901. [PMID: 23108592 DOI: 10.1007/s00726-012-1416-6] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2012] [Accepted: 10/03/2012] [Indexed: 10/27/2022]

A comparison of computational methods for identifying virulence factors. PLoS One 2012;7:e42517. [PMID: 22880014 PMCID: PMC3411817 DOI: 10.1371/journal.pone.0042517] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2011] [Accepted: 07/09/2012] [Indexed: 12/16/2022] Open

Wang X, Li GZ. A multi-label predictor for identifying the subcellular locations of singleplex and multiplex eukaryotic proteins. PLoS One 2012;7:e36317. [PMID: 22629314 PMCID: PMC3358325 DOI: 10.1371/journal.pone.0036317] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2011] [Accepted: 04/01/2012] [Indexed: 01/30/2023] Open

Abstract

Subcellular locations of proteins are important functional attributes. An effective and efficient subcellular localization predictor is necessary for rapidly and reliably annotating subcellular locations of proteins. Most of existing subcellular localization methods are only used to deal with single-location proteins. Actually, proteins may simultaneously exist at, or move between, two or more different subcellular locations. To better reflect characteristics of multiplex proteins, it is highly desired to develop new methods for dealing with them. In this paper, a new predictor, called Euk-ECC-mPLoc, by introducing a powerful multi-label learning approach which exploits correlations between subcellular locations and hybridizing gene ontology with dipeptide composition information, has been developed that can be used to deal with systems containing both singleplex and multiplex eukaryotic proteins. It can be utilized to identify eukaryotic proteins among the following 22 locations: (1) acrosome, (2) cell membrane, (3) cell wall, (4) centrosome, (5) chloroplast, (6) cyanelle, (7) cytoplasm, (8) cytoskeleton, (9) endoplasmic reticulum, (10) endosome, (11) extracellular, (12) Golgi apparatus, (13) hydrogenosome, (14) lysosome, (15) melanosome, (16) microsome, (17) mitochondrion, (18) nucleus, (19) peroxisome, (20) spindle pole body, (21) synapse, and (22) vacuole. Experimental results on a stringent benchmark dataset of eukaryotic proteins by jackknife cross validation test show that the average success rate and overall success rate obtained by Euk-ECC-mPLoc were 69.70% and 81.54%, respectively, indicating that our approach is quite promising. Particularly, the success rates achieved by Euk-ECC-mPLoc for small subsets were remarkably improved, indicating that it holds a high potential for simulating the development of the area. As a user-friendly web-server, Euk-ECC-mPLoc is freely accessible to the public at the website http://levis.tongji.edu.cn:8080/bioinfo/Euk-ECC-mPLoc/. We believe that Euk-ECC-mPLoc may become a useful high-throughput tool, or at least play a complementary role to the existing predictors in identifying subcellular locations of eukaryotic proteins.

Collapse

Mei S. Multi-kernel transfer learning based on Chou's PseAAC formulation for protein submitochondria localization. J Theor Biol 2012;293:121-30. [DOI: 10.1016/j.jtbi.2011.10.015] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2011] [Revised: 10/09/2011] [Accepted: 10/13/2011] [Indexed: 10/16/2022]

Gao QB, Zhao H, Ye X, He J. Prediction of pattern recognition receptor family using pseudo-amino acid composition. Biochem Biophys Res Commun 2011;417:73-7. [PMID: 22138239 DOI: 10.1016/j.bbrc.2011.11.057] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2011] [Accepted: 11/12/2011] [Indexed: 01/21/2023]

Wavelet images and Chou’s pseudo amino acid composition for protein classification. Amino Acids 2011;43:657-65. [DOI: 10.1007/s00726-011-1114-9] [Citation(s) in RCA: 72] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2010] [Accepted: 09/28/2011] [Indexed: 10/16/2022]

Lin WZ, Fang JA, Xiao X, Chou KC. iDNA-Prot: identification of DNA binding proteins using random forest with grey model. PLoS One 2011;6:e24756. [PMID: 21935457 PMCID: PMC3174210 DOI: 10.1371/journal.pone.0024756] [Citation(s) in RCA: 196] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2011] [Accepted: 08/16/2011] [Indexed: 11/18/2022] Open

A multi-label classifier for predicting the subcellular localization of gram-negative bacterial proteins with both single and multiple sites. PLoS One 2011;6:e20592. [PMID: 21698097 PMCID: PMC3117797 DOI: 10.1371/journal.pone.0020592] [Citation(s) in RCA: 184] [Impact Index Per Article: 13.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2011] [Accepted: 05/04/2011] [Indexed: 11/21/2022] Open

Abstract

Prediction of protein subcellular localization is a challenging problem, particularly when the system concerned contains both singleplex and multiplex proteins. In this paper, by introducing the “multi-label scale” and hybridizing the information of gene ontology with the sequential evolution information, a novel predictor called iLoc-Gneg is developed for predicting the subcellular localization of Gram-positive bacterial proteins with both single-location and multiple-location sites. For facilitating comparison, the same stringent benchmark dataset used to estimate the accuracy of Gneg-mPLoc was adopted to demonstrate the power of iLoc-Gneg. The dataset contains 1,392 Gram-negative bacterial proteins classified into the following eight locations: (1) cytoplasm, (2) extracellular, (3) fimbrium, (4) flagellum, (5) inner membrane, (6) nucleoid, (7) outer membrane, and (8) periplasm. Of the 1,392 proteins, 1,328 are each with only one subcellular location and the other 64 are each with two subcellular locations, but none of the proteins included has pairwise sequence identity to any other in a same subset (subcellular location). It was observed that the overall success rate by jackknife test on such a stringent benchmark dataset by iLoc-Gneg was over 91%, which is about 6% higher than that by Gneg-mPLoc. As a user-friendly web-server, iLoc-Gneg is freely accessible to the public at http://icpr.jci.edu.cn/bioinfo/iLoc-Gneg. Meanwhile, a step-by-step guide is provided on how to use the web-server to get the desired results. Furthermore, for the user's convenience, the iLoc-Gneg web-server also has the function to accept the batch job submission, which is not available in the existing version of Gneg-mPLoc web-server. It is anticipated that iLoc-Gneg may become a useful high throughput tool for Molecular Cell Biology, Proteomics, System Biology, and Drug Development.

Collapse

Wang P, Hu L, Liu G, Jiang N, Chen X, Xu J, Zheng W, Li L, Tan M, Chen Z, Song H, Cai YD, Chou KC. Prediction of antimicrobial peptides based on sequence alignment and feature selection methods. PLoS One 2011;6:e18476. [PMID: 21533231 PMCID: PMC3076375 DOI: 10.1371/journal.pone.0018476] [Citation(s) in RCA: 142] [Impact Index Per Article: 10.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2010] [Accepted: 03/08/2011] [Indexed: 11/21/2022] Open

Predicting and analyzing DNA-binding domains using a systematic approach to identifying a set of informative physicochemical and biochemical properties. BMC Bioinformatics 2011;12 Suppl 1:S47. [PMID: 21342579 PMCID: PMC3044304 DOI: 10.1186/1471-2105-12-s1-s47] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Abstract

Background

Existing methods of predicting DNA-binding proteins used valuable features of physicochemical properties to design support vector machine (SVM) based classifiers. Generally, selection of physicochemical properties and determination of their corresponding feature vectors rely mainly on known properties of binding mechanism and experience of designers. However, there exists a troublesome problem for designers that some different physicochemical properties have similar vectors of representing 20 amino acids and some closely related physicochemical properties have dissimilar vectors.

Results

This study proposes a systematic approach (named Auto-IDPCPs) to automatically identify a set of physicochemical and biochemical properties in the AAindex database to design SVM-based classifiers for predicting and analyzing DNA-binding domains/proteins. Auto-IDPCPs consists of 1) clustering 531 amino acid indices in AAindex into 20 clusters using a fuzzy c-means algorithm, 2) utilizing an efficient genetic algorithm based optimization method IBCGA to select an informative feature set of size m to represent sequences, and 3) analyzing the selected features to identify related physicochemical properties which may affect the binding mechanism of DNA-binding domains/proteins. The proposed Auto-IDPCPs identified m=22 features of properties belonging to five clusters for predicting DNA-binding domains with a five-fold cross-validation accuracy of 87.12%, which is promising compared with the accuracy of 86.62% of the existing method PSSM-400. For predicting DNA-binding sequences, the accuracy of 75.50% was obtained using m=28 features, where PSSM-400 has an accuracy of 74.22%. Auto-IDPCPs and PSSM-400 have accuracies of 80.73% and 82.81%, respectively, applied to an independent test data set of DNA-binding domains. Some typical physicochemical properties discovered are hydrophobicity, secondary structure, charge, solvent accessibility, polarity, flexibility, normalized Van Der Waals volume, pK (pK-C, pK-N, pK-COOH and pK-a(RCOOH)), etc.

Conclusions

The proposed approach Auto-IDPCPs would help designers to investigate informative physicochemical and biochemical properties by considering both prediction accuracy and analysis of binding mechanism simultaneously. The approach Auto-IDPCPs can be also applicable to predict and analyze other protein functions from sequences.

Collapse

Guo J, Rao N, Liu G, Yang Y, Wang G. Predicting protein folding rates using the concept of Chou's pseudo amino acid composition. J Comput Chem 2011;32:1612-7. [PMID: 21328402 DOI: 10.1002/jcc.21740] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2010] [Revised: 11/04/2010] [Accepted: 12/02/2010] [Indexed: 12/12/2022]

Xiao X, Wang P, Chou KC. GPCR-2L: predicting G protein-coupled receptors and their types by hybridizing two different modes of pseudo amino acid compositions. MOLECULAR BIOSYSTEMS 2010;7:911-9. [PMID: 21180772 DOI: 10.1039/c0mb00170h] [Citation(s) in RCA: 102] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/25/2022]

Abstract

G protein-coupled receptors (GPCRs) are among the most frequent targets of therapeutic drugs. With the avalanche of newly generated protein sequences in the post genomic age, to expedite the process of drug discovery, it is highly desirable to develop an automated method to rapidly identify GPCRs and their types. A new predictor was developed by hybridizing two different modes of pseudo-amino acid composition (PseAAC): the functional domain PseAAC and the low-frequency Fourier spectrum PseAAC. The new predictor is called GPCR-2L, where "2L" means that it is a two-layer predictor: the 1st layer prediction engine is to identify a query protein as GPCR or not; if it is, the prediction will be automatically continued to further identify it as belonging to one of the following six types: (1) rhodopsin-like (Class A), (2) secretin-like (Class B), (3) metabotropic glutamate/pheromone (Class C), (4) fungal pheromone (Class D), (5) cAMP receptor (Class E), or (6) frizzled/smoothened family (Class F). The overall success rate of GPCR-2L in identifying proteins as GPCRs or non-GPCRs is over 97.2%, while identifying GPCRs among their six types is over 97.8%. Such high success rates were derived by the rigorous jackknife cross-validation on a stringent benchmark dataset, in which none of the included proteins had ≥40% pairwise sequence identity to any other protein in a same subset. As a user-friendly web-server, GPCR-2L is freely accessible to the public at http://icpr.jci.edu.cn/, by which one can obtain the 2-level results in about 20 s for a query protein sequence of 500 amino acids. The longer the sequence is, the more time it may usually need. The high success rates reported here indicate that it is a quite effective approach to identify GPCRs and their types with the functional domain information and the low-frequency Fourier spectrum analysis. It is anticipated that GPCR-2L may become a useful tool for both basic research and drug development in the areas related to GPCRs.

Collapse

Chou KC. Some remarks on protein attribute prediction and pseudo amino acid composition. J Theor Biol 2010;273:236-47. [PMID: 21168420 PMCID: PMC7125570 DOI: 10.1016/j.jtbi.2010.12.024] [Citation(s) in RCA: 971] [Impact Index Per Article: 64.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2010] [Revised: 12/08/2010] [Accepted: 12/13/2010] [Indexed: 11/29/2022]

Identification of RNA-binding sites in proteins by integrating various sequence information. Amino Acids 2010;40:239-48. [DOI: 10.1007/s00726-010-0639-7] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2009] [Accepted: 05/22/2010] [Indexed: 12/12/2022]

Yin J, Diao Y, Wen Z, Wang Z, Li M. Studying Peptides Biological Activities Based on Multidimensional Descriptors (E) Using Support Vector Regression. Int J Pept Res Ther 2010. [DOI: 10.1007/s10989-010-9210-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]

Yu L, Guo Y, Zhang Z, Li Y, Li M, Li G, Xiong W, Zeng Y. SecretP: a new method for predicting mammalian secreted proteins. Peptides 2010;31:574-8. [PMID: 20045033 DOI: 10.1016/j.peptides.2009.12.026] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/13/2009] [Revised: 12/17/2009] [Accepted: 12/17/2009] [Indexed: 11/19/2022]

Chou KC, Shen HB. A new method for predicting the subcellular localization of eukaryotic proteins with both single and multiple sites: Euk-mPLoc 2.0. PLoS One 2010;5:e9931. [PMID: 20368981 PMCID: PMC2848569 DOI: 10.1371/journal.pone.0009931] [Citation(s) in RCA: 250] [Impact Index Per Article: 16.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2010] [Accepted: 03/08/2010] [Indexed: 11/19/2022] Open

Abstract

Information of subcellular locations of proteins is important for in-depth studies of cell biology. It is very useful for proteomics, system biology and drug development as well. However, most existing methods for predicting protein subcellular location can only cover 5 to 12 location sites. Also, they are limited to deal with single-location proteins and hence failed to work for multiplex proteins, which can simultaneously exist at, or move between, two or more location sites. Actually, multiplex proteins of this kind usually posses some important biological functions worthy of our special notice. A new predictor called "Euk-mPLoc 2.0" is developed by hybridizing the gene ontology information, functional domain information, and sequential evolutionary information through three different modes of pseudo amino acid composition. It can be used to identify eukaryotic proteins among the following 22 locations: (1) acrosome, (2) cell wall, (3) centriole, (4) chloroplast, (5) cyanelle, (6) cytoplasm, (7) cytoskeleton, (8) endoplasmic reticulum, (9) endosome, (10) extracell, (11) Golgi apparatus, (12) hydrogenosome, (13) lysosome, (14) melanosome, (15) microsome (16) mitochondria, (17) nucleus, (18) peroxisome, (19) plasma membrane, (20) plastid, (21) spindle pole body, and (22) vacuole. Compared with the existing methods for predicting eukaryotic protein subcellular localization, the new predictor is much more powerful and flexible, particularly in dealing with proteins with multiple locations and proteins without available accession numbers. For a newly-constructed stringent benchmark dataset which contains both single- and multiple-location proteins and in which none of proteins has pairwise sequence identity to any other in a same location, the overall jackknife success rate achieved by Euk-mPLoc 2.0 is more than 24% higher than those by any of the existing predictors. As a user-friendly web-server, Euk-mPLoc 2.0 is freely accessible at http://www.csbio.sjtu.edu.cn/bioinf/euk-multi-2/. For a query protein sequence of 400 amino acids, it will take about 15 seconds for the web-server to yield the predicted result; the longer the sequence is, the more time it may usually need. It is anticipated that the novel approach and the powerful predictor as presented in this paper will have a significant impact to Molecular Cell Biology, System Biology, Proteomics, Bioinformatics, and Drug Development.

Collapse

Nanni L, Shi JY, Brahnam S, Lumini A. Protein classification using texture descriptors extracted from the protein backbone image. J Theor Biol 2010;264:1024-32. [PMID: 20307550 DOI: 10.1016/j.jtbi.2010.03.020] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2009] [Revised: 01/28/2010] [Accepted: 03/11/2010] [Indexed: 10/19/2022]

Gao QB, Ye XF, Jin ZC, He J. Improving discrimination of outer membrane proteins by fusing different forms of pseudo amino acid composition. Anal Biochem 2009;398:52-9. [PMID: 19874797 DOI: 10.1016/j.ab.2009.10.040] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2009] [Revised: 10/21/2009] [Accepted: 10/22/2009] [Indexed: 10/20/2022]

Using auto covariance method for functional discrimination of membrane proteins based on evolution information. Amino Acids 2009;38:1497-503. [DOI: 10.1007/s00726-009-0362-4] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2009] [Accepted: 09/24/2009] [Indexed: 11/29/2022]

Prediction of beta-turn in protein using E-SSpred and support vector machine. Protein J 2009;28:175-81. [PMID: 19488840 DOI: 10.1007/s10930-009-9181-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]

SubChlo: predicting protein subchloroplast locations with pseudo-amino acid composition and the evidence-theoretic K-nearest neighbor (ET-KNN) algorithm. J Theor Biol 2009;261:330-5. [PMID: 19679138 DOI: 10.1016/j.jtbi.2009.08.004] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2009] [Revised: 07/30/2009] [Accepted: 08/01/2009] [Indexed: 11/23/2022]

Zeng YH, Guo YZ, Xiao RQ, Yang L, Yu LZ, Li ML. Using the augmented Chou's pseudo amino acid composition for predicting protein submitochondria locations based on auto covariance approach. J Theor Biol 2009;259:366-72. [PMID: 19341746 DOI: 10.1016/j.jtbi.2009.03.028] [Citation(s) in RCA: 147] [Impact Index Per Article: 9.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/25/2008] [Revised: 02/25/2009] [Accepted: 03/13/2009] [Indexed: 12/20/2022]

Gao QB, Jin ZC, Ye XF, Wu C, He J. Prediction of nuclear receptors with optimal pseudo amino acid composition. Anal Biochem 2009;387:54-9. [PMID: 19454254 DOI: 10.1016/j.ab.2009.01.018] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2008] [Revised: 12/04/2008] [Accepted: 01/09/2009] [Indexed: 10/21/2022]

Nanni L, Mazzara S, Pattini L, Lumini A. Protein classification combining surface analysis and primary structure. Protein Eng Des Sel 2009;22:267-72. [DOI: 10.1093/protein/gzn084] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Shen HB, Chou KC. Identification of proteases and their types. Anal Biochem 2008;385:153-60. [PMID: 19007742 DOI: 10.1016/j.ab.2008.10.020] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2009] [Revised: 10/13/2008] [Accepted: 10/14/2008] [Indexed: 10/21/2022]

100

Xiao X, Lin WZ, Chou KC. Using grey dynamic modeling and pseudo amino acid composition to predict protein structural classes. J Comput Chem 2008;29:2018-24. [PMID: 18381630 DOI: 10.1002/jcc.20955] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]