51
|
Sharma A, Paliwal KK, Dehzangi A, Lyons J, Imoto S, Miyano S. A strategy to select suitable physicochemical attributes of amino acids for protein fold recognition. BMC Bioinformatics 2013; 14:233. [PMID: 23879571 PMCID: PMC3724710 DOI: 10.1186/1471-2105-14-233] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Received: 07/25/2012] [Accepted: 06/20/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Assigning a protein to one of its folds is a transitional step for discovering three-dimensional protein structure, which is a challenging task in biomolecular (biological) science. The present research focuses on: 1) the development of classifiers, and 2) the development of feature extraction techniques based on syntactic and/or physicochemical properties. RESULTS Apart from the above two main categories of research, we have shown that the selection of physicochemical attributes of the amino acids is an important step in protein fold recognition and has not been explored adequately. We have presented a multi-dimensional successive feature selection (MD-SFS) approach to systematically select attributes. The proposed method is applied to protein sequence data and an improvement of around 24% in fold recognition has been noted when selecting attributes appropriately. CONCLUSION The MD-SFS has been applied successfully in selecting physicochemical attributes of the amino acids. The selected attributes show improved protein fold recognition performance.
Affiliation(s)
- Alok Sharma
- Laboratory of DNA Information Analysis, University of Tokyo, Minato-ku, Tokyo, Japan.
|
52
|
Thangakani AM, Kumar S, Velmurugan D, Gromiha MM. Distinct position-specific sequence features of hexa-peptides that form amyloid-fibrils: application to discriminate between amyloid fibril and amorphous β-aggregate forming peptide sequences. BMC Bioinformatics 2013; 14 Suppl 8:S6. [PMID: 23815227 PMCID: PMC3654898 DOI: 10.1186/1471-2105-14-s8-s6] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 01/16/2023] Open
Abstract
Background Comparison of short peptides which form amyloid-fibrils with their homologues that may form amorphous β-aggregates but not fibrils, can aid development of novel amyloid-containing nanomaterials with well defined morphologies and characteristics. The knowledge gained from the comparative analysis could also be applied towards identifying potential aggregation prone regions in proteins, which are important for biotechnology applications or have been implicated in neurodegenerative diseases. In this work we have systematically analyzed a set of 139 amyloid-fibril hexa-peptides along with a highly homologous set of 168 hexa-peptides that do not form amyloid fibrils for their position-wise as well as overall amino acid compositions and averages of 49 selected amino acid properties. Results Amyloid-fibril forming peptides show distinct preferences and avoidances for amino acid residues to occur at each of the six positions. As expected, the amyloid fibril peptides are also more hydrophobic than non-amyloid peptides. We have used the results of this analysis to develop statistical potential energy values for the 20 amino acid residues to occur at each of the six different positions in the hexa-peptides. The distribution of the potential energy values in 139 amyloid and 168 non-amyloid fibrils are distinct and the amyloid-fibril peptides tend to be more stable (lower total potential energy values) than non-amyloid peptides. The average frequency of occurrence of these peptides with lower than specific cutoff energies at different positions is 72% and 50%, respectively. The potential energy values were used to devise a statistical discriminator to distinguish between amyloid-fibril and non-amyloid peptides. Our method could identify the amyloid-fibril forming hexa-peptides to an accuracy of 89%. On the other hand, the accuracy of identifying non-amyloid peptides was only 54%. Further attempts were made to improve the prediction accuracy via machine learning. This resulted in an overall accuracy of 82.7% with the sensitivity and specificity of 81.3% and 83.9%, respectively, in 10-fold cross-validation method. Conclusions Amyloid-fibril forming hexa-peptides show position specific sequence features that are different from those which may form amorphous β-aggregates. These positional preferences are found to be important features for discriminating amyloid-fibril forming peptides from their homologues that don't form amyloid-fibrils.
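The three machine-learning figures quoted above (sensitivity, specificity, overall accuracy) are linked through the class sizes, and a short check makes that link explicit. The sketch below is only an illustration: the class sizes (139 and 168) come from the abstract, while the true-positive and true-negative counts are assumptions back-calculated to match the reported percentages and are not stated in the source.

```python
def rates(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Class sizes from the abstract: 139 amyloid-fibril and 168 non-fibril peptides.
# ASSUMPTION: tp=113 and tn=141 are back-calculated so that the rounded rates
# match the reported values; the abstract does not give these counts.
sens, spec, acc = rates(tp=113, fn=139 - 113, tn=141, fp=168 - 141)
print(f"{sens:.1%} {spec:.1%} {acc:.1%}")  # 81.3% 83.9% 82.7%
```

With these assumed counts, the overall accuracy is simply the class-size-weighted combination of sensitivity and specificity, which is why it falls between the two.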
Affiliation(s)
- A Mary Thangakani
- Department of Crystallography and Biophysics, University of Madras, Chennai 600025, India
|
53
|
Dehzangi A, Paliwal K, Sharma A, Dehzangi O, Sattar A. A combination of feature extraction methods with an ensemble of different classifiers for protein structural class prediction problem. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2013; 10:564-75. [PMID: 24091391 DOI: 10.1109/tcbb.2013.65] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Indexed: 05/25/2023]
Abstract
A better understanding of the structural class of a given protein reveals important information about its overall folding type and its domain. It can also be directly used to provide critical information on the general tertiary structure of a protein, which has a profound impact on protein function determination and drug design. Despite tremendous enhancements made by pattern recognition-based approaches to solve this problem, it remains an unsolved issue for bioinformatics that demands more attention and exploration. In this study, we propose a novel feature extraction model that incorporates physicochemical and evolutionary-based information simultaneously. We also propose overlapped segmented distribution and autocorrelation-based feature extraction methods to provide more local and global discriminatory information. The proposed feature extraction methods are explored for the 15 most promising attributes, which are selected from a wide range of physicochemical-based attributes. Finally, by applying an ensemble of different classifiers, namely Adaboost.M1, LogitBoost, naive Bayes, multilayer perceptron (MLP), and support vector machine (SVM), we show enhancement of the protein structural class prediction accuracy for four popular benchmarks.
|
54
|
Cheng X, Xiao X, Wu ZC, Wang P, Lin WZ. Swfoldrate: predicting protein folding rates from amino acid sequence with sliding window method. Proteins 2012; 81:140-8. [PMID: 22933332 DOI: 10.1002/prot.24171] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 05/02/2012] [Revised: 07/20/2012] [Accepted: 08/25/2012] [Indexed: 01/18/2023]
Abstract
Protein folding is the process by which a protein progresses from its denatured state to its specific biologically active conformation. Understanding the relationship between sequences and the folding rates of proteins remains an important challenge. Most previous methods of predicting protein folding rate require the tertiary structure of a protein as an input. In this study, long-range and short-range contacts in proteins were used to derive an extended version of the pseudo amino acid composition based on a sliding window method. This method is capable of predicting protein folding rates from the amino acid sequence alone, without the aid of any structural class information. We systematically studied the contributions of individual features to folding rate prediction. The optimal feature selection procedure combines forward feature selection with sequential backward selection. Using the jackknife cross validation test, the method was demonstrated on a large dataset. The predictor was built on numerous physicochemical and statistical features of proteins using a nonlinear support vector machine (SVM) regression model, and it obtained excellent agreement between predicted and experimentally observed folding rates of proteins. The correlation coefficient is 0.9313 and the standard error is 2.2692. The prediction server is freely available at http://www.jci-bioinfo.cn/swfrate/input.jsp.
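The sliding window idea mentioned in this abstract can be illustrated with a generic sketch: average a per-residue property over every full window along the sequence. This is not the Swfoldrate feature set, whose details are not given here. The Kyte-Doolittle hydropathy scale, the function name `sliding_window_means`, and the default window size are all assumptions chosen for illustration.

```python
# Kyte-Doolittle hydropathy values, used here only as an example property.
# ASSUMPTION: the abstract does not say which properties the authors used.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sliding_window_means(sequence, scale, window=5):
    """Mean property value for every full window along the sequence."""
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and len(sequence)")
    values = [scale[aa] for aa in sequence]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical short peptide, two-residue windows.
profile = sliding_window_means("ALKV", KYTE_DOOLITTLE, window=2)
print(profile)
```

A sequence of length N yields N - window + 1 values, so summary statistics of the profile (mean, maximum, and so on) would be needed to obtain fixed-length features for a regression model.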
Affiliation(s)
- Xiang Cheng
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403, China
|
55
|
Bastolla U, Bruscolini P, Velasco JL. Sequence determinants of protein folding rates: Positive correlation between contact energy and contact range indicates selection for fast folding. Proteins 2012; 80:2287-304. [DOI: 10.1002/prot.24118] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 12/16/2011] [Revised: 05/14/2012] [Accepted: 05/17/2012] [Indexed: 11/12/2022]
|
56
|
Dasgupta A, Udgaonkar JB. Four-State Folding of a SH3 Domain: Salt-Induced Modulation of the Stabilities of the Intermediates and Native State. Biochemistry 2012; 51:4723-34. [DOI: 10.1021/bi300223b] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 11/29/2022]
Affiliation(s)
- Amrita Dasgupta
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bangalore 560065, India
- Jayant B. Udgaonkar
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bangalore 560065, India
|
57
|
Real value prediction of protein folding rate change upon point mutation. J Comput Aided Mol Des 2012; 26:339-47. [DOI: 10.1007/s10822-012-9560-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 02/21/2011] [Accepted: 03/02/2012] [Indexed: 10/28/2022]
|
58
|
Huang JT, Xing DJ, Huang W. Relationship between protein folding kinetics and amino acid properties. Amino Acids 2011; 43:567-72. [DOI: 10.1007/s00726-011-1189-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 06/19/2011] [Accepted: 11/29/2011] [Indexed: 10/14/2022]
|
59
|
Galzitskaya OV, Bogatyreva NS, Glyakina AV. Bacterial proteins fold faster than eukaryotic proteins with simple folding kinetics. Biochemistry (Moscow) 2011; 76:225-35. [PMID: 21568856 DOI: 10.1134/s000629791102009x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/23/2022]
Abstract
Protein domain frequency and distribution among kingdoms was statistically analyzed using the SCOP structural database. It appeared that among chosen protein domains with the best resolution, eukaryotic proteins more often belong to α-helical and β-structural proteins, while proteins of bacterial origin belong to α/β structural class. Statistical analysis of folding rates of 73 proteins with known experimental data revealed that bacterial proteins with simple kinetics (23 proteins) exhibit a higher folding rate compared to eukaryotic proteins with simple folding kinetics (27 proteins). Analysis of protein domain amino acid composition showed that the frequency of amino acid residues in proteins of eukaryotic and bacterial origin is different for proteins with simple and complex folding kinetics.
Affiliation(s)
- O V Galzitskaya
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, Moscow Region, Russia.
|
60
|
Guo J, Rao N. Predicting protein folding rate from amino acid sequence. J Bioinform Comput Biol 2011; 9:1-13. [PMID: 21328704 DOI: 10.1142/s0219720011005306] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 03/27/2010] [Revised: 10/19/2010] [Accepted: 10/19/2010] [Indexed: 11/18/2022]
Abstract
Predicting protein folding rate from amino acid sequence is an important challenge in computational and molecular biology. Over the past few years, many methods have been developed to reflect the correlation between the folding rates and protein structures and sequences. In this paper, we present an effective method, a combined neural network and genetic algorithm approach, to predict protein folding rates from amino acid sequences alone, without any explicit structural information. The originality of this paper is that, for the first time, it tackles the effect of sequence order. The proposed method provides a good correlation between the predicted and experimental folding rates. The correlation coefficient is 0.80 and the standard error is 2.65 for 93 proteins, the largest such database of proteins yet studied, when evaluated with the leave-one-out jackknife test. The comparative results demonstrate that this correlation is better than that of most other methods, and suggest the important contribution of sequence order information to the determination of protein folding rates.
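The leave-one-out jackknife test named in this abstract can be sketched in a few lines: each protein is predicted by a model fitted to all the other proteins, and the correlation is then computed between these held-out predictions and the observed values. The sketch below swaps in a one-feature least-squares line for the paper's neural network and genetic algorithm model, purely to keep the example short. The function names and the toy numbers are assumptions for illustration, not data from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one feature."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def jackknife_predictions(x, y):
    """Predict each y[i] from a model trained on all samples except i."""
    predictions = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        slope, intercept = fit_line(x_train, y_train)
        predictions.append(slope * x[i] + intercept)
    return predictions


# Made-up toy values (NOT from the paper): one feature and one target.
feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
target = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
held_out = jackknife_predictions(feature, target)
print(round(pearson(held_out, target), 3))
```

Because every prediction comes from a model that never saw the held-out sample, the resulting correlation is a less optimistic estimate than one computed on the training data.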
Affiliation(s)
- Jianxiu Guo
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, P. R. China.
|
61
|
Guo J, Rao N, Liu G, Yang Y, Wang G. Predicting protein folding rates using the concept of Chou's pseudo amino acid composition. J Comput Chem 2011; 32:1612-7. [PMID: 21328402 DOI: 10.1002/jcc.21740] [Citation(s) in RCA: 68] [Impact Index Per Article: 5.2] [Received: 04/22/2010] [Revised: 11/04/2010] [Accepted: 12/02/2010] [Indexed: 12/12/2022]
Abstract
One of the most important challenges in computational and molecular biology is to understand the relationship between amino acid sequences and the folding rates of proteins. Recent works suggest that topological parameters, amino acid properties, chain length and the composition index relate well with protein folding rates; however, sequence order information has seldom been considered as a property for predicting protein folding rates. In this study, amino acid sequence order was used to derive an effective method, based on an extended version of the pseudo-amino acid composition, for predicting protein folding rates without any explicit structural information. Using the jackknife cross validation test, the method was demonstrated on the largest dataset (99 proteins) reported. The method was found to provide a good correlation between the predicted and experimental folding rates. The correlation coefficient is 0.81 (with a highly significant level) and the standard error is 2.46. The reported algorithm was found to perform better than several representative sequence-based approaches using the same dataset. The results indicate that sequence order information is an important determinant of protein folding rates.
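To show how sequence order enters a pseudo-amino acid composition, the sketch below computes the classic form of the descriptor: 20 normalized composition terms followed by a few sequence-order correlation factors. It is not the extended version used in the paper, whose details the abstract does not give. The single Kyte-Doolittle property, the parameter names `lam` and `weight`, and their default values are assumptions for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values, used as the single example property.
# ASSUMPTION: the properties used in the paper are not listed in the abstract.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(scale):
    """Rescale a property to zero mean and unit variance over the 20 residues."""
    values = [scale[aa] for aa in AMINO_ACIDS]
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return {aa: (scale[aa] - mean) / sd for aa in AMINO_ACIDS}


def pse_aac(sequence, lam=3, weight=0.05, scale=KYTE_DOOLITTLE):
    """Return a (20 + lam)-dimensional pseudo-amino acid composition vector."""
    n = len(sequence)
    if lam >= n:
        raise ValueError("lam must be smaller than the sequence length")
    h = standardize(scale)
    freqs = [sequence.count(aa) / n for aa in AMINO_ACIDS]
    # theta_k: mean squared property difference between residues k apart.
    thetas = [
        sum((h[sequence[i]] - h[sequence[i + k]]) ** 2 for i in range(n - k))
        / (n - k)
        for k in range(1, lam + 1)
    ]
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


# Hypothetical sequence of standard residues only.
vector = pse_aac("MKVLAAGIVGLLLAQ", lam=3)
print(len(vector), round(sum(vector), 6))
```

The first 20 components carry composition only, while the last `lam` components change when the same residues are reordered, which is the sequence-order information the abstract refers to. Non-standard residue letters would raise a `KeyError` in this sketch.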
Affiliation(s)
- Jianxiu Guo
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, People's Republic of China
|
62
|
Guo JX, Rao NN, Liu GX, Li J, Wang YH. Predicting Protein Folding Rate From Amino Acid Sequence. Prog Biochem Biophys 2011. [DOI: 10.3724/sp.j.1206.2010.00380] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/25/2022]
|
63
|
Harihar B, Selvaraj S. Application of long-range order to predict unfolding rates of two-state proteins. Proteins 2010; 79:880-7. [DOI: 10.1002/prot.22925] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 07/23/2010] [Revised: 10/07/2010] [Accepted: 10/24/2010] [Indexed: 01/09/2023]
|
64
|
Chang L, Wang J, Wang W. Composition-based effective chain length for prediction of protein folding rates. Physical Review E, Statistical, Nonlinear, and Soft Matter Physics 2010; 82:051930. [PMID: 21230523 DOI: 10.1103/physreve.82.051930] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 09/13/2010] [Indexed: 05/30/2023]
Abstract
Folding rate prediction is a useful way to find the key factors affecting folding kinetics of proteins. Structural information is more or less required in the present prediction methods, which limits the application of these methods to various proteins. In this work, an "effective length" is defined solely based on the composition of a protein, namely, the number of specific types of amino acids in a protein. A physical theory based on a minimalist model is employed to describe the relation between the folding rates and the effective length of proteins. Based on the resultant relationship between folding rates and effective length, the optimal sets of amino acids are found through the enumeration over all possible combinations of amino acids. This optimal set achieves a high correlation (with the coefficient of 0.84) between the folding rates and the optimal effective length. The features of these amino acids are consistent with our model and landscape theory. Further comparisons between our effective length and other factors are carried out. The effective length is physically consistent with structure-based prediction methods and has the best predictability for folding rates. These results all suggest that both entropy and energetics contribute importantly to folding kinetics. The ability to accurately and efficiently predict folding rates from composition enables the analysis of the kinetics for various kinds of proteins. The underlying physics in our method may be helpful to stimulate further understanding on the effects of various amino acids in folding dynamics.
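The composition-based "effective length" and the enumeration over residue sets described above can be sketched as follows. This is a simplified illustration under stated assumptions: it scores each candidate set by the plain Pearson correlation between effective length and log folding rate, whereas the paper derives its relation from a physical model not reproduced here, and it searches only sets of one fixed size rather than all combinations. The function names and the toy sequences and rates are hypothetical.

```python
import math
from itertools import combinations


def effective_length(sequence, residue_set):
    """Number of residues in the sequence that belong to the chosen set."""
    return sum(1 for aa in sequence if aa in residue_set)


def pearson(x, y):
    """Pearson correlation, or None when either list has zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx * syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def best_residue_set(sequences, log_rates, alphabet, set_size):
    """Brute-force search over residue sets of one size (an assumed scoring)."""
    best_r, best_set = 0.0, None
    for subset in combinations(alphabet, set_size):
        lengths = [effective_length(s, subset) for s in sequences]
        r = pearson(lengths, log_rates)
        if r is not None and abs(r) > abs(best_r):
            best_r, best_set = r, subset
    return best_set, best_r


# Made-up toy inputs (NOT experimental data).
toy_sequences = ["AAAGK", "AGKKK", "AAGGK"]
toy_log_rates = [3.0, 1.0, 2.0]
print(best_residue_set(toy_sequences, toy_log_rates, "AGK", set_size=1))
```

Over the full 20-letter alphabet there are 2^20 - 1 non-empty residue sets, so exhaustive enumeration remains feasible, which is consistent with the abstract's statement that all combinations were examined.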
Affiliation(s)
- Le Chang
- National Laboratory of Solid State Microstructure and Department of Physics, Nanjing University, Nanjing 210093, China
|
65
|
Tartaglia GG, Vendruscolo M. Proteome-Level Interplay between Folding and Aggregation Propensities of Proteins. J Mol Biol 2010; 402:919-28. [DOI: 10.1016/j.jmb.2010.08.013] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.8] [Received: 03/30/2010] [Revised: 08/05/2010] [Accepted: 08/09/2010] [Indexed: 10/19/2022]
|
66
|
iFC²: an integrated web-server for improved prediction of protein structural class, fold type, and secondary structure content. Amino Acids 2010; 40:963-73. [PMID: 20730460 DOI: 10.1007/s00726-010-0721-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 03/09/2010] [Accepted: 08/06/2010] [Indexed: 10/19/2022]
Abstract
Several descriptors of protein structure at the sequence and residue levels have been recently proposed. They are widely adopted in the analysis and prediction of structural and functional characteristics of proteins. Numerous in silico methods have been developed for sequence-based prediction of these descriptors. However, many of them do not have a public web-server and only a few integrate multiple descriptors to improve the predictions. We introduce iFC² (integrated prediction of fold, class, and content) server that is the first to integrate three modern predictors of sequence-level descriptors. They concern fold type (PFRES), structural class (SCEC), and secondary structure content (PSSC-core). The server exploits relations between the three descriptors to implement a cross-evaluation procedure that improves over the predictions of the individual methods. The iFC² annotates fold and class predictions as potentially correct/incorrect. When tested on datasets with low-similarity chains, for the fold prediction iFC² labels 82% of the PFRES predictions as correct and the accuracy of these predictions equals 72%. The accuracy of the remaining 28% of the PFRES predictions equals 38%. Similarly, our server assigns correct labels for over 79% of SCEC predictions, which are shown to be 98% accurate, while the remaining SCEC predictions are only 15% accurate. These results are shown to be competitive when contrasted against recent relevant web-servers. Predictions on CASP8 targets show that the content predicted by iFC² is competitive when compared with the content computed from the tertiary structures predicted by three best-performing methods in CASP8. The iFC² server is available at http://biomine.ece.ualberta.ca/1D/1D.html .
|
67
|
Huang LT, Gromiha MM. First insight into the prediction of protein folding rate change upon point mutation. Bioinformatics 2010; 26:2121-7. [DOI: 10.1093/bioinformatics/btq350] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/13/2022] Open
|
68
|
Galzitskaya OV. Is protein folding rate dependent on number of folding stages? Modeling of protein folding with ferredoxin-like fold. Biochemistry. Biokhimiia 2010; 75:717-727. [PMID: 20636263 DOI: 10.1134/s0006297910060064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2023]
Abstract
Statistical analysis of protein folding rates has been done for 84 proteins with available experimental data. A surprising result is that the proteins with multi-state kinetics from the size range of 50-100 amino acid residues (a.a.) fold as fast as proteins with two-state kinetics from the same size range. At the same time, the proteins with two-state kinetics from the size range 101-151 a.a. fold faster than those from the size range 50-100 a.a. Moreover, it turns out unexpectedly that usually in the group of structural homologs from the size range 50-100 a.a., proteins with multi-state kinetics fold faster than those with two-state kinetics. The protein folding for six proteins with a ferredoxin-like fold and with a similar size has been modeled using Monte Carlo simulations and dynamic programming. Good correlation between experimental folding rates, some structural parameters, and the number of Monte Carlo steps has been obtained. It is shown that a protein with multi-state kinetics actually folds three times faster than its structural homologs.
Affiliation(s)
- O V Galzitskaya
- Institute of Protein Research, Russian Academy of Sciences, Pushchino, Moscow Region, Russia.
|
69
|
Xi L, Li S, Liu H, Li J, Lei B, Yao X. Global and local prediction of protein folding rates based on sequence autocorrelation information. J Theor Biol 2010; 264:1159-68. [DOI: 10.1016/j.jtbi.2010.03.042] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 10/04/2009] [Revised: 03/28/2010] [Accepted: 03/29/2010] [Indexed: 11/24/2022]
|
70
|
Juretić D, Vukicević D, Ilić N, Antcheva N, Tossi A. Computational design of highly selective antimicrobial peptides. J Chem Inf Model 2010; 49:2873-82. [PMID: 19947578 DOI: 10.1021/ci900327a] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.3] [Indexed: 11/29/2022]
Abstract
We have created a structure-selectivity database (AMPad) of frog-derived, helical antimicrobial peptides (AMPs), in which the selectivity was determined as a therapeutic index (TI), and then used the novel concept of sequence moments to study the lengthwise asymmetry of physicochemical peptide properties. We found that the cosine of the angle between two sequence moments obtained with different hydrophobicity scales, defined as the D-descriptor, identifies highly selective peptide antibiotics. We could then use this descriptor to predict TI changes after point mutations in known AMPs, and to aid the prediction of TI for de novo designed AMPs. In combination with an amino acid selectivity index, a motif regularity index and other statistical rules extracted from AMPad, the D-descriptor enabled construction of the AMP-Designer algorithm. A 23 residue, glycine-rich, peptide suggested by the algorithm was synthesized and the activity and selectivity tested. This peptide, adepantin 1, is less than 50% identical to any other AMP, has a potent antibacterial activity against the reference organism, E. coli, and has a significantly greater selectivity (TI > 200) than the best AMP present in the AMPad database (TI = 125).
Affiliation(s)
- Davor Juretić
- Department of Physics, Faculty of Science, University of Split, 21000 Split, Croatia and Department of Life Sciences, University of Trieste, 34127 Trieste, Italy.
|
71
|
Ou YY, Chen SA, Gromiha MM. Classification of transporters using efficient radial basis function networks with position-specific scoring matrices and biochemical properties. Proteins 2010; 78:1789-97. [DOI: 10.1002/prot.22694] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.8] [Indexed: 11/12/2022]
|
72
|
Abstract
We have developed a thermodynamic database for proteins and mutants, ProTherm, which is a collection of a large number of thermodynamic data on protein stability along with the sequence and structure information, experimental methods and conditions, and literature information. This is a valuable resource for understanding/predicting the stability of proteins, and it is accessible at http://www.gibk26.bse.kyutech.ac.jp/jouhou/Protherm/protherm.html . ProTherm has several features including various search, display, and sorting options and visualization tools. We have analyzed the data in ProTherm to examine the relationship among thermodynamics, structure, and function of proteins. We describe the progress on the development of methods for understanding/predicting protein stability, such as (i) relationship between the stability of protein mutants and amino acid properties, (ii) average assignment method, (iii) empirical energy functions, (iv) torsion, distance, and contact potentials, and (v) machine learning techniques. A list of online resources for predicting protein stability is also provided.
Affiliation(s)
- M Michael Gromiha
- Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
|
73
|
Mizianty MJ, Kurgan L. Modular prediction of protein structural classes from sequences of twilight-zone identity with predicting sequences. BMC Bioinformatics 2009; 10:414. [PMID: 20003388 PMCID: PMC2805645 DOI: 10.1186/1471-2105-10-414] [Citation(s) in RCA: 79] [Impact Index Per Article: 5.3] [Received: 08/06/2009] [Accepted: 12/13/2009] [Indexed: 11/13/2022] Open
Abstract
Background Knowledge of structural class is used by numerous methods for identification of structural/functional characteristics of proteins and could be used for the detection of remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and which are designed for high identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences. Results The proposed MODular Approach to Structural class prediction (MODAS) method is unique as it allows for selection of any subset of the classes. MODAS is also the first to utilize a novel, custom-built feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes including conservation of residues and arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers that are tailored for each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets and comparison with more than two dozen modern competing predictors show that MODAS provides the best overall accuracy that ranges between 80% and 96.7% (83.5% for the twilight-zone datasets), depending on the dataset. This translates into 19% and 8% error rate reduction when compared against the best performing competing method on the two largest datasets. The proposed predictor provides accurate predictions at 58% accuracy for the membrane proteins class, which is not considered by the majority of existing methods, even though this class accounts for only 2% of the data. Our predictive model is analyzed to demonstrate how and why the input features are associated with the corresponding classes. 
Conclusions The improved predictions stem from the novel features that express collocation of the secondary structure segments in the protein sequence and that combine evolutionary and secondary structure information. Our work demonstrates that conservation and arrangement of the secondary structure segments predicted along the protein chain can successfully predict structural classes which are defined based on the spatial arrangement of the secondary structures. A web server is available at http://biomine.ece.ualberta.ca/MODAS/.
Affiliation(s)
- Marcin J Mizianty
- Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada.
|
74
|
Harihar B, Selvaraj S. Refinement of the long-range order parameter in predicting folding rates of two-state proteins. Biopolymers 2009; 91:928-35. [DOI: 10.1002/bip.21281] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/07/2022]
|
75
|
Jiang Y, Iglinski P, Kurgan L. Prediction of protein folding rates from primary sequences using hybrid sequence representation. J Comput Chem 2009; 30:772-83. [DOI: 10.1002/jcc.21096] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Indexed: 11/06/2022]
|
76
|
Gromiha MM. Multiple Contact Network Is a Key Determinant to Protein Folding Rates. J Chem Inf Model 2009; 49:1130-5. [DOI: 10.1021/ci800440x] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.1] [Indexed: 11/29/2022]
Affiliation(s)
- M. Michael Gromiha
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), AIST Tokyo Waterfront Bio-IT Research Building, 2-42 Aomi, Koto-ku, Tokyo 135-0064, Japan
|
77
|
Analysis of oligomeric proteins during unfolding by pH and temperature. J Mol Model 2009; 15:1013-25. [PMID: 19205760 DOI: 10.1007/s00894-008-0365-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/25/2008] [Accepted: 09/22/2008] [Indexed: 10/21/2022]
Abstract
During thermal transition and variation of pH, structural properties of 35 proteins and their complexes (bound with substrate and cofactor) were analyzed in detail. During pH alteration, these proteins were shown to have substantial differences in conformation. pH conformers were analyzed in detail. Free energy and other energy parameters were also estimated for these proteins at various pH values and temperatures. Detailed structural analysis and binding interfaces of various substrates, inhibitors and cofactors of these proteins were also investigated using docking and molecular dynamics simulation.
|
78
|
Huang LT, Gromiha MM. Analysis and prediction of protein folding rates using quadratic response surface models. J Comput Chem 2008; 29:1675-83. [PMID: 18351617 DOI: 10.1002/jcc.20925] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.0] [Indexed: 11/09/2022]
Abstract
Understanding the relationship between amino acid sequences and folding rates of proteins is an important task in computational and molecular biology. In this work, we have systematically analyzed the composition of amino acid residues for proteins with different ranges of folding rates. We observed that the polar residues, Asn, Gln, Ser, and Lys, are dominant in fast folding proteins whereas the hydrophobic residues, Ala, Cys, Gly, and Leu, prefer to be in slow folding proteins. Further, we have developed a method based on quadratic response surface models for predicting the folding rates of 77 two- and three-state proteins. Our method showed a correlation of 0.90 between experimental and predicted protein folding rates using the leave-one-out cross-validation method. The classification of proteins based on structural class improved the correlation to 0.98 and it is 0.99, 0.98, and 0.96, respectively, for all-alpha, all-beta, and mixed class proteins. In addition, we have utilized Bayesian classification theory for discriminating two- and three-state proteins, which showed an accuracy of 90%. We have developed a web server for predicting protein folding rates and it is available at http://bioinformatics.myweb.hinet.net/foldrate.htm.
Affiliation(s)
- Liang-Tsung Huang
- Department of Computer Science and Information Engineering, Ming-Dao University, Changhua 523, Taiwan
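As a rough illustration of the leave-one-out evaluation mentioned in the abstract above, the following sketch fits a quadratic response surface on a single descriptor and reports the correlation between observed and held-out predictions. It is not the authors' implementation: the single-descriptor form, the function names (`quadratic_design`, `leave_one_out_predictions`, `pearson_r`) and the synthetic data are all assumptions made for the example.

```python
import numpy as np


def quadratic_design(x):
    """Design matrix with columns [1, x, x^2] for one descriptor."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones_like(x), x, x ** 2])


def leave_one_out_predictions(x, y):
    """Predict each y[i] from a quadratic model fitted without sample i."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    predictions = np.empty_like(y)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i  # mask that drops sample i
        coef, *_ = np.linalg.lstsq(quadratic_design(x[keep]), y[keep], rcond=None)
        predictions[i] = (quadratic_design(x[i:i + 1]) @ coef)[0]
    return predictions


def pearson_r(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


# Synthetic, noise-free data (hypothetical; not measured folding rates).
x_demo = np.linspace(-2.0, 2.0, 9)
y_demo = 1.0 + 0.5 * x_demo - 0.8 * x_demo ** 2
loo_demo = leave_one_out_predictions(x_demo, y_demo)
print(round(pearson_r(y_demo, loo_demo), 3))
```

With real data the held-out correlation is normally lower than the correlation obtained by fitting and testing on the same proteins, which is why the two are reported separately in this literature.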
|
79
|
Chen K, Kurgan LA, Ruan J. Prediction of protein structural class using novel evolutionary collocation-based sequence representation. J Comput Chem 2008; 29:1596-604. [PMID: 18293306 DOI: 10.1002/jcc.20918] [Citation(s) in RCA: 131] [Impact Index Per Article: 8.2] [Indexed: 11/11/2022]
Abstract
Knowledge of structural classes is useful for understanding folding patterns in proteins. Although existing structural class prediction methods applied virtually all state-of-the-art classifiers, many of them use a relatively simple protein sequence representation that often includes amino acid (AA) composition. To this end, we propose a novel sequence representation that incorporates evolutionary information encoded using PSI-BLAST profile-based collocation of AA pairs. We used six benchmark datasets and five representative classifiers to quantify and compare the quality of the structural class prediction with the proposed representation. The best classifier, a support vector machine, achieved 61-96% accuracy on the six datasets. These predictions were comprehensively compared with a wide range of recently proposed methods for prediction of structural classes. Our comprehensive comparison shows superiority of the proposed representation, which results in error rate reductions that range between 14% and 26% when compared with predictions of the best-performing, previously published classifiers on the considered datasets. The study also shows that, for the benchmark datasets that include sequences characterized by low identity (i.e., 25%, 30%, and 40%), the prediction accuracies are 20-35% lower than for the other three datasets that include sequences with a higher degree of similarity. In conclusion, the proposed representation is shown to substantially improve the accuracy of the structural class prediction. A web server that implements the presented prediction method is freely available at http://biomine.ece.ualberta.ca/Structural_Class/SCEC.html.
Affiliation(s)
- Ke Chen
- Department of Electrical and Computer Engineering, ECERF, University of Alberta, Edmonton, Alberta, Canada
|
80
|
Ouyang Z, Liang J. Predicting protein folding rates from geometric contact and amino acid sequence. Protein Sci 2008; 17:1256-63. [PMID: 18434498 DOI: 10.1110/ps.034660.108] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.8] [Indexed: 10/22/2022]
Abstract
Protein folding speeds are known to vary over more than eight orders of magnitude. Plaxco, Simons, and Baker (see References) first showed a correlation of folding speed with the topology of the native protein. That and subsequent studies showed that, if the native structure of a protein is known, its folding speed can be predicted reasonably well through a correlation with the "localness" of the contacts in the protein. In the present work, we develop a related measure, the geometric contact number, Nα, which is the number of nonlocal contacts that are well-packed, by a Voronoi criterion. We find, first, that in 80 proteins, the largest such database of proteins yet studied, Nα is a consistently excellent predictor of folding speeds of both two-state fast folders and more complex multistate folders. Second, we show that folding rates can also be predicted from amino acid sequences directly, without the need to know the native topology or other structural properties.
Affiliation(s)
- Zheng Ouyang
- Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois 60607, USA
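To make the idea of counting nonlocal contacts concrete, here is a minimal sketch. It deliberately swaps the Voronoi packing criterion named in the abstract above for a plain distance cutoff between residue coordinates, so it is a simplified stand-in rather than the geometric contact number itself. The function name `count_nonlocal_contacts`, the 8 Å cutoff, the sequence separation of 12 residues and the toy coordinates are all assumptions chosen for illustration.

```python
import numpy as np


def count_nonlocal_contacts(coords, cutoff=8.0, min_separation=12):
    """Count residue pairs (i, j) with j - i >= min_separation whose
    coordinates lie within `cutoff` of each other.

    `coords` is an (n, 3) array with one representative atom per residue.
    A distance cutoff is used here instead of a Voronoi criterion.
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Indices of the upper triangle starting at the min_separation-th diagonal.
    i, j = np.triu_indices(n, k=min_separation)
    return int((dist[i, j] <= cutoff).sum())


# Toy geometries (hypothetical coordinates, not taken from any structure).
spacing = 3.8
straight = np.array([[spacing * i, 0.0, 0.0] for i in range(20)])
strand_a = [[spacing * i, 0.0, 0.0] for i in range(10)]
strand_b = [[spacing * (19 - i), 5.0, 0.0] for i in range(10, 20)]
hairpin = np.array(strand_a + strand_b)

print(count_nonlocal_contacts(straight), count_nonlocal_contacts(hairpin))
```

The extended chain has no sequence-distant pairs within the cutoff, while the folded-back hairpin does, which is the qualitative distinction a nonlocal contact measure is meant to capture.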
|
81
|
Kurgan LA, Zhang T, Zhang H, Shen S, Ruan J. Secondary structure-based assignment of the protein structural classes. Amino Acids 2008; 35:551-64. [DOI: 10.1007/s00726-008-0080-3] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.0] [Received: 01/31/2008] [Accepted: 02/27/2008] [Indexed: 11/24/2022]
|
82
|
Huang JT, Cheng JP. Differentiation between two-state and multi-state folding proteins based on sequence. Proteins 2008; 72:44-9. [DOI: 10.1002/prot.21893] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/06/2022]
|
83
|
Huang JT, Cheng JP. Prediction of folding transition-state position (βT) of small, two-state proteins from local secondary structure content. Proteins 2007; 68:218-22. [PMID: 17469192 DOI: 10.1002/prot.21411] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/08/2022]
Abstract
Folding kinetics of proteins is governed by the free energy and position of transition states. But attempts to predict the position of the folding transition state on the reaction pathway from protein structure have met with only limited success, unlike folding-rate prediction. Here, we find that the folding transition-state position is related to the secondary structure content of native two-state proteins. We present a simple method for predicting the transition-state position from their alpha-helix, turn and polyproline secondary structures. The method achieves 81% correlation with experiment over 24 small, two-state proteins, suggesting that the local secondary structure content, especially the alpha-helix content, is a determinant of the solvent accessibility of the transition state ensemble and the size of the folding nucleus.
Affiliation(s)
- Ji-Tao Huang
- College of Chemistry and State Key Laboratory of Elemento-Organic Chemistry, Nankai University, Tianjin 300071, China
|
84
|
Huang LT, Saraboji K, Ho SY, Hwang SF, Ponnuswamy MN, Gromiha MM. Prediction of protein mutant stability using classification and regression tool. Biophys Chem 2007; 125:462-70. [PMID: 17113702 DOI: 10.1016/j.bpc.2006.10.009] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Received: 09/01/2006] [Revised: 10/19/2006] [Accepted: 10/23/2006] [Indexed: 11/18/2022]
Abstract
Prediction of protein stability upon amino acid substitutions is an important problem in molecular biology, and solving it would help in designing stable mutants. In this work, we have analyzed the stability of protein mutants using two different datasets of 1396 and 2204 mutants obtained from the ProTherm database, respectively for free energy change due to thermal (ΔΔG) and denaturant (ΔΔG(H2O)) denaturations. We have used a set of 48 physical, chemical, energetic and conformational properties of amino acid residues and computed the difference of amino acid properties for each mutant in both sets of data. These differences in amino acid properties have been related to protein stability (ΔΔG and ΔΔG(H2O)) and are used to train a classification and regression tool for predicting the stability of protein mutants. Further, we have tested the method with 4-fold, 5-fold and 10-fold cross-validation procedures. We found that the physical properties, shape and flexibility are important determinants of protein stability. The classification of mutants based on secondary structure (helix, strand, turn and coil) and solvent accessibility (buried, partially buried, partially exposed and exposed) distinguished the stabilizing/destabilizing mutants at an average accuracy of 81% and 80%, respectively for ΔΔG and ΔΔG(H2O). The correlation between the experimental and predicted stability change is 0.61 for ΔΔG and 0.44 for ΔΔG(H2O). Further, the free energy change due to the replacement of an amino acid residue has been predicted within an average error of 1.08 kcal/mol and 1.37 kcal/mol for thermal and chemical denaturation, respectively. The relative importance of secondary structure and solvent accessibility, and the influence of the dataset on prediction of protein mutant stability have been discussed.
Affiliation(s)
- Liang-Tsung Huang
- Institute of Information Engineering and Computer Science, Feng-Chia University, Taichung, 407, Taiwan
|
85
|
Gromiha MM, Thangakani AM, Selvaraj S. FOLD-RATE: prediction of protein folding rates from amino acid sequence. Nucleic Acids Res 2006; 34:W70-4. [PMID: 16845101 PMCID: PMC1538837 DOI: 10.1093/nar/gkl043] [Citation(s) in RCA: 111] [Impact Index Per Article: 6.2] [Indexed: 11/13/2022] Open
Abstract
We have developed a web server, FOLD-RATE, for predicting the folding rates of proteins from their amino acid sequences. The relationship between amino acid properties and protein folding rates has been systematically analyzed and a statistical method based on a linear regression technique has been proposed for predicting the folding rate of proteins. We found that the classification of proteins into different structural classes shows an excellent correlation between amino acid properties and folding rates of two- and three-state proteins. Consequently, different regression equations have been developed for proteins belonging to all-alpha, all-beta and mixed class. We observed an excellent agreement between predicted and experimentally observed folding rates of proteins; the correlation coefficients are 0.99, 0.97 and 0.90, respectively, for all-alpha, all-beta and mixed class proteins. The prediction server is freely available at http://psfs.cbrc.jp/fold-rate/.
Affiliation(s)
- M Michael Gromiha
- Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, AIST Tokyo Waterfront Bio-IT Research Building, 2-42 Aomi, Koto-ku, Tokyo 135-0064, Japan.
|
86
|
Gromiha MM, Suwa M. Influence of amino acid properties for discriminating outer membrane proteins at better accuracy. Biochimica et Biophysica Acta - Proteins and Proteomics 2006; 1764:1493-7. [PMID: 16963325 DOI: 10.1016/j.bbapap.2006.07.005] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Received: 05/08/2006] [Revised: 07/13/2006] [Accepted: 07/28/2006] [Indexed: 10/24/2022]
Abstract
Discriminating outer membrane proteins (OMPs) from other folding types of globular and membrane proteins is an important task both for identifying outer membrane proteins from genomic sequences and for the successful prediction of their secondary and tertiary structures. In this work, we have analyzed the influence of physico-chemical, energetic and conformational properties of amino acid residues for discriminating outer membrane proteins using different machine learning algorithms, such as Bayes rules, logistic functions, neural networks, support vector machines and decision trees. We observed that most of the properties have discriminated the OMPs with similar accuracy. The neural network method with the property free energy change could discriminate the OMPs from other folding types of globular and membrane proteins at a 5-fold cross-validation accuracy of 94.4% in a dataset of 1,088 proteins, which is better than that obtained with amino acid composition. The accuracy of discriminating globular proteins is 94.3% and that of transmembrane helical (TMH) proteins is 91.8%. Further, the neural network method is tested with globular proteins belonging to 30 major folding types and it could successfully exclude 99.4% of the considered 1612 non-redundant proteins. These accuracy levels are comparable to or better than other methods in the literature. We suggest that this method could be effectively used to discriminate OMPs and for detecting OMPs in genomic sequences.
Affiliation(s)
- M Michael Gromiha
- Computational Biology Research Center, CBRC, National Institute of Advanced Industrial Science and Technology, AIST Tokyo Waterfront Bio-IT Research Building, 2-42 Aomi, Tokyo 135-0064, Japan.
|
87
|
Gromiha MM, Selvaraj S, Thangakani AM. A Statistical Method for Predicting Protein Unfolding Rates from Amino Acid Sequence. J Chem Inf Model 2006; 46:1503-8. [PMID: 16711769 DOI: 10.1021/ci050417u] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Indexed: 11/28/2022]
Abstract
The prediction of protein unfolding rates from amino acid sequences is one of the most important challenges in computational biology and chemistry. Analysis of the relationship between protein unfolding rates and physical-chemical, energetic, and conformational properties of amino acid residues provides valuable information to understand and predict the unfolding rates of two- and three-state proteins. We found that the classification of proteins into different structural classes shows an excellent correlation between amino acid properties and unfolding rates of two- and three-state proteins, indicating the importance of native-state topology in determining the protein unfolding rates. We have formulated three independent linear regression equations for different structural classes of proteins for predicting their unfolding rates from amino acid sequences and obtained an excellent agreement between predicted and experimentally observed unfolding rates of proteins; the correlation coefficients are 0.999, 0.990, and 0.992, respectively, for all-alpha, all-beta, and mixed-class proteins. Further, we have derived a general equation applicable to all structural classes of proteins, which can be used for predicting the unfolding rates for proteins of an unknown structural class. We observed a correlation of 0.987 and 0.930, respectively, for back-check and jack-knife tests. These accuracy levels are better than those of other methods in the literature.
Affiliation(s)
- M Michael Gromiha
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), AIST Tokyo Waterfront Bio-IT Research Building, 2-42 Aomi, Tokyo 135-0064, Japan.
|
88
|
Abstract
The significant correlation between protein folding rates and the sequence-predicted secondary structure suggests that folding rates are largely determined by the amino acid sequence. Here, we present a method for predicting the folding rates of proteins from sequences using the intrinsic properties of amino acids, which does not require any information on secondary structure prediction and structural topology. The contribution of a residue to the folding rate is expressed by the residue's Omega value. For a given residue, its Omega depends on the amino acid properties (amino acid rigidity and dislike of amino acid for secondary structures). Our investigation achieves 82% correlation with folding rates determined experimentally for the simple, two-state proteins studied to date, suggesting that the amino acid sequence of a protein is an important determinant of the protein-folding rate and mechanism.
Affiliation(s)
- Ji-Tao Huang
- Department of Biochemistry, Tianjin University of Technology, Tianjin, China.
|