1
Harihar B, Saravanan KM, Gromiha MM, Selvaraj S. Importance of Inter-residue Contacts for Understanding Protein Folding and Unfolding Rates, Remote Homology, and Drug Design. Mol Biotechnol 2024:10.1007/s12033-024-01119-4. [PMID: 38498284 DOI: 10.1007/s12033-024-01119-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2023] [Accepted: 02/10/2024] [Indexed: 03/20/2024]
Abstract
Inter-residue interactions in protein structures provide valuable insights into protein folding and stability. Understanding these interactions can be helpful in many crucial applications, including rational design of therapeutic small molecules and biologics, locating functional protein sites, and predicting protein-protein and protein-ligand interactions. The process of developing machine learning models incorporating inter-residue interactions has been improved recently. This review highlights the theoretical models incorporating inter-residue interactions in predicting folding and unfolding rates of proteins. Utilizing contact maps to depict inter-residue interactions aids researchers in developing computer models for detecting remote homologs and interface residues within protein-protein complexes which, in turn, enhances our knowledge of the relationship between sequence and structure of proteins. Further, the application of contact maps derived from inter-residue interactions is highlighted in the field of drug discovery. Overall, this review presents an extensive assessment of the significant models that use inter-residue interactions to investigate folding rates, unfolding rates, remote homology, and drug development, providing potential future advancements in constructing efficient computational models in structural biology.
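The contact maps mentioned in this abstract can be illustrated with a minimal sketch. It assumes one representative point per residue (for example a Cα atom) and an 8 Å distance cutoff; the function name, the cutoff, and the toy coordinates are illustrative assumptions, not values taken from the review.

```python
import math

def contact_map(coords, cutoff=8.0, min_separation=1):
    """Return a symmetric 0/1 matrix marking residue pairs within `cutoff` angstroms.

    coords: list of (x, y, z) tuples, one per residue (assumed representation).
    min_separation: smallest sequence separation |i - j| that is considered.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = 1
    return cmap

# Hypothetical toy chain: four residues on a straight line, 3.8 angstroms apart.
toy_chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
toy_map = contact_map(toy_chain)
```

In this toy chain, residues up to two positions apart fall within the cutoff, while the two chain ends (11.4 Å apart) do not.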
Affiliation(s)
- Balasubramanian Harihar
  - Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, 620024, India
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India
- Konda Mani Saravanan
  - Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, 620024, India
  - Department of Biotechnology, Bharath Institute of Higher Education and Research, Chennai, Tamil Nadu, 600073, India
- Michael M Gromiha
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India
- Samuel Selvaraj
  - Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, 620024, India
2
PFDB: A standardized protein folding database with temperature correction. Sci Rep 2019; 9:1588. [PMID: 30733462 PMCID: PMC6367381 DOI: 10.1038/s41598-018-36992-y] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 02/01/2018] [Accepted: 11/22/2018] [Indexed: 11/23/2022]
Abstract
We constructed a standardized protein folding kinetics database (PFDB) in which the logarithmic rate constants of all listed proteins are calculated at the standard temperature (25 °C). A temperature correction based on the Eyring–Kramers equation was introduced for proteins whose folding kinetics were originally measured at temperatures other than 25 °C. We verified the temperature correction by comparing the logarithmic rate constants predicted and experimentally observed at 25 °C for 14 different proteins, and the results demonstrated improvement of the quality of the database. PFDB consists of 141 (89 two-state and 52 non-two-state) single-domain globular proteins, which has the largest number among the currently available databases of protein folding kinetics. PFDB is thus intended to be used as a standard for developing and testing future predictive and theoretical studies of protein folding. PFDB can be accessed from the following link: http://lee.kias.re.kr/~bala/PFDB.
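The idea of correcting a measured rate constant to 25 °C can be sketched with the plain Eyring relation, k(T) = (k_B T / h) exp(ΔS‡/R) exp(-ΔH‡/(RT)). The sketch below assumes that the activation enthalpy ΔH‡ is known and independent of temperature, which gives ln k(T2) = ln k(T1) + ln(T2/T1) - (ΔH‡/R)(1/T2 - 1/T1). This is an illustrative simplification and not necessarily the procedure used to build PFDB; the function name and the example value of ΔH‡ are hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol K)

def ln_k_at_reference(ln_k_obs, temp_obs_c, delta_h_act_kj, temp_ref_c=25.0):
    """Extrapolate ln k from the measurement temperature to a reference temperature.

    Assumes Eyring behaviour with a temperature-independent activation
    enthalpy `delta_h_act_kj` (kJ/mol); this is a simplifying assumption.
    """
    t_obs = temp_obs_c + 273.15
    t_ref = temp_ref_c + 273.15
    return (ln_k_obs
            + math.log(t_ref / t_obs)
            - (delta_h_act_kj / R_KJ) * (1.0 / t_ref - 1.0 / t_obs))

# Hypothetical example: ln k = 2.0 measured at 10 degrees C, assumed barrier of 50 kJ/mol.
corrected = ln_k_at_reference(2.0, 10.0, 50.0)
```

With a positive activation enthalpy, a rate measured below 25 °C is corrected upward, and a rate measured at 25 °C is returned unchanged.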
3
Pancsa R, Raimondi D, Cilia E, Vranken WF. Early Folding Events, Local Interactions, and Conservation of Protein Backbone Rigidity. Biophys J 2017; 110:572-583. [PMID: 26840723 DOI: 10.1016/j.bpj.2015.12.028] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 08/27/2015] [Revised: 12/21/2015] [Accepted: 12/29/2015] [Indexed: 01/20/2023]
Abstract
Protein folding is in its early stages largely determined by the protein sequence and complex local interactions between amino acids, resulting in lower energy conformations that provide the context for further folding into the native state. We compiled a comprehensive data set of early folding residues based on pulsed labeling hydrogen deuterium exchange experiments. These early folding residues have corresponding higher backbone rigidity as predicted by DynaMine from sequence, an effect also present when accounting for the secondary structures in the folded protein. We then show that the amino acids involved in early folding events are not more conserved than others, but rather, early folding fragments and the secondary structure elements they are part of show a clear trend toward conserving a rigid backbone. We therefore propose that backbone rigidity is a fundamental physical feature conserved by proteins that can provide important insights into their folding mechanisms and stability.
Affiliation(s)
- Rita Pancsa
  - Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- Daniele Raimondi
  - Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- Elisa Cilia
  - Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- Wim F Vranken
  - Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
4
Are protein hubs faster folders? Exploration based on Escherichia coli proteome. Amino Acids 2016; 48:2747-2753. [DOI: 10.1007/s00726-016-2309-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2016] [Accepted: 08/05/2016] [Indexed: 10/21/2022]
5
Huang JT, Wang T, Huang SR, Li X. Prediction of protein folding rates from simplified secondary structure alphabet. J Theor Biol 2015; 383:1-6. [PMID: 26247139 DOI: 10.1016/j.jtbi.2015.07.024] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 12/30/2014] [Revised: 06/20/2015] [Accepted: 07/23/2015] [Indexed: 10/23/2022]
Abstract
Protein folding is a very complicated and highly cooperative dynamic process. However, the folding kinetics is likely to depend more on a few key structural features. Here we find that secondary structures can determine folding rates of only large, multi-state folding proteins and fail to predict those of small, two-state proteins. The importance of secondary structures for protein folding is ordered as: extended β strand > α helix > bend > turn > undefined secondary structure > 3₁₀ helix > isolated β strand > π helix. Only the first three secondary structures, extended β strand, α helix and bend, can achieve a good correlation with folding rates. This suggests that the rate-limiting step of protein folding would depend upon the formation of regular secondary structures and the buckling of the chain. The reduced secondary structure alphabet provides a simplified description for machine learning applications in protein design.
Affiliation(s)
- Jitao T Huang
  - Department of Chemistry and National Laboratory of Elemento-Organic Chemistry, Nankai University, Tianjin 300071, China
- Titi Wang
  - Department of Chemistry and National Laboratory of Elemento-Organic Chemistry, Nankai University, Tianjin 300071, China
- Shanran R Huang
  - Department of Chemistry and National Laboratory of Elemento-Organic Chemistry, Nankai University, Tianjin 300071, China
- Xin Li
  - Department of Chemistry and National Laboratory of Elemento-Organic Chemistry, Nankai University, Tianjin 300071, China
6
Huang JT, Wang T, Huang SR, Li X. Reduced alphabet for protein folding prediction. Proteins 2015; 83:631-9. [PMID: 25641420 DOI: 10.1002/prot.24762] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 09/15/2014] [Revised: 11/07/2014] [Accepted: 12/21/2014] [Indexed: 01/17/2023]
Abstract
What are the key building blocks that would have been needed to construct complex protein folds? This is an important issue for understanding the protein folding mechanism and guiding de novo protein design. Twenty naturally occurring amino acids and eight secondary structures constitute a 28-letter alphabet that determines folding kinetics and mechanism. Here we predict folding kinetic rates of proteins from many reduced alphabets. We find that a reduced alphabet of 10 letters achieves good correlation with folding rates, close to the one achieved by the full 28-letter alphabet. Many other reduced alphabets are not significantly correlated with folding rates. The finding suggests that not all amino acids and secondary structures are equally important for protein folding. The foldable sequence of a protein could be designed using at least 10 folding units, which can either promote or inhibit protein folding. Reducing alphabet cardinality without losing key folding kinetic information opens the door to potentially faster machine learning and data mining applications in protein structure prediction, sequence alignment and protein design.
Affiliation(s)
- Jitao T Huang
  - Department of Chemistry and National Laboratory of Elemento-Organic Chemistry, Nankai University, Tianjin, 300071, People's Republic of China
7
Xiong H, Yang Y, Hu XP, He YM, Ma BG. Sequence determinants of prokaryotic gene expression level under heat stress. Gene 2014; 551:92-102. [PMID: 25168890 DOI: 10.1016/j.gene.2014.08.049] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/19/2014] [Accepted: 08/25/2014] [Indexed: 10/24/2022]
Abstract
Prokaryotic gene expression is environment-dependent, and temperature plays an important role in shaping the gene expression profile. Revealing the regulation mechanisms of gene expression pertaining to temperature has attracted tremendous effort in recent years, particularly owing to the transcriptome and proteome data produced by high-throughput techniques. However, most previous work concentrated on characterizing the gene expression profile of individual organisms, and little effort has been made to disclose the commonality among organisms, especially for gene sequence features. In this report, we collected the transcriptome and proteome data measured under heat stress conditions from recently published literature and studied the sequence determinants for the expression level of heat-responsive genes on multiple layers. Our results showed that common and consistent patterns of sequence features do exist among organisms for the differentially expressed genes under heat stress. Some features are attributed to the requirement of thermostability, while others are dominated by gene function. The revealed sequence determinants of bacterial gene expression level under heat stress complement the knowledge about the regulation factors of prokaryotic gene expression responding to changes in environmental conditions. Furthermore, comparisons to thermophilic adaptation have been performed to reveal the similarities and dissimilarities of the sequence determinants for the response to heat stress and for the adaptation to high habitat temperature, which elucidates the complex landscape of gene expression related to the same physical factor of temperature.
Affiliation(s)
- Heng Xiong
  - Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Life Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Yi Yang
  - Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Life Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Xiao-Pan Hu
  - Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Life Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Yi-Ming He
  - Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Life Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
- Bin-Guang Ma
  - Agricultural Bioinformatics Key Laboratory of Hubei Province, College of Life Science and Technology, Huazhong Agricultural University, Wuhan 430070, China
8
Wagaman AS, Coburn A, Brand-Thomas I, Dash B, Jaswal SS. A comprehensive database of verified experimental data on protein folding kinetics. Protein Sci 2014; 23:1808-12. [PMID: 25229122 DOI: 10.1002/pro.2551] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 06/24/2014] [Accepted: 09/15/2014] [Indexed: 11/11/2022]
Abstract
Insights into protein folding rely increasingly on the synergy between experimental and theoretical approaches. Developing successful computational models requires access to experimental data of sufficient quantity and high quality. We compiled folding rate constants for what initially appeared to be 184 proteins from 15 published collections/web databases. To generate the highest confidence in the dataset, we verified the reported lnkf value and exact experimental construct and conditions from the original experimental report(s). The resulting comprehensive database of 126 verified entries, ACPro, will serve as a freely accessible resource (https://www.ats.amherst.edu/protein/) for the protein folding community to enable confident testing of predictive models. In addition, we provide a streamlined submission form for researchers to add new folding kinetics results, requiring specification of all the relevant experimental information according to the standards proposed in 2005 by the protein folding consortium organized by Plaxco. As the number and diversity of proteins whose folding kinetics are studied expands, our curated database will enable efficient and confident incorporation of new experimental results into a standardized collection. This database will support a more robust symbiosis between experiment and theory, leading ultimately to more rapid and accurate insights into protein folding, stability, and dynamics.
Affiliation(s)
- Amy S Wagaman
  - Department of Mathematics and Statistics, Amherst College, Amherst, Massachusetts
9
Quad-PRE: a hybrid method to predict protein quaternary structure attributes. Comput Math Methods Med 2014; 2014:715494. [PMID: 24963340 PMCID: PMC4052169 DOI: 10.1155/2014/715494] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/27/2014] [Revised: 04/24/2014] [Accepted: 04/27/2014] [Indexed: 11/17/2022]
Abstract
Protein quaternary structure is very important to biological processes. Predicting its attributes is an essential task in computational biology for the advancement of proteomics. However, existing methods did not consider enough amino acid properties. To this end, we proposed a hybrid method, Quad-PRE, to predict protein quaternary structure attributes using the properties of amino acids, predicted secondary structure, predicted relative solvent accessibility, and position-specific scoring matrix profiles and motifs. Empirical evaluation on an independent dataset shows that Quad-PRE achieved a higher overall accuracy of 81.7%, with especially high accuracies of 92.8%, 93.3%, and 90.6% for discriminating trimers, hexamers, and octamers, respectively. Our model also reveals that all six feature sets are important to the prediction, and that a hybrid method is the optimal strategy at present. The results indicate that the proposed method can classify protein quaternary structure attributes effectively.
10
Huang JT, Huang W, Huang SR, Li X. How the folding rates of two- and multistate proteins depend on the amino acid properties. Proteins 2014; 82:2375-82. [DOI: 10.1002/prot.24599] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/14/2014] [Revised: 04/27/2014] [Accepted: 05/05/2014] [Indexed: 01/05/2023]
Affiliation(s)
- Jitao T. Huang
  - Department of Chemistry and State Key Laboratory of EOC; College of Chemistry, Nankai University; Tianjin 300071 China
- Wei Huang
  - Department of Chemistry and State Key Laboratory of EOC; College of Chemistry, Nankai University; Tianjin 300071 China
- Shanran R. Huang
  - Department of Chemistry and State Key Laboratory of EOC; College of Chemistry, Nankai University; Tianjin 300071 China
- Xin Li
  - Department of Chemistry and State Key Laboratory of EOC; College of Chemistry, Nankai University; Tianjin 300071 China
11
Wagaman AS, Jaswal SS. Capturing protein folding-relevant topology via absolute contact order variants. J Theor Comput Chem 2014. [DOI: 10.1142/s0219633614500059] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022]
Abstract
Absolute contact order is one of the simplest parameters used to predict protein folding rates. Many variants of contact order (CO) have been applied to highlight different aspects of contact neighborhoods and their relationship to folding. However, a systematic study of the influence of CO variants on correlation with folding rate has not been performed for a large combined set of multi- and two-state proteins. We explore different contact neighborhoods and resulting CO by varying the distance thresholds and weighting of sequence separation for heavy atom and residue-based counting methods for a set of 136 proteins diverse across folding and structural classes. We examine the changes in contact neighborhoods and compare correlations with our CO variants and the protein folding rates across our data set as well as by folding type and structural class. Different CO variants lead to the strongest correlations within each protein structural class. Our results demonstrate that backbone topology at a distance beyond where energetic interactions dominate is able to capture folding determinants, and suggest that more sensitive methods of characterizing contact relationships may improve ln kf prediction for diverse protein sets.
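For orientation, absolute contact order (ACO) is commonly defined as the mean sequence separation over all residue pairs in contact, and relative contact order (RCO) as ACO divided by chain length. The sketch below is a residue-based illustration with an adjustable distance threshold and an adjustable weighting of sequence separation, the two quantities the abstract reports varying. The function name, the 8 Å default, the one-point-per-residue representation, and the toy coordinates are assumptions and do not reproduce the authors' variants.

```python
import math

def contact_order(coords, cutoff=8.0, min_separation=1, weight=lambda sep: sep):
    """Return (ACO, RCO) for a chain given one (x, y, z) point per residue.

    cutoff: distance threshold in angstroms defining a contact (assumed value).
    weight: function applied to the sequence separation of each contact;
            the identity gives the usual contact order.
    """
    n = len(coords)
    total, n_contacts = 0.0, 0
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                total += weight(j - i)
                n_contacts += 1
    if n_contacts == 0:
        return 0.0, 0.0
    aco = total / n_contacts
    return aco, aco / n

# Hypothetical toy chain: four residues on a straight line, 3.8 angstroms apart.
toy_chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
aco, rco = contact_order(toy_chain)
```

Passing a different `cutoff` or a different `weight` function (for example `lambda sep: sep ** 2`) gives one simple way to explore variants of the kind the abstract describes.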
Affiliation(s)
- Amy S. Wagaman
  - Mathematics Department, Amherst College, P. O. Box 5000, Amherst, MA 01002, USA
- Sheila S. Jaswal
  - Chemistry Department and Program in Biochemistry and Biophysics, Amherst College, P. O. Box 5000, Amherst, MA 01002, USA
12
Das A, Sin BK, Mohazab AR, Plotkin SS. Unfolded protein ensembles, folding trajectories, and refolding rate prediction. J Chem Phys 2013; 139:121925. [DOI: 10.1063/1.4817215] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 12/27/2022]
13
Huang S, Huang JT. Inter-residue interaction is a determinant of protein folding kinetics. J Theor Biol 2013; 317:224-8. [DOI: 10.1016/j.jtbi.2012.10.003] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 06/27/2012] [Revised: 09/17/2012] [Accepted: 10/02/2012] [Indexed: 11/30/2022]
14
Huang JT, Xing DJ, Huang W. Choice of synonymous codons associated with protein folding. Proteins 2012; 80:2056-62. [DOI: 10.1002/prot.24096] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 02/14/2012] [Revised: 03/29/2012] [Accepted: 04/05/2012] [Indexed: 11/11/2022]
15
Real value prediction of protein folding rate change upon point mutation. J Comput Aided Mol Des 2012; 26:339-47. [DOI: 10.1007/s10822-012-9560-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 02/21/2011] [Accepted: 03/02/2012] [Indexed: 10/28/2022]
16
Buck PM, Kumar S, Wang X, Agrawal NJ, Trout BL, Singh SK. Computational methods to predict therapeutic protein aggregation. Methods Mol Biol 2012; 899:425-451. [PMID: 22735968 DOI: 10.1007/978-1-61779-921-1_26] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.0] [Indexed: 06/01/2023]
Abstract
Protein based biotherapeutics have emerged as a successful class of pharmaceuticals. However, these macromolecules endure a variety of physicochemical degradations during manufacturing, shipping, and storage, which may adversely impact the drug product quality. Of these degradations, the irreversible self-association of therapeutic proteins to form aggregates is a major challenge in the formulation of these molecules. Tools to predict and mitigate protein aggregation are, therefore, of great interest to biopharmaceutical research and development. In this chapter, a number of such computational tools developed to understand and predict the various steps involved in protein aggregation are described. These tools can be grouped into three general classes: unfolding kinetics and native state thermal stability, colloidal stability, and sequence/structure based aggregation liabilities. Chapter sections introduce each class by discussing how these predictive tools provide insight into the molecular events leading to protein aggregation. The computational methods are then explained in detail along with their advantages and limitations.
Affiliation(s)
- Patrick M Buck
  - Biotherapeutics Pharmaceutical Research and Development, Pfizer, Inc, St. Louis, MO, USA
17
Huang JT, Xing DJ, Huang W. Relationship between protein folding kinetics and amino acid properties. Amino Acids 2011; 43:567-72. [DOI: 10.1007/s00726-011-1189-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 06/19/2011] [Accepted: 11/29/2011] [Indexed: 10/14/2022]
18
Zou T, Ozkan SB. Local and non-local native topologies reveal the underlying folding landscape of proteins. Phys Biol 2011; 8:066011. [DOI: 10.1088/1478-3975/8/6/066011] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Indexed: 12/22/2022]
19
Puorger C, Vetsch M, Wider G, Glockshuber R. Structure, Folding and Stability of FimA, the Main Structural Subunit of Type 1 Pili from Uropathogenic Escherichia coli Strains. J Mol Biol 2011; 412:520-35. [DOI: 10.1016/j.jmb.2011.07.044] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Received: 05/04/2011] [Revised: 07/16/2011] [Accepted: 07/20/2011] [Indexed: 11/26/2022]
20
Guo JX, Rao NN, Liu GX, Li J, Wang YH. Predicting Protein Folding Rate From Amino Acid Sequence. Prog Biochem Biophys 2011. [DOI: 10.3724/sp.j.1206.2010.00380] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/25/2022]
21
Harihar B, Selvaraj S. Application of long-range order to predict unfolding rates of two-state proteins. Proteins 2010; 79:880-7. [DOI: 10.1002/prot.22925] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 07/23/2010] [Revised: 10/07/2010] [Accepted: 10/24/2010] [Indexed: 01/09/2023]
22
Zhang H, Zhang T, Gao J, Ruan J, Shen S, Kurgan L. Determination of protein folding kinetic types using sequence and predicted secondary structure and solvent accessibility. Amino Acids 2010; 42:271-83. [DOI: 10.1007/s00726-010-0805-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 02/17/2010] [Accepted: 11/01/2010] [Indexed: 10/18/2022]
23
Gao J, Zhang T, Zhang H, Shen S, Ruan J, Kurgan L. Accurate prediction of protein folding rates from sequence and sequence-derived residue flexibility and solvent accessibility. Proteins 2010; 78:2114-30. [PMID: 20455267 DOI: 10.1002/prot.22727] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 01/02/2023]
Abstract
Protein folding rates vary by several orders of magnitude and they depend on the topology of the fold and the size and composition of the sequence. Although recent works show that the rates can be predicted from the sequence, allowing for high-throughput annotations, they consider only the sequence and its predicted secondary structure. We propose a novel sequence-based predictor, PFR-AF, which utilizes solvent accessibility and residue flexibility predicted from the sequence, to improve predictions and provide insights into the folding process. The predictor includes three linear regressions for proteins with two-state, multistate, and unknown (mixed-state) folding kinetics. PFR-AF on average outperforms current methods when tested on three datasets. The proposed approach provides high-quality predictions in the absence of similarity between the predicted and the training sequences. The PFR-AF's predictions are characterized by high (between 0.71 and 0.95, depending on the dataset) correlation and the lowest (between 0.75 and 0.9) mean absolute errors with respect to the experimental rates, as measured using out-of-sample tests. Our models reveal that for the two-state chains inclusion of solvent-exposed Ala may accelerate the folding, while increased content of Ile may reduce the folding speed. We also demonstrate that increased flexibility of coils facilitates faster folding and that proteins with larger content of solvent-exposed strands may fold at a slower pace. The increased flexibility of the solvent-exposed residues is shown to elongate folding, which also holds, with a lower correlation, for buried residues. Two case studies are included to support our findings.
Affiliation(s)
- Jianzhao Gao
  - College of Mathematics and LPMC, Nankai University, Tianjin, People's Republic of China
24
Huang LT, Gromiha MM. First insight into the prediction of protein folding rate change upon point mutation. Bioinformatics 2010; 26:2121-7. [DOI: 10.1093/bioinformatics/btq350] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/13/2022]
25
Xi L, Li S, Liu H, Li J, Lei B, Yao X. Global and local prediction of protein folding rates based on sequence autocorrelation information. J Theor Biol 2010; 264:1159-68. [DOI: 10.1016/j.jtbi.2010.03.042] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 10/04/2009] [Revised: 03/28/2010] [Accepted: 03/29/2010] [Indexed: 11/24/2022]
26
Lin GN, Wang Z, Xu D, Cheng J. SeqRate: sequence-based protein folding type classification and rates prediction. BMC Bioinformatics 2010; 11 Suppl 3:S1. [PMID: 20438647 PMCID: PMC2863059 DOI: 10.1186/1471-2105-11-s3-s1] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Indexed: 11/18/2022]
Abstract
BACKGROUND: Protein folding rate is an important property of a protein. Predicting protein folding rate is useful for understanding protein folding process and guiding protein design. Most previous methods of predicting protein folding rate require the tertiary structure of a protein as an input. And most methods do not distinguish the different kinetic nature (two-state folding or multi-state folding) of the proteins. Here we developed a method, SeqRate, to predict both protein folding kinetic type (two-state versus multi-state) and real-value folding rate using sequence length, amino acid composition, contact order, contact number, and secondary structure information predicted from only protein sequence with support vector machines. RESULTS: We systematically studied the contributions of individual features to folding rate prediction. On a standard benchmark dataset, the accuracy of folding kinetic type classification is 80%. The Pearson correlation coefficient and the mean absolute difference between predicted and experimental folding rates (s⁻¹) in the base-10 logarithmic scale are 0.81 and 0.79 for two-state protein folders, and 0.80 and 0.68 for three-state protein folders. SeqRate is the first sequence-based method for protein folding type classification and its accuracy of fold rate prediction is improved over previous sequence-based methods. Its performance can be further enhanced with additional information, such as structure-based geometric contacts, as inputs. CONCLUSIONS: Both the web server and software of predicting folding rate are publicly available at http://casp.rnet.missouri.edu/fold_rate/index.html.
Affiliation(s)
- Guan Ning Lin
  - Informatics Institute, University of Missouri, Columbia, Missouri, 65211, USA
- Zheng Wang
  - Department of Computer Science, University of Missouri, Columbia, Missouri, 65211, USA
- Dong Xu
  - Informatics Institute, University of Missouri, Columbia, Missouri, 65211, USA
  - Department of Computer Science, University of Missouri, Columbia, Missouri, 65211, USA
- Jianlin Cheng
  - Informatics Institute, University of Missouri, Columbia, Missouri, 65211, USA
  - Department of Computer Science, University of Missouri, Columbia, Missouri, 65211, USA
27
Harihar B, Selvaraj S. Refinement of the long-range order parameter in predicting folding rates of two-state proteins. Biopolymers 2009; 91:928-35. [DOI: 10.1002/bip.21281] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/07/2022]
28
Abstract
Various topologies for representing 3D protein structures have been advanced for purposes ranging from prediction of folding rates to ab initio structure prediction. Examples include relative contact order, Delaunay tessellations, and backbone torsion angle distributions. Here, we introduce a new topology based on a novel means for operationalizing 3D proximities with respect to the underlying chain. The measure involves first interpreting a rank-based representation of the nearest neighbors of each residue as a permutation, then determining how perturbed this permutation is relative to an unfolded chain. We show that the resultant topology provides improved association with folding and unfolding rates determined for a set of two-state proteins under standardized conditions. Furthermore, unlike existing topologies, the proposed geometry exhibits fine scale structure with respect to sequence position along the chain, potentially providing insights into folding initiation and/or nucleation sites.
Affiliation(s)
- Mark R Segal
- Division of Biostatistics, University of California, San Francisco, California 94107, USA.
|
29
|
Gromiha MM. Multiple Contact Network Is a Key Determinant to Protein Folding Rates. J Chem Inf Model 2009; 49:1130-5. [DOI: 10.1021/ci800440x] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.1] [Indexed: 11/29/2022]
Affiliation(s)
- M. Michael Gromiha
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), AIST Tokyo Waterfront Bio-IT Research Building, 2-42 Aomi, Koto-ku, Tokyo 135-0064, Japan
|
30
|
Istomin AY, Jacobs DJ, Livesay DR. On the role of structural class of a protein with two-state folding kinetics in determining correlations between its size, topology, and folding rate. Protein Sci 2007; 16:2564-9. [PMID: 17962408 DOI: 10.1110/ps.073124507] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Indexed: 10/22/2022]
Abstract
The time it takes for proteins to fold into their native states varies over several orders of magnitude depending on their native-state topology, size, and amino acid composition. A number of previous studies found a strong correlation between logarithmic folding rates and contact order for proteins that fold with two-state kinetics, whereas no such correlation exists for three-state proteins. Conversely, strong correlations between folding rates and chain length occur within three-state proteins, but not in two-state proteins. Here, we demonstrate that chain lengths and folding rates of two-state proteins are uncorrelated only when all-alpha, all-beta, and mixed-class proteins are considered together, which is typically the case. However, when all-alpha and all-beta two-state proteins are considered separately, there is a significant linear correlation between folding rate and size. Moreover, the sets of data points for the all-alpha and all-beta classes define asymptotes of lower and upper limits on folding rates of mixed-class proteins. By analyzing the correlation of other topological parameters with folding rates of two-state proteins, we find that only the long-range order exhibits a correlation with folding rates that is uniform over all three classes. It is also the only descriptor to provide statistically significant correlations for each of the three structural classes. We give an interpretation of this observation in terms of Makarov and Plaxco's diffusion-based topomer-search model.
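For readers unfamiliar with the two descriptors compared in this abstract, the following sketch computes long-range order and relative contact order from C-alpha coordinates. The abstract does not give cutoffs, so the defaults below are assumptions: an 8 Å distance cutoff with sequence separation greater than 12 residues for long-range order, as commonly cited, and the same C-alpha cutoff for contact order, which is a simplification because contact order is usually defined on heavy-atom contacts. Function names are hypothetical.

```python
import math


def contacts(ca_coords, cutoff=8.0, min_sep=1):
    """Residue pairs (i, j), j - i >= min_sep, within `cutoff` angstroms."""
    n = len(ca_coords)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + min_sep, n)
        if math.dist(ca_coords[i], ca_coords[j]) <= cutoff
    ]


def long_range_order(ca_coords, cutoff=8.0, min_sep=13):
    """Long-range contacts (|i - j| > 12 by default) per residue."""
    return len(contacts(ca_coords, cutoff, min_sep)) / len(ca_coords)


def relative_contact_order(ca_coords, cutoff=8.0, min_sep=2):
    """Mean sequence separation of contacts divided by chain length.

    Simplified to C-alpha contacts (assumption); adjacent residues
    are excluded by default.
    """
    pairs = contacts(ca_coords, cutoff, min_sep)
    if not pairs:
        return 0.0
    total_sep = sum(j - i for i, j in pairs)
    return total_sep / (len(pairs) * len(ca_coords))
```

Long-range order counts only contacts between residues that are distant in sequence, whereas relative contact order averages the sequence separation over all contacts, so the two can rank the same set of structures differently.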
Affiliation(s)
- Andrei Y Istomin
- Department of Physics and Optical Science, University of North Carolina at Charlotte 28223, USA.
|
31
|
Huang JT, Cheng JP. Differentiation between two-state and multi-state folding proteins based on sequence. Proteins 2008; 72:44-9. [DOI: 10.1002/prot.21893] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/06/2022]
|
32
|
Abstract
The "protein folding problem" consists of three closely related puzzles: (a) What is the folding code? (b) What is the folding mechanism? (c) Can we predict the native structure of a protein from its amino acid sequence? Once regarded as a grand challenge, protein folding has seen great progress in recent years. Now, foldable proteins and nonbiological polymers are being designed routinely and moving toward successful applications. The structures of small proteins are now often well predicted by computer methods. And, there is now a testable explanation for how a protein can fold so quickly: A protein solves its large global optimization problem as a series of smaller local optimization problems, growing and assembling the native structure from peptide fragments, local structures first.
Affiliation(s)
- Ken A. Dill
- Department of Pharmaceutical Chemistry, University of California, San Francisco, California 94143
- Graduate Group in Biophysics, University of California, San Francisco, California 94143
- S. Banu Ozkan
- Department of Physics, Arizona State University, Tempe, Arizona 85287
- M. Scott Shell
- Department of Chemical Engineering, University of California, Santa Barbara, California 93106
- Thomas R. Weikl
- Max Planck Institute of Colloids and Interfaces, Department of Theory and Bio-Systems, 14424 Potsdam, Germany
|
33
|
Weikl TR. Loop-closure principles in protein folding. Arch Biochem Biophys 2008; 469:67-75. [PMID: 17662688 DOI: 10.1016/j.abb.2007.06.018] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 03/28/2007] [Revised: 06/20/2007] [Accepted: 06/22/2007] [Indexed: 10/23/2022]
Abstract
Simple theoretical concepts and models have been helpful for understanding the folding rates and routes of single-domain proteins. As reviewed in this article, a physical principle that appears to underlie these models is loop closure.
Affiliation(s)
- Thomas R Weikl
- Max Planck Institute of Colloids and Interfaces, Department of Theory and Bio-Systems, 14424 Potsdam, Germany.
|
34
|
Huang JT, Cheng JP. Prediction of folding transition-state position (βT) of small, two-state proteins from local secondary structure content. Proteins 2007; 68:218-22. [PMID: 17469192 DOI: 10.1002/prot.21411] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/08/2022]
Abstract
The folding kinetics of proteins are governed by the free energy and position of transition states. Unlike folding-rate prediction, however, attempts to predict the position of the folding transition state on the reaction pathway from protein structure have met with only limited success. Here, we find that the folding transition-state position is related to the secondary structure content of native two-state proteins. We present a simple method for predicting the transition-state position from their alpha-helix, turn, and polyproline secondary structure content. The method achieves 81% correlation with experiment over 24 small, two-state proteins, suggesting that local secondary structure content, especially alpha-helix content, is a determinant of the solvent accessibility of the transition-state ensemble and of the size of the folding nucleus.
Affiliation(s)
- Ji-Tao Huang
- College of Chemistry and State Key Laboratory of Elemento-Organic Chemistry, Nankai University, Tianjin 300071, China
|