1
Roche R, Bhattacharya S, Bhattacharya D. Hybridized distance- and contact-based hierarchical structure modeling for folding soluble and membrane proteins. PLoS Comput Biol 2021; 17:e1008753. [PMID: 33621244 PMCID: PMC7935296 DOI: 10.1371/journal.pcbi.1008753] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/04/2020] [Revised: 03/05/2021] [Accepted: 01/31/2021] [Indexed: 11/18/2022] Open
Abstract
Crystallography and NMR system (CNS) is currently a widely used method for fragment-free ab initio protein folding from inter-residue distance or contact maps. Despite its widespread use in protein structure prediction, CNS is a decade-old macromolecular structure determination system that was originally developed for solving macromolecular geometry from experimental restraints as opposed to predictive modeling driven by interaction map data. As such, the adaptation of the CNS experimental structure determination protocol for ab initio protein folding is intrinsically anomalous and may undermine the folding accuracy of computational protein structure prediction. In this paper, we propose a new CNS-free hierarchical structure modeling method called DConStruct for folding both soluble and membrane proteins driven by distance and contact information. Rigorous experimental validation shows that DConStruct attains much better reconstruction accuracy than CNS when tested with the same input contact map at varying contact thresholds. The hierarchical modeling with iterative self-correction employed in DConStruct scales at a much higher degree of folding accuracy than CNS with the increase in contact thresholds, ultimately approaching near-optimal reconstruction accuracy at higher-thresholded contact maps. The folding accuracy of DConStruct can be further improved by exploiting distance-based hybrid interaction maps at tri-level thresholding, as demonstrated by the better performance of our method in folding free modeling targets from the 12th and 13th rounds of the Critical Assessment of techniques for protein Structure Prediction (CASP) experiments compared to popular CNS- and fragment-based approaches and energy-minimization protocols, some of which even use much finer-grained distance maps than ours. Additional large-scale benchmarking shows that DConStruct can significantly improve the folding accuracy of membrane proteins compared to a CNS-based approach. These results collectively demonstrate the feasibility of greatly improving the accuracy of ab initio protein folding by optimally exploiting the information encoded in inter-residue interaction maps beyond what is possible by CNS.
Predicting the folded and functional 3-dimensional structure of a protein molecule from its amino acid sequence is of central importance to structural biology. Recently, promising advances have been made in ab initio protein folding due to the reasonably accurate estimation of inter-residue interaction maps at increasingly higher resolutions that range from binary contacts to finer-grained distances. Despite the progress in predicting the interaction maps, approaches for turning the residue-residue interactions projected in these maps into their precise spatial positioning heavily rely on a decade-old experimental structure determination protocol that is not suitable for predictive modeling. This paper presents a new hierarchical structure modeling method, DConStruct, which can better exploit the information encoded in the interaction maps at multiple granularities, from binary contact maps to distance-based hybrid maps at tri-level thresholding, for improved ab initio folding. Multiple large-scale benchmarking experiments show that our proposed method can substantially improve the folding accuracy for both soluble and membrane proteins compared to state-of-the-art approaches. DConStruct is licensed under the GNU General Public License v3 and freely available at https://github.com/Bhattacharya-Lab/DConStruct.
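The binary contact maps "at varying contact thresholds" and the multi-level maps mentioned in this entry can be illustrated with a toy example. The sketch below is not DConStruct code: the function names, the 8/13/18 Å cutoffs, and the encoding of the three-level map (a count of how many cutoffs a pair satisfies) are all assumptions chosen for illustration, and the input is an artificial straight chain rather than a real protein.

```python
import math

def distance_matrix(coords):
    """Pairwise Euclidean distances between 3D points."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def contact_map(dist, threshold):
    """Binary map: 1 where two different residues are closer than threshold."""
    n = len(dist)
    return [[1 if i != j and dist[i][j] < threshold else 0
             for j in range(n)] for i in range(n)]

def tri_level_map(dist, thresholds=(8.0, 13.0, 18.0)):
    """Hypothetical three-level map: for each pair, count how many of the
    (assumed) cutoffs the distance falls under, giving a value in 0..3."""
    n = len(dist)
    return [[0 if i == j else sum(dist[i][j] < t for t in thresholds)
             for j in range(n)] for i in range(n)]

# Toy input: six points on a straight line, 3.8 angstroms apart.
coords = [(3.8 * i, 0.0, 0.0) for i in range(6)]
dist = distance_matrix(coords)

for row in tri_level_map(dist):
    print(row)
```

Raising the threshold in `contact_map` marks more pairs as contacts, which is the sense in which a higher-thresholded map carries more geometric information about the chain.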
Affiliation(s)
- Rahmatullah Roche
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, Alabama, United States of America
- Sutanu Bhattacharya
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, Alabama, United States of America
- Debswapna Bhattacharya
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, Alabama, United States of America
  - Department of Biological Sciences, Auburn University, Auburn, Alabama, United States of America
2
McGehee AJ, Bhattacharya S, Roche R, Bhattacharya D. PolyFold: An interactive visual simulator for distance-based protein folding. PLoS One 2020; 15:e0243331. [PMID: 33270805 PMCID: PMC7714222 DOI: 10.1371/journal.pone.0243331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2020] [Accepted: 11/18/2020] [Indexed: 11/18/2022] Open
Abstract
Recent advances in distance-based protein folding have led to a paradigm shift in protein structure prediction. Through sufficiently precise estimation of the inter-residue distance matrix for a protein sequence, it is now feasible to predict the correct folds for new proteins much more accurately than ever before. Despite the exciting progress, a dedicated visualization system that can dynamically capture the distance-based folding process is still lacking. Most molecular visualizers typically provide only a static view of a folded protein conformation, but do not capture the folding process. Even among the selected few graphical interfaces that do adopt a dynamic perspective, none of them are distance-based. Here we present PolyFold, an interactive visual simulator for dynamically capturing the distance-based protein folding process through real-time rendering of a distance matrix and its compatible spatial conformation as it folds in an intuitive and easy-to-use interface. PolyFold integrates highly convergent stochastic optimization algorithms with on-demand customizations and interactive manipulations to maximally satisfy the geometric constraints imposed by a distance matrix. PolyFold is capable of simulating the complex process of protein folding even on modest personal computers, thus making it accessible to the general public for fostering citizen science. Open source code of PolyFold is freely available for download at https://github.com/Bhattacharya-Lab/PolyFold. It is implemented in cross-platform Java and binary executables are available for macOS, Linux, and Windows.
Affiliation(s)
- Andrew J. McGehee
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States of America
- Sutanu Bhattacharya
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States of America
- Rahmatullah Roche
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States of America
- Debswapna Bhattacharya
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, United States of America
  - Department of Biological Sciences, Auburn University, Auburn, AL, United States of America
3
Kurczynska M, Kotulska M. Automated method to differentiate between native and mirror protein models obtained from contact maps. PLoS One 2018; 13:e0196993. [PMID: 29787567 PMCID: PMC5963800 DOI: 10.1371/journal.pone.0196993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2017] [Accepted: 04/24/2018] [Indexed: 11/23/2022] Open
Abstract
Mirror protein structures are often considered as artifacts in modeling protein structures. However, they may soon become a new branch of biochemistry. Moreover, methods of protein structure reconstruction, based on their residue-residue contact maps, need methodology to differentiate between models of native and mirror orientation, especially regarding the reconstructed backbones. We analyzed 130 500 structural protein models obtained from contact maps of 1 305 SCOP domains belonging to all 7 structural classes. On average, the same numbers of native and mirror models were obtained among 100 models generated for each domain. Since their structural features are often not sufficient for differentiating between the two types of model orientations, we proposed to apply various energy terms (ETs) from PyRosetta to separate native and mirror models. To automate the procedure for differentiating these models, the k-means clustering algorithm was applied. Using total energy did not yield appropriate clusters: the accuracy of the clustering for class A (all helices) was no more than 0.52. Therefore, we tested a series of different k-means clusterings based on various combinations of ETs. Finally, applying the two most differentiating ETs for each class yielded satisfactory results. To unify the method for differentiating between native and mirror models, independent of their structural class, the two best ETs for each class were considered. Finally, the k-means clustering algorithm used three common ETs: probability of amino acid assuming certain values of dihedral angles Φ and Ψ, Ramachandran preferences and Coulomb interactions. The accuracies of clustering with these ETs were in the range between 0.68 and 0.76, with sensitivity and selectivity in the range between 0.68 and 0.87, depending on the structural class.
The method can be applied to all fully-automated tools for protein structure reconstruction based on contact maps, especially those analyzing big sets of models.
Affiliation(s)
- Monika Kurczynska
  - Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland
- Malgorzata Kotulska
  - Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland
4
Kurczynska M, Kania E, Konopka BM, Kotulska M. Applying PyRosetta molecular energies to separate properly oriented protein models from mirror models, obtained from contact maps. J Mol Model 2016; 22:111. [PMID: 27107578 PMCID: PMC4842210 DOI: 10.1007/s00894-016-2975-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/12/2015] [Accepted: 04/05/2016] [Indexed: 11/30/2022]
Abstract
Reconstructing protein structure based on contact maps leads to two types of models: properly oriented models and mirror models. This is due to the fact that contact maps do not include information on protein chirality. Therefore, both types of model orientations share the same contact map and are geometrically allowed. In this work, we verified the hypothesis that some of the energy terms calculated by PyRosetta could be useful to distinguish between properly oriented and mirror models. We studied 440 models of all-alpha protein domains reconstructed manually from their contact maps, where 50 % of the models were properly oriented and 50 % had mirror orientation. We showed that dihedral angles and energy terms, based on the probability of specific geometrical arrangement of the residues, differed significantly for properly oriented and mirror models.
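The statement that a properly oriented model and its mirror image share the same contact map can be checked directly on a small example. The sketch below is illustrative only and is not taken from the paper: it assumes idealized alpha-helix geometry for the test coordinates (2.3 Å radius, 100° turn and 1.5 Å rise per residue) and an 8 Å contact cutoff, and the helper names are hypothetical. It also shows that a signed volume, unlike the contact map, does change sign under reflection.

```python
import math

def contact_map(coords, threshold=8.0):
    """Binary contact map; the 8 angstrom cutoff is an assumed convention."""
    n = len(coords)
    return [[int(i != j and math.dist(coords[i], coords[j]) < threshold)
             for j in range(n)] for i in range(n)]

def mirror(coords):
    """Reflect every point through the yz-plane (x -> -x)."""
    return [(-x, y, z) for x, y, z in coords]

def signed_volume(a, b, c, d):
    """Scalar triple product of (b-a), (c-a), (d-a); its sign encodes handedness."""
    u = [b[k] - a[k] for k in range(3)]
    v = [c[k] - a[k] for k in range(3)]
    w = [d[k] - a[k] for k in range(3)]
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))

# Idealized helix trace: 12 points on a helical curve.
helix = [(2.3 * math.cos(math.radians(100 * i)),
          2.3 * math.sin(math.radians(100 * i)),
          1.5 * i) for i in range(12)]
mirrored = mirror(helix)

print("same contact map:", contact_map(helix) == contact_map(mirrored))
print("signed volumes:", signed_volume(*helix[:4]), signed_volume(*mirrored[:4]))
```

Because reflection preserves every pairwise distance, no threshold on distances can separate the two orientations; this is why the entry turns to dihedral angles and energy terms, which are sensitive to handedness.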
Affiliation(s)
- Monika Kurczynska
  - Faculty of Fundamental Problems of Technology, Department of Biomedical Engineering, Wroclaw University of Science and Technology, Wybrzeze Wyspianskiego 27, 50-370, Wroclaw, Poland
- Ewa Kania
  - Faculty of Fundamental Problems of Technology, Department of Biomedical Engineering, Wroclaw University of Science and Technology, Wybrzeze Wyspianskiego 27, 50-370, Wroclaw, Poland
  - Biotechnology Center, Dresden University of Technology, Tatzberg 47/49, 01307, Dresden, Germany
- Bogumil M Konopka
  - Faculty of Fundamental Problems of Technology, Department of Biomedical Engineering, Wroclaw University of Science and Technology, Wybrzeze Wyspianskiego 27, 50-370, Wroclaw, Poland
- Malgorzata Kotulska
  - Faculty of Fundamental Problems of Technology, Department of Biomedical Engineering, Wroclaw University of Science and Technology, Wybrzeze Wyspianskiego 27, 50-370, Wroclaw, Poland
5
Bywater RP. Comparison of Algorithms for Prediction of Protein Structural Features from Evolutionary Data. PLoS One 2016; 11:e0150769. [PMID: 26963911 PMCID: PMC4786192 DOI: 10.1371/journal.pone.0150769] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/26/2015] [Accepted: 02/17/2016] [Indexed: 11/18/2022] Open
Abstract
Proteins have many functions and predicting these is still one of the major challenges in theoretical biophysics and bioinformatics. Foremost amongst these functions is the need to fold correctly thereby allowing the other genetically dictated tasks that the protein has to carry out to proceed efficiently. In this work, some earlier algorithms for predicting protein domain folds are revisited and they are compared with more recently developed methods. In dealing with intractable problems such as fold prediction, when different algorithms show convergence onto the same result there is every reason to take all algorithms into account such that a consensus result can be arrived at. In this work it is shown that the application of different algorithms in protein structure prediction leads to results that do not converge as such but rather they collude in a striking and useful way that has never been considered before.
6
Bywater RP. Prediction of protein structural features from sequence data based on Shannon entropy and Kolmogorov complexity. PLoS One 2015; 10:e0119306. [PMID: 25856073 PMCID: PMC4391790 DOI: 10.1371/journal.pone.0119306] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 12/22/2013] [Accepted: 01/29/2015] [Indexed: 11/21/2022] Open
Abstract
While the genome for a given organism stores the information necessary for the organism to function and flourish it is the proteins that are encoded by the genome that perhaps more than anything else characterize the phenotype for that organism. It is therefore not surprising that one of the many approaches to understanding and predicting protein folding and properties has come from genomics and more specifically from multiple sequence alignments. In this work I explore ways in which data derived from sequence alignment data can be used to investigate in a predictive way three different aspects of protein structure: secondary structures, inter-residue contacts and the dynamics of switching between different states of the protein. In particular the use of Kolmogorov complexity has identified a novel pathway towards achieving these goals.
7
Abstract
Measuring protein structural similarity attempts to establish a relationship of equivalence between polymer structures based on their conformations. In several recent studies, researchers have explored protein-graph remodeling instead of looking for a minimum superimposition of pairwise proteins. When graphs are used to represent structured objects, the problem of measuring object similarity becomes one of computing the similarity between graphs. Graph theory provides an alternative perspective as well as efficiency. Once a protein graph has been created, its structural stability must be verified. Therefore, a criterion is needed to determine if a protein graph can be used for structural comparison. In this paper, we propose a measurement for protein graph remodeling based on graph entropy. We extend the concept of graph entropy to determine whether a graph is suitable for representing a protein. The experimental results suggest that when applied, graph entropy helps a conformational on protein graph modeling. Furthermore, it indirectly contributes to protein structural comparison if a protein graph is solid.
8
Jain P, Hirst JD. Exploring protein structural dissimilarity to facilitate structure classification. BMC Structural Biology 2009; 9:60. [PMID: 19765314 PMCID: PMC2754988 DOI: 10.1186/1472-6807-9-60] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2009] [Accepted: 09/19/2009] [Indexed: 12/04/2022]
Abstract
BACKGROUND Classification of newly resolved protein structures is important in understanding their architectural, evolutionary and functional relatedness to known protein structures. Among various efforts to improve the database of Structural Classification of Proteins (SCOP), automation has received particular attention. Herein, we predict the deepest SCOP structural level that an unclassified protein shares with classified proteins with an equal number of secondary structure elements (SSEs). RESULTS We compute a coefficient of dissimilarity (Omega) between proteins, based on structural and sequence-based descriptors characterising the respective constituent SSEs. For a set of 1,661 pairs of proteins with sequence identity up to 35%, the performance of Omega in predicting shared Class, Fold and Super-family levels is comparable to that of DaliLite Z score and shows a greater than four-fold increase in the true positive rate (TPR) for proteins sharing the Family level. On a larger set of 600 domains representing 200 families, the performance of Z score improves in predicting a shared Family, but still only achieves about half of the TPR of Omega. The TPR for structures sharing a Super-family is lower than in the first dataset, but Omega performs slightly better than Z score. Overall, the sensitivity of Omega in predicting common Fold level is higher than that of the DaliLite Z score. CONCLUSION Classification to a deeper level in the hierarchy is specific and difficult. So the efficiency of Omega may be attractive to the curators and the end-users of SCOP. We suggest Omega may be a better measure for structure classification than the DaliLite Z score, with the caveat that currently we are restricted to comparing structures with equal number of SSEs.
Affiliation(s)
- Pooja Jain
  - School of Chemistry, The University of Nottingham, University Park, Nottingham, NG7 2RD, UK
- Jonathan D Hirst
  - School of Chemistry, The University of Nottingham, University Park, Nottingham, NG7 2RD, UK
9
Kaján L, Rychlewski L. Evaluation of 3D-Jury on CASP7 models. BMC Bioinformatics 2007; 8:304. [PMID: 17711571 PMCID: PMC2040163 DOI: 10.1186/1471-2105-8-304] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Received: 04/19/2007] [Accepted: 08/21/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND 3D-Jury, the structure prediction consensus method publicly available in the Meta Server http://meta.bioinfo.pl/, was evaluated using models gathered in the 7th round of the Critical Assessment of Techniques for Protein Structure Prediction (CASP7). 3D-Jury is an automated expert process that generates protein structure meta-predictions from sets of models obtained from partner servers. RESULTS The performance of 3D-Jury was analysed for three aspects. First, we examined the correlation between the 3D-Jury score and a model quality measure: the number of correctly predicted residues. The 3D-Jury score was shown to correlate significantly with the number of correctly predicted residues; the correlation is good enough to be used for prediction. 3D-Jury was also found to improve upon the competing servers' choice of the best structure model in most cases. The value of the 3D-Jury score as a generic reliability measure was also examined. We found that the 3D-Jury score separates bad models from good models better than the reliability score of the original server in 27 cases and falls short of it in only 5 cases out of a total of 38. We report the release of a new Meta Server feature: instant 3D-Jury scoring of uploaded user models. CONCLUSION The 3D-Jury score continues to be a good indicator of structural model quality. It also provides a generic reliability score, especially important for models that were not assigned such by the original server. Individual structure modellers can also benefit from the 3D-Jury scoring system by testing their models in the new instant scoring feature http://meta.bioinfo.pl/compare_your_model_example.pl available in the Meta Server.
Affiliation(s)
- László Kaján
  - BioInfoBank Institute, ul. Limanowskiego 24 A, 60-744 Poznań, Poland
- Leszek Rychlewski
  - BioInfoBank Institute, ul. Limanowskiego 24 A, 60-744 Poznań, Poland
  - Bioinformatics Unit, Department of Physics, Adam Mickiewicz University, ul. Umultowska 85, 61-614 Poznań, Poland
10
Chen Y, Ding F, Dokholyan NV. Fidelity of the protein structure reconstruction from inter-residue proximity constraints. J Phys Chem B 2007; 111:7432-8. [PMID: 17542631 DOI: 10.1021/jp068963t] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022]
Abstract
Inter-residue proximity constraints obtained in such experiments as cross-linking/mass spectrometry are important sources of information for protein structure determination. A central question in structure determination using these constraints is, What is the minimal number of inter-residue constraints needed to determine the fold of a protein? It is also unknown how the different structural aspects of constraints differentiate their ability in determining the native fold and whether there is a rational strategy for selecting constraints that feature higher fidelity in structure determination. To shed light on these questions, we study the fidelity of protein fold determination using theoretical inter-residue proximity constraints derived from protein native structures and the effect of various subsets of such constraints on fold determination. We show that approximately 70% randomly selected constraints are sufficient for determining the fold of a domain (with an average root-mean-square deviation of ≤3.4 Å from their native structures). We find that random constraint selection often outperforms the rational strategy that predominantly favors the constraints representing global structural features. To uncover a strategy for constraint selection for the optimal structure determination, we study the role of the topological properties of these constraints. Interestingly, we do not observe any correlation between various simple topological properties of the selected constraints, emphasizing different global and local structural features, and the performance of these constraints, suggesting that accurate protein structure determination relies on a composite of global and local structural information.
Affiliation(s)
- Yiwen Chen
  - Department of Biochemistry and Biophysics, School of Medicine, University of North Carolina, Chapel Hill, Chapel Hill, North Carolina 27599, USA
11
Koh IYY, Eyrich VA, Marti-Renom MA, Przybylski D, Madhusudhan MS, Eswar N, Graña O, Pazos F, Valencia A, Sali A, Rost B. EVA: Evaluation of protein structure prediction servers. Nucleic Acids Res 2003; 31:3311-5. [PMID: 12824315 PMCID: PMC169025 DOI: 10.1093/nar/gkg619] [Citation(s) in RCA: 134] [Impact Index Per Article: 6.4] [Indexed: 11/14/2022] Open
Abstract
EVA (http://cubic.bioc.columbia.edu/eva/) is a web server for evaluation of the accuracy of automated protein structure prediction methods. The evaluation is updated automatically each week, to cope with the large number of existing prediction servers and the constant changes in the prediction methods. EVA currently assesses servers for secondary structure prediction, contact prediction, comparative protein structure modelling and threading/fold recognition. Every day, sequences of newly available protein structures in the Protein Data Bank (PDB) are sent to the servers and their predictions are collected. The predictions are then compared to the experimental structures once a week; the results are published on the EVA web pages. Over time, EVA has accumulated prediction results for a large number of proteins, ranging from hundreds to thousands, depending on the prediction method. This large sample assures that methods are compared reliably. As a result, EVA provides useful information to developers as well as users of prediction methods.
Affiliation(s)
- Ingrid Y Y Koh
  - Columbia University Center for Computational Biology and Bioinformatics (C2B2), Russ Berrie Pavilion, 1150 St Nicholas Avenue, New York, NY 10032, USA
12
Taylor WR, Munro REJ, Petersen K, Bywater RP. Ab initio modelling of the N-terminal domain of the secretin receptors. Comput Biol Chem 2003; 27:103-14. [PMID: 12821307 DOI: 10.1016/s1476-9271(03)00020-3] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.7] [Indexed: 11/22/2022]
Abstract
G protein coupled receptors of the secretin family are activated by peptide hormones of about 30 residues in length. There is considerable sequence homology within both the hormone and receptor families. The receptors possess, in addition to the integral membrane domain, a characteristic extracellular domain of about 120 residues in length, having conserved cysteine residues, which are involved in disulphide bridge formation, and tryptophans, which have been shown to be critical for hormone binding. This extracellular domain does not have detectable homology to any known protein fold. In order to be able to propose a structure for this domain we have used ab initio prediction methods combined with constraints based on experimental results for the disulphide connectivity. The results of computational tools for predicting secondary structure and accessibility, together with ligand binding and mutational data and other structural considerations, were used in the ab initio protein folding programs DRAGON and GADGET and also the simpler program RAMBLE, which was able to explore different permutations of disulphide bond connectivity, tryptophan side chain orientation and chain topology. The methods generated a limited number of plausible models but no single unique solution was found under the constraints. One of these was refined into a full atomic model that contained a possible peptide binding site comprising the most conserved residues.
Affiliation(s)
- William R Taylor
  - Division of Mathematical Biology, National Institute for Medical Research, The Ridgeway, Mill Hill, NW7 1AA, London, UK
13
Nelson E, Grishin N. Investigation of the folding profiles of evolutionarily selected model proteins. J Chem Phys 2003. [DOI: 10.1063/1.1536621] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/14/2022] Open
14
Berriz GF, Shakhnovich EI. Characterization of the folding kinetics of a three-helix bundle protein via a minimalist Langevin model. J Mol Biol 2001; 310:673-85. [PMID: 11439031 DOI: 10.1006/jmbi.2001.4792] [Citation(s) in RCA: 44] [Impact Index Per Article: 1.9] [Indexed: 11/22/2022]
Abstract
We use a simple off-lattice Langevin model of protein folding to characterize the folding and unfolding of a fast-folding, 46 residue three-helix bundle. Under conditions at which the C-terminal helix is 30 % stable, we observe a clear three-state folding mechanism. In the on-pathway intermediate state, the middle and C-terminal helices are folded and in contact with each other, while the N-terminal region remains disordered. Nevertheless, under these conditions this intermediate is thermodynamically unstable relative to its unfolded state. The first and highest folding barrier corresponds to the organization of the hinge between the middle and C-terminal helices. A subsequent major barrier corresponds to the organization of the hinge between the middle and N-terminal helices. Hyperstabilizing the hinge regions leads to twice the folding rate that is obtained from hyperstabilizing the helices, even though much fewer contacts are involved in hinge hyperstabilization than in helix hyperstabilization. Unfolding follows single-exponential kinetics, even at temperatures only slightly above the folding transition temperature.
Affiliation(s)
- G F Berriz
  - Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Cambridge, MA 02138, USA
15
Xu Y, Xu D, Crawford OH, Einstein JR. A computational method for NMR-constrained protein threading. J Comput Biol 2001; 7:449-67. [PMID: 11108473 DOI: 10.1089/106652700750050880] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022] Open
Abstract
Protein threading provides an effective method for fold recognition and backbone structure prediction. But its application is currently limited due to its level of prediction accuracy and scope of applicability. One way to significantly improve its usefulness is through the incorporation of underconstrained (or partial) NMR data. It is well known that the NMR method for protein structure determination applies only to small proteins and that its effectiveness decreases rapidly as the protein mass increases beyond about 30 kD. We present, in this paper, a computational framework for applying underconstrained NMR data (that alone are insufficient for structure determination) as constraints in protein threading and also in all-atom model construction. In this study, we consider both secondary structure assignments from chemical shifts and NOE distance restraints. Our results have shown that both secondary structure assignments and a small number of long-range NOEs can significantly improve the threading quality in both fold recognition and threading-alignment accuracy, and can possibly extend threading's scope of applicability from homologs to analogs. An accurate backbone structure generated by NMR-constrained threading can then provide a great amount of structural information, equivalent to that provided by many NMR data; and hence can help reduce the number of NMR data typically required for an accurate structure determination. This new technique can potentially accelerate current NMR structure determination processes and possibly expand NMR's capability to larger proteins.
Affiliation(s)
- Y Xu
- Life Sciences Division, Oak Ridge National Laboratory, TN 37831-6480, USA.
|
16
|
Abstract
We present a novel technique of sampling the configurations of helical proteins. Assuming knowledge of native secondary structure, we employ assembly rules gathered from a database of existing structures to enumerate the geometrically possible three-dimensional arrangements of the constituent helices. We produce a library of possible folds for 25 helical protein cores. In each case, our method finds significant numbers of conformations close to the native structure. In addition, we assign coordinates to all atoms for four of the 25 proteins and show that this has a small effect on the number of near-native conformations. In the context of database-driven exhaustive enumeration our method performs extremely well, yielding significant percentages of conformations (between 0.02% and 82%) within 6 Å of the native structure. The method's speed and efficiency make it a valuable tool for predicting protein structure.
Affiliation(s)
- B Fain
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
|
17
|
Abstract
We discuss the problem of representations of protein structure and give the definition of contact maps. We present a method to obtain a three-dimensional polypeptide conformation from a contact map. We also explain how to deal with the case of nonphysical contact maps. We describe a stochastic method to perform dynamics in contact map space. We explain how the motion is restricted to physical regions of the space. First, we introduce the exact free energy of a contact map and discuss two simple approximations to it. Second, we present a method to derive energy parameters based on perceptron learning. We prove in an extensive number of situations that the pairwise contact approximation, both when alone and when supplemented with a hydrophobic term, is unsuitable for stabilizing proteins' native states.
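As a rough illustration of the contact map representation this abstract refers to, the sketch below builds a binary map from a list of 3D points. The function name `contact_map`, the 8 Å threshold, the `min_separation` parameter, and the use of one point per residue are all assumptions made for this example; they are not taken from the cited work.

```python
import math

def contact_map(coords, threshold=8.0, min_separation=1):
    """Return a symmetric 0/1 contact map for a list of (x, y, z) points.

    Two positions i and j are marked as in contact when their Euclidean
    distance is below `threshold`. The 8.0 default is an assumed value.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        # Skip pairs closer than `min_separation` along the chain.
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < threshold:
                cmap[i][j] = cmap[j][i] = 1
    return cmap

# Toy chain: four points spaced 3.8 Å apart on a line (illustrative only).
toy_chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
toy_map = contact_map(toy_chain)
```

In this toy chain the first and last points are 11.4 Å apart, so they are the only pair left out of contact at the assumed threshold.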
Affiliation(s)
- M Vendruscolo
- Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel
|
18
|
Huang ES, Samudrala R, Ponder JW. Ab initio fold prediction of small helical proteins using distance geometry and knowledge-based scoring functions. J Mol Biol 1999; 290:267-81. [PMID: 10388572 DOI: 10.1006/jmbi.1999.2861] [Citation(s) in RCA: 67] [Impact Index Per Article: 2.7] [Indexed: 11/22/2022]
Abstract
The problem of protein tertiary structure prediction from primary sequence can be separated into two subproblems: generation of a library of possible folds and specification of a best fold given the library. A distance geometry procedure based on random pairwise metrization with good sampling properties was used to generate a library of 500 possible structures for each of 11 small helical proteins. The input to distance geometry consisted of sets of restraints to enforce predicted helical secondary structure and a generic range of 5 to 11 Å between predicted contact residues on all pairs of helices. For each of the 11 targets, the resulting library contained structures with low RMSD versus the native structure. Near-native sampling was enhanced by at least three orders of magnitude compared to a random sampling of compact folds. All library members were scored with a combination of an all-atom distance-dependent function, a residue pair-potential, and a hydrophobicity function. In six of the 11 cases, the best-ranking fold was considered to be near native. Each library was also reduced to a final ab initio prediction via consensus distance geometry performed over the 50 best-ranking structures from the full set of 500. The consensus results were of generally higher quality, yielding six predictions within 6.5 Å of the native fold. These favorable predictions corresponded to those for which the correlation between the RMSD and the scoring function was highest. The advantage of the reported methodology is its extreme simplicity and potential for including other types of structural restraints.
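The 5 to 11 Å range between predicted contact residues mentioned in this abstract can be read as a flat-bottom distance restraint. The sketch below scores how far a set of coordinates violates such a range. The function name `restraint_penalty` and the quadratic penalty outside the range are assumptions chosen for illustration; the abstract does not state how violations are handled inside the distance geometry procedure.

```python
import math

def restraint_penalty(coords, contact_pairs, lower=5.0, upper=11.0):
    """Sum of squared violations of a [lower, upper] distance range.

    `coords` is a list of (x, y, z) points and `contact_pairs` a list of
    (i, j) index pairs. Distances inside the range cost nothing; the
    quadratic form outside the range is an assumed, illustrative choice.
    """
    total = 0.0
    for i, j in contact_pairs:
        d = math.dist(coords[i], coords[j])
        if d < lower:
            total += (lower - d) ** 2
        elif d > upper:
            total += (d - upper) ** 2
    return total

# Toy coordinates on a line (hypothetical, not from any real protein).
toy_points = [(0.0, 0.0, 0.0), (8.0, 0.0, 0.0), (14.0, 0.0, 0.0)]
satisfied = restraint_penalty(toy_points, [(0, 1), (1, 2)])  # 8 Å and 6 Å
violated = restraint_penalty(toy_points, [(0, 2)])           # 14 Å
```

A flat-bottom form is used here because the abstract describes a generic range and not a single target distance, so any distance inside the range is treated as equally acceptable.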
Affiliation(s)
- E S Huang
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, Saint Louis, MO, 63110, USA
|
19
|
Ayers DJ, Gooley PR, Widmer-Cooper A, Torda AE. Enhanced protein fold recognition using secondary structure information from NMR. Protein Sci 1999; 8:1127-33. [PMID: 10338023 PMCID: PMC2144327 DOI: 10.1110/ps.8.5.1127] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.5] [Indexed: 10/21/2022]
Abstract
NMR offers the possibility of accurate secondary structure for proteins that would be too large for structure determination. In the absence of an X-ray crystal structure, this information should be useful as an adjunct to protein fold recognition methods based on low resolution force fields. The value of this information has been tested by adding varying amounts of artificial secondary structure data and threading a sequence through a library of candidate folds. Using a literature test set, the threading method alone has only a one-third chance of producing a correct answer among the top ten guesses. With realistic secondary structure information, one can expect a 60-80% chance of finding a homologous structure. The method has then been applied to examples with published estimates of secondary structure. This implementation is completely independent of sequence homology, and sequences are optimally aligned to candidate structures with gaps and insertions allowed. Unlike work using predicted secondary structure, we test the effect of differing amounts of relatively reliable data.
Affiliation(s)
- D J Ayers
- Research School of Chemistry, Australian National University, Canberra ACT
|
20
|
|
21
|
Chelvanayagam G, Knecht L, Jenny T, Benner SA, Gonnet GH. A combinatorial distance-constraint approach to predicting protein tertiary models from known secondary structure. Folding & Design 1998; 3:149-60. [PMID: 9562545 DOI: 10.1016/s1359-0278(98)00023-6] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.5] [Indexed: 02/07/2023]
Abstract
BACKGROUND: Distance geometry methods allow protein structures to be constructed using a large number of distance constraints, which can be elucidated by experimental techniques such as NMR. New methods for gleaning tertiary structural information from multiple sequence alignments make it possible for distance constraints to be predicted from sequence information alone. The basic distance geometry method can thus be applied using these empirically derived distance constraints. Such an approach, which incorporates a novel combinatoric procedure, is reported here. RESULTS: Given the correct sheet topology and disulfide formations, the fully automated procedure is generally able to construct native-like Cα models for eight small beta-protein structures. When the sheet topology was unknown but disulfide connectivities were included, all sheet topologies were explored by the combinatorial procedure. Using a simple geometric evaluation scheme, models with the correct sheet topology were ranked first in four of the eight example cases, second in three examples and third in one example. If neither the sheet topology nor the disulfide connectivities were given a priori, all combinations of sheet topologies and disulfides were explored by the combinatorial procedure. The evaluation scheme ranked the correct topology within the top five folds for half the example cases. CONCLUSIONS: The combinatorial procedure is a useful technique for identifying a limited number of low-resolution candidate folds for small, disulfide-rich, beta-protein structures. Better results are obtained, however, if correct disulfide connectivities are known in advance. Combinatorial distance constraints can be applied whenever there are a sufficiently small number of finite connectivities.
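To give a sense of why the combinatorial search over disulfide connectivities stays tractable only for small proteins, the sketch below enumerates every way of pairing a set of cysteines. It assumes that every cysteine takes part in exactly one disulfide bond, which the abstract does not state; the function name `disulfide_pairings` and the residue numbers are hypothetical, and this is not the authors' procedure.

```python
def disulfide_pairings(cysteines):
    """Yield every way to pair up all cysteines into disulfide bonds.

    `cysteines` is a list of residue identifiers. With an odd number of
    cysteines no complete pairing exists, so nothing is yielded.
    """
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    for k, partner in enumerate(rest):
        # Bond `first` to `partner`, then pair up whatever is left.
        remaining = rest[:k] + rest[k + 1:]
        for pairing in disulfide_pairings(remaining):
            yield [(first, partner)] + pairing

# Hypothetical cysteine positions, for illustration only.
four_cys = list(disulfide_pairings([3, 14, 22, 38]))
six_cys = list(disulfide_pairings([3, 14, 22, 38, 45, 51]))
```

Under this assumption the number of connectivities for 2k cysteines is (2k-1)!!, that is 3 for four cysteines and 15 for six, before any sheet topologies are combined with them.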
Affiliation(s)
- G Chelvanayagam
- Computational Chemistry Group, Universitätstrasse 16, ETH Zentrum, Zürich, CH 8092, Switzerland.
|
22
|
Vendruscolo M, Kussell E, Domany E. Recovery of protein structure from contact maps. Folding & Design 1998; 2:295-306. [PMID: 9377713 DOI: 10.1016/s1359-0278(97)00041-2] [Citation(s) in RCA: 196] [Impact Index Per Article: 7.5] [Indexed: 02/05/2023]
Abstract
BACKGROUND: Prediction of a protein's structure from its amino acid sequence is a key issue in molecular biology. While dynamics, performed in the space of two-dimensional contact maps, eases the necessary conformational search, it may also lead to maps that do not correspond to any real three-dimensional structure. To remedy this, an efficient procedure is needed to reconstruct three-dimensional conformations from their contact maps. RESULTS: We present an efficient algorithm to recover the three-dimensional structure of a protein from its contact map representation. We show that when a physically realizable map is used as target, our method generates a structure whose contact map is essentially similar to the target. Furthermore, the reconstructed and original structures are similar up to the resolution of the contact map representation. Next, we use nonphysical target maps, obtained by corrupting a physical one; in this case, our method essentially recovers the underlying physical map and structure. Hence, our algorithm will help to fold proteins, using dynamics in the space of contact maps. Finally, we investigate the manner in which the quality of the recovered structure degrades when the number of contacts is reduced. CONCLUSIONS: The procedure is capable of assigning quickly and reliably a three-dimensional structure to a given contact map. It is well suited for use in parallel with dynamics in contact map space to project a contact map onto its closest physically allowed structural counterpart.
Affiliation(s)
- M Vendruscolo
- Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel
|
23
|
Dandekar T, König R. Computational methods for the prediction of protein folds. Biochim Biophys Acta 1997; 1343:1-15. [PMID: 9428653 DOI: 10.1016/s0167-4838(97)00132-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.2] [Indexed: 02/05/2023]
|
24
|
Kerr ID, Sansom MS. The pore-lining region of shaker voltage-gated potassium channels: comparison of beta-barrel and alpha-helix bundle models. Biophys J 1997; 73:581-602. [PMID: 9251779 PMCID: PMC1180959 DOI: 10.1016/s0006-3495(97)78095-0] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.4] [Indexed: 02/05/2023]
Abstract
Although there is a large body of site-directed mutagenesis data that identify the pore-lining sequence of the voltage-gated potassium channel, the structure of this region remains unknown. We have interpreted the available biochemical data as a set of topological and orientational restraints and employed these restraints to produce molecular models of the potassium channel pore region, H5. The H5 sequence has been modeled either as a tetramer of membrane-spanning beta-hairpins, thus producing an eight-stranded beta-barrel, or as a tetramer of incompletely membrane-spanning alpha-helical hairpins, thus producing an eight-staved alpha-helix bundle. In total, restraints-directed modeling has produced 40 different configurations of the beta-barrel model, each configuration comprising an ensemble of 20 structures, and 24 different configurations of the alpha-helix bundle model, each comprising an ensemble of 24 structures. Thus, over 1300 model structures for H5 have been generated. Configurations have been ranked on the basis of their predicted pore properties and on the extent of their agreement with the biochemical data. This ranking is employed to identify particular configurations of H5 that may be explored further as models of the pore-lining region of the voltage-gated potassium channel pore.
Affiliation(s)
- I D Kerr
- Laboratory of Molecular Biophysics, University of Oxford, England
|