1
|
Singh TV, Shagolsem LS. Universality and Identity Ordering in Heteropolymer Coil–Globule Transition. Macromolecules 2022. [DOI: 10.1021/acs.macromol.2c01559] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Thoudam Vilip Singh
- Department of Physics, National Institute of Technology Manipur, Imphal 795004, India
- Lenin S. Shagolsem
- Department of Physics, National Institute of Technology Manipur, Imphal 795004, India
|
2
|
Reza MS, Zhang H, Hossain MT, Jin L, Feng S, Wei Y. COMTOP: Protein Residue-Residue Contact Prediction through Mixed Integer Linear Optimization. Membranes 2021; 11:membranes11070503. [PMID: 34209399 PMCID: PMC8305966 DOI: 10.3390/membranes11070503] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/25/2021] [Revised: 06/24/2021] [Accepted: 06/25/2021] [Indexed: 11/17/2022]
Abstract
Protein contact prediction helps reconstruct the tertiary structure that largely determines a protein’s function; therefore, contact prediction from sequence is an important problem. Recently there has been exciting progress on this problem, but many existing methods still have low prediction accuracy. In this paper, we present a new mixed integer linear programming (MILP)-based consensus method: a Consensus scheme based On a Mixed integer linear opTimization method for prOtein contact Prediction (COMTOP). The MILP-based consensus method combines the strengths of seven selected protein contact prediction methods (CCMpred, EVfold, DeepCov, NNcon, PconsC4, plmDCA, and PSICOV) by optimizing the number of correctly predicted contacts, thereby achieving better prediction accuracy. The proposed hybrid protein residue–residue contact prediction scheme was tested on four independent test sets. For 239 highly non-redundant proteins, the method showed a prediction accuracy of 59.68%, 70.79%, 78.86%, 89.04%, 94.51%, and 97.35% for top-5L, top-3L, top-2L, top-L, top-L/2, and top-L/5 contacts, respectively. When tested on the CASP13 and CASP14 test sets, the proposed method obtained accuracies of 75.91% and 77.49% for top-L/5 predictions, respectively. COMTOP was further tested on 57 non-redundant α-helical transmembrane proteins and achieved prediction accuracies of 64.34% and 73.91% for top-L/2 and top-L/5 predictions, respectively. For all test datasets, the improvement of COMTOP in accuracy over the seven individual methods increased with the number of predicted contacts. For example, COMTOP performed much better for large numbers of predicted contacts (such as top-5L and top-3L) than for small numbers of predicted contacts (such as top-L/2 and top-L/5). The results and analysis demonstrate that COMTOP can significantly improve on the performance of the individual methods and is therefore more robust across different types of test sets. COMTOP also showed better or comparable predictions when compared with state-of-the-art predictors.
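The top-5L through top-L/5 figures quoted in this abstract follow the usual convention of ranking predicted residue pairs by score and checking the best L × factor of them against the true contacts, where L is the sequence length. The following is a minimal sketch of that evaluation only, not of the COMTOP MILP itself; the function name, the dictionary layout of the scores, and the choice to divide by the number of pairs actually evaluated are assumptions made for illustration.

```python
def top_k_precision(scores, true_contacts, seq_len, factor):
    """Fraction of the top (seq_len * factor) predicted pairs that are true contacts.

    scores        -- dict mapping a residue pair (i, j), i < j, to a predicted score
                     (hypothetical layout, chosen for this sketch)
    true_contacts -- set of (i, j) pairs that are contacts in the native structure
    factor        -- 5 for top-5L, 1 for top-L, 0.5 for top-L/2, 0.2 for top-L/5
    """
    k = max(1, int(round(seq_len * factor)))
    # Rank pairs from highest to lowest predicted score and keep the best k.
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair in ranked if pair in true_contacts)
    # Assumption: divide by the number of pairs evaluated, which is smaller
    # than k when fewer than k predictions are available.
    return hits / len(ranked)


example_scores = {(0, 5): 0.9, (1, 7): 0.8, (2, 9): 0.1}
example_truth = {(0, 5), (2, 9)}
top_l5 = top_k_precision(example_scores, example_truth, seq_len=10, factor=0.2)
```

With a sequence length of 10 and factor 0.2, only the two highest-scoring pairs are evaluated, so a low-scoring true contact such as (2, 9) does not count toward the result.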
Affiliation(s)
- Md. Selim Reza
- School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China; (M.S.R.); (H.Z.); (M.T.H.)
- Centre for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China;
- Huiling Zhang
- School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China; (M.S.R.); (H.Z.); (M.T.H.)
- Centre for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China;
- Md. Tofazzal Hossain
- School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China; (M.S.R.); (H.Z.); (M.T.H.)
- Centre for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China;
- Langxi Jin
- Department of Computer Science and Technology, School of Computer Science and Technology, Harbin University of Science and Technology, 52 Xuefu Road, Nangang District, Harbin 150080, China;
- Shengzhong Feng
- Centre for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China;
- Yanjie Wei
- School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China; (M.S.R.); (H.Z.); (M.T.H.)
- Centre for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China;
- Correspondence:
|
3
|
Choudhury CK, Kuksenok O. Native-Based Dissipative Particle Dynamics Approach for α-Helical Folding. J Phys Chem B 2020; 124:11379-11386. [PMID: 33270459 DOI: 10.1021/acs.jpcb.0c08603] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/21/2022]
Abstract
We developed a dissipative particle dynamics (DPD) approach that captures polyalanine folding into a stable helical conformation. Within the proposed native-based approach, the DPD parameters are derived based on the contact map constructed from the molecular dynamics (MD) simulations. We show that the proposed approach reproduces the folding of polypeptides of various lengths, including bundle formation for sufficiently long polypeptides. The proposed approach also allows one to capture the folding of the helical segments of the lysozyme. With further development of computationally efficient native-based DPD approaches for folding, modeling of a range of biomaterials incorporating α-helical segments could be extended to time and length scales far beyond those accessible in molecular dynamics simulations.
Affiliation(s)
- Chandan Kumar Choudhury
- Department of Materials Science and Engineering, Clemson University, Clemson, South Carolina 29634, United States
- Olga Kuksenok
- Department of Materials Science and Engineering, Clemson University, Clemson, South Carolina 29634, United States
|
4
|
CONAN: A Tool to Decode Dynamical Information from Molecular Interaction Maps. Biophys J 2019; 114:1267-1273. [PMID: 29590584 PMCID: PMC5883949 DOI: 10.1016/j.bpj.2018.01.033] [Citation(s) in RCA: 60] [Impact Index Per Article: 12.0] [Received: 11/20/2017] [Revised: 12/19/2017] [Accepted: 01/22/2018] [Indexed: 02/07/2023] Open
Abstract
The analysis of contacts is a powerful tool to understand biomolecular function in a series of contexts, from the investigation of dynamical behavior at equilibrium to the study of nonequilibrium dynamics in which the system moves between multiple states. We thus propose a tool called CONtact ANalysis (CONAN) that, from molecular dynamics (MD) trajectories, analyzes interresidue contacts, creates videos of time-resolved contact maps, and performs correlation, principal component, and cluster analysis, revealing how specific contacts relate to functionally relevant states sampled by MD. We present how CONAN can identify features describing the dynamics of ubiquitin both at equilibrium and during mechanical unfolding. Additionally, we show the analysis of MD trajectories of an α-synuclein mutant peptide that undergoes an α-β conformational transition that can be easily monitored using CONAN, which identifies the multiple states that the peptide explores along its conformational dynamics. The high versatility and ease of use of the software make CONAN a tool that can significantly facilitate the understanding of the complex dynamical behavior of proteins or other biomolecules. CONAN and its documentation are freely available for download on GitHub.
|
5
|
Jing X, Dong Q, Lu R, Dong Q. Protein Inter-Residue Contacts Prediction: Methods, Performances and Applications. Curr Bioinform 2019. [DOI: 10.2174/1574893613666181109130430] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/22/2022]
Abstract
Background: Protein inter-residue contacts prediction plays an important role in the field of protein structure and function research. As a low-dimensional representation of protein tertiary structure, protein inter-residue contacts could greatly help de novo protein structure prediction methods to reduce the conformational search space. Over the past two decades, various methods have been developed for protein inter-residue contacts prediction. Objective: We provide a comprehensive and systematic review of protein inter-residue contacts prediction methods. Results: Protein inter-residue contacts prediction methods are roughly classified into five categories: correlated mutations methods, machine-learning methods, fusion methods, template-based methods and 3D model-based methods. In this paper, we first describe the common definition of protein inter-residue contacts and show typical applications of protein inter-residue contacts. Then, we present a comprehensive review of the three main categories of protein inter-residue contacts prediction: correlated mutations methods, machine-learning methods and fusion methods. In addition, we analyze the constraints of each category. Furthermore, we compare several representative methods on the CASP11 dataset and discuss the performance of these methods in detail. Conclusion: Correlated mutations methods achieve better performance for long-range contacts, while machine-learning methods perform well for short-range contacts. Fusion methods could take advantage of both the machine-learning and correlated mutations methods. Employing a more effective fusion strategy could help to further improve the performance of fusion methods.
Affiliation(s)
- Xiaoyang Jing
- School of Computer Science, Fudan University, Shanghai, China
- Qimin Dong
- Vocational and Technical Education Center of Linxi County, Chifeng, Inner Mongolia, China
- Ruqian Lu
- School of Computer Science, Fudan University, Shanghai, China
- Qiwen Dong
- Faculty of Education, East China Normal University, Shanghai, China
|
6
|
Abstract
In the field of computational structural proteomics, contact predictions have shown new prospects of solving the longstanding problem of ab initio protein structure prediction. In the last few years, application of deep learning algorithms and availability of large protein sequence databases, combined with improvement in methods that derive contacts from multiple sequence alignments, have shown a huge increase in the precision of contact prediction. In addition, these predicted contacts have also been used to build three-dimensional models from scratch. In this chapter, we briefly discuss many elements of protein residue-residue contacts and the methods available for prediction, focusing on a state-of-the-art contact prediction tool, DNcon. Illustrating with a case study, we describe how DNcon can be used to make ab initio contact predictions for a given protein sequence and discuss how the predicted contacts may be analyzed and evaluated.
Affiliation(s)
- Badri Adhikari
- Department of Computer Science, University of Missouri, 201 Engineering Building West, Columbia, MO, 65211, USA
- Jianlin Cheng
- Department of Computer Science, University of Missouri, 201 Engineering Building West, Columbia, MO, 65211, USA.
|
7
|
|
8
|
Chen J, Brooks CL. Can molecular dynamics simulations provide high-resolution refinement of protein structure? Proteins 2007; 67:922-30. [PMID: 17373704 DOI: 10.1002/prot.21345] [Citation(s) in RCA: 107] [Impact Index Per Article: 6.3] [Indexed: 11/06/2022]
Abstract
Recent advances in efficient and accurate treatment of solvent with the generalized Born approximation (GB) have made it possible to substantially refine the protein structures generated by various prediction tools through detailed molecular dynamics simulations. As demonstrated in a recent CASPR experiment, improvement can be quite reliably achieved when the initial models are sufficiently close to the native basin (e.g., 3-4 Å Cα RMSD). A key element to effective refinement is to incorporate reliable structural information into the simulation protocol. Without intimate knowledge of the target and prediction protocol used to generate the initial structural models, it can be assumed that the regular secondary structure elements (helices and strands) and overall fold topology are largely correct to start with, such that the protocol limits itself to the scope of refinement and focuses the sampling in the vicinity of the initial structure. The secondary structures can be enforced by dihedral restraints and the topology through structural contacts, implemented as either multiple pair-wise Cα distance restraints or a single sidechain distance matrix restraint. The restraints are weakly imposed with flat-bottom potentials to allow sufficient flexibility for structural rearrangement. Refinement is further facilitated by enhanced sampling of advanced techniques such as the replica exchange method (REX). In general, for single domain proteins of small to medium sizes, 3-5 nanoseconds of REX/GB refinement simulations appear to be sufficient for reasonable convergence. Clustering of the resulting structural ensembles can yield refined models over 1.0 Å closer to the native structure in Cα RMSD. Substantial improvement of sidechain contacts and rotamer states can also be achieved in most cases. Additional improvement is possible with longer sampling and knowledge of the robust structural features in the initial models for a given prediction protocol. Nevertheless, limitations still exist in sampling as well as force field accuracy, manifested as difficulty in refinement of long and flexible loops.
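The flat-bottom restraints mentioned in this abstract can be illustrated with a short sketch: the penalty is zero while a pairwise distance stays within a tolerance of its reference value and grows only outside that window. The harmonic form of the walls, the function name, and all parameter names and values below are assumptions for illustration; the abstract does not give the functional form or the constants used by the authors.

```python
def flat_bottom_energy(d, d0, half_width, k):
    """Restraint energy for a pair distance d with reference distance d0.

    Assumed form (not taken from the abstract): zero inside
    [d0 - half_width, d0 + half_width], harmonic with force constant k
    beyond either edge of that window.
    """
    excess = abs(d - d0) - half_width
    if excess <= 0.0:
        return 0.0  # inside the flat region: the structure can rearrange freely
    return k * excess ** 2


# Hypothetical numbers: reference distance 5.0, tolerance 1.0, force constant 2.0.
inside = flat_bottom_energy(5.8, 5.0, 1.0, 2.0)
outside = flat_bottom_energy(7.0, 5.0, 1.0, 2.0)
```

The flat region is what makes the restraint weak in the sense described above: small deviations from the initial model cost nothing, so only large departures from the assumed topology are penalized.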
Affiliation(s)
- Jianhan Chen
- Department of Molecular Biology, The Scripps Research Institute, La Jolla, California 92037, USA
|
9
|
Pietal MJ, Tuszynska I, Bujnicki JM. PROTMAP2D: visualization, comparison and analysis of 2D maps of protein structure. Bioinformatics 2007; 23:1429-30. [PMID: 17400727 DOI: 10.1093/bioinformatics/btm124] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Indexed: 11/14/2022] Open
Abstract
Motivation: Protein structure comparison is a fundamental problem in structural biology and bioinformatics. Two-dimensional maps of distances between residues in the structure contain sufficient information to restore the 3D representation, while maps of contacts reveal characteristic patterns of interactions between secondary and super-secondary structures and are very attractive for visual analysis. The overlap of 2D maps of two structures can be easily calculated, providing a sensitive measure of protein structure similarity. PROTMAP2D is a software tool for calculation of contact and distance maps based on user-defined criteria, quantitative comparison of pairs or series of contact maps (e.g. alternative models of the same protein, model versus native structure, different trajectories from molecular dynamics simulations, etc.) and visualization of the results. Availability: PROTMAP2D for Windows / Linux / MacOSX is freely available for academic users from http://genesilico.pl/protmap2d.htm
Affiliation(s)
- Michal J Pietal
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Trojdena 4, Warsaw, Poland
|
10
|
Connectivity independent protein-structure alignment: a hierarchical approach. BMC Bioinformatics 2006; 7:510. [PMID: 17118190 PMCID: PMC1683948 DOI: 10.1186/1471-2105-7-510] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.3] [Received: 06/06/2006] [Accepted: 11/21/2006] [Indexed: 11/13/2022] Open
Abstract
Background: Protein-structure alignment is a fundamental tool to study protein function, evolution and model building. In the last decade several methods for structure alignment were introduced, but most of them ignore that structurally similar proteins can share the same spatial arrangement of secondary structure elements (SSE) but differ in the underlying polypeptide chain connectivity (non-sequential SSE connectivity). Results: We perform protein-structure alignment using a two-level hierarchical approach implemented in the program GANGSTA. On the first level, pair contacts and relative orientations between SSEs (i.e. α-helices and β-strands) are maximized with a genetic algorithm (GA). On the second level residue pair contacts from the best SSE alignments are optimized. We have tested the method on visually optimized structure alignments of protein pairs (pairwise mode) and for database scans. For a given protein structure, our method is able to detect significant structural similarity of functionally important folds with non-sequential SSE connectivity. The performance for structure alignments with strictly sequential SSE connectivity is comparable to that of other structure alignment methods. Conclusion: As demonstrated for several applications, GANGSTA finds meaningful protein-structure alignments independent of the SSE connectivity. GANGSTA is able to detect structural similarity of protein folds that are assigned to different superfamilies but nevertheless possess similar structures and perform related functions, even if these proteins differ in SSE connectivity.
|
11
|
Khare SD, Dokholyan NV. Common dynamical signatures of familial amyotrophic lateral sclerosis-associated structurally diverse Cu, Zn superoxide dismutase mutants. Proc Natl Acad Sci U S A 2006; 103:3147-52. [PMID: 16488975 PMCID: PMC1413921 DOI: 10.1073/pnas.0511266103] [Citation(s) in RCA: 89] [Impact Index Per Article: 4.9] [Indexed: 02/08/2023] Open
Abstract
More than 100 structurally diverse point mutations leading to aggregation in the dimeric enzyme Cu, Zn superoxide dismutase (SOD1) are implicated in familial amyotrophic lateral sclerosis (FALS). Although SOD1 dimer dissociation is a known requirement for its aggregation, the common structural basis for diverse FALS mutations resulting in aggregation is not fully understood. In molecular dynamics simulations of wild-type SOD1 and three structurally diverse FALS mutants (A4V, G37R, and H46R), we find that a common effect of mutations on SOD1 dimer is the mutation-induced disruption of dynamic coupling between monomers. In the wild-type dimer, the principal coupled motion corresponds to a "breathing motion" of the monomers around an axis parallel to the dimer interface, and an opening-closing motion of the distal metal-binding loops. These coupled motions are disrupted in all three mutants independent of the mutation location. Loss of coupled motions in mutant dimers occurs with increased disruption of a key stabilizing structural element (the beta-plug) leading to the de-protection of edge strands. To rationalize disruption of coupling, which is independent of the effect of the mutation on global SOD1 stability, we analyze the residue-residue interaction network formed in SOD1. We find that the dimer interface and metal-binding loops, both involved in coupled motions, are regions of high connectivity in the network. Our results suggest that independent of the effect on protein stability, altered protein dynamics, due to long-range communication within its structure, may underlie the aggregation of mutant SOD1 in FALS.
Affiliation(s)
- Sagar D. Khare
- Department of Biochemistry and Biophysics, University of North Carolina School of Medicine, Chapel Hill, NC 27599
- Nikolay V. Dokholyan
- Department of Biochemistry and Biophysics, University of North Carolina School of Medicine, Chapel Hill, NC 27599
- To whom correspondence should be addressed. E-mail:
|
12
|
Heo M, Kim S, Moon EJ, Cheon M, Chung K, Chang I. Perceptron learning of pairwise contact energies for proteins incorporating the amino acid environment. Physical Review E, Statistical, Nonlinear, and Soft Matter Physics 2005; 72:011906. [PMID: 16090000 DOI: 10.1103/physreve.72.011906] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 02/06/2004] [Revised: 05/10/2005] [Indexed: 05/03/2023]
Abstract
Although a coarse-grained description of proteins is a simple and convenient way to attack the protein folding problem, the construction of a global pairwise energy function which can simultaneously recognize the native folds of many proteins has met with only partial success. We have sought the possibility of a systematic improvement of this pairwise-contact energy function as we extended the parameter space of amino acids, incorporating local environments of amino acids, beyond a 20 × 20 matrix. We have studied the pairwise contact energy functions of 20 × 20, 60 × 60, and 180 × 180 matrices depending on the extent of parameter space, and compared their effect on the learnability of energy parameters in the context of gapless threading, bearing in mind that a 20 × 20 pairwise contact matrix has been shown to be too simple to recognize the native folds of many proteins. In this paper, we show that the construction of a global pairwise energy function was achieved using 1006 training proteins with a homology of less than 30%, which include all representatives of different protein classes. After parametrizing the local environments of the amino acids into nine categories depending on three secondary structures and three kinds of hydrophobicity (desolvation), the 16290 pairwise contact energies (scores) of the amino acids could be determined by perceptron learning and protein threading. These could simultaneously recognize all the native folds of the 1006 training proteins. When these energy parameters were tested on the 382 test proteins with a homology of less than 90%, the native folds of 370 (96.9%) proteins were recognized. We set up a simple thermodynamic framework in the conformational space of decoys to calculate the unfolded fraction and the specific heat of real proteins. The different thermodynamic stabilities of E. coli ribonuclease H (RNase H) and its mutants were well described in our calculation, agreeing with the experiment.
Affiliation(s)
- Muyoung Heo
- National Research Laboratory for Computational Proteomics and Biophysics, Department of Physics, Pusan National University, Busan, Korea
|
13
|
Combining a binary input encoding scheme with RBFNN for globulin protein inter-residue contact map prediction. Pattern Recognit Lett 2005. [DOI: 10.1016/j.patrec.2005.01.005] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Indexed: 11/21/2022]
|
14
|
Zhang GZ, Huang DS. Prediction of inter-residue contacts map based on genetic algorithm optimized radial basis function neural network and binary input encoding scheme. J Comput Aided Mol Des 2005; 18:797-810. [PMID: 16075311 DOI: 10.1007/s10822-005-0578-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Received: 04/29/2004] [Accepted: 12/14/2004] [Indexed: 10/25/2022]
Abstract
Inter-residue contacts map prediction is one of the most important intermediate steps toward solving the protein folding problem. In this paper, we focus on the problem of protein inter-residue contacts map prediction based on neural network techniques. First, we use a genetic algorithm (GA) to optimize the radial basis function widths and hidden centers of a radial basis function neural network (RBFNN); then a novel binary encoding scheme is employed to train the network to learn and predict the inter-residue contact patterns of protein sequences obtained from the Protein Data Bank (PDB). The experimental evidence indicates the utility of our proposed encoding strategy and GA-optimized RBFNN. Moreover, the simulation results demonstrate that the network performs better for proteins whose length in residues falls in the range (100, 300), and that the prediction accuracy with a contact threshold of 7 Angstroms is higher than with the other three thresholds of 5, 6, and 8 Angstroms.
Affiliation(s)
- Guang-Zheng Zhang
- Intelligent Computing Lab, Hefei Institute of Intelligent Machines, Chinese Academy of Sciences
|
15
|
Hamilton N, Burrage K, Ragan MA, Huber T. Protein contact prediction using patterns of correlation. Proteins 2004; 56:679-84. [PMID: 15281121 DOI: 10.1002/prot.20160] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.7] [Indexed: 11/07/2022]
Abstract
We describe a new method for using neural networks to predict residue contact pairs in a protein. The main inputs to the neural network are a set of 25 measures of correlated mutation between all pairs of residues in two "windows" of size 5 centered on the residues of interest. While the individual pair-wise correlations are a relatively weak predictor of contact, by training the network on windows of correlation the accuracy of prediction is significantly improved. The neural network is trained on a set of 100 proteins and then tested on a disjoint set of 1033 proteins of known structure. An average predictive accuracy of 21.7% is obtained taking the best L/2 predictions for each protein, where L is the sequence length. Taking the best L/10 predictions gives an average accuracy of 30.7%. The predictor is also tested on a set of 59 proteins from the CASP5 experiment. The accuracy is found to be relatively consistent across different sequence lengths, but to vary widely according to the secondary structure. Predictive accuracy is also found to improve by using multiple sequence alignments containing many sequences to calculate the correlations.
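The 25 inputs described in this abstract correspond to every pairing of the five residues around one position with the five residues around the other. A minimal sketch of gathering such a window from a precomputed correlation matrix follows; the function name, the nested-list matrix layout, and the zero padding used at the sequence ends are assumptions for illustration, since the abstract does not say how boundary positions are handled.

```python
def window_features(corr, i, j, half=2, pad=0.0):
    """Collect the (2*half + 1)**2 correlation values around the pair (i, j).

    corr -- square matrix (list of lists) of correlated-mutation scores,
            assumed to have been computed elsewhere
    half -- residues taken on each side of i and of j (2 gives windows of size 5)
    pad  -- value used where a window extends past the sequence (an assumption)
    """
    n = len(corr)
    features = []
    for a in range(i - half, i + half + 1):
        for b in range(j - half, j + half + 1):
            if 0 <= a < n and 0 <= b < n:
                features.append(corr[a][b])
            else:
                features.append(pad)
    return features


# Toy matrix whose entries encode their own indices, to make positions visible.
toy_corr = [[a * 10 + b for b in range(10)] for a in range(10)]
feats = window_features(toy_corr, 5, 7)
```

The central element of the returned list is the correlation of the pair of interest itself; the remaining 24 values are the neighboring correlations that, according to the abstract, make the prediction stronger than the single pairwise score.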
Affiliation(s)
- Nicholas Hamilton
- Advanced Computational Modelling Centre, Department of Mathematics, The University of Queensland, St. Lucia, Queensland, Australia.
|
16
|
Caprara A, Carr R, Istrail S, Lancia G, Walenz B. 1001 optimal PDB structure alignments: integer programming methods for finding the maximum contact map overlap. J Comput Biol 2004; 11:27-52. [PMID: 15072687 DOI: 10.1089/106652704773416876] [Citation(s) in RCA: 125] [Impact Index Per Article: 6.3] [Indexed: 11/12/2022] Open
Abstract
Protein structure comparison is a fundamental problem for structural genomics, with applications to drug design, fold prediction, protein clustering, and evolutionary studies. Despite its importance, there are very few rigorous methods and widely accepted similarity measures known for this problem. In this paper we describe the last few years of developments on the study of an emerging measure, the contact map overlap (CMO), for protein structure comparison. A contact map is a list of pairs of residues which lie in three-dimensional proximity in the protein's native fold. Although this measure is in principle computationally hard to optimize, we show how it can in fact be computed with great accuracy for related proteins by integer linear programming techniques. These methods have the advantage of providing certificates of near-optimality by means of upper bounds to the optimal alignment value. We also illustrate effective heuristics, such as local search and genetic algorithms. We were able to obtain for the first time optimal alignments for large similar proteins (about 1,000 residues and 2,000 contacts) and used the CMO measure to cluster proteins in families. The clusters obtained were compared to SCOP classification in order to validate the measure. Extensive computational experiments showed that alignments which are off by at most 10% from the optimal value can be computed in a short time. Further experiments showed how this measure reacts to the choice of the threshold defining a contact and how to choose this threshold in a sensible way.
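The contact map definition in this abstract, and the quantity that the overlap measure counts, can be sketched in a few lines. The code below builds a contact map from coordinates with a distance threshold and counts the contacts shared by two maps under a residue alignment that is already given. It does not search for the best alignment, which is the hard optimization the paper addresses with integer linear programming; the function names, the 8.0 default threshold, and the dictionary form of the alignment are assumptions for illustration.

```python
import math


def contact_map(coords, threshold=8.0, min_sep=1):
    """Return the set of residue pairs (i, j), i < j, closer than threshold.

    coords    -- one (x, y, z) tuple per residue, e.g. C-alpha positions
    threshold -- distance cutoff defining a contact (assumed default)
    min_sep   -- minimum sequence separation j - i to consider (assumed default)
    """
    contacts = set()
    for i in range(len(coords)):
        for j in range(i + min_sep, len(coords)):
            if math.dist(coords[i], coords[j]) <= threshold:
                contacts.add((i, j))
    return contacts


def overlap_under_alignment(contacts_a, contacts_b, alignment):
    """Count contacts of A that map onto contacts of B.

    alignment -- dict from residue index in A to residue index in B,
                 assumed to be supplied by the caller
    """
    shared = 0
    for i, j in contacts_a:
        if i in alignment and j in alignment:
            a, b = alignment[i], alignment[j]
            if (min(a, b), max(a, b)) in contacts_b:
                shared += 1
    return shared


toy_coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
toy_map = contact_map(toy_coords, threshold=4.0)
identity = {k: k for k in range(len(toy_coords))}
self_overlap = overlap_under_alignment(toy_map, toy_map, identity)
```

Changing `threshold` changes which pairs enter the map, which is why the abstract treats the choice of contact threshold as something to be examined experimentally.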
Affiliation(s)
- Alberto Caprara
- D.E.I.S., Università di Bologna, Viale Risorgimento, 2 40136 Bologna, Italy
|
17
|
Chelli R, Gervasio FL, Procacci P, Schettino V. Inter-residue and solvent-residue interactions in proteins: a statistical study on experimental structures. Proteins 2004; 55:139-51. [PMID: 14997548 DOI: 10.1002/prot.20030] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 11/09/2022]
Abstract
A large set of protein structures resolved by X-ray or NMR techniques has been extracted from the Protein Data Bank and analyzed using statistical methods. In particular, we investigate the interactions between side chains and the interactions between solvent and side chains, pointing out the possibility of including the solvent as part of a knowledge-based potential. The solvent-residue contacts are accounted for on the basis of Voronoi polyhedron analysis. Our investigation confirms the importance of hydrophobic residues in determining protein stability. We observe that in general hydrophobic-hydrophobic interactions and, more specifically, aromatic-aromatic contacts tend to be increasingly distally separated in the primary sequence of proteins, thus connecting distinct secondary structure elements. A simple relation expressing the dependence of the protein free energy on the number of residues is proposed. Such a relation includes both the residue-residue and the solvent-residue contributions. The former is dominant for large proteins, whereas for small proteins (fewer than 100 residues) the two terms are comparable. Gapless threading experiments show that the solvent-residue knowledge-based potential yields a significant contribution to discriminating the native structure of proteins. This contribution is especially important for small proteins and is similar to that given by the most favorable residue-residue knowledge-based potential referring to hydrophobic-hydrophobic interactions such as isoleucine-leucine. In general, the inclusion of the solvent-residue interaction produces a considerable increase in the free energy gap between the native structures and decoys.
Affiliation(s)
- Riccardo Chelli
- Dipartimento di Chimica, Università di Firenze, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy
|
18
|
Choi IG, Kwon J, Kim SH. Local feature frequency profile: a method to measure structural similarity in proteins. Proc Natl Acad Sci U S A 2004; 101:3797-802. [PMID: 14985506 PMCID: PMC374324 DOI: 10.1073/pnas.0308656100] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.1] [Indexed: 11/18/2022] Open
Abstract
Measures of structural similarity between known protein structures provide an objective basis for classifying protein folds and for revealing a global view of the protein structure universe. Here, we describe a rapid method to measure structural similarity based on the profiles of representative local features of Cα distance matrices of compared protein structures. We first extract a finite number of representative local feature (LF) patterns from the distance matrices of all protein fold families by medoid analysis. Then, each Cα distance matrix of a protein structure is encoded by labeling all its submatrices by the index of the nearest representative LF patterns. Finally, the structure is represented by the frequency distribution of these indices, which we call the LF frequency (LFF) profile of the protein. The LFF profile allows one to calculate structural similarity scores among a large number of protein structures quickly, and also to construct and update the "map" of the protein structure universe easily. The LFF profile method efficiently maps complex protein structures into a common Euclidean space without prior assignment of secondary structure information or structural alignment.
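The encoding step described in this abstract (label every submatrix of a distance matrix with its nearest representative pattern, then take the frequency distribution of the labels) can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the function names, the overlapping k×k windows, the Euclidean nearest-pattern rule and the toy numbers are all assumptions.

```python
from collections import Counter
from math import dist  # Python 3.8+


def submatrices(dmat, k):
    """Yield every overlapping k x k window of a distance matrix, flattened."""
    n = len(dmat)
    for i in range(n - k + 1):
        for j in range(n - k + 1):
            yield tuple(dmat[i + a][j + b] for a in range(k) for b in range(k))


def lff_profile(dmat, patterns, k):
    """Frequency of each representative pattern index over all windows."""
    counts = Counter(
        # Index of the nearest representative pattern (assumed Euclidean).
        min(range(len(patterns)), key=lambda p: dist(sub, patterns[p]))
        for sub in submatrices(dmat, k)
    )
    total = sum(counts.values())
    return [counts[p] / total for p in range(len(patterns))]


# Hypothetical 3-residue distance matrix and two 2 x 2 representative patterns.
toy_dmat = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
toy_patterns = [(0, 1, 1, 0), (1, 1, 1, 1)]
profile = lff_profile(toy_dmat, toy_patterns, k=2)
```

Because every structure is reduced to a vector of the same length (one entry per pattern), two profiles can be compared directly, for example with a Euclidean distance, without any structural alignment.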
Affiliation(s)
- In-Geol Choi
- Department of Chemistry, University of California, and Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
|
19
|
|
20
|
Tropsha A, Carter CW, Cammer S, Vaisman II. Simplicial neighborhood analysis of protein packing (SNAPP): a computational geometry approach to studying proteins. Methods Enzymol 2003; 374:509-44. [PMID: 14696387 DOI: 10.1016/s0076-6879(03)74022-1] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.3] [Indexed: 01/26/2023]
Affiliation(s)
- Alexander Tropsha
- Department of Medicinal Chemistry and Natural Products, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
|
21
|
Felts AK, Gallicchio E, Wallqvist A, Levy RM. Distinguishing native conformations of proteins from decoys with an effective free energy estimator based on the OPLS all-atom force field and the Surface Generalized Born solvent model. Proteins 2002; 48:404-22. [PMID: 12112706 DOI: 10.1002/prot.10171] [Citation(s) in RCA: 113] [Impact Index Per Article: 5.1] [Indexed: 11/10/2022]
Abstract
Protein decoy data sets provide a benchmark for testing scoring functions designed for fold recognition and protein homology modeling problems. It is commonly believed that statistical potentials based on reduced atomic models are better able to discriminate native-like from misfolded decoys than scoring functions based on more detailed molecular mechanics models. Recent benchmark tests on small data sets, however, suggest otherwise. In this work, we report the results of extensive decoy detection tests using an effective free energy function based on the OPLS all-atom (OPLS-AA) force field and the Surface Generalized Born (SGB) model for the solvent electrostatic effects. The OPLS-AA/SGB effective free energy is used as a scoring function to detect native protein folds among a total of 48,832 decoys for 32 different proteins from Park and Levitt's 4-state-reduced, Levitt's local-minima, Baker's ROSETTA all-atom, and Skolnick's decoy sets. Solvent electrostatic effects are included through the Surface Generalized Born (SGB) model. All structures are locally minimized without restraints. From an analysis of the individual energy components of the OPLS-AA/SGB energy function for the native and the best-ranked decoy, it is determined that a balance of the terms of the potential is responsible for the minimized energies that most successfully distinguish the native from the misfolded conformations. Different combinations of individual energy terms provide less discrimination than the total energy. The results are consistent with observations that all-atom molecular potentials coupled with intermediate level solvent dielectric models are competitive with knowledge-based potentials for decoy detection and protein modeling problems such as fold recognition and homology modeling.
Affiliation(s)
- Anthony K Felts
- Department of Chemistry and Chemical Biology, Rutgers University, Wright-Rieman Laboratories, Piscataway, New Jersey 08854-8087, USA.
|
22
|
Kabakçioglu A, Kanter I, Vendruscolo M, Domany E. Statistical properties of contact vectors. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2002; 65:041904. [PMID: 12005870 DOI: 10.1103/physreve.65.041904] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.8] [Received: 09/01/2001] [Indexed: 05/23/2023]
Abstract
We study the statistical properties of contact vectors, a construct to characterize a protein's structure. The contact vector of an N-residue protein is a list of N integers n(i), representing the number of residues in contact with residue i. We study analytically (at mean-field level) and numerically the amount of structural information contained in a contact vector. Analytical calculations reveal that a large variance in the contact numbers reduces the degeneracy of the mapping between contact vectors and structures. Exact enumeration for lengths up to N=16 on the three-dimensional cubic lattice indicates that the growth rate of number of contact vectors as a function of N is only 3% less than that for contact maps. In particular, for compact structures we present numerical evidence that, practically, each contact vector corresponds to only a handful of structures. We discuss how this information can be used for better structure prediction.
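The contact-vector definition in this abstract can be illustrated with a short sketch. The helper names and the 4-residue toy maps are hypothetical, chosen only to show how n(i) is read off a contact map and how two different maps can share one contact vector (the degeneracy the abstract refers to); this is not the authors' code.

```python
def contact_map_from_pairs(n_residues, pairs):
    """Build a symmetric 0/1 contact map from a list of (i, j) contacts."""
    cmap = [[0] * n_residues for _ in range(n_residues)]
    for i, j in pairs:
        cmap[i][j] = 1
        cmap[j][i] = 1
    return cmap


def contact_vector(cmap):
    """Return n(i): the number of residues in contact with residue i."""
    # The diagonal is zero, so a plain row sum counts only other residues.
    return [sum(row) for row in cmap]


# Two hypothetical toy maps with different contacts...
map_a = contact_map_from_pairs(4, [(0, 2), (1, 3)])
map_b = contact_map_from_pairs(4, [(0, 3), (1, 2)])

# ...that nevertheless produce the same contact vector.
vector_a = contact_vector(map_a)
vector_b = contact_vector(map_b)
```

In this toy case every residue has exactly one contact, so the vector carries no variance and cannot tell the two maps apart, which is in line with the abstract's remark that a large variance in contact numbers reduces the degeneracy.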
Affiliation(s)
- A Kabakçioglu
- Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot 76100, Israel
|
23
|
Fariselli P, Olmea O, Valencia A, Casadio R. Prediction of contact maps with neural networks and correlated mutations. PROTEIN ENGINEERING 2001; 14:835-43. [PMID: 11742102 DOI: 10.1093/protein/14.11.835] [Citation(s) in RCA: 149] [Impact Index Per Article: 6.5] [Indexed: 11/12/2022]
Abstract
Contact maps of proteins are predicted with neural network-based methods, using as input codings of increasing complexity including evolutionary information, sequence conservation, correlated mutations and predicted secondary structures. Neural networks are trained on a data set comprising the contact maps of 173 non-homologous proteins as computed from their well resolved three-dimensional structures. Proteins are selected from the Protein Data Bank database provided that they align with at least 15 similar sequences in the corresponding families. The predictors are trained to learn the association rules between the covalent structure of each protein and its contact map with a standard back propagation algorithm and tested on the same protein set with a cross-validation procedure. Our results indicate that the method can assign protein contacts with an average accuracy of 0.21 and with an improvement over a random predictor of a factor >6, which is higher than that previously obtained with methods only based either on neural networks or on correlated mutations. Furthermore, filtering the network outputs with a procedure based on the residue coordination numbers, the accuracy of predictions increases up to 0.25 for all the proteins, with an 8-fold deviation from a random predictor. These scores are the highest reported so far for predicting protein contact maps.
Affiliation(s)
- P Fariselli
- CIRB and Department of Biology, University of Bologna, via Irnerio 42, Bologna, Italy
|
24
|
Simon I, Fiser A, Tusnády GE. Predicting protein conformation by statistical methods. BIOCHIMICA ET BIOPHYSICA ACTA 2001; 1549:123-36. [PMID: 11690649 DOI: 10.1016/s0167-4838(01)00253-9] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.6] [Indexed: 10/18/2022]
Abstract
The unique folded structure makes a polypeptide a functional protein. The number of known sequences is about a hundred times larger than the number of known structures and the gap is increasing rapidly. The primary goal of all structure prediction methods is to obtain structure-related information on proteins, whose structures have not been determined experimentally. Besides this goal, the development of accurate prediction methods helps to reveal principles of protein folding. Here we present a brief survey of protein structure predictions based on statistical analyses of known sequence and structure data. We discuss the background of these methods and attempt to elucidate principles, which govern structure formation of soluble and membrane proteins.
Affiliation(s)
- I Simon
- Institute of Enzymology, BRC, Hungarian Academy of Sciences, Budapest, Hungary.
|
25
|
|
26
|
|
27
|
Abstract
We discuss the problem of representations of protein structure and give the definition of contact maps. We present a method to obtain a three-dimensional polypeptide conformation from a contact map. We also explain how to deal with the case of nonphysical contact maps. We describe a stochastic method to perform dynamics in contact map space. We explain how the motion is restricted to physical regions of the space. First, we introduce the exact free energy of a contact map and discuss two simple approximations to it. Second, we present a method to derive energy parameters based on perceptron learning. We prove in an extensive number of situations that the pairwise contact approximation both when alone and when supplemented with a hydrophobic term is unsuitable for stabilizing proteins' native states.
Affiliation(s)
- M Vendruscolo
- Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel
|
28
|
Vendruscolo M, Najmanovich R, Domany E. Can a pairwise contact potential stabilize native protein folds against decoys obtained by threading? Proteins 2000; 38:134-48. [PMID: 10656261 DOI: 10.1002/(sici)1097-0134(20000201)38:2<134::aid-prot3>3.0.co;2-a] [Citation(s) in RCA: 95] [Impact Index Per Article: 4.0] [Indexed: 11/10/2022]
Abstract
We present a method to derive contact energy parameters from large sets of proteins. The basic requirement on which our method is based is that for each protein in the database the native contact map has lower energy than all its decoy conformations that are obtained by threading. Only when this condition is satisfied can one use the proposed energy function for fold identification. Such a set of parameters can be found (by perceptron learning) if Mp, the number of proteins in the database, is not too large. Other aspects that influence the existence of such a solution are the exact definition of contact and the value of the critical distance Rc, below which two residues are considered to be in contact. Another important novel feature of our approach is its ability to determine whether an energy function of some suitable proposed form can or cannot be parameterized in a way that satisfies our basic requirement. As a demonstration of this, we determine the region in the (Rc, Mp) plane in which the problem is solvable, i.e., we can find a set of contact parameters that stabilize simultaneously all the native conformations. We show that for large enough databases the contact approximation to the energy cannot stabilize all the native folds even against the decoys obtained by gapless threading.
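The basic requirement stated in this abstract (native energy below every decoy energy) is a set of linear inequalities in the contact parameters, which is what makes perceptron learning applicable. The sketch below is a generic perceptron update under stated assumptions, not the authors' procedure: the energy is taken to be a weighted sum of contact-type counts, and the function names, the two-letter alphabet and the counts are hypothetical.

```python
def energy(weights, counts):
    """Pairwise contact energy: sum over contact types of w_type * count_type."""
    return sum(w * c for w, c in zip(weights, counts))


def learn_contact_energies(native, decoys, rate=1.0, max_epochs=1000):
    """Perceptron-style search for weights with E(native) < E(decoy) for all decoys.

    Returns the weights, or None if no solution was found within max_epochs.
    """
    # Each decoy gives a difference vector x = counts(decoy) - counts(native);
    # the requirement E(decoy) > E(native) is then w . x > 0.
    diffs = [[d - n for d, n in zip(decoy, native)] for decoy in decoys]
    weights = [0.0] * len(native)
    for _ in range(max_epochs):
        updated = False
        for x in diffs:
            if energy(weights, x) <= 0:  # violated constraint: nudge w toward x
                weights = [w + rate * xi for w, xi in zip(weights, x)]
                updated = True
        if not updated:
            return weights
    return None


# Hypothetical contact counts for a two-letter (H/P) alphabet: (HH, HP, PP).
native_counts = [3, 1, 0]
decoy_counts = [[1, 2, 1], [0, 3, 1], [2, 1, 1]]
weights = learn_contact_energies(native_counts, decoy_counts)
```

A `None` result here only means that no solution was found within `max_epochs`; it hints at, but does not by itself prove, that the inequalities are unsolvable.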
Affiliation(s)
- M Vendruscolo
- Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel.
|
29
|
Casadio R, Compiani M, Fariselli P, Jacoboni I, Martelli PL. Neural networks predict protein folding and structure: artificial intelligence faces biomolecular complexity. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2000; 11:149-182. [PMID: 10877475 DOI: 10.1080/10629360008039120] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 05/23/2023]
Abstract
In the genomic era DNA sequencing is increasing our knowledge of the molecular structure of genetic codes from bacteria to man at a hyperbolic rate. Billions of nucleotides and millions of amino acids are already filling the electronic files of the databases presently available, which contain a tremendous amount of information on the most biologically relevant macromolecules, such as DNA, RNA and proteins. The most urgent problem originates from the need to single out the relevant information amidst a wealth of general features. Intelligent tools are therefore needed to optimise the search. Data mining for sequence analysis in biotechnology has been substantially aided by the development of new powerful methods borrowed from the machine learning approach. In this paper we discuss the application of artificial feedforward neural networks to deal with some fundamental problems tied to the folding process and the structure-function relationship in proteins.
Affiliation(s)
- R Casadio
- Laboratory of Biocomputing, Centro Interdipartimentale per le Ricerche Biotecnologiche (CIRB), University of Bologna, Italy.
|
30
|
Abstract
We studied the possibility of approximating a Lennard-Jones interaction by a pairwise contact potential. First we used a Lennard-Jones potential to design off-lattice, protein-like heteropolymer sequences, whose lowest energy (native) conformations were then identified by molecular dynamics. Then we turned to investigate whether one can find a pairwise contact potential, whose ground states are the contact maps associated with these native conformations. We show that such a requirement cannot be satisfied exactly, i.e., no such contact parameters exist. Nevertheless, we found that one can find contact energy parameters for which an energy minimization procedure, acting in the space of contact maps, yields maps whose corresponding structures are close to the native ones. Finally, we show that when these structures are used as the initial point of a molecular dynamics energy minimization process, the correct native folds are recovered with high probability.
Affiliation(s)
- C Clementi
- International School for Advanced Studies (SISSA) and Istituto Nazionale di Fisica della Materia, Trieste, Italy.
|
31
|
Abstract
It is generally accepted that many different protein sequences have similar folded structures, and that there is a relatively high probability that a new sequence possesses a previously observed fold. An indirect consequence of this is that protein design should define the sequence space accessible to a given structure, rather than providing a single optimized sequence. We have recently developed a new approach for protein sequence design, which optimizes the complete sequence of a protein based on the knowledge of its backbone structure, its amino acid composition and a physical energy function including van der Waals interactions, electrostatics, and environment free energy. The specificity of the designed sequence for its template backbone is imposed by keeping the amino acid composition fixed. Here, we show that our procedure converges in sequence space, albeit not to the native sequence of the protein. We observe that while polar residues are well conserved in our designed sequences, non-polar amino acids at the surface of a protein are often replaced by polar residues. The designed sequences provide a multiple alignment of sequences that all adopt the same three-dimensional fold. This alignment is used to derive a profile matrix for chicken triose phosphate isomerase, TIM. The matrix is found to recognize significantly the native sequence for TIM, as well as closely related sequences. Possible application of this approach to protein fold recognition is discussed.
Affiliation(s)
- P Koehl
- Department of Structural Biology, Fairchild Building, Stanford University, Stanford, CA 94305, USA.
|
32
|
Chowdhury D, Stauffer D, Strey R. Periodicity-dependent stiffness of periodic hydrophilic-hydrophobic heteropolymers. PHYSICAL REVIEW. E, STATISTICAL PHYSICS, PLASMAS, FLUIDS, AND RELATED INTERDISCIPLINARY TOPICS 1999; 60:R1158-61. [PMID: 11969942 DOI: 10.1103/physreve.60.r1158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/1999] [Indexed: 04/18/2023]
Abstract
From extensive Monte Carlo simulations of a Larson model of perfectly periodic heteropolymers (PHP) in water, a striking stiffening is observed as the period of the alternating hydrophobic and hydrophilic blocks is shortened. At short period and low temperature needlelike conformations are the stable conformations. As temperature is increased thermal fluctuations induce kinks and bends. At large periods compact oligomeric globules are observed. From the generalized Larson prescription, originally developed for modeling surfactant molecules in aqueous solutions, we find that the shorter the period is the more stretched the PHP is. This novel effect is expected to stimulate polymer synthesis and trigger research on the rheology of aqueous periodic heteropolymer solutions.
Affiliation(s)
- D Chowdhury
- Institute for Theoretical Physics, University of Cologne, D-50923 Köln, Germany
|
33
|
Carlsson AE. Simplified calculation of folding energies and residue coordination numbers in random heteropolymers. PHYSICAL REVIEW. E, STATISTICAL PHYSICS, PLASMAS, FLUIDS, AND RELATED INTERDISCIPLINARY TOPICS 1999; 59:5995-6000. [PMID: 11969582 DOI: 10.1103/physreve.59.5995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/1998] [Indexed: 04/18/2023]
Abstract
I develop a formalism for calculating effective pair and higher-order interactions between residues in random heteropolymers that approximately predict the folding enthalpy and the coordination numbers of individual residues. In a simple model heteropolymer with additive couplings between residues, the folding enthalpy is written in terms of two-, three-, and four-body interactions between residues. The coordination numbers are expressed in terms of interactions between up to three residues. Application to a 6×6 square model shows that the folding enthalpy is obtained to an accuracy of better than 1%. The coordination numbers are obtained with an rms error of 1.2 neighbors.
Affiliation(s)
- A E Carlsson
- Department of Physics, CB 1105, Washington University, St. Louis, Missouri 63130, USA
|
34
|
Fariselli P, Casadio R. A neural network based predictor of residue contacts in proteins. PROTEIN ENGINEERING 1999; 12:15-21. [PMID: 10065706 DOI: 10.1093/protein/12.1.15] [Citation(s) in RCA: 115] [Impact Index Per Article: 4.6] [Indexed: 11/12/2022]
Abstract
We describe a method based on neural networks for predicting contact maps of proteins using as input chemicophysical and evolutionary information. Neural networks are trained on a data set comprising the contact maps of 200 non-homologous proteins of well resolved three-dimensional structures. The systems learn the association rules between the covalent structure of each protein and its corresponding contact map by means of a standard back propagation algorithm. Validation of the predictor on the training set and on 408 proteins of known structure which are not homologous to those contained in the training set indicates that this method scores higher than statistical approaches previously described and based on correlated mutations and sequence information.
Affiliation(s)
- P Fariselli
- Biocomputing Group (Centro Interdipartimentale per le Ricerche Biotecnologiche), Bologna, Italy
|
35
|
Vendruscolo M, Domany E. Pairwise contact potentials are unsuitable for protein folding. J Chem Phys 1998. [DOI: 10.1063/1.477748] [Citation(s) in RCA: 115] [Impact Index Per Article: 4.4] [Indexed: 11/14/2022] Open
|
36
|
Abstract
BACKGROUND Two problems are of major importance in protein fold prediction: how to generate plausible conformations, and how to choose an energy function to identify the native state. Contact maps are a simple representation of protein structure and offer a promising framework to address these two issues. RESULTS In this work we develop Monte Carlo dynamics in contact map space. The procedure is divided into four steps: non-local dynamics, in which large-scale "cluster" moves are performed (clusters are in approximate correspondence with secondary structure elements); local dynamics, in which secondary structure location is optimized; reconstruction, in which the physicality of the contact map is restored; and refinement, which consists of a further Monte Carlo energy minimization in real space. We demonstrate that such a dynamical procedure is effective in producing uncorrelated low-energy states. CONCLUSIONS The procedure introduced in this paper very effectively generates a representative ensemble of conformations. We are able to show that existing sets of pairwise contact energy parameters are not suitable to single out the native state within this ensemble. The remaining outstanding issue in protein folding is to find an energy function that can discriminate the native state from decoys.
Affiliation(s)
- M Vendruscolo
- Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot 76100, Israel.
|
37
|
Abstract
Genome sequencing projects continue to provide a flood of new protein sequences, and prediction methods remain an important means of adding structural information. Recently, there have been advances in secondary structure prediction, which feed, in turn, into improved fold recognition algorithms. Finally, there have been technical improvements in comparative modelling, and studies of the expected accuracy of three-dimensional structural models built by this method.
Affiliation(s)
- D R Westhead
- The European Bioinformatics Institute, EMBL Outstation, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
|
38
|
Abstract
The Z-score of a protein is defined as the energy separation between the native fold and the average of an ensemble of misfolds in the units of the standard deviation of the ensemble. The Z-score is often used as a way of testing the knowledge-based potentials for their ability to recognize the native fold from other alternatives. However, it is not known what range of values the Z-scores should have if one had a correct potential. Here, we offer an estimate of Z-scores extracted from calorimetric measurements of proteins. The energies obtained from these experimental data are compared with those from computer simulations of a lattice model protein. It is suggested that the Z-scores calculated from different knowledge-based potentials are generally too small in comparison with the experimental values.
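The Z-score definition given in this abstract translates directly into a few lines of code. The sketch below assumes the sign convention Z = (E_native − ⟨E_misfolds⟩) / σ, so that a well-separated native state gives a negative value, and uses the population standard deviation; both choices, the function name and the energies are illustrative assumptions rather than details taken from the paper.

```python
from statistics import mean, pstdev


def z_score(native_energy, decoy_energies):
    """(E_native - mean of misfold energies) / standard deviation of the ensemble."""
    return (native_energy - mean(decoy_energies)) / pstdev(decoy_energies)


# Hypothetical energies in arbitrary units.
decoys = [-4.0, -2.0, 0.0, 2.0, 4.0]
z = z_score(-10.0, decoys)
```

With these toy numbers the native energy lies about 3.5 standard deviations below the ensemble mean.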
Affiliation(s)
- L Zhang
- Department of Molecular Biology, The Scripps Research Institute, La Jolla, California 92037, USA
|
39
|
Vendruscolo M, Kussell E, Domany E. Recovery of protein structure from contact maps. FOLDING & DESIGN 1998; 2:295-306. [PMID: 9377713 DOI: 10.1016/s1359-0278(97)00041-2] [Citation(s) in RCA: 196] [Impact Index Per Article: 7.5] [Indexed: 02/05/2023]
Abstract
BACKGROUND Prediction of a protein's structure from its amino acid sequence is a key issue in molecular biology. While dynamics, performed in the space of two-dimensional contact maps, eases the necessary conformational search, it may also lead to maps that do not correspond to any real three-dimensional structure. To remedy this, an efficient procedure is needed to reconstruct three-dimensional conformations from their contact maps. RESULTS We present an efficient algorithm to recover the three-dimensional structure of a protein from its contact map representation. We show that when a physically realizable map is used as target, our method generates a structure whose contact map is essentially similar to the target. Furthermore, the reconstructed and original structures are similar up to the resolution of the contact map representation. Next, we use nonphysical target maps, obtained by corrupting a physical one; in this case, our method essentially recovers the underlying physical map and structure. Hence, our algorithm will help to fold proteins, using dynamics in the space of contact maps. Finally, we investigate the manner in which the quality of the recovered structure degrades when the number of contacts is reduced. CONCLUSIONS The procedure is capable of assigning quickly and reliably a three-dimensional structure to a given contact map. It is well suited for use in parallel with dynamics in contact map space to project a contact map onto its closest physically allowed structural counterpart.
Affiliation(s)
- M Vendruscolo
- Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel
|
40
|
Abstract
The two-dimensional contact map of interresidue distances is a visual analysis technique for protein structures. We present two standalone software tools designed to be used in combination to increase the versatility of this simple yet powerful technique. First, the program Structer calculates contact maps from three-dimensional molecular structural data. The contact map matrix can then be viewed in the graphical matrix-visualization program Dotter. Instead of using a predefined distance cutoff, we exploit Dotter's dynamic rendering control, allowing interactive exploration at varying distance cutoffs after calculating the matrix once. Structer can use a number of distance measures, can incorporate multiple chains in one contact map, and allows masking of user-defined residue sets. It works either directly with PDB files, or can use the MMDB network API for reading structures.
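The idea of calculating the distance matrix once and then exploring several cutoffs, as this abstract describes, can be sketched in a few lines. This is an independent illustration and not the Structer or Dotter code: the function names, the `min_separation` parameter and the coordinates are hypothetical.

```python
from math import dist  # Python 3.8+


def distance_matrix(coords):
    """All pairwise Euclidean distances between residue coordinates."""
    return [[dist(a, b) for b in coords] for a in coords]


def contacts_at_cutoff(dmat, cutoff, min_separation=1):
    """List residue pairs (i, j), i < j, that are closer than `cutoff`."""
    n = len(dmat)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + min_separation, n)
        if dmat[i][j] < cutoff
    ]


# Hypothetical coordinates (in Å): four residues spaced 3.8 Å apart on a line.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
dmat = distance_matrix(coords)  # the expensive step, done once

# The same matrix is then thresholded at several cutoffs.
contacts_by_cutoff = {c: contacts_at_cutoff(dmat, c) for c in (4.0, 8.0, 12.0)}
```

Keeping the real-valued matrix rather than a pre-thresholded 0/1 map is what allows the cutoff to be changed afterwards without touching the coordinates again.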
Affiliation(s)
- E L Sonnhammer
- Computational Biology Branch, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
|
41
|
Zhang L, Skolnick J. How do potentials derived from structural databases relate to "true" potentials? Protein Sci 1998; 7:112-22. [PMID: 9514266 PMCID: PMC2143818 DOI: 10.1002/pro.5560070112] [Citation(s) in RCA: 43] [Impact Index Per Article: 1.7] [Indexed: 11/08/2022]
Abstract
Knowledge-based potentials are used widely in protein folding and inverse folding algorithms. Two kinds of derivation methods are used. (1) The interactions in a database of known protein structures are assumed to obey a Boltzmann distribution. (2) The stability of the native folds relative to a manifold of misfolded structures is optimized. Here, a set of previously derived contact and secondary structure propensity potentials, taken as the "true" potentials, are employed to construct an artificial protein structural database from protein fragments. Then, new sets of potentials are derived to see how they are related to the true potentials. Using the Boltzmann distribution method, when the stability of the structures in the database lies within a certain range, both contact potentials and secondary structure propensities can be derived separately with remarkable accuracy. In general, the optimization method was found to be less accurate due to errors in the "excess energy" contribution. When the excess energy terms are kept as a constraint, the true potentials are recovered exactly.
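Derivation method (1) in this abstract, which assumes that database interactions obey a Boltzmann distribution, is commonly written as e = −kT ln(N_observed / N_expected). The sketch below shows that inversion under stated assumptions: the abstract does not specify the reference state, so the expected counts, the function name and all numbers are hypothetical.

```python
from math import log


def boltzmann_potential(observed, expected, kT=1.0):
    """e(pair) = -kT * ln(observed / expected) for each contact type."""
    return {
        pair: -kT * log(observed[pair] / expected[pair])
        for pair in observed
    }


# Hypothetical contact counts for a two-letter (H/P) alphabet.
observed = {"HH": 200, "HP": 100, "PP": 50}
expected = {"HH": 100, "HP": 100, "PP": 100}
potential = boltzmann_potential(observed, expected)
```

Contact types seen more often than expected come out attractive (negative), those seen less often come out repulsive (positive), and a type seen exactly as often as expected gets zero energy.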
Affiliation(s)
- L Zhang
- Department of Molecular Biology, The Scripps Research Institute, La Jolla, California 92037, USA
|
42
|
Mirny LA, Shakhnovich EI. How to derive a protein folding potential? A new approach to an old problem. J Mol Biol 1996; 264:1164-79. [PMID: 9000638 DOI: 10.1006/jmbi.1996.0704] [Citation(s) in RCA: 230] [Impact Index Per Article: 8.2] [Indexed: 02/03/2023]
Abstract
In this paper we introduce a novel method of deriving a pairwise potential for protein folding. The potential is obtained by an optimization procedure that simultaneously maximizes thermodynamic stability for all proteins in the database. When applied to the representative dataset of proteins and with the energy function taken in pairwise contact approximation, our potential scored somewhat better than existing ones. However, the discrimination of the native structure from decoys is still not strong enough to make the potential useful for ab initio folding. Our results suggest that the problem lies with pairwise amino acid contact approximation and/or simplified presentation of proteins rather than with the derivation of potential. We argue that more detail of protein structure and energetics should be taken into account to achieve energy gaps. The suggested method is general enough to allow us to systematically derive parameters for more sophisticated energy functions. The internal control of validity for the potential derived by our method is convergence to a unique solution upon addition of new proteins to the database. The method is tested on simple model systems where sequences are designed, using the preset "true" potential, to have low energy in a dataset of structures. Our procedure is able to recover the potential with correlation r ≈ 91% with the true one and we were able to fold all model structures using the recovered potential. Other statistical knowledge-based approaches were tested using this model and the results indicate that they also can recover the true potential with high degree of accuracy.
Affiliation(s)
- L A Mirny
- Harvard University, Department of Chemistry, Cambridge, MA 02138, USA
|