1
|
Nonnative Energetic Frustrations in Protein Folding at Residual Level: A Simulation Study of Homologous Immunoglobulin-like β-Sandwich Proteins. Int J Mol Sci 2018; 19:ijms19051515. [PMID: 29783701 PMCID: PMC5983731 DOI: 10.3390/ijms19051515] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2018] [Revised: 05/08/2018] [Accepted: 05/09/2018] [Indexed: 11/16/2022] Open
Abstract
Nonnative interactions cause energetic frustrations in protein folding and were found to dominate key events in folding intermediates. However, a systematic characterization of the energetic frustrations caused by nonnative intra-residue interactions at residue-level resolution is still lacking. Recently, we studied the folding of a set of homologous all-α proteins and found that nonnative-contact-based energetic frustrations are highly correlated with the topology of the protein native-contact network. Here, we studied the folding of nine homologous immunoglobulin-like (Ig-like) β-sandwich proteins and examined nonnative-contact-based energetic frustrations in a Gō-like model. Our calculations showed that nonnative-interaction-based energetic frustrations in β-sandwich proteins are much more complicated than those in all-α proteins, and that they exhibit highly heterogeneous effects on the folding of secondary structures. Further, the nonnative interactions introduced distinct correlations in the folding of different folding-patches of β-sandwich proteins. Taken together, a strong interplay might exist between nonnative-interaction energetic frustrations and the protein native-contact networks, which ensures that β-sandwich domains adopt a common folding mechanism.
|
2
|
Pasi M, Tiberti M, Arrigoni A, Papaleo E. xPyder: a PyMOL plugin to analyze coupled residues and their networks in protein structures. J Chem Inf Model 2012; 52:1865-74. [PMID: 22721491 DOI: 10.1021/ci300213c] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.3] [Indexed: 12/21/2022]
Abstract
A versatile method to directly identify and analyze short- or long-range coupled or communicating residues in a protein conformational ensemble is of extreme relevance to achieve a complete understanding of protein dynamics and structural communication routes. Here, we present xPyder, an interface between one of the most employed molecular graphics systems, PyMOL, and the analysis of dynamical cross-correlation matrices (DCCM). The approach can also be extended, in principle, to matrices including other indexes of communication propensity or intensity between protein residues, as well as the persistence of intra- or intermolecular interactions, such as those underlying protein dynamics. The xPyder plugin for PyMOL 1.4 and 1.5 is offered as Open Source software via the GPL v2 license, and it can be found, along with the installation package, the user guide, and examples, at http://linux.btbs.unimib.it/xpyder/.
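For orientation, the kind of dynamical cross-correlation matrix that the abstract refers to can be sketched in a few lines. This is a minimal illustration and not xPyder's code: the function name `dccm` and the input layout (a list of already superimposed frames, each a list of per-residue (x, y, z) tuples) are assumptions, and it uses the commonly quoted normalized form C_ij = <dr_i . dr_j> / sqrt(<dr_i^2> <dr_j^2>).

```python
import math


def dccm(frames):
    """Normalized cross-correlation of residue displacements.

    frames: list of frames; each frame is a list of (x, y, z) tuples,
    one per residue. The frames are assumed to be superimposed already.
    """
    n_frames = len(frames)
    n_res = len(frames[0])

    # Average position of every residue over the ensemble.
    mean = [
        [sum(frame[i][k] for frame in frames) / n_frames for k in range(3)]
        for i in range(n_res)
    ]
    # Displacement of every residue from its average, frame by frame.
    disp = [
        [[frame[i][k] - mean[i][k] for k in range(3)] for i in range(n_res)]
        for frame in frames
    ]

    def avg_dot(i, j):
        # Ensemble average of the dot product of two displacement vectors.
        total = sum(sum(d[i][k] * d[j][k] for k in range(3)) for d in disp)
        return total / n_frames

    var = [avg_dot(i, i) for i in range(n_res)]
    matrix = [[0.0] * n_res for _ in range(n_res)]
    for i in range(n_res):
        for j in range(n_res):
            denom = math.sqrt(var[i] * var[j])
            # A residue that never moves has no defined correlation; use 0.
            matrix[i][j] = avg_dot(i, j) / denom if denom > 0 else 0.0
    return matrix


# Toy ensemble: residues 0 and 1 move together, residue 2 moves the other way.
toy_frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
    [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0)],
]
toy_matrix = dccm(toy_frames)
```

In the toy ensemble the first two residues are fully correlated and the third is fully anti-correlated with them, which is the kind of pattern such a matrix is meant to expose.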
Affiliation(s)
- Marco Pasi
- Department of Biotechnology and Biosciences, University of Milano-Bicocca, P.zza della Scienza 2, 20126 Milan, Italy
|
3
|
|
4
|
Nasrallah CA, Mathews DH, Huelsenbeck JP. Quantifying the impact of dependent evolution among sites in phylogenetic inference. Syst Biol 2010; 60:60-73. [PMID: 21081481 PMCID: PMC2997629 DOI: 10.1093/sysbio/syq074] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Indexed: 11/24/2022] Open
Abstract
Nearly all commonly used methods of phylogenetic inference assume that characters in an alignment evolve independently of one another. This assumption is attractive for simplicity and computational tractability but is not biologically reasonable for RNAs and proteins that have secondary and tertiary structures. Here, we simulate RNA and protein-coding DNA sequence data under a general model of dependence in order to assess the robustness of traditional methods of phylogenetic inference to violation of the assumption of independence among sites. We find that the accuracy of independence-assuming methods is reduced by the dependence among sites; for proteins this reduction is relatively mild, but for RNA this reduction may be substantial. We introduce the concept of effective sequence length and its utility for considering information content in phylogenetics.
Affiliation(s)
- Chris A Nasrallah
- Department of Integrative Biology, University of California, Berkeley, 3060 Valley Life Sciences Building #3140, Berkeley, CA 94720-3140, USA.
|
5
|
Gupta N, Mangal N, Biswas S. Evolution and similarity evaluation of protein structures in contact map space. Proteins 2006; 59:196-204. [PMID: 15726585 DOI: 10.1002/prot.20415] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.8] [Indexed: 11/05/2022]
Abstract
Prediction of fold from the amino acid sequence of a protein has been an active area of research in the past few years, but the limited accuracy of existing techniques emphasizes the need to develop newer approaches to tackle this task. In this study, we use contact map prediction as an intermediate step in fold prediction from sequence. A contact map is a reduced graph-theoretic representation of a protein that models the local and global inter-residue contacts in the structure. We start with a population of random contact maps for the protein sequence and "evolve" the population to a "high-feasibility" configuration using a genetic algorithm. A neural network is employed to assess the feasibility of contact maps based on 4 physically relevant properties. We also introduce 5 parameters, based on algebraic graph theory and physical considerations, that can be used to judge the structural similarity between proteins through contact maps. To predict the fold of a given amino acid sequence, we predict a contact map that will sufficiently approximate the structure of the corresponding protein. Then we assess the similarity of this contact map with the representative contact map of each fold; the fold that corresponds to the closest match is our predicted fold for the input sequence. We have found that our feasibility measure is able to differentiate between feasible and infeasible contact maps. Further, this novel approach is able to predict the folds from sequences significantly better than a random predictor.
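To make the contact-map representation concrete, the sketch below builds a binary contact map from residue coordinates. It is only an illustration of the general idea, not the authors' procedure: the function name `contact_map`, the 8.0 Å default cutoff, and the `min_separation` parameter are assumptions that do not come from the abstract.

```python
import math


def contact_map(coords, cutoff=8.0, min_separation=1):
    """Return a symmetric 0/1 matrix marking residue pairs within `cutoff`.

    coords: list of (x, y, z) tuples, one representative atom per residue.
    cutoff: distance threshold in the same units as coords (assumed value).
    min_separation: smallest sequence separation |i - j| that is considered.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                cmap[i][j] = 1
                cmap[j][i] = 1  # contacts are undirected
    return cmap


# Four residues on a line: the first three are close, the last is far away.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
toy_map = contact_map(toy_coords)
```

Read as an adjacency matrix, such a map is the graph that graph-theoretic similarity parameters of the kind mentioned in the abstract would operate on.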
Affiliation(s)
- Nitin Gupta
- Department of Computer Science and Engineering, Indian Institute of Technology Kanpur, Kanpur, India.
|
6
|
Heo M, Kim S, Moon EJ, Cheon M, Chung K, Chang I. Perceptron learning of pairwise contact energies for proteins incorporating the amino acid environment. Phys Rev E Stat Nonlin Soft Matter Phys 2005; 72:011906. [PMID: 16090000 DOI: 10.1103/physreve.72.011906] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 02/06/2004] [Revised: 05/10/2005] [Indexed: 05/03/2023]
Abstract
Although a coarse-grained description of proteins is a simple and convenient way to attack the protein folding problem, the construction of a global pairwise energy function that can simultaneously recognize the native folds of many proteins has met with only partial success. We have sought the possibility of a systematic improvement of this pairwise-contact energy function as we extended the parameter space of amino acids, incorporating local environments of amino acids, beyond a 20 x 20 matrix. We have studied the pairwise contact energy functions of 20 x 20, 60 x 60, and 180 x 180 matrices depending on the extent of parameter space, and compared their effect on the learnability of energy parameters in the context of gapless threading, bearing in mind that a 20 x 20 pairwise contact matrix has been shown to be too simple to recognize the native folds of many proteins. In this paper, we show that the construction of a global pairwise energy function was achieved using 1006 training proteins with a homology of less than 30%, which include all representatives of different protein classes. After parametrizing the local environments of the amino acids into nine categories depending on three secondary structures and three kinds of hydrophobicity (desolvation), the 16290 pairwise contact energies (scores) of the amino acids could be determined by perceptron learning and protein threading. These could simultaneously recognize all the native folds of the 1006 training proteins. When these energy parameters were tested on 382 test proteins with a homology of less than 90%, the native folds of 370 proteins (96.9%) were recognized. We set up a simple thermodynamic framework in the conformational space of decoys to calculate the unfolded fraction and the specific heat of real proteins. The different thermodynamic stabilities of E. coli ribonuclease H (RNase H) and its mutants were well described in our calculation, agreeing with the experiment.
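The figure of 16290 contact energies is consistent with counting the independent entries of a symmetric 180 x 180 matrix, where 180 = 20 amino acids x 9 environment categories. The short check below works through that arithmetic; treating the matrix as symmetric is an assumption made here to reproduce the count, and the function name `independent_pair_parameters` is hypothetical.

```python
def independent_pair_parameters(n_types):
    """Number of independent entries in a symmetric n_types x n_types matrix.

    This counts the diagonal plus one triangle: n * (n + 1) / 2.
    """
    return n_types * (n_types + 1) // 2


AMINO_ACIDS = 20
ENVIRONMENTS = 3 * 3  # three secondary structures x three hydrophobicity classes

n_types = AMINO_ACIDS * ENVIRONMENTS          # 180 residue-environment types
n_params = independent_pair_parameters(n_types)  # 180 * 181 / 2 = 16290
```

The same formula gives 210 parameters for the plain 20 x 20 matrix, which shows how quickly the parameter space grows once local environments are included.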
Affiliation(s)
- Muyoung Heo
- National Research Laboratory for Computational Proteomics and Biophysics, Department of Physics, Pusan National University, Busan, Korea
|
7
|
Combining a binary input encoding scheme with RBFNN for globulin protein inter-residue contact map prediction. Pattern Recognit Lett 2005. [DOI: 10.1016/j.patrec.2005.01.005] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Indexed: 11/21/2022]
|
8
|
Zhang GZ, Huang DS. Prediction of inter-residue contacts map based on genetic algorithm optimized radial basis function neural network and binary input encoding scheme. J Comput Aided Mol Des 2005; 18:797-810. [PMID: 16075311 DOI: 10.1007/s10822-005-0578-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Received: 04/29/2004] [Accepted: 12/14/2004] [Indexed: 10/25/2022]
Abstract
Inter-residue contact map prediction is one of the most important intermediate steps in the protein folding problem. In this paper, we focus on the problem of predicting protein inter-residue contact maps with neural network techniques. First, we use a genetic algorithm (GA) to optimize the radial basis function widths and hidden centers of a radial basis function neural network (RBFNN); then a novel binary encoding scheme is employed to train the network to learn and predict the inter-residue contact patterns of protein sequences obtained from the Protein Data Bank (PDB). The experimental evidence indicates the utility of our proposed encoding strategy and GA-optimized RBFNN. Moreover, the simulation results demonstrate that the network performs better for proteins whose length in residues falls in the range (100, 300), and that the prediction accuracy with a contact threshold of 7 Angstroms is higher than with the other 3 thresholds of 5, 6, and 8 Angstroms.
Affiliation(s)
- Guang-Zheng Zhang
- Intelligent Computing Lab, Hefei Institute of Intelligent Machines, Chinese Academy of Sciences
|
9
|
Chelli R, Gervasio FL, Procacci P, Schettino V. Inter-residue and solvent-residue interactions in proteins: a statistical study on experimental structures. Proteins 2004; 55:139-51. [PMID: 14997548 DOI: 10.1002/prot.20030] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 11/09/2022]
Abstract
A large set of protein structures resolved by X-ray or NMR techniques has been extracted from the Protein Data Bank and analyzed using statistical methods. In particular, we investigate the interactions between side chains and the interactions between solvent and side chains, pointing out the possibility of including the solvent as part of a knowledge-based potential. The solvent-residue contacts are accounted for on the basis of Voronoi polyhedron analysis. Our investigation confirms the importance of hydrophobic residues in determining protein stability. We observe that in general hydrophobic-hydrophobic interactions and, more specifically, aromatic-aromatic contacts tend to be increasingly distally separated in the primary sequence of proteins, thus connecting distinct secondary structure elements. A simple relation expressing the dependence of the protein free energy on the number of residues is proposed. Such a relation includes both the residue-residue and the solvent-residue contributions. The former is dominant for large proteins, whereas for small ones (fewer than 100 residues) the two terms are comparable. Gapless threading experiments show that the solvent-residue knowledge-based potential contributes significantly to discriminating the native structure of proteins. This contribution is especially important for small proteins and is similar to that given by the most favorable residue-residue knowledge-based potential for hydrophobic-hydrophobic interactions such as isoleucine-leucine. In general, the inclusion of the solvent-residue interaction produces a notable increase in the free energy gap between the native structures and decoys.
Affiliation(s)
- Riccardo Chelli
- Dipartimento di Chimica, Università di Firenze, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy
|
10
|
Burioni R, Cassi D, Cecconi F, Vulpiani A. Topological thermal instability and length of proteins. Proteins 2004; 55:529-35. [PMID: 15103617 DOI: 10.1002/prot.20072] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.5] [Indexed: 11/08/2022]
Abstract
We present an analysis of the effects of global topology on the structural stability of folded proteins in thermal equilibrium with a heat bath. For a large class of single domain proteins, we computed the harmonic spectrum within the Gaussian Network Model (GNM) and determined their spectral dimension, a parameter describing the low frequency behavior of the density of modes. We found a surprisingly strong correlation between the spectral dimension and the number of amino acids in the protein. Considering that larger spectral dimension values relate to more topologically compact folded states, our results indicate that, for a given temperature and length of protein, the folded structure corresponds to a less compact folding, one compatible with thermodynamic stability.
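As background for the Gaussian Network Model mentioned above, the sketch below builds the connectivity (Kirchhoff) matrix whose eigenvalues give the harmonic spectrum. It is a generic illustration under stated assumptions, not the authors' code: the function name `kirchhoff_matrix` and the 7.0 Å cutoff are choices made here, and the eigenvalue calculation and spectral-dimension fit are left out.

```python
import math


def kirchhoff_matrix(coords, cutoff=7.0):
    """Build a GNM-style Kirchhoff matrix from residue coordinates.

    Off-diagonal entry (i, j) is -1 when residues i and j lie within
    `cutoff` of each other and 0 otherwise; each diagonal entry is the
    number of neighbors of that residue, so every row sums to zero.
    """
    n = len(coords)
    gamma = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = -1
                gamma[j][i] = -1
                gamma[i][i] += 1
                gamma[j][j] += 1
    return gamma


# Three residues spaced 3.8 apart along a line: only chain neighbors connect.
toy_gamma = kirchhoff_matrix([(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)])
```

In such a model the low-frequency end of the eigenvalue spectrum of this matrix is where a spectral dimension would be estimated.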
Affiliation(s)
- Raffaella Burioni
- Dipartimento di Fisica and INFM, Università di Parma, Parco Area delle Scienze 7A, 43100 Parma, Italy
|
11
|
|
12
|
Salvi G, Mölbert S, De Los Rios P. Design of lattice proteins with explicit solvent. Phys Rev E Stat Nonlin Soft Matter Phys 2002; 66:061911. [PMID: 12513322 DOI: 10.1103/physreve.66.061911] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 05/22/2002] [Indexed: 05/24/2023]
Abstract
Protein design is important to develop new drugs. As such, a knowledge of the correct model to use to design novel proteins is of the utmost importance. Here we show that a simple model where the solvent degrees of freedom are (semi)explicitly taken into account performs better than other existing models when compared to real data. Some consequences on the criteria to be used for protein design are discussed.
Affiliation(s)
- G Salvi
- Institut de Physique Théorique, Université de Lausanne, CH-1015 Lausanne, Switzerland
|
13
|
Hunter CG, Subramaniam S. Natural coordinate representation for the protein backbone structure. Proteins 2002; 49:206-15. [PMID: 12211001 DOI: 10.1002/prot.10201] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/11/2022]
Abstract
A new model for describing the geometry of the Cα backbone atoms in protein molecules is derived. This model uses one continuous variable per amino acid. This is half the number of degrees-of-freedom used in traditional backbone models. The new model was tested on 721 PDB structures and its average accuracy was determined to be 1.14 Å cRMSD. This model can be used as a description of local structure that provides higher resolution than the traditional secondary structure categories. Also, because this structure description is one-dimensional, it can be used to align structures with the same efficiency and convergence properties available in the popular sequence alignment tools. Furthermore, the 1:1 correspondence with the amino acid sequence has implications for combined sequence/structure alignment. Conventional secondary structure prediction was used to further reduce the number of degrees-of-freedom in 16 test proteins. In those cases, the average cRMSD degraded from 0.96 to 2.33 Å while the number of degrees-of-freedom improved (reduced) by more than 30%.
Affiliation(s)
- Cornelius G Hunter
- Center for Biophysics and Computational Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
|
14
|
Sosnick TR, Berry RS, Colubri A, Fernández A. Distinguishing foldable proteins from nonfolders: when and how do they differ? Proteins 2002; 49:15-23. [PMID: 12211012 DOI: 10.1002/prot.10193] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Indexed: 11/05/2022]
Abstract
When a denatured polypeptide is put into refolding conditions, it undergoes conformational changes on a variety of time scales. We set out here to distinguish the fast events that promote productive folding from other processes that may be generic to any non-folding polypeptide. We have applied an ab initio folding algorithm to model the folding of various proteins and their compositionally identical, random-sequence analogues. In the earliest stages, proteins and their scrambled-sequence counterparts undergo indistinguishable reductions in the extent to which they explore conformation space. For both polypeptides, an early contraction occurs but does not involve the formation of a distinct intermediate. Following this phase, however, the naturally occurring sequences are distinguished by an increase in the formation of three-body correlations wherein a hydrophobic group desolvates and protects an intramolecular hydrogen bond. These correlations are manifested in a mild but measurable reduction of the accessible configuration space beyond that of the random-sequence peptides, and portend the folding to the native structure. Hence, early events reflect a generic response of the denatured ensemble to a change in solvent condition, but the wild-type sequence develops additional correlations as its structure evolves that can reveal the protein's foldability.
Affiliation(s)
- Tobin R Sosnick
- Department of Biochemistry and Molecular Biology and the Institute for Biophysical Dynamics, University of Chicago, Chicago, Illinois 60637, USA.
|
15
|
Abstract
A method is presented to identify hot mutational spots and predict the extent of surface burial at the transition state relative to the native fold in two-state folding proteins. The method is based on ab initio simulations of folding histories in which transitions between coarsely defined conformations and pairwise interactions are dependent on the solvent environments created by the chain. The highly conserved mammalian ubiquitin is adopted as a study case to make predictions. The evolution in time of the chain topology suggests a nucleation process with a critical point signaled by a sudden quenching of structural fluctuations. The occurrence of this nucleus is shown to be concurrent with a sudden escalation in the number of three-body correlations whereby hydrophobic units approach residue pairs engaged in amide-carbonyl hydrogen bonding. These correlations determine a pattern designed to structure the surrounding solvent, protecting intramolecular hydrogen bonds from water attack. Such correlations are shown to be required to stabilize the nucleus, with kinetic consequences for the folding process. Those nuclear residues that adopt the dual role of protecting and being protected while engaged in hydrogen bonds are predicted to be the hottest mutational spots. Some such residues are shown not to retain the same protecting role in the native fold. This kinetic treatment of folding nucleation is independently validated vis-a-vis a Phi-value analysis on chymotrypsin inhibitor 2, a protein for which extensive mutational data exists.
Affiliation(s)
- Ariel Fernández
- Max-Planck-Institut für Biochemie, Martinsried (bei München), Germany and Instituto de Matemática, Universidad Nacional del Sur-CONICET, Bahia Blanca, Argentina.
|
16
|
Kabakçioglu A, Kanter I, Vendruscolo M, Domany E. Statistical properties of contact vectors. Phys Rev E Stat Nonlin Soft Matter Phys 2002; 65:041904. [PMID: 12005870 DOI: 10.1103/physreve.65.041904] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.8] [Received: 09/01/2001] [Indexed: 05/23/2023]
Abstract
We study the statistical properties of contact vectors, a construct to characterize a protein's structure. The contact vector of an N-residue protein is a list of N integers n(i), representing the number of residues in contact with residue i. We study analytically (at mean-field level) and numerically the amount of structural information contained in a contact vector. Analytical calculations reveal that a large variance in the contact numbers reduces the degeneracy of the mapping between contact vectors and structures. Exact enumeration for lengths up to N=16 on the three-dimensional cubic lattice indicates that the growth rate of number of contact vectors as a function of N is only 3% less than that for contact maps. In particular, for compact structures we present numerical evidence that, practically, each contact vector corresponds to only a handful of structures. We discuss how this information can be used for better structure prediction.
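The contact vector defined above is easy to compute for a lattice conformation. The sketch below is an illustration only: the function name `contact_vector` is hypothetical, and it assumes the usual lattice convention (not stated in the abstract) that two residues are in contact when they occupy nearest-neighbor sites and are not adjacent along the chain.

```python
def contact_vector(path):
    """Contact numbers n(i) for a chain on the cubic lattice.

    path: list of integer (x, y, z) sites visited by the chain, in order.
    Residues i and j count as a contact when |i - j| > 1 and their sites
    are one lattice step apart (assumed convention).
    """
    n = len(path)
    counts = [0] * n
    for i in range(n):
        for j in range(i + 2, n):  # skip chain neighbors
            step = sum(abs(a - b) for a, b in zip(path[i], path[j]))
            if step == 1:
                counts[i] += 1
                counts[j] += 1
    return counts


# A four-residue chain folded around a unit square: only the ends touch.
square_path = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
square_vector = contact_vector(square_path)
```

A fully extended chain gives an all-zero vector, so the entries only become informative for compact conformations, in line with the abstract's focus on compact structures.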
Affiliation(s)
- A Kabakçioglu
- Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot 76100, Israel
|
17
|
|
18
|
Fernández A. Cooperative walks in a cubic lattice: Protein folding as a many-body problem. J Chem Phys 2001. [DOI: 10.1063/1.1405447] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Indexed: 11/14/2022] Open
|
19
|
|