1
Holland J, Grigoryan G. Structure‐conditioned amino‐acid couplings: how contact geometry affects pairwise sequence preferences. Protein Sci 2022; 31:900-917. [PMID: 35060221 PMCID: PMC8927866 DOI: 10.1002/pro.4280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2021] [Revised: 01/06/2022] [Accepted: 01/12/2022] [Indexed: 11/11/2022]
Abstract
Relating a protein's sequence to its conformation is a central challenge for both structure prediction and sequence design. Statistical contact potentials, as well as their more descriptive versions that account for side‐chain orientation and other geometric descriptors, have served as simplistic but useful means of representing second‐order contributions in sequence–structure relationships. Here we ask what happens when a pairwise potential is conditioned on the fully defined geometry of interacting backbone fragments. We show that the resulting structure‐conditioned coupling energies more accurately reflect pair preferences as a function of structural context. These structure‐conditioned energies more reliably encode native sequence information and correlate more strongly with experimentally determined coupling energies. Clustering a database of interaction motifs by structure results in ensembles of similar energies, and clustering them by energy results in ensembles of similar structures. By comparing many pairs of interaction motifs and showing that structural similarity and energetic similarity go hand‐in‐hand, we provide a tangible link between modular sequence and structure elements. This link is applicable to structural modeling, and we show that scoring CASP models with structure‐conditioned energies results in substantially higher correlation with structural quality than scoring the same models with a contact potential. We conclude that structure‐conditioned coupling energies are a good way to model the impact of interaction geometry on second‐order sequence preferences.
Affiliation(s)
- Jack Holland
- Department of Computer Science Dartmouth College Hanover New Hampshire USA
- Gevorg Grigoryan
- Department of Computer Science Dartmouth College Hanover New Hampshire USA
2
Chen MC, Li Y, Zhu YH, Ge F, Yu DJ. SSCpred: Single-Sequence-Based Protein Contact Prediction Using Deep Fully Convolutional Network. J Chem Inf Model 2020; 60:3295-3303. [DOI: 10.1021/acs.jcim.9b01207] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/29/2022]
Affiliation(s)
- Ming-Cai Chen
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, P. R. China
- Yang Li
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, P. R. China
- Department of Computational Medicine and Bioinformatics, University of Michigan, Washtenaw 100, Ann Arbor, Michigan 48109-2218, United States
- Yi-Heng Zhu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, P. R. China
- Fang Ge
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, P. R. China
- Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, P. R. China
3
Zhou HX, Pang X. Electrostatic Interactions in Protein Structure, Folding, Binding, and Condensation. Chem Rev 2018; 118:1691-1741. [PMID: 29319301 DOI: 10.1021/acs.chemrev.7b00305] [Citation(s) in RCA: 476] [Impact Index Per Article: 79.3] [Indexed: 02/07/2023]
Abstract
Charged and polar groups, through forming ion pairs, hydrogen bonds, and other less specific electrostatic interactions, impart important properties to proteins. Modulation of the charges on the amino acids, e.g., by pH and by phosphorylation and dephosphorylation, has significant effects such as protein denaturation and switch-like response of signal transduction networks. This review aims to present a unifying theme among the various effects of protein charges and polar groups. Simple models will be used to illustrate basic ideas about electrostatic interactions in proteins, and these ideas in turn will be used to elucidate the roles of electrostatic interactions in protein structure, folding, binding, condensation, and related biological functions. In particular, we will examine how charged side chains are spatially distributed in various types of proteins and how electrostatic interactions affect thermodynamic and kinetic properties of proteins. Our hope is to capture both important historical developments and recent experimental and theoretical advances in quantifying electrostatic contributions of proteins.
Affiliation(s)
- Huan-Xiang Zhou
- Department of Chemistry and Department of Physics, University of Illinois at Chicago, Chicago, Illinois 60607, United States; Department of Physics and Institute of Molecular Biophysics, Florida State University, Tallahassee, Florida 32306, United States
- Xiaodong Pang
- Department of Physics and Institute of Molecular Biophysics, Florida State University, Tallahassee, Florida 32306, United States
5
Saravanan KM, Balasubramanian H, Nallusamy S, Samuel S. Sequence and structural analysis of two designed proteins with 88% identity adopting different folds. Protein Eng Des Sel 2010; 23:911-8. [PMID: 20952437 DOI: 10.1093/protein/gzq070] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/12/2022]
Abstract
Protein folding is a natural phenomenon by which a sequence of amino acids folds into a unique functional three-dimensional structure. Although the sequence code that governs folding remains a mystery, one can identify key inter-residue contacts responsible for a given topology. In nature, there are many pairs of proteins of a given length that share little or no sequence identity. Similarly, there are many proteins that share a common topology but lack significant evidence of homology. In order to tackle this problem, protein engineering studies have been used to determine the minimal number of amino acid residues that codes for a particular fold. In recent years, the coupling of theoretical models and experiments in the study of protein folding has provided some fruitful clues. He et al. have designed two proteins with 88% sequence identity, which adopt different folds and functions. In this work, we have systematically analysed these two proteins by performing pentapeptide search, secondary structure predictions, variation in inter-residue interactions and residue-residue pair preferences, surrounding hydrophobicity computations, conformational switching and energy computations. We conclude that the local secondary structural preference of the two designed proteins at the N- and C-terminal ends to adopt either coil or strand conformation may be a crucial factor in adopting the different folds. Early on during the process of folding, both proteins may choose different energetically favourable pathways to attain the different folds.
Affiliation(s)
- K Mani Saravanan
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, TN 620024, India
6
Zhou HX. Protein folding in confined and crowded environments. Arch Biochem Biophys 2007; 469:76-82. [PMID: 17719556 PMCID: PMC2223181 DOI: 10.1016/j.abb.2007.07.013] [Citation(s) in RCA: 125] [Impact Index Per Article: 7.4] [Received: 05/10/2007] [Revised: 07/12/2007] [Accepted: 07/12/2007] [Indexed: 11/17/2022]
Abstract
Confinement and crowding are two major factors that can potentially impact protein folding in cellular environments. Theories based on considerations of excluded volumes predict disparate effects on protein folding stability for confinement and crowding: confinement can stabilize proteins by over $10\,k_{\mathrm{B}}T$ but crowding has a very modest effect on stability. On the other hand, confinement and crowding are both predicted to favor conformations of the unfolded state which are compact, and consequently may increase the folding rate. These predictions are largely borne out by experimental studies of protein folding under confined and crowded conditions in the test tube. Protein folding in cellular environments is further complicated by interactions with surrounding surfaces and other factors. Concerted theoretical modeling and test-tube and in vivo experiments promise to elucidate the complexity of protein folding in cellular environments.
Affiliation(s)
- Huan-Xiang Zhou
- Department of Physics and Institute of Biophysics and School of Computational Science, Florida State University, Tallahassee, FL 32306, USA.
7
Abstract
An angle Omega is defined to serve as a metric for global side-chain orientations, which reflects the orientation of the side chain relative to the radial vector from the center of the protein to an amino acid. The side-chain orientations of buried residues exhibit characteristically different orientations than do exposed residues, in both monomeric and dimeric structures. Overall, buried side chains point mostly inward, whereas surface side chains tend to point outward from the surface. This difference in behavior also correlates well with the residue hydrophobicity; so a global side-chain orientation can be viewed as a direct structural manifestation of hydrophobicity. When various solvent-accessible layers are considered, the behavior is relatively continuous between centrally located and exposed residues. In the case of interfacial residues between subunits, there are statistically significant differences between exposed residues and interface residues for ALA, ARG, ASN, ASP, GLU, HIS, LYS, THR, VAL, MET, PRO, and overall the interface residues have an increased tendency to point inward. Presumably, these substantial differences in orientations of side chains may be a manifestation of hydrophobic forces.
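The orientation angle described in this abstract can be illustrated with a minimal sketch. The abstract only says that Omega relates the side-chain direction to the radial vector from the protein center to the residue; the specific choices below (radial vector ending at the C-alpha atom, side-chain vector running from C-alpha to the side-chain centroid, and the function name `omega_angle`) are assumptions made for illustration, not the authors' exact definition.

```python
import math

def omega_angle(center, ca, sidechain_centroid):
    """Angle in degrees between the radial vector (center -> CA) and the
    side-chain vector (CA -> side-chain centroid).

    Assumed convention: 0 means pointing outward, 180 means pointing inward.
    All three arguments are (x, y, z) coordinate tuples.
    """
    radial = [a - c for a, c in zip(ca, center)]
    side = [s - a for s, a in zip(sidechain_centroid, ca)]
    norm_r = math.sqrt(sum(v * v for v in radial))
    norm_s = math.sqrt(sum(v * v for v in side))
    if norm_r == 0.0 or norm_s == 0.0:
        raise ValueError("radial and side-chain vectors must be non-zero")
    cos_omega = sum(r * s for r, s in zip(radial, side)) / (norm_r * norm_s)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_omega = max(-1.0, min(1.0, cos_omega))
    return math.degrees(math.acos(cos_omega))
```

Under this convention a buried side chain pointing toward the protein center gives an angle near 180 degrees, and an exposed side chain pointing away from the surface gives an angle near 0 degrees.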
Affiliation(s)
- Aimin Yan
- Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames 50011-3020, USA
8
Dehouck Y, Gilis D, Rooman M. Database-derived potentials dependent on protein size for in silico folding and design. Biophys J 2005; 87:171-81. [PMID: 15240455 PMCID: PMC1304340 DOI: 10.1529/biophysj.103.037861] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Indexed: 11/18/2022]
Abstract
Knowledge-based potentials are widely used in simulations of protein folding, structure prediction, and protein design. Their advantages include limited computational requirements and the ability to deal with low-resolution protein models compatible with long-scale simulations. Their drawbacks include their dependence on specific features of the dataset from which they are derived, such as the size of the proteins it contains, and the fact that their physical meaning is still a subject of debate. We address these issues by probing the theoretical validity of these potentials as mean-force potentials that take the solvent implicitly into account and involve entropic contributions due to atomic degrees of freedom and solvation. The dependence on the size of the system is checked on distance-dependent amino acid pair potentials, derived from six protein structure sets containing proteins of increasing length N. For large inter-residue distances, they are found to display the theoretically predicted 1/N behavior weighted by a factor depending on the boundaries and the compressibility of the system. For short distances, different trends are observed according to the nature of the residue pairs and their ability to form, for example, electrostatic, cation-pi or pi-pi interactions, or hydrophobic packing. The results of this analysis are used to devise a novel protein size-dependent distance potential, which displays an improved performance in discriminating native sequence-structure matches among decoy models.
Affiliation(s)
- Yves Dehouck
- Bioinformatique Génomique et Structurale, Université Libre de Bruxelles, Brussels, Belgium.
9
Zhang C, Liu S, Zhou H, Zhou Y. The dependence of all-atom statistical potentials on structural training database. Biophys J 2005; 86:3349-58. [PMID: 15189839 PMCID: PMC1304244 DOI: 10.1529/biophysj.103.035998] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.5] [Indexed: 11/18/2022]
Abstract
An accurate statistical energy function that is suitable for the prediction of protein structures of all classes should be independent of the structural database used for energy extraction. Here, two high-resolution, low-sequence-identity structural databases of 333 alpha-proteins and 271 beta-proteins were built for examining the database dependence of three all-atom statistical energy functions. They are RAPDF (residue-specific all-atom conditional probability discriminatory function), atomic KBP (atomic knowledge-based potential), and DFIRE (statistical potential based on distance-scaled finite ideal-gas reference state). These energy functions differ in the reference states used for energy derivation. The energy functions extracted from the different structural databases are used to select native structures from multiple decoys of 64 alpha-proteins and 28 beta-proteins. The performance in native structure selections indicates that the DFIRE-based energy function is mostly independent of the structural database whereas RAPDF and KBP have a significant dependence. The construction of two additional structural databases of alpha/beta and alpha + beta-proteins further confirmed the weak dependence of DFIRE on the structural databases of various structural classes. The possible source for the difference between the three all-atom statistical energy functions is that the physical reference state of ideal gas used in the DFIRE-based energy function is least dependent on the structural database.
Affiliation(s)
- Chi Zhang
- Department of Physiology and Biophysics, State University of New York at Buffalo, Buffalo, New York 14214, USA
10
Chelli R, Gervasio FL, Procacci P, Schettino V. Inter-residue and solvent-residue interactions in proteins: a statistical study on experimental structures. Proteins 2004; 55:139-51. [PMID: 14997548 DOI: 10.1002/prot.20030] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 11/09/2022]
Abstract
A large set of protein structures resolved by X-ray or NMR techniques has been extracted from the Protein Data Bank and analyzed using statistical methods. In particular, we investigate the interactions between side chains and the interactions between solvent and side chains, pointing out the possibility of including the solvent as part of a knowledge-based potential. The solvent-residue contacts are accounted for on the basis of Voronoi polyhedron analysis. Our investigation confirms the importance of hydrophobic residues in determining protein stability. We observe that in general hydrophobic-hydrophobic interactions and, more specifically, aromatic-aromatic contacts tend to be increasingly distally separated in the primary sequence of proteins, thus connecting distinct secondary structure elements. A simple relation expressing the dependence of the protein free energy on the number of residues is proposed. This relation includes both the residue-residue and the solvent-residue contributions. The former is dominant for large proteins, whereas for small proteins (fewer than 100 residues) the two terms are comparable. Gapless threading experiments show that the solvent-residue knowledge-based potential makes a significant contribution to discriminating the native structure of proteins. This contribution is especially important for small proteins and is similar to that given by the most favorable residue-residue knowledge-based potential for hydrophobic-hydrophobic interactions such as isoleucine-leucine. In general, the inclusion of the solvent-residue interaction produces a significant increase in the free energy gap between the native structures and decoys.
Affiliation(s)
- Riccardo Chelli
- Dipartimento di Chimica, Università di Firenze, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy
11
Zhou HX. Residual charge interactions in unfolded staphylococcal nuclease can be explained by the Gaussian-chain model. Biophys J 2002; 83:2981-6. [PMID: 12496071 PMCID: PMC1302379 DOI: 10.1016/s0006-3495(02)75304-6] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.1] [Indexed: 11/23/2022]
Abstract
The discrepancy of the pH dependence of the unfolding free energy for staphylococcal nuclease from what is expected from an idealized model for the unfolded state is accounted for by the recently developed Gaussian-chain model. Residual electrostatic effects in the unfolded state are attributed to nonspecific interactions dominated by charges close along the sequence. The dominance of nonspecific local interactions appears to be supported by some experimental evidence.
Affiliation(s)
- Huan-Xiang Zhou
- Department of Physics, Drexel University, Philadelphia, PA 19104, USA.
12
Zhou HX. A Gaussian-chain model for treating residual charge-charge interactions in the unfolded state of proteins. Proc Natl Acad Sci U S A 2002; 99:3569-74. [PMID: 11891295 PMCID: PMC122564 DOI: 10.1073/pnas.052030599] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.1] [Indexed: 11/18/2022]
Abstract
Characterization of the unfolded state is essential for understanding the protein folding problem. In the unfolded state, a protein molecule samples vastly different conformations. Here I present a simple theoretical method for treating residual charge-charge interactions in the unfolded state. The method is based on modeling an unfolded protein as a Gaussian chain. After sampling over all conformations, the electrostatic interaction energy between two charged residues (separated by $l$ peptide bonds) is given by $W = 332\,(6/\pi)^{1/2}\,[1 - \pi^{1/2}\,x\,\exp(x^{2})\,\mathrm{erfc}(x)]/(\epsilon d)$, where $d = b\,l^{1/2} + s$ and $x = \kappa d/6^{1/2}$. In unfolded barnase, the residual interactions lead to downward p$K_a$ shifts of approximately 0.33 unit, in agreement with experiment. p$K_a$ shifts in the unfolded state significantly affect pH dependence of protein folding stability, and the predicted effects agree very well with experimental results on barnase and four other proteins. For T4 lysozyme, the charge reversal mutation K147E is found to stabilize the unfolded state even more than the folded state (1.39 vs. 0.46 kcal/mol), leading to the experimentally observed result that the mutation is net destabilizing for the folding. The Gaussian-chain model provides a quantitative characterization of the unfolded state and may prove valuable for elucidating the energetic contributions to the stability of thermophilic proteins and the energy landscape of protein folding.
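The closed-form energy quoted in this abstract is straightforward to evaluate numerically. The sketch below is only an illustration: the function name `gaussian_chain_energy` is hypothetical, and the default values for the bond-length parameter `b`, the offset `s`, and the dielectric constant `epsilon` are assumed for the example (the abstract does not state them). With the prefactor 332, distances are in ångströms, `kappa` is in inverse ångströms, and the result is in kcal/mol for a pair of unit charges.

```python
import math

def gaussian_chain_energy(l, kappa, b=7.5, s=5.0, epsilon=78.5):
    """Magnitude of the Gaussian-chain interaction energy (kcal/mol) between
    two unit charges separated by l peptide bonds.

    b, s (angstroms) and epsilon are illustrative assumptions, not values
    taken from the abstract. kappa is the inverse screening length (1/angstrom).
    """
    if l < 1:
        raise ValueError("l must be at least 1 peptide bond")
    d = b * math.sqrt(l) + s                 # d = b * l^(1/2) + s
    x = kappa * d / math.sqrt(6.0)           # x = kappa * d / 6^(1/2)
    # Bracketed screening factor: 1 - pi^(1/2) * x * exp(x^2) * erfc(x)
    bracket = 1.0 - math.sqrt(math.pi) * x * math.exp(x * x) * math.erfc(x)
    return 332.0 * math.sqrt(6.0 / math.pi) * bracket / (epsilon * d)
```

With `kappa = 0` the bracketed factor equals 1 and the expression reduces to an unscreened Coulomb-like term in `d`; increasing either `kappa` or `l` lowers the energy. For very large `x` the product `exp(x*x) * erfc(x)` should be computed with a scaled complementary error function instead, since `math.exp` overflows.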
Affiliation(s)
- Huan-Xiang Zhou
- Department of Physics, Drexel University, Philadelphia, PA 19104, USA.
13
Abstract
We present theory showing that confining a protein to a small inert space (a "cage") should stabilize the protein against reversible unfolding. Examples of such spaces might include the pores within chromatography columns, the Anfinsen cage in chaperonins, the interiors of ribosomes, or regions of steric occlusion inside cells. Confinement eliminates some expanded configurations of the unfolded chain, shifting the equilibrium from the unfolded state toward the native state. The partition coefficient for a protein in a confined space is predicted to decrease significantly when the solvent is changed from native to denaturing conditions. Small cages are predicted to increase the stability of the native state by as much as 15 kcal/mol. Confinement may also increase the rates of protein or RNA folding.
Affiliation(s)
- H X Zhou
- Department of Physics, Drexel University, Philadelphia, Pennsylvania 19104, USA.
14
Adamian L, Liang J. Helix-helix packing and interfacial pairwise interactions of residues in membrane proteins. J Mol Biol 2001; 311:891-907. [PMID: 11518538 DOI: 10.1006/jmbi.2001.4908] [Citation(s) in RCA: 154] [Impact Index Per Article: 6.7] [Indexed: 11/22/2022]
Abstract
Helix-helix packing plays a critical role in maintaining the tertiary structures of helical membrane proteins. By examining the overall distribution of voids and pockets in the transmembrane (TM) regions of helical membrane proteins, we found that bacteriorhodopsin and halorhodopsin are the most tightly packed, whereas the mechanosensitive channel is the least tightly packed. Large residues F, W, and H have the highest propensity to be in a TM void or a pocket, whereas small residues such as S, G, A, and T are least likely to be found in a void or a pocket. The coordination number for non-bonded interactions for each of the residue types is found to correlate with the size of the residue. To assess specific interhelical interactions between residues, we have developed a new computational method to characterize nearest neighboring atoms that are in physical contact. Using an atom-based probabilistic model, we estimate the membrane helical interfacial pairwise (MHIP) propensity. We found that there are many residue pairs that have high propensity for interhelical interactions, but disulfide bonds are rarely found in the TM regions. The high propensity pairs include residue pairs between an aromatic residue and a basic residue (W-R, W-H, and Y-K). In addition, many residue pairs have high propensity to form interhelical polar-polar atomic contacts, for example, residue pairs between two ionizable residues, and between one ionizable residue and one N or Q. Soluble proteins do not share this pattern of diverse polar-polar interhelical interaction. Exploratory analysis by clustering of the MHIP values suggests that residues similar in side-chain branchedness, cyclic structures, and size tend to have correlated behavior in participating in interhelical interactions. A chi-square test rejects the null hypothesis that membrane proteins and soluble proteins have the same distribution of interhelical pairwise propensity. This observation may help us to understand the folding mechanism of membrane proteins.
Affiliation(s)
- L Adamian
- Department of Bioengineering, University of Illinois at Chicago, 851 S. Morgan Street RM 218, MC-063, Chicago, IL 60607, USA