1
Xiong P, Hu X, Huang B, Zhang J, Chen Q, Liu H. Increasing the efficiency and accuracy of the ABACUS protein sequence design method. Bioinformatics 2019; 36:136-144. [DOI: 10.1093/bioinformatics/btz515] [Received: 12/23/2018] [Revised: 05/29/2019] [Accepted: 06/21/2019]
Abstract
Motivation
The ABACUS (a backbone-based amino acid usage survey) method uses unique statistical energy functions to carry out protein sequence design. Although some of its results have been experimentally verified, its accuracy remains improvable because several important components of the method have not been specifically optimized for sequence design or in contexts of other parts of the method. The computational efficiency also needs to be improved to support interactive online applications or the consideration of a large number of alternative backbone structures.
Results
We derived a model to measure solvent accessibility with larger mutual information with residue types than previous models, optimized a set of rotamers which can approximate the sidechain atomic positions more accurately, and devised an empirical function to treat inter-atomic packing with parameters fitted to native structures and optimized in consistence with the rotamer set. Energy calculations have been accelerated by interpolation between pre-determined representative points in high-dimensional structural feature spaces. Sidechain repacking tests showed that ABACUS2 can accurately reproduce the conformation of native sidechains. In sequence design tests, the native residue type recovery rate reached 37.7%, exceeding the value of 32.7% for ABACUS1. Applying ABACUS2 to designed sequences on three native backbones produced proteins shown to be well-folded by experiments.
Availability and implementation
The ABACUS2 sequence design server can be visited at http://biocomp.ustc.edu.cn/servers/abacus-design.php.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Peng Xiong
- School of Life Sciences, Hefei, Anhui 230026, China
- Xiuhong Hu
- School of Life Sciences, Hefei, Anhui 230026, China
- Bin Huang
- School of Life Sciences, Hefei, Anhui 230026, China
- Jiahai Zhang
- School of Life Sciences, Hefei, Anhui 230026, China
- Quan Chen
- School of Life Sciences, Hefei, Anhui 230026, China
- Haiyan Liu
- School of Life Sciences, Hefei, Anhui 230026, China
- Hefei National Laboratory for Physical Sciences at the Microscale, Hefei, Anhui 230026, China
- School of Data Science, University of Science and Technology of China, Hefei, Anhui 230026, China
2
Tian Y, Huang X, Zhu Y. Computational design of enzyme-ligand binding using a combined energy function and deterministic sequence optimization algorithm. J Mol Model 2015; 21:191. [PMID: 26162695 DOI: 10.1007/s00894-015-2742-x] [Received: 04/27/2015] [Accepted: 06/24/2015]
Abstract
Enzyme amino-acid sequences at ligand-binding interfaces are evolutionarily optimized for reactions, and the natural conformation of an enzyme-ligand complex must have a low free energy relative to alternative conformations in native-like or non-native sequences. Based on this assumption, a combined energy function was developed for enzyme design and then evaluated by recapitulating native enzyme sequences at ligand-binding interfaces for 10 enzyme-ligand complexes. In this energy function, the electrostatic interaction between polar or charged atoms at buried interfaces is described by an explicitly orientation-dependent hydrogen-bonding potential and a pairwise-decomposable generalized Born model based on the general side chain in the protein design framework. The energy function is augmented with a pairwise surface-area based hydrophobic contribution for nonpolar atom burial. Using this function, on average, 78% of the amino acids at ligand-binding sites were predicted correctly in the minimum-energy sequences, whereas 84% were predicted correctly in the most-similar sequences, which were selected from the top 20 sequences for each enzyme-ligand complex. Hydrogen bonds at the enzyme-ligand binding interfaces in the 10 complexes were usually recovered with the correct geometries. The binding energies calculated using the combined energy function helped to discriminate the active sequences from a pool of alternative sequences that were generated by repeatedly solving a series of mixed-integer linear programming problems for sequence selection with increasing integer cuts.
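The "increasing integer cuts" mentioned at the end of this abstract can be illustrated with the standard integer cut used in mixed-integer programming. This is a generic sketch, not necessarily the exact formulation of the paper; the symbols are assumptions introduced here: $y_m \in \{0,1\}$ are the binary selection variables, and $B^{(k)}$ and $N^{(k)}$ are the index sets of variables equal to 1 and 0 in the $k$-th solution already found.

```latex
% Generic integer cut excluding the k-th binary solution y^{(k)}
% (notation assumed for illustration, not taken from the paper)
\sum_{m \in B^{(k)}} y_m \;-\; \sum_{m \in N^{(k)}} y_m \;\le\; \bigl|B^{(k)}\bigr| - 1
```

Only $y = y^{(k)}$ violates this inequality, so re-solving with one more cut each time yields the next-best sequence and gradually builds a pool of alternatives.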
Affiliation(s)
- Ye Tian
- Department of Chemical Engineering, Tsinghua University, Beijing, 100084, People's Republic of China
3
Huang YM, Bystroff C. Expanded explorations into the optimization of an energy function for protein design. IEEE/ACM Trans Comput Biol Bioinform 2013; 10:1176-1187. [PMID: 24384706 PMCID: PMC3919130 DOI: 10.1109/tcbb.2013.113]
Abstract
Nature possesses a secret formula for the energy as a function of the structure of a protein. In protein design, approximations are made to both the structural representation of the molecule and to the form of the energy equation, such that the existence of a general energy function for proteins is by no means guaranteed. Here, we present new insights toward the application of machine learning to the problem of finding a general energy function for protein design. Machine learning requires the definition of an objective function, which carries with it the implied definition of success in protein design. We explored four functions, consisting of two functional forms, each with two criteria for success. Optimization was carried out by a Monte Carlo search through the space of all variable parameters. Cross-validation of the optimized energy function against a test set gave significantly different results depending on the choice of objective function, pointing to relative correctness of the built-in assumptions. Novel energy cross terms correct for the observed nonadditivity of energy terms and an imbalance in the distribution of predicted amino acids. This paper expands on the work presented at the 2012 ACM-BCB.
4
Huang X, Han K, Zhu Y. Systematic optimization model and algorithm for binding sequence selection in computational enzyme design. Protein Sci 2013; 22:929-41. [PMID: 23649589 DOI: 10.1002/pro.2275] [Received: 03/14/2013] [Revised: 03/14/2013] [Accepted: 04/27/2013]
Abstract
A systematic optimization model for binding sequence selection in computational enzyme design was developed based on the transition state theory of enzyme catalysis and graph-theoretical modeling. The saddle point on the free energy surface of the reaction system was represented by catalytic geometrical constraints, and the binding energy between the active site and transition state was minimized to reduce the activation energy barrier. The resulting hyperscale combinatorial optimization problem was tackled using a novel heuristic global optimization algorithm, which was inspired and tested by the protein core sequence selection problem. The sequence recapitulation tests on native active sites for two enzyme catalyzed hydrolytic reactions were applied to evaluate the predictive power of the design methodology. The results of the calculation show that most of the native binding sites can be successfully identified if the catalytic geometrical constraints and the structural motifs of the substrate are taken into account. Reliably predicting active site sequences may have significant implications for the creation of novel enzymes that are capable of catalyzing targeted chemical reactions.
Affiliation(s)
- Xiaoqiang Huang
- Department of Chemical Engineering, Tsinghua University, Beijing, 100084, People's Republic of China
5
Huang X, Yang J, Zhu Y. A solvated ligand rotamer approach and its application in computational protein design. J Mol Model 2012. [PMID: 23192355 DOI: 10.1007/s00894-012-1695-6]
Abstract
The structure-based design of protein-ligand interfaces with respect to different small molecules is of great significance in the discovery of functional proteins. By statistical analysis of a set of protein-ligand complex structures, it was determined that water-mediated hydrogen bonding at the protein-ligand interface plays a crucial role in governing the binding between the protein and the ligand. Based on the novel statistic results, a solvated ligand rotamer approach was developed to explicitly describe the key water molecules at the protein-ligand interface and a water-mediated hydrogen bonding model was applied in the computational protein design context to complement the continuum solvent model. The solvated ligand rotamer approach produces only one additional solvated rotamer for each rotamer in the ligand rotamer library and does not change the number of side-chain rotamers at each protein design site. This has greatly reduced the total combinatorial number in sequence selection for protein design, and the accuracy of the model was confirmed by two tests. For the water placement test, 61% of the crystal water molecules were predicted correctly in five protein-ligand complex structures. For the sequence recapitulation test, 44.7% of the amino acid identities were recovered using the solvated ligand rotamer approach and the water-mediated hydrogen bonding model, while only 30.4% were recovered when the explicitly bound waters were removed. These results indicated that the developed solvated ligand rotamer approach is promising for functional protein design targeting novel protein-ligand interactions.
Affiliation(s)
- Xiaoqiang Huang
- Department of Chemical Engineering, Tsinghua University, Beijing 100084, China
6
Fernández A. Epistructural tension promotes protein associations. Phys Rev Lett 2012; 108:188102. [PMID: 22681121 DOI: 10.1103/physrevlett.108.188102] [Received: 01/30/2012]
Abstract
Epistructural tension is the reversible work per unit area required to span the aqueous interface of a soluble protein structure. The parameter accounts for the free-energy cost of imperfect hydration, involving water molecules with a shortage of hydrogen-bonding partnerships relative to bulk levels. The binding hot spots along protein-protein interfaces are identified with residues that contribute significantly to the epistructural tension in the free subunits. Upon association, such residues either displace or become deprived of low-coordination vicinal water molecules.
Affiliation(s)
- Ariel Fernández
- Instituto Argentino de Matemática, CONICET (National Research Council), Saavedra 15, 1083 Buenos Aires, Argentina.
7
Fernández A. Communication: Epistructural thermodynamics of soluble proteins. J Chem Phys 2012; 136:091101. [DOI: 10.1063/1.3691890]
8
Lei Y, Luo W, Zhu Y. A matching algorithm for catalytic residue site selection in computational enzyme design. Protein Sci 2011; 20:1566-75. [PMID: 21714026 DOI: 10.1002/pro.685] [Received: 04/22/2011] [Accepted: 06/07/2011]
Abstract
A loop closure-based sequential algorithm, PRODA_MATCH, was developed to match catalytic residues onto a scaffold for enzyme design in silico. The computational complexity of this algorithm is polynomial with respect to the number of active sites, the number of catalytic residues, and the maximal iteration number of cyclic coordinate descent steps. This matching algorithm is independent of a rotamer library that enables the catalytic residue to take any required conformation during the reaction coordinate. The catalytic geometric parameters defined between functional groups of transition state (TS) and the catalytic residues are continuously optimized to identify the accurate position of the TS. Pseudo-spheres are introduced for surrounding residues, which make the algorithm take binding into account as early as during the matching process. Recapitulation of native catalytic residue sites was used as a benchmark to evaluate the novel algorithm. The calculation results for the test set show that the native catalytic residue sites were successfully identified and ranked within the top 10 designs for 7 of the 10 chemical reactions. This indicates that the matching algorithm has the potential to be used for designing industrial enzymes for desired reactions.
Affiliation(s)
- Yulin Lei
- Department of Chemical Engineering, Tsinghua University, Beijing 100084, China
9
Klenin KV, Tristram F, Strunk T, Wenzel W. Derivatives of molecular surface area and volume: Simple and exact analytical formulas. J Comput Chem 2011; 32:2647-53. [DOI: 10.1002/jcc.21844] [Received: 01/10/2011] [Revised: 04/26/2011] [Accepted: 04/26/2011]
10
Schulz E, Frechero M, Appignanesi G, Fernández A. Sub-nanoscale surface ruggedness provides a water-tight seal for exposed regions in soluble protein structure. PLoS One 2010; 5. [PMID: 20862253 PMCID: PMC2941462 DOI: 10.1371/journal.pone.0012844] [Received: 07/12/2010] [Accepted: 08/26/2010]
Abstract
Soluble proteins must maintain backbone hydrogen bonds (BHBs) water-tight to ensure structural integrity. This protection is often achieved by burying the BHBs or wrapping them through intermolecular associations. On the other hand, water has low coordination resilience, with loss of hydrogen-bonding partnerships carrying significant thermodynamic cost. Thus, a core problem in structural biology is whether natural design actually exploits the water coordination stiffness to seal the backbone in regions that are exposed to the solvent. This work explores the molecular design features that make this type of seal operative, focusing on the side-chain arrangements that shield the protein backbone. We show that an efficient sealing is achieved by adapting the sub-nanoscale surface topography to the stringency of water coordination: an exposed BHB may be kept dry if the local concave curvature is small enough to impede formation of the coordination shell of a penetrating water molecule. Examination of an exhaustive database of uncomplexed proteins reveals that exposed BHBs invariably occur within such sub-nanoscale cavities in native folds, while this level of local ruggedness is absent in other regions. By contrast, BHB exposure in misfolded proteins occurs with larger local curvature promoting backbone hydration and consequently, structure disruption. These findings unravel physical constraints fitting a spatially dependent least-action for water coordination, introduce a molecular design concept, and herald the advent of water-tight peptide-based materials with sufficient backbone exposure to remain flexible.
Affiliation(s)
- Erica Schulz
- Sección Fisicoquímica, Instituto de Química del Sur, Universidad Nacional del Sur, Consejo Nacional de Investigaciones Científicas y Técnicas and Departamento de Química, Universidad Nacional del Sur, Bahía Blanca, Argentina
- Marisa Frechero
- Sección Fisicoquímica, Instituto de Química del Sur, Universidad Nacional del Sur, Consejo Nacional de Investigaciones Científicas y Técnicas and Departamento de Química, Universidad Nacional del Sur, Bahía Blanca, Argentina
- Gustavo Appignanesi
- Sección Fisicoquímica, Instituto de Química del Sur, Universidad Nacional del Sur, Consejo Nacional de Investigaciones Científicas y Técnicas and Departamento de Química, Universidad Nacional del Sur, Bahía Blanca, Argentina
- Ariel Fernández
- Department of Bioengineering, Rice University, Houston, Texas, United States of America
- Department of Computer Science, The University of Chicago, Chicago, Illinois, United States of America
11
Abstract
The interfacial tension of biological water, a promoter of biomolecular interactions, is difficult to determine because of the inhomogeneous nanoscale patterns that make up the surface of biomolecules. These patterns modulate solubility in peculiar manners, enabling specific associations while preventing phase separation or precipitation. In this work, we derive de novo the nanoscale thermodynamics associated with the creation of biological interfaces and validate the results against experimentally identified complex interfaces. Interfacial tension is shown to be generated by hot spots of red-shifted dielectric relaxation. The most common spots involve hindered polar hydration. Taken collectively, these patches contribute more to the interfacial tension than the better known non-polar cavities with nanometre curvature, where our results agree with the established length-scale dependence of hydrophobicity. The thermodynamic results are validated by showing that the inferred patches of interfacial tension actually promote biomolecular associations.
Affiliation(s)
- Ariel Fernández
- Department of Bioengineering, Rice University, Houston, TX 77005, USA
- Department of Computer Science, The University of Chicago, Chicago, IL 60637, USA
12
Ma J. Explicit orientation dependence in empirical potentials and its significance to side-chain modeling. Acc Chem Res 2009; 42:1087-96. [PMID: 19445451 DOI: 10.1021/ar900009e]
Abstract
Protein structure modeling and prediction have important applications throughout the biological sciences, from the design of pharmaceuticals to the elucidation of enzyme mechanisms. At the core of most protein modeling is an energy function, the minimum of which represents the free energy "cost" for forming a correct protein structure. The most commonly used energy functions are knowledge-based statistical potential functions; that is, they are empirically derived from statistical analysis of a set of high-resolution protein structures. When that kind of potential function is constructed, the anisotropic orientation dependence between the interacting groups is a critical component for accurately representing key molecular interactions, such as those involved in protein side-chain packing. In the literature, however, many potential functions are limited in their ability to describe orientation dependence. In all-atom potentials, they typically ignore heterogeneous chemical-bond connectivity. In coarse-grained potentials, such as (semi)-residue-based potentials, the simplified representation of residues often reduces the sensitivity of the potential to side-chain orientation. Recently, in an effort to maximally capture the orientation dependence in side-chain interactions, a new type of all-atom statistical potential was developed: OPUS-PSP (potential derived from side-chain packing). The key feature of this potential is its explicit description of orientation dependence in molecular interactions, which is achieved with a basis set of 19 rigid-body blocks extracted from the chemical structures of 20 amino acid residues. This basis set is specifically designed to maximally capture the essential elements of orientation dependence in molecular packing interactions. The potential is constructed from the orientation-specific packing statistics of pairs of those blocks in a nonredundant structural database. 
On decoy set tests, OPUS-PSP significantly outperforms most of the existing knowledge-based potentials in terms of both its ability to recognize native structures and its consistency in achieving high Z scores across decoy sets. The application of OPUS-PSP to conformational modeling of side chains has led to another method, called OPUS-Rota. In terms of combined speed and accuracy, OPUS-Rota outperforms all of the other methods in modeling side-chain conformation. In this Account, we briefly outline the basic scheme of the OPUS-PSP potential and its application to side-chain modeling via OPUS-Rota. Future perspectives on the modeling of orientation dependence are also discussed. The computer programs for OPUS-PSP and OPUS-Rota can be downloaded at http://sigler.bioch.bcm.tmc.edu/MaLab . They are free for academic users.
Affiliation(s)
- Jianpeng Ma
- Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, and Department of Bioengineering, Rice University, Houston, Texas 77005
13
Durham E, Dorr B, Woetzel N, Staritzbichler R, Meiler J. Solvent accessible surface area approximations for rapid and accurate protein structure prediction. J Mol Model 2009; 15:1093-108. [PMID: 19234730 PMCID: PMC2712621 DOI: 10.1007/s00894-009-0454-9] [Received: 11/16/2008] [Accepted: 01/02/2009]
Abstract
The burial of hydrophobic amino acids in the protein core is a driving force in protein folding. The extent to which an amino acid interacts with the solvent and the protein core is naturally proportional to the surface area exposed to these environments. However, an accurate calculation of the solvent-accessible surface area (SASA), a geometric measure of this exposure, is numerically demanding as it is not pair-wise decomposable. Furthermore, it depends on a full-atom representation of the molecule. This manuscript introduces a series of four SASA approximations of increasing computational complexity and accuracy as well as knowledge-based environment free energy potentials based on these SASA approximations. Their ability to distinguish correctly from incorrectly folded protein models is assessed to balance speed and accuracy for protein structure prediction. We find the newly developed “Neighbor Vector” algorithm provides the most optimal balance of accurate yet rapid exposure measures.
Affiliation(s)
- Elizabeth Durham
- Department of Chemistry, Center for Structural Biology, Vanderbilt University, 465 21st Ave South, Nashville, TN 37232-8725, USA
14
Lu M, Dousis AD, Ma J. OPUS-Rota: a fast and accurate method for side-chain modeling. Protein Sci 2008; 17:1576-85. [PMID: 18556476 DOI: 10.1110/ps.035022.108]
Abstract
In this paper, we introduce a fast and accurate side-chain modeling method, named OPUS-Rota. In a benchmark comparison with the methods SCWRL, NCN, LGA, SPRUCE, Rosetta, and SCAP, OPUS-Rota is shown to be much faster than all the methods except SCWRL, which is comparably fast. In terms of overall chi (1) and chi (1+2) accuracies, however, OPUS-Rota is 5.4 and 8.8 percentage points better, respectively, than SCWRL. Compared with NCN, which has the best accuracy in the literature, OPUS-Rota is 1.6 percentage points better for overall chi (1+2) but 0.3 percentage points weaker for overall chi (1). Hence, our algorithm is much more accurate than SCWRL with similar execution speed, and it has accuracy comparable to or better than the most accurate methods in the literature, but with a runtime that is one or two orders of magnitude shorter. In addition, OPUS-Rota consistently outperforms SCWRL on the Wallner and Elofsson homology-modeling benchmark set when the sequence identity is greater than 40%. We hope that OPUS-Rota will contribute to high-accuracy structure refinement, and the computer program is freely available for academic users.
Affiliation(s)
- Mingyang Lu
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas 77030, USA
15
Vizcarra CL, Zhang N, Marshall SA, Wingreen NS, Zeng C, Mayo SL. An improved pairwise decomposable finite-difference Poisson-Boltzmann method for computational protein design. J Comput Chem 2008; 29:1153-62. [PMID: 18074340 DOI: 10.1002/jcc.20878]
Abstract
Our goal is to develop accurate electrostatic models that can be implemented in current computational protein design protocols. To this end, we improve upon a previously reported pairwise decomposable, finite difference Poisson-Boltzmann (FDPB) model for protein design (Marshall et al., Protein Sci 2005, 14, 1293). The improvement involves placing generic sidechains at positions with unknown amino acid identity and explicitly capturing two-body perturbations to the dielectric environment. We compare the original and improved FDPB methods to standard FDPB calculations in which the dielectric environment is completely determined by protein atoms. The generic sidechain approach yields a two to threefold increase in accuracy per residue or residue pair over the original pairwise FDPB implementation, with no additional computational cost. Distance dependent dielectric and solvent-exclusion models were also compared with standard FDPB energies. The accuracy of the new pairwise FDPB method is shown to be superior to these models, even after reparameterization of the solvent-exclusion model.
Affiliation(s)
- Christina L Vizcarra
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA
16
Zhang N, Zeng C. Reference energy extremal optimization: A stochastic search algorithm applied to computational protein design. J Comput Chem 2008; 29:1762-71. [DOI: 10.1002/jcc.20937]
17
Kloppmann E, Ullmann GM, Becker T. An extended dead-end elimination algorithm to determine gap-free lists of low energy states. J Comput Chem 2007; 28:2325-35. [PMID: 17471458 DOI: 10.1002/jcc.20749]
Abstract
Proteins are flexible systems and commonly populate several functionally important states. To understand protein function, these states and their energies have to be identified. We introduce an algorithm that allows the determination of a gap-free list of the low energy states. This algorithm is based on the dead-end elimination (DEE) theorem and is termed X-DEE (extended DEE). X-DEE is applicable to discrete systems whose state energy can be formulated as pairwise interaction between sites and their intrinsic energies. In this article, the computational performance of X-DEE is analyzed and discussed. X-DEE is implemented to determine the lowest energy protonation states of proteins, a problem to which DEE has not been applied so far. We use X-DEE to calculate a list of low energy protonation states for two bacteriorhodopsin structures that represent the first proton transfer step of the bacteriorhodopsin photocycle.
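The abstract above builds on the dead-end elimination theorem for energies written as intrinsic site terms plus pairwise interactions. As a rough illustration, the sketch below applies the classic single-state DEE criterion (not the X-DEE extension itself, whose details are not given here): state r at site i is discarded when its best-case energy exceeds the worst-case energy of some alternative state t. The function names and the toy energy tables are hypothetical.

```python
def pair(pair_energy, i, r, j, s):
    """Pairwise energy between state r at site i and state s at site j."""
    if i < j:
        return pair_energy[(i, j)][r][s]
    return pair_energy[(j, i)][s][r]


def bound(self_energy, pair_energy, alive, i, r, pick):
    """Intrinsic energy of (i, r) plus the min or max pair term over each other site."""
    total = self_energy[i][r]
    for j in range(len(self_energy)):
        if j != i:
            total += pick(pair(pair_energy, i, r, j, s) for s in alive[j])
    return total


def dee(self_energy, pair_energy):
    """Repeat the classic DEE test until no further state can be eliminated."""
    alive = [set(range(len(states))) for states in self_energy]
    changed = True
    while changed:
        changed = False
        for i in range(len(alive)):
            for r in sorted(alive[i]):  # snapshot, since alive[i] may shrink
                best_r = bound(self_energy, pair_energy, alive, i, r, min)
                # Eliminate r if some other state t is better even in its worst case.
                if any(best_r > bound(self_energy, pair_energy, alive, i, t, max)
                       for t in alive[i] if t != r):
                    alive[i].discard(r)
                    changed = True
    return alive


# Toy system (made-up numbers): two sites with two states each.
self_energy = [[0.0, 5.0], [0.0, 1.0]]
pair_energy = {(0, 1): [[0.0, 0.5], [0.2, 0.1]]}
surviving = dee(self_energy, pair_energy)
```

In this toy case a single state survives at each site, so the global minimum is found without enumerating combinations. Producing a gap-free list of low-energy states, as X-DEE does, needs more than this basic criterion.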
Affiliation(s)
- Edda Kloppmann
- Structural Biology/Bioinformatics, University of Bayreuth, Universitätsstr. 30, BGI, 95447 Bayreuth, Germany
18
Leaver-Fay A, Butterfoss GL, Snoeyink J, Kuhlman B. Maintaining solvent accessible surface area under rotamer substitution for protein design. J Comput Chem 2007; 28:1336-41. [PMID: 17285560 DOI: 10.1002/jcc.20626]
Abstract
Although quantities derived from solvent accessible surface areas (SASA) are useful in many applications in protein design and structural biology, the computational cost of accurate SASA calculation makes SASA-based scores difficult to integrate into commonly used protein design methodologies. We demonstrate a method for maintaining accurate SASA during a Monte Carlo search of sequence and rotamer space for a fixed protein backbone. We extend the fast Le Grand and Merz algorithm (Le Grand and Merz, J Comput Chem, 14, 349), which discretizes the solvent accessible surface for each atom by placing dots on a sphere and combines Boolean masks to determine which dots are exposed. By replacing semigroup operations with group operations (from Boolean logic to counting dot coverage) we support SASA updates. Our algorithm takes time proportional to the number of atoms affected by rotamer substitution, rather than the number of atoms in the protein. For design simulations with a one hundred residue protein our approach is approximately 145 times faster than performing a Le Grand and Merz SASA calculation from scratch following each rotamer substitution. To demonstrate practical effectiveness, we optimize a SASA-based measure of protein packing in the complete redesign of a large set of proteins and protein-protein interfaces.
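The step from Boolean masks to counting dot coverage can be sketched as follows. This is a minimal illustration of the idea described in the abstract, not the authors' implementation: the class name, the golden-spiral dot placement, the dot count, and the 1.4 Å probe radius are all assumptions made here. Each dot stores how many neighbouring atoms cover it, so a neighbour can be removed again by decrementing, which a plain exposed/buried flag would not allow.

```python
import math


def sphere_dots(n):
    """Return n roughly uniform unit vectors (golden-spiral lattice)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    dots = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        dots.append((r * math.cos(golden * k), r * math.sin(golden * k), z))
    return dots


class DotCountedAtom:
    """One atom whose surface dots carry a coverage count (hypothetical helper)."""

    def __init__(self, center, radius, probe=1.4, n_dots=162):
        self.probe = probe
        self.r = radius + probe  # radius of the solvent accessible sphere
        self.dots = [tuple(c + self.r * u for c, u in zip(center, d))
                     for d in sphere_dots(n_dots)]
        self.cover = [0] * n_dots  # number of neighbours covering each dot

    def _update(self, center, radius, delta):
        reach = radius + self.probe
        for i, p in enumerate(self.dots):
            if math.dist(p, center) < reach:
                self.cover[i] += delta

    def add_neighbor(self, center, radius):
        self._update(center, radius, +1)

    def remove_neighbor(self, center, radius):
        self._update(center, radius, -1)

    def sasa(self):
        """Accessible area: fraction of dots with zero coverage times sphere area."""
        exposed = sum(1 for c in self.cover if c == 0)
        return 4.0 * math.pi * self.r ** 2 * exposed / len(self.cover)
```

Under this scheme a rotamer substitution only touches the atoms near the old and new side chains: their counts are decremented for the departing atoms and incremented for the arriving ones, which matches the abstract's claim that the cost scales with the number of affected atoms.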
Affiliation(s)
- Andrew Leaver-Fay
- Department of Computer Science, University of North Carolina, Chapel Hill, North Carolina 27599, USA
19
Poole AM, Ranganathan R. Knowledge-based potentials in protein design. Curr Opin Struct Biol 2006; 16:508-13. [PMID: 16843652 DOI: 10.1016/j.sbi.2006.06.013] [Received: 06/07/2006] [Revised: 06/07/2006] [Accepted: 06/30/2006]
Abstract
Knowledge-based potentials are statistical parameters derived from databases of known protein properties that empirically capture aspects of the physical chemistry of protein structure and function. These potentials play a key role in protein design by improving the accuracy of physics-based models of interatomic interactions and enhancing the computational efficiency of the design process by limiting the complexity of searching sequence space. Recently, knowledge-based potentials (in isolation or in combination with physics-based potentials) have been applied to the modification of existing protein function, the redesign of natural protein folds and the complete design of a non-natural protein fold. In addition, knowledge-based potentials appear to be providing important information about the global topology of amino acid interactions in natural proteins. A detailed study of the methods and products of these protein design efforts promises to greatly expand our understanding of proteins and the evolutionary process that created them.
Affiliation(s)
- Alan M Poole
- Howard Hughes Medical Institute, Department of Pharmacology and the Green Comprehensive Center Division for Systems Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390-9050, USA
|
20
|
Endres RG, Wingreen NS. Weight matrices for protein-DNA binding sites from a single co-crystal structure. Phys Rev E Stat Nonlin Soft Matter Phys 2006; 73:061921. [PMID: 16906878 DOI: 10.1103/physreve.73.061921] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 11/21/2005] [Revised: 01/31/2006] [Indexed: 05/11/2023]
Abstract
Transcription-factor proteins bind to specific DNA sequences to regulate gene expression in cells. DNA-binding sites are often identified using weight matrices calculated from multiple known binding sites. However, in many cases the number of examples is limited. Here, we report on an atomistic method that starts from an x-ray co-crystal structure of the protein bound to one particular DNA sequence, and infers other binding sites, which are used to construct a weight matrix. The emphasis of the paper is on using the Wang-Landau Monte Carlo algorithm to efficiently sample high-affinity binding sites, which demonstrates that sampling can produce accurate weight matrices in analogy to bioinformatics approaches. For cases of low complexity, we compare to the exhaustive (but slow) dead-end elimination algorithm. To recover crystal binding sites, it is important to include bound water in the protein-DNA interface. Our approach can, in principle, even be applied when no native protein-DNA co-crystal structure is available, only the structure of a closely related homologous protein whose amino-acid sequence is changed to the protein of interest.
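As a rough illustration of the final step described in this abstract, the Python sketch below builds a log-odds weight matrix from a set of binding-site sequences, such as those a sampling run might return. It is a generic textbook construction, not the authors' procedure: the function names, the pseudocount of 1, the uniform background of 0.25, and the example sites are all assumptions.

```python
import math

BASES = "ACGT"


def weight_matrix(sites, pseudocount=1.0, background=0.25):
    """Log-odds weight matrix from equal-length DNA sites (hypothetical helper)."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for i in range(length):
        # The pseudocount keeps unseen bases from producing log(0).
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[i]] += 1.0
        total = sum(counts.values())
        matrix.append(
            {base: math.log(counts[base] / total / background) for base in BASES}
        )
    return matrix


def score(matrix, sequence):
    """Sum of per-position log-odds weights for a candidate site."""
    return sum(column[base] for column, base in zip(matrix, sequence))


# Made-up example sites standing in for sampled high-affinity sequences.
sites = ["ACGT", "ACGA", "ACGT", "TCGT"]
wm = weight_matrix(sites)
```

With this matrix, `score(wm, "ACGT")` is higher than `score(wm, "TTTT")`, since the first sequence matches the most frequent base at every position. Weighting each sampled site by its estimated binding affinity would be a natural refinement.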
Affiliation(s)
- Robert G Endres
- NEC Laboratories America, Inc., Princeton, New Jersey 08540, USA.
|
21
|
Vizcarra CL, Mayo SL. Electrostatics in computational protein design. Curr Opin Chem Biol 2005; 9:622-6. [PMID: 16257567 DOI: 10.1016/j.cbpa.2005.10.014] [Citation(s) in RCA: 101] [Impact Index Per Article: 5.3] [Received: 09/02/2005] [Accepted: 10/11/2005] [Indexed: 11/18/2022]
Abstract
Catalytic activity and protein-protein recognition have proven to be significant challenges for computational protein design. Electrostatic interactions are crucial for these and other protein functions, and therefore accurate modeling of electrostatics is necessary for successfully advancing protein design into the realm of protein function. This review focuses on recent progress in modeling electrostatic interactions in computational protein design, with particular emphasis on continuum models.
Affiliation(s)
- Christina L Vizcarra
- Division of Chemistry and Chemical Engineering, Division of Biology and Howard Hughes Medical Institute, California Institute of Technology, Pasadena, California 91125, USA
|