1
Talluri S. Algorithms for protein design. Advances in Protein Chemistry and Structural Biology 2022; 130:1-38. [PMID: 35534105 DOI: 10.1016/bs.apcsb.2022.01.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/16/2022]
Abstract
Computational Protein Design has the potential to contribute to major advances in enzyme technology, vaccine design, receptor-ligand engineering, biomaterials, nanosensors, and synthetic biology. Although Protein Design is a challenging problem, proteins can be designed by experts in Protein Design, as well as by non-experts whose primary interests are in the applications of Protein Design. The increased accessibility of Protein Design technology is attributable to the accumulated knowledge and experience with Protein Design as well as to the availability of software and online resources. The objective of this review is to serve as a guide to the relevant literature with a focus on the novel methods and algorithms that have been developed or applied for Protein Design, and to assist in the selection of algorithms for Protein Design. Novel algorithms and models that have been introduced to utilize the enormous amount of experimental data and novel computational hardware have the potential for producing substantial increases in the accuracy, reliability and range of applications of designed proteins.
Affiliation(s)
- Sekhar Talluri
- Department of Biotechnology, GITAM, Visakhapatnam, India.
2
Verma D, Grigoryan G, Bailey-Kellogg C. Pareto Optimization of Combinatorial Mutagenesis Libraries. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:1143-1153. [PMID: 30040654 PMCID: PMC8262366 DOI: 10.1109/tcbb.2018.2858794] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 05/20/2023]
Abstract
In order to increase the hit rate of discovering diverse, beneficial protein variants via high-throughput screening, we have developed a computational method to optimize combinatorial mutagenesis libraries for overall enrichment in two distinct properties of interest. Given scoring functions for evaluating individual variants, POCoM (Pareto Optimal Combinatorial Mutagenesis) scores entire libraries in terms of averages over their constituent members, and designs optimal libraries as sets of mutations whose combinations make the best trade-offs between average scores. This represents the first general-purpose method to directly design combinatorial libraries for multiple objectives characterizing their constituent members. Despite being rigorous in mapping out the Pareto frontier, it is also very fast even for very large libraries (e.g., designing 30 mutation, billion-member libraries in only hours). We here instantiate POCoM with scores based on a target's protein structure and its homologs' sequences, enabling the design of libraries containing variants balancing these two important yet quite different types of information. We demonstrate POCoM's generality and power in case study applications to green fluorescent protein, cytochrome P450, and β-lactamase. Analysis of the POCoM library designs provides insights into the trade-offs between structure- and sequence-based scores, as well as the impacts of experimental constraints on library designs. POCoM libraries incorporate mutations that have previously been found favorable experimentally, while diversifying the contexts in which these mutations are situated and maintaining overall variant quality.
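The notion of a Pareto frontier over two library-averaged scores can be illustrated with a minimal Python sketch. It assumes both scores are to be minimized and that a handful of candidate libraries have already been scored; the function name `pareto_front` and the toy score pairs are hypothetical. This brute-force filter only shows what "Pareto optimal" means and is not the optimization that POCoM uses to handle very large libraries.

```python
from typing import List, Tuple

def pareto_front(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the non-dominated points, assuming both scores are minimized."""
    front = []
    best_second = float("inf")
    # After sorting by the first score, a point is non-dominated exactly when
    # its second score is strictly better than every point seen so far.
    for p in sorted(set(points)):
        if p[1] < best_second:
            front.append(p)
            best_second = p[1]
    return front

# Hypothetical (structure-based score, sequence-based score) pairs,
# one per candidate library.
candidates = [(1, 5), (2, 3), (3, 4), (4, 1), (2, 6)]
front = pareto_front(candidates)
print(front)
```

Each point kept in `front` represents a library for which no other candidate is at least as good on both scores and strictly better on one.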
3
Zheng F, Grigoryan G. Simplifying the Design of Protein-Peptide Interaction Specificity with Sequence-Based Representations of Atomistic Models. Methods Mol Biol 2018; 1561:189-200. [PMID: 28236239 DOI: 10.1007/978-1-4939-6798-8_11] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 02/04/2023]
Abstract
Computationally designed peptides targeting protein-protein interaction interfaces are of great interest as reagents for biological research and potential therapeutics. In recent years, it has been shown that detailed structure-based calculations can, in favorable cases, describe relevant determinants of protein-peptide recognition. Yet, despite large increases in available computing power, such accurate modeling of the binding reaction is still largely outside the realm of protein design. The chief limitation is in the large sequence spaces generally involved in protein design problems, such that it is typically infeasible to apply expensive modeling techniques to score each sequence. Toward addressing this issue, we have previously shown that by explicitly evaluating the scores of a relatively small number of sequences, it is possible to synthesize a direct mapping between sequences and scores, such that the entire sequence space can be analyzed extremely rapidly. The associated method, called Cluster Expansion, has been used in a number of studies to design binding affinity and specificity. In this chapter, we provide instructions and guidance for applying this technique in the context of designing protein-peptide interactions to enable the use of more detailed and expensive scoring approaches than is typically possible.
Affiliation(s)
- Fan Zheng
- Department of Biological Sciences, Dartmouth College, Hanover, NH, 03755, USA
- Gevorg Grigoryan
- Department of Computer Science, Dartmouth College, 6211 Sudikoff Lab, Room 113, Hanover, NH, 03755, USA; Department of Biological Sciences, Dartmouth College, Hanover, NH, 03755, USA.
4
Verma D, Grigoryan G, Bailey-Kellogg C. Structure-based design of combinatorial mutagenesis libraries. Protein Sci 2015; 24:895-908. [PMID: 25611189 DOI: 10.1002/pro.2642] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 10/15/2014] [Revised: 12/14/2014] [Accepted: 01/11/2015] [Indexed: 01/27/2023]
Abstract
The development of protein variants with improved properties (thermostability, binding affinity, catalytic activity, etc.) has greatly benefited from the application of high-throughput screens evaluating large, diverse combinatorial libraries. At the same time, since only a very limited portion of sequence space can be experimentally constructed and tested, an attractive possibility is to use computational protein design to focus libraries on a productive portion of the space. We present a general-purpose method, called "Structure-based Optimization of Combinatorial Mutagenesis" (SOCoM), which can optimize arbitrarily large combinatorial mutagenesis libraries directly based on structural energies of their constituents. SOCoM chooses both positions and substitutions, employing a combinatorial optimization framework based on library-averaged energy potentials in order to avoid explicitly modeling every variant in every possible library. In case study applications to green fluorescent protein, β-lactamase, and lipase A, SOCoM optimizes relatively small, focused libraries whose variants achieve energies comparable to or better than previous library design efforts, as well as larger libraries (previously not designable by structure-based methods) whose variants cover greater diversity while still maintaining substantially better energies than would be achieved by representative random library approaches. By allowing the creation of large-scale combinatorial libraries based on structural calculations, SOCoM promises to increase the scope of applicability of computational protein design and improve the hit rate of discovering beneficial variants. While designs presented here focus on variant stability (predicted by total energy), SOCoM can readily incorporate other structure-based assessments, such as the energy gap between alternative conformational or bound states.
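One way to see why library-averaged potentials avoid modeling every variant is the following Python sketch. It assumes a pairwise-decomposable energy and a library in which each position independently offers a set of equally likely substitutions; under those assumptions the mean energy follows from per-position and per-pair averages. The function names, the dictionary layout, and the toy energies are hypothetical, and this is not presented as SOCoM's actual formulation.

```python
from itertools import product

def library_average_energy(choices, single, pair):
    """Mean energy over all variants, computed without enumerating them.

    choices[i] lists the residues allowed at position i;
    single[(i, a)] and pair[(i, j, a, b)] (i < j) hold energy terms,
    with missing entries treated as zero.
    """
    n = len(choices)
    total = 0.0
    for i in range(n):
        total += sum(single.get((i, a), 0.0) for a in choices[i]) / len(choices[i])
    for i in range(n):
        for j in range(i + 1, n):
            s = sum(pair.get((i, j, a, b), 0.0)
                    for a in choices[i] for b in choices[j])
            total += s / (len(choices[i]) * len(choices[j]))
    return total

def brute_force_average(choices, single, pair):
    """Reference value: enumerate every variant and average its energy."""
    energies = []
    for variant in product(*choices):
        e = sum(single.get((i, a), 0.0) for i, a in enumerate(variant))
        e += sum(pair.get((i, j, variant[i], variant[j]), 0.0)
                 for i in range(len(variant))
                 for j in range(i + 1, len(variant)))
        energies.append(e)
    return sum(energies) / len(energies)

# Toy three-position library with made-up energy terms.
choices = [["A", "G"], ["L"], ["K", "E", "D"]]
single = {(0, "A"): -1.0, (0, "G"): 0.5, (1, "L"): -2.0,
          (2, "K"): 0.2, (2, "E"): -0.4}
pair = {(0, 2, "A", "K"): -0.8, (0, 2, "G", "E"): 1.2, (1, 2, "L", "D"): -0.3}
fast = library_average_energy(choices, single, pair)
slow = brute_force_average(choices, single, pair)
```

The averaged form costs time proportional to the number of position pairs and substitutions rather than to the library size, which is what makes very large libraries tractable in this simplified setting.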
Affiliation(s)
- Deeptak Verma
- Department of Computer Science, Dartmouth College, Hanover, New Hampshire
5
Negron C, Keating AE. A set of computationally designed orthogonal antiparallel homodimers that expands the synthetic coiled-coil toolkit. J Am Chem Soc 2014; 136:16544-56. [PMID: 25337788 PMCID: PMC4277747 DOI: 10.1021/ja507847t] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Received: 07/31/2014] [Indexed: 12/11/2022]
Abstract
Molecular engineering of protein assemblies, including the fabrication of nanostructures and synthetic signaling pathways, relies on the availability of modular parts that can be combined to give different structures and functions. Currently, a limited number of well-characterized protein interaction components are available. Coiled-coil interaction modules have been demonstrated to be useful for biomolecular design, and many parallel homodimers and heterodimers are available in the coiled-coil toolkit. In this work, we sought to design a set of orthogonal antiparallel homodimeric coiled coils using a computational approach. There are very few antiparallel homodimers described in the literature, and none have been measured for cross-reactivity. We tested the ability of the distance-dependent statistical potential DFIRE to predict orientation preferences for coiled-coil dimers of known structure. The DFIRE model was then combined with the CLASSY multistate protein design framework to engineer sets of three orthogonal antiparallel homodimeric coiled coils. Experimental measurements confirmed the successful design of three peptides that preferentially formed antiparallel homodimers that, furthermore, did not interact with one additional previously reported antiparallel homodimer. Two designed peptides that formed higher-order structures suggest how future design protocols could be improved. The successful designs represent a significant expansion of the existing protein-interaction toolbox for molecular engineers.
Affiliation(s)
- Christopher Negron
- Program in Computational and Systems Biology and Departments of Biology and Biological Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Amy E. Keating
- Program in Computational and Systems Biology and Departments of Biology and Biological Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
6
Zheng F, Jewell H, Fitzpatrick J, Zhang J, Mierke DF, Grigoryan G. Computational design of selective peptides to discriminate between similar PDZ domains in an oncogenic pathway. J Mol Biol 2014; 427:491-510. [PMID: 25451599 DOI: 10.1016/j.jmb.2014.10.014] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 08/14/2014] [Revised: 10/21/2014] [Accepted: 10/23/2014] [Indexed: 11/25/2022]
Abstract
Reagents that target protein-protein interactions to rewire signaling are of great relevance in biological research. Computational protein design may offer a means of creating such reagents on demand, but methods for encoding targeting selectivity are sorely needed. This is especially challenging when targeting interactions with ubiquitous recognition modules, for example PDZ domains, which bind C-terminal sequences of partner proteins. Here we consider the problem of designing selective PDZ inhibitor peptides in the context of an oncogenic signaling pathway, in which two PDZ domains (NHERF-2 PDZ2, abbreviated N2P2, and MAGI-3 PDZ6, abbreviated M3P6) compete for a receptor C-terminus to differentially modulate oncogenic activities. Because N2P2 has been shown to increase tumorigenicity and M3P6 to decrease it, we sought to design peptides that inhibit N2P2 without affecting M3P6. We developed a structure-based computational design framework that models peptide flexibility in binding yet is efficient enough to rapidly analyze tradeoffs between affinity and selectivity. Designed peptides showed low-micromolar inhibition constants for N2P2 and no detectable M3P6 binding. Peptides designed for reverse discrimination bound M3P6 tighter than N2P2, further testing our technology. Experimental and computational analysis of selectivity determinants revealed significant indirect energetic coupling in the binding site. Successful discrimination between N2P2 and M3P6, despite their overlapping binding preferences, is highly encouraging for computational approaches to selective PDZ targeting, especially because design relied on a homology model of M3P6. Still, we demonstrate specific deficiencies of structural modeling that must be addressed to enable truly robust design. The presented framework is general and can be applied in many scenarios to engineer selective targeting.
Affiliation(s)
- Fan Zheng
- Department of Biological Sciences, Dartmouth College, Hanover, NH 03755, USA
- Heather Jewell
- Department of Computer Science, Dartmouth College, Hanover, NH 03755, USA
- Jian Zhang
- Department of Biological Sciences, Dartmouth College, Hanover, NH 03755, USA
- Dale F Mierke
- Department of Chemistry, Dartmouth College, Hanover, NH 03755, USA
- Gevorg Grigoryan
- Department of Biological Sciences, Dartmouth College, Hanover, NH 03755, USA; Department of Computer Science, Dartmouth College, Hanover, NH 03755, USA.
7
Grigoryan G. Absolute free energies of biomolecules from unperturbed ensembles. J Comput Chem 2013; 34:2726-41. [PMID: 24132787 DOI: 10.1002/jcc.23448] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 04/03/2013] [Revised: 07/11/2013] [Accepted: 08/31/2013] [Indexed: 01/31/2023]
Abstract
Computing the absolute free energy of a macromolecule's structural state, F, is a challenging problem of high relevance. This study presents a method that computes F using only information from an unperturbed simulation of the macromolecule in the relevant conformational state, ensemble, and environment. Absolute free energies produced by this method, dubbed Valuation of Local Configuration Integral with Dynamics (VALOCIDY), enable comparison of alternative states. For example, comparing explicitly solvated and vaporous states of amino acid side-chain analogs produces solvation free energies in good agreement with experiments. Also, comparisons between alternative conformational states of model heptapeptides (including the unfolded state) produce free energy differences in agreement with data from μs molecular-dynamics simulations and experimental propensities. The potential of using VALOCIDY in computational protein design is explored via a small design problem of stabilizing a β-turn structure. When VALOCIDY-based estimation of folding free energy is used as the design metric, the resulting sequence folds into the desired structure within the atomistic force field used in design. The VALOCIDY-based approach also recognizes the distinct status of the native sequence regardless of minor details of the starting template structure, in stark contrast with a traditional fixed-backbone approach.
Affiliation(s)
- Gevorg Grigoryan
- Department of Computer Science and Department of Biology, Dartmouth College, Hanover, New Hampshire, 03755
8
Hahn S, Ashenberg O, Grigoryan G, Keating AE. Identifying and reducing error in cluster-expansion approximations of protein energies. J Comput Chem 2011; 31:2900-14. [PMID: 20602445 DOI: 10.1002/jcc.21585] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/07/2022]
Abstract
Protein design involves searching a vast space for sequences that are compatible with a defined structure. This can pose significant computational challenges. Cluster expansion is a technique that can accelerate the evaluation of protein energies by generating a simple functional relationship between sequence and energy. The method consists of several steps. First, for a given protein structure, a training set of sequences with known energies is generated. Next, this training set is used to expand energy as a function of clusters consisting of single residues, residue pairs, and higher order terms, if required. The accuracy of the sequence-based expansion is monitored and improved using cross-validation testing and iterative inclusion of additional clusters. As a trade-off for evaluation speed, the cluster-expansion approximation causes prediction errors, which can be reduced by including more training sequences, including higher order terms in the expansion, and/or reducing the sequence space described by the cluster expansion. This article analyzes the sources of error and introduces a method whereby accuracy can be improved by judiciously reducing the described sequence space. The method is applied to describe the sequence-stability relationship for several protein structures: coiled-coil dimers and trimers, a PDZ domain, and T4 lysozyme as examples with computationally derived energies, and SH3 domains in amphiphysin-1 and endophilin-1 as examples where the expanded pseudo-energies are obtained from experiments. Our open-source software package Cluster Expansion Version 1.0 allows users to expand their own energy function of interest and thereby apply cluster expansion to custom problems in protein design.
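The fitting step described above (a training set of sequences with known energies, expanded in single-residue and residue-pair terms) can be sketched in Python as an ordinary least-squares problem. This is a minimal illustration that assumes NumPy is available; the three-letter alphabet, the `toy_energy` stand-in for an expensive structure-based score, and all function names are hypothetical. Cross-validation, iterative cluster selection, and higher-order terms are omitted, and the sketch is not the interface of the Cluster Expansion software package.

```python
import itertools
import numpy as np

ALPHABET = "AVK"  # toy alphabet, chosen only to keep the example small

def features(seq, alphabet=ALPHABET):
    """Indicator features: constant, one per (site, residue), one per (site pair, residue pair)."""
    k = len(alphabet)
    idx = [alphabet.index(c) for c in seq]
    x = [1.0]
    for i in range(len(seq)):
        one_hot = [0.0] * k
        one_hot[idx[i]] = 1.0
        x.extend(one_hot)
    for i, j in itertools.combinations(range(len(seq)), 2):
        block = [0.0] * (k * k)
        block[idx[i] * k + idx[j]] = 1.0
        x.extend(block)
    return np.array(x)

def fit_cluster_expansion(seqs, energies):
    """Least-squares fit of expansion coefficients to training energies."""
    X = np.array([features(s) for s in seqs])
    y = np.asarray(energies, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict(coef, seq):
    """Sequence-only energy estimate from the fitted expansion."""
    return float(features(seq) @ coef)

def toy_energy(seq):
    """Hypothetical stand-in for a costly structure-based energy."""
    e = sum((i + 1) * ALPHABET.index(c) for i, c in enumerate(seq))
    e += sum(-2.0 for a, b in zip(seq, seq[1:]) if a == b)
    return float(e)

# Train on every length-3 sequence; real applications use a sampled subset.
train = ["".join(p) for p in itertools.product(ALPHABET, repeat=3)]
coef = fit_cluster_expansion(train, [toy_energy(s) for s in train])
max_err = max(abs(predict(coef, s) - toy_energy(s)) for s in train)
```

Because the toy energy contains only single-site and pairwise contributions, the fitted expansion reproduces it to numerical precision; with a realistic energy function the residual would reflect the truncation error that the article analyzes.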
Affiliation(s)
- Seungsoo Hahn
- Department of Biology, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
9
Apgar JR, Hahn S, Grigoryan G, Keating AE. Cluster expansion models for flexible-backbone protein energetics. J Comput Chem 2009; 30:2402-13. [PMID: 19360809 DOI: 10.1002/jcc.21249] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Indexed: 11/06/2022]
Abstract
Protein structure prediction and design often involve discrete modeling of side-chain conformations on structural templates. Introducing backbone flexibility into such models has proven important in many different applications. Backbone flexibility improves model accuracy and provides access to larger sequence spaces in computational design, although at a cost in complexity and time. Here, we show that the influence of backbone flexibility on protein conformational energetics can be treated implicitly, at the level of sequence, using the technique of cluster expansion. Cluster expansion provides a way to convert structure-based energies into functions of sequence alone. It leads to dramatic speed-ups in energy evaluation and provides a convenient functional form for the analysis and optimization of sequence-structure relationships. We show that it can be applied effectively to flexible-backbone structural models using four proteins: alpha-helical coiled-coil dimers and trimers, zinc fingers, and Bcl-xL/peptide complexes. For each of these, low errors for the sequence-based models when compared with structure-based evaluations show that this new way of treating backbone flexibility has considerable promise, particularly for protein design.
Affiliation(s)
- James R Apgar
- MIT Department of Chemistry, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
10
Grigoryan G, Reinke AW, Keating AE. Design of protein-interaction specificity gives selective bZIP-binding peptides. Nature 2009; 458:859-64. [PMID: 19370028 PMCID: PMC2748673 DOI: 10.1038/nature07885] [Citation(s) in RCA: 289] [Impact Index Per Article: 19.3] [Received: 08/06/2008] [Accepted: 02/09/2009] [Indexed: 01/14/2023]
Abstract
Interaction specificity is a required feature of biological networks and a necessary characteristic of protein or small-molecule reagents and therapeutics. The ability to alter or inhibit protein interactions selectively would advance basic and applied molecular science. Assessing or modelling interaction specificity requires treating multiple competing complexes, which presents computational and experimental challenges. Here we present a computational framework for designing protein-interaction specificity and use it to identify specific peptide partners for human basic-region leucine zipper (bZIP) transcription factors. Protein microarrays were used to characterize designed, synthetic ligands for all but one of 20 bZIP families. The bZIP proteins share strong sequence and structural similarities and thus are challenging targets to bind specifically. Nevertheless, many of the designs, including examples that bind the oncoproteins c-Jun, c-Fos and c-Maf (also called JUN, FOS and MAF, respectively), were selective for their targets over all 19 other families. Collectively, the designs exhibit a wide range of interaction profiles and demonstrate that human bZIPs have only sparsely sampled the possible interaction space accessible to them. Our computational method provides a way to systematically analyse trade-offs between stability and specificity and is suitable for use with many types of structure-scoring functions; thus, it may prove broadly useful as a tool for protein design.
Affiliation(s)
- Gevorg Grigoryan
- MIT Department of Biology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
11
Apgar JR, Gutwin KN, Keating AE. Predicting helix orientation for coiled-coil dimers. Proteins 2008; 72:1048-65. [PMID: 18506779 DOI: 10.1002/prot.22118] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.1] [Indexed: 12/20/2022]
Abstract
The alpha-helical coiled coil is a structurally simple protein oligomerization or interaction motif consisting of two or more alpha helices twisted into a supercoiled bundle. Coiled coils can differ in their stoichiometry, helix orientation, and axial alignment. Because of the near degeneracy of many of these variants, coiled coils pose a challenge to fold recognition methods for structure prediction. Whereas distinctions between some protein folds can be discriminated on the basis of hydrophobic/polar patterning or secondary structure propensities, the sequence differences that encode important details of coiled-coil structure can be subtle. This is emblematic of a larger problem in the field of protein structure and interaction prediction: that of establishing specificity between closely similar structures. We tested the behavior of different computational models on the problem of recognizing the correct orientation--parallel vs. antiparallel--of pairs of alpha helices that can form a dimeric coiled coil. For each of 131 examples of known structure, we constructed a large number of both parallel and antiparallel structural models and used these to assess the ability of five energy functions to recognize the correct fold. We also developed and tested three sequence-based approaches that make use of varying degrees of implicit structural information. The best structural methods performed similarly to the best sequence methods, correctly categorizing approximately 81% of dimers. Steric compatibility with the fold was important for some coiled coils we investigated. For many examples, the correct orientation was determined by smaller energy differences between parallel and antiparallel structures distributed over many residues and energy components. Prediction methods that used structure but incorporated varying approximations and assumptions showed quite different behaviors when used to investigate energetic contributions to orientation preference. 
Sequence-based methods were sensitive to the choice of residue-pair interactions scored.
Affiliation(s)
- James R Apgar
- MIT Department of Chemistry, Cambridge, Massachusetts 02139, USA
12
Lippow SM, Tidor B. Progress in computational protein design. Curr Opin Biotechnol 2007; 18:305-11. [PMID: 17644370 PMCID: PMC3495006 DOI: 10.1016/j.copbio.2007.04.009] [Citation(s) in RCA: 161] [Impact Index Per Article: 9.5] [Received: 04/02/2007] [Accepted: 04/17/2007] [Indexed: 11/25/2022]
Abstract
Current progress in computational structure-based protein design is reviewed in the areas of methodology and applications. Foundational advances include new potential functions, more efficient ways of computing energetics, flexible treatments of solvent, and useful energy function approximations, as well as ensemble-based approaches to scoring designs for inclusion of entropic effects, improvements to guaranteed and to stochastic search techniques, and methods to design combinatorial libraries for screening and selection. Applications include new approaches and successes in the design of specificity for protein folding, binding, and catalysis, in the redesign of proteins for enhanced binding affinity, and in the application of design technology to study and alter enzyme catalysis. Computational protein design continues to mature and advance.
Affiliation(s)
- Shaun M Lippow
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA.
13
Grigoryan G, Zhou F, Lustig SR, Ceder G, Morgan D, Keating AE. Ultra-fast evaluation of protein energies directly from sequence. PLoS Comput Biol 2006; 2:e63. [PMID: 16789811 PMCID: PMC1479088 DOI: 10.1371/journal.pcbi.0020063] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.1] [Received: 01/12/2006] [Accepted: 04/24/2006] [Indexed: 11/22/2022]
Abstract
The structure, function, stability, and many other properties of a protein in a fixed environment are fully specified by its sequence, but in a manner that is difficult to discern. We present a general approach for rapidly mapping sequences directly to their energies on a pre-specified rigid backbone, an important sub-problem in computational protein design and in some methods for protein structure prediction. The cluster expansion (CE) method that we employ can, in principle, be extended to model any computable or measurable protein property directly as a function of sequence. Here we show how CE can be applied to the problem of computational protein design, and use it to derive excellent approximations of physical potentials. The approach provides several attractive advantages. First, following a one-time derivation of a CE expansion, the amount of time necessary to evaluate the energy of a sequence adopting a specified backbone conformation is reduced by a factor of 10^7 compared to standard full-atom methods for the same task. Second, the agreement between two full-atom methods that we tested and their CE sequence-based expressions is very high (root mean square deviation 1.1–4.7 kcal/mol, R^2 = 0.7–1.0). Third, the functional form of the CE energy expression is such that individual terms of the expansion have clear physical interpretations. We derived expressions for the energies of three classic protein design targets (a coiled coil, a zinc finger, and a WW domain) as functions of sequence, and examined the most significant terms. Single-residue and residue-pair interactions are sufficient to accurately capture the energetics of the dimeric coiled coil, whereas higher-order contributions are important for the two more globular folds. For the task of designing novel zinc-finger sequences, a CE-derived energy function provides significantly better solutions than a standard design protocol, in comparable computation time. Given these advantages, CE is likely to find many uses in computational structural modeling.
Many applications in computational structural biology involve evaluating the energy of a protein adopting a specific structure. A variety of functions are used for this purpose. Statistical potentials are fast to evaluate but do not have a clear biophysical basis, whereas physics-based functions consist of well-defined terms that can be costly to compute. This paper describes how the theory of cluster expansion, originally developed to describe the energies of alloys, can be applied to generate a physical potential for proteins that is extremely fast to evaluate. Cluster expansion is a way of representing a property of a system as a discrete function of its degrees of freedom. In this paper, it is used for the problem of protein design, where the energy is determined by the identities and conformations of amino acids at different sites on a fixed protein backbone. Application of cluster expansion to three small protein folds (the α-helical coiled coil, the zinc finger, and the WW domain) shows that protein sequence can be mapped directly to energy using a surprisingly simple function that maintains high accuracy. Promising results on these small systems suggest that the theory may have utility for macromolecular modeling more generally.
Affiliation(s)
- Gevorg Grigoryan
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Fei Zhou
- Department of Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Steve R Lustig
- DuPont Central Research and Development, Experimental Station, Wilmington, Delaware, United States of America
- Gerbrand Ceder
- Department of Material Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Dane Morgan
- Department of Material Science and Engineering, University of Wisconsin, Madison, Wisconsin, United States of America
- Amy E Keating
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- * To whom correspondence should be addressed. E-mail: