1
Talluri S. Algorithms for protein design. Advances in Protein Chemistry and Structural Biology 2022; 130:1-38. [PMID: 35534105 DOI: 10.1016/bs.apcsb.2022.01.003]
Abstract
Computational Protein Design has the potential to contribute to major advances in enzyme technology, vaccine design, receptor-ligand engineering, biomaterials, nanosensors, and synthetic biology. Although Protein Design is a challenging problem, proteins can be designed by experts in Protein Design, as well as by non-experts whose primary interests are in the applications of Protein Design. The increased accessibility of Protein Design technology is attributable to the accumulated knowledge and experience with Protein Design as well as to the availability of software and online resources. The objective of this review is to serve as a guide to the relevant literature with a focus on the novel methods and algorithms that have been developed or applied for Protein Design, and to assist in the selection of algorithms for Protein Design. Novel algorithms and models that have been introduced to utilize the enormous amount of experimental data and novel computational hardware have the potential for producing substantial increases in the accuracy, reliability and range of applications of designed proteins.
Affiliation(s)
- Sekhar Talluri
- Department of Biotechnology, GITAM, Visakhapatnam, India.
2
Zhu J, Avakyan N, Kakkis AA, Hoffnagle AM, Han K, Li Y, Zhang Z, Choi TS, Na Y, Yu CJ, Tezcan FA. Protein Assembly by Design. Chem Rev 2021; 121:13701-13796. [PMID: 34405992 PMCID: PMC9148388 DOI: 10.1021/acs.chemrev.1c00308]
Abstract
Proteins are nature's primary building blocks for the construction of sophisticated molecular machines and dynamic materials, ranging from protein complexes such as photosystem II and nitrogenase that drive biogeochemical cycles to cytoskeletal assemblies and muscle fibers for motion. Such natural systems have inspired extensive efforts in the rational design of artificial protein assemblies in the last two decades. As molecular building blocks, proteins are highly complex, in terms of both their three-dimensional structures and chemical compositions. To enable control over the self-assembly of such complex molecules, scientists have devised many creative strategies by combining tools and principles of experimental and computational biophysics, supramolecular chemistry, inorganic chemistry, materials science, and polymer chemistry, among others. Owing to these innovative strategies, what started as a purely structure-building exercise two decades ago has, in short order, led to artificial protein assemblies with unprecedented structures and functions and protein-based materials with unusual properties. Our goal in this review is to give an overview of this exciting and highly interdisciplinary area of research, first outlining the design strategies and tools that have been devised for controlling protein self-assembly, then describing the diverse structures of artificial protein assemblies, and finally highlighting the emergent properties and functions of these assemblies.
Affiliation(s)
- Albert A. Kakkis
- Department of Chemistry and Biochemistry, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093-0340, United States
- Alexander M. Hoffnagle
- Department of Chemistry and Biochemistry, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093-0340, United States
- Kenneth Han
- Department of Chemistry and Biochemistry, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093-0340, United States
- Yiying Li
- Department of Chemistry and Biochemistry, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093-0340, United States
- Zhiyin Zhang
- Department of Chemistry and Biochemistry, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093-0340, United States
- Tae Su Choi
- Department of Chemistry and Biochemistry, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093-0340, United States
- Youjeong Na
- Department of Chemistry and Biochemistry, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093-0340, United States
- Chung-Jui Yu
- Department of Chemistry and Biochemistry, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093-0340, United States
- F. Akif Tezcan
- Department of Chemistry and Biochemistry, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093-0340, United States
3
Nanda V, Belure SV, Shir OM. Searching for the Pareto frontier in multi-objective protein design. Biophys Rev 2017; 9:339-344. [PMID: 28799089 DOI: 10.1007/s12551-017-0288-0] [Received: 05/27/2017] [Accepted: 07/25/2017]
Abstract
The goal of protein engineering and design is to identify sequences that adopt three-dimensional structures of desired function. Often, this is treated as a single-objective optimization problem, identifying the sequence-structure solution with the lowest computed free energy of folding. However, many design problems are multi-state, multi-specificity, or otherwise require concurrent optimization of multiple objectives. There may be tradeoffs among objectives, where improving one feature requires compromising another. The challenge lies in determining solutions that are part of the Pareto optimal set: designs in which no further improvement can be achieved in any one objective without degrading another. Pareto optimality problems are found in all areas of study, from economics to engineering to biology, and computational methods have been developed specifically to identify the Pareto frontier. We review progress in multi-objective protein design and the development of Pareto optimization methods, and present a specific case study using multi-objective optimization methods to model the tradeoff among three parameters (stability, specificity, and complexity) of a set of interacting synthetic collagen peptides.
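To make the notion of a Pareto optimal set concrete, here is a minimal Python sketch of the non-dominated filter, assuming every objective is expressed as a cost to be minimized. The design names and three-objective scores are hypothetical placeholders, not data from the review, and the quadratic pairwise filter is the naive textbook approach rather than any of the Pareto search methods the authors survey.

```python
from typing import Dict, List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if design a is no worse than b in every objective and strictly better in one.

    All objectives are assumed to be costs (lower is better).
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Return the names of non-dominated designs (naive O(n^2) pairwise check)."""
    return [
        name
        for name, s in scores.items()
        if not any(dominates(other, s) for other in scores.values())
    ]


# Hypothetical designs scored as (instability, lack of specificity, sequence complexity).
designs = {
    "d1": (1.0, 3.0, 2.0),
    "d2": (2.0, 1.0, 2.0),
    "d3": (2.5, 3.5, 2.5),  # worse than d1 in every objective
    "d4": (1.0, 3.0, 3.0),  # ties d1 on two objectives, worse on the third
}
front = pareto_front(designs)
print(front)  # ['d1', 'd2']
```

Neither d1 nor d2 can be improved in one objective without giving up another, which is exactly the tradeoff the abstract describes; d3 and d4 are dominated and drop out.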
Affiliation(s)
- Vikas Nanda
- Center for Advanced Biotechnology and Medicine, Rutgers University, Piscataway, NJ, USA.
- Department of Biochemistry and Molecular Biophysics, Robert Wood Johnson Medical School, Rutgers University, Piscataway, NJ, USA.
- Sandeep V Belure
- Center for Advanced Biotechnology and Medicine, Rutgers University, Piscataway, NJ, USA
- Department of Biochemistry and Molecular Biophysics, Robert Wood Johnson Medical School, Rutgers University, Piscataway, NJ, USA
- Ofer M Shir
- Department of Computer Science, Tel-Hai College, Kiryat Shmona, Upper Galilee, Israel
- The Galilee Research Institute-Migal, Kiryat Shmona, Upper Galilee, Israel
4
Computational protein design with backbone plasticity. Biochem Soc Trans 2016; 44:1523-1529. [PMID: 27911735 PMCID: PMC5264498 DOI: 10.1042/bst20160155] [Received: 06/03/2016] [Revised: 08/01/2016] [Accepted: 08/03/2016]
Abstract
The computational algorithms used in the design of artificial proteins have become increasingly sophisticated in recent years, producing a series of remarkable successes. The most dramatic of these is the de novo design of artificial enzymes. The majority of these designs have reused naturally occurring protein structures as ‘scaffolds’ onto which novel functionality can be grafted without having to redesign the backbone structure. The incorporation of backbone flexibility into protein design is a much more computationally challenging problem due to the greatly increased search space, but promises to remove the limitations of reusing natural protein scaffolds. In this review, we outline the principles of computational protein design methods and discuss recent efforts to consider backbone plasticity in the design process.
5
Koh SK, Ananthasuresh GK, Vishveshwara S. A Deterministic Optimization Approach to Protein Sequence Design Using Continuous Models. Int J Rob Res 2016. [DOI: 10.1177/0278364905050354]
Abstract
Determining the sequence of amino acid residues in a heteropolymer chain of a protein with a given conformation is a discrete combinatorial problem that is not generally amenable to gradient-based continuous optimization algorithms. In this paper we present a new approach to this problem using continuous models. In this model, continuous "state functions" are proposed to designate the type of each residue in the chain. Such a continuous model helps define a continuous sequence space in which a chosen criterion is optimized to find the most appropriate sequence. Searching a continuous sequence space using a deterministic optimization algorithm makes it possible to find the optimal sequences with much less computation than many other approaches. The computational efficiency of this method is further improved by combining it with a graph spectral method, which explicitly takes into account the topology of the desired conformation and also helps make the combined method more robust. The continuous modeling used here appears to have additional advantages in mimicking the folding pathways and in creating the energy landscapes that help find sequences with high stability and kinetic accessibility. To illustrate the new approach, a widely used simplifying assumption is made by considering only two types of residues: hydrophobic (H) and polar (P). Self-avoiding compact lattice models are used to validate the method against known results in the literature and against data that can be practically obtained by exhaustive enumeration on a desktop computer. We also present examples of sequence design for the HP models of some real proteins, which are solved in less than five minutes on a single-processor desktop computer. Some open issues and future extensions are noted.
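The abstract validates the continuous approach against results that can be obtained by exhaustive enumeration of HP sequences on compact lattice models. The Python sketch below shows only that enumeration baseline, not the authors' continuous state-function method. It lists the non-bonded contacts of a fixed 2D square-lattice conformation, scores each HP sequence with the common convention of -1 per H-H contact, and, as an assumed design criterion, keeps the lowest-energy sequences with a fixed number of H residues (without such a constraint the all-H sequence would trivially win). The six-residue conformation is a hypothetical example.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Coord = Tuple[int, int]


def lattice_contacts(conf: List[Coord]) -> List[Tuple[int, int]]:
    """Non-bonded nearest-neighbour pairs (|i - j| > 1) of a square-lattice conformation."""
    assert len(set(conf)) == len(conf), "conformation must be self-avoiding"
    index: Dict[Coord, int] = {c: i for i, c in enumerate(conf)}
    pairs = []
    for i, (x, y) in enumerate(conf):
        # Checking only the +x and +y neighbours counts each lattice contact once.
        for nb in ((x + 1, y), (x, y + 1)):
            j = index.get(nb)
            if j is not None and abs(i - j) > 1:
                pairs.append((min(i, j), max(i, j)))
    return sorted(pairs)


def hp_energy(seq: str, pairs: List[Tuple[int, int]]) -> int:
    """HP-model energy: -1 for every H-H topological contact."""
    return -sum(1 for i, j in pairs if seq[i] == "H" and seq[j] == "H")


def enumerate_designs(conf: List[Coord], n_h: int) -> Tuple[Optional[int], List[str]]:
    """Exhaustively score all sequences with exactly n_h H residues on a fixed conformation."""
    pairs = lattice_contacts(conf)
    n = len(conf)
    best_e: Optional[int] = None
    best_seqs: List[str] = []
    for h_positions in combinations(range(n), n_h):
        seq = "".join("H" if k in h_positions else "P" for k in range(n))
        e = hp_energy(seq, pairs)
        if best_e is None or e < best_e:
            best_e, best_seqs = e, [seq]
        elif e == best_e:
            best_seqs.append(seq)
    return best_e, best_seqs


# Hypothetical 6-residue conformation folded into a 3 x 2 rectangle.
conf = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]
best_e, best_seqs = enumerate_designs(conf, n_h=4)
print(lattice_contacts(conf), best_e, best_seqs)  # [(0, 5), (1, 4)] -2 ['HHPPHH']
```

Enumeration grows combinatorially with chain length, which is why the abstract stresses avoiding it; the stricter "unique ground state" designability test, which would also require enumerating competing conformations, is not modeled here.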
Affiliation(s)
- Sung K. Koh
- Mechanical Engineering and Applied Mechanics, University of Pennsylvania, Philadelphia, 19104-6315, USA
- G. K. Ananthasuresh
- Mechanical Engineering and Applied Mechanics, University of Pennsylvania, Philadelphia, 19104-6315, USA
- Mechanical Engineering, Indian Institute of Science, Bangalore 560 012, India
6
Xu F, Silva T, Joshi M, Zahid S, Nanda V. Circular permutation directs orthogonal assembly in complex collagen peptide mixtures. J Biol Chem 2013; 288:31616-23. [PMID: 24043622 DOI: 10.1074/jbc.m113.501056]
Abstract
Multiple types of natural collagens specifically assemble and co-exist in the extracellular matrix. Although noncollagenous trimerization domains facilitate the folding of triple-helical regions, it is intriguing to ask whether collagen sequences are also capable of controlling heterospecific association. In this study, we designed a model system mimicking simultaneous specific assembly of two collagen heterotrimers using a genetically inspired operation, circular permutation. Previously, surface charge-pair interactions were optimized on three collagen peptides to promote the formation of an abc-type heterotrimer. Circular permutation of these sequences retained networks of stabilizing interactions, preserving both triple-helical structure and heterospecificity of assembly. Combining original peptides A, B, and C and permuted peptides D, E, and F resulted primarily in formation of A:B:C and D:E:F, a heterospecificity of 2 of 56 possible stoichiometries. This degree of specificity in collagen molecular recognition is unprecedented in natural or synthetic collagens. Analysis of natural collagen sequences indicates low similarity between neighboring exons. Combining the synthetic collagen model and bioinformatic analysis provides insight into how fibrillar collagens might have arisen from the duplication of smaller domains.
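The figure of 56 possible stoichiometries is consistent with counting every trimer composition (a multiset of three chains) that can be drawn with repetition from six peptides, which gives C(6 + 3 - 1, 3) = C(8, 3) = 56. This reading is an assumption on our part rather than a statement from the abstract; the short Python check below enumerates those compositions and confirms that A:B:C and D:E:F are two of them.

```python
from itertools import combinations_with_replacement
from math import comb

peptides = "ABCDEF"  # original peptides A-C plus circularly permuted peptides D-F
# A stoichiometry ignores chain order, so each trimer is a sorted 3-tuple drawn with repetition.
stoichiometries = list(combinations_with_replacement(peptides, 3))
targets = {("A", "B", "C"), ("D", "E", "F")}

print(len(stoichiometries), comb(len(peptides) + 3 - 1, 3))  # 56 56
print(sorted(targets & set(stoichiometries)))  # [('A', 'B', 'C'), ('D', 'E', 'F')]
```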
Affiliation(s)
- Fei Xu
- Center for Advanced Biotechnology and Medicine, Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, Rutgers University, Piscataway, New Jersey 08854
7
Huang YM, Bystroff C. Expanded explorations into the optimization of an energy function for protein design. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2013; 10:1176-1187. [PMID: 24384706 PMCID: PMC3919130 DOI: 10.1109/tcbb.2013.113]
Abstract
Nature possesses a secret formula for the energy as a function of the structure of a protein. In protein design, approximations are made both to the structural representation of the molecule and to the form of the energy equation, such that the existence of a general energy function for proteins is by no means guaranteed. Here, we present new insights toward the application of machine learning to the problem of finding a general energy function for protein design. Machine learning requires the definition of an objective function, which carries with it the implied definition of success in protein design. We explored four functions, consisting of two functional forms, each with two criteria for success. Optimization was carried out by a Monte Carlo search through the space of all variable parameters. Cross-validation of the optimized energy function against a test set gave significantly different results depending on the choice of objective function, pointing to the relative correctness of the built-in assumptions. Novel energy cross terms correct for the observed nonadditivity of energy terms and an imbalance in the distribution of predicted amino acids. This paper expands on the work presented at the 2012 ACM-BCB.
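As a rough illustration of "a Monte Carlo search through the space of all variable parameters", the Python sketch below runs a Metropolis walk over the weights of a linear energy function and keeps the best parameters visited. The objective here, a least-squares fit of weighted energy terms to reference values, is a hypothetical stand-in: the paper compares four objective functions with their own definitions of success, and none of them is reproduced. Step size, temperature, and the toy data are likewise assumptions.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def metropolis_fit(
    objective: Callable[[Sequence[float]], float],
    w0: Sequence[float],
    n_steps: int = 5000,
    step: float = 0.1,
    temperature: float = 0.01,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Metropolis Monte Carlo over parameter vectors; returns the best parameters visited."""
    rng = random.Random(seed)
    w = list(w0)
    f = objective(w)
    best_w, best_f = list(w), f
    for _ in range(n_steps):
        trial = [wi + rng.gauss(0.0, step) for wi in w]
        f_trial = objective(trial)
        # Always accept improvements; accept uphill moves with Boltzmann probability.
        if f_trial <= f or rng.random() < math.exp(-(f_trial - f) / temperature):
            w, f = trial, f_trial
            if f < best_f:
                best_w, best_f = list(w), f
    return best_w, best_f


# Hypothetical data: per-example energy terms (two terms each) and reference values.
terms = [(1.0, 0.5), (0.2, 1.5), (2.0, 0.1), (0.7, 0.9)]
reference = [1.5 * a - 0.7 * b for a, b in terms]  # generated from weights (1.5, -0.7)


def mse(w: Sequence[float]) -> float:
    """Mean squared error between the weighted energy and the reference values."""
    preds = [w[0] * a + w[1] * b for a, b in terms]
    return sum((p - r) ** 2 for p, r in zip(preds, reference)) / len(terms)


best_w, best_f = metropolis_fit(mse, [0.0, 0.0])
print(best_w, best_f)
```

Because the walk tracks the best point seen, the returned objective can never be worse than the starting one; how close it gets to the generating weights depends on the assumed step size and temperature.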
8
Simonson T, Gaillard T, Mignon D, Schmidt am Busch M, Lopes A, Amara N, Polydorides S, Sedano A, Druart K, Archontis G. Computational protein design: the Proteus software and selected applications. J Comput Chem 2013; 34:2472-84. [PMID: 24037756 DOI: 10.1002/jcc.23418] [Received: 05/13/2013] [Revised: 07/08/2013] [Accepted: 07/28/2013]
Abstract
We describe an automated procedure for protein design, implemented in a flexible software package called Proteus. System setup and calculation of an energy matrix are done with the XPLOR modeling program and its sophisticated command language, supporting several force fields and solvent models. A second program provides algorithms to search sequence space. It allows a decomposition of the system into groups, which can be combined in different ways in the energy function, for both positive and negative design. The whole procedure can be controlled by editing 2-4 scripts. Two applications consider the tyrosyl-tRNA synthetase enzyme and its successful redesign to bind both O-methyl-tyrosine and D-tyrosine. For the latter, we present Monte Carlo simulations in which the D-tyrosine concentration is gradually increased, displacing L-tyrosine from the binding pocket and yielding the binding free energy difference, in good agreement with experiment. Complete redesign of the Crk SH3 domain is presented. The top 10000 sequences are all assigned to the correct fold by the SUPERFAMILY library of Hidden Markov Models. Finally, we report the acid/base behavior of the SNase protein. Sidechain protonation is treated as a form of mutation; it is then straightforward to perform constant-pH Monte Carlo simulations, which yield good agreement with experiment. Overall, the software can be used for a wide range of applications, producing not only native-like sequences but also thermodynamic properties with errors that appear comparable to those of other current software packages.
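The abstract describes a two-stage workflow: an energy matrix is computed once, and a separate program then searches sequence space against it. A minimal Python sketch of the second stage, assuming the matrix has already been reduced to one-body terms `e1[i][a]` and pairwise terms `e2[(i, j)][a][b]` over per-position choices (amino acid and rotamer combined into one index), is a Metropolis Monte Carlo walk that changes one position at a time. The table layout, temperature, and toy energies are assumptions for illustration, not the Proteus file format or its actual search options.

```python
import math
import random
from itertools import product
from typing import Dict, List, Tuple

E1 = List[List[float]]
E2 = Dict[Tuple[int, int], List[List[float]]]


def total_energy(choice: List[int], e1: E1, e2: E2) -> float:
    """E = sum_i e1[i][a_i] + sum_(i<j) e2[(i, j)][a_i][a_j], read from precomputed tables."""
    e = sum(e1[i][a] for i, a in enumerate(choice))
    for (i, j), table in e2.items():
        e += table[choice[i]][choice[j]]
    return e


def mc_search(e1: E1, e2: E2, n_steps: int = 2000, kT: float = 0.5, seed: int = 1):
    """Single-position Metropolis moves over the energy matrix; returns the best state seen."""
    rng = random.Random(seed)
    choice = [0] * len(e1)
    e = total_energy(choice, e1, e2)
    best, best_e = list(choice), e
    for _ in range(n_steps):
        i = rng.randrange(len(e1))
        old = choice[i]
        choice[i] = rng.randrange(len(e1[i]))
        e_new = total_energy(choice, e1, e2)
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / kT):
            e = e_new
            if e < best_e:
                best, best_e = list(choice), e
        else:
            choice[i] = old  # reject: restore the previous choice
    return best, best_e


# Toy three-position problem with 2, 3 and 2 choices per position (hypothetical energies).
e1 = [[0.0, -0.5], [0.2, -0.3, 0.1], [0.0, 0.4]]
e2 = {
    (0, 1): [[0.0, 0.6, -0.2], [0.3, -0.8, 0.0]],
    (1, 2): [[0.1, 0.0], [-0.4, 0.5], [0.0, -0.1]],
}
best, best_e = mc_search(e1, e2)
exact = min(total_energy(list(c), e1, e2) for c in product(*(range(len(r)) for r in e1)))
print(best, best_e, exact)
```

For a toy problem this small, brute force is trivial and serves as a check; a stochastic search only becomes worthwhile when the number of combinations explodes, and it then carries no guarantee of reaching the exact minimum.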
Affiliation(s)
- Thomas Simonson
- Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, Palaiseau, 91128, France
9
Parmar AS, Zahid S, Belure SV, Young R, Hasan N, Nanda V. Design of net-charged abc-type collagen heterotrimers. J Struct Biol 2013; 185:163-7. [PMID: 23603270 DOI: 10.1016/j.jsb.2013.04.006] [Received: 12/22/2012] [Revised: 03/05/2013] [Accepted: 04/08/2013]
Abstract
Net-negatively-charged heterospecific A:B:C collagen peptide heterotrimers were designed using an automated computational approach. The design algorithm considers both target stability and the energy gap between the target states and misfolded competing states. Structural characterization indicates the net-negative charge balance on the new designs enhances the specificity of the target state at the expense of its stability.
Affiliation(s)
- Avanish S Parmar
- Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, UMDNJ and Center for Advanced Biotechnology and Medicine, Piscataway, NJ 08854, United States
- Sohail Zahid
- Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, UMDNJ and Center for Advanced Biotechnology and Medicine, Piscataway, NJ 08854, United States
- Sandeep V Belure
- Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, UMDNJ and Center for Advanced Biotechnology and Medicine, Piscataway, NJ 08854, United States
- Robert Young
- Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, UMDNJ and Center for Advanced Biotechnology and Medicine, Piscataway, NJ 08854, United States
- Nida Hasan
- Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, UMDNJ and Center for Advanced Biotechnology and Medicine, Piscataway, NJ 08854, United States
- Vikas Nanda
- Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, UMDNJ and Center for Advanced Biotechnology and Medicine, Piscataway, NJ 08854, United States
10
Matthies MC, Bienert S, Torda AE. Dynamics in Sequence Space for RNA Secondary Structure Design. J Chem Theory Comput 2012; 8:3663-70. [PMID: 26593011 DOI: 10.1021/ct300267j]
Abstract
We have implemented a method for the design of RNA sequences that should fold to arbitrary secondary structures. A popular energy model allows one to take the derivative with respect to composition, which can then be interpreted as a force and used for Newtonian dynamics in sequence space. Combined with a negative design term, one can rapidly sample sequences which are compatible with a desired secondary structure via simulated annealing. Results for 360 structures were compared with those from another nucleic acid design program using measures such as the probability of the target structure and an ensemble-weighted distance to the target structure.
Affiliation(s)
- Marco C Matthies
- Centre for Bioinformatics, University of Hamburg, Bundesstr. 43, 20146 Hamburg, Germany
- Stefan Bienert
- Centre for Bioinformatics, University of Hamburg, Bundesstr. 43, 20146 Hamburg, Germany
- Biozentrum, University of Basel, Klingelbergstr. 50/70, 4056 Basel, Switzerland
- Andrew E Torda
- Centre for Bioinformatics, University of Hamburg, Bundesstr. 43, 20146 Hamburg, Germany
11
Perez-Aguilar JM, Saven JG. Computational design of membrane proteins. Structure 2012; 20:5-14. [PMID: 22244752 DOI: 10.1016/j.str.2011.12.003] [Received: 11/22/2011] [Revised: 12/21/2011] [Accepted: 12/21/2011]
Abstract
Membrane proteins are involved in a wide variety of cellular processes, and are typically part of the first interaction a cell has with extracellular molecules. As a result, these proteins comprise a majority of known drug targets. Membrane proteins are among the most difficult proteins to obtain and characterize, and a structure-based understanding of their properties can be difficult to elucidate. Notwithstanding, the design of membrane proteins can provide stringent tests of our understanding of these crucial biological systems, as well as introduce novel or targeted functionalities. Computational design methods have been particularly helpful in addressing these issues, and this review discusses recent studies that tailor membrane proteins to display specific structures or functions and examines how redesigned membrane proteins are being used to facilitate structural and functional studies.
12
Xu F, Zahid S, Silva T, Nanda V. Computational design of a collagen A:B:C-type heterotrimer. J Am Chem Soc 2011; 133:15260-3. [PMID: 21902217 DOI: 10.1021/ja205597g]
Abstract
We have successfully designed an A:B:C collagen peptide heterotrimer using an automated computational approach. The algorithm maximizes the energy gap between the target and competing misfolded states while enforcing a minimum target stability. Circular dichroism (CD) measurements confirm that all three peptides are required to form a stable, structured triple helix. This study highlights the power of automated computational design, providing model systems to probe the biophysics of collagen assembly and developing general methods for the design of fibrous proteins.
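The selection rule stated here (maximize the energy gap to competing misfolded states while enforcing a minimum target stability) can be sketched in a few lines of Python. The sketch assumes energies are already computed for each candidate sequence in the target state and in each competing state, with lower energy meaning more stable, and defines the gap as the lowest competitor energy minus the target energy. The candidate names, energies, and cutoff are hypothetical, and the authors' actual search procedure and energy model are not reproduced.

```python
from typing import Dict, List, Optional, Tuple


def energy_gap(target: float, competitors: List[float]) -> float:
    """Gap between the most stable competing state and the target (positive = target favoured)."""
    return min(competitors) - target


def select_by_gap(
    candidates: Dict[str, Tuple[float, List[float]]], max_target_energy: float
) -> Optional[Tuple[str, float]]:
    """Among candidates whose target energy meets the stability cutoff, pick the largest gap."""
    feasible = {n: v for n, v in candidates.items() if v[0] <= max_target_energy}
    if not feasible:
        return None
    name = max(feasible, key=lambda n: energy_gap(*feasible[n]))
    return name, energy_gap(*feasible[name])


# Hypothetical candidates: (target-state energy, [competing-state energies]).
candidates = {
    "seq1": (-10.0, [-9.0, -7.5]),  # very stable but only a small gap
    "seq2": (-6.0, [-1.0, -2.0]),   # large gap but fails the stability cutoff
    "seq3": (-9.0, [-6.0, -5.5]),
}
print(select_by_gap(candidates, max_target_energy=-8.0))  # ('seq3', 3.0)
```

Relaxing the cutoff lets seq2 win on gap alone, which illustrates why the stability constraint is needed alongside the specificity objective.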
Affiliation(s)
- Fei Xu
- Department of Biochemistry, Robert Wood Johnson Medical School, UMDNJ and the Center for Advanced Biotechnology and Medicine, Piscataway, New Jersey 08854, United States
13
Genetic algorithm with alternating selection pressure for protein side-chain packing and pK(a) prediction. Biosystems 2011; 105:263-70. [PMID: 21672605 DOI: 10.1016/j.biosystems.2011.05.013] [Received: 11/06/2010] [Revised: 04/21/2011] [Accepted: 05/26/2011]
Abstract
The prediction of protein side-chain conformation is central to understanding protein function. Side-chain packing is a sub-problem of protein folding, and its computational complexity has been shown to be NP-hard. We investigated the capabilities of a hybrid (genetic algorithm/simulated annealing) technique for side-chain packing and for the generation of an ensemble of low-energy side-chain conformations. Our method first obtains a near-optimal low-energy protein conformation by optimizing its amino-acid side chains. Upon convergence, the genetic algorithm is allowed to undergo forward and "backward" evolution by alternating the selection pressure between minimal and higher energy set points. We show that this technique is very efficient for obtaining distributions of solutions centered at any desired energy above the minimum. We outline the general concepts of our evolutionary sampling methodology using three different alternating selection pressure schemes. The quality of the method was assessed by applying it to protein pK(a) prediction.
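The alternating-selection idea can be sketched with a plain genetic algorithm in Python. All of the following are assumptions of the sketch: a toy side-chain model in which each residue has a few rotamers with random self energies and sequence-neighbour pair energies, truncation selection on the distance |E - setpoint|, one-point crossover with occasional point mutation, and a set point that switches every `period` generations between the lowest energy found so far and that value plus `offset`. The simulated-annealing component of the authors' hybrid and their three specific selection schemes are not modeled.

```python
import random
from itertools import product
from typing import List


def conf_energy(conf: List[int], e1, e2) -> float:
    """Self energies plus pair energies between sequence neighbours (toy interaction model)."""
    e = sum(e1[i][r] for i, r in enumerate(conf))
    return e + sum(e2[i][conf[i]][conf[i + 1]] for i in range(len(conf) - 1))


def ga_alternating(e1, e2, pop_size=40, generations=120, period=20, offset=2.0, seed=0):
    """Return the lowest energy found and every energy sampled along the way."""
    rng = random.Random(seed)
    n = len(e1)
    pop = [[rng.randrange(len(e1[i])) for i in range(n)] for _ in range(pop_size)]
    e_min = min(conf_energy(c, e1, e2) for c in pop)
    sampled: List[float] = []
    for g in range(generations):
        # Even blocks of generations select toward the best energy found ("forward");
        # odd blocks select toward a higher set point ("backward").
        setpoint = e_min if (g // period) % 2 == 0 else e_min + offset
        pop.sort(key=lambda c: abs(conf_energy(c, e1, e2) - setpoint))
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.3:  # point mutation
                k = rng.randrange(n)
                child[k] = rng.randrange(len(e1[k]))
            children.append(child)
        pop = parents + children
        energies = [conf_energy(c, e1, e2) for c in pop]
        e_min = min(e_min, min(energies))
        sampled.extend(energies)
    return e_min, sampled


# Hypothetical 6-residue problem with 3 rotamers per residue.
rng = random.Random(42)
n_res, n_rot = 6, 3
e1 = [[rng.uniform(-1, 1) for _ in range(n_rot)] for _ in range(n_res)]
e2 = [[[rng.uniform(-1, 1) for _ in range(n_rot)] for _ in range(n_rot)] for _ in range(n_res - 1)]
e_min, sampled = ga_alternating(e1, e2)
exact = min(conf_energy(list(c), e1, e2) for c in product(range(n_rot), repeat=n_res))
print(e_min, exact, len(sampled))
```

The `sampled` list is the ensemble the abstract refers to: during the "backward" blocks the population drifts toward the higher set point, so the energy distribution is no longer concentrated at the minimum.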
14
Samish I, MacDermaid CM, Perez-Aguilar JM, Saven JG. Theoretical and Computational Protein Design. Annu Rev Phys Chem 2011; 62:129-49. [DOI: 10.1146/annurev-physchem-032210-103509]
Affiliation(s)
- Jeffery G. Saven
- Department of Chemistry, University of Pennsylvania, Philadelphia, Pennsylvania 19104
15
Schmidt am Busch M, Sedano A, Simonson T. Computational protein design: validation and possible relevance as a tool for homology searching and fold recognition. PLoS One 2010; 5:e10410. [PMID: 20463972 PMCID: PMC2864755 DOI: 10.1371/journal.pone.0010410] [Received: 10/15/2009] [Accepted: 03/31/2010]
Abstract
BACKGROUND Protein fold recognition usually relies on a statistical model of each fold; each model is constructed from an ensemble of natural sequences belonging to that fold. A complementary strategy may be to employ sequence ensembles produced by computational protein design. Designed sequences can be more diverse than natural sequences, possibly avoiding some limitations of experimental databases. METHODOLOGY/PRINCIPAL FINDINGS We explore this strategy for four SCOP families: small Kunitz-type inhibitors (SKIs), interleukin-8 chemokines, PDZ domains, and large caspase catalytic subunits, represented by 43 structures. An automated procedure is used to redesign the 43 proteins. We use the experimental backbones as fixed templates in the folded state and a molecular mechanics model to compute the interaction energies between sidechain and backbone groups. Calculations are done with the Proteins@Home volunteer computing platform. A heuristic algorithm is used to scan the sequence and conformational space, yielding 200,000-300,000 sequences per backbone template. The results confirm and generalize our earlier study of SH2 and SH3 domains. The designed sequences resemble moderately distant natural homologues of the initial templates; e.g., the SUPERFAMILY profile Hidden Markov Model library recognizes 85% of the low-energy sequences as native-like. Conversely, position-specific scoring matrices derived from the sequences can be used to detect natural homologues within the SwissProt database: 60% of known PDZ domains are detected, as are around 90% of known SKIs and chemokines. Energy components and inter-residue correlations are analyzed and ways to improve the method are discussed. CONCLUSIONS/SIGNIFICANCE For some families, designed sequences can be a useful complement to experimental ones for homologue searching. However, improved tools are needed to extract more information from the designed profiles before the method can be of general use.
Affiliation(s)
- Marcel Schmidt am Busch
- Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, Palaiseau, France
- Audrey Sedano
- Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, Palaiseau, France
- Thomas Simonson
- Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, Palaiseau, France
16
De novo self-assembling collagen heterotrimers using explicit positive and negative design. Biochemistry 2010; 49:2307-16. [PMID: 20170197 DOI: 10.1021/bi902077d]
Abstract
We sought to computationally design model collagen peptides that specifically associate as heterotrimers. Computational design has been successfully applied to the creation of new protein folds and functions. Despite the high abundance of collagen and its key role in numerous biological processes, fibrous proteins have received little attention as computational design targets. Collagens are composed of three polypeptide chains that wind into triple helices. We developed a discrete computational model to design heterotrimer-forming collagen-like peptides. Stability and specificity of oligomerization were concurrently targeted using a combined positive and negative design approach. The sequences of three 30-residue peptides, A, B, and C, were optimized to favor charge-pair interactions in an ABC heterotrimer, while disfavoring the 26 competing oligomers (e.g., AAA, ABB, BCA). Peptides were synthesized and characterized for thermal stability and triple-helical structure by circular dichroism and NMR. A unique A:B:C-type species was not achieved. Negative design was partially successful, with only A + B and B + C competing mixtures formed. Analysis of computed versus experimental stabilities helps to clarify the role of electrostatics and secondary-structure propensities in determining collagen stability and provides important insight into how subsequent designs can be improved.
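The count of 26 competing oligomers is consistent with treating the chain order of a trimer (leading, middle, and trailing strands of the staggered triple helix) as distinguishing, so that BCA differs from ABC: three peptides give 3^3 = 27 ordered species, one of which is the target. That interpretation is inferred from the examples listed in the abstract rather than stated there; the Python check below simply enumerates it.

```python
from itertools import product

# Every ordered trimer built from peptides A, B and C (chain order matters, so ABC != BCA).
species = ["".join(chains) for chains in product("ABC", repeat=3)]
target = "ABC"
competitors = [s for s in species if s != target]

print(len(species), len(competitors))  # 27 26
print(all(s in competitors for s in ("AAA", "ABB", "BCA")))  # True
```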
17
Abstract
This paper discusses recent optimization approaches to the protein side-chain prediction problem, protein structural alignment, and molecular structure determination from X-ray diffraction measurements. The machinery employed to solve these problems has included algorithms from linear programming, dynamic programming, combinatorial optimization, and mixed-integer nonlinear programming. Many of these problems are purely continuous in nature. Yet, to date, they have been approached mostly via combinatorial optimization algorithms applied to discrete approximations. The main purpose of the paper is to offer an introduction and to motivate further systems approaches to these problems.
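Dynamic programming is one of the tools listed above. It gives an exact side-chain optimum when the residue interaction graph is a chain (or, more generally, a tree); real packing problems have cyclic interaction graphs, so this Python sketch, which assumes only sequence-neighbour interactions and hypothetical integer energies, is a simplified illustration rather than a practical side-chain predictor. The brute-force comparison at the end checks the recursion on the toy case.

```python
from itertools import product
from typing import List, Tuple

# e1[i][r]: self energy of rotamer r at residue i.
# e2[i][r][s]: interaction of rotamer r at residue i with rotamer s at residue i + 1.


def chain_energy(conf, e1, e2) -> int:
    """Total energy of one rotamer assignment under the chain-only interaction model."""
    return sum(e1[i][r] for i, r in enumerate(conf)) + sum(
        e2[i][conf[i]][conf[i + 1]] for i in range(len(conf) - 1)
    )


def chain_dp(e1, e2) -> Tuple[int, List[int]]:
    """Viterbi-style minimisation over rotamers for a chain-structured interaction graph."""
    cost = list(e1[0])  # cost[r]: best energy of residues 0..i with residue i in rotamer r
    back = []  # back[i - 1][r]: best rotamer of residue i - 1 given rotamer r at residue i
    for i in range(1, len(e1)):
        new_cost, arg = [], []
        for r in range(len(e1[i])):
            k = min(range(len(cost)), key=lambda q: cost[q] + e2[i - 1][q][r])
            new_cost.append(cost[k] + e2[i - 1][k][r] + e1[i][r])
            arg.append(k)
        cost, back = new_cost, back + [arg]
    r = min(range(len(cost)), key=cost.__getitem__)
    best, path = cost[r], [r]
    for arg in reversed(back):  # trace the optimal choices back to residue 0
        r = arg[r]
        path.append(r)
    return best, path[::-1]


# Hypothetical 4-residue chain with 3, 2, 3 and 2 rotamers.
e1 = [[2, 0, 1], [1, -1], [0, 3, -2], [1, 0]]
e2 = [
    [[0, 2], [-3, 1], [1, 0]],   # residue 0 (3 rotamers) x residue 1 (2)
    [[1, -2, 0], [2, 0, -1]],    # residue 1 (2) x residue 2 (3)
    [[0, 1], [-2, 0], [3, -1]],  # residue 2 (3) x residue 3 (2)
]
best, path = chain_dp(e1, e2)
brute = min(chain_energy(c, e1, e2) for c in product(*(range(len(r)) for r in e1)))
print(best, path, brute)
```

The cost is linear in chain length (times the square of the rotamer count per residue), compared with the exponential cost of the brute-force check.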
Affiliation(s)
- Nikolaos V. Sahinidis
- Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA
18
Hong EJ, Lippow SM, Tidor B, Lozano-Pérez T. Rotamer optimization for protein design through MAP estimation and problem-size reduction. J Comput Chem 2009; 30:1923-45. [PMID: 19123203 DOI: 10.1002/jcc.21188]
Abstract
The search for the global minimum energy conformation (GMEC) of protein side chains is an important computational challenge in protein structure prediction and design. Using rotamer models, the problem is formulated as an NP-hard optimization problem. Dead-end elimination (DEE) methods combined with systematic A* search (DEE/A*) have proven useful, but may not be strong enough for protein design problems in which a large number of similar rotamers are eligible and the network of interactions between residues is dense. In this work, we present an exact solution method, named BroMAP (branch-and-bound rotamer optimization using MAP estimation), for such protein design problems. The design goal of BroMAP is to expand smaller search trees than conventional branch-and-bound methods while performing only a moderate amount of computation in each node, thereby reducing the total running time. To achieve this, BroMAP attempts to reduce the problem size within each node through DEE and through elimination by lower bounds from approximate maximum-a-posteriori (MAP) estimation. The lower bounds are also exploited in branching and subproblem selection for fast discovery of strong upper bounds. Our computational results show that BroMAP tends to be faster than DEE/A* for large protein design cases. BroMAP also solved cases that were not solved by DEE/A* within the maximum allowed time, and did not incur a significant disadvantage for cases where DEE/A* performed well. Therefore, BroMAP is particularly applicable to large protein design problems where DEE/A* struggles, and it can also substitute for DEE/A* in general GMEC search.
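Dead-end elimination, which BroMAP applies within each search node, can be illustrated with the Goldstein singles criterion: rotamer r at residue i can be discarded if some other rotamer t at i satisfies E(i_r) - E(i_t) + Σ_j min_s [E(i_r, j_s) - E(i_t, j_s)] > 0, because replacing r with t would then lower the energy of any conformation that contains r. The Python sketch below iterates that criterion on hypothetical random energies and checks that a global minimum survives the pruning; it does not implement BroMAP's MAP-based bounds, its branching, or the A* stage.

```python
import random
from itertools import product
from typing import Dict, List, Set, Tuple

PairTable = Dict[Tuple[int, int], List[List[float]]]  # keyed (i, j) with i < j


def pair_e(e2: PairTable, i: int, r: int, j: int, s: int) -> float:
    """Pair energy between rotamer r at residue i and rotamer s at residue j (0 if no table)."""
    if i < j:
        table = e2.get((i, j))
        return table[r][s] if table else 0.0
    table = e2.get((j, i))
    return table[s][r] if table else 0.0


def goldstein_dee(e1: List[List[float]], e2: PairTable) -> List[Set[int]]:
    """Iterate the Goldstein singles criterion until no more rotamers can be eliminated."""
    n = len(e1)
    alive = [set(range(len(e1[i]))) for i in range(n)]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                for t in sorted(alive[i] - {r}):
                    bound = e1[i][r] - e1[i][t]
                    for j in range(n):
                        if j != i:
                            bound += min(
                                pair_e(e2, i, r, j, s) - pair_e(e2, i, t, j, s) for s in alive[j]
                            )
                    if bound > 0:  # t always beats r, so r cannot be in the GMEC
                        alive[i].discard(r)
                        changed = True
                        break
    return alive


def energy(conf, e1, e2: PairTable) -> float:
    e = sum(e1[i][r] for i, r in enumerate(conf))
    return e + sum(pair_e(e2, i, conf[i], j, conf[j]) for (i, j) in e2)


# Hypothetical 4-residue problem, 4 rotamers each, every residue pair interacting.
rng = random.Random(7)
n_res, n_rot = 4, 4
e1 = [[rng.uniform(-2, 2) for _ in range(n_rot)] for _ in range(n_res)]
e2 = {
    (i, j): [[rng.uniform(-1, 1) for _ in range(n_rot)] for _ in range(n_rot)]
    for i in range(n_res)
    for j in range(i + 1, n_res)
}
alive = goldstein_dee(e1, e2)
gmec_all = min(energy(c, e1, e2) for c in product(range(n_rot), repeat=n_res))
gmec_left = min(energy(c, e1, e2) for c in product(*(sorted(a) for a in alive)))
print([sorted(a) for a in alive], gmec_all == gmec_left)
```

DEE only shrinks the search space; whatever remains still has to be searched exactly, which is the role A* or branch-and-bound plays in the methods compared in the abstract.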
Affiliation(s)
- Eun-Jong Hong
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
19
am Busch MS, Mignon D, Simonson T. Computational protein design as a tool for fold recognition. Proteins 2009; 77:139-58. [PMID: 19408297 DOI: 10.1002/prot.22426] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Indexed: 12/13/2022]
Abstract
Computationally designed protein sequences have been proposed as a basis to perform fold recognition and homology searching. To investigate this possibility, an automated procedure is used to completely redesign 24 SH3 proteins and 22 SH2 proteins. We use the experimental backbone coordinates as fixed templates in the folded state and a molecular mechanics model to compute the pairwise interaction energies between all side-chain types and conformations. Energy calculations are done with the Proteins@Home volunteer computing platform. A heuristic algorithm is then used to scan the sequence and conformational space for optimal solutions. We produced 200,000-450,000 sequences for each backbone template. The designed sequences resemble moderately distant natural homologues of the initial templates, according to their identity scores and their similarity to the Pfam sets of SH2 and SH3 domains. Standard homology detection tools document their native-like character: the Conserved Domain Database recognizes 61% (52%) of our low-energy sequences as SH3 (SH2) domains; the SUPERFAMILY hidden Markov model library recognizes 81% (84%). Conversely, position-specific scoring matrices (PSSMs) derived from our designed sequences can be used to detect natural homologues in sequence databases. Within SwissProt, for example, a set of natural SH3 PSSMs detects 772 SH3 domains; our designed PSSMs detect 67% of these, plus one additional sequence and two false positives. If six amino acids involved in substrate binding (a selective pressure not accounted for in our design) are reset to their experimental types, then 77% of the experimental SH3 domains are detected. Results for the SH2 domains are similar. Several directions to improve the method further are discussed.
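The abstract uses PSSMs derived from designed sequences as database queries. A minimal sketch of one common way to build a log-odds PSSM, assuming an ungapped alignment of equal-length sequences, a uniform background distribution and an additive pseudocount. These are simplifications: tools such as PSI-BLAST use sequence weighting and other refinements, and the paper's exact construction is not reproduced here. The designed sequences below are hypothetical.

```python
import math
from collections import Counter

AA = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(seqs, pseudocount=1.0, background=None):
    """Log-odds PSSM (bits) from an ungapped alignment of equal-length sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    bg = background or {a: 1.0 / len(AA) for a in AA}
    total = len(seqs) + pseudocount * len(AA)
    pssm = []
    for pos in range(length):
        counts = Counter(s[pos] for s in seqs)
        # Smoothed column frequency of each residue type relative to background.
        pssm.append({a: math.log2((counts[a] + pseudocount) / total / bg[a]) for a in AA})
    return pssm


def score(pssm, query):
    """Ungapped score of a query of the same length as the PSSM."""
    if len(query) != len(pssm):
        raise ValueError("query length must match the PSSM length")
    return sum(column[a] for column, a in zip(pssm, query))


# Hypothetical designed sequences for one short template segment.
designed = ["ALKVEG", "ALRVEG", "AIKVDG", "VLKVEG"]
pssm = build_pssm(designed)
print(round(score(pssm, "ALKVEG"), 2), round(score(pssm, "WWWWWW"), 2))
```

A query resembling the designed family scores positive, while an unrelated query scores negative; a database search would rank sequences (or sliding windows) by this score.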
Affiliation(s)
- Marcel Schmidt am Busch
- Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, 91128 Palaiseau, France
20
Bhattacherjee A, Biswas P. Combinatorial design of protein sequences with applications to lattice and real proteins. J Chem Phys 2009; 131:125101. [DOI: 10.1063/1.3236519] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022]
21
Jha AN, Ananthasuresh GK, Vishveshwara S. A search for energy minimized sequences of proteins. PLoS One 2009; 4:e6684. [PMID: 19690619 PMCID: PMC2724685 DOI: 10.1371/journal.pone.0006684] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 02/03/2009] [Accepted: 07/23/2009] [Indexed: 11/21/2022]
Abstract
In this paper, we present numerical evidence that supports the notion of minimization in the sequence space of proteins for a target conformation. We use the conformations of real proteins in the Protein Data Bank (PDB) and present computationally efficient methods to identify the sequences with minimum energy. We use an edge-weighted connectivity graph for ranking the residue sites with a reduced amino acid alphabet and then use continuous optimization to obtain the energy-minimizing sequences. Our methods enable the computation of a lower bound as well as a tight upper bound for the energy of a given conformation. We validate our results by using three different inter-residue energy matrices for five proteins from the PDB, and by comparing our energy-minimizing sequences with 80 million diverse sequences that are generated based on different considerations in each case. When we submitted some of our chosen energy-minimizing sequences to the Basic Local Alignment Search Tool (BLAST), we obtained some sequences from the non-redundant protein sequence database that are similar to ours with an E-value of the order of 10⁻⁷. In summary, we conclude that proteins show a trend towards minimizing energy in the sequence space but do not seem to adopt the global energy-minimizing sequence. The reason for this could be either that the existing energy matrices are not able to accurately represent the inter-residue interactions in the context of the protein environment or that Nature does not push the optimization in the sequence space once it is able to perform the function.
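The abstract scores sequences on a fixed conformation with inter-residue energy matrices over a reduced alphabet. A minimal sketch, assuming a pairwise contact-energy model with a hypothetical two-letter (H/P) matrix and a fixed number of H residues (without a composition constraint, the all-H sequence would trivially win). Exhaustive enumeration stands in here for the paper's graph ranking and continuous optimization, which are what make realistic sizes tractable.

```python
import itertools

# Hypothetical two-letter (H/P) contact energies: only H-H contacts are favourable.
HP_ENERGY = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "H"): 0.0, ("P", "P"): 0.0}


def sequence_energy(seq, contacts, matrix):
    """Sum of pairwise contact energies for a sequence threaded on a fixed structure."""
    return sum(matrix[(seq[i], seq[j])] for i, j in contacts)


def minimum_energy_sequences(length, contacts, matrix, n_h):
    """Enumerate all sequences with exactly n_h H residues; return the lowest energy and its sequences."""
    best_e, best = float("inf"), []
    for h_sites in itertools.combinations(range(length), n_h):
        seq = "".join("H" if k in h_sites else "P" for k in range(length))
        e = sequence_energy(seq, contacts, matrix)
        if e < best_e:
            best_e, best = e, [seq]
        elif e == best_e:
            best.append(seq)
    return best_e, best


# Hypothetical contact map for a 6-residue conformation.
contacts = [(0, 3), (0, 5), (3, 5), (1, 4)]
print(minimum_energy_sequences(6, contacts, HP_ENERGY, n_h=3))
```

Enumeration grows combinatorially with chain length, which is why the abstract turns to ranking and continuous relaxation rather than brute force.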
Affiliation(s)
- Anupam Nath Jha
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- G. K. Ananthasuresh
- Department of Mechanical Engineering, Indian Institute of Science, Bangalore, India
- * E-mail: (SV); (GKA)
- Saraswathi Vishveshwara
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- * E-mail: (SV); (GKA)
22
Suárez M, Jaramillo A. Challenges in the computational design of proteins. J R Soc Interface 2009; 6 Suppl 4:S477-91. [PMID: 19324680 PMCID: PMC2843960 DOI: 10.1098/rsif.2008.0508.focus] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.8] [Received: 11/30/2008] [Accepted: 02/04/2009] [Indexed: 11/12/2022]
Abstract
Protein design has many applications not only in biotechnology but also in basic science. It uses our current knowledge of structural biology to predict, by computer simulations, an amino acid sequence that would produce a protein with targeted properties. As in other examples of synthetic biology, this approach allows the testing of many hypotheses in biology. The recent development of automated computational methods to design proteins has enabled proteins to be designed that are very different from any known ones. Moreover, some of those methods rely mostly on a physical description of atomic interactions, so that the designed sequences are not biased towards known proteins. In this paper, we will describe the use of energy functions in computational protein design, the use of atomic models to evaluate the free energy in the unfolded and folded states, the exploration and optimization of amino acid sequences, the problem of negative design and the design of biomolecular function. We will also consider its use together with experimental techniques such as directed evolution. We will end by discussing the challenges ahead in computational protein design and some of its future applications.
Affiliation(s)
- María Suárez
- Laboratoire de Biochimie, Ecole Polytechnique, CNRS, 91128 Palaiseau Cedex, France
- Epigenomics Project, Genopole, Université d'Evry Val d'Essonne-Genopole-CNRS, Tour Evry2, Etage 10, Terrasses de l'Agora, 91034 Evry Cedex, France
- Alfonso Jaramillo
- Laboratoire de Biochimie, Ecole Polytechnique, CNRS, 91128 Palaiseau Cedex, France
- Epigenomics Project, Genopole, Université d'Evry Val d'Essonne-Genopole-CNRS, Tour Evry2, Etage 10, Terrasses de l'Agora, 91034 Evry Cedex, France
23
Suárez M, Tortosa P, Jaramillo A. PROTDES: CHARMM toolbox for computational protein design. SYSTEMS AND SYNTHETIC BIOLOGY 2009; 2:105-13. [PMID: 19572216 PMCID: PMC2735645 DOI: 10.1007/s11693-009-9026-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 10/22/2008] [Revised: 05/17/2009] [Accepted: 05/30/2009] [Indexed: 12/13/2022]
Abstract
We present open-source software that can automatically mutate any residue positions and find the best amino acids in an arbitrary protein structure without requiring pairwise approximations. Our software, PROTDES, is based on CHARMM and searches automatically for mutations that optimize a protein folding free energy. PROTDES allows the integration of molecular dynamics within protein design. We have implemented a heuristic optimization algorithm that iteratively searches for the best amino acids and their conformations for an arbitrary set of positions within a structure. Our software allows CHARMM users to perform protein design calculations and to create their own procedures for protein design using their own energy functions. We show this by implementing three different energy functions based on different solvent treatments: surface area accessibility, generalized Born using molecular volume and an effective energy function. PROTDES, a tutorial, parameter sets, configuration tools and examples are freely available at http://soft.synth-bio.org/protdes.html.
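PROTDES iteratively searches amino acids and conformations while scoring the whole structure rather than summing precomputed pairwise terms. The abstract does not detail its heuristic, so the sketch below uses greedy position-by-position (coordinate-descent) search as a hypothetical stand-in: each trial design is re-scored in full by a black-box energy callable. The positions, rotamer options and `toy_energy` function are invented for illustration and are not PROTDES or CHARMM interfaces.

```python
def iterative_design(positions, options, energy, start, max_sweeps=20):
    """Greedy position-by-position search with a black-box energy of the whole design.

    positions: residue positions to design
    options:   position -> list of (amino_acid, rotamer_index) choices
    energy:    callable scoring a complete design dict (lower is better)
    start:     initial design dict covering every position
    """
    current = dict(start)
    best_e = energy(current)
    for _ in range(max_sweeps):
        improved = False
        for pos in positions:
            for choice in options[pos]:
                if choice == current[pos]:
                    continue
                trial = dict(current)
                trial[pos] = choice
                e = energy(trial)  # full re-evaluation, no pairwise decomposition
                if e < best_e:
                    current, best_e, improved = trial, e, True
        if not improved:  # a full sweep changed nothing: local minimum reached
            break
    return current, best_e


def toy_energy(design):
    """Hypothetical energy: per-site preferences plus a global |net charge| penalty."""
    prefs = {"A": 0.0, "L": -1.0, "K": -0.5, "E": -0.5}
    site = sum(prefs[aa] + 0.1 * rot for aa, rot in design.values())
    charge = sum({"K": 1, "E": -1}.get(aa, 0) for aa, _ in design.values())
    return site + 0.8 * abs(charge)


positions = [10, 11, 12]
options = {p: [("A", 0), ("L", 0), ("L", 1), ("K", 0), ("E", 0)] for p in positions}
start = {p: ("A", 0) for p in positions}
print(iterative_design(positions, options, toy_energy, start))
```

Because every trial is scored as a complete structure, any energy (including one with molecular dynamics or implicit solvent) can be plugged in; the price is that greedy search can stop in a local minimum, so restarts or stochastic moves are common additions.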
Affiliation(s)
- María Suárez
- Biochemistry Laboratory, CNRS—UMR 7654, Ecole Polytechnique, 91128 Palaiseau, France
- SYNTH-BIO group Epigenomics Project, Genopole Tour Evry2, etage 10, 523, Terrasses de l’Agora, 91034 Evry Cedex, France
- Pablo Tortosa
- Biochemistry Laboratory, CNRS—UMR 7654, Ecole Polytechnique, 91128 Palaiseau, France
- Alfonso Jaramillo
- Biochemistry Laboratory, CNRS—UMR 7654, Ecole Polytechnique, 91128 Palaiseau, France
- SYNTH-BIO group Epigenomics Project, Genopole Tour Evry2, etage 10, 523, Terrasses de l’Agora, 91034 Evry Cedex, France
24
Moltó G, Suárez M, Tortosa P, Alonso JM, Hernández V, Jaramillo A. Protein Design Based on Parallel Dimensional Reduction. J Chem Inf Model 2009; 49:1261-71. [DOI: 10.1021/ci8004594] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Affiliation(s)
- Germán Moltó
- Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, 46022 Valencia, Spain, Epigenomics Project, Genopole-Université d'Évry Val d'Essonne-CNRS UPS 3201, 91034 Évry, France, and Laboratoire de Biochimie, École Polytechnique-CNRS UMR 7654, 91128, Palaiseau, France
- María Suárez
- Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, 46022 Valencia, Spain, Epigenomics Project, Genopole-Université d'Évry Val d'Essonne-CNRS UPS 3201, 91034 Évry, France, and Laboratoire de Biochimie, École Polytechnique-CNRS UMR 7654, 91128, Palaiseau, France
- Pablo Tortosa
- Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, 46022 Valencia, Spain, Epigenomics Project, Genopole-Université d'Évry Val d'Essonne-CNRS UPS 3201, 91034 Évry, France, and Laboratoire de Biochimie, École Polytechnique-CNRS UMR 7654, 91128, Palaiseau, France
- José M. Alonso
- Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, 46022 Valencia, Spain, Epigenomics Project, Genopole-Université d'Évry Val d'Essonne-CNRS UPS 3201, 91034 Évry, France, and Laboratoire de Biochimie, École Polytechnique-CNRS UMR 7654, 91128, Palaiseau, France
- Vicente Hernández
- Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, 46022 Valencia, Spain, Epigenomics Project, Genopole-Université d'Évry Val d'Essonne-CNRS UPS 3201, 91034 Évry, France, and Laboratoire de Biochimie, École Polytechnique-CNRS UMR 7654, 91128, Palaiseau, France
- Alfonso Jaramillo
- Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, 46022 Valencia, Spain, Epigenomics Project, Genopole-Université d'Évry Val d'Essonne-CNRS UPS 3201, 91034 Évry, France, and Laboratoire de Biochimie, École Polytechnique-CNRS UMR 7654, 91128, Palaiseau, France
25
am Busch MS, Lopes A, Amara N, Bathelt C, Simonson T. Testing the Coulomb/Accessible Surface Area solvent model for protein stability, ligand binding, and protein design. BMC Bioinformatics 2008; 9:148. [PMID: 18366628 PMCID: PMC2292695 DOI: 10.1186/1471-2105-9-148] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Received: 10/09/2007] [Accepted: 03/13/2008] [Indexed: 11/10/2022]
Abstract
Background Protein structure prediction and computational protein design require efficient yet sufficiently accurate descriptions of aqueous solvent. We continue to evaluate the performance of the Coulomb/Accessible Surface Area (CASA) implicit solvent model, in combination with the Charmm19 molecular mechanics force field. We test a set of model parameters optimized earlier, and we also carry out a new optimization in this work, using as a target a set of experimental stability changes for single point mutations of various proteins and peptides. The optimization procedure is general, and could be used with other force fields. The computation of stability changes requires a model for the unfolded state of the protein. In our approach, this state is represented by tripeptide structures of the sequence Ala-X-Ala for each amino acid type X. We followed an iterative optimization scheme which, at each cycle, optimizes the solvation parameters and a set of tripeptide structures for the unfolded state. This protocol uses a set of 140 experimental stability mutations and a large set of tripeptide conformations to find the best tripeptide structures and solvation parameters. Results Using the optimized parameters, we obtain a mean unsigned error of 2.28 kcal/mol for the stability mutations. The performance of the CASA model is assessed by two further applications: (i) calculation of protein-ligand binding affinities and (ii) computational protein design. For these two applications, the previous parameters and the ones optimized here give a similar performance. For ligand binding, we obtain reasonable agreement with a set of 55 experimental mutation data, with a mean unsigned error of 1.76 kcal/mol with the new parameters and 1.47 kcal/mol with the earlier ones. We show that the optimized CASA model is not inferior to the Generalized Born/Surface Area (GB/SA) model for the prediction of these binding affinities. 
Likewise, the new parameters perform well for the design of 8 SH3 domain proteins, where an average of 32.8% sequence identity relative to the native sequences was achieved. Further, it was shown that the computed sequences have the character of naturally occurring homologues of the native sequences. Conclusion Overall, the two CASA variants explored here perform very well for a wide variety of applications. Both variants provide an efficient solvent treatment for the computational engineering of ligands and proteins.
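The CASA study represents the unfolded state by Ala-X-Ala tripeptides and scores the model by its mean unsigned error against experimental stability changes. The sketch below shows that bookkeeping, assuming the convention ΔG_fold = E_folded - Σ_i E_tri(X_i) and ΔΔG = ΔG_fold(mutant) - ΔG_fold(wild type); sign conventions vary between papers. All energies are hypothetical numbers, not values from the study, and the folded-state energies would in practice come from the force field plus CASA solvation.

```python
def stability_change(e_folded_wt, e_folded_mut, e_tripeptide, wt_aa, mut_aa):
    """ddG for a point mutation; unfolded state = sum of Ala-X-Ala tripeptide energies.

    dG_fold = E_folded - sum_i E_tripeptide[X_i], and ddG = dG_fold(mut) - dG_fold(wt).
    Only the mutated site changes the unfolded-state sum, so the other terms cancel.
    """
    return (e_folded_mut - e_folded_wt) - (e_tripeptide[mut_aa] - e_tripeptide[wt_aa])


def mean_unsigned_error(predicted, experimental):
    if not predicted or len(predicted) != len(experimental):
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - x) for p, x in zip(predicted, experimental)) / len(predicted)


# Hypothetical energies in kcal/mol.
e_tri = {"A": -5.0, "V": -6.0}
ddg = stability_change(e_folded_wt=-100.0, e_folded_mut=-102.5,
                       e_tripeptide=e_tri, wt_aa="A", mut_aa="V")
mue = mean_unsigned_error([ddg, 0.5, 2.0], [-1.0, 1.0, 1.0])
print(ddg, round(mue, 3))
```

Optimizing solvation parameters against a set of experimental mutations then amounts to minimizing this mean unsigned error over the parameters.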
Affiliation(s)
- Marcel Schmidt am Busch
- Laboratoire de Biochimie (UMR CNRS 7654), Department of Biology, Ecole Polytechnique, 91128, Palaiseau, France.
26
Fung HK, Welsh WJ, Floudas CA. Computational De Novo Peptide and Protein Design: Rigid Templates versus Flexible Templates. Ind Eng Chem Res 2008. [DOI: 10.1021/ie071286k] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Indexed: 12/26/2022]
Affiliation(s)
- Ho Ki Fung
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, and Department of Pharmacology, University of Medicine & Dentistry of New Jersey (UMDNJ), Robert Wood Johnson Medical School, and the Informatics Institute of UMDNJ, Piscataway, New Jersey 08854
- William J. Welsh
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, and Department of Pharmacology, University of Medicine & Dentistry of New Jersey (UMDNJ), Robert Wood Johnson Medical School, and the Informatics Institute of UMDNJ, Piscataway, New Jersey 08854
- Christodoulos A. Floudas
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, and Department of Pharmacology, University of Medicine & Dentistry of New Jersey (UMDNJ), Robert Wood Johnson Medical School, and the Informatics Institute of UMDNJ, Piscataway, New Jersey 08854
27
Schmidt Am Busch M, Lopes A, Mignon D, Simonson T. Computational protein design: Software implementation, parameter optimization, and performance of a simple model. J Comput Chem 2008; 29:1092-102. [DOI: 10.1002/jcc.20870] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Indexed: 12/20/2022]
28
Jha AN, Ananthasuresh GK, Vishveshwara S. Protein sequence design based on the topology of the native state structure. J Theor Biol 2007; 248:81-90. [PMID: 17543996 DOI: 10.1016/j.jtbi.2007.04.018] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 10/07/2006] [Revised: 03/23/2007] [Accepted: 04/23/2007] [Indexed: 11/21/2022]
Abstract
Computational design of sequences for a given structure is generally studied by exhaustively enumerating the sequence space or by searching in such a large space, which is prohibitively expensive. However, we point out that the protein topology has a wealth of information, which can be exploited to design sequences for a chosen structure. In this paper, we present a computationally efficient method for ranking the residue sites in a given native-state structure, which enables us to design sequences for a chosen structure. The premise for the method is that the topology of the graph representing the energetically interacting neighbours in a protein plays an important role in the inverse-folding problem. While our previous work (which was also based on topology) used eigenspectral analysis of the adjacency matrix of interactions for ranking the residue sites in a given chain, here we use a simple but effective way of assigning weights to the nodes on the basis of secondary connections, along with primary connections. This indirectly accounts for the edge weight in the graph and removes degeneracy in the degree. The new scheme needs only a few multiplications and additions to compute the preferred ranking of the residue sites even for structures of real proteins of sizes of a few hundred amino acid residues. We use HP lattice model examples (for which exhaustive enumeration of sequences is practical) to validate our ranking approach in obtaining sequences of lowest energy for any H-P residue composition for a given native-state structure. Some examples of native structures of real proteins are also included. Quantitative comparison of the efficacy of the new scheme with the earlier schemes is made. The new scheme consistently performs better and with much lower computational cost. 
For the few rare cases in which the new scheme fails to provide the best sequence, an optimization procedure is added to work with the new scheme.
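The abstract weights each residue site by its secondary connections as well as its primary connections. The exact weighting is not given here, so the sketch below assumes one simple choice: a site's weight is its own contact count plus the contact counts of its neighbours. This needs only additions and breaks ties between sites of equal degree. Sites are then ranked, and the top-ranked ones are assigned H in an HP sequence with a fixed number of H residues. The contact map is a hypothetical toy example.

```python
def rank_sites(n, contacts):
    """Rank sites by degree plus the degrees of their neighbours (assumed weighting)."""
    neighbours = [set() for _ in range(n)]
    for i, j in contacts:
        neighbours[i].add(j)
        neighbours[j].add(i)
    degree = [len(nb) for nb in neighbours]                      # primary connections
    weight = [degree[i] + sum(degree[j] for j in neighbours[i])  # plus secondary
              for i in range(n)]
    order = sorted(range(n), key=lambda i: (-weight[i], i))      # ties -> lower index
    return order, weight


def design_hp(n, contacts, n_h):
    """Place H at the n_h highest-ranked sites and P elsewhere."""
    order, _ = rank_sites(n, contacts)
    h_sites = set(order[:n_h])
    return "".join("H" if k in h_sites else "P" for k in range(n))


contacts = [(0, 3), (0, 5), (3, 5), (1, 4)]  # hypothetical contact map
print(rank_sites(6, contacts))
print(design_hp(6, contacts, n_h=3))
```

On this toy map the ranking places H on the mutually contacting triangle of sites, which is also the lowest-energy H/P sequence under an H-H-only contact energy; on harder cases a follow-up optimization step can correct the ranking, as the abstract notes.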
Affiliation(s)
- Anupam Nath Jha
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore-560 012, India
29
Maglio O, Nastri F, Martin de Rosales RT, Faiella M, Pavone V, DeGrado WF, Lombardi A. Diiron-containing metalloproteins: Developing functional models. CR CHIM 2007. [DOI: 10.1016/j.crci.2007.03.010] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.3] [Indexed: 11/16/2022]
30
Biswas P, Zou J, Saven JG. Statistical theory for protein ensembles with designed energy landscapes. J Chem Phys 2005; 123:154908. [PMID: 16252973 DOI: 10.1063/1.2062047] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022]
Abstract
Combinatorial protein libraries provide a promising route to investigate the determinants and features of protein folding and to identify novel folding amino acid sequences. A library of sequences based on a pool of different monomer types is screened for folding molecules, consistent with a particular foldability criterion. The number of sequences grows exponentially with the length of the polymer, making both experimental and computational tabulations of sequences infeasible. Herein a statistical theory is extended to specify the properties of sequences having particular values of global energetic quantities that specify their energy landscape. The theory yields the site-specific monomer probabilities. A foldability criterion is derived that characterizes the properties of sequences by quantifying the energetic separation of the target state from low-energy states in the unfolded ensemble and the fluctuations of the energies in the unfolded state ensemble. For a simple lattice model of proteins, excellent agreement is observed between the theory and the results of exact enumeration. The theory may be used to provide a quantitative framework for the design and interpretation of combinatorial experiments.
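The foldability criterion in the abstract relates the target-state energy to the unfolded ensemble, scaled by that ensemble's energy fluctuations. A related and widely used measure is a Z-score: the gap between the mean decoy (unfolded-state) energy and the target energy, divided by the decoy standard deviation. The sketch below computes it; treating this Z-score as equivalent to the paper's derived criterion is an assumption, and the energies are hypothetical.

```python
import statistics


def foldability_z(e_target, e_decoys):
    """Z-score style foldability: (mean decoy energy - target energy) / decoy std dev.

    Larger values mean the target state sits further below the unfolded/decoy
    ensemble relative to that ensemble's energy fluctuations.
    """
    if len(e_decoys) < 2:
        raise ValueError("need at least two decoy energies")
    spread = statistics.stdev(e_decoys)
    if spread == 0:
        raise ValueError("decoy energies have zero spread")
    return (statistics.fmean(e_decoys) - e_target) / spread


# Hypothetical energies for one sequence: target state and three decoy states.
print(foldability_z(-6.0, [-2.0, 0.0, 2.0]))
```

For a lattice model the decoy energies can come from exact enumeration of conformations, the setting in which the abstract reports agreement between theory and enumeration.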
Affiliation(s)
- Parbati Biswas
- Department of Chemistry, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA.
31
Reza F, Zuo P, Tian J. Protein interfacial pocket engineering via coupled computational filtering and biological focusing criterion. Ann Biomed Eng 2007; 35:1026-36. [PMID: 17453346 DOI: 10.1007/s10439-007-9316-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/25/2006] [Accepted: 04/11/2007] [Indexed: 11/25/2022]
Abstract
To engineer bio-macromolecular systems, protein-substrate interactions and their configurations need to be understood, harnessed, and utilized. Due to the inherently large number of combinatorial configurations and the conformational complexity, methods that rely on heuristics or stochastics, such as practical computational filtering (CF) or biological focusing (BF) criteria, rarely yield insights into these complexes or successes in (re)designing them when used alone. Here we apply a coupled CF-BF criterion to an amenable interfacial pocket (IP) of a protein scaffold complexed with its substrate, subjecting it to residue replacement and R-group refinement (R4) to filter out energetically unfavorable residues and R-group conformations and to focus on those that are evolutionarily favorable. We show that this coupled filtering and focusing can efficiently provide a putative engineered IP candidate and validate it computationally and empirically. The CF-BF criterion may permit holistic understanding of the nuances of existing protein IPs and their scaffolds and facilitate bioengineering efforts to alter substrate specificity. Such an approach may contribute to accelerated elucidation of engineering principles of bio-macromolecular systems.
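The coupled criterion can be read as a two-stage selection: discard energetically unfavourable candidates, then prefer those that are evolutionarily common. A minimal sketch for a single pocket position, assuming each candidate residue already has a computed interaction energy and a conservation frequency from an alignment of homologues (both hypothetical inputs); the paper's R4 procedure itself is not reproduced.

```python
def filter_and_focus(energy, conservation, energy_cutoff, top_k):
    """Two-stage selection of candidate residues for one pocket position.

    energy:        candidate -> computed interaction energy (lower is better)
    conservation:  candidate -> frequency in an alignment of homologues
    """
    # Computational filtering: drop energetically unfavourable candidates.
    survivors = [c for c, e in energy.items() if e <= energy_cutoff]
    # Biological focusing: prefer evolutionarily common residues, then lower energy.
    survivors.sort(key=lambda c: (-conservation.get(c, 0.0), energy[c]))
    return survivors[:top_k]


# Hypothetical energies (kcal/mol) and conservation frequencies.
energy = {"F": -3.1, "Y": -2.8, "W": -0.5, "L": -2.0, "A": 1.2}
conservation = {"F": 0.2, "Y": 0.6, "L": 0.1, "W": 0.05, "A": 0.05}
print(filter_and_focus(energy, conservation, energy_cutoff=-1.0, top_k=2))
```

Note that the lowest-energy candidate is not necessarily ranked first: once the energy filter is passed, evolutionary evidence decides the order.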
Affiliation(s)
- Faisal Reza
- Department of Biomedical Engineering and Institute for Genome Sciences and Policy, Duke University, Durham, NC 27708-0281, USA
32
Green DF, Dennis AT, Fam PS, Tidor B, Jasanoff A. Rational design of new binding specificity by simultaneous mutagenesis of calmodulin and a target peptide. Biochemistry 2006; 45:12547-59. [PMID: 17029410 PMCID: PMC2517080 DOI: 10.1021/bi060857u] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.9] [Indexed: 11/29/2022]
Abstract
Calcium-saturated calmodulin (CaM) binds and influences the activity of a varied collection of target proteins in most cells. This promiscuity underlies the role of CaM as a shared participant in calcium-dependent signal transduction pathways but imposes a handicap on popular CaM-based calcium biosensors, which display an undesired tendency to cross-react with cellular proteins. Designed CaM/target pairs that retain high affinity for one another but lack affinity for wild-type CaM and its natural interaction partners would therefore be useful as sensor components and possibly also as elements of "synthetic" cellular-signaling networks. Here, we have adopted a rational approach to creating suitably modified CaM/target complexes by using computational design methods to guide parallel site-directed mutagenesis of both binding partners. A hierarchical design procedure was applied to suggest a small number of complementary mutations on CaM and on a peptide ligand derived from skeletal-muscle light-chain kinase (M13). Experimental analysis showed that the procedure was successful in identifying CaM and M13 mutants with novel specificity for one another. Importantly, the designed complexes retained an affinity comparable to the wild-type CaM/M13 complex. These results represent a step toward the creation of CaM and M13 derivatives with specificity fully orthogonal to the wild-type proteins and show that qualitatively accurate predictions may be obtained from computational methods applied simultaneously to two proteins involved in multiple-linked binding equilibria.
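The design goal above is a CaM/M13 pair that binds each other tightly while binding the wild-type partners poorly. One simple way to quantify this, assumed here for illustration rather than taken from the paper, is a specificity margin: the binding free energy of the strongest mismatched pairing (a designed partner with a wild-type partner) minus that of the designed pair. The free energies below are hypothetical, in kcal/mol, with more negative meaning tighter binding.

```python
def specificity_margin(dg, designed):
    """Strongest mismatched binding free energy minus that of the designed pair.

    dg: (calmodulin_variant, peptide_variant) -> binding free energy,
        more negative meaning tighter binding.
    A positive margin means the designed pair binds better than any pairing of
    either designed partner with the other partner's alternative variants.
    """
    cam, pep = designed
    competitors = [v for (c, p), v in dg.items()
                   if (c, p) != designed and (c == cam or p == pep)]
    return min(competitors) - dg[designed]


# Hypothetical binding free energies in kcal/mol.
dg = {("wt", "wt"): -12.0, ("wt", "mut"): -7.0,
      ("mut", "wt"): -8.0, ("mut", "mut"): -11.5}
print(specificity_margin(dg, ("mut", "mut")))
```

A separate check, comparing the designed pair's binding free energy with that of the wild-type complex, captures the abstract's second requirement that affinity be retained.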
Affiliation(s)
- David F. Green
- Biological Engineering Division, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139-4307, U.S.A.
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139-4307, U.S.A.
- Andrew T. Dennis
- Francis Bitter Magnet Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139-4307, U.S.A.
- Peter S. Fam
- Francis Bitter Magnet Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139-4307, U.S.A.
- Bruce Tidor
- Biological Engineering Division, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139-4307, U.S.A.
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139-4307, U.S.A.
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139-4307, U.S.A.
- Corresponding authors: Bruce Tidor: Alan Jasanoff:
- Alan Jasanoff
- Biological Engineering Division, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139-4307, U.S.A.
- Francis Bitter Magnet Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139-4307, U.S.A.
- Department of Nuclear Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139-4307, U.S.A.
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139-4307, U.S.A.
- Corresponding authors: Bruce Tidor: Alan Jasanoff:
33
Kleinman CL, Rodrigue N, Bonnard C, Philippe H, Lartillot N. A maximum likelihood framework for protein design. BMC Bioinformatics 2006; 7:326. [PMID: 16808841 PMCID: PMC1570151 DOI: 10.1186/1471-2105-7-326] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Received: 02/01/2006] [Accepted: 06/29/2006] [Indexed: 11/21/2022]
Abstract
Background The aim of protein design is to predict amino-acid sequences compatible with a given target structure. Traditionally envisioned as a purely thermodynamic question, this problem can also be understood in a wider context, where additional constraints are captured by learning the sequence patterns displayed by natural proteins of known conformation. In this latter perspective, however, we still need a theoretical formalization of the question, leading to general and efficient learning methods, and allowing for the selection of fast and accurate objective functions quantifying sequence/structure compatibility. Results We propose a formulation of the protein design problem in terms of model-based statistical inference. Our framework uses the maximum likelihood principle to optimize the unknown parameters of a statistical potential, which we call an inverse potential to contrast with classical potentials used for structure prediction. We propose an implementation based on Markov chain Monte Carlo, in which the likelihood is maximized by gradient descent and is numerically estimated by thermodynamic integration. The fit of the models is evaluated by cross-validation. We apply this to a simple pairwise contact potential, supplemented with a solvent-accessibility term, and show that the resulting models have a better predictive power than currently available pairwise potentials. Furthermore, the model comparison method presented here allows one to measure the relative contribution of each component of the potential, and to choose the optimal number of accessibility classes, which turns out to be much higher than classically considered. Conclusion Altogether, this reformulation makes it possible to test a wide diversity of models, using different forms of potentials, or accounting for factors other than just the constraint of thermodynamic stability.
Ultimately, such model-based statistical analyses may help to understand the forces shaping protein sequences, and driving their evolution.
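In the full model, the abstract reports an MCMC implementation because the pairwise contact term couples sites and makes the normalizing constant intractable. In the special case of a solvent-accessibility term alone, the likelihood factorizes over sites, and the maximum-likelihood update becomes a simple "expected minus observed" gradient. The sketch below fits such a site-independent inverse potential ε[c][a], with P(a | c) = exp(-ε[c][a]) / Z_c, by gradient ascent on the log-likelihood. The accessibility classes, the pseudocount and the training data are hypothetical; this special case also has a closed-form solution, which the test checks against.

```python
import math
from collections import defaultdict

AA = "ACDEFGHIKLMNPQRSTVWY"


def fit_inverse_potential(data, n_iter=2000, lr=1.0, pseudocount=1.0):
    """Maximum-likelihood fit of eps[c][a] for P(a | c) = exp(-eps[c][a]) / Z_c.

    data: list of (accessibility_class, amino_acid) observations.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for c, a in data:
        counts[c][a] += 1.0
    eps = {c: {a: 0.0 for a in AA} for c in counts}
    for _ in range(n_iter):
        for c, row in counts.items():
            n_c = sum(row.values()) + pseudocount * len(AA)
            z = sum(math.exp(-eps[c][a]) for a in AA)
            probs = {a: math.exp(-eps[c][a]) / z for a in AA}
            for a in AA:
                observed = (row[a] + pseudocount) / n_c
                # d(log L)/d eps[c][a] is proportional to expected - observed.
                eps[c][a] += lr * (probs[a] - observed)
    return eps


def class_probabilities(eps_c):
    z = sum(math.exp(-v) for v in eps_c.values())
    return {a: math.exp(-v) / z for a, v in eps_c.items()}


data = ([("buried", "L")] * 4 + [("buried", "V")] * 2 + [("buried", "I")]
        + [("exposed", "K")] * 3 + [("exposed", "E")] * 3)  # hypothetical
eps = fit_inverse_potential(data)
p_buried = class_probabilities(eps["buried"])
print(round(p_buried["L"], 4), round(5 / 27, 4))
```

At convergence the fitted probabilities equal the smoothed observed frequencies; adding a pairwise contact term breaks this factorization, which is where the sampling-based estimates described in the abstract become necessary.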
Affiliation(s)
- Claudia L Kleinman
- Canadian Institute for Advanced Research, Département de Biochimie, Université de Montréal, Montréal, Québec, Canada
- Nicolas Rodrigue
- Canadian Institute for Advanced Research, Département de Biochimie, Université de Montréal, Montréal, Québec, Canada
- Cécile Bonnard
- Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier, UMR 5506, CNRS-Université de Montpellier 2, 161, rue Ada, 34392 Montpellier Cedex 5, France
- Hervé Philippe
- Canadian Institute for Advanced Research, Département de Biochimie, Université de Montréal, Montréal, Québec, Canada
- Nicolas Lartillot
- Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier, UMR 5506, CNRS-Université de Montpellier 2, 161, rue Ada, 34392 Montpellier Cedex 5, France
34
Nanda V, DeGrado WF. Computational design of heterochiral peptides against a helical target. J Am Chem Soc 2006; 128:809-16. [PMID: 16417370 DOI: 10.1021/ja054452t] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022]
Abstract
Polypeptides incorporating D-amino acids occasionally occur in nature and are an important class of pharmaceutical molecules. With the use of heterochiral Monte Carlo (HCMC), a method inspired by the de novo design of proteins, we develop peptide scaffolds for interacting with a molecular target, a left-handed alpha-helix. The HCMC approach concurrently seeks to optimize a peptide sequence, its internal conformation, and its docked conformation with a target surface. Several major classes of interactions are observed: (1) homochiral interactions between two alphaL helices, (2) heterochiral interactions between an alphaL and an alphaR helix, and (3) heterochiral interactions between the alphaL target and novel nonhelical structures. We explore the application of HCMC to simulating the preferential enantioselectivity of heterochiral complexes. Implications for biomimetic design in molecular recognition are discussed.
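HCMC searches sequence and structure together by Monte Carlo. The sketch below is a heavily simplified Metropolis Monte Carlo over a per-residue amino acid type and chirality (L or D) assignment, tracking the best state visited. The reduced alphabet, move set, `toy_energy` function and temperature are all hypothetical, and the internal-conformation and docking moves of the actual method are omitted.

```python
import math
import random

ALPHABET = "AEKW"  # hypothetical reduced residue alphabet


def metropolis_design(energy, length, n_steps=5000, temperature=0.2, seed=0):
    """Metropolis Monte Carlo over per-residue (residue type, chirality) choices."""
    rng = random.Random(seed)
    state = [(rng.choice(ALPHABET), rng.choice("LD")) for _ in range(length)]
    e = energy(state)
    best_state, best_e = list(state), e
    for _ in range(n_steps):
        trial = list(state)
        i = rng.randrange(length)
        aa, chirality = trial[i]
        if rng.random() < 0.5:
            trial[i] = (rng.choice(ALPHABET), chirality)          # mutate residue type
        else:
            trial[i] = (aa, "D" if chirality == "L" else "L")     # flip L <-> D
        e_trial = energy(trial)
        # Metropolis criterion: always accept downhill, uphill with Boltzmann probability.
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / temperature):
            state, e = trial, e_trial
            if e < best_e:
                best_state, best_e = list(state), e
    return best_state, best_e


def toy_energy(state):
    """Hypothetical score: favour alternating L/D chirality and Trp residues."""
    e = 0.0
    for i, (aa, chirality) in enumerate(state):
        e -= 1.0 if chirality == ("L" if i % 2 == 0 else "D") else 0.0
        e -= 0.5 if aa == "W" else 0.0
    return e


best_state, best_e = metropolis_design(toy_energy, length=6)
print(best_e, best_state)
```

Treating chirality as just another per-residue variable is what lets a single Monte Carlo move set explore heterochiral sequences; a real implementation would also move backbone and docking coordinates and score them with a physical energy.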
Affiliation(s)
- Vikas Nanda
- Department of Biochemistry, Robert Wood Johnson Medical School, University of Medicine and Dentistry of New Jersey, Piscataway, New Jersey 08854, USA.

35
Ziegler J, Schwarzinger S. Genetic algorithms as a tool for helix design – computational and experimental studies on prion protein helix 1. J Comput Aided Mol Des 2006; 20:47-54. [PMID: 16544054 DOI: 10.1007/s10822-006-9035-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/04/2005] [Accepted: 01/17/2006] [Indexed: 10/24/2022]
Abstract
Evolutionary computing is a general optimization approach that has been applied successfully to numerical problems in many fields, including structural biology. Here we present an evolutionary approach to optimizing helix stability in peptides and proteins, using the AGADIR helix-stability energy function as the scoring function. Because masks can be applied that keep selected positions constant or restrict them to a certain class of amino acids, our algorithm can develop stable helical scaffolds containing a wide variety of structural and functional amino acid patterns. The algorithm showed good convergence behaviour in all tested cases and can be parameterized in many ways. We have applied it to optimize the stability of prion protein helix 1, a structural element of the prion protein that is thought to play a crucial role in the conformational transition from the cellular to the pathogenic form; it is therefore an interesting target for pharmacological as well as genetic engineering approaches to counter the as yet incurable prion diseases. NMR spectroscopic investigations of selected stabilizing and destabilizing mutations found by our algorithm demonstrated its ability to create stabilized variants of secondary structure elements.
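A masked genetic algorithm of the kind described can be sketched as follows. The sketch substitutes a hypothetical additive helix-propensity table (`PROPENSITY`) for AGADIR, which is a far more detailed helix/coil model and is not reproduced here. The mask is represented as one set of allowed residues per position, so a one-element set fixes that position; the example mask itself is also hypothetical.

```python
import random

# Hypothetical per-residue helix propensities (higher = more helix-favouring).
# This table stands in for AGADIR purely for illustration.
PROPENSITY = {"A": 1.0, "L": 0.8, "E": 0.6, "K": 0.6, "Q": 0.5,
              "S": 0.2, "G": -1.0, "P": -2.0}


def fitness(seq):
    return sum(PROPENSITY[aa] for aa in seq)


def random_allowed(mask, rng):
    return [rng.choice(sorted(allowed)) for allowed in mask]


def mutate(seq, mask, rng, rate=0.1):
    # Resample a position only from its allowed set, so masked positions stay legal.
    return [rng.choice(sorted(allowed)) if rng.random() < rate else aa
            for aa, allowed in zip(seq, mask)]


def crossover(a, b, rng):
    # One-point crossover; aligned parents that obey the mask give a child that obeys it.
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(mask, pop_size=40, generations=60, seed=1):
    rng = random.Random(seed)
    population = [random_allowed(mask, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection (keeps the elite)
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), mask, rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)


ALL = set(PROPENSITY)
# Hypothetical mask: position 0 fixed to Ser, position 3 limited to E/K, position 6 fixed to Gly.
mask = [{"S"}, ALL, ALL, {"E", "K"}, ALL, ALL, {"G"}, ALL]
best = evolve(mask)
print("".join(best), fitness(best))
```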
Affiliation(s)
- Jan Ziegler
- Lehrstuhl Biopolymere, University of Bayreuth, Universitätsstr. 30, 95444, Bayreuth, Germany.

36
Floudas C, Fung H, McAllister S, Mönnigmann M, Rajgaria R. Advances in protein structure prediction and de novo protein design: A review. Chem Eng Sci 2006. [DOI: 10.1016/j.ces.2005.04.009] [Citation(s) in RCA: 175] [Impact Index Per Article: 9.7] [Indexed: 01/01/2023]

37
Xie W, Sahinidis NV. Residue-rotamer-reduction algorithm for the protein side-chain conformation problem. Bioinformatics 2005; 22:188-94. [PMID: 16278239 DOI: 10.1093/bioinformatics/bti763] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.7] [Indexed: 11/14/2022]
Abstract
MOTIVATION: The protein side-chain conformation problem is a central problem in proteomics, with wide applications in protein structure prediction and design. Computational complexity results show that the problem is hard to solve, yet instances from realistic applications are large and demand fast and reliable algorithms. RESULTS: We propose a new global optimization algorithm that, for the first time, integrates the residue reduction and rotamer reduction techniques previously developed for the protein side-chain conformation problem. We show that the proposed approach dramatically simplifies the topology of the underlying residue graph. Computations show that our algorithm solves problems using only 1-10% of the time required by the mixed-integer linear programming approach available in the literature. In addition, on a set of hard side-chain conformation problems, our algorithm runs 2-78 times faster than SCWRL 3.0, which is widely used for solving these problems. AVAILABILITY: The implementation is available as an online server at http://eudoxus.scs.uiuc.edu/r3.html
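One ingredient of residue reduction can be shown on a toy pairwise energy model. If a residue interacts with only one neighbour (a leaf in the residue graph), it can be removed exactly by folding its best response into that neighbour's self-energies. The sketch below covers only this degree-one case on a hypothetical three-residue chain; it is not the full R3 algorithm, which also performs other residue reductions and rotamer reduction. A brute-force check confirms that the optimum is unchanged.

```python
from itertools import product


def eliminate_leaf(self_e, pair_e, leaf, neighbour):
    """Fold residue `leaf` (assumed to interact only with `neighbour`) into `neighbour`.

    self_e[i][r]: self-energy of rotamer r at residue i.
    pair_e[(i, j)][r][s]: interaction between rotamer r at i and rotamer s at j.
    Returns new (self_e, pair_e) with `leaf` removed.
    """
    def e_pair(r, s):
        """Energy between leaf rotamer r and neighbour rotamer s."""
        if (leaf, neighbour) in pair_e:
            return pair_e[(leaf, neighbour)][r][s]
        return pair_e[(neighbour, leaf)][s][r]

    new_self = {i: list(e) for i, e in self_e.items() if i != leaf}
    for s in range(len(self_e[neighbour])):
        # Best leaf rotamer for each neighbour rotamer s.
        new_self[neighbour][s] += min(self_e[leaf][r] + e_pair(r, s)
                                      for r in range(len(self_e[leaf])))
    new_pair = {k: v for k, v in pair_e.items() if leaf not in k}
    return new_self, new_pair


def total_energy(self_e, pair_e, assignment):
    e = sum(self_e[i][assignment[i]] for i in self_e)
    e += sum(t[assignment[i]][assignment[j]] for (i, j), t in pair_e.items())
    return e


def brute_force_min(self_e, pair_e):
    residues = sorted(self_e)
    best = None
    for rots in product(*(range(len(self_e[i])) for i in residues)):
        e = total_energy(self_e, pair_e, dict(zip(residues, rots)))
        best = e if best is None or e < best else best
    return best


# Hypothetical 3-residue chain 0 - 1 - 2 with two rotamers per residue.
self_e = {0: [0.0, 1.0], 1: [0.5, 0.0], 2: [0.2, 0.1]}
pair_e = {(0, 1): [[0.0, 2.0], [1.0, -1.0]], (1, 2): [[-0.5, 0.3], [0.4, 0.0]]}
reduced_self, reduced_pair = eliminate_leaf(self_e, pair_e, leaf=2, neighbour=1)
print(brute_force_min(self_e, pair_e), brute_force_min(reduced_self, reduced_pair))
```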
Affiliation(s)
- Wei Xie
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign 600 South Mathews Avenue, Urbana, IL 61801, USA

38
Abstract
The identification of protein mutations that enhance binding affinity may be achieved by computational or experimental means, or by a combination of the two. Sources of affinity enhancement may include improvements to the net balance of binding interactions of residues forming intermolecular contacts at the binding interface, such as packing and hydrogen-bonding interactions. Here we identify noncontacting residues that make substantial contributions to binding affinity and that also provide opportunities for mutations that increase the binding affinity of the TEM1 beta-lactamase (TEM1) for the beta-lactamase inhibitor protein (BLIP). A region of BLIP not on the direct TEM1-binding surface was identified in which changes in net charge result in particularly large increases in computed binding affinity. Some mutations in this region have previously been characterized, and our results agree well with the results of that study. In addition, we propose novel mutations to BLIP that were computed to improve binding significantly without contacting TEM1 directly. This class of noncontacting electrostatic interactions could have general utility in the design and tuning of binding interactions.
Affiliation(s)
- Brian A Joughin
- Computer Science and Artificial Intelligence Laboratory, Department of Biology, Center for Cancer Research, Massachusetts Institute of Technology, Room 32-212, Cambridge, MA 02139-4307, USA

39
Park S, Kono H, Wang W, Boder ET, Saven JG. Progress in the development and application of computational methods for probabilistic protein design. Comput Chem Eng 2005. [DOI: 10.1016/j.compchemeng.2004.07.037] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 10/26/2022]

40
Yang X, Saven JG. Computational methods for protein design and protein sequence variability: biased Monte Carlo and replica exchange. Chem Phys Lett 2005. [DOI: 10.1016/j.cplett.2004.10.153] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.6] [Indexed: 12/01/2022]

41
Abstract
The relationship between monomer chirality and polymer structure has been studied using both theoretical and experimental methods. Atomistic models, such as the ones employed in computational protein folding and design, can be used to study the relationship between monomer chirality and the properties of polypeptides. Using a simulated evolution approach that combines side-chain epimerization with backbone flexibility, we recapitulate the relationship between basic forces that drive secondary structure formation and sequence homochirality. Additionally, we find heterochiral motifs including a C-terminal helix capping interaction and stable helix-reversals that result in bent helix structures. Our studies show that simulated evolution of chirality with backbone flexibility can be a powerful tool in the design of novel heteropolymers with tuned stereochemical properties.
Affiliation(s)
- Vikas Nanda
- Department of Biochemistry and Molecular Biophysics, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania 19104, USA.

42
Plecs JJ, Harbury PB, Kim PS, Alber T. Structural test of the parameterized-backbone method for protein design. J Mol Biol 2004; 342:289-97. [PMID: 15313624 DOI: 10.1016/j.jmb.2004.06.051] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.0] [Received: 01/27/2004] [Revised: 06/11/2004] [Accepted: 06/15/2004] [Indexed: 11/20/2022]
Abstract
Designing new protein folds requires a method for simultaneously optimizing the conformation of the backbone and the side-chains. One approach to this problem is the use of a parameterized backbone, which allows the systematic exploration of families of structures. We report the crystal structure of RH3, a right-handed, three-helix coiled coil that was designed using a parameterized backbone and detailed modeling of core packing. This crystal structure was determined using another rationally designed feature, a metal-binding site that permitted experimental phasing of the X-ray data. RH3 adopted the intended fold, which has not been observed previously in biological proteins. Unanticipated structural asymmetry in the trimer was a principal source of variation within the RH3 structure. The sequence of RH3 differs from that of a previously characterized right-handed tetramer, RH4, at only one position in each 11 amino acid sequence repeat. This close similarity indicates that the design method is sensitive to the core packing interactions that specify the protein structure. Comparison of the structures of RH3 and RH4 indicates that both steric overlap and cavity formation provide strong driving forces for oligomer specificity.
Affiliation(s)
- Joseph J Plecs
- Department of Physics, University of California, Berkeley, 94720, USA

43
Liu HL, Hwang CK, Lin JC. The Stabilizing Effects of O-glycosylation on the Secondary Structural Integrity of the Designed α-loop-α motif by Molecular Dynamics Simulations. J Biomol Struct Dyn 2004; 22:131-6. [PMID: 15317474 DOI: 10.1080/07391102.2004.10506989] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
Abstract
In this study, a series of 400 ps molecular dynamics simulations was conducted to determine the stabilizing effect of O-glycosylation on the secondary structural integrity of the designed alpha-loop-alpha motif, which has the optimal loop length of 7 Gly residues (denoted as N-A16G7A16-C). In general, O-glycosylation stabilizes the structural integrity of the model peptide regardless of the length and position of the glycosylation sites, because it reduces the opportunity for water molecules to compete for the intramolecular hydrogen bonds. The designed peptide exhibits the highest helicity when residues 11 and 31 are replaced with Ser residues, which are then O-linked to 3 galactose residues, representing "face-to-face" glycosylation near the loop. In this case, the loop adopts an extended conformation and several new hydrogen bonds are observed between the main chain of the loop and the galactose residues, which decreases fluctuations and increases the stability of the entire peptide. When glycosylation sites are placed close to the loop, the secondary structural integrity of the alpha-loop-alpha motif increases with the number of galactose residues. In addition, "face-to-face" glycosylation increases the structural integrity of this motif to a greater extent than "back-to-back" glycosylation. However, when glycosylation sites are placed away from the loop and near the N- and C-termini, no general rule is found for the stabilizing effect.
Affiliation(s)
- Hsuan-Liang Liu
- Department of Chemical Engineering, Graduate Institute of Biotechnology, National Taipei University of Technology, 1 Section 3 Chung-Hsiao East Road, Taipei, Taiwan 10608.

44
Lear JD, Stouffer AL, Gratkowski H, Nanda V, Degrado WF. Association of a model transmembrane peptide containing gly in a heptad sequence motif. Biophys J 2004; 87:3421-9. [PMID: 15315956 PMCID: PMC1304808 DOI: 10.1529/biophysj.103.032839] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.7] [Indexed: 11/18/2022]
Abstract
A peptide containing glycine at a and d positions of a heptad motif was synthesized to investigate the possibility that membrane-soluble peptides with a Gly-based, left-handed helical packing motif would associate. Based on analytical ultracentrifugation in C14-betaine detergent micelles, the peptide did associate in a monomer-dimer equilibrium, although the association constant was significantly less than that reported for the right-handed dimer of the glycophorin A transmembrane peptide in similar detergents. Fluorescence resonance energy transfer (FRET) experiments conducted on peptides labeled at their N-termini with either tetramethylrhodamine (TMR) or 7-nitrobenz-2-oxa-1,3-diazole (NBD) also indicated association. However, analysis of the FRET data using the usual assumption of complete quenching for NBD-TMR pairs in the dimer could not be quantitatively reconciled with the analytical ultracentrifugation-measured dimerization constant. This led us to develop a general treatment for the association of helices to either parallel or antiparallel structures of any aggregation state. Applying this treatment to the FRET data, constraining the dimerization constant to be within experimental uncertainty of that measured by analytical ultracentrifugation, we found the data could be well described by a monomer-dimer equilibrium with only partial quenching of the dimer, suggesting that the helices are most probably antiparallel. These results also suggest that a left-handed Gly heptad repeat motif can drive membrane helix association, but the affinity is likely to be less strong than the previously reported right-handed motif described for glycophorin A.
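For a monomer-dimer equilibrium 2M ⇌ D with dissociation constant K_d = [M]^2/[D] (the inverse of the association constant), mass balance C_t = [M] + 2[D] gives a quadratic in the free monomer concentration [M]. The sketch below computes the fraction of peptide in dimers and a simple donor-quenching estimate. The quenching model is an assumption made for illustration: donor- and acceptor-labelled peptides are taken to mix randomly, so a donor in a dimer pairs with an acceptor with probability equal to the acceptor mole fraction, and a paired donor loses a fixed fraction `q` of its fluorescence (`q = 1` is the complete-quenching assumption). It is not the general treatment developed in the paper.

```python
import math


def free_monomer(total, kd):
    """Solve C_t = M + 2 M^2 / K_d for the free monomer concentration M (units of K_d)."""
    # 2 M^2 / K_d + M - C_t = 0  ->  M = K_d * (-1 + sqrt(1 + 8 C_t / K_d)) / 4
    return kd * (-1.0 + math.sqrt(1.0 + 8.0 * total / kd)) / 4.0


def fraction_dimer(total, kd):
    """Fraction of peptide chains that are in dimers: 2[D]/C_t = 1 - M/C_t."""
    return 1.0 - free_monomer(total, kd) / total


def donor_quenching(total, kd, acceptor_fraction, q=1.0):
    """Expected fractional loss of donor fluorescence under the assumed random-mixing model."""
    return fraction_dimer(total, kd) * acceptor_fraction * q


# Hypothetical values in arbitrary but consistent concentration units.
for total in (0.1, 1.0, 10.0):
    print(total,
          round(fraction_dimer(total, kd=1.0), 3),
          round(donor_quenching(total, kd=1.0, acceptor_fraction=0.5, q=0.6), 3))
```

Under this model, partial quenching (`q < 1`) lowers the predicted FRET signal at a given K_d, which is one way a complete-quenching fit can disagree with an independently measured dimerization constant.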
Affiliation(s)
- James D Lear
- Department of Biochemistry and Biophysics, School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA.

45
Calhoun JR, Kono H, Lahr S, Wang W, DeGrado WF, Saven JG. Computational design and characterization of a monomeric helical dinuclear metalloprotein. J Mol Biol 2004; 334:1101-15. [PMID: 14643669 DOI: 10.1016/j.jmb.2003.10.004] [Citation(s) in RCA: 122] [Impact Index Per Article: 6.1] [Indexed: 11/19/2022]
Abstract
The de novo design of di-iron proteins is an important step towards understanding the diversity of function among this complex family of metalloenzymes. Previous designs of due ferri (DF) proteins have resulted in tetrameric and dimeric four-helix bundles having crystallographically well-defined structures and active-site geometries. Here, the design and characterization of DFsc, a 114 residue monomeric four-helix bundle, is presented. The backbone was modeled using previous oligomeric structures and appropriate inter-helical turns. The identities of 26 residues were predetermined, including the primary and secondary ligands in the active site, residues involved in active site accessibility, and the gamma beta gamma beta turn between helices 2 and 3. The remaining 88 amino acid residues were determined using statistical computer aided design, which is based upon a recent statistical theory of protein sequences. Rather than sampling sequences, the theory directly provides the site-specific amino acid probabilities, which are then used to guide sequence design. The resulting sequence (DFsc) expresses well in Escherichia coli and is highly soluble. Sedimentation studies confirm that the protein is monomeric in solution. Circular dichroism spectra are consistent with the helical content of the target structure. The protein is structured in both the apo and the holo forms, with the metal-bound form exhibiting increased stability. DFsc stoichiometrically binds a variety of divalent metal ions, including Zn(II), Co(II), Fe(II), and Mn(II), with micromolar affinities. 15N HSQC NMR spectra of both the apo and Zn(II) proteins reveal excellent dispersion with evidence of a significant structural change upon metal binding. DFsc is thus a realization of complete de novo design, where backbone structure, activity, and sequence are specified in the design process.
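The idea of using site-specific amino acid probabilities, rather than sampled sequences, to guide design can be illustrated with a deliberately simplified independent-site model. Each variable site gets Boltzmann-weighted probabilities p_i(a) ∝ exp(-E_i(a)/T) from a per-site energy, predetermined sites keep their specified residue, and the most probable residue is taken at each site. The energies, temperature, and fixed positions below are hypothetical, and the statistical theory used in the paper is more general (for example, it accounts for interactions between sites), which this sketch omits.

```python
import math


def site_probabilities(site_energies, temperature):
    """Boltzmann probabilities p(a) ∝ exp(-E(a)/T) for one site (independent-site model)."""
    e_min = min(site_energies.values())  # shift energies for numerical stability
    weights = {aa: math.exp(-(e - e_min) / temperature) for aa, e in site_energies.items()}
    z = sum(weights.values())
    return {aa: w / z for aa, w in weights.items()}


def design_sequence(energies, fixed, temperature=1.0):
    """energies: list of {aa: energy} per site; fixed: {site_index: aa} predetermined residues."""
    sequence, profiles = [], []
    for i, site in enumerate(energies):
        probs = {fixed[i]: 1.0} if i in fixed else site_probabilities(site, temperature)
        profiles.append(probs)
        sequence.append(max(probs, key=probs.get))  # most probable residue at this site
    return "".join(sequence), profiles


# Hypothetical per-site energies; site 2 is predetermined (e.g. a metal ligand).
energies = [{"L": -2.0, "A": -1.0, "E": 0.5},
            {"L": 0.5, "E": -1.2, "K": -1.0},
            {"G": 0.0},
            {"A": -0.3, "K": -0.8}]
seq, profiles = design_sequence(energies, fixed={2: "H"})
print(seq)
```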
Affiliation(s)
- Jennifer R Calhoun
- Department of Biochemistry and Molecular Biophysics, Johnson Foundation, School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA

46
Bolon DN, Marcus JS, Ross SA, Mayo SL. Prudent modeling of core polar residues in computational protein design. J Mol Biol 2003; 329:611-22. [PMID: 12767838 DOI: 10.1016/s0022-2836(03)00423-6] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.8] [Indexed: 11/23/2022]
Abstract
Hydrogen bond interactions were surveyed in a set of protein structures. Compared to surface positions, polar side-chains at core positions form a greater number of intra-molecular hydrogen bonds. Furthermore, the majority of polar side-chains at core positions form at least one hydrogen bond to main-chain atoms that are not involved in hydrogen bonds to other main-chain atoms. Based on this structural survey, hydrogen bond rules were generated for each polar amino acid for use in protein core design. In the context of protein core design, these prudent polar rules were used to eliminate from consideration polar amino acid rotamers that do not form a minimum number of hydrogen bonds. As an initial test, the core of Escherichia coli thioredoxin was selected as a design target. For this target, the prudent polar strategy resulted in a minor increase in computational complexity compared to a strategy that did not allow polar residues. Dead-end elimination was used to identify global minimum energy conformations for the prudent polar and no polar strategies. The prudent polar strategy identified a protein sequence that was thermodynamically stabilized by 2.5 kcal/mol relative to wild-type thioredoxin and 2.2 kcal/mol relative to a thioredoxin variant whose core was designed without polar residues.
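The prudent polar strategy amounts to a filter applied to the rotamer list before the combinatorial search. A polar rotamer at a core position is kept only if it makes at least a minimum number of hydrogen bonds. In the sketch below, the `Rotamer` record, the hydrogen-bond counts, the thresholds in `MIN_HBONDS`, and the optional main-chain requirement are hypothetical placeholders. The paper derives its rules from a structural survey, and those values are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rotamer:
    residue: str              # one-letter amino acid code
    rotamer_id: int
    hbonds: int               # H-bonds this rotamer forms in the fixed-backbone context
    hbonds_to_mainchain: int  # subset of those made to main-chain atoms


# Hypothetical minimum H-bond counts for polar residues at core positions.
MIN_HBONDS = {"S": 1, "T": 1, "N": 2, "Q": 2, "D": 2, "E": 2,
              "H": 1, "K": 2, "R": 2, "Y": 1, "W": 1}


def prudent_polar_filter(rotamers, require_mainchain=True):
    """Keep apolar rotamers; keep polar rotamers only if they satisfy the H-bond rule."""
    kept = []
    for rot in rotamers:
        threshold = MIN_HBONDS.get(rot.residue)
        if threshold is None:  # apolar residue: no H-bond requirement
            kept.append(rot)
        elif rot.hbonds >= threshold and (not require_mainchain or rot.hbonds_to_mainchain >= 1):
            kept.append(rot)
    return kept


rots = [Rotamer("L", 0, 0, 0), Rotamer("S", 1, 0, 0), Rotamer("S", 2, 1, 1),
        Rotamer("N", 3, 1, 1), Rotamer("N", 4, 2, 0)]
print([r.rotamer_id for r in prudent_polar_filter(rots)])
```

Because the filter only removes rotamers, the subsequent search (dead-end elimination in the paper) runs over a smaller or equal space, consistent with the small added cost reported relative to excluding polar residues entirely.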
Affiliation(s)
- Daniel N Bolon
- Biochemistry and Molecular Biophysics Option, California Institute of Technology, Mail Code 114-96, Pasadena, CA 91125, USA

47
Marshall SA, Lazar GA, Chirino AJ, Desjarlais JR. Rational design and engineering of therapeutic proteins. Drug Discov Today 2003; 8:212-21. [PMID: 12634013 DOI: 10.1016/s1359-6446(03)02610-2] [Citation(s) in RCA: 136] [Impact Index Per Article: 6.5] [Indexed: 11/24/2022]
Abstract
An increasing number of engineered protein therapeutics are currently being developed, tested in clinical trials and marketed for use. Many of these proteins arose out of hit-and-miss efforts to discover specific mutations, fusion partners or chemical modifications that confer desired properties. Through these efforts, several useful strategies have emerged for rational optimization of therapeutic candidates. The controlled manipulation of the physical, chemical and biological properties of proteins enabled by structure-based simulation is now being used to refine established rational engineering approaches and to advance new strategies. These methods provide clear, hypothesis-driven routes to solve problems that plague many proteins and to create novel mechanisms of action. We anticipate that rational protein engineering will shape the field of protein therapeutics dramatically by improving existing products and enabling the development of novel therapeutic agents.

48
Zou J, Saven JG. Using self-consistent fields to bias Monte Carlo methods with applications to designing and sampling protein sequences. J Chem Phys 2003. [DOI: 10.1063/1.1539845] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.3] [Indexed: 11/14/2022]

49
Abstract
Computational methods play a central role in the rational design of novel proteins. The present work describes a new hybrid exact rotamer optimization (HERO) method that builds on previous dead-end elimination algorithms to yield dramatic performance enhancements. Measured on experimentally validated physical models, these improvements make it possible to perform previously intractable designs of entire protein core, surface, or boundary regions. Computational demonstrations include a full core design of the variable domains of the light and heavy chains of catalytic antibody 48G7 FAB with 74 residues and 10^128 conformations, a full core/boundary design of the beta1 domain of protein G with 25 residues and 10^53 conformations, and a full surface design of the beta1 domain of protein G with 27 residues and 10^60 conformations. In addition, a full sequence design of the beta1 domain of protein G is used to demonstrate the strong dependence of algorithm performance on the exact form of the potential function and the fidelity of the rotamer library. These results emphasize that search algorithm performance for protein design can only be meaningfully evaluated on physical models that have been subjected to experimental scrutiny. The new algorithm greatly facilitates ongoing efforts to engineer increasingly complex protein features.
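Dead-end elimination, the starting point that HERO builds on, prunes rotamers that provably cannot belong to the global minimum energy conformation. The sketch below implements the classic Goldstein singles criterion on a small pairwise energy model. Rotamer r at residue i is eliminated if some other rotamer t at i satisfies E(i_r) - E(i_t) + Σ_j min_s [E(i_r, j_s) - E(i_t, j_s)] > 0. The energy tables are hypothetical inputs, and HERO's own additional criteria are not reproduced.

```python
def pair(pair_e, i, r, j, s):
    """Look up E(i_r, j_s) regardless of the key order stored in pair_e."""
    if (i, j) in pair_e:
        return pair_e[(i, j)][r][s]
    if (j, i) in pair_e:
        return pair_e[(j, i)][s][r]
    return 0.0  # residues with no stored table do not interact


def goldstein_dee(self_e, pair_e):
    """Iteratively apply the Goldstein singles criterion; return surviving rotamers per residue."""
    alive = {i: set(range(len(e))) for i, e in self_e.items()}
    changed = True
    while changed:
        changed = False
        for i in self_e:
            for r in sorted(alive[i]):
                for t in sorted(alive[i]):
                    if t == r:
                        continue
                    bound = self_e[i][r] - self_e[i][t]
                    for j in self_e:
                        if j != i:
                            bound += min(pair(pair_e, i, r, j, s) - pair(pair_e, i, t, j, s)
                                         for s in alive[j])
                    if bound > 0:  # r is always worse than t: it cannot be in the GMEC
                        alive[i].discard(r)
                        changed = True
                        break
    return alive


# Hypothetical two-residue problem with two rotamers each.
self_e = {0: [0.0, 3.0], 1: [0.0, 0.5]}
pair_e = {(0, 1): [[0.0, 0.2], [0.1, 0.0]]}
print(goldstein_dee(self_e, pair_e))  # {0: {0}, 1: {0}}
```

Elimination uses a strict inequality so that tied rotamers are never both removed; when every residue is left with a single rotamer, as here, the survivors form the global minimum energy conformation.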
Affiliation(s)
- D Benjamin Gordon
- Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142, USA

50
Hayes RJ, Bentzien J, Ary ML, Hwang MY, Jacinto JM, Vielmetter J, Kundu A, Dahiyat BI. Combining computational and experimental screening for rapid optimization of protein properties. Proc Natl Acad Sci U S A 2002; 99:15926-31. [PMID: 12446841 PMCID: PMC138541 DOI: 10.1073/pnas.212627499] [Citation(s) in RCA: 82] [Impact Index Per Article: 3.7] [Received: 08/14/2002] [Accepted: 10/16/2002] [Indexed: 11/18/2022]
Abstract
We present a combined computational and experimental method for the rapid optimization of proteins. Using beta-lactamase as a test case, we redesigned the active site region using our Protein Design Automation technology as a computational screen to search the entire sequence space. By eliminating sequences incompatible with the protein fold, Protein Design Automation rapidly reduced the number of sequences to a size amenable to experimental screening, resulting in a library of approximately 200,000 mutants. These were then constructed and experimentally screened to select for variants with improved resistance to the antibiotic cefotaxime. In a single round, we obtained variants exhibiting a 1,280-fold increase in resistance. To our knowledge, all of the mutations were novel, i.e., they have not been identified as beneficial by random mutagenesis or DNA shuffling or seen in any of the naturally occurring TEM beta-lactamases, the most prevalent type of Gram-negative beta-lactamases. This combined approach allows for the rapid improvement of any property that can be screened experimentally and provides a powerful, broadly applicable tool for protein engineering.
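The size of a combinatorial library defined by a computational screen is the product of the number of amino acids allowed at each designed position. This is why restricting even a few positions changes the experimental burden so sharply. The per-position allowed sets below are hypothetical and are not the sets used for beta-lactamase in this study.

```python
from math import prod


def library_size(allowed_per_position):
    """Number of distinct sequences when each position independently takes any allowed residue."""
    return prod(len(set(allowed)) for allowed in allowed_per_position)


# Hypothetical example: full randomization vs. a computationally pruned set at 5 positions.
unrestricted = [list("ACDEFGHIKLMNPQRSTVWY")] * 5
pruned = [["A", "S", "T"], ["L", "I", "V", "M"], ["E", "D"], ["K", "R", "Q"], ["G", "A"]]
print(library_size(unrestricted))  # 3200000 (20**5)
print(library_size(pruned))        # 144 (3 * 4 * 2 * 3 * 2)
```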