1
Charpentier A, Mignon D, Barbe S, Cortes J, Schiex T, Simonson T, Allouche D. Variable Neighborhood Search with Cost Function Networks To Solve Large Computational Protein Design Problems. J Chem Inf Model 2018; 59:127-136. [DOI: 10.1021/acs.jcim.8b00510] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/30/2022]
Affiliation(s)
- David Mignon
  - Laboratoire de Biochimie (CNRS UMR 7654), École Polytechnique, 91128 Palaiseau, France
- Sophie Barbe
  - Laboratoire d’Ingénierie des Systèmes Biologiques et Procédés, LISBP, Université de Toulouse, CNRS, INRA, INSA, 31077 Toulouse, France
- Juan Cortes
  - LAAS-CNRS, Université de Toulouse, CNRS, 31400 Toulouse, France
- Thomas Schiex
  - MIAT, Université de Toulouse, INRA, 31326 Castanet-Tolosan, France
- Thomas Simonson
  - Laboratoire de Biochimie (CNRS UMR 7654), École Polytechnique, 91128 Palaiseau, France
- David Allouche
  - MIAT, Université de Toulouse, INRA, 31326 Castanet-Tolosan, France
2
Villa F, Panel N, Chen X, Simonson T. Adaptive landscape flattening in amino acid sequence space for the computational design of protein:peptide binding. J Chem Phys 2018; 149:072302. [PMID: 30134674 DOI: 10.1063/1.5022249] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 12/25/2022]
Abstract
For the high throughput design of protein:peptide binding, one must explore a vast space of amino acid sequences in search of low binding free energies. This complex problem is usually addressed with either simple heuristic scoring or expensive sequence enumeration schemes. Far more efficient than enumeration is a recent Monte Carlo approach that adaptively flattens the energy landscape in sequence space of the unbound peptide and provides formally exact binding free energy differences. The method allows the binding free energy to be used directly as the design criterion. We propose several improvements that allow still more efficient sampling and can address larger design problems. They include the use of Replica Exchange Monte Carlo and landscape flattening for both the unbound and bound peptides. We used the method to design peptides that bind to the PDZ domain of the Tiam1 signaling protein and could serve as inhibitors of its activity. Four peptide positions were allowed to mutate freely. Almost 75 000 peptide variants were processed in two simulations of 10^9 steps each that used 1 CPU hour on a desktop machine. 96% of the theoretical sequence space was sampled. The relative binding free energies agreed qualitatively with values from experiment. The sampled sequences agreed qualitatively with an experimental library of Tiam1-binding peptides. The main assumption limiting accuracy is the fixed backbone approximation, which could be alleviated in future work by using increased computational resources and multi-backbone designs.
Affiliation(s)
- Francesco Villa
  - Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
- Nicolas Panel
  - Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
- Xingyu Chen
  - Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
- Thomas Simonson
  - Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
3
Gaillard T, Simonson T. Full Protein Sequence Redesign with an MMGBSA Energy Function. J Chem Theory Comput 2017; 13:4932-4943. [DOI: 10.1021/acs.jctc.7b00202] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 11/28/2022]
Affiliation(s)
- Thomas Gaillard
  - Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, 91128 Palaiseau, France
- Thomas Simonson
  - Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, 91128 Palaiseau, France
4
Villa F, Mignon D, Polydorides S, Simonson T. Comparing pairwise-additive and many-body generalized Born models for acid/base calculations and protein design. J Comput Chem 2017; 38:2396-2410. [PMID: 28749575 DOI: 10.1002/jcc.24898] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 03/07/2017] [Revised: 06/30/2017] [Accepted: 07/06/2017] [Indexed: 12/13/2022]
Abstract
Generalized Born (GB) solvent models are common in acid/base calculations and protein design. With GB, the interaction between a pair of solute atoms depends on the shape of the protein/solvent boundary and, therefore, the positions of all solute atoms, so that GB is a many-body potential. For compute-intensive applications, the model is often simplified further, by introducing a mean, native-like protein/solvent boundary, which removes the many-body property. We investigate a method for both acid/base calculations and protein design that uses Monte Carlo simulations in which side chains can explore rotamers, bind/release protons, or mutate. The fluctuating protein/solvent dielectric boundary is treated in a way that is numerically exact (within the GB framework), in contrast to a mean boundary. Its originality is that it captures the many-body character while retaining the residue-pairwise complexity given by a fixed boundary. The method is implemented in the Proteus protein design software. It yields a slight but systematic improvement for acid/base constants in nine proteins and a significant improvement for the computational design of three PDZ domains. It eliminates a source of model uncertainty, which will facilitate the analysis of other model limitations. © 2017 Wiley Periodicals, Inc.
Affiliation(s)
- Francesco Villa
  - Ecole Polytechnique, Laboratoire de Biochimie (CNRS UMR7654), Palaiseau, 91128, France
- David Mignon
  - Ecole Polytechnique, Laboratoire de Biochimie (CNRS UMR7654), Palaiseau, 91128, France
- Savvas Polydorides
  - Ecole Polytechnique, Laboratoire de Biochimie (CNRS UMR7654), Palaiseau, 91128, France
- Thomas Simonson
  - Ecole Polytechnique, Laboratoire de Biochimie (CNRS UMR7654), Palaiseau, 91128, France
5
Mignon D, Panel N, Chen X, Fuentes EJ, Simonson T. Computational Design of the Tiam1 PDZ Domain and Its Ligand Binding. J Chem Theory Comput 2017; 13:2271-2289. [PMID: 28394603 DOI: 10.1021/acs.jctc.6b01255] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 11/29/2022]
Abstract
PDZ domains direct protein-protein interactions and serve as models for protein design. Here, we optimized a protein design energy function for the Tiam1 and Cask PDZ domains that combines a molecular mechanics energy, Generalized Born solvent, and an empirical unfolded state model. Designed sequences were recognized as PDZ domains by the Superfamily fold recognition tool and had similarity scores comparable to natural PDZ sequences. The optimized model was used to redesign the two PDZ domains, by gradually varying the chemical potential of hydrophobic amino acids; the tendency of each position to lose or gain a hydrophobic character represents a novel hydrophobicity index. We also redesigned four positions in the Tiam1 PDZ domain involved in peptide binding specificity. The calculated affinity differences between designed variants reproduced experimental data and suggest substitutions with altered specificities.
Affiliation(s)
- David Mignon
  - Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
- Nicolas Panel
  - Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
- Xingyu Chen
  - Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
- Ernesto J Fuentes
  - Department of Biochemistry, Roy J. & Lucille A. Carver College of Medicine and Holden Comprehensive Cancer Center, University of Iowa, Iowa City, Iowa 52242-1109, United States
- Thomas Simonson
  - Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
6
Druart K, Bigot J, Audit E, Simonson T. A Hybrid Monte Carlo Scheme for Multibackbone Protein Design. J Chem Theory Comput 2016; 12:6035-6048. [DOI: 10.1021/acs.jctc.6b00421] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022]
Affiliation(s)
- Karen Druart
  - Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
  - Maison de la Simulation, CEA, CNRS, Univ. Paris-Sud, UVSQ, Université Paris-Saclay, 91191 Gif-sur-Yvette, France
- Julien Bigot
  - Maison de la Simulation, CEA, CNRS, Univ. Paris-Sud, UVSQ, Université Paris-Saclay, 91191 Gif-sur-Yvette, France
- Edouard Audit
  - Maison de la Simulation, CEA, CNRS, Univ. Paris-Sud, UVSQ, Université Paris-Saclay, 91191 Gif-sur-Yvette, France
- Thomas Simonson
  - Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
7
Mignon D, Simonson T. Comparing three stochastic search algorithms for computational protein design: Monte Carlo, replica exchange Monte Carlo, and a multistart, steepest-descent heuristic. J Comput Chem 2016; 37:1781-93. [PMID: 27197555 DOI: 10.1002/jcc.24393] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 12/03/2015] [Revised: 02/26/2016] [Accepted: 03/27/2016] [Indexed: 01/11/2023]
Abstract
Computational protein design depends on an energy function and an algorithm to search the sequence/conformation space. We compare three stochastic search algorithms: a heuristic, Monte Carlo (MC), and a Replica Exchange Monte Carlo method (REMC). The heuristic performs a steepest-descent minimization starting from thousands of random starting points. The methods are applied to nine test proteins from three structural families, with a fixed backbone structure, a molecular mechanics energy function, and with 1, 5, 10, 20, 30, or all amino acids allowed to mutate. Results are compared to an exact, "Cost Function Network" method that identifies the global minimum energy conformation (GMEC) in favorable cases. The designed sequences accurately reproduce experimental sequences in the hydrophobic core. The heuristic and REMC agree closely and reproduce the GMEC when it is known, with a few exceptions. Plain MC performs well for most cases, occasionally departing from the GMEC by 3-4 kcal/mol. With REMC, the diversity of the sequences sampled agrees with exact enumeration where the latter is possible: up to 2 kcal/mol above the GMEC. Beyond, room temperature replicas sample sequences up to 10 kcal/mol above the GMEC, providing thermal averages and a solution to the inverse protein folding problem. © 2016 Wiley Periodicals, Inc.
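As an illustration of the two Monte Carlo variants compared in this abstract, the sketch below shows the textbook Metropolis acceptance test and the standard replica exchange swap probability. It is a generic example and not the authors' implementation: the function names, the energy units (kcal/mol), and the temperatures used in the usage lines are assumptions made for illustration.

```python
import math
import random

KB = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def metropolis_accept(delta_e, temperature, rng=random):
    """Textbook Metropolis test for a trial move with energy change delta_e."""
    if delta_e <= 0.0:
        return True  # downhill (or neutral) moves are always accepted
    return rng.random() < math.exp(-delta_e / (KB * temperature))


def swap_probability(e_i, t_i, e_j, t_j):
    """Standard probability of exchanging the states of replicas i and j.

    p = min(1, exp[(beta_i - beta_j) * (E_i - E_j)]), with beta = 1 / (KB * T).
    """
    beta_i = 1.0 / (KB * t_i)
    beta_j = 1.0 / (KB * t_j)
    exponent = (beta_i - beta_j) * (e_i - e_j)
    if exponent >= 0.0:
        return 1.0  # also avoids overflow in exp for large exponents
    return math.exp(exponent)


# Hypothetical usage: a cold replica (300 K) and a hot replica (600 K).
p_favorable = swap_probability(5.0, 300.0, 1.0, 600.0)    # cold replica has the higher energy
p_unfavorable = swap_probability(1.0, 300.0, 5.0, 600.0)  # cold replica already lower in energy
```

A swap that hands the lower-energy state to the colder replica is always accepted, which is how high-temperature replicas help the low-temperature ones escape local minima.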
Affiliation(s)
- David Mignon
  - Laboratoire de Biochimie (UMR CNRS 7654), Department of Biology, Ecole Polytechnique, Palaiseau, France
- Thomas Simonson
  - Laboratoire de Biochimie (UMR CNRS 7654), Department of Biology, Ecole Polytechnique, Palaiseau, France
8
Gaillard T, Panel N, Simonson T. Protein side chain conformation predictions with an MMGBSA energy function. Proteins 2016; 84:803-19. [PMID: 26948696 DOI: 10.1002/prot.25030] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 10/02/2015] [Revised: 02/22/2016] [Accepted: 02/27/2016] [Indexed: 12/17/2022]
Abstract
The prediction of protein side chain conformations from backbone coordinates is an important task in structural biology, with applications in structure prediction and protein design. It is a difficult problem due to its combinatorial nature. We study the performance of an "MMGBSA" energy function, implemented in our protein design program Proteus, which combines molecular mechanics terms, a Generalized Born and Surface Area (GBSA) solvent model, with approximations that make the model pairwise additive. Proteus is not a competitor to specialized side chain prediction programs due to its cost, but it allows protein design applications, where side chain prediction is an important step and MMGBSA an effective energy model. We predict the side chain conformations for 18 proteins. The side chains are first predicted individually, with the rest of the protein in its crystallographic conformation. Next, all side chains are predicted together. The contributions of individual energy terms are evaluated and various parameterizations are compared. We find that the GB and SA terms, with an appropriate choice of the dielectric constant and surface energy coefficients, are beneficial for single side chain predictions. For the prediction of all side chains, however, errors due to the pairwise additive approximation overcome the improvement brought by these terms. We also show the crucial contribution of side chain minimization to alleviate the rigid rotamer approximation. Even without GB and SA terms, we obtain accuracies comparable to SCWRL4, a specialized side chain prediction program. In particular, we obtain a better RMSD than SCWRL4 for core residues (at a higher cost), despite our simpler rotamer library. Proteins 2016; 84:803-819. © 2016 Wiley Periodicals, Inc.
Affiliation(s)
- Thomas Gaillard
  - Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, 91128, France
- Nicolas Panel
  - Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, 91128, France
- Thomas Simonson
  - Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, 91128, France
9
Simonson T, Ye-Lehmann S, Palmai Z, Amara N, Wydau-Dematteis S, Bigan E, Druart K, Moch C, Plateau P. Redesigning the stereospecificity of tyrosyl-tRNA synthetase. Proteins 2016; 84:240-53. [PMID: 26676967 DOI: 10.1002/prot.24972] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 07/28/2015] [Revised: 09/30/2015] [Accepted: 11/26/2015] [Indexed: 12/14/2022]
Abstract
D-Amino acids are largely excluded from protein synthesis, yet they are of great interest in biotechnology. Unnatural amino acids have been introduced into proteins using engineered aminoacyl-tRNA synthetases (aaRSs), and this strategy might be applicable to D-amino acids. Several aaRSs can aminoacylate their tRNA with a D-amino acid; of these, tyrosyl-tRNA synthetase (TyrRS) has the weakest stereospecificity. We use computational protein design to suggest active site mutations in Escherichia coli TyrRS that could increase its D-Tyr binding further, relative to L-Tyr. The mutations selected all modify one or more sidechain charges in the Tyr binding pocket. We test their effect by probing the aminoacyl-adenylation reaction through pyrophosphate exchange experiments. We also perform extensive alchemical free energy simulations to obtain L-Tyr/D-Tyr binding free energy differences. Agreement with experiment is good, validating the structural models and detailed thermodynamic predictions the simulations provide. The TyrRS stereospecificity proves hard to engineer through charge-altering mutations in the first and second coordination shells of the Tyr ammonium group. Of six mutants tested, two are active towards D-Tyr; one of these has an inverted stereospecificity, with a large preference for D-Tyr. However, its activity is low. Evidently, the TyrRS stereospecificity is robust towards charge rearrangements near the ligand. Future design may have to consider more distant and/or electrically neutral target mutations, and possibly design for binding of the transition state, whose structure however can only be modeled.
Affiliation(s)
- Thomas Simonson
  - Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, 91128, France
- Zoltan Palmai
  - Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, 91128, France
- Najette Amara
  - Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, 91128, France
- Sandra Wydau-Dematteis
  - Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, 91128, France
- Erwan Bigan
  - Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, 91128, France
- Karen Druart
  - Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, 91128, France
- Clara Moch
  - Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, 91128, France
- Pierre Plateau
  - Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, 91128, France
10
Polydorides S, Michael E, Mignon D, Druart K, Archontis G, Simonson T. Proteus and the Design of Ligand Binding Sites. Methods Mol Biol 2016; 1414:77-97. [PMID: 27094287 DOI: 10.1007/978-1-4939-3569-7_6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/14/2022]
Abstract
This chapter describes the organization and use of Proteus, a multitool computational suite for the optimization of protein and ligand conformations and sequences, and the calculation of pKa shifts and relative binding affinities. The software offers the use of several molecular mechanics force fields and solvent models, including two generalized Born variants, and a large range of scoring functions, which can combine protein stability, ligand affinity, and ligand specificity terms, for positive and negative design. We present in detail the steps for structure preparation, system setup, construction of the interaction energy matrix, protein sequence and structure optimizations, pKa calculations, and ligand titration calculations. We discuss illustrative examples, including the chemical/structural optimization of a complex between the MHC class II protein HLA-DQ8 and the vinculin epitope, and the chemical optimization of the compstatin analog Ac-Val4Trp/His9Ala, which regulates the function of protein C3 of the complement system.
Affiliation(s)
- Savvas Polydorides
  - Theoretical and Computational Biophysics Group, Department of Physics, University of Cyprus, 1678, Nicosia, Cyprus
- Eleni Michael
  - Theoretical and Computational Biophysics Group, Department of Physics, University of Cyprus, 1678, Nicosia, Cyprus
- David Mignon
  - Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, 91128, Palaiseau, France
- Karen Druart
  - Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, 91128, Palaiseau, France
- Georgios Archontis
  - Theoretical and Computational Biophysics Group, Department of Physics, University of Cyprus, 1678, Nicosia, Cyprus
- Thomas Simonson
  - Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, 91128, Palaiseau, France
11
Prediction of Stable Globular Proteins Using Negative Design with Non-native Backbone Ensembles. Structure 2015; 23:2011-21. [DOI: 10.1016/j.str.2015.07.021] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 05/23/2015] [Revised: 07/26/2015] [Accepted: 07/29/2015] [Indexed: 11/21/2022]
12
Druart K, Palmai Z, Omarjee E, Simonson T. Protein:Ligand binding free energies: A stringent test for computational protein design. J Comput Chem 2015; 37:404-15. [PMID: 26503829 DOI: 10.1002/jcc.24230] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 07/01/2015] [Revised: 10/01/2015] [Accepted: 10/02/2015] [Indexed: 01/29/2023]
Abstract
A computational protein design method is extended to allow Monte Carlo simulations where two ligands are titrated into a protein binding pocket, yielding binding free energy differences. These provide a stringent test of the physical model, including the energy surface and sidechain rotamer definition. As a test, we consider tyrosyl-tRNA synthetase (TyrRS), which has been extensively redesigned experimentally. We consider its specificity for its substrate L-tyrosine (L-Tyr), compared to the analogs D-Tyr, p-acetyl-, and p-azido-phenylalanine (ac-Phe, az-Phe). We simulate L- and D-Tyr binding to TyrRS and six mutants, and compare the structures and binding free energies to a more rigorous "MD/GBSA" procedure: molecular dynamics with explicit solvent for structures and a Generalized Born + Surface Area model for binding free energies. Next, we consider L-Tyr, ac- and az-Phe binding to six other TyrRS variants. The titration results are sensitive to the precise rotamer definition, which involves a short energy minimization for each sidechain pair to help relax bad contacts induced by the discrete rotamer set. However, when designed mutant structures are rescored with a standard GBSA energy model, results agree well with the more rigorous MD/GBSA. As a third test, we redesign three amino acid positions in the substrate coordination sphere, with either L-Tyr or D-Tyr as the ligand. For two, we obtain good agreement with experiment, recovering the wildtype residue when L-Tyr is the ligand and a D-Tyr specific mutant when D-Tyr is the ligand. For the third, we recover His with either ligand, instead of wildtype Gln.
Affiliation(s)
- Karen Druart
  - Laboratoire de Biochimie (UMR CNRS 7654), Department of Biology, Ecole Polytechnique, Palaiseau, France
- Zoltan Palmai
  - Laboratoire de Biochimie (UMR CNRS 7654), Department of Biology, Ecole Polytechnique, Palaiseau, France
- Eyaz Omarjee
  - Laboratoire de Biochimie (UMR CNRS 7654), Department of Biology, Ecole Polytechnique, Palaiseau, France
- Thomas Simonson
  - Laboratoire de Biochimie (UMR CNRS 7654), Department of Biology, Ecole Polytechnique, Palaiseau, France
13
Gaillard T, Simonson T. Pairwise decomposition of an MMGBSA energy function for computational protein design. J Comput Chem 2014; 35:1371-87. [PMID: 24854675 DOI: 10.1002/jcc.23637] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Received: 01/16/2014] [Revised: 04/14/2014] [Accepted: 05/01/2014] [Indexed: 02/02/2023]
Abstract
Computational protein design (CPD) aims at predicting new proteins or modifying existing ones. The computational challenge is huge as it requires exploring an enormous sequence and conformation space. The difficulty can be reduced by considering a fixed backbone and a discrete set of sidechain conformations. Another common strategy consists in precalculating a pairwise energy matrix, from which the energy of any sequence/conformation can be quickly obtained. In this work, we examine the pairwise decomposition of protein MMGBSA energy functions from a general theoretical perspective, and an implementation proposed earlier for CPD. It includes a Generalized Born term, whose many-body character is overcome using an effective dielectric environment, and a Surface Area term, for which we present an improved pairwise decomposition. A detailed evaluation of the error introduced by the decomposition on the different energy components is performed. We show that the error remains reasonable, compared to other uncertainties.
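The precalculated pairwise energy matrix mentioned in this abstract can be pictured with a small sketch: once single-position and pair terms are tabulated, the energy of any sequence/conformation is a sum of table lookups. The data layout, the function name, and the toy numbers below are assumptions for illustration only; they are not taken from the paper or from any particular design program.

```python
from itertools import combinations


def total_energy(assignment, self_energy, pair_energy):
    """Sum tabulated single-position and pair terms for one assignment.

    assignment[i]            -- index of the rotamer (or residue type) chosen at position i
    self_energy[i][r]        -- energy of choice r at position i (including backbone terms)
    pair_energy[(i, j)][r][s] -- interaction energy between choice r at i and choice s at j, i < j
    """
    energy = sum(self_energy[i][r] for i, r in enumerate(assignment))
    for i, j in combinations(range(len(assignment)), 2):
        energy += pair_energy[(i, j)][assignment[i]][assignment[j]]
    return energy


# Toy system (made-up numbers): two positions, two choices each.
self_energy = [[0.0, 1.0], [0.5, -0.5]]
pair_energy = {(0, 1): [[0.2, -1.0], [0.0, 0.3]]}

e = total_energy([0, 1], self_energy, pair_energy)  # 0.0 + (-0.5) + (-1.0)
```

Because every term is a lookup, changing the choice at one position only requires updating the terms that involve that position, which is what makes stochastic searches over such a matrix fast.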
Affiliation(s)
- Thomas Gaillard
  - Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, 91128, Palaiseau, France
14
Polydorides S, Simonson T. Monte Carlo simulations of proteins at constant pH with generalized Born solvent, flexible sidechains, and an effective dielectric boundary. J Comput Chem 2013; 34:2742-56. [PMID: 24122878 DOI: 10.1002/jcc.23450] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Received: 05/14/2013] [Revised: 09/04/2013] [Accepted: 09/08/2013] [Indexed: 12/11/2022]
Abstract
Titratable residues determine the acid/base behavior of proteins, strongly influencing their function; in addition, proton binding is a valuable reporter on electrostatic interactions. We describe a method for pKa calculations, using constant-pH Monte Carlo (MC) simulations to explore the space of sidechain conformations and protonation states, with an efficient and accurate generalized Born model (GB) for the solvent effects. To overcome the many-body dependency of the GB model, we use a "Native Environment" approximation, whose accuracy is shown to be good. It allows the precalculation and storage of interactions between all sidechain pairs, a strategy borrowed from computational protein design, which makes the MC simulations themselves very fast. The method is tested for 12 proteins and 167 titratable sidechains. It gives an rms error of 1.1 pH units, similar to the trivial "Null" model. The only adjustable parameter is the protein dielectric constant. The best accuracy is achieved for values between 4 and 8, a range that is physically plausible for a protein interior. For sidechains with large pKa shifts, ≥2, the rms error is 1.6, compared to 2.5 with the Null model and 1.5 with the empirical PROPKA method.
Affiliation(s)
- Savvas Polydorides
  - Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, 91128, Palaiseau, France
15
Simonson T, Gaillard T, Mignon D, Schmidt am Busch M, Lopes A, Amara N, Polydorides S, Sedano A, Druart K, Archontis G. Computational protein design: the Proteus software and selected applications. J Comput Chem 2013; 34:2472-84. [PMID: 24037756 DOI: 10.1002/jcc.23418] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.1] [Received: 05/13/2013] [Revised: 07/08/2013] [Accepted: 07/28/2013] [Indexed: 12/13/2022]
Abstract
We describe an automated procedure for protein design, implemented in a flexible software package, called Proteus. System setup and calculation of an energy matrix are done with the XPLOR modeling program and its sophisticated command language, supporting several force fields and solvent models. A second program provides algorithms to search sequence space. It allows a decomposition of the system into groups, which can be combined in different ways in the energy function, for both positive and negative design. The whole procedure can be controlled by editing 2-4 scripts. Two applications consider the tyrosyl-tRNA synthetase enzyme and its successful redesign to bind both O-methyl-tyrosine and D-tyrosine. For the latter, we present Monte Carlo simulations where the D-tyrosine concentration is gradually increased, displacing L-tyrosine from the binding pocket and yielding the binding free energy difference, in good agreement with experiment. Complete redesign of the Crk SH3 domain is presented. The top 10000 sequences are all assigned to the correct fold by the SUPERFAMILY library of Hidden Markov Models. Finally, we report the acid/base behavior of the SNase protein. Sidechain protonation is treated as a form of mutation; it is then straightforward to perform constant-pH Monte Carlo simulations, which yield good agreement with experiment. Overall, the software can be used for a wide range of applications, producing not only native-like sequences but also thermodynamic properties with errors that appear comparable to other current software packages.
Affiliation(s)
- Thomas Simonson
  - Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, Palaiseau, 91128, France
16
Mach P, Koehl P. Capturing protein sequence-structure specificity using computational sequence design. Proteins 2013; 81:1556-70. [DOI: 10.1002/prot.24307] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 12/04/2012] [Revised: 03/28/2013] [Accepted: 04/11/2013] [Indexed: 02/05/2023]
Affiliation(s)
- Paul Mach
  - Department of Applied Mathematics; Genome Center; University of California; Davis 95616 California
- Patrice Koehl
  - Department of Computer Science; Genome Center; University of California; Davis 95616 California
17
Polydorides S, Amara N, Aubard C, Plateau P, Simonson T, Archontis G. Computational protein design with a generalized Born solvent model: application to Asparaginyl-tRNA synthetase. Proteins 2011; 79:3448-68. [PMID: 21563215 DOI: 10.1002/prot.23042] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Received: 11/29/2010] [Revised: 02/25/2011] [Accepted: 03/03/2011] [Indexed: 12/13/2022]
Abstract
Computational Protein Design (CPD) is a promising method for high throughput protein and ligand mutagenesis. Recently, we developed a CPD method that used a polar-hydrogen energy function for protein interactions and a Coulomb/Accessible Surface Area (CASA) model for solvent effects. We applied this method to engineer aspartyl-adenylate (AspAMP) specificity into Asparaginyl-tRNA synthetase (AsnRS), whose substrate is asparaginyl-adenylate (AsnAMP). Here, we implement a more accurate function, with an all-atom energy for protein interactions and a residue-pairwise generalized Born model for solvent effects. As a first test, we compute aminoacid affinities for several point mutants of Aspartyl-tRNA synthetase (AspRS) and Tyrosyl-tRNA synthetase and stability changes for three helical peptides and compare with experiment. As a second test, we readdress the problem of AsnRS aminoacid engineering. We compare three design criteria, which optimize the folding free-energy, the absolute AspAMP affinity, and the relative (AspAMP-AsnAMP) affinity. The sequences and conformations are improved with respect to our previous, polar-hydrogen/CASA study: For several designed complexes, the AspAMP carboxylate forms three interactions with a conserved arginine and a designed lysine, as in the active site of the AspRS:AspAMP complex. The conformations and interactions are well maintained in molecular dynamics simulations and the sequences have an inverted specificity, favoring AspAMP over AsnAMP. The method is not fully successful, since experimental measurements with the seven most promising sequences show that they do not catalyze at a detectable level the adenylation of Asp (or Asn) with ATP. This may be due to weak AspAMP binding and/or disruption of transition-state stabilization.
18
Aleksandrov A, Polydorides S, Archontis G, Simonson T. Predicting the Acid/Base Behavior of Proteins: A Constant-pH Monte Carlo Approach with Generalized Born Solvent. J Phys Chem B 2010; 114:10634-48. [DOI: 10.1021/jp104406x] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.9] [Indexed: 11/28/2022]
Affiliation(s)
- Alexey Aleksandrov
- Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, 91128 Palaiseau, France, and Department of Physics, University of Cyprus, PO20537, CY1678, Nicosia, Cyprus
- Savvas Polydorides
- Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, 91128 Palaiseau, France, and Department of Physics, University of Cyprus, PO20537, CY1678, Nicosia, Cyprus
- Georgios Archontis
- Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, 91128 Palaiseau, France, and Department of Physics, University of Cyprus, PO20537, CY1678, Nicosia, Cyprus
- Thomas Simonson
- Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, 91128 Palaiseau, France, and Department of Physics, University of Cyprus, PO20537, CY1678, Nicosia, Cyprus
|
19
|
Lopes A, Schmidt Am Busch M, Simonson T. Computational design of protein-ligand binding: modifying the specificity of asparaginyl-tRNA synthetase. J Comput Chem 2010; 31:1273-86. [PMID: 19862811 DOI: 10.1002/jcc.21414] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 12/13/2022]
Abstract
A method for computational design of protein-ligand interactions is implemented and tested on the asparaginyl- and aspartyl-tRNA synthetase enzymes (AsnRS, AspRS). The substrate specificity of these enzymes is crucial for the accurate translation of the genetic code. The method relies on a molecular mechanics energy function and a simple, continuum electrostatic, implicit solvent model. As test calculations, we first compute AspRS-substrate binding free energy changes due to nine point mutations, for which experimental data are available; we also perform large-scale redesign of the entire active site of each enzyme (40 amino acids) and compare to experimental sequences. We then apply the method to engineer an increased binding of aspartyl-adenylate (AspAMP) into AsnRS. Mutants are obtained using several directed evolution protocols, where four or five amino acid positions in the active site are randomized. Promising mutants are subjected to molecular dynamics simulations; Poisson-Boltzmann calculations provide an estimate of the corresponding, AspAMP, binding free energy changes, relative to the native AsnRS. Several of the mutants are predicted to have an inverted binding specificity, preferring to bind AspAMP rather than the natural substrate, AsnAMP. The computed binding affinities are significantly weaker than the native, AsnRS:AsnAMP affinity, and in most cases, the active site structure is significantly changed, compared to the native complex. This almost certainly precludes catalytic activity. One of the designed sequences has a higher affinity and more native-like structure and may represent a valid candidate for Asp activity.
Affiliation(s)
- Anne Lopes
- Laboratoire de Biochimie, Department of Biology, UMR CNRS 7654, Ecole Polytechnique, 91128 Palaiseau, France
|
20
|
Schmidt am Busch M, Sedano A, Simonson T. Computational protein design: validation and possible relevance as a tool for homology searching and fold recognition. PLoS One 2010; 5:e10410. [PMID: 20463972 PMCID: PMC2864755 DOI: 10.1371/journal.pone.0010410] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 10/15/2009] [Accepted: 03/31/2010] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND: Protein fold recognition usually relies on a statistical model of each fold; each model is constructed from an ensemble of natural sequences belonging to that fold. A complementary strategy may be to employ sequence ensembles produced by computational protein design. Designed sequences can be more diverse than natural sequences, possibly avoiding some limitations of experimental databases. METHODOLOGY/PRINCIPAL FINDINGS: We explore this strategy for four SCOP families: small Kunitz-type inhibitors (SKIs), interleukin-8 chemokines, PDZ domains, and large caspase catalytic subunits, represented by 43 structures. An automated procedure is used to redesign the 43 proteins. We use the experimental backbones as fixed templates in the folded state and a molecular mechanics model to compute the interaction energies between sidechain and backbone groups. Calculations are done with the Proteins@Home volunteer computing platform. A heuristic algorithm is used to scan the sequence and conformational space, yielding 200,000-300,000 sequences per backbone template. The results confirm and generalize our earlier study of SH2 and SH3 domains. The designed sequences resemble moderately distant, natural homologues of the initial templates; e.g., the SUPERFAMILY profile Hidden-Markov Model library recognizes 85% of the low-energy sequences as native-like. Conversely, Position Specific Scoring Matrices derived from the sequences can be used to detect natural homologues within the SwissProt database: 60% of known PDZ domains are detected and around 90% of known SKIs and chemokines. Energy components and inter-residue correlations are analyzed and ways to improve the method are discussed. CONCLUSIONS/SIGNIFICANCE: For some families, designed sequences can be a useful complement to experimental ones for homologue searching. However, improved tools are needed to extract more information from the designed profiles before the method can be of general use.
Affiliation(s)
- Marcel Schmidt am Busch
- Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, Palaiseau, France
- Audrey Sedano
- Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, Palaiseau, France
- Thomas Simonson
- Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, Palaiseau, France
|
21
|
am Busch MS, Mignon D, Simonson T. Computational protein design as a tool for fold recognition. Proteins 2009; 77:139-58. [PMID: 19408297 DOI: 10.1002/prot.22426] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.6] [Indexed: 12/13/2022]
Abstract
Computationally designed protein sequences have been proposed as a basis to perform fold recognition and homology searching. To investigate this possibility, an automated procedure is used to completely redesign 24 SH3 proteins and 22 SH2 proteins. We use the experimental backbone coordinates as fixed templates in the folded state and a molecular mechanics model to compute the pairwise interaction energies between all sidechain types and conformations. Energy calculations are done with the Proteins@Home volunteer computing platform. A heuristic algorithm is then used to scan the sequence and conformational space for optimal solutions. We produced 200,000-450,000 sequences for each backbone template. The designed sequences resemble moderately distant, natural homologues of the initial templates, according to their identity scores and their similarity with respect to the Pfam sets of SH2 and SH3 domains. Standard homology detection tools document their native-like character: the Conserved Domain Database recognizes 61% (52%) of our low-energy sequences as SH3 (SH2) domains; the SUPERFAMILY Hidden-Markov Model library recognizes 81% (84%). Conversely, position specific scoring matrices (PSSMs) derived from our designed sequences can be used to detect natural homologues in sequence databases. Within SwissProt, a set of natural SH3 PSSMs detects 772 SH3 domains, for example; our designed PSSMs detect 67% of these, plus one additional sequence and two false positives. If six amino acids involved in substrate binding (a selective pressure not accounted for in our design) are reset to their experimental types, then 77% of the experimental SH3 domains are detected. Results for the SH2 domains are similar. Several directions to improve the method further are discussed.
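As an illustration of the PSSM idea mentioned in this abstract, the following is a minimal, generic sketch of how a log-odds position specific scoring matrix could be built from a set of equal-length sequences and used to score a candidate. It is not the authors' procedure: the uniform background frequencies, the pseudocount value, the function names `build_pssm` and `score`, and the four-residue toy sequences are all assumptions made for the example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(sequences, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per alignment column.

    Assumptions (not from the cited paper): sequences are pre-aligned and of
    equal length, the background distribution is uniform, and a simple
    additive pseudocount is used.
    """
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(sequences) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in sequences)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts.get(aa, 0) + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def score(pssm, sequence):
    """Sum the per-position log-odds scores of a sequence against the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, sequence))


# Hypothetical toy "designed" sequences, for illustration only.
designed = ["ACDK", "ACEK", "AMDK", "ACDR"]
pssm = build_pssm(designed)
native_like = score(pssm, "ACDK")
unrelated = score(pssm, "WWWW")
```

In this toy setting a sequence that resembles the designed set scores higher than an unrelated one; real homology searches use tools with proper background frequencies, sequence weighting, and gapped alignment.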
Affiliation(s)
- Marcel Schmidt am Busch
- Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, 91128 Palaiseau, France
|
22
|
Drevelle A, Urvoas A, Hamida-Rebaï MB, Van Vooren G, Nicaise M, Valerio-Lepiniec M, Desmadril M, Robert CH, Minard P. Disulfide Bond Substitution by Directed Evolution in an Engineered Binding Protein. Chembiochem 2009; 10:1349-59. [DOI: 10.1002/cbic.200800745] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 12/20/2022]
|
23
|
am Busch MS, Lopes A, Amara N, Bathelt C, Simonson T. Testing the Coulomb/Accessible Surface Area solvent model for protein stability, ligand binding, and protein design. BMC Bioinformatics 2008; 9:148. [PMID: 18366628 PMCID: PMC2292695 DOI: 10.1186/1471-2105-9-148] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.4] [Received: 10/09/2007] [Accepted: 03/13/2008] [Indexed: 11/10/2022] Open
Abstract
Background: Protein structure prediction and computational protein design require efficient yet sufficiently accurate descriptions of aqueous solvent. We continue to evaluate the performance of the Coulomb/Accessible Surface Area (CASA) implicit solvent model, in combination with the Charmm19 molecular mechanics force field. We test a set of model parameters optimized earlier, and we also carry out a new optimization in this work, using as a target a set of experimental stability changes for single point mutations of various proteins and peptides. The optimization procedure is general, and could be used with other force fields. The computation of stability changes requires a model for the unfolded state of the protein. In our approach, this state is represented by tripeptide structures of the sequence Ala-X-Ala for each amino acid type X. We followed an iterative optimization scheme which, at each cycle, optimizes the solvation parameters and a set of tripeptide structures for the unfolded state. This protocol uses a set of 140 experimental stability mutations and a large set of tripeptide conformations to find the best tripeptide structures and solvation parameters. Results: Using the optimized parameters, we obtain a mean unsigned error of 2.28 kcal/mol for the stability mutations. The performance of the CASA model is assessed by two further applications: (i) calculation of protein-ligand binding affinities and (ii) computational protein design. For these two applications, the previous parameters and the ones optimized here give a similar performance. For ligand binding, we obtain reasonable agreement with a set of 55 experimental mutation data, with a mean unsigned error of 1.76 kcal/mol with the new parameters and 1.47 kcal/mol with the earlier ones. We show that the optimized CASA model is not inferior to the Generalized Born/Surface Area (GB/SA) model for the prediction of these binding affinities. Likewise, the new parameters perform well for the design of 8 SH3 domain proteins where an average of 32.8% sequence identity relative to the native sequences was achieved. Further, it was shown that the computed sequences have the character of naturally occurring homologues of the native sequences. Conclusion: Overall, the two CASA variants explored here perform very well for a wide variety of applications. Both variants provide an efficient solvent treatment for the computational engineering of ligands and proteins.
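The two summary statistics quoted in this abstract, mean unsigned error and percent sequence identity, can be sketched as follows. This is an illustrative sketch only: the function names and the numbers are hypothetical and are not data from the paper, and the identity calculation assumes two pre-aligned sequences of equal length.

```python
def mean_unsigned_error(computed, experimental):
    """Average absolute deviation between computed and experimental values."""
    if len(computed) != len(experimental) or not computed:
        raise ValueError("inputs must be non-empty and of equal length")
    deviations = (abs(c - e) for c, e in zip(computed, experimental))
    return sum(deviations) / len(computed)


def percent_identity(designed, native):
    """Percentage of positions with the same residue in two aligned sequences."""
    if len(designed) != len(native) or not designed:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for d, n in zip(designed, native) if d == n)
    return 100.0 * matches / len(designed)


# Hypothetical stability changes in kcal/mol, for illustration only.
computed = [1.2, -0.5, 3.0, 0.8]
experimental = [0.9, 0.4, 1.5, 1.0]
mue = mean_unsigned_error(computed, experimental)

# Hypothetical aligned sequences: three of four positions match.
identity = percent_identity("ACDK", "ACEK")
```

Because the error is unsigned, overestimates and underestimates do not cancel, which is why it is a stricter measure than the mean signed error.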
Affiliation(s)
- Marcel Schmidt am Busch
- Laboratoire de Biochimie (UMR CNRS 7654), Department of Biology, Ecole Polytechnique, 91128, Palaiseau, France.
|