1
|
Liu Y, Liu H. Protein sequence design on given backbones with deep learning. Protein Eng Des Sel 2024; 37:gzad024. [PMID: 38157313 DOI: 10.1093/protein/gzad024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2023] [Revised: 12/08/2023] [Accepted: 12/18/2023] [Indexed: 01/03/2024] Open
Abstract
Deep learning methods for protein sequence design focus on modeling and sampling the many- dimensional distribution of amino acid sequences conditioned on the backbone structure. To produce physically foldable sequences, inter-residue couplings need to be considered properly. These couplings are treated explicitly in iterative methods or autoregressive methods. Non-autoregressive models treating these couplings implicitly are computationally more efficient, but still await tests by wet experiment. Currently, sequence design methods are evaluated mainly using native sequence recovery rate and native sequence perplexity. These metrics can be complemented by sequence-structure compatibility metrics obtained from energy calculation or structure prediction. However, existing computational metrics have important limitations that may render the generalization of computational test results to performance in real applications unwarranted. Validation of design methods by wet experiments should be encouraged.
Collapse
Affiliation(s)
- Yufeng Liu
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
| | - Haiyan Liu
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui 230027, China
- Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, Anhui 230027, China
- School of Biomedical Engineering, Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, Jiangsu 215004, China
| |
Collapse
|
2
|
Yan J, Li S, Zhang Y, Hao A, Zhao Q. ZetaDesign: an end-to-end deep learning method for protein sequence design and side-chain packing. Brief Bioinform 2023; 24:bbad257. [PMID: 37429578 DOI: 10.1093/bib/bbad257] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2023] [Revised: 06/05/2023] [Accepted: 06/21/2023] [Indexed: 07/12/2023] Open
Abstract
Computational protein design has been demonstrated to be the most powerful tool in the last few years among protein designing and repacking tasks. In practice, these two tasks are strongly related but often treated separately. Besides, state-of-the-art deep-learning-based methods cannot provide interpretability from an energy perspective, affecting the accuracy of the design. Here we propose a new systematic approach, including both a posterior probability and a joint probability parts, to solve the two essential questions once for all. This approach takes the physicochemical property of amino acids into consideration and uses the joint probability model to ensure the convergence between structure and amino acid type. Our results demonstrated that this method could generate feasible, high-confidence sequences with low-energy side conformations. The designed sequences can fold into target structures with high confidence and maintain relatively stable biochemical properties. The side chain conformation has a significantly lower energy landscape without delegating to a rotamer library or performing the expensive conformational searches. Overall, we propose an end-to-end method that combines the advantages of both deep learning and energy-based methods. The design results of this model demonstrate high efficiency, and precision, as well as a low energy state and good interpretability.
Collapse
Affiliation(s)
- Junyu Yan
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
| | - Shuai Li
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
| | - Ying Zhang
- The Key Laboratory of Cell Proliferation and Regulation Biology, Ministry of Education, College of Life Sciences, Beijing Normal University, Beijing, China
| | - Aimin Hao
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
| | - Qinping Zhao
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
| |
Collapse
|
3
|
Liu H, Chen Q. Computational protein design with data‐driven approaches: Recent developments and perspectives. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2022. [DOI: 10.1002/wcms.1646] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Affiliation(s)
- Haiyan Liu
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine University of Science and Technology of China Hefei Anhui China
- Biomedical Sciences and Health Laboratory of Anhui Province University of Science and Technology of China Hefei Anhui China
- School of Data Science University of Science and Technology of China Hefei Anhui China
| | - Quan Chen
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine University of Science and Technology of China Hefei Anhui China
- Biomedical Sciences and Health Laboratory of Anhui Province University of Science and Technology of China Hefei Anhui China
| |
Collapse
|
4
|
Liu Y, Zhang L, Wang W, Zhu M, Wang C, Li F, Zhang J, Li H, Chen Q, Liu H. Rotamer-free protein sequence design based on deep learning and self-consistency. NATURE COMPUTATIONAL SCIENCE 2022; 2:451-462. [PMID: 38177863 DOI: 10.1038/s43588-022-00273-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/27/2021] [Accepted: 06/07/2022] [Indexed: 01/06/2024]
Abstract
Several previously proposed deep learning methods to design amino acid sequences that autonomously fold into a given protein backbone yielded promising results in computational tests but did not outperform conventional energy function-based methods in wet experiments. Here we present the ABACUS-R method, which uses an encoder-decoder network trained using a multitask learning strategy to predict the sidechain type of a central residue from its three-dimensional local environment, which includes, besides other features, the types but not the conformations of the surrounding sidechains. This eliminates the need to reconstruct and optimize sidechain structures, and drastically simplifies the sequence design process. Thus iteratively applying the encoder-decoder to different central residues is able to produce self-consistent overall sequences for a target backbone. Results of wet experiments, including five structures solved by X-ray crystallography, show that ABACUS-R outperforms state-of-the-art energy function-based methods in success rate and design precision.
Collapse
Affiliation(s)
- Yufeng Liu
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, China
| | - Lu Zhang
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, China
| | - Weilun Wang
- CAS Key Laboratory of GIPAS, School of Information Science and Technology, Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, Anhui, China
| | - Min Zhu
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, China
| | - Chenchen Wang
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, China
| | - Fudong Li
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, China
- Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, Anhui, China
| | - Jiahai Zhang
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, China
- Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, Anhui, China
| | - Houqiang Li
- CAS Key Laboratory of GIPAS, School of Information Science and Technology, Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, Anhui, China.
| | - Quan Chen
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, China.
- Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, Anhui, China.
| | - Haiyan Liu
- MOE Key Laboratory for Membraneless Organelles and Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, China.
- Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, Hefei, Anhui, China.
- School of Data Science, University of Science and Technology of China, Hefei, Anhui, China.
| |
Collapse
|
5
|
Talluri S. Algorithms for protein design. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2022; 130:1-38. [PMID: 35534105 DOI: 10.1016/bs.apcsb.2022.01.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Computational Protein Design has the potential to contribute to major advances in enzyme technology, vaccine design, receptor-ligand engineering, biomaterials, nanosensors, and synthetic biology. Although Protein Design is a challenging problem, proteins can be designed by experts in Protein Design, as well as by non-experts whose primary interests are in the applications of Protein Design. The increased accessibility of Protein Design technology is attributable to the accumulated knowledge and experience with Protein Design as well as to the availability of software and online resources. The objective of this review is to serve as a guide to the relevant literature with a focus on the novel methods and algorithms that have been developed or applied for Protein Design, and to assist in the selection of algorithms for Protein Design. Novel algorithms and models that have been introduced to utilize the enormous amount of experimental data and novel computational hardware have the potential for producing substantial increases in the accuracy, reliability and range of applications of designed proteins.
Collapse
Affiliation(s)
- Sekhar Talluri
- Department of Biotechnology, GITAM, Visakhapatnam, India.
| |
Collapse
|
6
|
Delaunay M, Ha-Duong T. Computational Tools and Strategies to Develop Peptide-Based Inhibitors of Protein-Protein Interactions. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2022; 2405:205-230. [PMID: 35298816 DOI: 10.1007/978-1-0716-1855-4_11] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
Protein-protein interactions play crucial and subtle roles in many biological processes and modifications of their fine mechanisms generally result in severe diseases. Peptide derivatives are very promising therapeutic agents for modulating protein-protein associations with sizes and specificities between those of small compounds and antibodies. For the same reasons, rational design of peptide-based inhibitors naturally borrows and combines computational methods from both protein-ligand and protein-protein research fields. In this chapter, we aim to provide an overview of computational tools and approaches used for identifying and optimizing peptides that target protein-protein interfaces with high affinity and specificity. We hope that this review will help to implement appropriate in silico strategies for peptide-based drug design that builds on available information for the systems of interest.
Collapse
Affiliation(s)
| | - Tâp Ha-Duong
- Université Paris-Saclay, CNRS, BioCIS, Châtenay-Malabry, France.
| |
Collapse
|
7
|
Michael E, Polydorides S, Archontis G. Computational Design of Peptides with Improved Recognition of the Focal Adhesion Kinase FAT Domain. Methods Mol Biol 2022; 2405:383-402. [PMID: 35298823 DOI: 10.1007/978-1-0716-1855-4_18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
We describe a two-stage computational protein design (CPD) methodology for the design of peptides binding to the FAT domain of the protein focal adhesion kinase. The first stage involves high-throughput CPD calculations with the Proteus software. The energies of the folded state are described by a physics-based energy function and of the unfolded peptides by a knowledge-based model that reproduces aminoacid compositions consistent with a helicity scale. The obtained sequences are filtered in terms of the affinity and the stability of the complex. In the second stage, design sequences are further evaluated by all-atom molecular dynamics simulations and binding free energy calculations with a molecular mechanics/implicit solvent free energy function.
Collapse
Affiliation(s)
- Eleni Michael
- Department of Physics, University of Cyprus, Nicosia, Cyprus
| | | | | |
Collapse
|
8
|
Opuu V, Mignon D, Simonson T. Knowledge-Based Unfolded State Model for Protein Design. Methods Mol Biol 2022; 2405:403-424. [PMID: 35298824 DOI: 10.1007/978-1-0716-1855-4_19] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/14/2023]
Abstract
The design of proteins and miniproteins is an important challenge. Designed variants should be stable, meaning the folded/unfolded free energy difference should be large enough. Thus, the unfolded state plays a central role. An extended peptide model is often used, where side chains interact with solvent and nearby backbone, but not each other. The unfolded energy is then a function of sequence composition only and can be empirically parametrized. If the space of sequences is explored with a Monte Carlo procedure, protein variants will be sampled according to a well-defined Boltzmann probability distribution. We can then choose unfolded model parameters to maximize the probability of sampling native-like sequences. This leads to a well-defined maximum likelihood framework. We present an iterative algorithm that follows the likelihood gradient. The method is presented in the context of our Proteus software, as a detailed downloadable tutorial. The unfolded model is combined with a folded model that uses molecular mechanics and a Generalized Born solvent. It was optimized for three PDZ domains and then used to redesign them. The sequences sampled are native-like and similar to a recent PDZ design study that was experimentally validated.
Collapse
Affiliation(s)
- Vaitea Opuu
- Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
| | - David Mignon
- Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
| | - Thomas Simonson
- Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France.
| |
Collapse
|
9
|
Polydorides S, Archontis G. Computational optimization of the SARS-CoV-2 receptor-binding-motif affinity for human ACE2. Biophys J 2021; 120:2859-2871. [PMID: 33984310 PMCID: PMC8110322 DOI: 10.1016/j.bpj.2021.02.049] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2020] [Revised: 01/19/2021] [Accepted: 02/15/2021] [Indexed: 01/15/2023] Open
Abstract
The coronavirus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which is responsible for the coronavirus disease 2019 pandemic, and the closely related SARS-CoV coronavirus enter cells by binding at the human angiotensin converting enzyme 2 (hACE2). The stronger hACE2 affinity of SARS-CoV-2 has been connected with its higher infectivity. In this work, we study hACE2 complexes with the receptor-binding domains (RBDs) of the human SARS-CoV-2 and human SARS-CoV viruses, using all-atom molecular dynamics simulations and computational protein design with a physics-based energy function. The molecular dynamics simulations identify charge-modifying substitutions between the CoV-2 and CoV RBDs, which either increase or decrease the hACE2 affinity of the SARS-CoV-2 RBD. The combined effect of these mutations is small, and the relative affinity is mainly determined by substitutions at residues in contact with hACE2. Many of these findings are in line and interpret recent experiments. Our computational protein design calculations redesign positions 455, 493, 494, and 501 of the SARS-CoV-2 receptor binding motif, which contact hACE2 in the complex and are important for ACE2 recognition. Sampling is enhanced by an adaptive importance sampling Monte Carlo method. Sequences with increased affinity replace CoV-2 glutamine by a negative residue at position 493; serine by a nonpolar or aromatic residue or an asparagine at position 494; and asparagine by valine or threonine at position 501. Substitutions at positions 455 and 501 have a smaller effect on affinity. Substitutions suggested by our design are seen in viral sequences encountered in other species, including bat and pangolin. Our results might be used to identify potential virus strains with higher human infectivity and assist in the design of peptide-based or peptidomimetic compounds with the potential to inhibit SARS-CoV-2 binding at hACE2.
Collapse
|
10
|
Michael E, Polydorides S, Simonson T, Archontis G. Hybrid MC/MD for protein design. J Chem Phys 2021; 153:054113. [PMID: 32770896 DOI: 10.1063/5.0013320] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/22/2023] Open
Abstract
Computational protein design relies on simulations of a protein structure, where selected amino acids can mutate randomly, and mutations are selected to enhance a target property, such as stability. Often, the protein backbone is held fixed and its degrees of freedom are modeled implicitly to reduce the complexity of the conformational space. We present a hybrid method where short molecular dynamics (MD) segments are used to explore conformations and alternate with Monte Carlo (MC) moves that apply mutations to side chains. The backbone is fully flexible during MD. As a test, we computed side chain acid/base constants or pKa's in five proteins. This problem can be considered a special case of protein design, with protonation/deprotonation playing the role of mutations. The solvent was modeled as a dielectric continuum. Due to cost, in each protein we allowed just one side chain position to change its protonation state and the other position to change its type or mutate. The pKa's were computed with a standard method that scans a range of pH values and with a new method that uses adaptive landscape flattening (ALF) to sample all protonation states in a single simulation. The hybrid method gave notably better accuracy than standard, fixed-backbone MC. ALF decreased the computational cost a factor of 13.
Collapse
Affiliation(s)
- Eleni Michael
- Department of Physics, University of Cyprus, P.O 20537, CY678 Nicosia, Cyprus
| | - Savvas Polydorides
- Department of Physics, University of Cyprus, P.O 20537, CY678 Nicosia, Cyprus
| | - Thomas Simonson
- Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
| | - Georgios Archontis
- Department of Physics, University of Cyprus, P.O 20537, CY678 Nicosia, Cyprus
| |
Collapse
|
11
|
Abstract
This chapter describes two computational methods for PDZ-peptide binding: high-throughput computational protein design (CPD) and a medium-throughput approach combining molecular dynamics for conformational sampling with a Poisson-Boltzmann (PB) Linear Interaction Energy for scoring. A new CPD method is outlined, which uses adaptive Monte Carlo simulations to efficiently sample peptide variants that tightly bind a PDZ domain, and provides at the same time precise estimates of their relative binding free energies. A detailed protocol is described based on the Proteus CPD software. The medium-throughput approach can be performed with standard MD and PB software, such as NAMD and Charmm. For 40 complexes between Tiam1 and peptide ligands, it gave high a2ccuracy, with mean errors of around 0.5 kcal/mol for relative binding free energies and no large errors. It requires a moderate amount of parameter fitting before it can be applied, and its transferability to other protein families is still untested.
Collapse
Affiliation(s)
- Nicolas Panel
- Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
| | - Francesco Villa
- Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
| | - Vaitea Opuu
- Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
| | - David Mignon
- Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
| | - Thomas Simonson
- Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France.
| |
Collapse
|
12
|
Mignon D, Druart K, Michael E, Opuu V, Polydorides S, Villa F, Gaillard T, Panel N, Archontis G, Simonson T. Physics-Based Computational Protein Design: An Update. J Phys Chem A 2020; 124:10637-10648. [DOI: 10.1021/acs.jpca.0c07605] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]
Affiliation(s)
- David Mignon
- Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, 91128 Palaiseau, France
| | - Karen Druart
- Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, 91128 Palaiseau, France
| | - Eleni Michael
- Department of Physics, University of Cyprus, PO20537, CY1678 Nicosia, Cyprus
| | - Vaitea Opuu
- Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, 91128 Palaiseau, France
| | - Savvas Polydorides
- Department of Physics, University of Cyprus, PO20537, CY1678 Nicosia, Cyprus
| | - Francesco Villa
- Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, 91128 Palaiseau, France
| | - Thomas Gaillard
- Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, 91128 Palaiseau, France
| | - Nicolas Panel
- Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, 91128 Palaiseau, France
| | - Georgios Archontis
- Department of Physics, University of Cyprus, PO20537, CY1678 Nicosia, Cyprus
| | - Thomas Simonson
- Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, 91128 Palaiseau, France
| |
Collapse
|
13
|
Michael E, Polydorides S, Promponas VJ, Skourides P, Archontis G. Recognition of LD motifs by the focal adhesion targeting domains of focal adhesion kinase and proline-rich tyrosine kinase 2-beta: Insights from molecular dynamics simulations. Proteins 2020; 89:29-52. [PMID: 32776636 DOI: 10.1002/prot.25992] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/16/2020] [Revised: 06/21/2020] [Accepted: 07/26/2020] [Indexed: 12/13/2022]
Abstract
The focal adhesion kinase (FAK) and the proline-rich tyrosine kinase 2-beta (PYK2) are implicated in cancer progression and metastasis and represent promising biomarkers and targets for cancer therapy. FAK and PYK2 are recruited to focal adhesions (FAs) via interactions between their FA targeting (FAT) domains and conserved segments (LD motifs) on the proteins Paxillin, Leupaxin, and Hic-5. A promising new approach for the inhibition of FAK and PYK2 targets interactions of the FAK domains with proteins that promote localization at FAs. Advances toward this goal include the development of surface plasmon resonance, heteronuclear single quantum coherence nuclear magnetic resonance (HSQC-NMR) and fluorescence polarization assays for the identification of fragments or compounds interfering with the FAK-Paxillin interaction. We have recently validated this strategy, showing that Paxillin mimicking polypeptides with 2 to 3 LD motifs displace FAK from FAs and block kinase-dependent and independent functions of FAK, including downstream integrin signaling and FA localization of the protein p130Cas. In the present work we study by all-atom molecular dynamics simulations the recognition of peptides with the Paxillin and Leupaxin LD motifs by the FAK-FAT and PYK2-FAT domains. Our simulations and free-energy analysis interpret experimental data on binding of Paxillin and Leupaxin LD motifs at FAK-FAT and PYK2-FAT binding sites, and assess the roles of consensus LD regions and flanking residues. Our results can assist in the design of effective inhibitory peptides of the FAK-FAT: Paxillin and PYK2-FAT:Leupaxin complexes and the construction of pharmacophore models for the discovery of potential small-molecule inhibitors of the FAK-FAT and PYK2-FAT focal adhesion based functions.
Collapse
Affiliation(s)
- Eleni Michael
- Department of Physics, University of Cyprus, Nicosia, Cyprus
| | | | - Vasilis J Promponas
- Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus
| | - Paris Skourides
- Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus
| | | |
Collapse
|
14
|
Opuu V, Sun YJ, Hou T, Panel N, Fuentes EJ, Simonson T. A physics-based energy function allows the computational redesign of a PDZ domain. Sci Rep 2020; 10:11150. [PMID: 32636412 PMCID: PMC7341745 DOI: 10.1038/s41598-020-67972-w] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2020] [Accepted: 06/08/2020] [Indexed: 11/30/2022] Open
Abstract
Computational protein design (CPD) can address the inverse folding problem, exploring a large space of sequences and selecting ones predicted to fold. CPD was used previously to redesign several proteins, employing a knowledge-based energy function for both the folded and unfolded states. We show that a PDZ domain can be entirely redesigned using a "physics-based" energy for the folded state and a knowledge-based energy for the unfolded state. Thousands of sequences were generated by Monte Carlo simulation. Three were chosen for experimental testing, based on their low energies and several empirical criteria. All three could be overexpressed and had native-like circular dichroism spectra and 1D-NMR spectra typical of folded structures. Two had upshifted thermal denaturation curves when a peptide ligand was present, indicating binding and suggesting folding to a correct, PDZ structure. Evidently, the physical principles that govern folded proteins, with a dash of empirical post-filtering, can allow successful whole-protein redesign.
Collapse
Affiliation(s)
- Vaitea Opuu
- Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, Institut Polytechnique de Paris, Palaiseau, France
| | - Young Joo Sun
- Department of Biochemistry, Carver College of Medicine, University of Iowa, Iowa City, USA
| | - Titus Hou
- Department of Biochemistry, Carver College of Medicine, University of Iowa, Iowa City, USA
| | - Nicolas Panel
- Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, Institut Polytechnique de Paris, Palaiseau, France
| | - Ernesto J Fuentes
- Department of Biochemistry, Carver College of Medicine, University of Iowa, Iowa City, USA.
| | - Thomas Simonson
- Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, Institut Polytechnique de Paris, Palaiseau, France.
| |
Collapse
|
15
|
Zhou J, Panaitiu AE, Grigoryan G. A general-purpose protein design framework based on mining sequence-structure relationships in known protein structures. Proc Natl Acad Sci U S A 2020; 117:1059-1068. [PMID: 31892539 PMCID: PMC6969538 DOI: 10.1073/pnas.1908723117] [Citation(s) in RCA: 61] [Impact Index Per Article: 15.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022] Open
Abstract
Current state-of-the-art approaches to computational protein design (CPD) aim to capture the determinants of structure from physical principles. While this has led to many successful designs, it does have strong limitations associated with inaccuracies in physical modeling, such that a reliable general solution to CPD has yet to be found. Here, we propose a design framework-one based on identifying and applying patterns of sequence-structure compatibility found in known proteins, rather than approximating them from models of interatomic interactions. We carry out extensive computational analyses and an experimental validation for our method. Our results strongly argue that the Protein Data Bank is now sufficiently large to enable proteins to be designed by using only examples of structural motifs from unrelated proteins. Because our method is likely to have orthogonal strengths relative to existing techniques, it could represent an important step toward removing remaining barriers to robust CPD.
Collapse
Affiliation(s)
- Jianfu Zhou
- Department of Computer Science, Dartmouth College, Hanover, NH 03755
| | | | - Gevorg Grigoryan
- Department of Computer Science, Dartmouth College, Hanover, NH 03755;
- Department of Biological Sciences, Dartmouth College, Hanover, NH 03755
| |
Collapse
|
16
|
Adaptive landscape flattening allows the design of both enzyme: Substrate binding and catalytic power. PLoS Comput Biol 2020; 16:e1007600. [PMID: 31917825 PMCID: PMC7041857 DOI: 10.1371/journal.pcbi.1007600] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/20/2019] [Revised: 02/25/2020] [Accepted: 12/11/2019] [Indexed: 01/30/2023] Open
Abstract
Designed enzymes are of fundamental and technological interest. Experimental directed evolution still has significant limitations, and computational approaches are a complementary route. A designed enzyme should satisfy multiple criteria: stability, substrate binding, transition state binding. Such multi-objective design is computationally challenging. Two recent studies used adaptive importance sampling Monte Carlo to redesign proteins for ligand binding. By first flattening the energy landscape of the apo protein, they obtained positive design for the bound state and negative design for the unbound. We have now extended the method to design an enzyme for specific transition state binding, i.e., for its catalytic power. We considered methionyl-tRNA synthetase (MetRS), which attaches methionine (Met) to its cognate tRNA, establishing codon identity. Previously, MetRS and other synthetases have been redesigned by experimental directed evolution to accept noncanonical amino acids as substrates, leading to genetic code expansion. Here, we have redesigned MetRS computationally to bind several ligands: the Met analog azidonorleucine, methionyl-adenylate (MetAMP), and the activated ligands that form the transition state for MetAMP production. Enzyme mutants known to have azidonorleucine activity were recovered by the design calculations, and 17 mutants predicted to bind MetAMP were characterized experimentally and all found to be active. Mutants predicted to have low activation free energies for MetAMP production were found to be active and the predicted reaction rates agreed well with the experimental values. We suggest the present method should become the paradigm for computational enzyme design. Designed enzymes are of major interest. Experimental directed evolution still has significant limitations, and computational approaches are another route. Enzymes must be stable, bind substrates, and be powerful catalysts. It is challenging to design for all these properties. A method to design substrate binding was proposed recently. It used an adaptive Monte Carlo method to explore mutations of a few amino acids near the substrate. A bias energy was gradually “learned” such that, in the absence of the ligand, the simulation visited most of the possible protein mutations with comparable probabilities. Remarkably, a simulation of the protein:ligand complex, including the bias, will then preferentially sample tight-binding sequences. We generalized the method to design binding specificity. We tested it for the methionyl-tRNA synthetase enzyme, which has been engineered in order to expand the genetic code. We redesigned the enzyme to obtain variants with low activation free energies for the catalytic step. The variants proposed by the simulations were shown experimentally to be active, and the predicted activation free energies were in reasonable agreement with the experimental values. We expect the new method will become the paradigm for computational enzyme design.
Collapse
|
17
|
Xiong P, Hu X, Huang B, Zhang J, Chen Q, Liu H. Increasing the efficiency and accuracy of the ABACUS protein sequence design method. Bioinformatics 2019; 36:136-144. [DOI: 10.1093/bioinformatics/btz515] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2018] [Revised: 05/29/2019] [Accepted: 06/21/2019] [Indexed: 11/13/2022] Open
Abstract
Abstract
Motivation
The ABACUS (a backbone-based amino acid usage survey) method uses unique statistical energy functions to carry out protein sequence design. Although some of its results have been experimentally verified, its accuracy remains improvable because several important components of the method have not been specifically optimized for sequence design or in contexts of other parts of the method. The computational efficiency also needs to be improved to support interactive online applications or the consideration of a large number of alternative backbone structures.
Results
We derived a model to measure solvent accessibility with larger mutual information with residue types than previous models, optimized a set of rotamers which can approximate the sidechain atomic positions more accurately, and devised an empirical function to treat inter-atomic packing with parameters fitted to native structures and optimized in consistence with the rotamer set. Energy calculations have been accelerated by interpolation between pre-determined representative points in high-dimensional structural feature spaces. Sidechain repacking tests showed that ABACUS2 can accurately reproduce the conformation of native sidechains. In sequence design tests, the native residue type recovery rate reached 37.7%, exceeding the value of 32.7% for ABACUS1. Applying ABACUS2 to designed sequences on three native backbones produced proteins shown to be well-folded by experiments.
Availability and implementation
The ABACUS2 sequence design server can be visited at http://biocomp.ustc.edu.cn/servers/abacus-design.php.
Supplementary information
Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Peng Xiong
- School of Life Sciences, Hefei, Anhui 230026, China
| | - Xiuhong Hu
- School of Life Sciences, Hefei, Anhui 230026, China
| | - Bin Huang
- School of Life Sciences, Hefei, Anhui 230026, China
| | - Jiahai Zhang
- School of Life Sciences, Hefei, Anhui 230026, China
| | - Quan Chen
- School of Life Sciences, Hefei, Anhui 230026, China
| | - Haiyan Liu
- School of Life Sciences, Hefei, Anhui 230026, China
- Hefei National Laboratory for Physical Sciences at the Microscale, Hefei, Anhui 230026, China
- School of Data Science, University of Sciences and Technology of China, Hefei, Anhui 230026, China
| |
Collapse
|
18
|
Villa F, Simonson T. Protein pKa’s from Adaptive Landscape Flattening Instead of Constant-pH Simulations. J Chem Theory Comput 2018; 14:6714-6721. [DOI: 10.1021/acs.jctc.8b00970] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Affiliation(s)
- Francesco Villa
- Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
| | - Thomas Simonson
- Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
| |
Collapse
|
19
|
Charpentier A, Mignon D, Barbe S, Cortes J, Schiex T, Simonson T, Allouche D. Variable Neighborhood Search with Cost Function Networks To Solve Large Computational Protein Design Problems. J Chem Inf Model 2018; 59:127-136. [DOI: 10.1021/acs.jcim.8b00510] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Affiliation(s)
| | - David Mignon
- Laboratoire de Biochimie (CNRS UMR 7654), École Polytechnique, 91128 Palaiseau, France
| | - Sophie Barbe
- Laboratoire d’Ingénierie des Systèmes Biologiques et Procédés, LISBP, Université de Toulouse, CNRS, INRA, INSA, 31077 Toulouse, France
| | - Juan Cortes
- LAAS-CNRS, Université de Toulouse, CNRS, 31400 Toulouse, France
| | - Thomas Schiex
- MIAT, Université de Toulouse, INRA, 31326 Castanet-Tolosan, France
| | - Thomas Simonson
- Laboratoire de Biochimie (CNRS UMR 7654), École Polytechnique, 91128 Palaiseau, France
| | - David Allouche
- MIAT, Université de Toulouse, INRA, 31326 Castanet-Tolosan, France
| |
Collapse
|
20
|
Villa F, Panel N, Chen X, Simonson T. Adaptive landscape flattening in amino acid sequence space for the computational design of protein:peptide binding. J Chem Phys 2018; 149:072302. [PMID: 30134674 DOI: 10.1063/1.5022249] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022] Open
Abstract
For the high throughput design of protein:peptide binding, one must explore a vast space of amino acid sequences in search of low binding free energies. This complex problem is usually addressed with either simple heuristic scoring or expensive sequence enumeration schemes. Far more efficient than enumeration is a recent Monte Carlo approach that adaptively flattens the energy landscape in sequence space of the unbound peptide and provides formally exact binding free energy differences. The method allows the binding free energy to be used directly as the design criterion. We propose several improvements that allow still more efficient sampling and can address larger design problems. They include the use of Replica Exchange Monte Carlo and landscape flattening for both the unbound and bound peptides. We used the method to design peptides that bind to the PDZ domain of the Tiam1 signaling protein and could serve as inhibitors of its activity. Four peptide positions were allowed to mutate freely. Almost 75 000 peptide variants were processed in two simulations of 109 steps each that used 1 CPU hour on a desktop machine. 96% of the theoretical sequence space was sampled. The relative binding free energies agreed qualitatively with values from experiment. The sampled sequences agreed qualitatively with an experimental library of Tiam1-binding peptides. The main assumption limiting accuracy is the fixed backbone approximation, which could be alleviated in future work by using increased computational resources and multi-backbone designs.
Collapse
Affiliation(s)
- Francesco Villa
- Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
| | - Nicolas Panel
- Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
| | - Xingyu Chen
- Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
| | - Thomas Simonson
- Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
| |
Collapse
|
21
|
Setiawan D, Brender J, Zhang Y. Recent advances in automated protein design and its future challenges. Expert Opin Drug Discov 2018; 13:587-604. [PMID: 29695210 DOI: 10.1080/17460441.2018.1465922] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/07/2023]
Abstract
INTRODUCTION Protein function is determined by protein structure which is in turn determined by the corresponding protein sequence. If the rules that cause a protein to adopt a particular structure are understood, it should be possible to refine or even redefine the function of a protein by working backwards from the desired structure to the sequence. Automated protein design attempts to calculate the effects of mutations computationally with the goal of more radical or complex transformations than are accessible by experimental techniques. Areas covered: The authors give a brief overview of the recent methodological advances in computer-aided protein design, showing how methodological choices affect final design and how automated protein design can be used to address problems considered beyond traditional protein engineering, including the creation of novel protein scaffolds for drug development. Also, the authors address specifically the future challenges in the development of automated protein design. Expert opinion: Automated protein design holds potential as a protein engineering technique, particularly in cases where screening by combinatorial mutagenesis is problematic. Considering solubility and immunogenicity issues, automated protein design is initially more likely to make an impact as a research tool for exploring basic biology in drug discovery than in the design of protein biologics.
Collapse
Affiliation(s)
- Dani Setiawan
- a Department of Computational Medicine and Bioinformatics , University of Michigan , Ann Arbor , MI , USA
| | - Jeffrey Brender
- b Radiation Biology Branch , Center for Cancer Research, National Cancer Institute - NIH , Bethesda , MD , USA
| | - Yang Zhang
- a Department of Computational Medicine and Bioinformatics , University of Michigan , Ann Arbor , MI , USA.,c Department of Biological Chemistry , University of Michigan , Ann Arbor , MI , USA
| |
Collapse
|
22
|
Gaillard T, Simonson T. Full Protein Sequence Redesign with an MMGBSA Energy Function. J Chem Theory Comput 2017; 13:4932-4943. [DOI: 10.1021/acs.jctc.7b00202] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Affiliation(s)
- Thomas Gaillard
- Laboratoire de Biochimie
(CNRS UMR7654), Department of Biology, Ecole Polytechnique, 91128 Palaiseau, France
| | - Thomas Simonson
- Laboratoire de Biochimie
(CNRS UMR7654), Department of Biology, Ecole Polytechnique, 91128 Palaiseau, France
| |
Collapse
|
23
|
Villa F, Mignon D, Polydorides S, Simonson T. Comparing pairwise-additive and many-body generalized Born models for acid/base calculations and protein design. J Comput Chem 2017; 38:2396-2410. [PMID: 28749575 DOI: 10.1002/jcc.24898] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2017] [Revised: 06/30/2017] [Accepted: 07/06/2017] [Indexed: 12/13/2022]
Abstract
Generalized Born (GB) solvent models are common in acid/base calculations and protein design. With GB, the interaction between a pair of solute atoms depends on the shape of the protein/solvent boundary and, therefore, the positions of all solute atoms, so that GB is a many-body potential. For compute-intensive applications, the model is often simplified further, by introducing a mean, native-like protein/solvent boundary, which removes the many-body property. We investigate a method for both acid/base calculations and protein design that uses Monte Carlo simulations in which side chains can explore rotamers, bind/release protons, or mutate. The fluctuating protein/solvent dielectric boundary is treated in a way that is numerically exact (within the GB framework), in contrast to a mean boundary. Its originality is that it captures the many-body character while retaining the residue-pairwise complexity given by a fixed boundary. The method is implemented in the Proteus protein design software. It yields a slight but systematic improvement for acid/base constants in nine proteins and a significant improvement for the computational design of three PDZ domains. It eliminates a source of model uncertainty, which will facilitate the analysis of other model limitations. © 2017 Wiley Periodicals, Inc.
Collapse
Affiliation(s)
- Francesco Villa
- Ecole Polytechnique, Laboratoire de Biochimie (CNRS UMR7654), Palaiseau, 91128, France
| | - David Mignon
- Ecole Polytechnique, Laboratoire de Biochimie (CNRS UMR7654), Palaiseau, 91128, France
| | - Savvas Polydorides
- Ecole Polytechnique, Laboratoire de Biochimie (CNRS UMR7654), Palaiseau, 91128, France
| | - Thomas Simonson
- Ecole Polytechnique, Laboratoire de Biochimie (CNRS UMR7654), Palaiseau, 91128, France
| |
Collapse
|
24
|
Mignon D, Panel N, Chen X, Fuentes EJ, Simonson T. Computational Design of the Tiam1 PDZ Domain and Its Ligand Binding. J Chem Theory Comput 2017; 13:2271-2289. [PMID: 28394603 DOI: 10.1021/acs.jctc.6b01255] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
PDZ domains direct protein-protein interactions and serve as models for protein design. Here, we optimized a protein design energy function for the Tiam1 and Cask PDZ domains that combines a molecular mechanics energy, Generalized Born solvent, and an empirical unfolded state model. Designed sequences were recognized as PDZ domains by the Superfamily fold recognition tool and had similarity scores comparable to natural PDZ sequences. The optimized model was used to redesign the two PDZ domains, by gradually varying the chemical potential of hydrophobic amino acids; the tendency of each position to lose or gain a hydrophobic character represents a novel hydrophobicity index. We also redesigned four positions in the Tiam1 PDZ domain involved in peptide binding specificity. The calculated affinity differences between designed variants reproduced experimental data and suggest substitutions with altered specificities.
Collapse
Affiliation(s)
- David Mignon
- Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique , Palaiseau, France
| | - Nicolas Panel
- Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique , Palaiseau, France
| | - Xingyu Chen
- Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique , Palaiseau, France
| | - Ernesto J Fuentes
- Department of Biochemistry, Roy J. & Lucille A. Carver College of Medicine and Holden Comprehensive Cancer Center, University of Iowa , Iowa City, Iowa 52242-1109, United States
| | - Thomas Simonson
- Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique , Palaiseau, France
| |
Collapse
|
25
|
Piano V, Nenci S, Magnani F, Aliverti A, Mattevi A. Recombinant human dihydroxyacetonephosphate acyl-transferase characterization as an integral monotopic membrane protein. Biochem Biophys Res Commun 2016; 481:51-58. [PMID: 27836547 PMCID: PMC5146282 DOI: 10.1016/j.bbrc.2016.11.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2016] [Accepted: 11/05/2016] [Indexed: 11/26/2022]
Abstract
Although the precise functions of ether phospholipids are still poorly understood, significant alterations in their physiological levels are associated either to inherited disorders or to aggressive metastatic cancer. The essential precursor, alkyl-dihydroxyacetone phosphate (DHAP), for all ether phospholipids species is synthetized in two consecutive reactions performed by two enzymes sitting on the inner side of the peroxisomal membrane. Here, we report the characterization of the recombinant human DHAP acyl-transferase, which performs the first step in alkyl-DHAP synthesis. By exploring several expression systems and designing a number of constructs, we were able to purify the enzyme in its active form and we found that it is tightly bound to the membrane through the N-terminal residues. Human DHAPAT is associated to peroxisomal membrane through the N-terminal region. Recombinant human DHAPAT expressed and purified from P. pastoris cells is active. Evidence of the in vitro reconstitution of DHAPAT/ADPS enzymatic complex.
Collapse
Affiliation(s)
- Valentina Piano
- Department of Biology and Biotechnology, University of Pavia, V. Ferrata 9, 27100, Pavia, Italy
| | - Simone Nenci
- Department of Biology and Biotechnology, University of Pavia, V. Ferrata 9, 27100, Pavia, Italy
| | - Francesca Magnani
- Department of Biology and Biotechnology, University of Pavia, V. Ferrata 9, 27100, Pavia, Italy
| | - Alessandro Aliverti
- Department of Biosciences, University of Milan, V. Celoria 26, 20133, Milan, Italy
| | - Andrea Mattevi
- Department of Biology and Biotechnology, University of Pavia, V. Ferrata 9, 27100, Pavia, Italy.
| |
Collapse
|
26
|
Topham CM, Barbe S, André I. An Atomistic Statistically Effective Energy Function for Computational Protein Design. J Chem Theory Comput 2016; 12:4146-68. [PMID: 27341125 DOI: 10.1021/acs.jctc.6b00090] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
Shortcomings in the definition of effective free-energy surfaces of proteins are recognized to be a major contributory factor responsible for the low success rates of existing automated methods for computational protein design (CPD). The formulation of an atomistic statistically effective energy function (SEEF) suitable for a wide range of CPD applications and its derivation from structural data extracted from protein domains and protein-ligand complexes are described here. The proposed energy function comprises nonlocal atom-based and local residue-based SEEFs, which are coupled using a novel atom connectivity number factor to scale short-range, pairwise, nonbonded atomic interaction energies and a surface-area-dependent cavity energy term. This energy function was used to derive additional SEEFs describing the unfolded-state ensemble of any given residue sequence based on computed average energies for partially or fully solvent-exposed fragments in regions of irregular structure in native proteins. Relative thermal stabilities of 97 T4 bacteriophage lysozyme mutants were predicted from calculated energy differences for folded and unfolded states with an average unsigned error (AUE) of 0.84 kcal mol(-1) when compared to experiment. To demonstrate the utility of the energy function for CPD, further validation was carried out in tests of its capacity to recover cognate protein sequences and to discriminate native and near-native protein folds, loop conformers, and small-molecule ligand binding poses from non-native benchmark decoys. Experimental ligand binding free energies for a diverse set of 80 protein complexes could be predicted with an AUE of 2.4 kcal mol(-1) using an additional energy term to account for the loss in ligand configurational entropy upon binding. The atomistic SEEF is expected to improve the accuracy of residue-based coarse-grained SEEFs currently used in CPD and to extend the range of applications of extant atom-based protein statistical potentials.
Collapse
Affiliation(s)
- Christopher M Topham
- Université de Toulouse; INSA, UPS, INP; LISBP , 135 Avenue de Rangueil, F-31077 Toulouse, France.,CNRS, UMR5504 , F-31400 Toulouse, France.,INRA, UMR792 Ingénierie des Systèmes Biologiques et des Procédés , F-31400 Toulouse, France
| | - Sophie Barbe
- Université de Toulouse; INSA, UPS, INP; LISBP , 135 Avenue de Rangueil, F-31077 Toulouse, France.,CNRS, UMR5504 , F-31400 Toulouse, France.,INRA, UMR792 Ingénierie des Systèmes Biologiques et des Procédés , F-31400 Toulouse, France
| | - Isabelle André
- Université de Toulouse; INSA, UPS, INP; LISBP , 135 Avenue de Rangueil, F-31077 Toulouse, France.,CNRS, UMR5504 , F-31400 Toulouse, France.,INRA, UMR792 Ingénierie des Systèmes Biologiques et des Procédés , F-31400 Toulouse, France
| |
Collapse
|
27
|
Liu H, Chen Q. Computational protein design for given backbone: recent progresses in general method-related aspects. Curr Opin Struct Biol 2016; 39:89-95. [PMID: 27348345 DOI: 10.1016/j.sbi.2016.06.013] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2016] [Revised: 05/18/2016] [Accepted: 06/15/2016] [Indexed: 10/21/2022]
Abstract
To achieve high success rate in protein design requires a reliable sequence design method to find amino acid sequences that stably fold into a desired backbone structure. This problem is addressed by computational protein design through the approach of energy minimization. Here we review recent method progresses related to improving the accuracy of this approach. First, the quality of the energy model is a key factor. Second, with structure sensitive energy functions, whether and how backbone flexibility is considered can have large effects on design accuracy, although usually only small adjustments of the backbone structure itself are involved. Third, the effective accuracy of design results can be boosted by post-processing a small number of designed sequences with complementary models that may not be efficient enough for full sequence optimization. Finally, computational method development will benefit greatly from increasingly efficient experimental approaches that can be applied to obtain extensive feedbacks.
Collapse
Affiliation(s)
- Haiyan Liu
- School of Life Sciences, University of Science and Technology of China, China; Hefei National Laboratory for Physical Sciences at the Microscales, China; Collaborative Innovation Center of Chemistry for Life Sciences, Hefei, Anhui 230027, China; Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, Anhui 230031, China.
| | - Quan Chen
- School of Life Sciences, University of Science and Technology of China, China
| |
Collapse
|
28
|
Mignon D, Simonson T. Comparing three stochastic search algorithms for computational protein design: Monte Carlo, replica exchange Monte Carlo, and a multistart, steepest-descent heuristic. J Comput Chem 2016; 37:1781-93. [PMID: 27197555 DOI: 10.1002/jcc.24393] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/03/2015] [Revised: 02/26/2016] [Accepted: 03/27/2016] [Indexed: 01/11/2023]
Abstract
Computational protein design depends on an energy function and an algorithm to search the sequence/conformation space. We compare three stochastic search algorithms: a heuristic, Monte Carlo (MC), and a Replica Exchange Monte Carlo method (REMC). The heuristic performs a steepest-descent minimization starting from thousands of random starting points. The methods are applied to nine test proteins from three structural families, with a fixed backbone structure, a molecular mechanics energy function, and with 1, 5, 10, 20, 30, or all amino acids allowed to mutate. Results are compared to an exact, "Cost Function Network" method that identifies the global minimum energy conformation (GMEC) in favorable cases. The designed sequences accurately reproduce experimental sequences in the hydrophobic core. The heuristic and REMC agree closely and reproduce the GMEC when it is known, with a few exceptions. Plain MC performs well for most cases, occasionally departing from the GMEC by 3-4 kcal/mol. With REMC, the diversity of the sequences sampled agrees with exact enumeration where the latter is possible: up to 2 kcal/mol above the GMEC. Beyond, room temperature replicas sample sequences up to 10 kcal/mol above the GMEC, providing thermal averages and a solution to the inverse protein folding problem. © 2016 Wiley Periodicals, Inc.
Collapse
Affiliation(s)
- David Mignon
- Laboratoire De Biochimie (UMR CNRS 7654), Department Of Biology, Ecole Polytechnique, Palaiseau, France
| | - Thomas Simonson
- Laboratoire De Biochimie (UMR CNRS 7654), Department Of Biology, Ecole Polytechnique, Palaiseau, France
| |
Collapse
|
29
|
Gaillard T, Panel N, Simonson T. Protein side chain conformation predictions with an MMGBSA energy function. Proteins 2016; 84:803-19. [PMID: 26948696 DOI: 10.1002/prot.25030] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2015] [Revised: 02/22/2016] [Accepted: 02/27/2016] [Indexed: 12/17/2022]
Abstract
The prediction of protein side chain conformations from backbone coordinates is an important task in structural biology, with applications in structure prediction and protein design. It is a difficult problem due to its combinatorial nature. We study the performance of an "MMGBSA" energy function, implemented in our protein design program Proteus, which combines molecular mechanics terms, a Generalized Born and Surface Area (GBSA) solvent model, with approximations that make the model pairwise additive. Proteus is not a competitor to specialized side chain prediction programs due to its cost, but it allows protein design applications, where side chain prediction is an important step and MMGBSA an effective energy model. We predict the side chain conformations for 18 proteins. The side chains are first predicted individually, with the rest of the protein in its crystallographic conformation. Next, all side chains are predicted together. The contributions of individual energy terms are evaluated and various parameterizations are compared. We find that the GB and SA terms, with an appropriate choice of the dielectric constant and surface energy coefficients, are beneficial for single side chain predictions. For the prediction of all side chains, however, errors due to the pairwise additive approximation overcome the improvement brought by these terms. We also show the crucial contribution of side chain minimization to alleviate the rigid rotamer approximation. Even without GB and SA terms, we obtain accuracies comparable to SCWRL4, a specialized side chain prediction program. In particular, we obtain a better RMSD than SCWRL4 for core residues (at a higher cost), despite our simpler rotamer library. Proteins 2016; 84:803-819. © 2016 Wiley Periodicals, Inc.
Collapse
Affiliation(s)
- Thomas Gaillard
- Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, 91128, France
| | - Nicolas Panel
- Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, 91128, France
| | - Thomas Simonson
- Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, 91128, France
| |
Collapse
|
30
|
Simonson T, Ye-Lehmann S, Palmai Z, Amara N, Wydau-Dematteis S, Bigan E, Druart K, Moch C, Plateau P. Redesigning the stereospecificity of tyrosyl-tRNA synthetase. Proteins 2016; 84:240-53. [PMID: 26676967 DOI: 10.1002/prot.24972] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2015] [Revised: 09/30/2015] [Accepted: 11/26/2015] [Indexed: 12/14/2022]
Abstract
D-Amino acids are largely excluded from protein synthesis, yet they are of great interest in biotechnology. Unnatural amino acids have been introduced into proteins using engineered aminoacyl-tRNA synthetases (aaRSs), and this strategy might be applicable to D-amino acids. Several aaRSs can aminoacylate their tRNA with a D-amino acid; of these, tyrosyl-tRNA synthetase (TyrRS) has the weakest stereospecificity. We use computational protein design to suggest active site mutations in Escherichia coli TyrRS that could increase its D-Tyr binding further, relative to L-Tyr. The mutations selected all modify one or more sidechain charges in the Tyr binding pocket. We test their effect by probing the aminoacyl-adenylation reaction through pyrophosphate exchange experiments. We also perform extensive alchemical free energy simulations to obtain L-Tyr/D-Tyr binding free energy differences. Agreement with experiment is good, validating the structural models and detailed thermodynamic predictions the simulations provide. The TyrRS stereospecificity proves hard to engineer through charge-altering mutations in the first and second coordination shells of the Tyr ammonium group. Of six mutants tested, two are active towards D-Tyr; one of these has an inverted stereospecificity, with a large preference for D-Tyr. However, its activity is low. Evidently, the TyrRS stereospecificity is robust towards charge rearrangements near the ligand. Future design may have to consider more distant and/or electrically neutral target mutations, and possibly design for binding of the transition state, whose structure however can only be modeled.
Collapse
Affiliation(s)
- Thomas Simonson
- Department of Biology, Laboratoire De Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, 91128, France
| | | | - Zoltan Palmai
- Department of Biology, Laboratoire De Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, 91128, France
| | - Najette Amara
- Department of Biology, Laboratoire De Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, 91128, France
| | - Sandra Wydau-Dematteis
- Department of Biology, Laboratoire De Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, 91128, France
| | - Erwan Bigan
- Department of Biology, Laboratoire De Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, 91128, France
| | - Karen Druart
- Department of Biology, Laboratoire De Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, 91128, France
| | - Clara Moch
- Department of Biology, Laboratoire De Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, 91128, France
| | - Pierre Plateau
- Department of Biology, Laboratoire De Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, 91128, France
| |
Collapse
|
31
|
Polydorides S, Michael E, Mignon D, Druart K, Archontis G, Simonson T. Proteus and the Design of Ligand Binding Sites. Methods Mol Biol 2016; 1414:77-97. [PMID: 27094287 DOI: 10.1007/978-1-4939-3569-7_6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
Abstract
This chapter describes the organization and use of Proteus, a multitool computational suite for the optimization of protein and ligand conformations and sequences, and the calculation of pK α shifts and relative binding affinities. The software offers the use of several molecular mechanics force fields and solvent models, including two generalized Born variants, and a large range of scoring functions, which can combine protein stability, ligand affinity, and ligand specificity terms, for positive and negative design. We present in detail the steps for structure preparation, system setup, construction of the interaction energy matrix, protein sequence and structure optimizations, pK α calculations, and ligand titration calculations. We discuss illustrative examples, including the chemical/structural optimization of a complex between the MHC class II protein HLA-DQ8 and the vinculin epitope, and the chemical optimization of the compstatin analog Ac-Val4Trp/His9Ala, which regulates the function of protein C3 of the complement system.
Collapse
Affiliation(s)
- Savvas Polydorides
- Theoretical and Computational Biophysics Group, Department of Physics, University of Cyprus, 1678, Nicosia, Cyprus
| | - Eleni Michael
- Theoretical and Computational Biophysics Group, Department of Physics, University of Cyprus, 1678, Nicosia, Cyprus
| | - David Mignon
- Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, 91128, Palaiseau, France
| | - Karen Druart
- Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, 91128, Palaiseau, France
| | - Georgios Archontis
- Theoretical and Computational Biophysics Group, Department of Physics, University of Cyprus, 1678, Nicosia, Cyprus.
| | - Thomas Simonson
- Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, 91128, Palaiseau, France.
| |
Collapse
|
32
|
Druart K, Palmai Z, Omarjee E, Simonson T. Protein:Ligand binding free energies: A stringent test for computational protein design. J Comput Chem 2015; 37:404-15. [PMID: 26503829 DOI: 10.1002/jcc.24230] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2015] [Revised: 10/01/2015] [Accepted: 10/02/2015] [Indexed: 01/29/2023]
Abstract
A computational protein design method is extended to allow Monte Carlo simulations where two ligands are titrated into a protein binding pocket, yielding binding free energy differences. These provide a stringent test of the physical model, including the energy surface and sidechain rotamer definition. As a test, we consider tyrosyl-tRNA synthetase (TyrRS), which has been extensively redesigned experimentally. We consider its specificity for its substrate l-tyrosine (l-Tyr), compared to the analogs d-Tyr, p-acetyl-, and p-azido-phenylalanine (ac-Phe, az-Phe). We simulate l- and d-Tyr binding to TyrRS and six mutants, and compare the structures and binding free energies to a more rigorous "MD/GBSA" procedure: molecular dynamics with explicit solvent for structures and a Generalized Born + Surface Area model for binding free energies. Next, we consider l-Tyr, ac- and az-Phe binding to six other TyrRS variants. The titration results are sensitive to the precise rotamer definition, which involves a short energy minimization for each sidechain pair to help relax bad contacts induced by the discrete rotamer set. However, when designed mutant structures are rescored with a standard GBSA energy model, results agree well with the more rigorous MD/GBSA. As a third test, we redesign three amino acid positions in the substrate coordination sphere, with either l-Tyr or d-Tyr as the ligand. For two, we obtain good agreement with experiment, recovering the wildtype residue when l-Tyr is the ligand and a d-Tyr specific mutant when d-Tyr is the ligand. For the third, we recover His with either ligand, instead of wildtype Gln.
Collapse
Affiliation(s)
- Karen Druart
- Laboratoire De Biochimie (UMR CNRS 7654), Department of Biology, Ecole Polytechnique, Palaiseau, France
| | - Zoltan Palmai
- Laboratoire De Biochimie (UMR CNRS 7654), Department of Biology, Ecole Polytechnique, Palaiseau, France
| | - Eyaz Omarjee
- Laboratoire De Biochimie (UMR CNRS 7654), Department of Biology, Ecole Polytechnique, Palaiseau, France
| | - Thomas Simonson
- Laboratoire De Biochimie (UMR CNRS 7654), Department of Biology, Ecole Polytechnique, Palaiseau, France
| |
Collapse
|
33
|
LuCore SD, Litman JM, Powers KT, Gao S, Lynn AM, Tollefson WTA, Fenn TD, Washington MT, Schnieders MJ. Dead-End Elimination with a Polarizable Force Field Repacks PCNA Structures. Biophys J 2015; 109:816-26. [PMID: 26287633 PMCID: PMC4547145 DOI: 10.1016/j.bpj.2015.06.062] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2015] [Revised: 06/07/2015] [Accepted: 06/29/2015] [Indexed: 11/15/2022] Open
Abstract
A balance of van der Waals, electrostatic, and hydrophobic forces drive the folding and packing of protein side chains. Although such interactions between residues are often approximated as being pairwise additive, in reality, higher-order many-body contributions that depend on environment drive hydrophobic collapse and cooperative electrostatics. Beginning from dead-end elimination, we derive the first algorithm, to our knowledge, capable of deterministic global repacking of side chains compatible with many-body energy functions. The approach is applied to seven PCNA x-ray crystallographic data sets with resolutions 2.5-3.8 Å (mean 3.0 Å) using an open-source software. While PDB_REDO models average an Rfree value of 29.5% and MOLPROBITY score of 2.71 Å (77th percentile), dead-end elimination with the polarizable AMOEBA force field lowered Rfree by 2.8-26.7% and improved mean MOLPROBITY score to atomic resolution at 1.25 Å (100th percentile). For structural biology applications that depend on side-chain repacking, including x-ray refinement, homology modeling, and protein design, the accuracy limitations of pairwise additivity can now be eliminated via polarizable or quantum mechanical potentials.
Collapse
Affiliation(s)
- Stephen D LuCore
- Department of Biomedical Engineering, University of Iowa, Iowa City, Iowa
| | - Jacob M Litman
- Department of Biochemistry, University of Iowa, Iowa City, Iowa
| | - Kyle T Powers
- Department of Biochemistry, University of Iowa, Iowa City, Iowa
| | - Shibo Gao
- Department of Biochemistry, University of Iowa, Iowa City, Iowa
| | - Ava M Lynn
- Department of Biomedical Engineering, University of Iowa, Iowa City, Iowa
| | | | | | | | - Michael J Schnieders
- Department of Biomedical Engineering, University of Iowa, Iowa City, Iowa; Department of Biochemistry, University of Iowa, Iowa City, Iowa.
| |
Collapse
|
34
|
van Heemst J, Jansen DTSL, Polydorides S, Moustakas AK, Bax M, Feitsma AL, Bontrop-Elferink DG, Baarse M, van der Woude D, Wolbink GJ, Rispens T, Koning F, de Vries RRP, Papadopoulos GK, Archontis G, Huizinga TW, Toes RE. Crossreactivity to vinculin and microbes provides a molecular basis for HLA-based protection against rheumatoid arthritis. Nat Commun 2015; 6:6681. [PMID: 25942574 DOI: 10.1038/ncomms7681] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2014] [Accepted: 02/18/2015] [Indexed: 01/09/2023] Open
Abstract
The HLA locus is the strongest risk factor for anti-citrullinated protein antibody (ACPA)(+) rheumatoid arthritis (RA). Despite considerable efforts in the last 35 years, this association is poorly understood. Here we identify (citrullinated) vinculin, present in the joints of ACPA(+) RA patients, as an autoantigen targeted by ACPA and CD4(+) T cells. These T cells recognize an epitope with the core sequence DERAA, which is also found in many microbes and in protective HLA-DRB1*13 molecules, presented by predisposing HLA-DQ molecules. Moreover, these T cells crossreact with vinculin-derived and microbial-derived DERAA epitopes. Intriguingly, DERAA-directed T cells are not detected in HLA-DRB1*13(+) donors, indicating that the DERAA epitope from HLA-DRB1*13 mediates (thymic) tolerance in these donors and explaining the protective effects associated with HLA-DRB1*13. Together our data indicate the involvement of pathogen-induced DERAA-directed T cells in the HLA-RA association and provide a molecular basis for the contribution of protective/predisposing HLA alleles.
Collapse
Affiliation(s)
- Jurgen van Heemst
- Department of Rheumatology, Leiden University Medical Center, 2300 RC Leiden, The Netherlands
| | - Diahann T S L Jansen
- Department of Rheumatology, Leiden University Medical Center, 2300 RC Leiden, The Netherlands
| | | | - Antonis K Moustakas
- Faculty of Agricultural Technology, Technological Educational Institute of Ioanian Islands, Argostoli, Cephallonia 28100, Greece
| | - Marieke Bax
- Department of Rheumatology, Leiden University Medical Center, 2300 RC Leiden, The Netherlands
| | - Anouk L Feitsma
- Department of Rheumatology, Leiden University Medical Center, 2300 RC Leiden, The Netherlands
| | - Diënne G Bontrop-Elferink
- Department of Immunohematology and Bloodtransfusion, Leiden University Medical Center, 2300 RC Leiden, The Netherlands
| | - Martine Baarse
- Department of Rheumatology, Leiden University Medical Center, 2300 RC Leiden, The Netherlands
| | - Diane van der Woude
- Department of Rheumatology, Leiden University Medical Center, 2300 RC Leiden, The Netherlands
| | - Gert-Jan Wolbink
- Sanquin Research and Landsteiner Laboratory, Academic Medical Center, 1066 CX Amsterdam, The Netherlands
| | - Theo Rispens
- Sanquin Research and Landsteiner Laboratory, Academic Medical Center, 1066 CX Amsterdam, The Netherlands
| | - Frits Koning
- Department of Immunohematology and Bloodtransfusion, Leiden University Medical Center, 2300 RC Leiden, The Netherlands
| | - René R P de Vries
- Department of Immunohematology and Bloodtransfusion, Leiden University Medical Center, 2300 RC Leiden, The Netherlands
| | - George K Papadopoulos
- Laboratory of Biochemistry and Biophysics, Faculty of Agricultural Technology, Epirus Institute of Technology, Arta 47100, Greece
| | | | - Tom W Huizinga
- Department of Rheumatology, Leiden University Medical Center, 2300 RC Leiden, The Netherlands
| | - René E Toes
- Department of Rheumatology, Leiden University Medical Center, 2300 RC Leiden, The Netherlands
| |
Collapse
|
35
|
Gaillard T, Simonson T. Pairwise decomposition of an MMGBSA energy function for computational protein design. J Comput Chem 2014; 35:1371-87. [PMID: 24854675 DOI: 10.1002/jcc.23637] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2014] [Revised: 04/14/2014] [Accepted: 05/01/2014] [Indexed: 02/02/2023]
Abstract
Computational protein design (CPD) aims at predicting new proteins or modifying existing ones. The computational challenge is huge as it requires exploring an enormous sequence and conformation space. The difficulty can be reduced by considering a fixed backbone and a discrete set of sidechain conformations. Another common strategy consists in precalculating a pairwise energy matrix, from which the energy of any sequence/conformation can be quickly obtained. In this work, we examine the pairwise decomposition of protein MMGBSA energy functions from a general theoretical perspective, and an implementation proposed earlier for CPD. It includes a Generalized Born term, whose many-body character is overcome using an effective dielectric environment, and a Surface Area term, for which we present an improved pairwise decomposition. A detailed evaluation of the error introduced by the decomposition on the different energy components is performed. We show that the error remains reasonable, compared to other uncertainties.
Collapse
Affiliation(s)
- Thomas Gaillard
- Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, 91128, Palaiseau, France
| | | |
Collapse
|
36
|
Polydorides S, Simonson T. Monte Carlo simulations of proteins at constant pH with generalized Born solvent, flexible sidechains, and an effective dielectric boundary. J Comput Chem 2013; 34:2742-56. [PMID: 24122878 DOI: 10.1002/jcc.23450] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2013] [Revised: 09/04/2013] [Accepted: 09/08/2013] [Indexed: 12/11/2022]
Abstract
Titratable residues determine the acid/base behavior of proteins, strongly influencing their function; in addition, proton binding is a valuable reporter on electrostatic interactions. We describe a method for pK(a) calculations, using constant-pH Monte Carlo (MC) simulations to explore the space of sidechain conformations and protonation states, with an efficient and accurate generalized Born model (GB) for the solvent effects. To overcome the many-body dependency of the GB model, we use a "Native Environment" approximation, whose accuracy is shown to be good. It allows the precalculation and storage of interactions between all sidechain pairs, a strategy borrowed from computational protein design, which makes the MC simulations themselves very fast. The method is tested for 12 proteins and 167 titratable sidechains. It gives an rms error of 1.1 pH units, similar to the trivial "Null" model. The only adjustable parameter is the protein dielectric constant. The best accuracy is achieved for values between 4 and 8, a range that is physically plausible for a protein interior. For sidechains with large pKa shifts, ≥2, the rms error is 1.6, compared to 2.5 with the Null model and 1.5 with the empirical PROPKA method.
Collapse
Affiliation(s)
- Savvas Polydorides
- Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, 91128, Palaiseau, France
| | | |
Collapse
|
37
|
Simonson T. What Is the Dielectric Constant of a Protein When Its Backbone Is Fixed? J Chem Theory Comput 2013; 9:4603-8. [DOI: 10.1021/ct400398e] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Affiliation(s)
- Thomas Simonson
- Laboratoire de Biochimie
(CNRS UMR7654), Department of Biology, Ecole Polytechnique, 91128 Palaiseau, France
| |
Collapse
|