1
|
Liu S, Yang Q, Zhang L, Luo S. Accurate Protein pKa Prediction with Physical Organic Chemistry Guided 3D Protein Representation. J Chem Inf Model 2024; 64:4410-4418. [PMID: 38780156 DOI: 10.1021/acs.jcim.4c00354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2024]
Abstract
Protein pKa is a fundamental physicochemical parameter that dictates protein structure and function. However, accurately determining protein site-pKa values remains a substantial challenge, both experimentally and theoretically. In this study, we introduce a physical organic approach, leveraging a protein structural and physical-organic-parameter-based representation (P-SPOC), to develop a rapid and intuitive model for protein pKa prediction. Our P-SPOC model achieves state-of-the-art predictive accuracy, with a mean absolute error (MAE) of 0.33 pKa units. Furthermore, we have incorporated advanced protein structure prediction models, like AlphaFold2, to approximate structures for proteins lacking three-dimensional representations, which enhances the applicability of our model in the context of structure-undetermined protein research. To promote broader accessibility within the research community, an online prediction interface was also established at isyn.luoszgroup.com.
Affiliation(s)
- Siyuan Liu
- Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, Beijing 100084, China
- Qi Yang
- Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, Beijing 100084, China
- Long Zhang
- Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, Beijing 100084, China
- Sanzhong Luo
- Center of Basic Molecular Science, Department of Chemistry, Tsinghua University, Beijing 100084, China
|
2
|
Hsieh YC, Delarue M, Orland H, Koehl P. Analyzing the Geometry and Dynamics of Viral Structures: A Review of Computational Approaches Based on Alpha Shape Theory, Normal Mode Analysis, and Poisson-Boltzmann Theories. Viruses 2023; 15:1366. [PMID: 37376665 DOI: 10.3390/v15061366] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/03/2023] [Revised: 06/05/2023] [Accepted: 06/09/2023] [Indexed: 06/29/2023] Open
Abstract
The current SARS-CoV-2 pandemic highlights our fragility when we are exposed to emergent viruses either directly or through zoonotic diseases. Fortunately, our knowledge of the biology of those viruses is improving. In particular, we have more and more structural information on virions, i.e., the infective form of a virus that includes its genomic material and surrounding protective capsid, and on their gene products. It is important to have methods that enable the analyses of structural information on such large macromolecular systems. We review some of those methods in this paper. We focus on understanding the geometry of virions and viral structural proteins, their dynamics, and their energetics, with the ambition that this understanding can help design antiviral agents. We discuss those methods in light of the specificities of those structures, mainly that they are huge. We focus on three of our own methods based on the alpha shape theory for computing geometry, normal mode analyses to study dynamics, and modified Poisson-Boltzmann theories to study the organization of ions and co-solvent and solvent molecules around biomacromolecules. The corresponding software has computing times that are compatible with the use of regular desktop computers. We show examples of their applications on some outer shells and structural proteins of the West Nile Virus.
Affiliation(s)
- Yin-Chen Hsieh
- Institute for Arctic and Marine Biology, Department of Biosciences, Fisheries, and Economics, UiT The Arctic University of Norway, 9037 Tromso, Norway
- Marc Delarue
- Institut Pasteur, Université Paris-Cité and CNRS, UMR 3528, Unité Architecture et Dynamique des Macromolécules Biologiques, 75015 Paris, France
- Henri Orland
- Institut de Physique Théorique, CEA, CNRS, Université Paris-Saclay, 91191 Gif-sur-Yvette, France
- Patrice Koehl
- Department of Computer Science, University of California, Davis, CA 95616, USA
|
3
|
Koehl P, Akopyan A, Edelsbrunner H. Computing the Volume, Surface Area, Mean, and Gaussian Curvatures of Molecules and Their Derivatives. J Chem Inf Model 2023; 63:973-985. [PMID: 36638318 PMCID: PMC9930125 DOI: 10.1021/acs.jcim.2c01346] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 01/15/2023]
Abstract
Geometry is crucial in our efforts to comprehend the structures and dynamics of biomolecules. For example, volume, surface area, and integrated mean and Gaussian curvature of the union of balls representing a molecule are used to quantify its interactions with the water surrounding it in the morphometric implicit solvent models. The Alpha Shape theory provides an accurate and reliable method for computing these geometric measures. In this paper, we derive homogeneous formulas for the expressions of these measures and their derivatives with respect to the atomic coordinates, and we provide algorithms that implement them into a new software package, AlphaMol. The only variables in these formulas are the interatomic distances, making them insensitive to translations and rotations. AlphaMol includes a sequential algorithm and a parallel algorithm. In the parallel version, we partition the atoms of the molecule of interest into 3D rectangular blocks, using a kd-tree algorithm. We then apply the sequential algorithm of AlphaMol to each block, augmented by a buffer zone to account for atoms whose ball representations may partially cover the block. The current parallel version of AlphaMol leads to a 20-fold speed-up compared to an independent serial implementation when using 32 processors. For instance, it takes 31 s to compute the geometric measures and derivatives of each atom in a viral capsid with more than 26 million atoms on 32 Intel processors running at 2.7 GHz. The presence of the buffer zones, however, leads to redundant computations, which ultimately limit the impact of using multiple processors. AlphaMol is available as an OpenSource software.
Affiliation(s)
- Patrice Koehl
- Department of Computer Science, University of California, Davis, California 95616, United States
|
4
|
Raha FK, Hasan J, Ali A, Fakayode SO, Halim MA. Exploring the molecular level interaction of Xenoestrogen phthalate plasticisers with oestrogen receptor alpha (ERα) Y537S mutant. MOLECULAR SIMULATION 2022. [DOI: 10.1080/08927022.2022.2101675] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
Affiliation(s)
- Fahmida Khanam Raha
- Division of Molecular Cancer, The Red-Green Research Centre, BICCB, Dhaka, Bangladesh
- Department of Biochemistry and Molecular Biology, University of Dhaka, Dhaka, Bangladesh
- Jahid Hasan
- Department of Biochemistry and Molecular Biology, University of Dhaka, Dhaka, Bangladesh
- Ackas Ali
- Division of Molecular Cancer, The Red-Green Research Centre, BICCB, Dhaka, Bangladesh
- Sayo O. Fakayode
- Department of Chemistry, Physics & Astronomy, Georgia College & State University, Milledgeville, GA, USA
- Mohammad A. Halim
- Department of Chemistry and Biochemistry, Kennesaw State University, Kennesaw, GA, USA
|
5
|
Chitrala KN, Nagarkatti P, Nagarkatti M. Computational analysis of deleterious single nucleotide polymorphisms in catechol O-Methyltransferase conferring risk to post-traumatic stress disorder. J Psychiatr Res 2021; 138:207-218. [PMID: 33865170 PMCID: PMC8969201 DOI: 10.1016/j.jpsychires.2021.03.048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2021] [Revised: 03/18/2021] [Accepted: 03/24/2021] [Indexed: 10/21/2022]
Abstract
Post-traumatic stress disorder (PTSD) is one of the prevalent neurological disorders and has drawn increased attention over the past few decades. Major risk factors for PTSD can be categorized into environmental and genetic factors. Among the genetic risk factors, polymorphisms in the catechol-O-methyltransferase (COMT) gene are known to be associated with the risk for PTSD. In the present study, we analysed the impact of deleterious single nucleotide polymorphisms (SNPs) in the COMT gene conferring risk to PTSD using computational approaches followed by molecular dynamics simulations. The data on the COMT gene associated with PTSD were collected from several databases, including an Online Mendelian Inheritance in Man (OMIM) search. Datasets related to SNPs were downloaded from the dbSNP database. To study the structural and dynamic effects of COMT wild type and mutant forms, we performed molecular dynamics simulations (MD simulations) at a time scale of 300 ns. Results from screening the SNPs using the computational tools SIFT and Polyphen-2 demonstrated that the SNP rs4680 (V158M) in COMT has a deleterious effect with phenotype in PTSD. Results from the MD simulations showed that there are major fluctuations in the structural features, including root mean square deviation (RMSD), radius of gyration (Rg), root mean square fluctuation (RMSF), and secondary structural elements including α-helices, sheets and turns, between wild-type (WT) and mutant forms of COMT protein. In conclusion, our study provides novel insights into the deleterious effects and impact of the V158M mutation on COMT protein structure, which plays a key role in PTSD.
Affiliation(s)
- Kumaraswamy Naidu Chitrala
- Dept. of Pathology, Microbiology and Immunology, University of South Carolina School of Medicine, Columbia, SC, 29208, USA; Fels Cancer Institute for Personalized Medicine, Lewis Katz School of Medicine, Temple University, Philadelphia, PA, 19140, USA
- Prakash Nagarkatti
- Dept. of Pathology, Microbiology and Immunology, University of South Carolina School of Medicine, Columbia, SC, 29208, USA
- Mitzi Nagarkatti
- Dept. of Pathology, Microbiology and Immunology, University of South Carolina School of Medicine, Columbia, SC, 29208, USA
|
6
|
Vassetti D, Civalleri B, Labat F. Analytical calculation of the solvent-accessible surface area and its nuclear gradients by stereographic projection: A general approach for molecules, polymers, nanotubes, helices, and surfaces. J Comput Chem 2020; 41:1464-1479. [PMID: 32212337 DOI: 10.1002/jcc.26191] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/15/2020] [Revised: 03/07/2020] [Accepted: 03/09/2020] [Indexed: 01/19/2023]
Abstract
In this article, we explore an alternative to the analytical Gauss-Bonnet approach for computing the solvent-accessible surface area (SASA) and its nuclear gradients. These two key quantities are required to evaluate the nonelectrostatic contribution to the solvation energy and its nuclear gradients in implicit solvation models. We extend a previously proposed analytical approach for finite systems based on the stereographic projection technique to infinite periodic systems such as polymers, nanotubes, helices, or surfaces and detail its implementation in the Crystal code. We provide the full derivation of the SASA nuclear gradients, and introduce an iterative perturbation scheme of the atomic coordinates to stabilize the gradients calculation for certain difficult symmetric systems. An excellent agreement of computed SASA with reference analytical values is found for finite systems, while the SASA size-extensivity is verified for infinite periodic systems. In addition, correctness of the analytical gradients is confirmed by the excellent agreement obtained with numerical gradients and by the translational invariance achieved, both for finite and infinite periodic systems. Overall therefore, the stereographic projection approach appears as a general, simple, and efficient technique to compute the key quantities required for the calculation of the nonelectrostatic contribution to the solvation energy and its nuclear gradients in implicit solvation models applicable to both finite and infinite periodic systems.
Affiliation(s)
- Dario Vassetti
- Chimie ParisTech, PSL University, CNRS, Institute of Chemistry for Life and Health Sciences, Chemical Theory and Modelling Group, F-75005 Paris, France
- Bartolomeo Civalleri
- Department of Chemistry, NIS and INSTM Reference Centre, University of Turin, Via P. Giuria 7, I-10125 Torino, Italy
- Frédéric Labat
- Chimie ParisTech, PSL University, CNRS, Institute of Chemistry for Life and Health Sciences, Chemical Theory and Modelling Group, F-75005 Paris, France
|
7
|
Aprahamian ML, Lindert S. Utility of Covalent Labeling Mass Spectrometry Data in Protein Structure Prediction with Rosetta. J Chem Theory Comput 2019; 15:3410-3424. [PMID: 30946594 DOI: 10.1021/acs.jctc.9b00101] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 12/20/2022]
Abstract
Covalent labeling mass spectrometry experiments are growing in popularity and provide important information regarding protein structure. Information obtained from these experiments correlates with residue solvent exposure within the protein in solution. However, it is impossible to determine protein structure from covalent labeling data alone. Incorporation of sparse covalent labeling data into the protein structure prediction software Rosetta has been shown to improve protein tertiary structure prediction. Here, covalent labeling techniques were analyzed computationally to provide insight into what labeling data is needed to optimize tertiary protein structure prediction in Rosetta. We have successfully implemented a new scoring functionality that provides improved predictions. We developed two new covalent labeling based score terms that use a "cone"-based neighbor count to quantify the relative solvent exposure of each amino acid. To test our method, we used a set of 20 proteins with structures deposited in the Protein Data Bank. Decoy model sets were generated for each of these 20 proteins, and the normalized covalent labeling score versus RMSD distributions were evaluated. On the basis of these distributions, we have determined an optimal subset of residues to use when performing covalent labeling experiments in order to maximize the structure prediction capabilities of the covalent labeling data. We also investigated how much false negative and false positive data can be tolerated without meaningfully impacting protein structure prediction. Using these new covalent labeling score terms, protein models were rescored and the resulting models improved by 3.9 Å RMSD on average. New models were also generated using Rosetta's AbinitioRelax program under the guidance of covalent labeling information, and improvement in model quality was observed.
Affiliation(s)
- Melanie L Aprahamian
- Department of Chemistry and Biochemistry, Ohio State University, Columbus, Ohio 43210, United States
- Steffen Lindert
- Department of Chemistry and Biochemistry, Ohio State University, Columbus, Ohio 43210, United States
|
8
|
Aprahamian ML, Chea EE, Jones LM, Lindert S. Rosetta Protein Structure Prediction from Hydroxyl Radical Protein Footprinting Mass Spectrometry Data. Anal Chem 2018; 90:7721-7729. [PMID: 29874044 DOI: 10.1021/acs.analchem.8b01624] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Indexed: 11/28/2022]
Abstract
In recent years mass spectrometry-based covalent labeling techniques such as hydroxyl radical footprinting (HRF) have emerged as valuable structural biology techniques, yielding information on protein tertiary structure. These data, however, are not sufficient to predict protein structure unambiguously, as they provide information only on the relative solvent exposure of certain residues. Despite some recent advances, no software currently exists that can utilize covalent labeling mass spectrometry data to predict protein tertiary structure. We have developed the first such tool, which incorporates mass spectrometry derived protection factors from HRF labeling as a new centroid score term for the Rosetta scoring function to improve the prediction of protein tertiary structures. We tested our method on a set of four soluble benchmark proteins with known crystal structures and either published HRF experimental results or internally acquired data. Using the HRF labeling data, we rescored large decoy sets of structures predicted with Rosetta for each of the four benchmark proteins. As a result, the model quality improved for all benchmark proteins as compared to when scored with Rosetta alone. For two of the four proteins we were even able to identify atomic resolution models with the addition of HRF data.
Affiliation(s)
- Melanie L Aprahamian
- Department of Chemistry and Biochemistry, Ohio State University, Columbus, Ohio 43210, United States
- Emily E Chea
- Department of Pharmaceutical Sciences, University of Maryland, Baltimore, Maryland 21201, United States
- Lisa M Jones
- Department of Pharmaceutical Sciences, University of Maryland, Baltimore, Maryland 21201, United States
- Steffen Lindert
- Department of Chemistry and Biochemistry, Ohio State University, Columbus, Ohio 43210, United States
|
9
|
Setiawan D, Brender J, Zhang Y. Recent advances in automated protein design and its future challenges. Expert Opin Drug Discov 2018; 13:587-604. [PMID: 29695210 DOI: 10.1080/17460441.2018.1465922] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 01/07/2023]
Abstract
Introduction: Protein function is determined by protein structure which is in turn determined by the corresponding protein sequence. If the rules that cause a protein to adopt a particular structure are understood, it should be possible to refine or even redefine the function of a protein by working backwards from the desired structure to the sequence. Automated protein design attempts to calculate the effects of mutations computationally with the goal of more radical or complex transformations than are accessible by experimental techniques. Areas covered: The authors give a brief overview of the recent methodological advances in computer-aided protein design, showing how methodological choices affect final design and how automated protein design can be used to address problems considered beyond traditional protein engineering, including the creation of novel protein scaffolds for drug development. Also, the authors address specifically the future challenges in the development of automated protein design. Expert opinion: Automated protein design holds potential as a protein engineering technique, particularly in cases where screening by combinatorial mutagenesis is problematic. Considering solubility and immunogenicity issues, automated protein design is initially more likely to make an impact as a research tool for exploring basic biology in drug discovery than in the design of protein biologics.
Affiliation(s)
- Dani Setiawan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Jeffrey Brender
- Radiation Biology Branch, Center for Cancer Research, National Cancer Institute - NIH, Bethesda, MD, USA
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA
|
10
|
Mignon D, Panel N, Chen X, Fuentes EJ, Simonson T. Computational Design of the Tiam1 PDZ Domain and Its Ligand Binding. J Chem Theory Comput 2017; 13:2271-2289. [PMID: 28394603 DOI: 10.1021/acs.jctc.6b01255] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022]
Abstract
PDZ domains direct protein-protein interactions and serve as models for protein design. Here, we optimized a protein design energy function for the Tiam1 and Cask PDZ domains that combines a molecular mechanics energy, Generalized Born solvent, and an empirical unfolded state model. Designed sequences were recognized as PDZ domains by the Superfamily fold recognition tool and had similarity scores comparable to natural PDZ sequences. The optimized model was used to redesign the two PDZ domains, by gradually varying the chemical potential of hydrophobic amino acids; the tendency of each position to lose or gain a hydrophobic character represents a novel hydrophobicity index. We also redesigned four positions in the Tiam1 PDZ domain involved in peptide binding specificity. The calculated affinity differences between designed variants reproduced experimental data and suggest substitutions with altered specificities.
Affiliation(s)
- David Mignon
- Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
- Nicolas Panel
- Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
- Xingyu Chen
- Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
- Ernesto J Fuentes
- Department of Biochemistry, Roy J. & Lucille A. Carver College of Medicine and Holden Comprehensive Cancer Center, University of Iowa, Iowa City, Iowa 52242-1109, United States
- Thomas Simonson
- Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
|
11
|
Abstract
Computational protein sequence design is the rational design based on computer simulation of new protein molecules to fold to target three-dimensional structures, with the ultimate goal of designing novel functions. It requires a good understanding of the thermodynamic equilibrium properties of the protein of interest. Here, we consider the contribution of the solvent to the stability of the protein. We describe implicit solvent models, focusing on approximations of their nonpolar components using geometric potentials. We consider the surface area (SA) model in which the nonpolar solvation free energy is expressed as a sum of the contributions of all atoms, assumed to be proportional to their accessible surface areas (ASAs). We briefly review existing numerical and analytical approaches that compute the ASA. We describe in more detail the alpha shape theory as it provides a unifying mathematical framework that enables the analytical calculations of the surface area of a macromolecule represented as a union of balls.
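As a rough illustration of the surface area (SA) model summarized in this abstract, the nonpolar solvation free energy can be sketched as a sum over atoms of a per-type coefficient times the atom's accessible surface area. The function name, the atom-type labels, and the coefficient values below are hypothetical and chosen only for illustration; they are not taken from the cited work, and the ASAs are assumed to have been computed elsewhere.

```python
from typing import Mapping, Sequence


def nonpolar_solvation_energy(
    asas: Sequence[float],
    atom_types: Sequence[str],
    gammas: Mapping[str, float],
) -> float:
    """Return sum_i gamma(type_i) * ASA_i for a set of atoms.

    asas       -- per-atom accessible surface areas (e.g. in A^2)
    atom_types -- per-atom type labels, same length as asas
    gammas     -- per-type surface coefficients (e.g. in kcal/mol/A^2)
    """
    if len(asas) != len(atom_types):
        raise ValueError("asas and atom_types must have the same length")
    return sum(gammas[t] * a for a, t in zip(asas, atom_types))


# Hypothetical three-atom example; the middle atom is fully buried (ASA = 0).
example_asas = [10.0, 0.0, 25.0]
example_types = ["C", "C", "O"]
example_gammas = {"C": 0.012, "O": -0.006}  # illustrative values only
example_energy = nonpolar_solvation_energy(
    example_asas, example_types, example_gammas
)
```

Because the energy is linear in the per-atom areas, its derivatives with respect to atomic coordinates follow directly from the derivatives of the ASAs.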
Affiliation(s)
- Jie Li
- Computational and Systems Biology Group, Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- Patrice Koehl
- Department of Computer Science and Genome Center, University of California, Davis, CA, 95616, USA
|
12
|
Abstract
The ability of computational protein design (CPD) to identify protein sequences possessing desired characteristics in vast sequence spaces makes it a highly valuable tool in the protein engineering toolbox. CPD calculations are typically performed using a single-state design (SSD) approach in which amino-acid sequences are optimized on a single protein structure. Although SSD has been successfully applied to the design of numerous protein functions and folds, the approach can lead to the incorrect rejection of desirable sequences because of the combined use of a fixed protein backbone template and a set of rigid rotamers. This fixed backbone approximation can be addressed by using multistate design (MSD) with backbone ensembles. MSD improves the quality of predicted sequences by using ensembles approximating conformational flexibility as input templates instead of a single fixed protein structure. In this chapter, we present a step-by-step guide to the implementation and analysis of MSD calculations with backbone ensembles. Specifically, we describe ensemble generation with the PertMin protocol, execution of MSD calculations for recapitulation of Streptococcal protein G domain β1 mutant stability, and analysis of computational predictions by sequence binning. Furthermore, we provide a comparison between MSD and SSD calculation results and discuss the benefits of multistate approaches to CPD.
|
13
|
Kumar A, Ranbhor R, Patel K, Ramakrishnan V, Durani S. Automated protein design: Landmarks and operational principles. PROGRESS IN BIOPHYSICS AND MOLECULAR BIOLOGY 2016; 125:24-35. [PMID: 27979438 DOI: 10.1016/j.pbiomolbio.2016.12.002] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 09/01/2016] [Accepted: 12/06/2016] [Indexed: 11/25/2022]
Abstract
Protein design has an eventful history spanning over three decades, with a handful of success stories reported, and numerous failures not reported. Design practices have benefited tremendously from improvements in computer hardware and advances in scientific algorithms. Though the protein folding problem remains unsolved, the possibility of having multiple sequence solutions for a single fold makes protein design a more tractable problem than protein folding. One of the most significant advancements in this area is the implementation of automated design algorithms on pre-defined templates or completely new folds, optimized through deterministic and heuristic search algorithms. This progress report provides a succinct presentation of important landmarks in automated design attempts, followed by a brief account of operational principles in automated design methods.
Affiliation(s)
- Anil Kumar
- Department of Chemistry, University of Toronto, ON, M5S3H6, Canada
- Vibin Ramakrishnan
- Department of Biosciences and Bioengineering, Indian Institute of Technology Guwahati, 781039, India
- Susheel Durani
- Department of Chemistry, Indian Institute of Technology, Bombay, 400076, India
|
14
|
Druart K, Bigot J, Audit E, Simonson T. A Hybrid Monte Carlo Scheme for Multibackbone Protein Design. J Chem Theory Comput 2016; 12:6035-6048. [DOI: 10.1021/acs.jctc.6b00421] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 11/30/2022]
Affiliation(s)
- Karen Druart
- Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
- Maison de la Simulation, CEA, CNRS, Univ. Paris-Sud, UVSQ, Université Paris-Saclay, 91191 Gif-sur-Yvette, France
- Julien Bigot
- Maison de la Simulation, CEA, CNRS, Univ. Paris-Sud, UVSQ, Université Paris-Saclay, 91191 Gif-sur-Yvette, France
- Edouard Audit
- Maison de la Simulation, CEA, CNRS, Univ. Paris-Sud, UVSQ, Université Paris-Saclay, 91191 Gif-sur-Yvette, France
- Thomas Simonson
- Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, France
|
15
|
Mignon D, Simonson T. Comparing three stochastic search algorithms for computational protein design: Monte Carlo, replica exchange Monte Carlo, and a multistart, steepest-descent heuristic. J Comput Chem 2016; 37:1781-93. [PMID: 27197555 DOI: 10.1002/jcc.24393] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 12/03/2015] [Revised: 02/26/2016] [Accepted: 03/27/2016] [Indexed: 01/11/2023]
Abstract
Computational protein design depends on an energy function and an algorithm to search the sequence/conformation space. We compare three stochastic search algorithms: a heuristic, Monte Carlo (MC), and a Replica Exchange Monte Carlo method (REMC). The heuristic performs a steepest-descent minimization starting from thousands of random starting points. The methods are applied to nine test proteins from three structural families, with a fixed backbone structure, a molecular mechanics energy function, and with 1, 5, 10, 20, 30, or all amino acids allowed to mutate. Results are compared to an exact, "Cost Function Network" method that identifies the global minimum energy conformation (GMEC) in favorable cases. The designed sequences accurately reproduce experimental sequences in the hydrophobic core. The heuristic and REMC agree closely and reproduce the GMEC when it is known, with a few exceptions. Plain MC performs well for most cases, occasionally departing from the GMEC by 3-4 kcal/mol. With REMC, the diversity of the sequences sampled agrees with exact enumeration where the latter is possible: up to 2 kcal/mol above the GMEC. Beyond, room temperature replicas sample sequences up to 10 kcal/mol above the GMEC, providing thermal averages and a solution to the inverse protein folding problem. © 2016 Wiley Periodicals, Inc.
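The plain Monte Carlo search mentioned in this abstract can be sketched, under strong simplifying assumptions, as a Metropolis walk over discrete per-position states. Everything below is a toy: the function names, the energy function, and the parameter values are hypothetical and are not the authors' implementation or energy model. The value `kT = 0.6` is roughly room temperature in kcal/mol.

```python
import math
import random
from typing import Callable, List, Tuple


def metropolis_accept(delta_e: float, kT: float, rng: random.Random) -> bool:
    """Metropolis criterion: always accept downhill moves, and accept
    uphill moves with probability exp(-delta_e / kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kT)


def monte_carlo_search(
    energy: Callable[[List[int]], float],
    n_positions: int,
    n_states: int,
    n_steps: int,
    kT: float,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Single-position Metropolis moves over discrete states.

    Returns the lowest-energy configuration visited and its energy.
    """
    rng = random.Random(seed)
    state = [rng.randrange(n_states) for _ in range(n_positions)]
    current_e = energy(state)
    best, best_e = list(state), current_e
    for _ in range(n_steps):
        pos = rng.randrange(n_positions)
        old = state[pos]
        state[pos] = rng.randrange(n_states)  # propose a new state at one position
        new_e = energy(state)
        if metropolis_accept(new_e - current_e, kT, rng):
            current_e = new_e
            if current_e < best_e:
                best, best_e = list(state), current_e
        else:
            state[pos] = old  # reject: restore the previous state
    return best, best_e


def toy_energy(state: List[int]) -> float:
    """Hypothetical energy: one unit of penalty per position not in state 0."""
    return float(sum(1 for s in state if s != 0))


best_state, best_energy = monte_carlo_search(
    toy_energy, n_positions=5, n_states=3, n_steps=2000, kT=0.6, seed=1
)
```

A replica exchange variant would run several such walks at different `kT` values and periodically attempt to swap configurations between neighboring temperatures; that step is omitted here.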
Affiliation(s)
- David Mignon
- Laboratoire De Biochimie (UMR CNRS 7654), Department Of Biology, Ecole Polytechnique, Palaiseau, France
- Thomas Simonson
- Laboratoire De Biochimie (UMR CNRS 7654), Department Of Biology, Ecole Polytechnique, Palaiseau, France
|
16
|
Stanton CL, Houk KN. Benchmarking pKa Prediction Methods for Residues in Proteins. J Chem Theory Comput 2015; 4:951-66. [PMID: 26621236 DOI: 10.1021/ct8000014] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.8] [Indexed: 11/30/2022]
Abstract
Methods for estimation of pKa values of residues in proteins were tested on a set of benchmark proteins with experimentally known pKa values. The benchmark set includes 80 different residues (20 each for Asp, Glu, Lys, and His), half of which consists of significantly variant cases (ΔpKa ≥ 1 pKa unit from the amino acid in solution). The method introduced by Case and co-workers [J. Am. Chem. Soc. 2004, 126, 4167-4180], referred to as the molecular dynamics/generalized-Born/thermodynamic integration (MD/GB/TI) technique, gives a root-mean-square deviation (rmsd) of 1.4 pKa units on the benchmark set. The use of explicit waters in the immediate region surrounding the residue was shown to generally reduce high errors for this method. Longer simulation time was also shown to increase the accuracy of this method. The empirical approach developed by Jensen and co-workers [Proteins 2005, 61, 704-721], PROPKA, also gives an overall rmsd of 1.4 pKa units and is more or less accurate based on residue type: the method does very well for Lys and Glu, but less so for Asp and His. Likewise, the absolute deviation is quite similar for the two methods: 5.2 for PROPKA and 5.1 for MD/GB/TI. A comparison of these results with several prediction methods from the literature is presented. The error in pKa prediction is analyzed as a function of variation of the pKa from that in water and the solvent accessible surface area (SASA) of the residue. A case study of the catalytic lysine residue in 2-deoxyribose-5-phosphate aldolase (DERA) is also presented.
Affiliation(s)
- Courtney L Stanton
- Department of Chemistry and Biochemistry, University of California Los Angeles, 607 Charles E. Young Drive East, Los Angeles, California 90095
| | - Kendall N Houk
- Department of Chemistry and Biochemistry, University of California Los Angeles, 607 Charles E. Young Drive East, Los Angeles, California 90095
|
17
|
Prediction of Stable Globular Proteins Using Negative Design with Non-native Backbone Ensembles. Structure 2015; 23:2011-21. [DOI: 10.1016/j.str.2015.07.021] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 05/23/2015] [Revised: 07/26/2015] [Accepted: 07/29/2015] [Indexed: 11/21/2022]
|
18
|
Druart K, Palmai Z, Omarjee E, Simonson T. Protein:Ligand binding free energies: A stringent test for computational protein design. J Comput Chem 2015; 37:404-15. [PMID: 26503829 DOI: 10.1002/jcc.24230] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 07/01/2015] [Revised: 10/01/2015] [Accepted: 10/02/2015] [Indexed: 01/29/2023]
Abstract
A computational protein design method is extended to allow Monte Carlo simulations where two ligands are titrated into a protein binding pocket, yielding binding free energy differences. These provide a stringent test of the physical model, including the energy surface and sidechain rotamer definition. As a test, we consider tyrosyl-tRNA synthetase (TyrRS), which has been extensively redesigned experimentally. We consider its specificity for its substrate l-tyrosine (l-Tyr), compared to the analogs d-Tyr, p-acetyl-, and p-azido-phenylalanine (ac-Phe, az-Phe). We simulate l- and d-Tyr binding to TyrRS and six mutants, and compare the structures and binding free energies to a more rigorous "MD/GBSA" procedure: molecular dynamics with explicit solvent for structures and a Generalized Born + Surface Area model for binding free energies. Next, we consider l-Tyr, ac- and az-Phe binding to six other TyrRS variants. The titration results are sensitive to the precise rotamer definition, which involves a short energy minimization for each sidechain pair to help relax bad contacts induced by the discrete rotamer set. However, when designed mutant structures are rescored with a standard GBSA energy model, results agree well with the more rigorous MD/GBSA. As a third test, we redesign three amino acid positions in the substrate coordination sphere, with either l-Tyr or d-Tyr as the ligand. For two, we obtain good agreement with experiment, recovering the wildtype residue when l-Tyr is the ligand and a d-Tyr specific mutant when d-Tyr is the ligand. For the third, we recover His with either ligand, instead of wildtype Gln.
Affiliation(s)
- Karen Druart
- Laboratoire De Biochimie (UMR CNRS 7654), Department of Biology, Ecole Polytechnique, Palaiseau, France
| | - Zoltan Palmai
- Laboratoire De Biochimie (UMR CNRS 7654), Department of Biology, Ecole Polytechnique, Palaiseau, France
| | - Eyaz Omarjee
- Laboratoire De Biochimie (UMR CNRS 7654), Department of Biology, Ecole Polytechnique, Palaiseau, France
| | - Thomas Simonson
- Laboratoire De Biochimie (UMR CNRS 7654), Department of Biology, Ecole Polytechnique, Palaiseau, France
|
19
|
Davey JA, Chica RA. Optimization of rotamers prior to template minimization improves stability predictions made by computational protein design. Protein Sci 2015; 24:545-60. [PMID: 25492709 DOI: 10.1002/pro.2618] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 10/01/2014] [Accepted: 12/04/2014] [Indexed: 11/07/2022]
Abstract
Computational protein design (CPD) predictions are highly dependent on the structure of the input template used. However, it is unclear how small differences in template geometry translate to large differences in stability prediction accuracy. Herein, we explored how structural changes to the input template affect the outcome of stability predictions by CPD. To do this, we prepared alternate templates by Rotamer Optimization followed by energy Minimization (ROM) and used them to recapitulate the stability of 84 protein G domain β1 mutant sequences. In the ROM process, side-chain rotamers for wild-type (WT) or mutant sequences are optimized on crystal or nuclear magnetic resonance (NMR) structures prior to template minimization, resulting in alternate structures termed ROM templates. We show that use of ROM templates prepared from sequences known to be stable results predominantly in improved prediction accuracy compared to using the minimized crystal or NMR structures. Conversely, ROM templates prepared from sequences that are less stable than the WT reduce prediction accuracy by increasing the number of false positives. These observed changes in prediction outcomes are attributed to differences in side-chain contacts made by rotamers in ROM templates. Finally, we show that ROM templates prepared from sequences that are unfolded or that adopt a nonnative fold result in the selective enrichment of sequences that are also unfolded or that adopt a nonnative fold, respectively. Our results demonstrate the existence of a rotamer bias caused by the input template that can be harnessed to skew predictions toward sequences displaying desired characteristics.
Affiliation(s)
- James A Davey
- Department of Chemistry, University of Ottawa, Ottawa, Ontario, Canada, K1N 6N5
|
20
|
Lanouette S, Davey JA, Elisma F, Ning Z, Figeys D, Chica RA, Couture JF. Discovery of substrates for a SET domain lysine methyltransferase predicted by multistate computational protein design. Structure 2014; 23:206-215. [PMID: 25533488 DOI: 10.1016/j.str.2014.11.004] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 08/29/2014] [Revised: 11/04/2014] [Accepted: 11/05/2014] [Indexed: 01/01/2023]
Abstract
Characterization of lysine methylation has proven challenging despite its importance in biological processes such as gene transcription, protein turnover, and cytoskeletal organization. In contrast to other key posttranslational modifications, current proteomics techniques have thus far shown limited success at characterizing methyl-lysine residues across the cellular landscape. To complement current biochemical characterization methods, we developed a multistate computational protein design procedure to probe the substrate specificity of the protein lysine methyltransferase SMYD2. Modeling of substrate-bound SMYD2 identified residues important for substrate recognition and predicted amino acids necessary for methylation. Peptide- and protein-based substrate libraries confirmed that SMYD2 activity is dictated by the motif [LFM]-1-K(∗)-[AFYMSHRK]+1-[LYK]+2 around the target lysine K(∗). Comprehensive motif-based searches and mutational analysis further established four additional substrates of SMYD2. Our methodology paves the way to systematically predict and validate posttranslational modification sites while simultaneously pairing them with their associated enzymes.
Affiliation(s)
- Sylvain Lanouette
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Road, Ottawa, ON K1H 8M5, Canada
| | - James A Davey
- Department of Chemistry, University of Ottawa, Ottawa, ON, K1N 6N5, Canada; Centre for Catalysis Research and Innovation, University of Ottawa, Ottawa, ON, K1N 6N5, Canada
| | - Fred Elisma
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Road, Ottawa, ON K1H 8M5, Canada
| | - Zhibin Ning
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Road, Ottawa, ON K1H 8M5, Canada
| | - Daniel Figeys
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Road, Ottawa, ON K1H 8M5, Canada; Department of Chemistry, University of Ottawa, Ottawa, ON, K1N 6N5, Canada
| | - Roberto A Chica
- Department of Chemistry, University of Ottawa, Ottawa, ON, K1N 6N5, Canada; Centre for Catalysis Research and Innovation, University of Ottawa, Ottawa, ON, K1N 6N5, Canada.
| | - Jean-François Couture
- Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Road, Ottawa, ON K1H 8M5, Canada; Centre for Catalysis Research and Innovation, University of Ottawa, Ottawa, ON, K1N 6N5, Canada.
|
21
|
Gaillard T, Simonson T. Pairwise decomposition of an MMGBSA energy function for computational protein design. J Comput Chem 2014; 35:1371-87. [PMID: 24854675 DOI: 10.1002/jcc.23637] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 01/16/2014] [Revised: 04/14/2014] [Accepted: 05/01/2014] [Indexed: 02/02/2023]
Abstract
Computational protein design (CPD) aims at predicting new proteins or modifying existing ones. The computational challenge is huge as it requires exploring an enormous sequence and conformation space. The difficulty can be reduced by considering a fixed backbone and a discrete set of sidechain conformations. Another common strategy consists in precalculating a pairwise energy matrix, from which the energy of any sequence/conformation can be quickly obtained. In this work, we examine the pairwise decomposition of protein MMGBSA energy functions from a general theoretical perspective, and an implementation proposed earlier for CPD. It includes a Generalized Born term, whose many-body character is overcome using an effective dielectric environment, and a Surface Area term, for which we present an improved pairwise decomposition. A detailed evaluation of the error introduced by the decomposition on the different energy components is performed. We show that the error remains reasonable, compared to other uncertainties.
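The pairwise-matrix strategy mentioned in this abstract can be illustrated with a minimal sketch. It assumes a fixed backbone, one discrete rotamer per position, and precomputed lookup tables; the names `self_energy`, `pair_energy`, and `total_energy` and all numerical values are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def total_energy(rotamers, self_energy, pair_energy):
    """Energy of one rotamer assignment from precomputed tables.

    rotamers[i] is the rotamer index chosen at position i.
    self_energy[i][r] is the one-body term of rotamer r at position i.
    pair_energy[(i, j)][r][s] (with i < j) is the two-body term.
    All tables are hypothetical toy data.
    """
    # One-body terms: sidechain with backbone and with itself.
    energy = sum(self_energy[i][r] for i, r in enumerate(rotamers))
    # Two-body terms: each unordered pair of positions is counted once.
    for i, j in combinations(range(len(rotamers)), 2):
        energy += pair_energy[(i, j)][rotamers[i]][rotamers[j]]
    return energy


# Toy system: three positions, two rotamers each (arbitrary units).
self_energy = [[0.0, 1.0], [0.5, -0.5], [0.0, 0.25]]
pair_energy = {
    (0, 1): [[0.0, -1.0], [2.0, 0.0]],
    (0, 2): [[0.5, 0.0], [0.0, 0.0]],
    (1, 2): [[0.0, 0.0], [-0.25, 1.0]],
}

example = total_energy([0, 1, 0], self_energy, pair_energy)
```

Once the tables are filled, scoring any assignment costs only table lookups, which is why many-body terms such as Generalized Born need a pairwise approximation before they fit this scheme.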
Affiliation(s)
- Thomas Gaillard
- Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, 91128, Palaiseau, France
|
22
|
Li J, Mach P, Koehl P. Measuring the shapes of macromolecules - and why it matters. Comput Struct Biotechnol J 2013; 8:e201309001. [PMID: 24688748 PMCID: PMC3962087 DOI: 10.5936/csbj.201309001] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 08/26/2013] [Revised: 11/22/2013] [Accepted: 11/22/2013] [Indexed: 11/22/2022] Open
Abstract
The molecular basis of life rests on the activity of biological macromolecules, mostly nucleic acids and proteins. A perhaps surprising finding that crystallized over the last handful of decades is that geometric reasoning plays a major role in our attempt to understand these activities. In this paper, we address this connection between geometry and biology, focusing on methods for measuring and characterizing the shapes of macromolecules. We briefly review existing numerical and analytical approaches that solve these problems. We cover in more detail our own work in this field, focusing on the alpha shape theory as it provides a unifying mathematical framework that enables the analytical calculations of the surface area and volume of a macromolecule represented as a union of balls, the detection of pockets and cavities in the molecule, and the quantification of contacts between the atomic balls. We have shown that each of these quantities can be related to physical properties of the molecule under study and ultimately provides insight on its activity. We conclude with a brief description of new challenges for the alpha shape theory in modern structural biology.
Affiliation(s)
- Jie Li
- Genome Center, University of California, Davis, 451 Health Sciences Drive, Davis, CA 95616, United States
| | - Paul Mach
- Graduate Group of Applied Mathematics, University of California, Davis, 1, Shields Ave, Davis, CA, 95616, United States
| | - Patrice Koehl
- Department of Computer Science and Genome Center, University of California, Davis, 1, Shields Ave, Davis, CA, 95616, United States
|
23
|
Davey JA, Chica RA. Improving the accuracy of protein stability predictions with multistate design using a variety of backbone ensembles. Proteins 2013; 82:771-84. [DOI: 10.1002/prot.24457] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.8] [Received: 08/06/2013] [Revised: 10/07/2013] [Accepted: 10/21/2013] [Indexed: 11/11/2022]
Affiliation(s)
- James A. Davey
- Department of Chemistry, University of Ottawa, Ottawa, Ontario, Canada, K1N 6N5
| | - Roberto A. Chica
- Department of Chemistry, University of Ottawa, Ottawa, Ontario, Canada, K1N 6N5
|
24
|
Polydorides S, Simonson T. Monte Carlo simulations of proteins at constant pH with generalized Born solvent, flexible sidechains, and an effective dielectric boundary. J Comput Chem 2013; 34:2742-56. [PMID: 24122878 DOI: 10.1002/jcc.23450] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 05/14/2013] [Revised: 09/04/2013] [Accepted: 09/08/2013] [Indexed: 12/11/2022]
Abstract
Titratable residues determine the acid/base behavior of proteins, strongly influencing their function; in addition, proton binding is a valuable reporter on electrostatic interactions. We describe a method for pKa calculations, using constant-pH Monte Carlo (MC) simulations to explore the space of sidechain conformations and protonation states, with an efficient and accurate generalized Born model (GB) for the solvent effects. To overcome the many-body dependency of the GB model, we use a "Native Environment" approximation, whose accuracy is shown to be good. It allows the precalculation and storage of interactions between all sidechain pairs, a strategy borrowed from computational protein design, which makes the MC simulations themselves very fast. The method is tested for 12 proteins and 167 titratable sidechains. It gives an rms error of 1.1 pH units, similar to the trivial "Null" model. The only adjustable parameter is the protein dielectric constant. The best accuracy is achieved for values between 4 and 8, a range that is physically plausible for a protein interior. For sidechains with large pKa shifts, ≥2, the rms error is 1.6, compared to 2.5 with the Null model and 1.5 with the empirical PROPKA method.
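As a reading aid for the quantities quoted in this abstract, the sketch below shows how an rms error over a set of sites is computed, together with the standard Henderson-Hasselbalch relation that links a site's pKa to its protonated fraction at a given pH. The function names and sample numbers are hypothetical; nothing here reproduces the paper's simulation protocol or data.

```python
import math


def rms_error(predicted, measured):
    """Root-mean-square deviation between predicted and measured pKa values."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


def protonated_fraction(ph, pka):
    """Henderson-Hasselbalch fraction of a single, non-interacting site
    that carries its proton at the given pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Invented toy values, for illustration only.
toy_predicted = [4.0, 4.0]
toy_measured = [3.0, 5.0]
toy_rms = rms_error(toy_predicted, toy_measured)
half_point = protonated_fraction(6.5, 6.5)
```

In a constant-pH simulation the protonated fraction is sampled at several pH values, and the pKa is read off as the pH where that fraction crosses one half.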
Affiliation(s)
- Savvas Polydorides
- Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, 91128, Palaiseau, France
|
25
|
Huang YM, Bystroff C. Expanded explorations into the optimization of an energy function for protein design. IEEE/ACM Trans Comput Biol Bioinform 2013; 10:1176-1187. [PMID: 24384706 PMCID: PMC3919130 DOI: 10.1109/tcbb.2013.113] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 06/03/2023]
Abstract
Nature possesses a secret formula for the energy as a function of the structure of a protein. In protein design, approximations are made to both the structural representation of the molecule and to the form of the energy equation, such that the existence of a general energy function for proteins is by no means guaranteed. Here, we present new insights toward the application of machine learning to the problem of finding a general energy function for protein design. Machine learning requires the definition of an objective function, which carries with it the implied definition of success in protein design. We explored four functions, consisting of two functional forms, each with two criteria for success. Optimization was carried out by a Monte Carlo search through the space of all variable parameters. Cross-validation of the optimized energy function against a test set gave significantly different results depending on the choice of objective function, pointing to relative correctness of the built-in assumptions. Novel energy cross terms correct for the observed nonadditivity of energy terms and an imbalance in the distribution of predicted amino acids. This paper expands on the work presented at the 2012 ACM-BCB.
|
26
|
Simonson T, Gaillard T, Mignon D, Schmidt am Busch M, Lopes A, Amara N, Polydorides S, Sedano A, Druart K, Archontis G. Computational protein design: the Proteus software and selected applications. J Comput Chem 2013; 34:2472-84. [PMID: 24037756 DOI: 10.1002/jcc.23418] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Received: 05/13/2013] [Revised: 07/08/2013] [Accepted: 07/28/2013] [Indexed: 12/13/2022]
Abstract
We describe an automated procedure for protein design, implemented in a flexible software package, called Proteus. System setup and calculation of an energy matrix are done with the XPLOR modeling program and its sophisticated command language, supporting several force fields and solvent models. A second program provides algorithms to search sequence space. It allows a decomposition of the system into groups, which can be combined in different ways in the energy function, for both positive and negative design. The whole procedure can be controlled by editing 2-4 scripts. Two applications consider the tyrosyl-tRNA synthetase enzyme and its successful redesign to bind both O-methyl-tyrosine and D-tyrosine. For the latter, we present Monte Carlo simulations where the D-tyrosine concentration is gradually increased, displacing L-tyrosine from the binding pocket and yielding the binding free energy difference, in good agreement with experiment. Complete redesign of the Crk SH3 domain is presented. The top 10000 sequences are all assigned to the correct fold by the SUPERFAMILY library of Hidden Markov Models. Finally, we report the acid/base behavior of the SNase protein. Sidechain protonation is treated as a form of mutation; it is then straightforward to perform constant-pH Monte Carlo simulations, which yield good agreement with experiment. Overall, the software can be used for a wide range of applications, producing not only native-like sequences but also thermodynamic properties with errors that appear comparable to other current software packages.
Affiliation(s)
- Thomas Simonson
- Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, Palaiseau, 91128, France
|
27
|
Ding B, Li N, Wang W. Characterizing Binding of Small Molecules. II. Evaluating the Potency of Small Molecules to Combat Resistance Based on Docking Structures. J Chem Inf Model 2013; 53:1213-22. [DOI: 10.1021/ci400011c] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 01/02/2023]
Affiliation(s)
- Bo Ding
- Department of Chemistry and Biochemistry, ‡Department of Cellular and Molecular Medicine, UCSD, La Jolla, California 92093-0359, United States
| | - Nan Li
- Department of Chemistry and Biochemistry, ‡Department of Cellular and Molecular Medicine, UCSD, La Jolla, California 92093-0359, United States
| | - Wei Wang
- Department of Chemistry and Biochemistry, ‡Department of Cellular and Molecular Medicine, UCSD, La Jolla, California 92093-0359, United States
|
28
|
Sivey JD, Howell SC, Bean DJ, McCurry DL, Mitch WA, Wilson CJ. Role of lysine during protein modification by HOCl and HOBr: halogen-transfer agent or sacrificial antioxidant? Biochemistry 2013; 52:1260-71. [PMID: 23327477 DOI: 10.1021/bi301523s] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Indexed: 12/21/2022]
Abstract
Although protein degradation by neutrophil-derived hypochlorous acid (HOCl) and eosinophil-derived hypobromous acid (HOBr) can contribute to the inactivation of pathogens, collateral damage to host proteins can also occur and has been associated with inflammatory diseases ranging from arthritis to atherosclerosis. Though previous research suggested halotyrosines as biomarkers of protein damage and lysine as a mediator of the transfer of a halogen to tyrosine, these reactions within whole proteins are poorly understood. Herein, reactions of HOCl and HOBr with three well-characterized proteins [adenylate kinase (ADK), ribose binding protein, and bovine serum albumin] were characterized. Three assessments of oxidative modifications were evaluated for each of the proteins: (1) covalent modification of electron-rich amino acids (assessed via liquid chromatography and tandem mass spectrometry), (2) attenuation of secondary structure (via circular dichroism), and (3) fragmentation of protein backbones (via sodium dodecyl sulfate-polyacrylamide gel electrophoresis). In addition to forming halotyrosines, HOCl and HOBr converted lysine into lysine nitrile (2-amino-5-cyanopentanoic acid), a relatively stable and largely overlooked product, in yields of up to 80%. At uniform oxidant levels, fragmentation and loss of secondary structure correlated with protein size. To further examine the role of lysine, a lysine-free ADK variant was rationally designed. The absence of lysine increased yields of chlorinated tyrosines and decreased yields of brominated tyrosines following treatments with HOCl and HOBr, respectively, without influencing the susceptibility of ADK to HOX-mediated losses of secondary structure. These findings suggest that lysine serves predominantly as a sacrificial antioxidant (via formation of lysine nitrile) toward HOCl and as a halogen-transfer mediator [via reactions involving ε-N-(di)haloamines] with HOBr.
Affiliation(s)
- John D Sivey
- Department of Chemical and Environmental Engineering, Yale University, New Haven, CT 06520-8286, USA
|
29
|
Ding B, Wang J, Li N, Wang W. Characterization of small molecule binding. I. Accurate identification of strong inhibitors in virtual screening. J Chem Inf Model 2013; 53:114-22. [PMID: 23259763 DOI: 10.1021/ci300508m] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Indexed: 12/20/2022]
Abstract
Accurately ranking docking poses remains a great challenge in computer-aided drug design. In this study, we present an integrated approach called MIEC-SVM that combines structure modeling and statistical learning to characterize protein-ligand binding based on the complex structure generated from docking. Using the HIV-1 protease as a model system, we showed that MIEC-SVM can successfully rank the docking poses and consistently outperformed the state-of-the-art scoring functions when the true positives only account for 1% or 0.5% of all the compounds under consideration. More excitingly, we found that MIEC-SVM can achieve a significant enrichment in virtual screening even when trained on a set of known inhibitors as small as 50, especially when enhanced by a model average approach. Given these features of MIEC-SVM, we believe it provides a powerful tool for searching for and designing new drugs.
Affiliation(s)
- Bo Ding
- Department of Chemistry and Biochemistry, UCSD, La Jolla, California 92093-0359, USA
|
30
|
Huang X, Yang J, Zhu Y. A solvated ligand rotamer approach and its application in computational protein design. J Mol Model 2012. [PMID: 23192355 DOI: 10.1007/s00894-012-1695-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 10/27/2022]
Abstract
The structure-based design of protein-ligand interfaces with respect to different small molecules is of great significance in the discovery of functional proteins. By statistical analysis of a set of protein-ligand complex structures, it was determined that water-mediated hydrogen bonding at the protein-ligand interface plays a crucial role in governing the binding between the protein and the ligand. Based on the novel statistical results, a solvated ligand rotamer approach was developed to explicitly describe the key water molecules at the protein-ligand interface and a water-mediated hydrogen bonding model was applied in the computational protein design context to complement the continuum solvent model. The solvated ligand rotamer approach produces only one additional solvated rotamer for each rotamer in the ligand rotamer library and does not change the number of side-chain rotamers at each protein design site. This has greatly reduced the total combinatorial number in sequence selection for protein design, and the accuracy of the model was confirmed by two tests. For the water placement test, 61% of the crystal water molecules were predicted correctly in five protein-ligand complex structures. For the sequence recapitulation test, 44.7% of the amino acid identities were recovered using the solvated ligand rotamer approach and the water-mediated hydrogen bonding model, while only 30.4% were recovered when the explicitly bound waters were removed. These results indicated that the developed solvated ligand rotamer approach is promising for functional protein design targeting novel protein-ligand interactions.
Affiliation(s)
- Xiaoqiang Huang
- Department of Chemical Engineering, Tsinghua University, Beijing 100084, China
|
31
|
An analytical method for computing atomic contact areas in biomolecules. J Comput Chem 2012; 34:105-20. [DOI: 10.1002/jcc.23111] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 06/20/2012] [Revised: 08/07/2012] [Indexed: 11/07/2022]
|
32
|
Designing electrostatic interactions in biological systems via charge optimization or combinatorial approaches: insights and challenges with a continuum electrostatic framework. Theor Chem Acc 2012. [DOI: 10.1007/s00214-012-1252-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/13/2022]
|
33
|
Fernández A. Epistructural tension promotes protein associations. Phys Rev Lett 2012; 108:188102. [PMID: 22681121 DOI: 10.1103/physrevlett.108.188102] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 01/30/2012] [Indexed: 06/01/2023]
Abstract
Epistructural tension is the reversible work per unit area required to span the aqueous interface of a soluble protein structure. The parameter accounts for the free-energy cost of imperfect hydration, involving water molecules with a shortage of hydrogen-bonding partnerships relative to bulk levels. The binding hot spots along protein-protein interfaces are identified with residues that contribute significantly to the epistructural tension in the free subunits. Upon association, such residues either displace or become deprived of low-coordination vicinal water molecules.
Affiliation(s)
- Ariel Fernández
- Instituto Argentino de Matemática, CONICET (National Research Council), Saavedra 15, 1083 Buenos Aires, Argentina.
|
34
|
Fernández A. Communication: Epistructural thermodynamics of soluble proteins. J Chem Phys 2012; 136:091101. [DOI: 10.1063/1.3691890] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022] Open
|
35
|
Koehl P, Orland H, Delarue M. Adapting Poisson-Boltzmann to the self-consistent mean field theory: application to protein side-chain modeling. J Chem Phys 2011; 135:055104. [PMID: 21823735 DOI: 10.1063/1.3621831] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022] Open
Abstract
We present an extension of the self-consistent mean field theory for protein side-chain modeling in which solvation effects are included based on the Poisson-Boltzmann (PB) theory. In this approach, the protein is represented with multiple copies of its side chains. Each copy is assigned a weight that is refined iteratively based on the mean field energy generated by the rest of the protein, until self-consistency is reached. At each cycle, the variational free energy of the multi-copy system is computed; this free energy includes the internal energy of the protein that accounts for vdW and electrostatic interactions and a solvation free energy term that is computed using the PB equation. The method converges in only a few cycles and takes only minutes of central processing unit time on a commodity personal computer. The predicted conformation of each residue is then set to be its copy with the highest weight after convergence. We have tested this method on a database of one hundred highly refined NMR structures to circumvent the problems of crystal packing inherent to x-ray structures. The use of the PB-derived solvation free energy significantly improves prediction accuracy for surface side chains. For example, the prediction accuracies for χ(1) for surface cysteine, serine, and threonine residues improve from 68%, 35%, and 43% to 80%, 53%, and 57%, respectively. A comparison with other side-chain prediction algorithms demonstrates that our approach is consistently better in predicting the conformations of exposed side chains.
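The iterative weight refinement described in this abstract can be sketched generically as follows. This is a plain mean-field update over precomputed self and pair energies and leaves out the Poisson-Boltzmann solvation term that is the paper's actual contribution. The function name `scmf_weights`, the temperature factor `kt`, and the toy energies are all assumptions made for illustration.

```python
import math


def scmf_weights(self_e, pair_e, kt=0.6, cycles=50, tol=1e-6):
    """Generic self-consistent mean-field refinement of rotamer weights.

    self_e[i][r]       : hypothetical one-body energy of copy r at residue i
    pair_e[i][j][r][s] : hypothetical pair energy between copy r of residue i
                         and copy s of residue j (entries with i == j unused)
    Returns one normalized weight list per residue.
    """
    n = len(self_e)
    # Start from uniform weights over the copies of each side chain.
    weights = [[1.0 / len(e)] * len(e) for e in self_e]
    for _ in range(cycles):
        updated = []
        for i in range(n):
            mean_field = []
            for r in range(len(self_e[i])):
                energy = self_e[i][r]
                # Average interaction with every other residue's copies.
                for j in range(n):
                    if j != i:
                        energy += sum(w * pair_e[i][j][r][s]
                                      for s, w in enumerate(weights[j]))
                mean_field.append(energy)
            lowest = min(mean_field)  # shift for numerical stability
            boltz = [math.exp(-(e - lowest) / kt) for e in mean_field]
            norm = sum(boltz)
            updated.append([b / norm for b in boltz])
        change = max(abs(a - b)
                     for old, new in zip(weights, updated)
                     for a, b in zip(old, new))
        weights = updated
        if change < tol:  # self-consistency reached
            break
    return weights


# Toy system: two residues, two copies each; matching copies attract.
toy_self = [[0.0, 2.0], [0.0, 0.0]]
coupling = [[-1.0, 1.0], [1.0, -1.0]]
toy_pair = [[None, coupling], [coupling, None]]

toy_weights = scmf_weights(toy_self, toy_pair)
predicted = [w.index(max(w)) for w in toy_weights]
```

The last line mirrors the abstract's final step: each residue is assigned the copy with the highest converged weight.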
Affiliation(s)
- Patrice Koehl
- Department of Biological Sciences, National University of Singapore, Singapore.
|
36
|
Mach P, Koehl P. Geometric measures of large biomolecules: surface, volume, and pockets. J Comput Chem 2011; 32:3023-38. [PMID: 21823134 PMCID: PMC3188685 DOI: 10.1002/jcc.21884] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 05/09/2011] [Accepted: 06/19/2011] [Indexed: 11/09/2022]
Abstract
Geometry plays a major role in our attempts to understand the activity of large molecules. For example, surface area and volume are used to quantify the interactions between these molecules and the water surrounding them in implicit solvent models. In addition, the detection of pockets serves as a starting point for predictive studies of biomolecule-ligand interactions. The alpha shape theory provides an exact and robust method for computing these geometric measures. Several implementations of this theory are currently available. We show however that these implementations fail on very large macromolecular systems. We show that these difficulties are not theoretical; rather, they are related to the architecture of current computers that rely on the use of cache memory to speed up calculation. By rewriting the algorithms that implement the different steps of the alpha shape theory such that we enforce locality, we show that we can remediate these cache problems; the corresponding code, UnionBall, has an apparent O(n) behavior over a large range of values of n (up to tens of millions), where n is the number of atoms. As an example, it takes 136 sec with UnionBall to compute the contribution of each atom to the surface area and volume of a viral capsid with more than five million atoms on a commodity PC. UnionBall includes functions for computing analytically the surface area and volume of the intersection of two, three and four spheres that are fully detailed in an appendix. UnionBall is available as open-source software.
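The simplest of the analytical building blocks mentioned in this abstract, the volume shared by two spheres, has a well-known closed form. The sketch below uses the textbook lens-volume formula and inclusion-exclusion for the union of two balls. It is not UnionBall code, and the function names are hypothetical.

```python
import math


def ball_volume(radius):
    """Volume of a single ball."""
    return 4.0 / 3.0 * math.pi * radius ** 3


def two_ball_intersection_volume(r1, r2, d):
    """Volume common to two balls of radii r1 and r2 whose centers are
    a distance d apart (textbook lens formula)."""
    if d >= r1 + r2:
        return 0.0                        # disjoint or touching
    if d <= abs(r1 - r2):
        return ball_volume(min(r1, r2))   # smaller ball fully inside
    return (math.pi * (r1 + r2 - d) ** 2
            * (d * d + 2.0 * d * (r1 + r2) - 3.0 * (r1 - r2) ** 2)
            / (12.0 * d))


def two_ball_union_volume(r1, r2, d):
    """Inclusion-exclusion for the union of two balls."""
    return (ball_volume(r1) + ball_volume(r2)
            - two_ball_intersection_volume(r1, r2, d))


# Two unit balls one unit apart share a lens of volume 5*pi/12.
lens = two_ball_intersection_volume(1.0, 1.0, 1.0)
```

For a full molecule the same inclusion-exclusion idea needs triple and quadruple overlaps as well, and the alpha shape is what identifies which of those terms actually contribute.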
Affiliation(s)
- Paul Mach
- Graduate Group in Applied Mathematics, University of California, Davis, CA 95616
| | - Patrice Koehl
- Department of Computer Science and Genome Center, University of California, Davis, CA 95616
| |
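The Mach and Koehl abstract above mentions analytic formulas for the surface area and volume of the intersection of two, three and four spheres. As a minimal illustration of the two-sphere case only, the Python sketch below applies the standard spherical-cap formulas. The function name `two_sphere_overlap` is hypothetical, and this is not UnionBall's code or its alpha-shape machinery.

```python
import math


def two_sphere_overlap(r1: float, r2: float, d: float) -> tuple[float, float]:
    """Return (intersection volume, union surface area) for two spheres.

    r1 and r2 are the radii; d is the distance between the two centers.
    """
    total_area = 4.0 * math.pi * (r1 * r1 + r2 * r2)
    if d >= r1 + r2:
        # Disjoint or touching spheres: nothing is buried.
        return 0.0, total_area
    if d <= abs(r1 - r2):
        # The smaller sphere lies entirely inside the larger one.
        r_small, r_big = min(r1, r2), max(r1, r2)
        return 4.0 / 3.0 * math.pi * r_small ** 3, 4.0 * math.pi * r_big ** 2

    def cap_volume(r: float, h: float) -> float:
        # Volume of a spherical cap of height h cut from a sphere of radius r.
        return math.pi * h * h * (3.0 * r - h) / 3.0

    # Signed distance from center 1 to the plane of the intersection circle.
    x1 = (d * d + r1 * r1 - r2 * r2) / (2.0 * d)
    h1 = r1 - x1          # height of the cap of sphere 1 lying inside sphere 2
    h2 = r2 - (d - x1)    # height of the cap of sphere 2 lying inside sphere 1
    volume = cap_volume(r1, h1) + cap_volume(r2, h2)
    # A cap of height h on a sphere of radius r has area 2*pi*r*h.
    buried_area = 2.0 * math.pi * (r1 * h1 + r2 * h2)
    return volume, total_area - buried_area
```

For two unit spheres whose centers are one unit apart, this gives an intersection volume of 5π/12 and a union surface area of 6π. Extending the idea to three or four overlapping spheres requires the more involved formulas that the abstract says are detailed in the paper's appendix.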
|
37
|
Cai Q, Ye X, Wang J, Luo R. On-the-fly Numerical Surface Integration for Finite-Difference Poisson-Boltzmann Methods. J Chem Theory Comput 2011; 7:3608-3619. [PMID: 24772042 PMCID: PMC3998210 DOI: 10.1021/ct200389p] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Indexed: 11/28/2022]
Abstract
Most implicit solvation models require the definition of a molecular surface as the interface that separates the solute in atomic detail from the solvent approximated as a continuous medium. Commonly used surface definitions include the solvent accessible surface (SAS), the solvent excluded surface (SES), and the van der Waals surface. In this study, we present an efficient numerical algorithm to compute the SES and SAS areas to facilitate the applications of finite-difference Poisson-Boltzmann methods in biomolecular simulations. Different from previous numerical approaches, our algorithm is physics-inspired and intimately coupled to the finite-difference Poisson-Boltzmann methods to fully take advantage of its existing data structures. Our analysis shows that the algorithm can achieve very good agreement with the analytical method in the calculation of the SES and SAS areas. Specifically, in our comprehensive test of 1,555 molecules, the average unsigned relative error is 0.27% in the SES area calculations and 1.05% in the SAS area calculations at the grid spacing of 1/2Å. In addition, a systematic correction analysis can be used to improve the accuracy for the coarse-grid SES area calculations, with the average unsigned relative error in the SES areas reduced to 0.13%. These validation studies indicate that the proposed algorithm can be applied to biomolecules over a broad range of sizes and structures. Finally, the numerical algorithm can also be adapted to evaluate the surface integral of either a vector field or a scalar field defined on the molecular surface for additional solvation energetics and force calculations.
Affiliation(s)
- Qin Cai
- Department of Biomedical Engineering, University of California, Irvine, California
- Department of Molecular Biology and Biochemistry, University of California, Irvine, California
| | - Xiang Ye
- Department of Molecular Biology and Biochemistry, University of California, Irvine, California
- Department of Physics, Shanghai Normal University, Shanghai, China
| | - Jun Wang
- Department of Molecular Biology and Biochemistry, University of California, Irvine, California
| | - Ray Luo
- Department of Biomedical Engineering, University of California, Irvine, California
- Department of Molecular Biology and Biochemistry, University of California, Irvine, California
| |
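The Cai et al. abstract above reports accuracy as an average unsigned relative error between numerically and analytically computed areas. The Python sketch below shows one plausible reading of that metric; the function name and the exact normalization (dividing by the analytical value) are assumptions, since the abstract does not spell out the formula.

```python
def average_unsigned_relative_error(numerical, analytical):
    """Mean of |numerical - analytical| / |analytical| over paired values.

    Assumed definition: each error is normalized by the analytical reference.
    """
    if len(numerical) != len(analytical) or not numerical:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [abs(n - a) / abs(a) for n, a in zip(numerical, analytical)]
    return sum(errors) / len(errors)
```

Multiplying the result by 100 gives a percentage comparable in form to the figures quoted in the abstract.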
|
38
|
Klenin KV, Tristram F, Strunk T, Wenzel W. Derivatives of molecular surface area and volume: Simple and exact analytical formulas. J Comput Chem 2011; 32:2647-53. [DOI: 10.1002/jcc.21844] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Received: 01/10/2011] [Revised: 04/26/2011] [Accepted: 04/26/2011] [Indexed: 11/06/2022]
|
39
|
Ng AH, Snow CD. Polarizable protein packing. J Comput Chem 2011; 32:1334-44. [DOI: 10.1002/jcc.21714] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 12/09/2009] [Revised: 10/12/2010] [Accepted: 10/17/2010] [Indexed: 11/11/2022]
|
40
|
Abstract
Molecular shape is essential in understanding molecular function, and understanding molecular shape requires definition of molecular boundaries. In this paper, we review the conceptual evolution of three molecular boundary types: the van der Waals surface, the Connolly surface, and the Lee-Richards (accessible) surface. Then, we point out the confusion among the names of these surfaces existing in the literature. Since it is desirable to have a well-defined terminology in a discipline, we propose the standard names of the three molecular boundary types and their corresponding volumes in order to maximize consistency among researchers, respect the first individual who defined or computed a surface type, and promote collaboration between biologists and geometers.
Affiliation(s)
- Deok-Soo Kim
- Department of Industrial Engineering, Hanyang University, 17 Haengdang-dong, Seongdong-gu, Seoul 133-791, South Korea.
| | | | | |
|
41
|
Schulz E, Frechero M, Appignanesi G, Fernández A. Sub-nanoscale surface ruggedness provides a water-tight seal for exposed regions in soluble protein structure. PLoS One 2010; 5. [PMID: 20862253 PMCID: PMC2941462 DOI: 10.1371/journal.pone.0012844] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 07/12/2010] [Accepted: 08/26/2010] [Indexed: 12/03/2022] Open
Abstract
Soluble proteins must maintain backbone hydrogen bonds (BHBs) water-tight to ensure structural integrity. This protection is often achieved by burying the BHBs or wrapping them through intermolecular associations. On the other hand, water has low coordination resilience, with loss of hydrogen-bonding partnerships carrying significant thermodynamic cost. Thus, a core problem in structural biology is whether natural design actually exploits the water coordination stiffness to seal the backbone in regions that are exposed to the solvent. This work explores the molecular design features that make this type of seal operative, focusing on the side-chain arrangements that shield the protein backbone. We show that an efficient sealing is achieved by adapting the sub-nanoscale surface topography to the stringency of water coordination: an exposed BHB may be kept dry if the local concave curvature is small enough to impede formation of the coordination shell of a penetrating water molecule. Examination of an exhaustive database of uncomplexed proteins reveals that exposed BHBs invariably occur within such sub-nanoscale cavities in native folds, while this level of local ruggedness is absent in other regions. By contrast, BHB exposure in misfolded proteins occurs with larger local curvature promoting backbone hydration and consequently, structure disruption. These findings unravel physical constraints fitting a spatially dependent least-action for water coordination, introduce a molecular design concept, and herald the advent of water-tight peptide-based materials with sufficient backbone exposure to remain flexible.
Affiliation(s)
- Erica Schulz
- Sección Fisicoquímica, Instituto de Química del Sur, Universidad Nacional del Sur, Consejo Nacional de Investigaciones Científicas y Técnicas and Departamento de Química, Universidad Nacional del Sur, Bahía Blanca, Argentina
| | - Marisa Frechero
- Sección Fisicoquímica, Instituto de Química del Sur, Universidad Nacional del Sur, Consejo Nacional de Investigaciones Científicas y Técnicas and Departamento de Química, Universidad Nacional del Sur, Bahía Blanca, Argentina
| | - Gustavo Appignanesi
- Sección Fisicoquímica, Instituto de Química del Sur, Universidad Nacional del Sur, Consejo Nacional de Investigaciones Científicas y Técnicas and Departamento de Química, Universidad Nacional del Sur, Bahía Blanca, Argentina
| | - Ariel Fernández
- Department of Bioengineering, Rice University, Houston, Texas, United States of America
- Department of Computer Science, The University of Chicago, Chicago, Illinois, United States of America
- * E-mail:
| |
|
42
|
Abstract
The interfacial tension of biological water, a promoter of biomolecular interactions, is difficult to determine because of the inhomogeneous nanoscale patterns that make up the surface of biomolecules. These patterns modulate solubility in peculiar manners, enabling specific associations while preventing phase separation or precipitation. In this work, we derive de novo the nanoscale thermodynamics associated with the creation of biological interfaces and validate the results against experimentally identified complex interfaces. Interfacial tension is shown to be generated by hot spots of red-shifted dielectric relaxation. The most common spots involve hindered polar hydration. Taken collectively, these patches contribute more to the interfacial tension than the better known non-polar cavities with nanometre curvature, where our results agree with the established length-scale dependence of hydrophobicity. The thermodynamic results are validated by showing that the inferred patches of interfacial tension actually promote biomolecular associations.
Affiliation(s)
- Ariel Fernández
- Department of Bioengineering, Rice University, Houston, TX 77005, USA
- Department of Computer Science, The University of Chicago, Chicago, IL 60637, USA
| |
|
43
|
Lopes A, Schmidt am Busch M, Simonson T. Computational design of protein-ligand binding: modifying the specificity of asparaginyl-tRNA synthetase. J Comput Chem 2010; 31:1273-86. [PMID: 19862811 DOI: 10.1002/jcc.21414] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 12/13/2022]
Abstract
A method for computational design of protein-ligand interactions is implemented and tested on the asparaginyl- and aspartyl-tRNA synthetase enzymes (AsnRS, AspRS). The substrate specificity of these enzymes is crucial for the accurate translation of the genetic code. The method relies on a molecular mechanics energy function and a simple, continuum electrostatic, implicit solvent model. As test calculations, we first compute AspRS-substrate binding free energy changes due to nine point mutations, for which experimental data are available; we also perform large-scale redesign of the entire active site of each enzyme (40 amino acids) and compare to experimental sequences. We then apply the method to engineer an increased binding of aspartyl-adenylate (AspAMP) into AsnRS. Mutants are obtained using several directed evolution protocols, where four or five amino acid positions in the active site are randomized. Promising mutants are subjected to molecular dynamics simulations; Poisson-Boltzmann calculations provide an estimate of the corresponding, AspAMP, binding free energy changes, relative to the native AsnRS. Several of the mutants are predicted to have an inverted binding specificity, preferring to bind AspAMP rather than the natural substrate, AsnAMP. The computed binding affinities are significantly weaker than the native, AsnRS:AsnAMP affinity, and in most cases, the active site structure is significantly changed, compared to the native complex. This almost certainly precludes catalytic activity. One of the designed sequences has a higher affinity and more native-like structure and may represent a valid candidate for Asp activity.
Affiliation(s)
- Anne Lopes
- Laboratoire de Biochimie, Department of Biology, UMR CNRS 7654, Ecole Polytechnique, 91128 Palaiseau, France
| | | | | |
|
44
|
Schmidt am Busch M, Sedano A, Simonson T. Computational protein design: validation and possible relevance as a tool for homology searching and fold recognition. PLoS One 2010; 5:e10410. [PMID: 20463972 PMCID: PMC2864755 DOI: 10.1371/journal.pone.0010410] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 10/15/2009] [Accepted: 03/31/2010] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND Protein fold recognition usually relies on a statistical model of each fold; each model is constructed from an ensemble of natural sequences belonging to that fold. A complementary strategy may be to employ sequence ensembles produced by computational protein design. Designed sequences can be more diverse than natural sequences, possibly avoiding some limitations of experimental databases. METHODOLOGY/PRINCIPAL FINDINGS We explore this strategy for four SCOP families: Small Kunitz-type inhibitors (SKIs), Interleukin-8 chemokines, PDZ domains, and large Caspase catalytic subunits, represented by 43 structures. An automated procedure is used to redesign the 43 proteins. We use the experimental backbones as fixed templates in the folded state and a molecular mechanics model to compute the interaction energies between sidechain and backbone groups. Calculations are done with the Proteins@Home volunteer computing platform. A heuristic algorithm is used to scan the sequence and conformational space, yielding 200,000-300,000 sequences per backbone template. The results confirm and generalize our earlier study of SH2 and SH3 domains. The designed sequences resemble moderately-distant, natural homologues of the initial templates; e.g., the SUPERFAMILY, profile Hidden-Markov Model library recognizes 85% of the low-energy sequences as native-like. Conversely, Position Specific Scoring Matrices derived from the sequences can be used to detect natural homologues within the SwissProt database: 60% of known PDZ domains are detected and around 90% of known SKIs and chemokines. Energy components and inter-residue correlations are analyzed and ways to improve the method are discussed. CONCLUSIONS/SIGNIFICANCE For some families, designed sequences can be a useful complement to experimental ones for homologue searching. However, improved tools are needed to extract more information from the designed profiles before the method can be of general use.
Affiliation(s)
- Marcel Schmidt am Busch
- Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, Palaiseau, France
| | - Audrey Sedano
- Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, Palaiseau, France
| | - Thomas Simonson
- Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, Palaiseau, France
| |
|
45
|
OptForce: an optimization procedure for identifying all genetic manipulations leading to targeted overproductions. PLoS Comput Biol 2010; 6:e1000744. [PMID: 20419153 PMCID: PMC2855329 DOI: 10.1371/journal.pcbi.1000744] [Citation(s) in RCA: 279] [Impact Index Per Article: 19.9] [Received: 09/23/2009] [Accepted: 03/16/2010] [Indexed: 12/01/2022] Open
Abstract
Computational procedures for predicting metabolic interventions leading to the overproduction of biochemicals in microbial strains are widely in use. However, these methods rely on surrogate biological objectives (e.g., maximize growth rate or minimize metabolic adjustments) and do not make use of flux measurements often available for the wild-type strain. In this work, we introduce the OptForce procedure that identifies all possible engineering interventions by classifying reactions in the metabolic model depending upon whether their flux values must increase, decrease or become equal to zero to meet a pre-specified overproduction target. We hierarchically apply this classification rule for pairs, triples, quadruples, etc. of reactions. This leads to the identification of a sufficient and non-redundant set of fluxes that must change (i.e., MUST set) to meet a pre-specified overproduction target. Starting with this set we subsequently extract a minimal set of fluxes that must actively be forced through genetic manipulations (i.e., FORCE set) to ensure that all fluxes in the network are consistent with the overproduction objective. We demonstrate our OptForce framework for succinate production in Escherichia coli using the most recent in silico E. coli model, iAF1260. The method not only recapitulates existing engineering strategies but also reveals non-intuitive ones that boost succinate production by performing coordinated changes on pathways distant from the last steps of succinate synthesis.
Over the past few years, there has been an unprecedented increase in the use of microorganisms for the production of biofuels, industrial chemicals and pharmaceutical precursors. In this regard, biotechnologists are confronted with the challenge to efficiently convert biomass and other renewable resources into useful biochemicals. With the advent of organism-specific mathematical models of metabolism, scientists have used computations to identify genetic modifications that maximize the yield of a desired product. In this paper, we introduce OptForce, an algorithm that identifies all possible metabolic interventions that lead to the overproduction of a biochemical of interest. Unlike existing techniques, OptForce does not rely on the maximization of a fitness function to predict metabolic fluxes. Instead, OptForce contrasts the metabolic flux patterns observed in an initial strain and a strain overproducing the chemical at the target yield. The essence of this procedure is the identification of all coordinated reaction modifications that force the network towards the overproduction target. We used OptForce to predict metabolic interventions for succinate overproduction in Escherichia coli. The results described in this paper not only uncover existing strain designs for succinate production but also elucidate new ones that can be experimentally explored.
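The OptForce abstract above describes classifying reactions by whether their flux must increase, decrease or become zero to reach the overproduction target. The Python sketch below illustrates only that single-reaction classification step, assuming the flux ranges for the wild-type and overproducing strains have already been computed elsewhere. The function name, the label strings, and the input layout are hypothetical; the published procedure is optimization-based and also handles pairs, triples and larger groups of reactions, which this sketch does not attempt.

```python
def classify_must_sets(wild_type, overproducer, tol=1e-9):
    """Label each reaction by comparing its flux range in two strains.

    Both arguments map a reaction id to a (min_flux, max_flux) tuple.
    The ranges are assumed inputs, not computed here.
    """
    labels = {}
    for rxn, (wt_min, wt_max) in wild_type.items():
        op_min, op_max = overproducer[rxn]
        pinned_to_zero = abs(op_min) <= tol and abs(op_max) <= tol
        wild_type_nonzero = wt_min > tol or wt_max < -tol
        if pinned_to_zero and wild_type_nonzero:
            # Flux is nonzero in the wild type but must vanish.
            labels[rxn] = "must_be_zero"
        elif op_min > wt_max + tol:
            # The whole overproducing range lies above the wild-type range.
            labels[rxn] = "must_increase"
        elif op_max < wt_min - tol:
            # The whole overproducing range lies below the wild-type range.
            labels[rxn] = "must_decrease"
        else:
            # Overlapping ranges: no change is required for this reaction alone.
            labels[rxn] = "unchanged"
    return labels
```

Reactions labeled "unchanged" here could still matter in combination with others, which is why the abstract describes applying the rule hierarchically to groups of reactions.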
|
46
|
Abstract
Predictive methods for the computational design of proteins search for amino acid sequences adopting desired structures that perform specific functions. Typically, design of 'function' is formulated as engineering new and altered binding activities into proteins. Progress in the design of functional protein-protein interactions is directed toward engineering proteins to precisely control biological processes by specifically recognizing desired interaction partners while avoiding competitors. The field is aiming for strategies to harness recent advances in high-resolution computational modeling-particularly those exploiting protein conformational variability-to engineer new functions and incorporate many functional requirements simultaneously.
Affiliation(s)
- Daniel J Mandell
- Graduate Program in Bioinformatics and Computational Biology, California Institute for Quantitative Biosciences, and Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, USA
| | | |
|
47
|
am Busch MS, Mignon D, Simonson T. Computational protein design as a tool for fold recognition. Proteins 2009; 77:139-58. [PMID: 19408297 DOI: 10.1002/prot.22426] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Indexed: 12/13/2022]
Abstract
Computationally designed protein sequences have been proposed as a basis to perform fold recognition and homology searching. To investigate this possibility, an automated procedure is used to completely redesign 24 SH3 proteins and 22 SH2 proteins. We use the experimental backbone coordinates as fixed templates in the folded state and a molecular mechanics model to compute the pairwise interaction energies between all sidechain types and conformations. Energy calculations are done with the Proteins@Home volunteer computing platform. A heuristic algorithm is then used to scan the sequence and conformational space for optimal solutions. We produced 200,000-450,000 sequences for each backbone template. The designed sequences resemble moderately-distant, natural homologues of the initial templates, according to their identity scores and their similarity with respect to the Pfam sets of SH2 and SH3 domains. Standard homology detection tools document their native-like character: the Conserved Domain Database recognizes 61% (52%) of our low-energy sequences as SH3 (SH2) domains; the SUPERFAMILY, Hidden-Markov Model library recognizes 81% (84%). Conversely, position specific scoring matrices (PSSMs) derived from our designed sequences can be used to detect natural homologues in sequence databases. Within SwissProt, a set of natural SH3 PSSMs detects 772 SH3 domains, for example; our designed PSSMs detect 67% of these, plus one additional sequence and two false positives. If six amino acids involved in substrate binding (a selective pressure not accounted for in our design) are reset to their experimental types, then 77% of the experimental SH3 domains are detected. Results for the SH2 domains are similar. Several directions to improve the method further are discussed.
Affiliation(s)
- Marcel Schmidt am Busch
- Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, 91128 Palaiseau, France
| | | | | |
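The abstract above refers to position specific scoring matrices (PSSMs) derived from designed sequences. The Python sketch below shows a generic way to build a log-odds PSSM from equal-length sequences and score a candidate against it. The uniform background frequency, the pseudocount value, and the function names `build_pssm` and `score` are assumptions for illustration; the authors' actual profile-building tools and parameters are not given in the abstract.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(sequences, pseudocount=1.0):
    """Build a log2-odds PSSM from equal-length, gap-free sequences.

    Assumes a uniform background of 1/20 per amino acid.
    """
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(sequences) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in sequences)
        column = {
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        }
        pssm.append(column)
    return pssm


def score(pssm, sequence):
    """Sum the per-position log-odds scores of a sequence."""
    return sum(column[aa] for column, aa in zip(pssm, sequence))
```

A sequence that matches the residues most often seen at each position scores higher than an unrelated one, which is the property a homology search relies on.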
|
48
|
Suárez M, Jaramillo A. Challenges in the computational design of proteins. J R Soc Interface 2009; 6 Suppl 4:S477-91. [PMID: 19324680 PMCID: PMC2843960 DOI: 10.1098/rsif.2008.0508.focus] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.8] [Received: 11/30/2008] [Accepted: 02/04/2009] [Indexed: 11/12/2022] Open
Abstract
Protein design has many applications not only in biotechnology but also in basic science. It uses our current knowledge in structural biology to predict, by computer simulations, an amino acid sequence that would produce a protein with targeted properties. As in other examples of synthetic biology, this approach allows the testing of many hypotheses in biology. The recent development of automated computational methods to design proteins has enabled proteins to be designed that are very different from any known ones. Moreover, some of those methods mostly rely on a physical description of atomic interactions, which allows the designed sequences not to be biased towards known proteins. In this paper, we will describe the use of energy functions in computational protein design, the use of atomic models to evaluate the free energy in the unfolded and folded states, the exploration and optimization of amino acid sequences, the problem of negative design and the design of biomolecular function. We will also consider its use together with the experimental techniques such as directed evolution. We will end by discussing the challenges ahead in computational protein design and some of their future applications.
Affiliation(s)
- María Suárez
- Laboratoire de Biochimie, Ecole Polytechnique, CNRS, 91128 Palaiseau Cedex, France
- Epigenomics Project, Genopole, Université d'Evry Val d'Essonne-Genopole-CNRS, Tour Evry2, Etage 10, Terrasses de l'Agora, 91034 Evry Cedex, France
| | - Alfonso Jaramillo
- Laboratoire de Biochimie, Ecole Polytechnique, CNRS, 91128 Palaiseau Cedex, France
- Epigenomics Project, Genopole, Université d'Evry Val d'Essonne-Genopole-CNRS, Tour Evry2, Etage 10, Terrasses de l'Agora, 91034 Evry Cedex, France
| |
|
49
|
Durham E, Dorr B, Woetzel N, Staritzbichler R, Meiler J. Solvent accessible surface area approximations for rapid and accurate protein structure prediction. J Mol Model 2009; 15:1093-108. [PMID: 19234730 PMCID: PMC2712621 DOI: 10.1007/s00894-009-0454-9] [Citation(s) in RCA: 220] [Impact Index Per Article: 14.7] [Received: 11/16/2008] [Accepted: 01/02/2009] [Indexed: 12/01/2022]
Abstract
The burial of hydrophobic amino acids in the protein core is a driving force in protein folding. The extent to which an amino acid interacts with the solvent and the protein core is naturally proportional to the surface area exposed to these environments. However, an accurate calculation of the solvent-accessible surface area (SASA), a geometric measure of this exposure, is numerically demanding as it is not pair-wise decomposable. Furthermore, it depends on a full-atom representation of the molecule. This manuscript introduces a series of four SASA approximations of increasing computational complexity and accuracy as well as knowledge-based environment free energy potentials based on these SASA approximations. Their ability to distinguish correctly from incorrectly folded protein models is assessed to balance speed and accuracy for protein structure prediction. We find the newly developed “Neighbor Vector” algorithm provides the optimal balance of accurate yet rapid exposure measures.
Affiliation(s)
- Elizabeth Durham
- Department of Chemistry, Center for Structural Biology, Vanderbilt University, 465 21st Ave South, Nashville, TN 37232-8725, USA
| | | | | | | | | |
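The Durham et al. abstract above contrasts exact SASA, which is not pair-wise decomposable, with cheaper approximations. As a rough illustration of why a pair-wise proxy is cheap, the Python sketch below counts neighbors within a cutoff for each point, so that a high count suggests burial. This is a generic neighbor-count proxy, not the paper's "Neighbor Vector" algorithm, whose definition the abstract does not give; the function name and the 10 Å default cutoff are assumptions.

```python
import math


def neighbor_counts(coords, cutoff=10.0):
    """Count, for each point, how many other points lie within the cutoff.

    coords is a list of (x, y, z) tuples, e.g. one representative atom per
    residue. A larger count is read as a more buried position.
    """
    counts = []
    for i, a in enumerate(coords):
        n = 0
        for j, b in enumerate(coords):
            if i != j and math.dist(a, b) <= cutoff:
                n += 1
        counts.append(n)
    return counts
```

Each term depends on one pair of positions only, so the measure can be evaluated from reduced representations without building a full-atom surface.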
|
50
|
Evaluating and optimizing computational protein design force fields using fixed composition-based negative design. Proc Natl Acad Sci U S A 2008; 105:12242-7. [PMID: 18708527 DOI: 10.1073/pnas.0805858105] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/18/2022] Open
Abstract
An accurate force field is essential to computational protein design and protein fold prediction studies. Proper force field tuning is problematic, however, due in part to the incomplete modeling of the unfolded state. Here, we evaluate and optimize a protein design force field by constraining the amino acid composition of the designed sequences to that of a well behaved model protein. According to the random energy model, unfolded state energies are dependent only on amino acid composition and not the specific arrangement of amino acids. Therefore, energy discrepancies between computational predictions and experimental results, for sequences of identical composition, can be directly attributed to flaws in the force field's ability to properly account for folded state sequence energies. This aspect of fixed composition design allows for force field optimization by focusing solely on the interactions in the folded state. Several rounds of fixed composition optimization of the 56-residue beta1 domain of protein G yielded force field parameters with significantly greater predictive power: Optimized sequences exhibited higher wild-type sequence identity in critical regions of the structure, and the wild-type sequence showed an improved Z-score. Experimental studies revealed a designed 24-fold mutant to be stably folded with a melting temperature similar to that of the wild-type protein. Sequence designs using engrailed homeodomain as a scaffold produced similar results, suggesting the tuned force field parameters were not specific to protein G.
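The abstract above relies on two ideas: designed sequences keep a fixed amino acid composition, and the wild-type sequence is ranked by a Z-score. The Python sketch below illustrates both under stated assumptions. A position swap is one simple move that preserves composition, and the Z-score is taken as the energy minus the mean of a reference set divided by its sample standard deviation. The function names, the swap move, and this Z-score definition are hypothetical illustrations, not the authors' protocol.

```python
import random
import statistics
from collections import Counter


def swap_move(sequence, rng):
    """Exchange the residues at two distinct positions.

    The amino acid composition of the result is identical to the input.
    """
    i, j = rng.sample(range(len(sequence)), 2)
    residues = list(sequence)
    residues[i], residues[j] = residues[j], residues[i]
    return "".join(residues)


def same_composition(seq_a, seq_b):
    """True when both sequences contain the same residues with the same counts."""
    return Counter(seq_a) == Counter(seq_b)


def z_score(energy, reference_energies):
    """Standard score of one energy relative to a set of reference energies."""
    mean = statistics.mean(reference_energies)
    return (energy - mean) / statistics.stdev(reference_energies)
```

Because every sequence reachable by swaps shares one composition, the random energy model argument in the abstract implies their unfolded-state energies cancel when sequences are compared.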
|