1. Risheh A, Rebel A, Nerenberg PS, Forouzesh N. Calculation of protein-ligand binding entropies using a rule-based molecular fingerprint. Biophys J 2024:S0006-3495(24)00182-6. [PMID: 38481102 DOI: 10.1016/j.bpj.2024.03.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Revised: 12/21/2023] [Accepted: 03/08/2024] [Indexed: 03/28/2024] [Open Access]
Abstract
The use of fast in silico prediction methods for protein-ligand binding free energies holds significant promise for the initial phases of drug development. Numerous traditional physics-based models (e.g., implicit solvent models), however, tend to either neglect or heavily approximate entropic contributions to binding due to their computational complexity. Consequently, such methods often yield imprecise assessments of binding strength. Machine learning models provide accurate predictions and can often outperform physics-based models. They, however, are often prone to overfitting, and the interpretation of their results can be difficult. Physics-guided machine learning models combine the consistency of physics-based models with the accuracy of modern data-driven algorithms. This work integrates physics-based model conformational entropies into a graph convolutional network. We introduce a new neural network architecture (a rule-based graph convolutional network) that generates molecular fingerprints according to predefined rules specifically optimized for binding free energy calculations. Our results on 100 small host-guest systems demonstrate significant improvements in convergence and preventing overfitting. We additionally demonstrate the transferability of our proposed hybrid model by training it on the aforementioned host-guest systems and then testing it on six unrelated protein-ligand systems. Our new model shows little difference in training set accuracy compared to a previous model but an order-of-magnitude improvement in test set accuracy. Finally, we show how the results of our hybrid model can be interpreted in a straightforward fashion.
Affiliation(s)
- Ali Risheh
  - Department of Computer Science, California State University, Los Angeles, California
- Alles Rebel
  - Department of Computer Science, California State University, Los Angeles, California
- Paul S Nerenberg
  - Kravis Department of Integrated Sciences, Claremont McKenna College, Claremont, California
- Negin Forouzesh
  - Department of Computer Science, California State University, Los Angeles, California
2. Kalayan J, Chakravorty A, Warwicker J, Henchman RH. Total free energy analysis of fully hydrated proteins. Proteins 2023; 91:74-90. [PMID: 35964252 PMCID: PMC10087023 DOI: 10.1002/prot.26411] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/28/2022] [Revised: 08/04/2022] [Accepted: 08/09/2022] [Indexed: 12/15/2022]
Abstract
The total free energy of a hydrated biomolecule and its corresponding decomposition of energy and entropy provides detailed information about regions of thermodynamic stability or instability. The free energies of four hydrated globular proteins with different net charges are calculated from a molecular dynamics simulation, with the energy coming from the system Hamiltonian and entropy using multiscale cell correlation. Water is found to be most stable around anionic residues, intermediate around cationic and polar residues, and least stable near hydrophobic residues, especially when more buried, with stability displaying moderate entropy-enthalpy compensation. Conversely, anionic residues in the proteins are energetically destabilized relative to singly solvated amino acids, while trends for other residues are less clear-cut. Almost all residues lose intraresidue entropy when in the protein, enthalpy changes are negative on average but may be positive or negative, and the resulting overall stability is moderate for some proteins and negligible for others. The free energy of water around single amino acids is found to closely match existing hydrophobicity scales. Regarding the effect of secondary structure, water is slightly more stable around loops, of intermediate stability around β strands and turns, and least stable around helices. An interesting asymmetry observed is that cationic residues stabilize a residue when bonded to its N-terminal side but destabilize it when on the C-terminal side, with a weaker reversed trend for anionic residues.
Affiliation(s)
- Jas Kalayan
  - Division of Pharmacy and Optometry, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- Arghya Chakravorty
  - Department of Chemistry and Biophysics, University of Michigan, Ann Arbor, Michigan, USA
- Jim Warwicker
  - Manchester Institute of Biotechnology and School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK
- Richard H Henchman
  - Sydney Medical School, Faculty of Medicine and Health, University of Sydney, Sydney, Australia
3. Díaz N, Suárez D. Toward Reliable and Insightful Entropy Calculations on Flexible Molecules. J Chem Theory Comput 2022; 18:7166-7178. [PMID: 36426866 DOI: 10.1021/acs.jctc.2c00858] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
Abstract
The absolute entropy of a flexible molecule can be approximated by the sum of a rigid-rotor-harmonic-oscillator (RRHO) entropy and a Gibbs-Shannon entropy associated to the Boltzmann distribution for the occupation of the conformational energy levels. Herein, we show that such partitioning, which has received renewed interest, leads to accurate entropies of single molecules of increasing size provided that the conformational part is estimated by means of a set of discretization and expansion techniques that are able to capture the significant correlation effects among the torsional motions. To ensure a reliable entropy estimation, we rely on extensive sampling as that produced by classical molecular dynamics simulations on the microsecond time scale, which is currently affordable for small- and medium-sized molecules. According to test calculations, the gas-phase entropy of simple organic molecules is predicted with a mean unsigned error of 0.9 cal/(mol K) when the RRHO entropies are computed at the B3LYP-D3/cc-pVTZ level. Remarkably, the same protocol gives small errors [<1 cal/(mol K)] for the extremely flexible linear alkane molecules (CnH2n+2, n = 14, 16, and 18). Similarly, we obtain well-converged entropies for a more challenging test of drug molecules, which exhibit more pronounced correlation effects. We also perform equivalent entropy calculations on a 76 amino acid protein, ubiquitin, by taking advantage of the cutoff-dependent formulation of an expansion technique (correlation-consistent multibody local approximation, CC-MLA), which incorporates genuine correlation effects among the neighboring dihedral angles. Moreover, we show that insightful descriptors of the coupled torsional motions can be obtained with the CC-MLA approach.
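The Gibbs-Shannon term mentioned in this abstract can be illustrated with a minimal sketch. It assumes a set of discrete conformers with known relative energies (or known populations) and evaluates S_conf = -R Σ p_i ln p_i. The function names and the example energies are hypothetical, and the sketch does not reproduce the discretization and expansion techniques (such as CC-MLA) that the authors use to capture torsional correlation.

```python
import math

R_CAL = 1.987204  # molar gas constant in cal/(mol K)


def boltzmann_populations(energies_kcal, temperature=298.15):
    """Return Boltzmann populations for conformer energies given in kcal/mol."""
    rt = R_CAL * temperature / 1000.0  # thermal energy in kcal/mol
    e_min = min(energies_kcal)         # shift energies to avoid overflow
    weights = [math.exp(-(e - e_min) / rt) for e in energies_kcal]
    z = sum(weights)                   # partition function over conformers
    return [w / z for w in weights]


def gibbs_shannon_entropy(populations):
    """Return -R * sum(p ln p) in cal/(mol K); zero populations contribute nothing."""
    return -R_CAL * sum(p * math.log(p) for p in populations if p > 0.0)


if __name__ == "__main__":
    # Hypothetical relative conformer energies (kcal/mol), for illustration only.
    energies = [0.0, 0.5, 1.2]
    pops = boltzmann_populations(energies)
    print("populations:", [round(p, 3) for p in pops])
    print("S_conf [cal/(mol K)]:", round(gibbs_shannon_entropy(pops), 3))
```

In this picture the total entropy would be approximated by adding the conformational term to a separately computed RRHO entropy. For N equally populated conformers the conformational term reduces to R ln N, which is a convenient sanity check.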
Affiliation(s)
- Natalia Díaz
  - Departamento de Química Física y Analítica, Universidad de Oviedo, Avda. Julián Clavería 8, Oviedo 33006, Spain
- Dimas Suárez
  - Departamento de Química Física y Analítica, Universidad de Oviedo, Avda. Julián Clavería 8, Oviedo 33006, Spain
4. Khade P, Jernigan RL. Entropies Derived from the Packing Geometries within a Single Protein Structure. ACS Omega 2022; 7:20719-20730. [PMID: 35755337 PMCID: PMC9219053 DOI: 10.1021/acsomega.2c00999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2022] [Accepted: 05/17/2022] [Indexed: 05/17/2023]
Abstract
A fast, simple, yet robust method to calculate protein entropy from a single protein structure is presented here. The focus is on the atomic packing details, which are calculated by combining Voronoi diagrams and Delaunay tessellations. Even though the method is simple, the entropies computed exhibit an extremely high correlation with the entropies previously derived by other methods based on quasi-harmonic motions, quantum mechanics, and molecular dynamics simulations. These packing-based entropies account directly for the local freedom and provide entropy for any individual protein structure that could be used to compute free energies directly during simulations for the generation of more reliable trajectories and also for better evaluations of modeled protein structures. Physico-chemical properties of amino acids are compared with these packing entropies to uncover the relationships with the entropies of different residue types. A public packing entropy web server is provided at packing-entropy.bb.iastate.edu, and the application programming interface is available within the PACKMAN (https://github.com/Pranavkhade/PACKMAN) package.
5. Estrada Pabón JD, Haddox HK, Van Aken G, Pendleton IM, Eramian H, Singer JM, Schrier J. The Role of Configurational Entropy in Miniprotein Stability. J Phys Chem B 2021; 125:3057-3065. [PMID: 33739115 DOI: 10.1021/acs.jpcb.0c09888] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Predicting protein stability is a challenge due to the many competing thermodynamic effects. Through de novo protein design, one begins with a target structure and searches for a sequence that will fold into it. Previous work by Rocklin et al. introduced a data set of more than 16,000 miniproteins spanning four structural topologies with information on stability. These structures were characterized with a set of 46 structural descriptors, with no explicit inclusion of configurational entropy (Scnf). Our work focused on creating a set of 17 descriptors intended to capture variations in Scnf and its comparison to an extended set of 113 structural and energy model features that extend the Rocklin et al. feature set (R). The Scnf descriptors statistically discriminate between stable and unstable distributions within topologies and best describe EEHEE topology stability (where E = β sheet and H = α helix). Between 50 and 80% of the variation in each Scnf descriptor is described by linear combinations of R features. Despite containing useful information about minipeptide stability, providing Scnf features as inputs to machine learning models does not improve overall performance when predicting protein stability, as the R features sufficiently capture the implicit variations.
Affiliation(s)
- Jan D Estrada Pabón
  - Department of Chemistry, Haverford College, 370 Lancaster Avenue, Haverford, Pennsylvania 19041, United States
- Hugh K Haddox
  - Institute for Protein Design, University of Washington, Seattle, Washington 98195, United States
- Greg Van Aken
  - Department of Chemistry, Haverford College, 370 Lancaster Avenue, Haverford, Pennsylvania 19041, United States
- Ian M Pendleton
  - Department of Chemistry, Haverford College, 370 Lancaster Avenue, Haverford, Pennsylvania 19041, United States
- Hamed Eramian
  - Netrias LLC, 3100 Clarendon Boulevard, Suite 200, Arlington, Virginia 22201, United States
- Jedediah M Singer
  - Two Six Technologies, 901 North Stuart Street, Suite 1000, Arlington, Virginia 22203, United States
- Joshua Schrier
  - Department of Chemistry, Haverford College, 370 Lancaster Avenue, Haverford, Pennsylvania 19041, United States
  - Department of Chemistry, Fordham University, 441 East Fordham Road, The Bronx, New York 10458, United States
6. Chakravorty A, Higham J, Henchman RH. Entropy of Proteins Using Multiscale Cell Correlation. J Chem Inf Model 2020; 60:5540-5551. [PMID: 32955869 DOI: 10.1021/acs.jcim.0c00611] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 11/28/2022]
Abstract
A new multiscale method is presented to calculate the entropy of proteins from molecular dynamics simulations. Termed Multiscale Cell Correlation (MCC), the method decomposes the protein into sets of rigid-body units based on their covalent-bond connectivity at three levels of hierarchy: molecule, residue, and united atom. It evaluates the vibrational and topographical entropy from forces, torques, and dihedrals at each level, taking into account correlations between sets of constituent units that together make up a larger unit at the coarser length scale. MCC gives entropies in close agreement with normal-mode analysis and smaller than those using quasiharmonic analysis as well as providing much faster convergence. Moreover, MCC provides an insightful decomposition of entropy at each length scale and for each type of amino acid according to their solvent exposure and whether they are terminal residues. While the residue entropy depends weakly on solvent exposure, there is greater variation in entropy components for larger, more polar amino acids, which have increased conformational entropy but reduced vibrational entropy with greater solvent exposure.
Affiliation(s)
- Arghya Chakravorty
  - Department of Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
- Jonathan Higham
  - MRC Human Genetics Unit, Institute of Genetics & Molecular Medicine, The University of Edinburgh, Western General Hospital, Crewe Road South, Edinburgh EH4 2XU, United Kingdom
- Richard H Henchman
  - Manchester Institute of Biotechnology, The University of Manchester, 131 Princess Street, Manchester M1 7DN, United Kingdom
  - Department of Chemistry, The University of Manchester, Oxford Road, Manchester M13 9PL, United Kingdom
7. Atkinson JT, Jones AM, Nanda V, Silberg JJ. Protein tolerance to random circular permutation correlates with thermostability and local energetics of residue-residue contacts. Protein Eng Des Sel 2019; 32:489-501. [PMID: 32626892 DOI: 10.1093/protein/gzaa012] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/18/2020] [Revised: 04/13/2020] [Accepted: 04/15/2020] [Indexed: 01/08/2023] [Open Access]
Abstract
Adenylate kinase (AK) orthologs with a range of thermostabilities were subjected to random circular permutation, and deep mutational scanning was used to evaluate where new protein termini were nondisruptive to activity. The fraction of circularly permuted variants that retained function in each library correlated with AK thermostability. In addition, analysis of the positional tolerance to new termini, which increase local conformational flexibility, showed that bonds were either functionally sensitive to cleavage across all homologs, differentially sensitive, or uniformly tolerant. The mobile AMP-binding domain, which displays the highest calculated contact energies, presented the greatest tolerance to new termini across all AKs. In contrast, retention of function in the lid and core domains was more dependent upon AK melting temperature. These results show that family permutation profiling identifies primary structure that has been selected by evolution for dynamics that are critical to activity within an enzyme family. These findings also illustrate how deep mutational scanning can be applied to protein homologs in parallel to differentiate how topology, stability, and local energetics govern mutational tolerance.
Affiliation(s)
- Joshua T Atkinson
  - Systems, Synthetic, and Physical Biology Graduate Program, Rice University, 6100 Main Street, MS-180, Houston, TX 77005, USA
  - Department of BioSciences, Rice University, 6100 Main Street, MS-140, Houston, TX 77005, USA
- Alicia M Jones
  - Biochemistry and Cell Biology Graduate Program, Rice University, 6100 Main Street, MS-140, Houston, TX 77005, USA
- Vikas Nanda
  - Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Jonathan J Silberg
  - Department of BioSciences, Rice University, 6100 Main Street, MS-140, Houston, TX 77005, USA
  - Department of Bioengineering, Rice University, 6100 Main Street, MS-142, Houston, TX 77005, USA
  - Department of Chemical and Biomolecular Engineering, Rice University, 6100 Main Street, MS-362, Houston, TX 77005, USA
8. Ali HS, Higham J, Henchman RH. Entropy of Simulated Liquids Using Multiscale Cell Correlation. Entropy 2019; 21:e21080750. [PMID: 33267464 PMCID: PMC7515279 DOI: 10.3390/e21080750] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 06/29/2019] [Revised: 07/22/2019] [Accepted: 07/28/2019] [Indexed: 12/16/2022]
Abstract
Accurately calculating the entropy of liquids is an important goal, given that many processes take place in the liquid phase. Of almost equal importance is understanding the values obtained. However, there are few methods that can calculate the entropy of such systems, and fewer still to make sense of the values obtained. We present our multiscale cell correlation (MCC) method to calculate the entropy of liquids from molecular dynamics simulations. The method uses forces and torques at the molecule and united-atom levels and probability distributions of molecular coordinations and conformations. The main differences with previous work are the consistent treatment of the mean-field cell approximation to the appropriate degrees of freedom, the separation of the force and torque covariance matrices, and the inclusion of conformation correlation for molecules with multiple dihedrals. MCC is applied to a broader set of 56 important industrial liquids modeled using the Generalized AMBER Force Field (GAFF) and Optimized Potentials for Liquid Simulations (OPLS) force fields with 1.14*CM1A charges. Unsigned errors versus experimental entropies are 8.7 J K⁻¹ mol⁻¹ for GAFF and 9.8 J K⁻¹ mol⁻¹ for OPLS. This is significantly better than the 2-Phase Thermodynamics method for the subset of molecules in common, which is the only other method that has been applied to such systems. MCC makes clear why the entropy has the value it does by providing a decomposition in terms of translational and rotational vibrational entropy and topographical entropy at the molecular and united-atom levels.
Affiliation(s)
- Hafiz Saqib Ali
  - Manchester Institute of Biotechnology, The University of Manchester, 131 Princess Street, Manchester M1 7DN, UK
  - School of Chemistry, The University of Manchester, Oxford Road, Manchester M13 9PL, UK
- Jonathan Higham
  - Manchester Institute of Biotechnology, The University of Manchester, 131 Princess Street, Manchester M1 7DN, UK
  - School of Chemistry, The University of Manchester, Oxford Road, Manchester M13 9PL, UK
- Richard H. Henchman
  - Manchester Institute of Biotechnology, The University of Manchester, 131 Princess Street, Manchester M1 7DN, UK
  - School of Chemistry, The University of Manchester, Oxford Road, Manchester M13 9PL, UK
  - Correspondence: Tel.: +44-161-306-5194
9. Goethe M, Fita I, Rubi JM. Entropic Stabilization of Cas4 Protein SSO0001 Predicted with Popcoen. Entropy (Basel, Switzerland) 2018; 20:e20080580. [PMID: 33265669 PMCID: PMC7513108 DOI: 10.3390/e20080580] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/28/2018] [Revised: 07/27/2018] [Accepted: 07/28/2018] [Indexed: 06/12/2023]
Abstract
Popcoen is a method for configurational entropy estimation of proteins based on machine learning. Entropy is predicted with an artificial neural network which was trained on simulation trajectories of a large set of representative proteins. Popcoen is extremely fast compared to other approaches based on the sampling of a multitude of microstates. Consequently, Popcoen can be incorporated into a large class of protein software which currently neglects configurational entropy for performance reasons. Here, we apply Popcoen to various conformations of the Cas4 protein SSO0001 of Sulfolobus solfataricus, a protein that assembles to a decamer of known toroidal shape. We provide numerical evidence that the native state (NAT) of an SSO0001 monomer has a similar structure to the protomers of the oligomer, where NAT of the monomer is stabilized mainly entropically. Due to its large amount of configurational entropy, NAT has lower free energy than alternative conformations of very low enthalpy and solvation free energy. Hence, SSO0001 serves as an example case where neglecting configurational entropy leads to an incorrect conclusion. Our results imply that no refolding of the subunits is required during oligomerization, which suggests that configurational entropy is employed by nature to largely enhance the rate of assembly.
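The argument that neglecting configurational entropy can reverse a stability ranking can be sketched with the relation G = H + G_solv - T S. The numbers below are hypothetical and chosen only to illustrate the effect; they are not taken from the paper, and the function and state names are assumptions made for this example.

```python
def free_energy(enthalpy, solvation, entropy, temperature=300.0):
    """Return G = H + G_solv - T*S (energies in kcal/mol, entropy in kcal/(mol K))."""
    return enthalpy + solvation - temperature * entropy


def most_stable(states, include_entropy=True):
    """Return the name of the state with the lowest free energy."""
    def score(name):
        s = states[name]
        entropy = s["entropy"] if include_entropy else 0.0
        return free_energy(s["enthalpy"], s["solvation"], entropy)
    return min(states, key=score)


# Hypothetical values: ALT has lower enthalpy and solvation free energy,
# while NAT has more configurational entropy.
STATES = {
    "NAT": {"enthalpy": -100.0, "solvation": -50.0, "entropy": 0.50},
    "ALT": {"enthalpy": -120.0, "solvation": -55.0, "entropy": 0.40},
}

if __name__ == "__main__":
    print("with entropy:   ", most_stable(STATES, include_entropy=True))
    print("without entropy:", most_stable(STATES, include_entropy=False))
```

With these illustrative values the entropy term lowers the free energy of NAT by more than the enthalpic and solvation advantage of ALT, so the ranking flips once entropy is included.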
Affiliation(s)
- Martin Goethe
  - Department of Condensed Matter Physics, University of Barcelona, Carrer Martí i Franquès 1, 08028 Barcelona, Spain
  - Department of Inorganic and Organic Chemistry, University of Barcelona, Carrer Martí i Franquès 1, 08028 Barcelona, Spain
- Ignacio Fita
  - Molecular Biology Institute of Barcelona (IBMB-CSIC, Maria de Maeztu Unit of Excellence), Carrer Baldiri Reixac 4-8, 08028 Barcelona, Spain
- J. Miguel Rubi
  - Department of Condensed Matter Physics, University of Barcelona, Carrer Martí i Franquès 1, 08028 Barcelona, Spain