1
Ferraz-Caetano J, Teixeira F, Cordeiro MNDS. Explainable Supervised Machine Learning Model To Predict Solvation Gibbs Energy. J Chem Inf Model 2024; 64:2250-2262. [PMID: 37603608 PMCID: PMC11005042 DOI: 10.1021/acs.jcim.3c00544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Indexed: 08/23/2023]
Abstract
Many challenges persist in developing accurate computational models for predicting solvation free energy (ΔGsol). Despite recent developments in Machine Learning (ML) methodologies that outperformed traditional quantum mechanical models, several issues remain concerning explanatory insights for broad chemical predictions with an acceptable speed-accuracy trade-off. To overcome this, we present a novel supervised ML model to predict the ΔGsol for an array of solvent-solute pairs. Using two different ensemble regressor algorithms, we made fast and accurate property predictions using open-source chemical features, encoding complex electronic, structural, and surface area descriptors for every solvent and solute. By integrating molecular properties and chemical interaction features, we have analyzed individual descriptor importance and optimized our model through explanatory information from feature groups. On aqueous and organic solvent databases, ML models revealed the predictive relevance of solutes with increasing polar surface area and decreasing polarizability, yielding better results than state-of-the-art benchmark Neural Network methods (without complex quantum mechanical or molecular dynamics simulations). Both algorithms successfully outperformed previous ΔGsol prediction methods, with a maximum absolute error of 0.22 ± 0.02 kcal mol⁻¹, further validated in an external benchmark database and with solvent hold-out tests. These explanatory and statistical insights allow a thoughtful application of this method for predicting other thermodynamic properties, stressing the relevance of ML modeling for further complex computational chemistry problems.
Affiliation(s)
- José Ferraz-Caetano
  - Department of Chemistry and Biochemistry, Faculty of Sciences, University of Porto, Rua do Campo Alegre, S/N, 4169-007 Porto, Portugal
- Filipe Teixeira
  - Centre of Chemistry, University of Minho, Campus de Gualtar, 4710-057 Braga, Portugal
- M. Natália D. S. Cordeiro
  - Department of Chemistry and Biochemistry, Faculty of Sciences, University of Porto, Rua do Campo Alegre, S/N, 4169-007 Porto, Portugal
2
Ries B, Alibay I, Swenson DWH, Baumann HM, Henry MM, Eastwood JRB, Gowers RJ. Kartograf: A Geometrically Accurate Atom Mapper for Hybrid-Topology Relative Free Energy Calculations. J Chem Theory Comput 2024; 20:1862-1877. [PMID: 38330251 PMCID: PMC10941767 DOI: 10.1021/acs.jctc.3c01206] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 01/17/2024] [Accepted: 01/18/2024] [Indexed: 02/10/2024]
Abstract
Relative binding free energy (RBFE) calculations have emerged as a powerful tool that supports ligand optimization in drug discovery. Despite many successes, the use of RBFEs can often be limited by automation problems, in particular, the setup of such calculations. Atom mapping algorithms are an essential component in setting up automatic large-scale hybrid-topology RBFE calculation campaigns. Traditional algorithms typically employ a 2D subgraph isomorphism solver (SIS) in order to estimate the maximum common substructure. SIS-based approaches can be limited by time-intensive operations and issues with capturing geometry-linked chemical properties, potentially leading to suboptimal solutions. To overcome these limitations, we have developed Kartograf, a geometric-graph-based algorithm that uses primarily the 3D coordinates of atoms to find a mapping between two ligands. In free energy approaches, the ligand conformations are usually derived from docking or other previous modeling approaches, giving the coordinates a certain importance. By considering the spatial relationships between atoms related to the molecule coordinates, our algorithm bypasses the computationally complex subgraph matching of SIS-based approaches and reduces the problem to a much simpler bipartite graph matching problem. Moreover, Kartograf effectively circumvents typical mapping issues induced by molecule symmetry and stereoisomerism, making it a more robust approach for atom mapping from a geometric perspective. To validate our method, we calculated mappings with our novel approach using a diverse set of small molecules and used the mappings in relative hydration and binding free energy calculations. The comparison with two SIS-based algorithms showed that Kartograf offers a fast alternative approach. The code for Kartograf is freely available on GitHub (https://github.com/OpenFreeEnergy/kartograf). While developed for the OpenFE ecosystem, Kartograf can also be utilized as a standalone Python package.
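The reduction of atom mapping to a bipartite matching over 3D coordinates can be illustrated with a toy sketch. This is not Kartograf's implementation or API: the function `map_atoms`, the `max_dist` cutoff, and the coordinates are hypothetical, and the brute-force search over assignments is only practical for a handful of atoms. It assumes two ligands already aligned in the same frame and pairs atoms so that the summed distance is minimal, then discards pairs that are too far apart.

```python
from itertools import permutations
from math import dist, inf


def map_atoms(coords_a, coords_b, max_dist=0.95):
    """Toy geometric atom mapping (hypothetical helper, not Kartograf's API).

    Picks the one-to-one pairing with the smallest summed 3D distance,
    then drops pairs that are farther apart than max_dist.
    """
    # Treat the shorter list as 'small' so each of its atoms gets a candidate.
    swapped = len(coords_a) > len(coords_b)
    small, large = (coords_b, coords_a) if swapped else (coords_a, coords_b)

    best_cost, best_perm = inf, ()
    for perm in permutations(range(len(large)), len(small)):
        cost = sum(dist(small[i], large[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm

    mapping = {}
    for i, j in enumerate(best_perm):
        if dist(small[i], large[j]) <= max_dist:
            a, b = (j, i) if swapped else (i, j)
            mapping[a] = b  # index in ligand A -> index in ligand B
    return mapping


# Made-up coordinates: two atoms overlap closely, the third has no partner.
ligand_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
ligand_b = [(1.6, 0.1, 0.0), (0.1, 0.0, 0.0), (9.0, 9.0, 9.0)]
mapping = map_atoms(ligand_a, ligand_b)
print(mapping)
```

A production mapper would replace the permutation search with a polynomial-time assignment solver and add chemistry-aware filters; the sketch only shows why no subgraph isomorphism search is needed once coordinates are trusted.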
Affiliation(s)
- Benjamin Ries
  - Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co KG, Birkendorfer Str 65, 88397 Biberach an der Riss, Germany
  - Open Free Energy, Open Molecular Software Foundation, Davis, 95616 California, United States
- Irfan Alibay
  - Open Free Energy, Open Molecular Software Foundation, Davis, 95616 California, United States
- David W. H. Swenson
  - Open Free Energy, Open Molecular Software Foundation, Davis, 95616 California, United States
- Hannah M. Baumann
  - Open Free Energy, Open Molecular Software Foundation, Davis, 95616 California, United States
- Michael M. Henry
  - Open Free Energy, Open Molecular Software Foundation, Davis, 95616 California, United States
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, 1275 New York, United States
- James R. B. Eastwood
  - Open Free Energy, Open Molecular Software Foundation, Davis, 95616 California, United States
- Richard J. Gowers
  - Open Free Energy, Open Molecular Software Foundation, Davis, 95616 California, United States
3
Hagg A, Kirschner KN. Open-Source Machine Learning in Computational Chemistry. J Chem Inf Model 2023; 63:4505-4532. [PMID: 37466636 PMCID: PMC10430767 DOI: 10.1021/acs.jcim.3c00643] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/04/2023] [Indexed: 07/20/2023]
Abstract
The field of computational chemistry has seen a significant increase in the integration of machine learning concepts and algorithms. In this Perspective, we surveyed 179 open-source software projects, with corresponding peer-reviewed papers published within the last 5 years, to better understand the topics within the field being investigated by machine learning approaches. For each project, we provide a short description, the link to the code, the accompanying license type, and whether the training data and resulting models are made publicly available. Based on those deposited in GitHub repositories, the most popular employed Python libraries are identified. We hope that this survey will serve as a resource to learn about machine learning or specific architectures thereof by identifying accessible codes with accompanying papers on a topic basis. To this end, we also include computational chemistry open-source software for generating training data and fundamental Python libraries for machine learning. Based on our observations and considering the three pillars of collaborative machine learning work, open data, open source (code), and open models, we provide some suggestions to the community.
Affiliation(s)
- Alexander Hagg
  - Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
  - Department of Electrical Engineering, Mechanical Engineering and Technical Journalism, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Karl N. Kirschner
  - Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
  - Department of Computer Science, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
4
Yoo J, Kim TY, Joung I, Song SO. Industrializing AI/ML during the end-to-end drug discovery process. Curr Opin Struct Biol 2023; 79:102528. [PMID: 36736243 DOI: 10.1016/j.sbi.2023.102528] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/15/2022] [Revised: 12/16/2022] [Accepted: 12/20/2022] [Indexed: 02/04/2023]
Abstract
Drug discovery aims to select proper targets and drug candidates to address unmet clinical needs. The end-to-end drug discovery process includes all stages of drug discovery from target identification to drug candidate selection. Recently, several artificial intelligence and machine learning (AI/ML)-based drug discovery companies have attempted to build data-driven platforms spanning the end-to-end drug discovery process. The ability to identify elusive targets essentially leads to the diversification of discovery pipelines, thereby increasing the ability to address unmet needs. Modern ML technologies are complementing traditional computer-aided drug discovery by accelerating candidate optimization in innovative ways. This review summarizes recent developments in AI/ML methods from target identification to molecule optimization, and concludes with an overview of current industrial trends in end-to-end AI/ML platforms.
Affiliation(s)
- Jiho Yoo
  - Standigm Inc., 3F, 70 Nonhyeon-ro 85-gil, Gangnam-gu, Seoul, South Korea, 06234 +82.2.501.8118
- Tae Yong Kim
  - Standigm Inc., 3F, 70 Nonhyeon-ro 85-gil, Gangnam-gu, Seoul, South Korea, 06234 +82.2.501.8118
- InSuk Joung
  - Standigm Inc., 3F, 70 Nonhyeon-ro 85-gil, Gangnam-gu, Seoul, South Korea, 06234 +82.2.501.8118
- Sang Ok Song
  - Standigm Inc., 3F, 70 Nonhyeon-ro 85-gil, Gangnam-gu, Seoul, South Korea, 06234 +82.2.501.8118
5
Ligand binding free energy evaluation by Monte Carlo Recursion. Comput Biol Chem 2023; 103:107830. [PMID: 36812825 DOI: 10.1016/j.compbiolchem.2023.107830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Revised: 01/27/2023] [Accepted: 02/13/2023] [Indexed: 02/17/2023]
Abstract
The correct evaluation of ligand binding free energies by computational methods is still a very challenging and active area of research. The most employed methods for these calculations can be roughly classified into four groups: (i) the fastest and least accurate methods, such as molecular docking, designed to sample a large number of molecules and rapidly rank them according to the potential binding energy; (ii) the second class of methods uses a thermodynamic ensemble, typically generated by molecular dynamics, to analyze the endpoints of the thermodynamic cycle for binding and extract differences, in the so-called 'end-point' methods; (iii) the third class of methods is based on the Zwanzig relationship and computes the free energy difference after a chemical change of the system (alchemical methods); and (iv) methods based on biased simulations, such as metadynamics. These methods require increased computational power and, as expected, result in increased accuracy for the determination of the strength of binding. Here, we describe an intermediate approach, based on the Monte Carlo Recursion (MCR) method first developed by Harold Scheraga. In this method, the system is sampled at increasing effective temperatures, and the free energy of the system is assessed from a series of terms W(b,T), computed from Monte Carlo (MC) averages at each iteration. We show the application of the MCR for ligand binding with datasets of host-guest systems (N = 75) and we observed that a good correlation is obtained between experimental data and the binding energies computed with MCR. We also compared the experimental data with an end-point calculation from equilibrium Monte Carlo calculations that allowed us to conclude that the lower-energy (lower-temperature) terms in the calculation are the most relevant to the estimation of the binding energies, resulting in similar correlations between MCR and MC data and the experimental values. On the other hand, the MCR method provides a reasonable view of the binding energy funnel, with possible connections to ligand binding kinetics as well. The codes developed for this analysis are publicly available on GitHub as a part of the LiBELa/MCLiBELa project (https://github.com/alessandronascimento/LiBELa).
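The Zwanzig relationship named under (iii) is commonly written as ΔF = −kT ln⟨exp(−(U1 − U0)/kT)⟩0, an average over configurations sampled from the reference state 0. The sketch below is a generic exponential-averaging estimator for that formula, not code from LiBELa or the MCR method; the function name and the sample energy differences are made up for illustration.

```python
from math import exp, log

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def zwanzig_free_energy(delta_u, temperature=298.15):
    """Estimate dF = -kT * ln< exp(-dU/kT) >_0 from energy differences.

    delta_u holds U1 - U0 (kcal/mol) evaluated on samples from state 0.
    """
    kt = K_B * temperature
    # Shift by the minimum so the exponentials cannot overflow.
    shift = min(delta_u)
    mean_exp = sum(exp(-(du - shift) / kt) for du in delta_u) / len(delta_u)
    return shift - kt * log(mean_exp)


samples = [1.2, 0.8, 1.0, 1.5, 0.9]  # made-up energy differences, kcal/mol
estimate = zwanzig_free_energy(samples)
print(round(estimate, 3))
```

Because the average is dominated by the lowest energy differences, the estimate always lies between the smallest sample and the plain arithmetic mean, which is one reason alchemical protocols split a transformation into many small steps.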
6
Zhang ZY, Peng D, Liu L, Shen L, Fang WH. Machine Learning Prediction of Hydration Free Energy with Physically Inspired Descriptors. J Phys Chem Lett 2023; 14:1877-1884. [PMID: 36779933 DOI: 10.1021/acs.jpclett.2c03858] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
We present machine learning models for predicting experimental hydration free energies of molecules without any atom-, bond-, or geometry-specific input feature. Four types of physically inspired descriptors are adopted for predictions. The first type is composed of the total dipole moment, anisotropic polarizability, and vibrational analysis results of the solute molecule. The second and third types are derived from the electrostatic potential distribution of the solute. The last type includes the solvent accessible surface area and shape similarities. Several machine learning regression models are built on the basis of the FreeSolv database with ∼600 samples, showing a better performance in comparison with that of most traditional approaches and other prediction methods based on molecular fingerprints. In particular, the present descriptors are capable of predicting hydration free energies of new compounds with elements or fragments that are never seen in the training set. The importance of these descriptors, the impact of dissociation energies of specific covalent bonds, and the outliers with relatively large prediction errors are also discussed.
Affiliation(s)
- Zhan-Yun Zhang
  - Key Laboratory of Theoretical and Computational Photochemistry of Ministry of Education, College of Chemistry, Beijing Normal University, Beijing 100875, P. R. China
- Ding Peng
  - Key Laboratory of Theoretical and Computational Photochemistry of Ministry of Education, College of Chemistry, Beijing Normal University, Beijing 100875, P. R. China
- Lihong Liu
  - Key Laboratory of Theoretical and Computational Photochemistry of Ministry of Education, College of Chemistry, Beijing Normal University, Beijing 100875, P. R. China
- Lin Shen
  - Key Laboratory of Theoretical and Computational Photochemistry of Ministry of Education, College of Chemistry, Beijing Normal University, Beijing 100875, P. R. China
  - Yantai-Jingshi Institute of Material Genome Engineering, Yantai 265505, Shandong, P. R. China
- Wei-Hai Fang
  - Key Laboratory of Theoretical and Computational Photochemistry of Ministry of Education, College of Chemistry, Beijing Normal University, Beijing 100875, P. R. China
  - Shandong Laboratory of Yantai Advanced Materials and Green Manufacturing, Yantai 264006, Shandong, P. R. China
7
Tunjic TM, Weber N, Brunsteiner M. Computer aided drug design in the development of proteolysis targeting chimeras. Comput Struct Biotechnol J 2023; 21:2058-2067. [PMID: 36968015 PMCID: PMC10030821 DOI: 10.1016/j.csbj.2023.02.042] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/29/2022] [Revised: 02/21/2023] [Accepted: 02/22/2023] [Indexed: 03/18/2023] Open
Abstract
Proteolysis targeting chimeras represent a class of drug molecules with a number of attractive properties, most notably a potential to work for targets that, so far, have been inaccessible to conventional small molecule inhibitors. Due to their different mechanism of action and physicochemical properties, many of the methods that have been designed and applied for computer-aided design of traditional small molecule drugs are not applicable to proteolysis targeting chimeras. Here we review recent developments in this field, focusing on three aspects: de novo linker design, estimation of absorption for beyond-rule-of-5 compounds, and the generation and ranking of ternary complex structures. Although this field is still young, we find that a good number of models and algorithms are available, with the potential to assist the design of such compounds in silico and accelerate applied pharmaceutical research.
8
Low K, Coote ML, Izgorodina EI. Explainable Solvation Free Energy Prediction Combining Graph Neural Networks with Chemical Intuition. J Chem Inf Model 2022; 62:5457-5470. [PMID: 36317829 DOI: 10.1021/acs.jcim.2c01013] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/29/2022]
Abstract
The prediction of a molecule's solvation Gibbs free energy (ΔGsolv) in a given solvent is an important task which has traditionally been carried out via quantum chemical continuum methods or force field-based molecular simulations. Machine learning (ML) and graph neural networks in particular have emerged as powerful techniques for elucidating structure-property relationships. This work presents a graph neural network (GNN) for the prediction of ΔGsolv which, in addition to encoding typical atom and bond-level features, incorporates chemically intuitive, solvation-relevant parameters into the featurization process: semiempirical partial atomic charges and solvent dielectric constant. Solute-solvent interactions are included via an interaction map layer which can be visualized to examine solubility-enhancing or -decreasing interactions learnt by the model. On a test set of small organic molecules, our GNN predicts ΔGsolv in water and cyclohexane with an accuracy comparable to polarizable and ab initio generated force field methods [mean absolute error (MAE) = 0.4 and 0.2 kcal mol⁻¹, respectively], without the need for any molecular simulation. For the FreeSolv data set of hydration free energies, the test MAE is 0.7 kcal mol⁻¹. Interpretability and applicability of the model are highlighted through several examples including rationalizing the increased solubility of modified diaminoanthraquinones in organic solvents. The clear explanations afforded by our GNN allow for easy understanding of the model's predictions, giving the experimental chemist confidence in employing ML models toward more optimized synthetic routes.
Affiliation(s)
- Kaycee Low
  - Monash Computational Chemistry Group, School of Chemistry, Monash University, Clayton, Victoria 3800, Australia
- Michelle L Coote
  - Institute for Nanoscale Science and Technology, College of Science and Engineering, Flinders University, Bedford Park, South Australia 5042, Australia
- Ekaterina I Izgorodina
  - Monash Computational Chemistry Group, School of Chemistry, Monash University, Clayton, Victoria 3800, Australia
9
Weinreich J, Lemm D, von Rudorff GF, von Lilienfeld OA. Ab initio machine learning of phase space averages. J Chem Phys 2022; 157:024303. [PMID: 35840379 DOI: 10.1063/5.0095674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022] Open
Abstract
Equilibrium structures determine material properties and biochemical functions. We here propose to machine learn phase space averages, conventionally obtained by ab initio or force-field-based molecular dynamics (MD) or Monte Carlo (MC) simulations. In analogy to ab initio MD, our ab initio machine learning (AIML) model does not require bond topologies and, therefore, enables a general machine learning pathway to obtain ensemble properties throughout the chemical compound space. We demonstrate AIML for predicting Boltzmann averaged structures after training on hundreds of MD trajectories. The AIML output is subsequently used to train machine learning models of free energies of solvation using experimental data and to reach competitive prediction errors (mean absolute error ∼ 0.8 kcal/mol) for out-of-sample molecules, within milliseconds. As such, AIML effectively bypasses the need for MD or MC-based phase space sampling, enabling exploration campaigns of Boltzmann averages throughout the chemical compound space at a much accelerated pace. We contextualize our findings by comparison to state-of-the-art methods resulting in a Pareto plot for the free energy of solvation predictions in terms of accuracy and time.
Affiliation(s)
- Jan Weinreich
  - Faculty of Physics, University of Vienna, Kolingasse 14-16, AT-1090 Wien, Austria
- Dominik Lemm
  - Faculty of Physics, University of Vienna, Kolingasse 14-16, AT-1090 Wien, Austria
10
Dong L, Qu X, Wang B. XLPFE: A Simple and Effective Machine Learning Scoring Function for Protein-Ligand Scoring and Ranking. ACS OMEGA 2022; 7:21727-21735. [PMID: 35785279 PMCID: PMC9245135 DOI: 10.1021/acsomega.2c01723] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/22/2022] [Accepted: 05/30/2022] [Indexed: 06/15/2023]
Abstract
Prediction of protein-ligand binding affinities is a central issue in structure-based computer-aided drug design. In recent years, much effort has been devoted to the prediction of the binding affinity in protein-ligand complexes using machine learning (ML). Due to the remarkable ability of ML methods in nonlinear fitting, ML-based scoring functions (SFs) can deliver much improved performance on a selected test set, such as the comparative assessment of scoring functions (CASF), when compared to the classical SFs. However, the performance of ML-based SFs heavily relies on the overall similarity of the training set and the test set. To improve the performance and transferability of an SF, we have tried to combine various features including energy terms from X-score and AutoDock Vina, the properties of ligands, and the statistical sequence-related information from either the binding site or the full protein. In conjunction with extreme trees (ET), an ML model, we have developed XLPFE, a new SF. Compared with other tested methods such as X-score, AutoDock Vina, ΔvinaXGB, PSH-ML, or CNN-score, XLPFE achieves consistently better scoring and ranking power for various types of protein-ligand complex structures beyond the CASF, suggesting that XLPFE has superior transferability. In particular, XLPFE performs better with metalloenzymes. With its faster speed, improved accuracy, and better transferability, XLPFE could be usefully applied to a diverse range of protein-ligand complexes.
Affiliation(s)
- Lina Dong
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, iChEM, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 360015, P. R. China
- Xiaoyang Qu
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 360015, P. R. China
- Binju Wang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 360015, P. R. China
11
The performance of ensemble-based free energy protocols in computing binding affinities to ROS1 kinase. Sci Rep 2022; 12:10433. [PMID: 35729177 PMCID: PMC9211793 DOI: 10.1038/s41598-022-13319-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2022] [Accepted: 05/23/2022] [Indexed: 11/08/2022] Open
Abstract
Optimization of binding affinities for compounds to their target protein is a primary objective in drug discovery. Herein we report on a collaborative study that evaluates a set of compounds binding to ROS1 kinase. We use ESMACS (enhanced sampling of molecular dynamics with approximation of continuum solvent) and TIES (thermodynamic integration with enhanced sampling) protocols to rank the binding free energies. The predicted binding free energies from ESMACS simulations show good correlations with experimental data for subsets of the compounds. Consistent binding free energy differences are generated for TIES and ESMACS. Although an unexplained overestimation exists, we obtain excellent statistical rankings across the set of compounds from the TIES protocol, with a Pearson correlation coefficient of 0.90 between calculated and experimental activities.
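The Pearson correlation coefficient quoted in this abstract measures how linearly the calculated values track the experimental ones. A minimal sketch of the calculation follows; the helper name and both data series are made up for illustration and are not the ROS1 data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, without the common 1/n factor (it cancels).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up binding free energies in kcal/mol.
experimental = [-9.1, -8.4, -10.2, -7.9, -9.6]
calculated = [-11.0, -10.1, -12.5, -9.2, -11.4]
r = pearson_r(experimental, calculated)
print(f"r = {r:.2f}")
```

A systematic offset between the two series, as in the made-up values here, leaves the coefficient unchanged, so a high correlation can coexist with an overestimation of absolute values.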
12
Wan S, Bhati AP, Wright DW, Wall ID, Graves AP, Green D, Coveney PV. Ensemble Simulations and Experimental Free Energy Distributions: Evaluation and Characterization of Isoxazole Amides as SMYD3 Inhibitors. J Chem Inf Model 2022; 62:2561-2570. [PMID: 35508076 PMCID: PMC9131449 DOI: 10.1021/acs.jcim.2c00255] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/21/2022]
Abstract
Optimization of binding affinities for ligands to their target protein is a primary objective in rational drug discovery. Herein, we report on a collaborative study that evaluates various compounds designed to bind to the SET and MYND domain-containing protein 3 (SMYD3). SMYD3 is a histone methyltransferase and plays an important role in transcriptional regulation in cell proliferation, cell cycle, and human carcinogenesis. Experimental measurements using the scintillation proximity assay show that the distributions of binding free energies from a large number of independent measurements exhibit non-normal properties. We use ESMACS (enhanced sampling of molecular dynamics with approximation of continuum solvent) and TIES (thermodynamic integration with enhanced sampling) protocols to predict the binding free energies and to provide a detailed chemical insight into the nature of ligand-protein binding. Our results show that the 1-trajectory ESMACS protocol works well for the set of ligands studied here. Although one unexplained outlier exists, we obtain excellent statistical ranking across the set of compounds from the ESMACS protocol and good agreement between calculations and experiments for the relative binding free energies from the TIES protocol. ESMACS and TIES are again found to be powerful protocols for the accurate comparison of the binding free energies.
Affiliation(s)
- Shunzhou Wan
  - Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, U.K.
- Agastya P Bhati
  - Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, U.K.
- David W Wright
  - Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, U.K.
- Ian D Wall
  - GlaxoSmithKline, Gunnels Wood Road, Stevenage, Hertfordshire SG1 2NY, U.K.
- Alan P Graves
  - GlaxoSmithKline, 1250 South Collegeville Road, Collegeville, Pennsylvania 19426, United States
- Darren Green
  - GlaxoSmithKline, Gunnels Wood Road, Stevenage, Hertfordshire SG1 2NY, U.K.
- Peter V Coveney
  - Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, U.K.
  - Advanced Research Computing Centre, University College London, London WC1H 0AJ, U.K.
  - Institute for Informatics, Faculty of Science, University of Amsterdam, 1098XH Amsterdam, The Netherlands
13
Bhati A, Coveney PV. Large Scale Study of Ligand-Protein Relative Binding Free Energy Calculations: Actionable Predictions from Statistically Robust Protocols. J Chem Theory Comput 2022; 18:2687-2702. [PMID: 35293737 PMCID: PMC9009079 DOI: 10.1021/acs.jctc.1c01288] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/22/2021] [Indexed: 12/28/2022]
Abstract
The accurate and reliable prediction of protein-ligand binding affinities can play a central role in the drug discovery process as well as in personalized medicine. Of considerable importance during lead optimization are the alchemical free energy methods that furnish an estimation of relative binding free energies (RBFE) of similar molecules. Recent advances in these methods have increased their speed, accuracy, and precision. This is evident from the increasing number of retrospective as well as prospective studies employing them. However, such methods still have limited applicability in real-world scenarios due to a number of important yet unresolved issues. Here, we report the findings from a large data set comprising over 500 ligand transformations spanning over 300 ligands binding to a diverse set of 14 different protein targets which furnish statistically robust results on the accuracy, precision, and reproducibility of RBFE calculations. We use ensemble-based methods which are the only way to provide reliable uncertainty quantification given that the underlying molecular dynamics is chaotic. These are implemented using TIES (Thermodynamic Integration with Enhanced Sampling). Results achieve chemical accuracy in all cases. Ensemble simulations also furnish information on the statistical distributions of the free energy calculations which exhibit non-normal behavior. We find that the "enhanced sampling" method known as replica exchange with solute tempering degrades RBFE predictions. We also report definitively on numerous associated alchemical factors including the choice of ligand charge method, flexibility in ligand structure, and the size of the alchemical region including the number of atoms involved in transforming one ligand into another. Our findings provide a key set of recommendations that should be adopted for the reliable application of RBFE methods.
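The idea of quantifying uncertainty from an ensemble of independent replicas can be sketched as follows. This is not the TIES protocol or its software: it is a generic percentile bootstrap over made-up replica results, chosen here because it does not assume that the underlying distribution is normal. The function name, parameters, and values are hypothetical.

```python
import random
from statistics import mean


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of values."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        mean(rng.choices(values, k=len(values))) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up relative binding free energies (kcal/mol) from five replicas.
replica_ddg = [-1.10, -0.85, -1.30, -0.95, -1.05]
low, high = bootstrap_ci(replica_ddg)
print(f"mean = {mean(replica_ddg):.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

With only five replicas the interval is coarse; the point of the sketch is that the spread comes from repeated independent simulations rather than from a single trajectory.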
Affiliation(s)
- Agastya P. Bhati
  - Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, United Kingdom
- Peter V. Coveney
  - Centre for Computational Science, Department of Chemistry, University College London, London WC1H 0AJ, United Kingdom
  - Informatics Institute, University of Amsterdam, P.O. Box 94323, 1090 GH Amsterdam, Netherlands
14
Sivakumar D, Wu S. Classical and Machine Learning Methods for Protein - Ligand Binding Free Energy Estimation. Curr Drug Metab 2022; 23:252-259. [PMID: 35293293 DOI: 10.2174/1389200223666220315160835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2021] [Revised: 01/11/2022] [Accepted: 01/25/2022] [Indexed: 11/22/2022]
Abstract
Binding free energy estimation of drug candidates to their biomolecular target is one of the best quantitative estimators in computer-aided drug discovery. Accurate binding free energy estimation remains a challenging task even after decades of research, owing to algorithmic complexity, time-consuming procedures, and reproducibility issues. In this review, we discuss the advantages and disadvantages of diverse free energy methods such as Thermodynamic Integration (TI), Bennett's Acceptance Ratio (BAR), Free Energy Perturbation (FEP), and alchemical methods. Moreover, we discuss the possible application of machine learning methods in protein-ligand binding free energy estimation.
Affiliation(s)
- Sangwook Wu
  - R&D center, PharmCADD, Busan, Republic of Korea, 48060
  - Department of Physics, Pukyong National University, Busan, Republic of Korea, 48513
15
Gianti E, Percec S. Machine Learning at the Interface of Polymer Science and Biology: How Far Can We Go? Biomacromolecules 2022; 23:576-591. [PMID: 35133143 DOI: 10.1021/acs.biomac.1c01436] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/12/2022]
Abstract
This Perspective outlines recent progress and future directions for using machine learning (ML), a data-driven method, to address critical questions in the design, synthesis, processing, and characterization of biomacromolecules. The achievement of these tasks requires the navigation of vast and complex chemical and biological spaces, difficult to accomplish with reasonable speed. Using modern algorithms and supercomputers, quantum physics methods are able to examine systems containing a few hundred interacting species and determine the probability of finding them in a particular region of phase space, thereby anticipating their properties. Likewise, modern approaches in chemistry and biomolecular simulation, supported by high performance computing, have culminated in producing data sets of escalating size and intrinsically high complexity. Hence, using ML to extract relevant information from these fields is of paramount importance to advance our understanding of chemical and biomolecular systems. At the heart of ML approaches lie statistical algorithms, which by evaluating a portion of a given data set, identify, learn, and manipulate the underlying rules that govern the whole data set. The assembly of a quality model to represent the data followed by the predictions and elimination of error sources are the key steps in ML. In addition to a growing infrastructure of ML tools to address complex problems, an increasing number of aspects related to our understanding of the fundamental properties of biomacromolecules are exposed to ML. These fields, including those residing at the interface of polymer science and biology (i.e., structure determination, de novo design, folding, and dynamics), strive to adopt and take advantage of the transformative power offered by approaches in the ML domain, which clearly has the potential of accelerating research in the field of biomacromolecules.
Affiliation(s)
- Eleonora Gianti
  - Institute for Computational Molecular Science (ICMS), Temple University, Philadelphia, Pennsylvania 19122, United States
  - Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122, United States
- Simona Percec
  - Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122, United States
16
Dong L, Qu X, Zhao Y, Wang B. Prediction of Binding Free Energy of Protein-Ligand Complexes with a Hybrid Molecular Mechanics/Generalized Born Surface Area and Machine Learning Method. ACS Omega 2021; 6:32938-32947. [PMID: 34901645 PMCID: PMC8655939 DOI: 10.1021/acsomega.1c04996] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 09/09/2021] [Accepted: 11/10/2021] [Indexed: 06/14/2023]
Abstract
Accurate prediction of protein-ligand binding free energies is important in enzyme engineering and drug discovery. The molecular mechanics/generalized Born surface area (MM/GBSA) approach is widely used to estimate ligand-binding affinities, but its performance heavily relies on the accuracy of its energy components. A hybrid strategy combining MM/GBSA and machine learning (ML) has been developed to predict the binding free energies of protein-ligand systems. Based on the MM/GBSA energy terms and several features associated with protein-ligand interactions, our ML-based scoring function, GXLE, shows much better performance than MM/GBSA without entropy. In particular, the good transferability of the GXLE model is highlighted by its good performance in ranking power for prediction of the binding affinity of different ligands for either the docked structures or crystal structures. The GXLE scoring function and its code are freely available and can be used to correct the binding free energies computed by MM/GBSA.
Affiliation(s)
- Lina Dong
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, iChEM, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 360015, P. R. China
- Xiaoyang Qu
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 360015, P. R. China
- Yuan Zhao
  - The Key Laboratory of Natural Medicine and Immuno-Engineering, Henan University, Kaifeng 475004, P. R. China
- Binju Wang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 360015, P. R. China
17
Bhati AP, Wan S, Alfè D, Clyde AR, Bode M, Tan L, Titov M, Merzky A, Turilli M, Jha S, Highfield RR, Rocchia W, Scafuri N, Succi S, Kranzlmüller D, Mathias G, Wifling D, Donon Y, Di Meglio A, Vallecorsa S, Ma H, Trifan A, Ramanathan A, Brettin T, Partin A, Xia F, Duan X, Stevens R, Coveney PV. Pandemic drugs at pandemic speed: infrastructure for accelerating COVID-19 drug discovery with hybrid machine learning- and physics-based simulations on high-performance computers. Interface Focus 2021; 11:20210018. [PMID: 34956592 PMCID: PMC8504892 DOI: 10.1098/rsfs.2021.0018] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Accepted: 09/07/2021] [Indexed: 12/13/2022] Open
Abstract
The race to meet the challenges of the global pandemic has served as a reminder that the existing drug discovery process is expensive, inefficient and slow. There is a major bottleneck in screening the vast number of potential small molecules to shortlist lead compounds for antiviral drug development. New opportunities to accelerate drug discovery lie at the interface between machine learning methods, in this case, developed for linear accelerators, and physics-based methods. The two in silico methods each have their own advantages and limitations which, interestingly, complement each other. Here, we present an innovative infrastructural development that combines both approaches to accelerate drug discovery. The scale of the potential resulting workflow is such that it is dependent on supercomputing to achieve extremely high throughput. We have demonstrated the viability of this workflow for the study of inhibitors for four COVID-19 target proteins and our ability to perform the required large-scale calculations to identify lead antiviral compounds through repurposing on a variety of supercomputers.
Affiliation(s)
- Agastya P. Bhati
  - Centre for Computational Science, University College London, Gordon Street, London WC1H 0AJ, UK
- Shunzhou Wan
  - Centre for Computational Science, University College London, Gordon Street, London WC1H 0AJ, UK
- Dario Alfè
  - Department of Earth Sciences, London Centre for Nanotechnology and Thomas Young Centre at University College London, University College London, Gower Street, London WC1E 6BT, UK
  - Dipartimento di Fisica Ettore Pancini, Università di Napoli Federico II, Monte Sant'Angelo, Napoli 80126, Italy
- Austin R. Clyde
  - Department of Computer Science, University of Chicago, Chicago, IL, USA
- Mathis Bode
  - Institute for Combustion Technology, RWTH Aachen University, Aachen 52056, Germany
- Li Tan
  - Brookhaven National Laboratory, Upton, NY 11973, USA
- Mikhail Titov
  - Department of Electrical and Computer Engineering, Rutgers, the State University of New Jersey, Piscataway, NJ 08854, USA
- Andre Merzky
  - Department of Electrical and Computer Engineering, Rutgers, the State University of New Jersey, Piscataway, NJ 08854, USA
- Matteo Turilli
  - Department of Electrical and Computer Engineering, Rutgers, the State University of New Jersey, Piscataway, NJ 08854, USA
- Shantenu Jha
  - Brookhaven National Laboratory, Upton, NY 11973, USA
  - Department of Electrical and Computer Engineering, Rutgers, the State University of New Jersey, Piscataway, NJ 08854, USA
- Walter Rocchia
  - Concept Lab, Italian Institute of Technology, Via Melen, Genova, Italy
- Nicola Scafuri
  - Concept Lab, Italian Institute of Technology, Via Melen, Genova, Italy
- Sauro Succi
  - Center for Life Nanosciences at La Sapienza, Italian Institute of Technology, viale Regina Elena, Roma, Italy
- Dieter Kranzlmüller
  - Leibniz Supercomputing Centre (LRZ) of the Bavarian Academy of Sciences and Humanities, Boltzmannstrasse 1, Garching bei München 85748, Germany
- Gerald Mathias
  - Leibniz Supercomputing Centre (LRZ) of the Bavarian Academy of Sciences and Humanities, Boltzmannstrasse 1, Garching bei München 85748, Germany
- David Wifling
  - Leibniz Supercomputing Centre (LRZ) of the Bavarian Academy of Sciences and Humanities, Boltzmannstrasse 1, Garching bei München 85748, Germany
- Heng Ma
  - Data Science and Learning Division, Argonne National Laboratory, Lemont, IL 60439, USA
- Anda Trifan
  - Data Science and Learning Division, Argonne National Laboratory, Lemont, IL 60439, USA
- Arvind Ramanathan
  - Data Science and Learning Division, Argonne National Laboratory, Lemont, IL 60439, USA
- Tom Brettin
  - Computing, Environment and Life Sciences Directorate, Argonne National Laboratory, Lemont, IL 60439, USA
- Alexander Partin
  - Data Science and Learning Division, Argonne National Laboratory, Lemont, IL 60439, USA
- Fangfang Xia
  - Data Science and Learning Division, Argonne National Laboratory, Lemont, IL 60439, USA
- Xiaotan Duan
  - Department of Computer Science, University of Chicago, Chicago, IL, USA
- Rick Stevens
  - Computing, Environment and Life Sciences Directorate, Argonne National Laboratory, Lemont, IL 60439, USA
- Peter V. Coveney
  - Centre for Computational Science, University College London, Gordon Street, London WC1H 0AJ, UK
  - Institute for Informatics, University of Amsterdam, Science Park 904, Amsterdam 1098 XH, The Netherlands
18
Róg T, Girych M, Bunker A. Mechanistic Understanding from Molecular Dynamics in Pharmaceutical Research 2: Lipid Membrane in Drug Design. Pharmaceuticals (Basel) 2021; 14:1062. [PMID: 34681286 PMCID: PMC8537670 DOI: 10.3390/ph14101062] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/20/2021] [Revised: 10/14/2021] [Accepted: 10/15/2021] [Indexed: 11/17/2022] Open
Abstract
We review the use of molecular dynamics (MD) simulation as a drug design tool in the context of the role that the lipid membrane can play in drug action, i.e., the interaction between candidate drug molecules and lipid membranes. In the standard "lock and key" paradigm, only the interaction between the drug and a specific active site of a specific protein is considered; the environment in which the drug acts is, from a biophysical perspective, far more complex than this. The possible mechanisms through which a drug can be designed to tinker with physiological processes are significantly broader than merely fitting to a single active site of a single protein. In this paper, we focus on the role of the lipid membrane, arguably the most important element outside the proteins themselves, as a case study. We discuss work that has been carried out, using MD simulation, concerning the transfection of drugs through membranes that act as biological barriers in the path of the drugs, the behavior of drug molecules within membranes, how their collective behavior can affect the structure and properties of the membrane and, finally, the role lipid membranes, to which the vast majority of drug target proteins are associated, can play in mediating the interaction between drug and target protein. This review paper is the second in a two-part series covering MD simulation as a tool in pharmaceutical research; both are designed as pedagogical review papers aimed at both pharmaceutical scientists interested in exploring how the tool of MD simulation can be applied to their research and computational scientists interested in exploring the possibility of a pharmaceutical context for their research.
Affiliation(s)
- Tomasz Róg
  - Department of Physics, University of Helsinki, 00014 Helsinki, Finland
- Mykhailo Girych
  - Department of Physics, University of Helsinki, 00014 Helsinki, Finland
- Alex Bunker
  - Drug Research Program, Division of Pharmaceutical Biosciences, Faculty of Pharmacy, University of Helsinki, 00014 Helsinki, Finland
19
Castelli M, Serapian SA, Marchetti F, Triveri A, Pirota V, Torielli L, Collina S, Doria F, Freccero M, Colombo G. New perspectives in cancer drug development: computational advances with an eye to design. RSC Med Chem 2021; 12:1491-1502. [PMID: 34671733 PMCID: PMC8459323 DOI: 10.1039/d1md00192b] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/09/2021] [Accepted: 07/06/2021] [Indexed: 02/06/2023] Open
Abstract
Computational chemistry has come of age in drug discovery. Indeed, most pharmaceutical development programs rely on computer-based data and results at some point. Herein, we discuss recent applications of advanced simulation techniques to difficult challenges in drug discovery. These entail the characterization of allosteric mechanisms and the identification of allosteric sites or cryptic pockets determined by protein motions, which are not immediately evident in the experimental structure of the target; the study of ligand binding mechanisms and their kinetic profiles; and the evaluation of drug-target affinities. We analyze different approaches to tackle challenging and emerging biological targets. Finally, we discuss the possible perspectives of future application of computation in drug discovery.
Affiliation(s)
- Matteo Castelli
  - Department of Chemistry, University of Pavia, via Taramelli 12, 27100 Pavia, Italy
- Stefano A Serapian
  - Department of Chemistry, University of Pavia, via Taramelli 12, 27100 Pavia, Italy
- Filippo Marchetti
  - Department of Chemistry, University of Pavia, via Taramelli 12, 27100 Pavia, Italy
- Alice Triveri
  - Department of Chemistry, University of Pavia, via Taramelli 12, 27100 Pavia, Italy
- Valentina Pirota
  - Department of Chemistry, University of Pavia, via Taramelli 12, 27100 Pavia, Italy
- Luca Torielli
  - Department of Drug Sciences, Medicinal Chemistry and Pharmaceutical Technology Section, University of Pavia, via Taramelli 12, 27100 Pavia, Italy
- Simona Collina
  - Department of Drug Sciences, Medicinal Chemistry and Pharmaceutical Technology Section, University of Pavia, via Taramelli 12, 27100 Pavia, Italy
- Filippo Doria
  - Department of Chemistry, University of Pavia, via Taramelli 12, 27100 Pavia, Italy
- Mauro Freccero
  - Department of Chemistry, University of Pavia, via Taramelli 12, 27100 Pavia, Italy
- Giorgio Colombo
  - Department of Chemistry, University of Pavia, via Taramelli 12, 27100 Pavia, Italy
20
Bertazzo M, Gobbo D, Decherchi S, Cavalli A. Machine Learning and Enhanced Sampling Simulations for Computing the Potential of Mean Force and Standard Binding Free Energy. J Chem Theory Comput 2021; 17:5287-5300. [PMID: 34260233 PMCID: PMC8389529 DOI: 10.1021/acs.jctc.1c00177] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 02/18/2021] [Indexed: 02/07/2023]
Abstract
Computational capabilities are rapidly increasing, primarily because of the availability of GPU-based architectures. This creates unprecedented simulative possibilities for the systematic and robust computation of thermodynamic observables, including the free energy of a drug binding to a target. In contrast to calculations of relative binding free energy, which are nowadays widely exploited for drug discovery, we here push the boundary of computing the binding free energy and the potential of mean force. We introduce a novel protocol that leverages enhanced sampling, machine learning, and ad hoc algorithms to limit human intervention, computing time, and free parameters in free energy calculations. We first validate the method on a host-guest system, and then we apply the protocol to glycogen synthase kinase 3 beta, a protein kinase of pharmacological interest. Overall, we obtain a good correlation with experimental values in relative and absolute terms. While we focus on protein-ligand binding, the strategy is of broad applicability to any complex event that can be described with a path collective variable. We systematically discuss key details that influence the final result. The parameters and simulation settings are available at PLUMED-NEST to allow full reproducibility.
Affiliation(s)
- Martina Bertazzo
  - Computational & Chemical Biology, Fondazione Istituto Italiano di Tecnologia, via Morego 30, 16163 Genoa, Italy
  - Department of Pharmacy and Biotechnology (FaBiT), Alma Mater Studiorum − University of Bologna, via Belmeloro 6, 40126 Bologna, Italy
- Dorothea Gobbo
  - Computational & Chemical Biology, Fondazione Istituto Italiano di Tecnologia, via Morego 30, 16163 Genoa, Italy
- Sergio Decherchi
  - Computational & Chemical Biology, Fondazione Istituto Italiano di Tecnologia, via Morego 30, 16163 Genoa, Italy
  - BiKi Technologies s.r.l., Via XX Settembre 33/10, 16121 Genoa, Italy
- Andrea Cavalli
  - Computational & Chemical Biology, Fondazione Istituto Italiano di Tecnologia, via Morego 30, 16163 Genoa, Italy
  - Department of Pharmacy and Biotechnology (FaBiT), Alma Mater Studiorum − University of Bologna, via Belmeloro 6, 40126 Bologna, Italy
21
Lim H, Jung Y. MLSolvA: solvation free energy prediction from pairwise atomistic interactions by machine learning. J Cheminform 2021; 13:56. [PMID: 34332634 PMCID: PMC8325294 DOI: 10.1186/s13321-021-00533-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/04/2021] [Accepted: 07/15/2021] [Indexed: 01/04/2023] Open
Abstract
Recent advances in machine learning technologies and their applications have led to the development of diverse structure-property relationship models for crucial chemical properties. The solvation free energy is one of them. Here, we introduce a novel ML-based solvation model, which calculates the solvation energy from pairwise atomistic interactions. The novelty of the proposed model consists of a simple architecture: two encoding functions extract atomic feature vectors from the given chemical structure, while the inner product between the two atomistic feature vectors calculates their interactions. The results of 6239 experimental measurements achieve outstanding performance and transferability for enlarging training data owing to its solvent-non-specific nature. An analysis of the interaction map shows that our model has significant potential for producing group contributions on the solvation energy, which indicates that the model provides not only predictions of target properties but also more detailed physicochemical insights.
Affiliation(s)
- Hyuntae Lim
  - Department of Chemistry, Seoul National University, Seoul, 08826, South Korea
- YounJoon Jung
  - Department of Chemistry, Seoul National University, Seoul, 08826, South Korea
22
Abstract
Machine learning (ML) techniques applied to chemical reactions have a long history. The present contribution discusses applications ranging from small molecule reaction dynamics to computational platforms for reaction planning. ML-based techniques can be particularly relevant for problems involving both computation and experiments. For one, Bayesian inference is a powerful approach to develop models consistent with knowledge from experiments. Second, ML-based methods can also be used to handle problems that are formally intractable using conventional approaches, such as exhaustive characterization of state-to-state information in reactive collisions. Finally, the explicit simulation of reactive networks as they occur in combustion has become possible using machine-learned neural network potentials. This review provides an overview of the questions that can and have been addressed using machine learning techniques, and an outlook discusses challenges in this diverse and stimulating field. It is concluded that ML applied to chemistry problems as practiced and conceived today has the potential to transform the way with which the field approaches problems involving chemical reactions, in both research and academic teaching.
Affiliation(s)
- Markus Meuwly
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, 4056 Basel, Switzerland
  - Department of Chemistry, Brown University, Providence, Rhode Island 02912, United States
23
Weinreich J, Browning NJ, von Lilienfeld OA. Machine learning of free energies in chemical compound space using ensemble representations: Reaching experimental uncertainty for solvation. J Chem Phys 2021; 154:134113. [PMID: 33832231 DOI: 10.1063/5.0041548] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Indexed: 12/18/2022] Open
Abstract
Free energies govern the behavior of soft and liquid matter, and improving their predictions could have a large impact on the development of drugs, electrolytes, or homogeneous catalysts. Unfortunately, it is challenging to devise an accurate description of effects governing solvation such as hydrogen-bonding, van der Waals interactions, or conformational sampling. We present a Free energy Machine Learning (FML) model applicable throughout chemical compound space and based on a representation that employs Boltzmann averages to account for an approximated sampling of configurational space. Using the FreeSolv database, FML's out-of-sample prediction errors of experimental hydration free energies decay systematically with training set size, and experimental uncertainty (0.6 kcal/mol) is reached after training on 490 molecules (80% of FreeSolv). Corresponding FML model errors are on par with state-of-the-art physics-based approaches. To generate the input representation for a new query compound, FML requires approximate and short molecular dynamics runs. We showcase its usefulness through analysis of solvation free energies for 116k organic molecules (all force-field compatible molecules in the QM9 database), identifying the most and least solvated systems and rediscovering quasi-linear structure-property relationships in terms of simple descriptors such as hydrogen-bond donors, number of NH or OH groups, number of oxygen atoms in hydrocarbons, and number of heavy atoms. FML's accuracy is maximal when the temperature used for the molecular dynamics simulation to generate averaged input representation samples in training is the same as for the query compounds. The sampling time for the representation converges rapidly with respect to the prediction error.
Affiliation(s)
- Jan Weinreich
  - University of Vienna, Faculty of Physics, Kolingasse 14-16, AT-1090 Wien, Austria
- Nicholas J Browning
  - Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials (MARVEL), Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
24
Bieniek M, Bhati AP, Wan S, Coveney PV. TIES 20: Relative Binding Free Energy with a Flexible Superimposition Algorithm and Partial Ring Morphing. J Chem Theory Comput 2021; 17:1250-1265. [PMID: 33486956 PMCID: PMC7876800 DOI: 10.1021/acs.jctc.0c01179] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 11/10/2020] [Indexed: 12/14/2022]
Abstract
The TIES (Thermodynamic Integration with Enhanced Sampling) protocol is a formally exact alchemical approach in computational chemistry to the calculation of relative binding free energies. The validity of TIES relies on the correctness of matching atoms across compared pairs of ligands, laying the foundation for the transformation along an alchemical pathway. We implement a flexible topology superimposition algorithm which uses an exhaustive joint-traversal for computing the largest common component(s). The algorithm is employed to enable matching and morphing of partial rings in the TIES protocol along with a validation study using 55 transformations and five different proteins from our previous work. We find that TIES 20 with the RESP charge system, using the new superimposition algorithm, reproduces the previous results with mean unsigned error of 0.75 kcal/mol with respect to the experimental data. Enabling the morphing of partial rings decreases the size of the alchemical region in the dual-topology transformations resulting in a significant improvement in the prediction precision. We find that increasing the ensemble size from 5 to 20 replicas per λ window only has a minimal impact on the accuracy. However, the non-normal nature of the relative free energy distributions underscores the importance of ensemble simulation. We further compare the results with the AM1-BCC charge system and show that it improves agreement with the experimental data by slightly over 10%. This improvement is partly due to AM1-BCC affecting only the charges of the atoms local to the mutation, which translates to even fewer morphed atoms, consequently reducing issues with sampling and therefore ensemble averaging. TIES 20, in conjunction with the enablement of ring morphing, reduces the size of the alchemical region and significantly improves the precision of the predicted free energies.
Affiliation(s)
- Mateusz K. Bieniek
  - Centre for Computational Science, Department of Chemistry, University College London, 20 Gordon Street, London WC1H 0AJ, United Kingdom
- Agastya P. Bhati
  - Centre for Computational Science, Department of Chemistry, University College London, 20 Gordon Street, London WC1H 0AJ, United Kingdom
- Shunzhou Wan
  - Centre for Computational Science, Department of Chemistry, University College London, 20 Gordon Street, London WC1H 0AJ, United Kingdom
- Peter V. Coveney
  - Centre for Computational Science, Department of Chemistry, University College London, 20 Gordon Street, London WC1H 0AJ, United Kingdom
25
Armacost KA, Riniker S, Cournia Z. Exploring Novel Directions in Free Energy Calculations. J Chem Inf Model 2020; 60:5283-5286. [PMID: 33222441 DOI: 10.1021/acs.jcim.0c01266] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 02/06/2023]
Affiliation(s)
- Kira A Armacost
  - Computational and Structural Chemistry, MRL, Merck & Co., Inc. West Point, Pennsylvania 19486, United States
- Sereina Riniker
  - Laboratory of Physical Chemistry, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Zoe Cournia
  - Biomedical Research Foundation Academy of Athens, Soranou Ephessiou 4, 11527 Athens, Greece
26
Lee TS, Allen BK, Giese TJ, Guo Z, Li P, Lin C, McGee TD, Pearlman DA, Radak BK, Tao Y, Tsai HC, Xu H, Sherman W, York DM. Alchemical Binding Free Energy Calculations in AMBER20: Advances and Best Practices for Drug Discovery. J Chem Inf Model 2020; 60:5595-5623. [PMID: 32936637 PMCID: PMC7686026 DOI: 10.1021/acs.jcim.0c00613] [Citation(s) in RCA: 161] [Impact Index Per Article: 40.3] [Indexed: 01/03/2023]
Abstract
Predicting protein-ligand binding affinities and the associated thermodynamics of biomolecular recognition is a primary objective of structure-based drug design. Alchemical free energy simulations offer a highly accurate and computationally efficient route to achieving this goal. While the AMBER molecular dynamics package has successfully been used for alchemical free energy simulations in academic research groups for decades, widespread impact in industrial drug discovery settings has been minimal because of the previous limitations within the AMBER alchemical code, coupled with challenges in system setup and postprocessing workflows. Through a close academia-industry collaboration we have addressed many of the previous limitations with an aim to improve accuracy, efficiency, and robustness of alchemical binding free energy simulations in industrial drug discovery applications. Here, we highlight some of the recent advances in AMBER20 with a focus on alchemical binding free energy (BFE) calculations, which are less computationally intensive than alternative binding free energy methods where full binding/unbinding paths are explored. In addition to scientific and technical advances in AMBER20, we also describe the essential practical aspects associated with running relative alchemical BFE calculations, along with recommendations for best practices, highlighting the importance not only of the alchemical simulation code but also the auxiliary functionalities and expertise required to obtain accurate and reliable results. This work is intended to provide a contemporary overview of the scientific, technical, and practical issues associated with running relative BFE simulations in AMBER20, with a focus on real-world drug discovery applications.
Collapse
Affiliation(s)
- Tai-Sung Lee
  - Rutgers, the State University of New Jersey, Laboratory for Biomolecular Simulation Research, and Department of Chemistry and Chemical Biology, United States
- Bryce K. Allen
  - Silicon Therapeutics, Boston, Massachusetts 02210, United States
- Timothy J. Giese
  - Rutgers, the State University of New Jersey, Laboratory for Biomolecular Simulation Research, and Department of Chemistry and Chemical Biology, United States
- Zhenyu Guo
  - Silicon Therapeutics, Boston, Massachusetts 02210, United States
- Pengfei Li
  - Silicon Therapeutics, Boston, Massachusetts 02210, United States
- Charles Lin
  - Silicon Therapeutics, Boston, Massachusetts 02210, United States
- T. Dwight McGee
  - Silicon Therapeutics, Boston, Massachusetts 02210, United States
- David A. Pearlman
  - QSimulate Incorporated, Cambridge, Massachusetts 02139, United States
- Brian K. Radak
  - Silicon Therapeutics, Boston, Massachusetts 02210, United States
- Yujun Tao
  - Rutgers, the State University of New Jersey, Laboratory for Biomolecular Simulation Research, and Department of Chemistry and Chemical Biology, United States
- Hsu-Chun Tsai
  - Rutgers, the State University of New Jersey, Laboratory for Biomolecular Simulation Research, and Department of Chemistry and Chemical Biology, United States
- Huafeng Xu
  - Silicon Therapeutics, Boston, Massachusetts 02210, United States
- Woody Sherman
  - Silicon Therapeutics, Boston, Massachusetts 02210, United States
- Darrin M. York
  - Rutgers, the State University of New Jersey, Laboratory for Biomolecular Simulation Research, and Department of Chemistry and Chemical Biology, United States
|
27
|
Mey ASJS, Allen BK, Macdonald HEB, Chodera JD, Hahn DF, Kuhn M, Michel J, Mobley DL, Naden LN, Prasad S, Rizzi A, Scheen J, Shirts MR, Tresadern G, Xu H. Best Practices for Alchemical Free Energy Calculations [Article v1.0]. Living Journal of Computational Molecular Science 2020; 2:18378. [PMID: 34458687 PMCID: PMC8388617 DOI: 10.33011/livecoms.2.1.18378] [Citation(s) in RCA: 114] [Impact Index Per Article: 28.5] [Indexed: 01/13/2023]
Abstract
Alchemical free energy calculations are a useful tool for predicting free energy differences associated with the transfer of molecules from one environment to another. The hallmark of these methods is the use of "bridging" potential energy functions representing alchemical intermediate states that cannot exist as real chemical species. The data collected from these bridging alchemical thermodynamic states allows the efficient computation of transfer free energies (or differences in transfer free energies) with orders of magnitude less simulation time than simulating the transfer process directly. While these methods are highly flexible, care must be taken in avoiding common pitfalls to ensure that computed free energy differences can be robust and reproducible for the chosen force field, and that appropriate corrections are included to permit direct comparison with experimental data. In this paper, we review current best practices for several popular application domains of alchemical free energy calculations performed with equilibrium simulations, in particular relative and absolute small molecule binding free energy calculations to biomolecular targets.
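The idea of "bridging" alchemical intermediate states can be illustrated on a toy problem. The sketch below is not taken from the paper: it assumes a one-dimensional harmonic oscillator whose spring constant is interpolated linearly between two end states, uses exponential averaging (the Zwanzig relation) between neighbouring windows, and works in reduced units with kT = 1. The function names `staged_fep` and `exact_free_energy` are hypothetical, and production workflows would typically use estimators such as BAR, MBAR, or thermodynamic integration on molecular dynamics samples.

```python
import numpy as np


def staged_fep(k0, k1, n_windows=11, n_samples=20000, kT=1.0, seed=0):
    """Free energy difference between two harmonic wells via staged FEP.

    Toy model (an assumption for illustration): U_lambda(x) = 0.5 * k(lambda) * x**2
    with k(lambda) = (1 - lambda) * k0 + lambda * k1.
    """
    rng = np.random.default_rng(seed)
    lambdas = np.linspace(0.0, 1.0, n_windows)
    ks = (1.0 - lambdas) * k0 + lambdas * k1  # spring constant of each window

    total = 0.0
    for k_i, k_next in zip(ks[:-1], ks[1:]):
        # Exact Boltzmann sampling of window i: x ~ Normal(0, sqrt(kT / k_i)).
        x = rng.normal(0.0, np.sqrt(kT / k_i), size=n_samples)
        # Energy difference to the neighbouring window on the same samples.
        dU = 0.5 * (k_next - k_i) * x**2
        # Zwanzig relation: dF = -kT * ln < exp(-dU / kT) >_i
        total += -kT * np.log(np.mean(np.exp(-dU / kT)))
    return total


def exact_free_energy(k0, k1, kT=1.0):
    """Analytical result for the harmonic toy model: 0.5 * kT * ln(k1 / k0)."""
    return 0.5 * kT * np.log(k1 / k0)


if __name__ == "__main__":
    estimate = staged_fep(1.0, 4.0)
    reference = exact_free_energy(1.0, 4.0)
    print(f"staged FEP estimate: {estimate:.4f}  analytical: {reference:.4f}")
```

Because each window only has to overlap with its immediate neighbour, the sum over small steps converges far more readily than a single perturbation from the first end state to the last, which is the practical motivation for introducing intermediate states.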
Affiliation(s)
- Antonia S. J. S. Mey
  - EaStCHEM School of Chemistry, David Brewster Road, Joseph Black Building, The King’s Buildings, Edinburgh, EH9 3FJ, UK
- Hannah E. Bruce Macdonald
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York NY, USA
- John D. Chodera
  - Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York NY, USA
- David F. Hahn
  - Computational Chemistry, Janssen Research & Development, Turnhoutseweg 30, Beerse B-2340, Belgium
- Maximilian Kuhn
  - EaStCHEM School of Chemistry, David Brewster Road, Joseph Black Building, The King’s Buildings, Edinburgh, EH9 3FJ, UK
  - Cresset, Cambridgeshire, UK
- Julien Michel
  - EaStCHEM School of Chemistry, David Brewster Road, Joseph Black Building, The King’s Buildings, Edinburgh, EH9 3FJ, UK
- David L. Mobley
  - Departments of Pharmaceutical Sciences and Chemistry, University of California, Irvine, Irvine, USA
- Levi N. Naden
  - Molecular Sciences Software Institute, Blacksburg VA, USA
- Andrea Rizzi
  - Silicon Therapeutics, Boston, MA, USA
  - Tri-Institutional Training Program in Computational Biology and Medicine, New York, NY, USA
- Jenke Scheen
  - EaStCHEM School of Chemistry, David Brewster Road, Joseph Black Building, The King’s Buildings, Edinburgh, EH9 3FJ, UK
- Gary Tresadern
  - Computational Chemistry, Janssen Research & Development, Turnhoutseweg 30, Beerse B-2340, Belgium
|