1
Zheng T, Wang A, Han X, Xia Y, Xu X, Zhan J, Liu Y, Chen Y, Wang Z, Wu X, Gong S, Yan W. Data-driven parametrization of molecular mechanics force fields for expansive chemical space coverage. Chem Sci 2025; 16:2730-2740. [PMID: 39802691 PMCID: PMC11721737 DOI: 10.1039/d4sc06640e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2024] [Accepted: 12/25/2024] [Indexed: 01/16/2025] Open
Abstract
A force field is a critical component in molecular dynamics simulations for computational drug discovery. It must achieve high accuracy within the constraints of the limited functional forms of molecular mechanics (MM), which offer high computational efficiency. With the rapid expansion of synthetically accessible chemical space, traditional look-up table approaches face significant challenges. In this study, we address this issue using a modern data-driven approach, developing ByteFF, an Amber-compatible force field for drug-like molecules. To create ByteFF, we generated an expansive and highly diverse molecular dataset at the B3LYP-D3(BJ)/DZVP level of theory. This dataset includes 2.4 million optimized molecular fragment geometries with analytical Hessian matrices, along with 3.2 million torsion profiles. We then trained an edge-augmented, symmetry-preserving molecular graph neural network (GNN) on this dataset, employing a carefully optimized training strategy. Our model predicts all bonded and non-bonded MM force field parameters for drug-like molecules simultaneously across a broad chemical space. ByteFF demonstrates state-of-the-art performance on various benchmark datasets, excelling in predicting relaxed geometries, torsional energy profiles, and conformational energies and forces. Its exceptional accuracy and expansive chemical space coverage make ByteFF a valuable tool for multiple stages of computational drug discovery.
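As an illustration of the "limited functional forms" the abstract refers to, the following minimal sketch evaluates three generic Amber-style energy terms: a harmonic bond, a periodic torsion, and a pairwise Lennard-Jones plus Coulomb interaction. The function and parameter names are hypothetical, the prefactor conventions (no factor of 1/2 on the bond term, `k * (1 + cos(n*phi - phase))` for torsions) are assumptions, and nothing here is taken from the ByteFF code itself.

```python
import math

# Conventional Coulomb conversion factor in kcal*Angstrom/(mol*e^2).
COULOMB_KCAL = 332.0637


def bond_energy(r, k, r0):
    """Harmonic bond stretch: k * (r - r0)^2 (assumed convention, no 1/2)."""
    return k * (r - r0) ** 2


def torsion_energy(phi, terms):
    """Periodic torsion: sum over (k, n, phase) of k * (1 + cos(n*phi - phase)).

    phi and phase are in radians; n is the integer periodicity.
    """
    return sum(k * (1.0 + math.cos(n * phi - phase)) for k, n, phase in terms)


def nonbonded_energy(r, q_i, q_j, sigma, epsilon):
    """Lennard-Jones 12-6 plus Coulomb energy for one atom pair at distance r."""
    sr6 = (sigma / r) ** 6
    lennard_jones = 4.0 * epsilon * (sr6 * sr6 - sr6)
    coulomb = COULOMB_KCAL * q_i * q_j / r
    return lennard_jones + coulomb
```

In a data-driven parametrization of the kind described, a model would supply the numbers passed in as `k`, `r0`, `terms`, `q_i`, `sigma`, and `epsilon`, while the functional forms themselves stay fixed.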
Affiliation(s)
- Tianze Zheng
- ByteDance Research, Beijing Beijing 100098 China
- Ailun Wang
- ByteDance Research Bellevue Washington 98004 USA
- Xu Han
- ByteDance Research, Beijing Beijing 100098 China
- Yu Xia
- ByteDance Research, Beijing Beijing 100098 China
- Xingyuan Xu
- ByteDance Research, Beijing Beijing 100098 China
- Jiawei Zhan
- ByteDance Research Bellevue Washington 98004 USA
- Yu Liu
- ByteDance Research Bellevue Washington 98004 USA
- Yang Chen
- ByteDance Research, Beijing Beijing 100098 China
- Zhi Wang
- ByteDance Research Bellevue Washington 98004 USA
- Xiaojie Wu
- ByteDance Research Bellevue Washington 98004 USA
- Sheng Gong
- ByteDance Research Bellevue Washington 98004 USA
- Wen Yan
- ByteDance Research Bellevue Washington 98004 USA
2
Karwounopoulos J, Bieniek M, Wu Z, Baskerville AL, König G, Cossins BP, Wood GPF. Evaluation of Machine Learning/Molecular Mechanics End-State Corrections with Mechanical Embedding to Calculate Relative Protein-Ligand Binding Free Energies. J Chem Theory Comput 2025; 21:967-977. [PMID: 39753520 DOI: 10.1021/acs.jctc.4c01427] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/29/2025]
Abstract
The development of machine-learning (ML) potentials offers significant accuracy improvements compared to molecular mechanics (MM) because of the inclusion of quantum-mechanical effects in molecular interactions. However, ML simulations are several times more computationally demanding than MM simulations, so there is a trade-off between speed and accuracy. One possible compromise is to use hybrid machine learning/molecular mechanics (ML/MM) approaches with mechanical embedding that treat the intramolecular interactions of the ligand at the ML level and the protein-ligand interactions at the MM level. Recent studies have reported improved protein-ligand binding free energy results based on ML/MM using ANI-2x with mechanical embedding, arguing that intramolecular interactions like torsion potentials of the ligand are often the limiting factor for accuracy. This claim is evaluated based on 108 relative binding free energy calculations for four different benchmark systems. As an alternative strategy, we also tested a tool that fits the MM dihedral potentials to the ML level of theory. Fitting was performed with the ML potentials ANI-2x and AIMNet2, and, for the benchmark system TYK2, also with quantum-mechanical calculations using ωB97M-D3(BJ)/def2-TZVPPD. Overall, the relative binding free energy results from MM with Open Force Field 2.2.0, MM with ML-fitted torsion potentials, and the corresponding ML/MM end-state corrected simulations show no statistically significant differences in the mean absolute errors (between 0.8 and 0.9 kcal mol⁻¹). This can probably be explained by the usage of the same MM parameters to calculate the protein-ligand interactions. Therefore, a well-parametrized force field is on a par with simple mechanical embedding ML/MM simulations for protein-ligand binding.
In terms of computational costs, the reparametrization of poor torsional potentials is preferable over employing computationally intensive ML/MM simulations of protein-ligand complexes with mechanical embedding. Also, the refitting strategy leads to lower variances of the protein-ligand binding free energy results than the ML/MM end-state corrections. For free energy corrections with ML/MM, the results indicate that better convergence and more advanced ML/MM schemes will be required for applications in computer-guided drug discovery.
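To make the idea of an end-state correction concrete, here is a minimal sketch of one common estimator, exponential averaging (the Zwanzig relation), which converts energy differences U_ML − U_MM evaluated on MM-sampled configurations into a free energy of switching from MM to ML/MM. The abstract does not say which estimator the authors used, so this choice, the function name `zwanzig_correction`, and the default temperature are assumptions for illustration only.

```python
import math

# Boltzmann constant in kcal/(mol*K).
K_B = 0.0019872041


def zwanzig_correction(delta_u, temperature=300.0):
    """Estimate dG(MM -> ML) = -kT * ln < exp(-(U_ML - U_MM) / kT) >_MM.

    delta_u: sequence of U_ML - U_MM values in kcal/mol, evaluated on
    configurations sampled at the MM end state.
    """
    if len(delta_u) == 0:
        raise ValueError("delta_u must contain at least one sample")
    beta = 1.0 / (K_B * temperature)
    exponents = [-beta * du for du in delta_u]
    # Shift by the largest exponent so the exponentials cannot overflow.
    shift = max(exponents)
    mean_exp = sum(math.exp(x - shift) for x in exponents) / len(exponents)
    return -(shift + math.log(mean_exp)) / beta
```

Because the average is dominated by the few samples with the lowest energy differences, this estimator converges slowly when the MM and ML ensembles overlap poorly, which is consistent with the abstract's remark that better convergence is needed for such corrections.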
Affiliation(s)
- Mateusz Bieniek
- Exscientia, Schrödinger Building, Oxford Science Park, Oxford OX4 4GE, U.K
- Zhiyi Wu
- Exscientia, Schrödinger Building, Oxford Science Park, Oxford OX4 4GE, U.K
- Adam L Baskerville
- Exscientia, Schrödinger Building, Oxford Science Park, Oxford OX4 4GE, U.K
- Gerhard König
- Exscientia, Schrödinger Building, Oxford Science Park, Oxford OX4 4GE, U.K
- Benjamin P Cossins
- Exscientia, Schrödinger Building, Oxford Science Park, Oxford OX4 4GE, U.K
- Geoffrey P F Wood
- Exscientia, Schrödinger Building, Oxford Science Park, Oxford OX4 4GE, U.K
3
Williams CD, Kalayan J, Burton NA, Bryce RA. Stable and accurate atomistic simulations of flexible molecules using conformationally generalisable machine learned potentials. Chem Sci 2024; 15:12780-12795. [PMID: 39148799 PMCID: PMC11323334 DOI: 10.1039/d4sc01109k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Accepted: 07/07/2024] [Indexed: 08/17/2024] Open
Abstract
Computational simulation methods based on machine learned potentials (MLPs) promise to revolutionise shape prediction of flexible molecules in solution, but their widespread adoption has been limited by the way in which training data is generated. Here, we present an approach which allows the key conformational degrees of freedom to be properly represented in reference molecular datasets. MLPs trained on these datasets using a global descriptor scheme are generalisable in conformational space, providing quantum chemical accuracy for all conformers. These MLPs are capable of propagating long, stable molecular dynamics trajectories, an attribute that has remained a challenge. We deploy the MLPs in obtaining converged conformational free energy surfaces for flexible molecules via well-tempered metadynamics simulations; this approach provides a hitherto inaccessible route to accurately computing the structural, dynamical and thermodynamical properties of a wide variety of flexible molecular systems. It is further demonstrated that MLPs must be trained on reference datasets with complete coverage of conformational space, including in barrier regions, to achieve stable molecular dynamics trajectories.
Affiliation(s)
- Christopher D Williams
- Division of Pharmacy and Optometry, School of Health Sciences, Faculty of Biology, Medicine and Health, The University of Manchester Oxford Road Manchester M13 9PL UK
- Jas Kalayan
- Science and Technologies Facilities Council (STFC), Daresbury Laboratory Keckwick Lane, Daresbury Warrington WA4 4AD UK
- Neil A Burton
- Department of Chemistry, School of Natural Sciences, Faculty of Science and Engineering, The University of Manchester Oxford Road Manchester M13 9PL UK
- Richard A Bryce
- Division of Pharmacy and Optometry, School of Health Sciences, Faculty of Biology, Medicine and Health, The University of Manchester Oxford Road Manchester M13 9PL UK
4
Wang L, Behara PK, Thompson MW, Gokey T, Wang Y, Wagner JR, Cole DJ, Gilson MK, Shirts MR, Mobley DL. The Open Force Field Initiative: Open Software and Open Science for Molecular Modeling. J Phys Chem B 2024; 128:7043-7067. [PMID: 38989715 DOI: 10.1021/acs.jpcb.4c01558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2024]
Abstract
Force fields are a key component of physics-based molecular modeling, describing the energies and forces in a molecular system as a function of the positions of the atoms and molecules involved. Here, we provide a review and scientific status report on the work of the Open Force Field (OpenFF) Initiative, which focuses on the science, infrastructure and data required to build the next generation of biomolecular force fields. We introduce the OpenFF Initiative and the related OpenFF Consortium, describe its approach to force field development and software, and discuss accomplishments to date as well as future plans. OpenFF releases both software and data under open and permissive licensing agreements to enable rapid application, validation, extension, and modification of its force fields and software tools. We discuss lessons learned to date in this new approach to force field development. We also highlight ways that other force field researchers can get involved, as well as some recent successes of outside researchers taking advantage of OpenFF tools and data.
Affiliation(s)
- Lily Wang
- Open Force Field, Open Molecular Software Foundation, Davis, California 95616, United States
- Pavan Kumar Behara
- Center for Neurotherapeutics, University of California, Irvine, California 92697, United States
- Matthew W Thompson
- Open Force Field, Open Molecular Software Foundation, Davis, California 95616, United States
- Trevor Gokey
- Department of Chemistry, University of California, Irvine, California 92697, United States
- Yuanqing Wang
- Simons Center for Computational Physical Chemistry and Center for Data Science, New York, New York 10004, United States
- Jeffrey R Wagner
- Open Force Field, Open Molecular Software Foundation, Davis, California 95616, United States
- Daniel J Cole
- School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, United Kingdom
- Michael K Gilson
- Skaggs School of Pharmacy and Pharmaceutical Sciences, The University of California at San Diego, La Jolla, California 92093, United States
- Michael R Shirts
- Department of Chemical and Biological Engineering, University of Colorado Boulder, Boulder, Colorado 80305, United States
- David L Mobley
- Department of Chemistry, University of California, Irvine, California 92697, United States
- Department of Pharmaceutical Sciences, University of California, Irvine, California 92697, United States
5
Grassano JS, Pickering I, Roitberg AE, González Lebrero MC, Estrin DA, Semelak JA. Assessment of Embedding Schemes in a Hybrid Machine Learning/Classical Potentials (ML/MM) Approach. J Chem Inf Model 2024; 64:4047-4058. [PMID: 38710065 DOI: 10.1021/acs.jcim.4c00478] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]
Abstract
Machine learning (ML) methods have reached high accuracy levels for the prediction of in vacuo molecular properties. However, the simulation of large systems solely through ML methods (such as those based on neural network potentials) is still a challenge. In this context, one of the most promising frameworks for integrating ML schemes in the simulation of complex molecular systems is the family of so-called ML/MM methods. These multiscale approaches combine ML methods with classical force fields (MM), in the same spirit as the successful hybrid quantum mechanics-molecular mechanics methods (QM/MM). The key issue for such ML/MM methods is an adequate description of the coupling between the region of the system described by ML and the region described at the MM level. In the context of QM/MM schemes, the main ingredient of the interaction is electrostatic, and the state of the art is the so-called electrostatic embedding. In this study, we analyze the quality of simpler mechanical embedding-based approaches, specifically focusing on their application within an ML/MM framework utilizing atomic partial charges derived in vacuo. Taking as reference electrostatic embedding calculations performed at a QM(DFT)/MM level, we explore different atomic charge schemes, as well as a polarization correction computed using atomic polarizabilities. Our benchmark data set comprises a set of about 80k small organic structures from the ANI-1x and ANI-2x databases, solvated in water. The results suggest that the minimal basis iterative stockholder (MBIS) atomic charges yield the best agreement with the reference coupling energy. Remarkable enhancements are achieved by including a simple polarization correction.
Affiliation(s)
- Juan S Grassano
- Facultad de Ciencias Exactas y Naturales, Departamento de Química Inorgánica, Analítica y Química Física, Universidad de Buenos Aires, Intendente Güiraldes 2160, Buenos Aires C1428EHA, Argentina
- CONICET─Universidad de Buenos Aires, Instituto de Química-Física de los Materiales, Medio Ambiente y Energía (INQUIMAE), Ciudad Universitaria, Pabellón 2, Buenos Aires C1428EHA, Argentina
- Ignacio Pickering
- Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States
- Adrian E Roitberg
- CONICET─Universidad de Buenos Aires, Instituto de Química-Física de los Materiales, Medio Ambiente y Energía (INQUIMAE), Ciudad Universitaria, Pabellón 2, Buenos Aires C1428EHA, Argentina
- Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States
- Mariano C González Lebrero
- Facultad de Ciencias Exactas y Naturales, Departamento de Química Inorgánica, Analítica y Química Física, Universidad de Buenos Aires, Intendente Güiraldes 2160, Buenos Aires C1428EHA, Argentina
- CONICET─Universidad de Buenos Aires, Instituto de Química-Física de los Materiales, Medio Ambiente y Energía (INQUIMAE), Ciudad Universitaria, Pabellón 2, Buenos Aires C1428EHA, Argentina
- Dario A Estrin
- Facultad de Ciencias Exactas y Naturales, Departamento de Química Inorgánica, Analítica y Química Física, Universidad de Buenos Aires, Intendente Güiraldes 2160, Buenos Aires C1428EHA, Argentina
- CONICET─Universidad de Buenos Aires, Instituto de Química-Física de los Materiales, Medio Ambiente y Energía (INQUIMAE), Ciudad Universitaria, Pabellón 2, Buenos Aires C1428EHA, Argentina
- Jonathan A Semelak
- Facultad de Ciencias Exactas y Naturales, Departamento de Química Inorgánica, Analítica y Química Física, Universidad de Buenos Aires, Intendente Güiraldes 2160, Buenos Aires C1428EHA, Argentina
- CONICET─Universidad de Buenos Aires, Instituto de Química-Física de los Materiales, Medio Ambiente y Energía (INQUIMAE), Ciudad Universitaria, Pabellón 2, Buenos Aires C1428EHA, Argentina
6
Rezaee M, Ekrami S, Hashemianzadeh SM. Comparing ANI-2x, ANI-1ccx neural networks, force field, and DFT methods for predicting conformational potential energy of organic molecules. Sci Rep 2024; 14:11791. [PMID: 38783010 PMCID: PMC11116541 DOI: 10.1038/s41598-024-62242-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Accepted: 05/15/2024] [Indexed: 05/25/2024] Open
Abstract
In this study, the conformational potential energy surfaces of Amylmetacresol, Benzocaine, Dopamine, Betazole, and Betahistine molecules were scanned and analyzed using the neural network architectures ANI-2x and ANI-1ccx, the force field method OPLS, and density functional theory with the exchange-correlation functional B3LYP and the basis set 6-31G(d). The ANI-1ccx and ANI-2x methods demonstrated the highest accuracy in predicting torsional energy profiles, effectively capturing the minimum and maximum values of these profiles. Conformational potential energy values calculated by B3LYP and the OPLS force field method differ from those calculated by ANI-1ccx and ANI-2x, which account for non-bonded intramolecular interactions, since the B3LYP functional and OPLS force field weakly consider van der Waals and other intramolecular forces in torsional energy profiles. For a more comprehensive analysis, electronic parameters such as dipole moment, HOMO, and LUMO energies for different torsional angles were calculated at two levels of theory, B3LYP/6-31G(d) and ωB97X/6-31G(d). These calculations confirmed that ANI predictions are more accurate than density functional theory calculations with B3LYP functional and OPLS force field for determining potential energy surfaces. This research successfully addressed the challenges in determining conformational potential energy levels and shows how machine learning and deep neural networks offer a more accurate, cost-effective, and rapid alternative for predicting torsional energy profiles.
Affiliation(s)
- Mozafar Rezaee
- Molecular Simulation Research Laboratory, Department of Chemistry, Iran University of Science and Technology, Tehran, Iran
- Saeid Ekrami
- CNRS, LCPME, Université de Lorraine, 54000, Nancy, France
- Seyed Majid Hashemianzadeh
- Molecular Simulation Research Laboratory, Department of Chemistry, Iran University of Science and Technology, Tehran, Iran.
7
Chen M, Jiang X, Zhang L, Chen X, Wen Y, Gu Z, Li X, Zheng M. The emergence of machine learning force fields in drug design. Med Res Rev 2024; 44:1147-1182. [PMID: 38173298 DOI: 10.1002/med.22008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2023] [Revised: 11/29/2023] [Accepted: 12/05/2023] [Indexed: 01/05/2024]
Abstract
In the field of molecular simulation for drug design, traditional molecular mechanics force fields and quantum chemical theories have been instrumental but limited in terms of scalability and computational efficiency. To overcome these limitations, machine learning force fields (MLFFs) have emerged as a powerful tool capable of balancing accuracy with efficiency. MLFFs rely on the relationship between molecular structures and potential energy, bypassing the need for a preconceived notion of interaction representations. Their accuracy depends on the machine learning models used, and the quality and volume of training data sets. With recent advances in equivariant neural networks and high-quality datasets, MLFFs have significantly improved their performance. This review explores MLFFs, emphasizing their potential in drug design. It elucidates MLFF principles, provides development and validation guidelines, and highlights successful MLFF implementations. It also addresses potential challenges in developing and applying MLFFs. The review concludes by illuminating the path ahead for MLFFs, outlining the challenges to be overcome and the opportunities to be harnessed. This inspires researchers to embrace MLFFs in their investigations as a new tool to perform molecular simulations in drug design.
Affiliation(s)
- Mingan Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Physical Science and Technology, ShanghaiTech University, Shanghai, China
- Lingang Laboratory, Shanghai, China
- Xinyu Jiang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- Lehan Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- Xiaoxu Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, China
- Yiming Wen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, China
- Zhiyong Gu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, China
- Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- School of Pharmacy, University of Chinese Academy of Sciences, Beijing, China
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, China
8
Romanini M, Macovez R, Valenti S, Noor W, Tamarit JL. Dielectric Spectroscopy Studies of Conformational Relaxation Dynamics in Molecular Glass-Forming Liquids. Int J Mol Sci 2023; 24:17189. [PMID: 38139017 PMCID: PMC10743228 DOI: 10.3390/ijms242417189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Revised: 11/29/2023] [Accepted: 12/04/2023] [Indexed: 12/24/2023] Open
Abstract
We review experimental results obtained with broadband dielectric spectroscopy concerning the relaxation times and activation energies of intramolecular conformational relaxation processes in small-molecule glass-formers. Such processes are due to the interconversion between different conformers of relatively flexible molecules, and generally involve conformational changes of flexible chain or ring moieties, or else the rigid rotation of planar groups, such as conjugated phenyl rings. Comparative analysis of molecules possessing the same (type of) functional group is carried out in order to test the possibility of assigning the dynamic conformational isomerism of given families of organic compounds to the motion of specific molecular subunits. These range from terminal halomethyl and acetyl/acetoxy groups to both rigid and flexible ring structures, such as the planar halobenzene cycles or the buckled saccharide and diazepine rings. A short section on polyesters provides a generalisation of these findings to synthetic macromolecules.
Affiliation(s)
- Josep Lluís Tamarit
- Grup de Caracterització de Materials, Departament de Física and Barcelona Research Center in Multiscale Science and Engineering, Universitat Politècnica de Catalunya, Barcelona East School of Engineering (EEBE), Av. Eduard Maristany 10-14, E-08019 Barcelona, Spain; (M.R.); (R.M.); (S.V.); (W.N.)
9
Galvelis R, Varela-Rial A, Doerr S, Fino R, Eastman P, Markland TE, Chodera JD, De Fabritiis G. NNP/MM: Accelerating Molecular Dynamics Simulations with Machine Learning Potentials and Molecular Mechanics. J Chem Inf Model 2023; 63:5701-5708. [PMID: 37694852 PMCID: PMC10577237 DOI: 10.1021/acs.jcim.3c00773] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Indexed: 09/12/2023]
Abstract
Machine learning potentials have emerged as a means to enhance the accuracy of biomolecular simulations. However, their application is constrained by the significant computational cost arising from the vast number of parameters compared with traditional molecular mechanics. To tackle this issue, we introduce an optimized implementation of the hybrid method (NNP/MM), which combines a neural network potential (NNP) and molecular mechanics (MM). This approach models a portion of the system, such as a small molecule, using NNP while employing MM for the remaining system to boost efficiency. By conducting molecular dynamics (MD) simulations on various protein-ligand complexes and metadynamics (MTD) simulations on a ligand, we showcase the capabilities of our implementation of NNP/MM. It has enabled us to increase the simulation speed by ∼5 times and achieve a combined sampling of 1 μs for each complex, marking the longest simulations ever reported for this class of simulations.
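The partitioning described in this abstract can be written schematically as follows. This is a generic mechanical-embedding decomposition, not an equation quoted from the paper: the subscripts L (the NNP region, e.g. the ligand) and E (the MM environment) and the explicit Coulomb plus Lennard-Jones form of the coupling term are notational assumptions.

```latex
E_{\mathrm{total}}(\mathbf{r}) =
  E_{\mathrm{NNP}}(\mathbf{r}_{\mathrm{L}})
  + E_{\mathrm{MM}}(\mathbf{r}_{\mathrm{E}})
  + E_{\mathrm{MM}}^{\mathrm{coupling}}(\mathbf{r}_{\mathrm{L}}, \mathbf{r}_{\mathrm{E}}),
\qquad
E_{\mathrm{MM}}^{\mathrm{coupling}} =
  \sum_{i \in \mathrm{L}} \sum_{j \in \mathrm{E}}
  \left[
    \frac{q_i q_j}{4 \pi \varepsilon_0 r_{ij}}
    + 4 \varepsilon_{ij}
      \left( \frac{\sigma_{ij}^{12}}{r_{ij}^{12}} - \frac{\sigma_{ij}^{6}}{r_{ij}^{6}} \right)
  \right]
```

Only the first term requires a neural network evaluation, and it involves only the atoms in L, which is why restricting the NNP to a small molecule keeps the cost manageable.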
Affiliation(s)
- Raimondas Galvelis
- Acellera Labs, C/Doctor Trueta 183, Barcelona 08005, Spain
- Computational Science Laboratory, Universitat Pompeu Fabra, PRBB, C/Doctor Aiguader 88, Barcelona 08003, Spain
- Alejandro Varela-Rial
- Acellera Ltd, Devonshire House 582 Honeypot Lane, Stanmore Middlesex, HA7 1JS, United Kingdom
- Stefan Doerr
- Acellera Ltd, Devonshire House 582 Honeypot Lane, Stanmore Middlesex, HA7 1JS, United Kingdom
- Roberto Fino
- Acellera Labs, C/Doctor Trueta 183, Barcelona 08005, Spain
- Peter Eastman
- Department of Chemistry, Stanford University, 337 Campus Drive, Stanford, California 94305, United States
- Thomas E Markland
- Department of Chemistry, Stanford University, 337 Campus Drive, Stanford, California 94305, United States
- John D Chodera
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Gianni De Fabritiis
- Computational Science Laboratory, Universitat Pompeu Fabra, PRBB, C/Doctor Aiguader 88, Barcelona 08003, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys 23, Barcelona 08010, Spain
- Acellera Ltd, Devonshire House 582 Honeypot Lane, Stanmore Middlesex, HA7 1JS, United Kingdom
10
Kovács DP, Batatia I, Arany ES, Csányi G. Evaluation of the MACE force field architecture: From medicinal chemistry to materials science. J Chem Phys 2023; 159:044118. [PMID: 37522405 DOI: 10.1063/5.0155322] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 04/19/2023] [Accepted: 06/29/2023] [Indexed: 08/01/2023] Open
Abstract
The MACE architecture represents the state of the art in the field of machine learning force fields for a variety of in-domain, extrapolation, and low-data regime tasks. In this paper, we further evaluate MACE by fitting models for published benchmark datasets. We show that MACE generally outperforms alternatives for a wide range of systems, from amorphous carbon, universal materials modeling, and general small molecule organic chemistry to large molecules and liquid water. We demonstrate the capabilities of the model on tasks ranging from constrained geometry optimization to molecular dynamics simulations and find excellent performance across all tested domains. We show that MACE is very data efficient and can reproduce experimental molecular vibrational spectra when trained on as few as 50 randomly selected reference configurations. We further demonstrate that the strictly local atom-centered model is sufficient for such tasks even in the case of large molecules and weakly interacting molecular assemblies.
Affiliation(s)
- Dávid Péter Kovács
- Engineering Laboratory, University of Cambridge, Cambridge CB2 1PZ, United Kingdom
- Ilyes Batatia
- Engineering Laboratory, University of Cambridge, Cambridge CB2 1PZ, United Kingdom
- ENS Paris-Saclay, Université Paris-Saclay, 91190 Gif-sur-Yvette, France
- Eszter Sára Arany
- School of Clinical Medicine, University of Cambridge, Cambridge CB2 0SP, United Kingdom
- Gábor Csányi
- Engineering Laboratory, University of Cambridge, Cambridge CB2 1PZ, United Kingdom
11
Morado J, Mortenson PN, Nissink JWM, Essex JW, Skylaris CK. Does a Machine-Learned Potential Perform Better Than an Optimally Tuned Traditional Force Field? A Case Study on Fluorohydrins. J Chem Inf Model 2023; 63:2810-2827. [PMID: 37071825 PMCID: PMC10170518 DOI: 10.1021/acs.jcim.2c01510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/20/2023]
Abstract
We present a comparative study that evaluates the performance of a machine learning potential (ANI-2x), a conventional force field (GAFF), and an optimally tuned GAFF-like force field in the modeling of a set of 10 γ-fluorohydrins that exhibit a complex interplay between intra- and intermolecular interactions in determining conformer stability. To benchmark the performance of each molecular model, we evaluated their energetic, geometric, and sampling accuracies relative to quantum-mechanical data. This benchmark involved conformational analysis both in the gas phase and chloroform solution. We also assessed the performance of the aforementioned molecular models in estimating nuclear spin-spin coupling constants by comparing their predictions to experimental data available in chloroform. The results and discussion presented in this study demonstrate that ANI-2x tends to predict stronger-than-expected hydrogen bonding and overstabilize global minima and shows problems related to inadequate description of dispersion interactions. Furthermore, while ANI-2x is a viable model for modeling in the gas phase, conventional force fields still play an important role, especially for condensed-phase simulations. Overall, this study highlights the strengths and weaknesses of each model, providing guidelines for the use and future development of force fields and machine learning potentials.
Affiliation(s)
- João Morado
- School of Chemistry, University of Southampton, Highfield, Southampton SO17 1BJ, United Kingdom
- Paul N Mortenson
- Astex Pharmaceuticals, 436 Cambridge Science Park, Milton Road, Cambridge CB4 0QA, United Kingdom
| | - J Willem M Nissink
- Computational Chemistry, Oncology R&D, AstraZeneca, Cambridge CB4 0WG, United Kingdom
| | - Jonathan W Essex
- School of Chemistry, University of Southampton, Highfield, Southampton SO17 1BJ, United Kingdom
| | - Chris-Kriton Skylaris
- School of Chemistry, University of Southampton, Highfield, Southampton SO17 1BJ, United Kingdom
| |
Collapse
|
12
|
Bieniek MK, Cree B, Pirie R, Horton JT, Tatum NJ, Cole DJ. An open-source molecular builder and free energy preparation workflow. Commun Chem 2022; 5:136. [PMID: 36320862 PMCID: PMC9607723 DOI: 10.1038/s42004-022-00754-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2022] [Accepted: 10/11/2022] [Indexed: 01/27/2023] Open
Abstract
Automated free energy calculations for the prediction of binding free energies of congeneric series of ligands to a protein target are growing in popularity, but building reliable initial binding poses for the ligands is challenging. Here, we introduce the open-source FEgrow workflow for building user-defined congeneric series of ligands in protein binding pockets for input to free energy calculations. For a given ligand core and receptor structure, FEgrow enumerates and optimises the bioactive conformations of the grown functional group(s), making use of hybrid machine learning/molecular mechanics potential energy functions where possible. Low energy structures are optionally scored using the gnina convolutional neural network scoring function, and output for more rigorous protein-ligand binding free energy predictions. We illustrate use of the workflow by building and scoring binding poses for ten congeneric series of ligands bound to targets from a standard, high quality dataset of protein-ligand complexes. Furthermore, we build a set of 13 inhibitors of the SARS-CoV-2 main protease from the literature, and use free energy calculations to retrospectively compute their relative binding free energies. FEgrow is freely available at https://github.com/cole-group/FEgrow, along with a tutorial.
Affiliation(s)
- Mateusz K. Bieniek
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
- Ben Cree
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
- Rachael Pirie
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
- Joshua T. Horton
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
- Natalie J. Tatum
  - Newcastle University Centre for Cancer, Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne, NE2 4HH, UK
- Daniel J. Cole
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK
|
13
|
Kuntz D, Wilson AK. Machine learning, artificial intelligence, and chemistry: how smart algorithms are reshaping simulation and the laboratory. Pure Appl Chem 2022. [DOI: 10.1515/pac-2022-0202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Machine learning and artificial intelligence are increasingly gaining in prominence through image analysis, language processing, and automation, to name a few applications. Machine learning is also making profound changes in chemistry. From revisiting decades-old analytical techniques for the purpose of creating better calibration curves, to assisting and accelerating traditional in silico simulations, to automating entire scientific workflows, to being used as an approach to deduce underlying physics of unexplained chemical phenomena, machine learning and artificial intelligence are reshaping chemistry, accelerating scientific discovery, and yielding new insights. This review provides an overview of machine learning and artificial intelligence from a chemist’s perspective and focuses on a number of examples of the use of these approaches in computational chemistry and in the laboratory.
Affiliation(s)
- David Kuntz
  - Department of Chemistry, University of North Texas, Denton, TX 76201, USA
- Angela K. Wilson
  - Department of Chemistry, Michigan State University, East Lansing, MI 48824, USA
|
14
|
Prasad VK, Otero-de-la-Roza A, DiLabio GA. Small-Basis Set Density-Functional Theory Methods Corrected with Atom-Centered Potentials. J Chem Theory Comput 2022; 18:2913-2930. [PMID: 35412817 DOI: 10.1021/acs.jctc.2c00036] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/12/2022]
Abstract
Density functional theory (DFT) is currently the most popular method for modeling noncovalent interactions and thermochemistry. The accurate calculation of noncovalent interaction energies, reaction energies, and barrier heights requires choosing an appropriate functional and, typically, a relatively large basis set. Deficiencies of the density-functional approximation and the use of a limited basis set are the leading sources of error in the calculation of noncovalent and thermochemical properties in molecular systems. In this article, we present three new DFT methods based on the BLYP, M06-2X, and CAM-B3LYP functionals in combination with the 6-31G* basis set and corrected with atom-centered potentials (ACPs). ACPs are one-electron potentials that have the same form as effective-core potentials, except they do not replace any electrons. The ACPs developed in this work are used to generate energy corrections to the underlying DFT/basis-set method such that the errors in predicted chemical properties are minimized while maintaining the low computational cost of the parent methods. ACPs were developed for the elements H, B, C, N, O, F, Si, P, S, and Cl. The ACP parameters were determined using an extensive training set of 118655 data points, mostly of complete basis set coupled-cluster level quality. The target molecular properties for the ACP-corrected methods include noncovalent interaction energies, molecular conformational energies, reaction energies, barrier heights, and bond separation energies. The ACPs were tested first on the training set and then on a validation set of 42567 additional data points. We show that the ACP-corrected methods can predict the target molecular properties with accuracy close to complete basis set wavefunction theory methods, but at a computational cost of double-ζ DFT methods. 
This makes the new BLYP/6-31G*-ACP, M06-2X/6-31G*-ACP, and CAM-B3LYP/6-31G*-ACP methods uniquely suited to the calculation of noncovalent, thermochemical, and kinetic properties in large molecular systems.
Affiliation(s)
- Viki Kumar Prasad
  - Department of Chemistry, University of British Columbia, Okanagan, 3247 University Way, Kelowna, British Columbia V1V 1V7, Canada
- Alberto Otero-de-la-Roza
  - Departamento de Química Física y Analítica, Facultad de Química, Universidad de Oviedo, MALTA Consolider Team, Oviedo E-33006, Spain
- Gino A DiLabio
  - Department of Chemistry, University of British Columbia, Okanagan, 3247 University Way, Kelowna, British Columbia V1V 1V7, Canada
|
15
|
Prasad VK, Otero-de-la-Roza A, DiLabio GA. Fast and Accurate Quantum Mechanical Modeling of Large Molecular Systems Using Small Basis Set Hartree-Fock Methods Corrected with Atom-Centered Potentials. J Chem Theory Comput 2022; 18:2208-2232. [PMID: 35313106 DOI: 10.1021/acs.jctc.1c01128] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/19/2022]
Abstract
There has been significant interest in developing fast and accurate quantum mechanical methods for modeling large molecular systems. In this work, by utilizing a machine learning regression technique, we have developed new low-cost quantum mechanical approaches to model large molecular systems. The developed approaches rely on using one-electron Gaussian-type functions called atom-centered potentials (ACPs) to correct for the basis set incompleteness and the lack of correlation effects in the underlying minimal or small basis set Hartree-Fock (HF) methods. In particular, ACPs are proposed for ten elements common in organic and bioorganic chemistry (H, B, C, N, O, F, Si, P, S, and Cl) and four different base methods: two minimal basis sets (MINIs and MINIX) plus a double-ζ basis set (6-31G*) in combination with dispersion-corrected HF (HF-D3/MINIs, HF-D3/MINIX, HF-D3/6-31G*) and the HF-3c method. The new ACPs are trained on a very large set (73 832 data points) of noncovalent properties (interaction and conformational energies) and validated additionally on a set of 32 048 data points. All reference data are of complete basis set coupled-cluster quality, mostly CCSD(T)/CBS. The proposed ACP-corrected methods are shown to give errors in the tenths of a kcal/mol range for noncovalent interaction energies and up to 2 kcal/mol for molecular conformational energies. More importantly, the average errors are similar in the training and validation sets, confirming the robustness and applicability of these methods outside the boundaries of the training set. In addition, the performance of the new ACP-corrected methods is similar to complete basis set density functional theory (DFT) but at a cost that is orders of magnitude lower, and the proposed ACPs can be used in any computational chemistry program that supports effective-core potentials without modification. 
It is also shown that ACPs improve the description of covalent and noncovalent bond geometries of the underlying methods and that the improvement brought about by the application of the ACPs is directly related to the number of atoms to which they are applied, allowing the treatment of systems containing some atoms for which ACPs are not available. Overall, the ACP-corrected methods proposed in this work constitute an alternative accurate, economical, and reliable quantum mechanical approach to describe the geometries, interaction energies, and conformational energies of systems with hundreds to thousands of atoms.
Affiliation(s)
- Viki Kumar Prasad
  - Department of Chemistry, University of British Columbia, Okanagan, 3247 University Way, Kelowna, British Columbia, Canada V1V 1V7
- Alberto Otero-de-la-Roza
  - MALTA Consolider Team, Departamento de Química Física y Analítica, Facultad de Química, Universidad de Oviedo, E-33006 Oviedo, Spain
- Gino A DiLabio
  - Department of Chemistry, University of British Columbia, Okanagan, 3247 University Way, Kelowna, British Columbia, Canada V1V 1V7
|
16
|
Gokcan H, Isayev O. Learning molecular potentials with neural networks. WIREs Comput Mol Sci 2022. [DOI: 10.1002/wcms.1564] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/09/2022]
Affiliation(s)
- Hatice Gokcan
  - Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Olexandr Isayev
  - Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
|
17
|
Yang L, Horton JT, Payne MC, Penfold TJ, Cole DJ. Modeling Molecular Emitters in Organic Light-Emitting Diodes with the Quantum Mechanical Bespoke Force Field. J Chem Theory Comput 2021; 17:5021-5033. [PMID: 34264669 DOI: 10.1021/acs.jctc.1c00135] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/28/2022]
Abstract
Combined molecular dynamics (MD) and quantum mechanics (QM) simulation procedures have gained popularity in modeling the spectral properties of functional organic molecules. However, the potential energy surfaces used to propagate long-time scale dynamics in these simulations are typically described using general, transferable force fields designed for organic molecules in their electronic ground states. These force fields do not typically include spectroscopic data in their training, and importantly, there is no general protocol for including changes in geometry or intermolecular interactions with the environment that may occur upon electronic excitation. In this work, we show that parameters tailored for thermally activated delayed fluorescence (TADF) emitters used in organic light-emitting diodes (OLEDs), in both their ground and electronically excited states, can be readily derived from a small number of QM calculations using the QUBEKit (QUantum mechanical BEspoke toolKit) software and improve the overall accuracy of these simulations.
Affiliation(s)
- Lupeng Yang
  - TCM Group, Cavendish Laboratory, 19 JJ Thomson Avenue, Cambridge CB3 0HE, United Kingdom
- Joshua T Horton
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, United Kingdom
- Michael C Payne
  - TCM Group, Cavendish Laboratory, 19 JJ Thomson Avenue, Cambridge CB3 0HE, United Kingdom
- Thomas J Penfold
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, United Kingdom
- Daniel J Cole
  - School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, United Kingdom
|
18
|
Rosenberger D, Smith JS, Garcia AE. Modeling of Peptides with Classical and Novel Machine Learning Force Fields: A Comparison. J Phys Chem B 2021; 125:3598-3612. [DOI: 10.1021/acs.jpcb.0c10401] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 01/09/2023]
Affiliation(s)
- David Rosenberger
  - Los Alamos National Laboratory, Theoretical Division, Chemistry and Physics of Materials Group, Los Alamos, 87545 New Mexico, United States
  - Los Alamos National Laboratory, Theoretical Division, Center for Nonlinear Studies, Los Alamos, 87545 New Mexico, United States
- Justin S. Smith
  - Los Alamos National Laboratory, Theoretical Division, Chemistry and Physics of Materials Group, Los Alamos, 87545 New Mexico, United States
- Angel E. Garcia
  - Los Alamos National Laboratory, Theoretical Division, Center for Nonlinear Studies, Los Alamos, 87545 New Mexico, United States
|