1. Lehner MT, Katzberger P, Maeder N, Landrum GA, Riniker S. DASH properties: Estimating atomic and molecular properties from a dynamic attention-based substructure hierarchy. J Chem Phys 2024; 161:074103. [PMID: 39145551 DOI: 10.1063/5.0218154] [Received: 05/09/2024] [Accepted: 08/01/2024] [Indexed: 08/16/2024] Open access
Abstract
Recently, we presented a method to assign atomic partial charges based on the DASH (dynamic attention-based substructure hierarchy) tree with high efficiency and quantum mechanical (QM)-like accuracy. In addition, the approach can be considered "rule based" (the rules are derived from the attention values of a graph neural network), and thus each assignment is fully explainable by visualizing the underlying molecular substructures. In this work, we demonstrate that these hierarchically sorted substructures capture the key features of the local environment of an atom and allow us to predict different atomic properties with high accuracy without building a new DASH tree for each property. The fast prediction of atomic properties in molecules with the DASH tree can, for example, be used as an efficient way to generate feature vectors for machine learning without the need for expensive QM calculations. The final DASH tree with the different atomic properties, as well as the complete dataset with wave functions, is made freely available.
Affiliation(s)
- Marc T Lehner
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Paul Katzberger
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Niels Maeder
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Gregory A Landrum
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Sereina Riniker
- Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland

2. Abarbanel OD, Hutchison GR. QupKake: Integrating Machine Learning and Quantum Chemistry for Micro-pKa Predictions. J Chem Theory Comput 2024; 20:6946-6956. [PMID: 38832803 PMCID: PMC11325546 DOI: 10.1021/acs.jctc.4c00328] [Indexed: 06/05/2024]
Abstract
Accurate prediction of micro-pKa values is crucial for understanding and modulating the acidity and basicity of organic molecules, with applications in drug discovery, materials science, and environmental chemistry. This work introduces QupKake, a novel method that combines graph neural network models with semiempirical quantum mechanical (QM) features to achieve exceptional accuracy and generalization in micro-pKa prediction. QupKake outperforms state-of-the-art models on a variety of benchmark data sets, with root-mean-square errors between 0.5 and 0.8 pKa units on five external test sets. Feature importance analysis reveals the crucial role of QM features in both the reaction site enumeration and micro-pKa prediction models. QupKake represents a significant advancement in micro-pKa prediction, offering a powerful tool for various applications in chemistry and beyond.
Affiliation(s)
- Omri D Abarbanel
- Department of Chemistry, University of Pittsburgh, 219 Parkman Avenue, Pittsburgh, Pennsylvania 15260, United States
- Geoffrey R Hutchison
- Department of Chemistry, University of Pittsburgh, 219 Parkman Avenue, Pittsburgh, Pennsylvania 15260, United States
- Department of Chemical and Petroleum Engineering, University of Pittsburgh, 3700 O'Hara Street, Pittsburgh, Pennsylvania 15261, United States

3. Solov’yov AV, Verkhovtsev AV, Mason NJ, Amos RA, Bald I, Baldacchino G, Dromey B, Falk M, Fedor J, Gerhards L, Hausmann M, Hildenbrand G, Hrabovský M, Kadlec S, Kočišek J, Lépine F, Ming S, Nisbet A, Ricketts K, Sala L, Schlathölter T, Wheatley AEH, Solov’yov IA. Condensed Matter Systems Exposed to Radiation: Multiscale Theory, Simulations, and Experiment. Chem Rev 2024; 124:8014-8129. [PMID: 38842266 PMCID: PMC11240271 DOI: 10.1021/acs.chemrev.3c00902] [Received: 12/06/2023] [Revised: 05/02/2024] [Accepted: 05/10/2024] [Indexed: 06/07/2024]
Abstract
This roadmap reviews the new, highly interdisciplinary research field studying the behavior of condensed matter systems exposed to radiation. The Review highlights several recent advances in the field and provides a roadmap for the development of the field over the next decade. Condensed matter systems exposed to radiation can be inorganic, organic, or biological, finite or infinite, composed of different molecular species or materials, exist in different phases, and operate under different thermodynamic conditions. Many of the key phenomena related to the behavior of irradiated systems are very similar and can be understood based on the same fundamental theoretical principles and computational approaches. The multiscale nature of such phenomena requires the quantitative description of the radiation-induced effects occurring at different spatial and temporal scales, ranging from the atomic to the macroscopic, and the interlinks between such descriptions. The multiscale nature of the effects and the similarity of their manifestation in systems of different origins necessarily bring together different disciplines, such as physics, chemistry, biology, materials science, nanoscience, and biomedical research, demonstrating the numerous interlinks and commonalities between them. This research field is highly relevant to many novel and emerging technologies and medical applications.
Affiliation(s)
- Nigel J. Mason
- School of Physics and Astronomy, University of Kent, Canterbury CT2 7NH, United Kingdom
- Richard A. Amos
- Department of Medical Physics and Biomedical Engineering, University College London, London WC1E 6BT, U.K.
- Ilko Bald
- Institute of Chemistry, University of Potsdam, Karl-Liebknecht-Str. 24-25, 14476 Potsdam, Germany
- Gérard Baldacchino
- Université Paris-Saclay, CEA, LIDYL, 91191 Gif-sur-Yvette, France
- CY Cergy Paris Université, CEA, LIDYL, 91191 Gif-sur-Yvette, France
- Brendan Dromey
- Centre for Light Matter Interactions, School of Mathematics and Physics, Queen’s University Belfast, Belfast BT7 1NN, United Kingdom
- Martin Falk
- Institute of Biophysics of the Czech Academy of Sciences, Královopolská 135, 61200 Brno, Czech Republic
- Kirchhoff-Institute for Physics, Heidelberg University, Im Neuenheimer Feld 227, 69120 Heidelberg, Germany
- Juraj Fedor
- J. Heyrovský Institute of Physical Chemistry, Czech Academy of Sciences, Dolejškova 3, 18223 Prague, Czech Republic
- Luca Gerhards
- Institute of Physics, Carl von Ossietzky University, Carl-von-Ossietzky-Str. 9-11, 26129 Oldenburg, Germany
- Michael Hausmann
- Kirchhoff-Institute for Physics, Heidelberg University, Im Neuenheimer Feld 227, 69120 Heidelberg, Germany
- Georg Hildenbrand
- Kirchhoff-Institute for Physics, Heidelberg University, Im Neuenheimer Feld 227, 69120 Heidelberg, Germany
- Faculty of Engineering, University of Applied Sciences Aschaffenburg, Würzburger Str. 45, 63743 Aschaffenburg, Germany
- Stanislav Kadlec
- Eaton European Innovation Center, Bořivojova 2380, 25263 Roztoky, Czech Republic
- Jaroslav Kočišek
- J. Heyrovský Institute of Physical Chemistry, Czech Academy of Sciences, Dolejškova 3, 18223 Prague, Czech Republic
- Franck Lépine
- Université Claude Bernard Lyon 1, CNRS, Institut Lumière Matière, F-69622, Villeurbanne, France
- Siyi Ming
- Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Andrew Nisbet
- Department of Medical Physics and Biomedical Engineering, University College London, London WC1E 6BT, U.K.
- Kate Ricketts
- Department of Targeted Intervention, University College London, Gower Street, London WC1E 6BT, United Kingdom
- Leo Sala
- J. Heyrovský Institute of Physical Chemistry, Czech Academy of Sciences, Dolejškova 3, 18223 Prague, Czech Republic
- Thomas Schlathölter
- Zernike Institute for Advanced Materials, University of Groningen, Nijenborgh 4, 9747 AG Groningen, The Netherlands
- University College Groningen, University of Groningen, Hoendiepskade 23/24, 9718 BG Groningen, The Netherlands
- Andrew E. H. Wheatley
- Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Ilia A. Solov’yov
- Institute of Physics, Carl von Ossietzky University, Carl-von-Ossietzky-Str. 9-11, 26129 Oldenburg, Germany

4. Xie Q, Horsfield AP. Coordinate-Free and Low-Order Scaling Machine Learning Model for Atomic Partial Charge Prediction for Any Size of Molecules. J Chem Inf Model 2024; 64:4419-4425. [PMID: 38757521 PMCID: PMC11167589 DOI: 10.1021/acs.jcim.4c00376] [Received: 03/04/2024] [Revised: 05/03/2024] [Accepted: 05/08/2024] [Indexed: 05/18/2024]
Abstract
The atomic partial charge is of great importance in many fields, such as chemistry and drug-target recognition. However, conventional quantum-based computation of atomic charges is relatively slow, limiting further applications of atomic charge analysis. Various machine learning models have been developed to speed up atomic charge calculations, but some concerning problems remain. Some models based on geometric coordinates require high-accuracy geometry optimization as a preprocessing step, while other models limit the size of input molecules, which narrows their range of application. Here, we propose a machine learning atomic charge model based on a message-passing featurizer. This preprocessing featurizer quickly extracts atomic environment information from a molecule according to its connectivity. The resulting descriptor can be used with a neural network to quickly predict the atomic partial charge. The model automatically adapts to molecules of any size while remaining efficient, and achieves a root-mean-square error of 0.018 e in Hirshfeld charge prediction, with an overall time complexity of O(n²). Thus, this model could extend the applications of atomic partial charges to more fields and cases.
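To make the idea of a coordinate-free, connectivity-only featurizer concrete, here is a minimal sketch. It assumes a one-hot element encoding over a hypothetical H/C/N/O vocabulary and plain neighbour-sum aggregation; it is an illustration of the general message-passing technique, not the featurizer Xie and Horsfield actually use. Because it only needs atomic numbers and a bond list, it works for molecules of any size, and with a dense adjacency matrix each round costs O(n²·d).

```python
from typing import List, Sequence, Tuple

# Hypothetical element vocabulary for one-hot encoding (an assumption).
ELEMENTS = [1, 6, 7, 8]  # H, C, N, O


def one_hot(z: int) -> List[float]:
    """One-hot encode an atomic number over the assumed vocabulary."""
    return [1.0 if z == e else 0.0 for e in ELEMENTS]


def message_passing_features(
    atomic_numbers: Sequence[int],
    bonds: Sequence[Tuple[int, int]],
    n_rounds: int = 2,
) -> List[List[float]]:
    """Per-atom descriptors built from connectivity only (no coordinates).

    Each round sums the neighbours' features from the previous round and
    appends the result, so atom i's descriptor encodes its environment out
    to n_rounds bonds. With a dense adjacency matrix, each round is
    O(n^2 * d) for n atoms and feature width d.
    """
    n = len(atomic_numbers)
    adj = [[0] * n for _ in range(n)]
    for i, j in bonds:
        adj[i][j] = adj[j][i] = 1

    layer = [one_hot(z) for z in atomic_numbers]  # current message layer
    descriptors = [row[:] for row in layer]       # concatenated history
    for _ in range(n_rounds):
        d = len(layer[0])
        new_layer = []
        for i in range(n):
            msg = [0.0] * d
            for j in range(n):
                if adj[i][j]:
                    for k in range(d):
                        msg[k] += layer[j][k]
            new_layer.append(msg)
        layer = new_layer
        for i in range(n):
            descriptors[i].extend(layer[i])
    return descriptors
```

For water, `message_passing_features([8, 1, 1], [(0, 1), (0, 2)])` yields one 12-dimensional vector per atom, identical for the two symmetry-equivalent hydrogens; such vectors would then be the input to a charge-predicting neural network, which is not shown.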
Affiliation(s)
- Qin Xie
- Department of Materials, Imperial College London, SW7 2AZ London, U.K.

5. Grassano JS, Pickering I, Roitberg AE, González Lebrero MC, Estrin DA, Semelak JA. Assessment of Embedding Schemes in a Hybrid Machine Learning/Classical Potentials (ML/MM) Approach. J Chem Inf Model 2024; 64:4047-4058. [PMID: 38710065 DOI: 10.1021/acs.jcim.4c00478] [Indexed: 05/08/2024]
Abstract
Machine learning (ML) methods have reached high accuracy levels for the prediction of in vacuo molecular properties. However, the simulation of large systems solely through ML methods (such as those based on neural network potentials) is still a challenge. In this context, the so-called ML/MM methods are among the most promising frameworks for integrating ML schemes into the simulation of complex molecular systems. These multiscale approaches combine ML methods with classical force fields (MM), in the same spirit as the successful hybrid quantum mechanics-molecular mechanics (QM/MM) methods. The key issue for such ML/MM methods is an adequate description of the coupling between the region of the system described by ML and the region described at the MM level. In QM/MM schemes, the main ingredient of the interaction is electrostatic, and the state of the art is the so-called electrostatic embedding. In this study, we analyze the quality of simpler mechanical embedding-based approaches, focusing on their application within an ML/MM framework that uses atomic partial charges derived in vacuo. Taking electrostatic embedding calculations performed at the QM(DFT)/MM level as reference, we explore different atomic charge schemes, as well as a polarization correction computed using atomic polarizabilities. Our benchmark data set comprises about 80k small organic structures from the ANI-1x and ANI-2x databases, solvated in water. The results suggest that minimal basis iterative stockholder (MBIS) atomic charges yield the best agreement with the reference coupling energy. Remarkable improvements are achieved by including a simple polarization correction.
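In mechanical embedding, the ML/MM electrostatic coupling reduces to a Coulomb sum between fixed ML-region charges (for example, in vacuo MBIS charges) and the MM point charges, and a polarization correction can be added from atomic polarizabilities. The sketch below assumes atomic units and the textbook induced-dipole energy −½αF² per atom, neglecting mutual polarization among ML atoms; the exact correction used by Grassano et al. may differ, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def coulomb_coupling(
    ml_q: Sequence[float], ml_xyz: Sequence[Vec],
    mm_q: Sequence[float], mm_xyz: Sequence[Vec],
) -> float:
    """Mechanical-embedding coupling in atomic units (hartree):
    E = sum_i sum_j q_i q_j / r_ij over ML atoms i and MM charges j."""
    e = 0.0
    for qi, ri in zip(ml_q, ml_xyz):
        for qj, rj in zip(mm_q, mm_xyz):
            e += qi * qj / math.dist(ri, rj)
    return e


def polarization_correction(
    alphas: Sequence[float], ml_xyz: Sequence[Vec],
    mm_q: Sequence[float], mm_xyz: Sequence[Vec],
) -> float:
    """Induced-dipole estimate E_pol = -1/2 sum_i alpha_i |F_i|^2, where F_i
    is the field of the MM charges at ML atom i (no mutual polarization)."""
    e = 0.0
    for a, ri in zip(alphas, ml_xyz):
        field = [0.0, 0.0, 0.0]
        for qj, rj in zip(mm_q, mm_xyz):
            d = [ri[k] - rj[k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            for k in range(3):
                field[k] += qj * d[k] / r**3
        e += -0.5 * a * sum(c * c for c in field)
    return e
```

For a +1 e ML charge and a −1 e MM charge 2 bohr apart, the coupling is −0.5 hartree, and a polarizability of 1 bohr³ adds −0.03125 hartree.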
Affiliation(s)
- Juan S Grassano
- Facultad de Ciencias Exactas y Naturales, Departamento de Química Inorgánica, Analítica y Química Física, Universidad de Buenos Aires, Intendente Güiraldes 2160, Buenos Aires C1428EHA, Argentina
- CONICET─Universidad de Buenos Aires, Instituto de Química-Física de los Materiales, Medio Ambiente y Energía (INQUIMAE), Ciudad Universitaria, Pabellón 2, Buenos Aires C1428EHA, Argentina
- Ignacio Pickering
- Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States
- Adrian E Roitberg
- CONICET─Universidad de Buenos Aires, Instituto de Química-Física de los Materiales, Medio Ambiente y Energía (INQUIMAE), Ciudad Universitaria, Pabellón 2, Buenos Aires C1428EHA, Argentina
- Department of Chemistry, University of Florida, Gainesville, Florida 32611, United States
- Mariano C González Lebrero
- Facultad de Ciencias Exactas y Naturales, Departamento de Química Inorgánica, Analítica y Química Física, Universidad de Buenos Aires, Intendente Güiraldes 2160, Buenos Aires C1428EHA, Argentina
- CONICET─Universidad de Buenos Aires, Instituto de Química-Física de los Materiales, Medio Ambiente y Energía (INQUIMAE), Ciudad Universitaria, Pabellón 2, Buenos Aires C1428EHA, Argentina
- Dario A Estrin
- Facultad de Ciencias Exactas y Naturales, Departamento de Química Inorgánica, Analítica y Química Física, Universidad de Buenos Aires, Intendente Güiraldes 2160, Buenos Aires C1428EHA, Argentina
- CONICET─Universidad de Buenos Aires, Instituto de Química-Física de los Materiales, Medio Ambiente y Energía (INQUIMAE), Ciudad Universitaria, Pabellón 2, Buenos Aires C1428EHA, Argentina
- Jonathan A Semelak
- Facultad de Ciencias Exactas y Naturales, Departamento de Química Inorgánica, Analítica y Química Física, Universidad de Buenos Aires, Intendente Güiraldes 2160, Buenos Aires C1428EHA, Argentina
- CONICET─Universidad de Buenos Aires, Instituto de Química-Física de los Materiales, Medio Ambiente y Energía (INQUIMAE), Ciudad Universitaria, Pabellón 2, Buenos Aires C1428EHA, Argentina

6. Wang Y, Pulido I, Takaba K, Kaminow B, Scheen J, Wang L, Chodera JD. EspalomaCharge: Machine Learning-Enabled Ultrafast Partial Charge Assignment. J Phys Chem A 2024; 128:4160-4167. [PMID: 38717302 PMCID: PMC11129294 DOI: 10.1021/acs.jpca.4c01287] [Received: 02/27/2024] [Revised: 04/17/2024] [Accepted: 04/17/2024] [Indexed: 05/24/2024]
Abstract
Atomic partial charges are crucial parameters in molecular dynamics simulation, dictating the electrostatic contributions to intermolecular energies and thereby the potential energy landscape. Traditionally, the assignment of partial charges has relied on surrogates of ab initio semiempirical quantum chemical methods such as AM1-BCC and is expensive for large systems or large numbers of molecules. We propose a hybrid physical/graph neural network-based approximation to the widely popular AM1-BCC charge model that is orders of magnitude faster while maintaining accuracy comparable to differences in AM1-BCC implementations. Our hybrid approach couples a graph neural network to a streamlined charge equilibration approach in order to predict molecule-specific atomic electronegativity and hardness parameters, followed by analytical determination of optimal charge-equilibrated parameters that preserve total molecular charge. This hybrid approach scales linearly with the number of atoms, enabling for the first time the use of fully consistent charge models for small molecules and biopolymers for the construction of next-generation self-consistent biomolecular force fields. Implemented in the free and open source package EspalomaCharge, this approach provides drop-in replacements for both AmberTools antechamber and the Open Force Field Toolkit charging workflows, in addition to stand-alone charge generation interfaces. Source code is available at https://github.com/choderalab/espaloma-charge.
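The "analytical determination of optimal charge-equilibrated parameters" can be illustrated with the standard quadratic charge-equilibration energy. A minimal sketch, assuming that energy form: minimize Σᵢ(eᵢqᵢ + ½sᵢqᵢ²) subject to Σᵢqᵢ = Q. A Lagrange multiplier λ gives qᵢ = (λ − eᵢ)/sᵢ with λ = (Q + Σⱼeⱼ/sⱼ)/(Σⱼ1/sⱼ), which costs O(n) and preserves the total charge exactly. Here the electronegativities e and hardnesses s are passed in directly as hypothetical values rather than predicted by a graph neural network, and the function name is not part of the EspalomaCharge package.

```python
from typing import List, Sequence


def equilibrate_charges(
    e: Sequence[float], s: Sequence[float], total_charge: float
) -> List[float]:
    """Closed-form charge equilibration.

    Minimizes sum_i (e_i * q_i + 0.5 * s_i * q_i**2) subject to
    sum_i q_i == total_charge, with electronegativities e_i and
    hardnesses s_i > 0. Runs in O(n) for n atoms.
    """
    inv_s = [1.0 / si for si in s]
    sum_inv_s = sum(inv_s)
    sum_e_over_s = sum(ei * isi for ei, isi in zip(e, inv_s))
    # Lagrange multiplier enforcing the total-charge constraint.
    lam = (total_charge + sum_e_over_s) / sum_inv_s
    return [(lam - ei) * isi for ei, isi in zip(e, inv_s)]
```

At the solution every atom has the same "chemical potential" eᵢ + sᵢqᵢ = λ, which is a quick way to check an implementation.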
Affiliation(s)
- Yuanqing Wang
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Simons Center for Computational Chemistry and Center for Data Science, New York University, New York, New York 10004, United States
- Iván Pulido
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Kenichiro Takaba
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Pharmaceutical Research Center, Advanced Drug Discovery, Asahi Kasei Pharma Corporation, Shizuoka 410-2321, Japan
- Benjamin Kaminow
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, Cornell University, New York, New York 10065, United States
- Jenke Scheen
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Lily Wang
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States
- Open Molecular Sciences Foundation, Davis, California 95618, United States
- John D. Chodera
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, New York 10065, United States

7. Sarikas AP, Gkagkas K, Froudakis GE. Gas adsorption meets deep learning: voxelizing the potential energy surface of metal-organic frameworks. Sci Rep 2024; 14:2242. [PMID: 38278851 PMCID: PMC10817925 DOI: 10.1038/s41598-023-50309-8] [Received: 08/11/2023] [Accepted: 12/17/2023] [Indexed: 01/28/2024] Open access
Abstract
Intrinsic properties of metal-organic frameworks (MOFs), such as their ultraporosity and high surface area, make them promising solutions for problems involving gas adsorption. Nevertheless, due to their combinatorial nature, a huge number of structures is feasible, which makes selecting the best candidates with traditional techniques cumbersome. Recently, machine learning approaches have emerged as efficient tools to deal with this challenge, allowing researchers to rapidly screen large databases of MOFs via predictive models. The performance of these models is tightly tied to the mathematical representation of a material, which necessitates informative descriptors. In this work, a generalized framework to predict gas adsorption properties is presented, using as its sole descriptor the capstone of chemical information: the potential energy surface (PES). To make it machine-understandable, the PES is voxelized, and a 3D convolutional neural network (CNN) then processes this 3D energy image. As a proof of concept, the proposed pipeline is applied to predicting [Formula: see text] uptake in MOFs. The resulting model outperforms a conventional model built with geometric descriptors and requires two orders of magnitude less training data to reach a given level of performance. Moreover, the transferability of the approach to different host-guest systems is demonstrated by examining [Formula: see text] uptake in COFs. The generic character of the proposed methodology, inherited from the PES, makes it applicable to fields other than reticular chemistry.
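Voxelizing a PES means evaluating the probe-framework interaction energy at every point of a regular 3D grid, producing the "energy image" a 3D CNN can consume. A minimal sketch, assuming a cubic periodic cell with the minimum-image convention, a single Lennard-Jones (ε, σ) pair for all framework atoms, and clipping of strongly repulsive voxels at a hypothetical `e_max`; real workflows typically use per-atom-type force-field parameters, mixing rules, and triclinic cells, and the CNN itself is not shown.

```python
import math
from typing import List, Sequence, Tuple


def voxelize_pes(
    frame_xyz: Sequence[Tuple[float, float, float]],
    cell_length: float,
    n_grid: int,
    epsilon: float = 1.0,
    sigma: float = 3.0,
    cutoff: float = 12.0,
    e_max: float = 5000.0,
) -> List[List[List[float]]]:
    """Probe-framework Lennard-Jones energy on an n_grid^3 voxel grid.

    Assumes a cubic periodic cell (minimum image, so cutoff should not
    exceed cell_length / 2) and one (epsilon, sigma) pair for every
    framework atom. Energies are clipped at e_max so overlapping voxels
    do not dominate the image.
    """
    h = cell_length / n_grid
    rc2 = cutoff * cutoff
    grid: List[List[List[float]]] = []
    for i in range(n_grid):
        plane = []
        for j in range(n_grid):
            row = []
            for k in range(n_grid):
                p = (i * h, j * h, k * h)
                e = 0.0
                for a in frame_xyz:
                    d2 = 0.0
                    for c in range(3):
                        d = p[c] - a[c]
                        d -= cell_length * round(d / cell_length)  # minimum image
                        d2 += d * d
                    if d2 < 1e-12:
                        e = math.inf  # probe sits exactly on an atom
                        break
                    if d2 < rc2:
                        sr6 = (sigma * sigma / d2) ** 3
                        e += 4.0 * epsilon * (sr6 * sr6 - sr6)
                row.append(min(e, e_max))
            plane.append(row)
        grid.append(plane)
    return grid
```

Voxels on top of an atom saturate at `e_max`, voxels near the σ-scale minimum are negative (attractive), and voxels beyond the cutoff are zero.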
Affiliation(s)
- Antonios P Sarikas
- Department of Chemistry, University of Crete, Voutes Campus, 70013, Heraklion, Crete, Greece
- Konstantinos Gkagkas
- Advanced Technology Division, Toyota Motor Europe NV/SA, Technical Center, Hoge Wei 33B, 1930, Zaventem, Belgium
- George E Froudakis
- Department of Chemistry, University of Crete, Voutes Campus, 70013, Heraklion, Crete, Greece

8. Grandits M, Ecker GF. Ligand- and Structure-based Approaches for Transmembrane Transporter Modeling. Curr Drug Res Rev 2024; 16:81-93. [PMID: 37157206 PMCID: PMC11340286 DOI: 10.2174/2589977515666230508123041] [Received: 09/19/2022] [Revised: 03/15/2023] [Accepted: 03/28/2023] [Indexed: 05/10/2023]
Abstract
The study of transporter proteins is key to understanding the mechanisms behind multidrug resistance and drug-drug interactions that cause severe side effects. While ATP-binding transporters are well studied, solute carriers represent an understudied family with a high number of orphan proteins. To study these transporters, in silico methods can be used to shed light on the basic molecular machinery by studying protein-ligand interactions. Nowadays, computational methods are an integral part of the drug discovery and development process. In this short review, computational approaches such as machine learning are discussed that aim to characterize interactions between transport proteins and certain compounds in order to locate target proteins. Furthermore, a few cases of selected members of the ATP-binding transporter and solute carrier families are covered, which are of high interest in clinical drug interaction studies, especially for regulatory agencies. The strengths and limitations of ligand-based and structure-based methods are discussed to highlight their applicability to different studies. Moreover, combining multiple approaches can yield more information for identifying crucial amino acids that explain important interactions in protein-ligand complexes in more detail. This allows the design of drug candidates with increased activity towards a target protein, which further helps to support future synthetic efforts.
Affiliation(s)
- Melanie Grandits
- Department of Pharmaceutical Sciences, University of Vienna, Vienna, Austria
- Gerhard F. Ecker
- Department of Pharmaceutical Sciences, University of Vienna, Vienna, Austria

9. Lehner MT, Katzberger P, Maeder N, Schiebroek CC, Teetz J, Landrum GA, Riniker S. DASH: Dynamic Attention-Based Substructure Hierarchy for Partial Charge Assignment. J Chem Inf Model 2023; 63:6014-6028. [PMID: 37738206 PMCID: PMC10565818 DOI: 10.1021/acs.jcim.3c00800] [Received: 05/25/2023] [Indexed: 09/24/2023]
Abstract
We present a robust and computationally efficient approach for assigning partial charges of atoms in molecules. The method is based on a hierarchical tree constructed from attention values extracted from a graph neural network (GNN), which was trained to predict atomic partial charges from accurate quantum-mechanical (QM) calculations. The resulting dynamic attention-based substructure hierarchy (DASH) approach provides fast assignment of partial charges with the same accuracy as the GNN itself, is software-independent, and can easily be integrated into existing parametrization pipelines, as shown for the Open Force Field (OpenFF). The implementation of the DASH workflow, the final DASH tree, and the training set are available as open source/open data from public repositories.
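The "rule-based" character of a substructure hierarchy can be illustrated with a tree lookup: an atom is described by an ordered sequence of environment labels, and its value is read from the deepest tree node that still matches, so the matched path itself explains the assignment. This is a minimal sketch under stated assumptions: the string labels, the `Node` structure, and the stored values are hypothetical, and the real DASH tree (built from GNN attention values, with its own node-matching criteria) is not reproduced here. Each node holds a dictionary so that several properties can share one tree.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class Node:
    """One node of a hypothetical substructure hierarchy."""
    props: Dict[str, float]  # e.g. {"charge": ...}; several properties per node
    children: Dict[str, "Node"] = field(default_factory=dict)


def assign(root: Node, path: Sequence[str]) -> Tuple[Dict[str, float], List[str]]:
    """Walk down the tree along an atom's ordered environment labels.

    `path` lists the atom's own label first, then labels of neighbouring
    substructures in order of decreasing importance (given directly here).
    Returns the properties of the deepest matching node and the matched
    labels, which serve as a human-readable explanation.
    """
    node, matched = root, []
    for label in path:
        child = node.children.get(label)
        if child is None:
            break  # no more specific rule: keep the current node
        node = child
        matched.append(label)
    return node.props, matched
```

With a toy tree such as `Node({"charge": 0.0}, {"C": Node({"charge": 0.1}, {"O": Node({"charge": 0.25})})})`, the path `["C", "O", "H"]` stops at the `C → O` node and returns its stored values; an unseen label falls back to the root.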
Affiliation(s)
- Niels Maeder
- Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Carl C.G. Schiebroek
- Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Jakob Teetz
- Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Gregory A. Landrum
- Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Sereina Riniker
- Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland

10. Janitra RS, Destiarani W, Hardianto A, Baroroh U, Rohmatulloh FG, Rustaman, Subroto T, Rukiah, Yusuf M. Multilayer Model of Gold Nanoparticles (AuNPs) and Its Application in the Classical Molecular Dynamics Simulation of Citrate-Capped AuNPs. J Phys Chem B 2023; 127:7103-7110. [PMID: 37540714 DOI: 10.1021/acs.jpcb.3c00771] [Indexed: 08/06/2023]
Abstract
Studies on the interaction between gold nanoparticles (AuNPs) and functional proteins have been useful in developing diagnostic and therapeutic agents. Such studies require a realistic computational model of AuNPs for successful molecular design. This study offers a new multilayer model of AuNPs to address the inconsistency between their molecular mechanics interpretation and their plasmonic nature. We performed quantum-chemical partial charge calculations on Au13 and Au55 models. The results showed partial negative charges on the surface and partial positive charges in the inner part, indicating that the AuNP model should be composed of multiple atom types. We tested the partial charge parameters of these gold (Au) atoms in classical molecular dynamics (CMD) simulations of AuNPs. Our parameters performed better than the zero-charge Au of the interface force field (IFF) in simulating the adsorption of Na+ and dicarboxy acetone, in terms of consistency with the surface charge density. We propose that the multiple-charged AuNP model can be developed further into a simpler model with four Au atom types for larger AuNPs.
Affiliation(s)
- Regaputra S Janitra
- Biotechnology Master Program, Postgraduate School, Universitas Padjadjaran, Jl. Dipatiukur 35, Bandung 40132, West Java, Indonesia
- Wanda Destiarani
- Research Center for Molecular Biotechnology and Bioinformatics, Universitas Padjadjaran, Jl. Singaperbangsa 2, Bandung 40132, West Java, Indonesia
- Ari Hardianto
- Department of Chemistry, Faculty of Mathematics and Natural Sciences, Universitas Padjadjaran, Jl. Raya Bandung-Sumedang KM 21, Jatinangor 45363, West Java, Indonesia
- Research Center for Molecular Biotechnology and Bioinformatics, Universitas Padjadjaran, Jl. Singaperbangsa 2, Bandung 40132, West Java, Indonesia
- Umi Baroroh
- Research Center for Molecular Biotechnology and Bioinformatics, Universitas Padjadjaran, Jl. Singaperbangsa 2, Bandung 40132, West Java, Indonesia
- Department of Biotechnology, Indonesian School of Pharmacy, Jl. Soekarno Hatta No. 354, Bandung 40266, West Java, Indonesia
- Fauzian G Rohmatulloh
- Research Center for Molecular Biotechnology and Bioinformatics, Universitas Padjadjaran, Jl. Singaperbangsa 2, Bandung 40132, West Java, Indonesia
- Rustaman
- Department of Chemistry, Faculty of Mathematics and Natural Sciences, Universitas Padjadjaran, Jl. Raya Bandung-Sumedang KM 21, Jatinangor 45363, West Java, Indonesia
- Toto Subroto
- Department of Chemistry, Faculty of Mathematics and Natural Sciences, Universitas Padjadjaran, Jl. Raya Bandung-Sumedang KM 21, Jatinangor 45363, West Java, Indonesia
- Research Center for Molecular Biotechnology and Bioinformatics, Universitas Padjadjaran, Jl. Singaperbangsa 2, Bandung 40132, West Java, Indonesia
- Rukiah
- Department of Chemistry, Faculty of Mathematics and Natural Sciences, Universitas Padjadjaran, Jl. Raya Bandung-Sumedang KM 21, Jatinangor 45363, West Java, Indonesia
- Muhammad Yusuf
- Department of Chemistry, Faculty of Mathematics and Natural Sciences, Universitas Padjadjaran, Jl. Raya Bandung-Sumedang KM 21, Jatinangor 45363, West Java, Indonesia
- Research Center for Molecular Biotechnology and Bioinformatics, Universitas Padjadjaran, Jl. Singaperbangsa 2, Bandung 40132, West Java, Indonesia

11. Gallegos M, Martín Pendás Á. Developing a User-Friendly Code for the Fast Estimation of Well-Behaved Real-Space Partial Charges. J Chem Inf Model 2023. [PMID: 37339425 DOI: 10.1021/acs.jcim.3c00597] [Indexed: 06/22/2023]
Abstract
The Quantum Theory of Atoms in Molecules (QTAIM) provides an intuitive, yet physically sound, strategy to determine the partial charges of any chemical system, relying on the topology induced by the electron density ρ(r). In a previous work [J. Chem. Phys. 2022, 156, 014112], we introduced a machine learning (ML) model for the computation of QTAIM charges of C, H, O, and N atoms at a fraction of the conventional computational cost. Unfortunately, because the atomistic predictions are independent, the raw atomic charges do not necessarily reconstruct the exact molecular charge, limiting their applicability in chemistry. To solve this problem, we introduce NNAIMGUI, a user-friendly code that combines the inference abilities of ML with an equilibration strategy to afford well-behaved partial charges. The performance of this approach is tested in a variety of scenarios, including interpolation and extrapolation regimes (e.g., chemical reactions) as well as large systems. The results of this work show that the equilibrated charges retain the chemically accurate behavior reproduced by the ML models. Furthermore, NNAIMGUI is a fully flexible architecture allowing users to train and use tailor-made models targeted at any atomic property of choice. In this way, the GUI-interfaced code, equipped with visualization utilities, makes the computation of real-space atomic properties much more appealing and intuitive, paving the way toward the extension of QTAIM-related descriptors beyond the theoretical chemistry community.
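The problem the equilibration step solves is that independently predicted atomic charges need not sum to the molecular charge. One simple way to enforce the constraint is to subtract the excess from the atoms in proportion to non-negative weights (uniform by default). This is a minimal sketch of that generic correction, not necessarily the equilibration scheme NNAIMGUI implements; the function name and the weighting option are hypothetical.

```python
from typing import List, Optional, Sequence


def shift_to_total_charge(
    raw: Sequence[float],
    total_charge: float,
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Correct independently predicted atomic charges so that they sum to
    the molecular charge. The excess is removed in proportion to
    non-negative per-atom weights (uniform if weights is None)."""
    if weights is None:
        weights = [1.0] * len(raw)
    w_sum = sum(weights)
    if w_sum <= 0:
        raise ValueError("weights must have a positive sum")
    excess = sum(raw) - total_charge
    return [q - excess * w / w_sum for q, w in zip(raw, weights)]
```

For example, raw charges `[-0.9, 0.47, 0.46]` for a neutral water molecule carry an excess of 0.03 e, which the uniform variant removes by shifting each atom by −0.01 e; a zero weight leaves that atom's prediction untouched.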
Affiliation(s)
- Miguel Gallegos
- Departamento Química Física y Analítica, Universidad de Oviedo, 33006 Oviedo, Spain
- Ángel Martín Pendás
- Departamento Química Física y Analítica, Universidad de Oviedo, 33006 Oviedo, Spain
12
Han B, Isborn CM, Shi L. Incorporating Polarization and Charge Transfer into a Point-Charge Model for Water Using Machine Learning. J Phys Chem Lett 2023; 14:3869-3877. [PMID: 37067482 DOI: 10.1021/acs.jpclett.3c00036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Rigid nonpolarizable water models with fixed point charges have been widely employed in molecular dynamics simulations due to their efficiency and reasonable accuracy for the potential energy surface. However, the dipole moment surface of water is not necessarily well-described by the same fixed charges, leading to failure in reproducing dipole-related properties. Here, we developed a machine-learning model trained against electronic structure data to assign point charges for water, and the resulting dipole moment surface significantly improved the predictions of the dielectric constant and the low-frequency IR spectrum of liquid water. Our analysis reveals that within our atom-centered point-charge description of the dipole moment surface, the intermolecular charge transfer is the major source of the peak intensity at 200 cm⁻¹, whereas the intramolecular polarization controls the enhancement of the dielectric constant. The effects of exact Hartree-Fock exchange in the hybrid density functional on these properties are also discussed.
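Within an atom-centered point-charge description, the molecular dipole follows directly from the assigned charges as μ = Σᵢ qᵢ rᵢ. The sketch below evaluates this for one water molecule; the fixed TIP3P charges and geometry are stand-ins chosen for illustration (not data from the paper), whereas an ML model of the kind described would supply configuration-dependent charges. The helper `point_charge_dipole` is hypothetical.

```python
import math

# 1 e*Angstrom expressed in debye
E_ANGSTROM_TO_DEBYE = 4.80320


def point_charge_dipole(charges, positions):
    """Dipole vector sum_i q_i * r_i (in e*Angstrom) for atom-centered point charges."""
    mu = [0.0, 0.0, 0.0]
    for q, (x, y, z) in zip(charges, positions):
        mu[0] += q * x
        mu[1] += q * y
        mu[2] += q * z
    return mu


# Rigid TIP3P-like geometry: O-H = 0.9572 Angstrom, H-O-H = 104.52 degrees
half_angle = math.radians(104.52 / 2)
r_oh = 0.9572
positions = [
    (0.0, 0.0, 0.0),                                                   # O
    (r_oh * math.sin(half_angle), r_oh * math.cos(half_angle), 0.0),   # H1
    (-r_oh * math.sin(half_angle), r_oh * math.cos(half_angle), 0.0),  # H2
]
# Fixed TIP3P charges (e); a learned model would predict these per configuration
charges = [-0.834, 0.417, 0.417]

mu = point_charge_dipole(charges, positions)
magnitude_debye = math.sqrt(sum(c * c for c in mu)) * E_ANGSTROM_TO_DEBYE
print(f"{magnitude_debye:.2f} D")  # 2.35 D
```

For a neutral molecule the result is independent of the coordinate origin; with intermolecular charge transfer, individual molecules are no longer neutral and that choice starts to matter.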
Affiliation(s)
- Bowen Han
- Chemistry and Biochemistry, University of California, Merced, California 95343, United States
- Christine M Isborn
- Chemistry and Biochemistry, University of California, Merced, California 95343, United States
- Liang Shi
- Chemistry and Biochemistry, University of California, Merced, California 95343, United States
13
Bleiziffer P, Schaller K, Riniker S. Correction to "Machine Learning of Partial Charges Derived From High-Quality Quantum-Mechanical Calculations". J Chem Inf Model 2023; 63:2265. [PMID: 36940093 DOI: 10.1021/acs.jcim.3c00351] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/21/2023]
14
Kříž K, Schmidt L, Andersson AT, Walz MM, van der Spoel D. An Imbalance in the Force: The Need for Standardized Benchmarks for Molecular Simulation. J Chem Inf Model 2023; 63:412-431. [PMID: 36630710 PMCID: PMC9875315 DOI: 10.1021/acs.jcim.2c01127] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/06/2022] [Indexed: 01/12/2023]
Abstract
Force fields (FFs) for molecular simulation have been under development for more than half a century. As with any predictive model, rigorous testing and comparisons of models critically depend on the availability of standardized data sets and benchmarks. While such benchmarks are rather common in the fields of quantum chemistry, this is not the case for empirical FFs. That is, few benchmarks are reused to evaluate FFs, and development teams instead use their own training and test sets. Here we present an overview of currently available tests and benchmarks for computational chemistry, focusing on organic compounds, including halogens and common ions, as FFs for these are the most common ones. We argue that many of the benchmark data sets from quantum chemistry can in fact be reused for evaluating FFs, but new gas phase data is still needed for compounds containing phosphorus and sulfur in different valence states. In addition, more nonequilibrium interaction energies and forces, as well as molecular properties such as electrostatic potentials around compounds, would be beneficial. For the condensed phases there is a large body of experimental data available, and tools to utilize these data in an automated fashion are under development. If FF developers, as well as researchers in artificial intelligence, would adopt a number of these data sets, it would become easier to compare the relative strengths and weaknesses of different models and to, eventually, restore the balance in the force.
Affiliation(s)
- Kristian Kříž
- Department of Cell and Molecular Biology, Uppsala University, Box 596, SE-751 24 Uppsala, Sweden
- Lisa Schmidt
- Faculty of Biosciences, University of Heidelberg, 69117 Heidelberg, Germany
- Alfred T. Andersson
- Department of Cell and Molecular Biology, Uppsala University, Box 596, SE-751 24 Uppsala, Sweden
- Marie-Madeleine Walz
- Department of Cell and Molecular Biology, Uppsala University, Box 596, SE-751 24 Uppsala, Sweden
- David van der Spoel
- Department of Cell and Molecular Biology, Uppsala University, Box 596, SE-751 24 Uppsala, Sweden
15
Yin C, Song Z, Tian H, Palzkill T, Tao P. Unveiling the structural features that regulate carbapenem deacylation in KPC-2 through QM/MM and interpretable machine learning. Phys Chem Chem Phys 2023; 25:1349-1362. [PMID: 36537692 PMCID: PMC11162551 DOI: 10.1039/d2cp03724f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]
Abstract
Resistance to carbapenem β-lactams presents major clinical and economic challenges for the treatment of pathogen infections. The fast hydrolysis of carbapenems by carbapenemase-producing bacterial strains enables the effective deactivation of carbapenem antibiotics. In this study, we aim to unravel the structural features that distinguish the notable deacylation activity of carbapenemases. The deacylation reactions between imipenem (IPM) and the KPC-2 class A serine-based β-lactamases (ASβLs) are modeled with combined quantum mechanical/molecular mechanical (QM/MM) minimum energy pathway (MEP) calculations and interpretable machine-learning (ML) methods. We first applied a dual-level computational protocol to achieve fast sampling of QM/MM MEPs. A tree-based ensemble ML model was employed to learn the MEP activation barriers from the conformational features of the KPC-2/IPM active site. The barrier-predicting model was then unboxed using the Shapley additive explanation (SHAP) importance attribution methods to derive mechanistic insights, which were also verified by additional QM/MM wavefunction analysis. Essentially, we show that potential hydrogen bonding interactions of the general base and the tautomerization states of the carbapenem pyrroline ring could concertedly regulate the activation barrier of KPC-2/IPM deacylation. Overall, we demonstrate the efficacy of interpretable ML to assist the analysis of QM/MM simulation data for robust extraction of human-interpretable mechanistic insights.
Affiliation(s)
- Chao Yin
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas, 75205, USA.
- Zilin Song
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas, 75205, USA.
- Hao Tian
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas, 75205, USA.
- Timothy Palzkill
- Department of Pharmacology and Chemical Biology, Baylor College of Medicine, Houston, Texas, 77030, USA
- Peng Tao
- Department of Chemistry, Center for Research Computing, Center for Drug Discovery, Design, and Delivery (CD4), Southern Methodist University, Dallas, Texas, 75205, USA.
16
Combining machine-learning and molecular-modeling methods for drug-target affinity predictions. WIREs Comput Mol Sci 2022. [DOI: 10.1002/wcms.1653] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/28/2022]
17
Wang Y, Fass J, Kaminow B, Herr JE, Rufa D, Zhang I, Pulido I, Henry M, Bruce Macdonald HE, Takaba K, Chodera JD. End-to-end differentiable construction of molecular mechanics force fields. Chem Sci 2022; 13:12016-12033. [PMID: 36349096 PMCID: PMC9600499 DOI: 10.1039/d2sc02739a] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 05/16/2022] [Accepted: 09/05/2022] [Indexed: 01/07/2023]
Abstract
Molecular mechanics (MM) potentials have long been a workhorse of computational chemistry. Leveraging accuracy and speed, these functional forms find use in a wide variety of applications in biomolecular modeling and drug discovery, from rapid virtual screening to detailed free energy calculations. Traditionally, MM potentials have relied on human-curated, inflexible, and poorly extensible discrete chemical perception rules (atom types) for applying parameters to small molecules or biopolymers, making it difficult to optimize both types and parameters to fit quantum chemical or physical property data. Here, we propose an alternative approach that uses graph neural networks to perceive chemical environments, producing continuous atom embeddings from which valence and nonbonded parameters can be predicted using invariance-preserving layers. Since all stages are built from smooth neural functions, the entire process, spanning chemical perception to parameter assignment, is modular and end-to-end differentiable with respect to model parameters, allowing new force fields to be easily constructed, extended, and applied to arbitrary molecules. We show that this approach is not only sufficiently expressive to reproduce legacy atom types, but that it can learn to accurately reproduce and extend existing molecular mechanics force fields. Trained with arbitrary loss functions, it can construct entirely new force fields self-consistently applicable to both biopolymers and small molecules directly from quantum chemical calculations, with superior fidelity than traditional atom or parameter typing schemes. When adapted to simultaneously fit partial charge models, espaloma delivers high-quality partial atomic charges orders of magnitude faster than current best-practices with low error. When trained on the same quantum chemical small molecule dataset used to parameterize the Open Force Field ("Parsley") openff-1.2.0 small molecule force field augmented with a peptide dataset, the resulting espaloma model shows superior accuracy vis-à-vis experiments in relative alchemical free energy calculations for a popular benchmark. This approach is implemented in the free and open source package espaloma, available at https://github.com/choderalab/espaloma.
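The abstract does not give the functional form of the charge model. One standard way to make per-atom network outputs respect the total molecular charge, used here as an assumed illustration rather than a statement of espaloma's exact implementation, is a QEq-like scheme: predict an electronegativity eᵢ and a hardness sᵢ > 0 per atom, then choose charges that minimise Σᵢ (eᵢ qᵢ + ½ sᵢ qᵢ²) subject to Σᵢ qᵢ = Q. Setting the Lagrangian derivative to zero gives qᵢ = (λ − eᵢ)/sᵢ with λ = (Q + Σⱼ eⱼ/sⱼ) / Σⱼ 1/sⱼ. The helper `qeq_charges` and the input numbers are hypothetical.

```python
def qeq_charges(electronegativity, hardness, total_charge=0.0):
    """Closed-form charges minimising sum(e_i*q_i + 0.5*s_i*q_i**2) s.t. sum(q_i) = Q."""
    if len(electronegativity) != len(hardness):
        raise ValueError("need one electronegativity and one hardness per atom")
    if any(s <= 0 for s in hardness):
        raise ValueError("hardness must be positive for a unique minimum")
    inv_s_sum = sum(1.0 / s for s in hardness)
    e_over_s_sum = sum(e / s for e, s in zip(electronegativity, hardness))
    # Lagrange multiplier that enforces the total-charge constraint
    lam = (total_charge + e_over_s_sum) / inv_s_sum
    return [(lam - e) / s for e, s in zip(electronegativity, hardness)]


# Made-up per-atom parameters standing in for network outputs (three atoms)
e = [0.8, -0.4, -0.4]
s = [1.0, 1.0, 1.0]
q = qeq_charges(e, s, total_charge=0.0)
print(q)  # [-0.8, 0.4, 0.4]
```

Because the constraint is enforced analytically, the charges sum to the requested total for any predicted parameters, provided every hardness is positive.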
Affiliation(s)
- Yuanqing Wang
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Physiology, Biophysics and Systems Biology PhD Program, Weill Cornell Medical College, Cornell University, New York, NY 10065, USA; MFA Program in Creative Writing, Division of Humanities and Arts, City College of New York, City University of New York, New York, NY 10031, USA
- Josh Fass
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, Cornell University, New York, NY 10065, USA
- Benjamin Kaminow
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, Cornell University, New York, NY 10065, USA
- John E. Herr
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Dominic Rufa
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Tri-Institutional PhD Program in Chemical Biology, Weill Cornell Medical College, Cornell University, New York, NY 10065, USA
- Ivy Zhang
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medical College, Cornell University, New York, NY 10065, USA
- Iván Pulido
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Mike Henry
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Hannah E. Bruce Macdonald
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Kenichiro Takaba
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Pharmaceutical Research Center, Advanced Drug Discovery, Asahi Kasei Pharma Corporation, Shizuoka 410-2321, Japan
- John D. Chodera
- Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
18
Zhao L, Pu M, Wang H, Ma X, Zhang YJ. Modified Electrostatic Complementary Score Function and Its Application Boundary Exploration in Drug Design. J Chem Inf Model 2022; 62:4420-4426. [PMID: 36069259 DOI: 10.1021/acs.jcim.2c00616] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
In recent years, machine learning (ML) models have been found to quickly predict various molecular properties with accuracy comparable to high-level quantum chemistry methods. One such example is the calculation of electrostatic potential (ESP). Different ESP prediction ML models were proposed to generate surface molecular charge distribution. Electrostatic complementarity (EC) can apply ESP data to quantify the complementarity between a ligand and its binding pocket, leading to the potential to increase the efficiency of drug design. However, there is not much research discussing EC score functions and their applicability domain. We propose a new EC score function modified from the one originally developed by Bauer and Mackey, and confirm its effectiveness against the available Pearson's R correlation coefficient. Additionally, the applicability domain of the EC score and two indices used to define the EC score application scope will be discussed.
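The abstract does not restate the original score, and the modified function proposed here is not specified. A frequently described form of the Bauer and Mackey score, taken here as an assumption, is the negative Pearson correlation between the ligand ESP and the protein ESP sampled at the same points on the ligand surface, so that perfectly opposite potentials score +1. The helpers `pearson_r` and `ec_score` and the ESP values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("constant input has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def ec_score(ligand_esp, protein_esp):
    """Assumed EC form: negative Pearson r of ligand vs. protein ESP at shared surface points."""
    return -pearson_r(ligand_esp, protein_esp)


# Hypothetical ESP values (kcal/mol/e) at five points on the ligand surface
ligand = [-20.0, -5.0, 0.0, 10.0, 25.0]
protein = [18.0, 6.0, 1.0, -9.0, -22.0]
print(round(ec_score(ligand, protein), 3))  # 0.999
```

Because Pearson's r is invariant to scaling either potential, a score of this form measures how well the two potentials co-vary in sign and shape, not their magnitude.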
Affiliation(s)
- Liming Zhao
- Beijing StoneWise Technology Co Ltd., Haidian Street #15, Haidian District, Beijing 100080, China
- Mengchen Pu
- Beijing StoneWise Technology Co Ltd., Haidian Street #15, Haidian District, Beijing 100080, China
- Huting Wang
- Beijing StoneWise Technology Co Ltd., Haidian Street #15, Haidian District, Beijing 100080, China
- Xiangyu Ma
- Beijing StoneWise Technology Co Ltd., Haidian Street #15, Haidian District, Beijing 100080, China
- Yingsheng J Zhang
- Beijing StoneWise Technology Co Ltd., Haidian Street #15, Haidian District, Beijing 100080, China
19
Fedik N, Zubatyuk R, Kulichenko M, Lubbers N, Smith JS, Nebgen B, Messerly R, Li YW, Boldyrev AI, Barros K, Isayev O, Tretiak S. Extending machine learning beyond interatomic potentials for predicting molecular properties. Nat Rev Chem 2022; 6:653-672. [PMID: 37117713 DOI: 10.1038/s41570-022-00416-3] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Accepted: 07/15/2022] [Indexed: 11/09/2022]
Abstract
Machine learning (ML) is becoming a method of choice for modelling complex chemical processes and materials. ML provides a surrogate model trained on a reference dataset that can be used to establish a relationship between a molecular structure and its chemical properties. This Review highlights developments in the use of ML to evaluate chemical properties such as partial atomic charges, dipole moments, spin and electron densities, and chemical bonding, as well as to obtain a reduced quantum-mechanical description. We overview several modern neural network architectures, their predictive capabilities, generality and transferability, and illustrate their applicability to various chemical properties. We emphasize that learned molecular representations resemble quantum-mechanical analogues, demonstrating the ability of the models to capture the underlying physics. We also discuss how ML models can describe non-local quantum effects. Finally, we conclude by compiling a list of available ML toolboxes, summarizing the unresolved challenges and presenting an outlook for future development. The observed trends demonstrate that this field is evolving towards physics-based models augmented by ML, which is accompanied by the development of new methods and the rapid growth of user-friendly ML frameworks for chemistry.
20
Berryman JT, Taghavi A, Mazur F, Tkatchenko A. Quantum machine learning corrects classical forcefields: Stretching DNA base pairs in explicit solvent. J Chem Phys 2022; 157:064107. [PMID: 35963717 DOI: 10.1063/5.0094727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022]
Abstract
In order to improve the accuracy of molecular dynamics simulations, classical forcefields are supplemented with a kernel-based machine learning method trained on quantum-mechanical fragment energies. As an example application, a potential-energy surface is generalized for a small DNA duplex, taking into account explicit solvation and long-range electron exchange-correlation effects. A long-standing problem in molecular science is that experimental studies of the structural and thermodynamic behavior of DNA under tension are not well confirmed by simulation; study of the potential energy vs extension taking into account a novel correction shows that leading classical DNA models have excessive stiffness with respect to stretching. This discrepancy is found to be common across multiple forcefields. The quantum correction is in qualitative agreement with the experimental thermodynamics for larger DNA double helices, providing a candidate explanation for the general and long-standing discrepancy between single molecule stretching experiments and classical calculations of DNA stretching. The new dataset of quantum calculations should facilitate multiple types of nucleic acid simulation, and the associated Kernel Modified Molecular Dynamics method (KMMD) is applicable to biomolecular simulations in general. KMMD is made available as part of the AMBER22 simulation software.
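The kernel, descriptors, and training details of KMMD are not given in the abstract. As a generic illustration of the underlying idea (learning a correction ΔE = E_QM − E_FF by kernel regression and adding it to the classical energy), the sketch below uses plain Gaussian-kernel ridge regression in pure Python. It is not the AMBER22 implementation; the one-dimensional descriptor, the training values, and the helpers `fit_krr` and `solve_linear` are all hypothetical.

```python
import math


def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel between two descriptor vectors."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_krr(X, y, sigma=1.0, lam=1e-6):
    """Kernel ridge regression: solve (K + lam*I) alpha = y and return a predictor."""
    n = len(X)
    K = [[gaussian_kernel(X[i], X[j], sigma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve_linear(K, y)

    def predict(x):
        return sum(a * gaussian_kernel(x, xi, sigma) for a, xi in zip(alpha, X))

    return predict


# Hypothetical training set: one descriptor per fragment (e.g. an extension
# coordinate in Angstrom) and the QM-minus-force-field energy in kcal/mol.
X_train = [[2.8], [3.0], [3.2], [3.4]]
delta_train = [0.9, 0.2, -0.3, -0.6]
correction = fit_krr(X_train, delta_train, sigma=0.3, lam=1e-8)

e_ff = 5.0  # hypothetical classical energy of a new configuration, kcal/mol
e_corrected = e_ff + correction([3.1])  # force-field energy plus learned correction
print(f"corrected energy: {e_corrected:.3f} kcal/mol")
```

With the tiny regulariser `lam`, predictions at the training points reproduce the training corrections almost exactly; a larger value trades that fidelity for smoother interpolation between fragments.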
Affiliation(s)
- Joshua T Berryman
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Amirhossein Taghavi
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Florian Mazur
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
- Alexandre Tkatchenko
- Department of Physics and Materials Science, University of Luxembourg, L-1511 Luxembourg City, Luxembourg
21
When machine learning meets molecular synthesis. Trends Chem 2022. [DOI: 10.1016/j.trechm.2022.07.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
22
Ringrose C, Horton JT, Wang LP, Cole DJ. Exploration and validation of force field design protocols through QM-to-MM mapping. Phys Chem Chem Phys 2022; 24:17014-17027. [PMID: 35792069 DOI: 10.1039/d2cp02864f] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/12/2022]
Abstract
The scale of the parameter optimisation problem in traditional molecular mechanics force field construction means that designing a new force field is a long process, and sub-optimal choices made in the early stages can persist for many generations. We hypothesise that careful use of quantum mechanics to inform molecular mechanics parameter derivation (QM-to-MM mapping) can significantly reduce the number of parameters that require fitting to experiment and increase the pace of force field development. Here, we design and train a collection of 15 new protocols for small, organic molecule force field derivation, and test their accuracy against experimental liquid properties. Our best performing model has only seven fitting parameters, yet achieves mean unsigned errors of just 0.031 g cm⁻³ and 0.69 kcal mol⁻¹ in liquid densities and heats of vaporisation, compared to experiment. The software required to derive the designed force fields is freely available at https://github.com/qubekit/QUBEKit.
Affiliation(s)
- Chris Ringrose
- School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, UK.
- Joshua T Horton
- School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, UK.
- Lee-Ping Wang
- Department of Chemistry, The University of California at Davis, Davis, California 95616, USA
- Daniel J Cole
- School of Natural and Environmental Sciences, Newcastle University, Newcastle upon Tyne NE1 7RU, UK.
23
Deep learning of dynamically responsive chemical Hamiltonians with semiempirical quantum mechanics. Proc Natl Acad Sci U S A 2022; 119:e2120333119. [PMID: 35776544 PMCID: PMC9271210 DOI: 10.1073/pnas.2120333119] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Indexed: 11/22/2022]
Abstract
Machine learning is revolutionizing computational chemistry by greatly reducing the computational difficulty of many simulations performed by computational chemists while maintaining accuracies of 1 kcal/mol or better. A major challenge in this field is addressing the poor extensibility and transferability of conventional machine-learning (ML) models, which result in degraded accuracy when applying these models to large or new chemical systems. To build a more general and interpretable model, we incorporate a quantum chemistry framework into the deep neural network, resulting in an interpretable Hamiltonian-based model with markedly high training efficiency. We validate this method on multiple large biochemical molecules by predicting various properties with consistently high accuracies, indicating the model is both extensible and transferable.
Conventional machine-learning (ML) models in computational chemistry learn to directly predict molecular properties using quantum chemistry only for reference data. While these heuristic ML methods show quantum-level accuracy with speeds several orders of magnitude faster than traditional quantum chemistry methods, they suffer from poor extensibility and transferability; i.e., their accuracy degrades on large or new chemical systems. Incorporating quantum chemistry frameworks into the ML models directly solves this problem. Here we take the structure of semiempirical quantum mechanics (SEQM) methods to construct dynamically responsive Hamiltonians. SEQM methods use empirical parameters fitted to experimental properties to construct reduced-order Hamiltonians, facilitating much faster calculations than ab initio methods but with compromised accuracy. By replacing these static parameters with machine-learned dynamic values inferred from the local environment, we greatly improve the accuracy of the SEQM methods. Trained on molecular energies and atomic forces, these dynamically generated Hamiltonian parameters show a strong correlation with atomic hybridization and bonding. Trained with only about 60,000 small organic molecular conformers, the resulting model retains interpretability, extensibility, and transferability when testing on much larger chemical systems and predicting various molecular properties. Overall, this work demonstrates the virtues of incorporating physics-based descriptions with ML to develop models that are simultaneously accurate, transferable, and interpretable.
24
Hu X, Lenz-Himmer MO, Baldauf C. Better force fields start with better data: A data set of cation dipeptide interactions. Sci Data 2022; 9:327. [PMID: 35715420 PMCID: PMC9205945 DOI: 10.1038/s41597-022-01297-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 08/15/2021] [Accepted: 03/18/2022] [Indexed: 11/08/2022]
Abstract
We present a data set from a first-principles study of amino-methylated and acetylated (capped) dipeptides of the 20 proteinogenic amino acids, including alternative possible side chain protonation states, and their interactions with selected divalent cations (Ca²⁺, Mg²⁺, and Ba²⁺). The data covers 21,909 stationary points on the respective potential-energy surfaces in a wide relative energy range of up to 4 eV (390 kJ/mol). Relevant properties of interest, like partial charges, were derived for the conformers. The motivation was to provide a solid data basis for force field parameterization and further applications like machine learning or benchmarking. In particular the process of creating all this data on the same first-principles footing, i.e. density-functional theory calculations employing the generalized gradient approximation with a van der Waals correction, makes this data suitable for first principles data-driven force field development. To make the data accessible across domain borders and to machines, we formalized the metadata in an ontology.
Affiliation(s)
- Xiaojuan Hu
- Fritz-Haber-Institut der Max-Planck-Gesellschaft, Faradayweg 4-6, 14195, Berlin, Germany.
- Carsten Baldauf
- Fritz-Haber-Institut der Max-Planck-Gesellschaft, Faradayweg 4-6, 14195, Berlin, Germany.
25
Song Z, Trozzi F, Tian H, Yin C, Tao P. Mechanistic Insights into Enzyme Catalysis from Explaining Machine-Learned Quantum Mechanical and Molecular Mechanical Minimum Energy Pathways. ACS Phys Chem Au 2022; 2:316-330. [PMID: 35936506 PMCID: PMC9344433 DOI: 10.1021/acsphyschemau.2c00005] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/20/2023]
Abstract
With the increasing popularity of machine learning (ML) applications, the demand for explainable artificial intelligence techniques to explain ML models developed for computational chemistry has also emerged. In this study, we present the development of the Boltzmann-weighted cumulative integrated gradients (BCIG) approach for effective explanation of mechanistic insights into ML models trained on high-level quantum mechanical and molecular mechanical (QM/MM) minimum energy pathways. Using the acylation reactions of the Toho-1 β-lactamase and two antibiotics (ampicillin and cefalexin) as the model systems, we show that the BCIG approach could quantitatively attribute the energetic contribution in one system and the relative reactivity of individual steps across different systems to specific chemical processes such as the bond making/breaking and proton transfers. The proposed BCIG contribution attribution method quantifies chemistry-interpretable insights in terms of contributions from each elementary chemical process, which is in agreement with the validating QM/MM calculations and our intuitive mechanistic understandings of the model reactions.
26
Cons BD, Twigg DG, Kumar R, Chessari G. Electrostatic Complementarity in Structure-Based Drug Design. J Med Chem 2022; 65:7476-7488. [PMID: 35512344 DOI: 10.1021/acs.jmedchem.2c00164] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
Abstract
Optimization of electrostatic complementarity is an important strategy in structure-based drug discovery for improving the affinity of molecules against a specific protein target. In this Miniperspective we identify examples where deliberate optimization of protein-ligand electrostatic complementarity or intramolecular electrostatic interactions gave improvements in target affinity (up to 250-fold), physicochemical properties, in vitro properties, and off-target selectivity. We also look retrospectively at a series of factor Xa inhibitors that show an almost 8000-fold range in potency that can be correlated with the calculated electrostatic potential (ESP) surfaces. Recent developments using a graph-convolutional deep neural network to rapidly generate high quality ESP surfaces have the potential to make this useful tool more accessible for a wider audience within the field of medicinal chemistry.
Affiliation(s)
- Benjamin D Cons
- Astex Pharmaceuticals, 436 Cambridge Science Park, Cambridge CB4 0QA, U.K.
- David G Twigg
- Astex Pharmaceuticals, 436 Cambridge Science Park, Cambridge CB4 0QA, U.K.
- Rajendra Kumar
- Astex Pharmaceuticals, 436 Cambridge Science Park, Cambridge CB4 0QA, U.K.
- Gianni Chessari
- Astex Pharmaceuticals, 436 Cambridge Science Park, Cambridge CB4 0QA, U.K.
27
Sripattaraphan A, Sanachai K, Chavasiri W, Boonyasuppayakorn S, Maitarad P, Rungrotmongkol T. Computational Screening of Newly Designed Compounds against Coxsackievirus A16 and Enterovirus A71. Molecules 2022; 27:1908. [PMID: 35335272 PMCID: PMC8955072 DOI: 10.3390/molecules27061908] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/31/2021] [Revised: 02/27/2022] [Accepted: 03/10/2022] [Indexed: 12/20/2022]
Abstract
Outbreaks of hand, foot, and mouth disease (HFMD) that occur worldwide are mainly caused by Coxsackievirus A16 (CV-A16) and Enterovirus A71 (EV-A71). Unfortunately, neither an anti-HFMD drug nor a vaccine is currently available. Rupintrivir, a phase II clinical trial candidate for rhinovirus, showed highly potent antiviral activity against enteroviruses as an inhibitor of 3C protease (3Cpro). In the present study, we focused on designing 50 novel rupintrivir analogs against CV-A16 and EV-A71 3Cpro using computational tools. Based on their predicted binding affinities, five compounds with functional group modifications at the P1′, P2, P3, and P4 sites, namely P1′-1, P2-m3, P3-4, P4-5, and P4-19, could bind both CV-A16 and EV-A71 3Cpro more strongly than rupintrivir. Subsequently, these five analogs were studied by 500 ns molecular dynamics simulations. Among them, P2-m3, the derivative with a meta-aminomethyl-benzyl group at the P2 site, showed the greatest potential to interact with the 3Cpro target, forming the highest number of intermolecular hydrogen bonds and contact atoms. It formed stronger hydrogen bonds with residues L127 and K130 at the P2 site than rupintrivir did, supported by significantly lower MM/PB(GB)SA binding free energies. The rupintrivir analogs designed in this study provide a basis for developing candidate compounds for HFMD treatment.
Affiliation(s)
- Amita Sripattaraphan
- Structural and Computational Biology Research Unit, Department of Biochemistry, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand
- Kamonpan Sanachai
- Structural and Computational Biology Research Unit, Department of Biochemistry, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand
- Warinthorn Chavasiri
- Department of Chemistry, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand
- Siwaporn Boonyasuppayakorn
- Applied Medical Virology Research Unit, Department of Microbiology, Faculty of Medicine, Chulalongkorn University, Bangkok 10330, Thailand
- Phornphimon Maitarad
- Research Center of Nano Science and Technology, Shanghai University, Shanghai 200444, China
- Thanyada Rungrotmongkol
- Structural and Computational Biology Research Unit, Department of Biochemistry, Faculty of Science, Chulalongkorn University, Bangkok 10330, Thailand
- Ph.D. Program in Bioinformatics and Computational Biology, Graduate School, Chulalongkorn University, Bangkok 10330, Thailand
28
Kumar A, Pandey P, Chatterjee P, MacKerell AD. Deep Neural Network Model to Predict the Electrostatic Parameters in the Polarizable Classical Drude Oscillator Force Field. J Chem Theory Comput 2022; 18:1711-1725. [PMID: 35148088 PMCID: PMC8904317 DOI: 10.1021/acs.jctc.1c01166] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 02/08/2023]
Abstract
The Drude polarizable force field (FF) captures electronic polarization effects via auxiliary Drude particles attached to non-hydrogen atoms, distinguishing it from commonly used additive FFs that rely on fixed charges. The Drude FF currently includes parameters for biomolecules such as proteins, nucleic acids, lipids, and carbohydrates, for small molecules representative of those classes, and for a range of atomic ions. Extending the Drude FF to novel small druglike molecules is challenging, as it requires the assignment of partial charges, atomic polarizabilities, and Thole scale factors. In the present article, deep neural network (DNN) models are trained on quantum mechanical (QM)-based partial charges and atomic polarizabilities, along with Thole scale factors trained to target QM molecular dipole moments and polarizabilities. The DNN models were trained on a collection of 39 421 molecules with molecular weights up to 200 Da containing H, C, N, O, P, S, F, Cl, Br, or I atoms. As its feature vector, the DNN model uses bond connectivity, including 1,2, 1,3, 1,4, and 1,5 terms, and distances of Drude FF atom types, allowing it to capture both local and nonlocal effects in the molecules. Novel methods have been developed to determine restrained electrostatic potential (RESP) charges on atoms and on external points representing lone pairs, and to determine Thole scale factors, which have no QM analogue. A penalty scheme is devised as a performance predictor for the trained model. Validation studies show that these DNN models precisely predict the molecular dipole moments and polarizabilities of Food and Drug Administration (FDA)-approved drugs compared with reference MP2 calculations. The availability of the DNN model for rapid estimation of Drude electrostatic parameters will facilitate applying the Drude FF to a wider range of molecular species.
Affiliation(s)
- Anmol Kumar
- School of Pharmacy, University of Maryland, Baltimore, 20 Penn Street, HSFII, Baltimore, Maryland 21201, United States
- Poonam Pandey
- School of Pharmacy, University of Maryland, Baltimore, 20 Penn Street, HSFII, Baltimore, Maryland 21201, United States
- Payal Chatterjee
- School of Pharmacy, University of Maryland, Baltimore, 20 Penn Street, HSFII, Baltimore, Maryland 21201, United States
- Alexander D MacKerell
- School of Pharmacy, University of Maryland, Baltimore, 20 Penn Street, HSFII, Baltimore, Maryland 21201, United States
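The Drude abstract above describes a feature vector built from 1,2, 1,3, 1,4, and 1,5 bond-connectivity terms. As a rough illustration only (this is not the authors' featurization, and the distance-based part is omitted), the Python sketch below counts, for one atom, the atom types found one to four bonds away using breadth-first search. The adjacency list and type labels are hypothetical placeholders, not real Drude FF atom types.

```python
from collections import Counter, deque


def topological_distances(adjacency, start):
    """Breadth-first search: shortest bond-count distance from `start` to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nbr in adjacency[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist


def connectivity_features(adjacency, atom_types, atom, max_bonds=4):
    """Count atom types 1..max_bonds bonds away (the 1,2 up to 1,5 relationships)."""
    features = {k: Counter() for k in range(1, max_bonds + 1)}
    for other, d in topological_distances(adjacency, atom).items():
        if 1 <= d <= max_bonds:
            features[d][atom_types[other]] += 1
    return features


def to_vector(features, vocabulary):
    """Flatten the per-distance counts into a fixed-length list ordered by (distance, type)."""
    return [features[d][t] for d in sorted(features) for t in vocabulary]


# Hypothetical example: heavy atoms of ethanol, C0-C1-O2, with placeholder type labels.
adjacency = {0: [1], 1: [0, 2], 2: [1]}
atom_types = {0: "C_sp3", 1: "C_sp3", 2: "O_hydroxyl"}
feats = connectivity_features(adjacency, atom_types, 0)
vector = to_vector(feats, ["C_sp3", "O_hydroxyl"])
```

Breadth-first search keeps only the shortest bond path, so in a ring a pair that is both 1,3 and 1,4 is counted once at the shorter separation; a production featurization may treat such pairs differently.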
29
Gokcan H, Isayev O. Learning molecular potentials with neural networks. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2022. [DOI: 10.1002/wcms.1564] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/09/2022]
Affiliation(s)
- Hatice Gokcan
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Olexandr Isayev
- Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
30
Rai BK, Sresht V, Yang Q, Unwalla R, Tu M, Mathiowetz AM, Bakken GA. TorsionNet: A Deep Neural Network to Rapidly Predict Small-Molecule Torsional Energy Profiles with the Accuracy of Quantum Mechanics. J Chem Inf Model 2022; 62:785-800. [PMID: 35119861 DOI: 10.1021/acs.jcim.1c01346] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Indexed: 11/28/2022]
Abstract
Fast and accurate assessment of small-molecule dihedral energetics is crucial for molecular design and optimization in medicinal chemistry. Yet, accurate prediction of torsion energy profiles remains challenging as the current molecular mechanics (MM) methods are limited by insufficient coverage of drug-like chemical space and accurate quantum mechanical (QM) methods are too expensive. To address this limitation, we introduce TorsionNet, a deep neural network (DNN) model specifically developed to predict small-molecule torsion energy profiles with QM-level accuracy. We applied active learning to identify nearly 50k fragments (with elements H, C, N, O, F, S, and Cl) that maximized the coverage of our corporate compound library and leveraged massively parallel cloud computing resources for density functional theory (DFT) torsion scans of these fragments, generating a training data set of 1.2 million DFT energies. After training TorsionNet on this data set, we obtain a model that can rapidly predict the torsion energy profile of typical drug-like fragments with DFT-level accuracy. Importantly, our method also provides an uncertainty estimate for the predicted profiles without any additional calculations. In this report, we show that TorsionNet can accurately identify the preferred dihedral geometries observed in crystal structures. Our TorsionNet-based analysis of a diverse set of protein-ligand complexes with measured binding affinity shows a strong association between high ligand strain and low potency. We also present practical applications of TorsionNet that demonstrate how consideration of DNN-based strain energy leads to substantial improvement in existing lead discovery and design workflows. TorsionNet500, a benchmark data set comprising 500 chemically diverse fragments with DFT torsion profiles (12k MM- and DFT-optimized geometries and energies), has been created and is made publicly available.
Affiliation(s)
- Brajesh K Rai
- Simulation and Modeling Sciences, Pfizer Worldwide Research Development and Medical, 610 Main Street, Cambridge, Massachusetts 02139, United States
- Vishnu Sresht
- Simulation and Modeling Sciences, Pfizer Worldwide Research Development and Medical, 610 Main Street, Cambridge, Massachusetts 02139, United States
- Qingyi Yang
- Medicine Design, Pfizer Worldwide Research Development and Medical, 610 Main Street, Cambridge, Massachusetts 02139, United States
- Ray Unwalla
- Medicine Design, Pfizer Worldwide Research Development and Medical, 610 Main Street, Cambridge, Massachusetts 02139, United States
- Meihua Tu
- Medicine Design, Pfizer Worldwide Research Development and Medical, 610 Main Street, Cambridge, Massachusetts 02139, United States
- Alan M Mathiowetz
- Medicine Design, Pfizer Worldwide Research Development and Medical, 610 Main Street, Cambridge, Massachusetts 02139, United States
- Gregory A Bakken
- Digital, Pfizer, Eastern Point Road, Groton, Connecticut 06340, United States
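The TorsionNet abstract above links ligand strain to potency; at its core, such an analysis reads the energy of an observed dihedral off a torsion profile and compares it with the profile minimum. A minimal sketch of that bookkeeping follows, assuming a profile sampled on a regular grid and linear periodic interpolation (our simplification). The cosine profile is made up and merely stands in for a TorsionNet or DFT profile; the network itself is not reproduced.

```python
import numpy as np


def dihedral_deg(p0, p1, p2, p3):
    """Dihedral angle p0-p1-p2-p3 in degrees (between -180 and 180)."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Project the two outer bond vectors onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def torsion_strain(angle_deg, grid_deg, energies):
    """Energy above the profile minimum at angle_deg, using periodic linear interpolation."""
    energies = np.asarray(energies, dtype=float)
    e_at_angle = np.interp(angle_deg, grid_deg, energies, period=360.0)
    return float(e_at_angle - energies.min())


# Made-up profile E(phi) = 1 + cos(phi) in kcal/mol, sampled every 30 degrees.
grid = np.arange(-180.0, 180.0, 30.0)
profile = 1.0 + np.cos(np.radians(grid))
phi = dihedral_deg((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))  # eclipsed (cis) arrangement
strain = torsion_strain(phi, grid, profile)
```

With these inputs the cis geometry sits 2 kcal/mol above the profile minimum at 180 degrees. Summing per-torsion values into a ligand strain energy, or using a smoother interpolant, would be further assumptions that a real workflow needs to justify.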
31
Thürlemann M, Böselt L, Riniker S. Learning Atomic Multipoles: Prediction of the Electrostatic Potential with Equivariant Graph Neural Networks. J Chem Theory Comput 2022; 18:1701-1710. [PMID: 35112866 DOI: 10.1021/acs.jctc.1c01021] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
Abstract
The accurate description of electrostatic interactions remains a challenging problem for classical potential-energy functions. The commonly used fixed partial-charge approximation fails to reproduce the electrostatic potential at short range due to its insensitivity to conformational changes and anisotropic effects. At the same time, possibly more accurate machine-learned (ML) potentials struggle with the long-range behavior due to their inherent locality ansatz. Employing a multipole expansion offers in principle an exact treatment of the electrostatic potential such that the long-range and short-range electrostatic interactions can be treated simultaneously with high accuracy. However, such an expansion requires the calculation of the electron density using computationally expensive quantum-mechanical (QM) methods. Here, we introduce an equivariant graph neural network (GNN) to address this issue. The proposed model predicts atomic multipoles up to the quadrupole, circumventing the need for expensive QM computations. By using an equivariant architecture, the model enforces the correct symmetry by design without relying on local reference frames. The GNN reproduces the electrostatic potential of various systems with high fidelity. Possible uses for such an approach include the separate treatment of long-range interactions in ML potentials, the analysis of electrostatic potential surfaces, and static multipoles in polarizable force fields.
Affiliation(s)
- Moritz Thürlemann
- Laboratory of Physical Chemistry, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Lennard Böselt
- Laboratory of Physical Chemistry, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Sereina Riniker
- Laboratory of Physical Chemistry, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
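The multipole abstract above uses predicted atomic multipoles to reproduce the electrostatic potential. To illustrate how such multipoles turn into a potential (this is not the GNN itself), the sketch below evaluates the potential from atomic monopoles and dipoles in atomic units. The quadrupole term is omitted for brevity, and all inputs are hypothetical.

```python
import numpy as np


def esp_from_multipoles(points, centers, charges, dipoles):
    """Electrostatic potential (atomic units) at `points` from atomic charges and dipoles.

    points:  (P, 3) evaluation points in bohr
    centers: (N, 3) atomic positions in bohr
    charges: (N,)   atomic monopoles in e
    dipoles: (N, 3) atomic dipoles in e*bohr
    """
    r = points[:, None, :] - centers[None, :, :]            # (P, N, 3) displacement vectors
    dist = np.linalg.norm(r, axis=-1)                        # (P, N) distances
    monopole = charges[None, :] / dist                       # q / |r|
    dipole = np.einsum("pni,ni->pn", r, dipoles) / dist**3   # (mu . r) / |r|^3
    return (monopole + dipole).sum(axis=1)                   # sum over atoms


# Hypothetical check: a unit charge at the origin, seen from 2 bohr away.
v = esp_from_multipoles(np.array([[2.0, 0.0, 0.0]]), np.zeros((1, 3)), np.array([1.0]), np.zeros((1, 3)))
```

A quadrupole contribution follows the same pattern, with a traceless tensor contracted against the displacement and a 1/|r|^5 factor; the numerical prefactor depends on the quadrupole convention used.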
32
Jiang D, Sun H, Wang J, Hsieh CY, Li Y, Wu Z, Cao D, Wu J, Hou T. Out-of-the-box deep learning prediction of quantum-mechanical partial charges by graph representation and transfer learning. Brief Bioinform 2022; 23:6513729. [DOI: 10.1093/bib/bbab597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2021] [Revised: 12/14/2021] [Accepted: 12/23/2021] [Indexed: 11/14/2022]
Abstract
Accurate prediction of atomic partial charges with high-level quantum mechanics (QM) methods suffers from high computational cost. Numerous feature-engineered machine learning (ML)-based predictors with favorable computability and reliability have been developed as alternatives. However, feature engineering of the atomic chemical environment requires extensive expert effort, which may in turn introduce domain bias. In this study, SuperAtomicCharge, a data-driven deep graph learning framework, was proposed to predict three important types of partial charges (i.e. RESP, DDEC4 and DDEC78) derived from high-level QM calculations based on molecular structures. SuperAtomicCharge was designed to exploit the 2D and 3D structural information of molecules simultaneously, which proved to be an effective way to improve the prediction accuracy of the model. Moreover, a simple transfer learning strategy and a multitask learning strategy based on self-supervised descriptors were employed to further improve the prediction accuracy of the proposed model. Compared with the latest baselines, including one GNN-based predictor and two ML-based predictors, SuperAtomicCharge showed better performance on all three external test sets and had better usability and portability. Furthermore, the QM partial charges of new molecules predicted by SuperAtomicCharge can be used efficiently in drug design applications such as structure-based virtual screening, where the predicted RESP and DDEC4 charges of new molecules showed more robust scoring and screening power than commonly used partial charges. Finally, two tools, an online server (http://cadd.zju.edu.cn/deepchargepredictor) and command-line source code (https://github.com/zjujdj/SuperAtomicCharge), were developed for easy access to SuperAtomicCharge.
33
Wang S, Krummenacher K, Landrum GA, Sellers BD, Di Lello P, Robinson SJ, Martin B, Holden JK, Tom JYK, Murthy AC, Popovych N, Riniker S. Incorporating NOE-Derived Distances in Conformer Generation of Cyclic Peptides with Distance Geometry. J Chem Inf Model 2022; 62:472-485. [PMID: 35029985 DOI: 10.1021/acs.jcim.1c01165] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/29/2022]
Abstract
Nuclear magnetic resonance (NMR) data from NOESY (nuclear Overhauser enhancement spectroscopy) and ROESY (rotating frame Overhauser enhancement spectroscopy) experiments can easily be combined with distance geometry (DG) based conformer generators by modifying the molecular distance bounds matrix. In this work, we extend ETKDG, a modern DG-based conformer generator that has been shown to reproduce experimental crystal structures well for systems ranging from small molecules to large macrocycles, to include NOE-derived interproton distances. In noeETKDG, the experimentally derived interproton distances are incorporated into the distance bounds matrix as loose upper (or lower) bounds to generate large conformer sets. Various subselection techniques can subsequently be applied to yield a conformer bundle that best reproduces the NOE data. The approach is benchmarked using a set of 24 (mostly) cyclic peptides for which NOE-derived distances as well as reference solution structures obtained by other software are available. Compared with other currently available packages, the advantages of noeETKDG are its speed and the fact that no prior force-field parametrization is required, which is especially useful for peptides with unnatural amino acids. The resulting conformer bundles can be further processed with structural refinement techniques to improve the modeling of intramolecular nonbonded interactions. The noeETKDG code is released as a fully open-source software package available at www.github.com/rinikerlab/customETKDG.
Affiliation(s)
- Shuzhe Wang
- Laboratory of Physical Chemistry, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Kajo Krummenacher
- Laboratory of Physical Chemistry, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Gregory A Landrum
- Laboratory of Physical Chemistry, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Benjamin D Sellers
- Department of Discovery Chemistry, Genentech, Inc., 1 DNA Way, South San Francisco, California 94080, United States
- Paola Di Lello
- Department of Structural Biology, Genentech, Inc., 1 DNA Way, South San Francisco, California 94080, United States
- Sarah J Robinson
- Department of Discovery Chemistry, Genentech, Inc., 1 DNA Way, South San Francisco, California 94080, United States
- Bryan Martin
- Department of Structural Biology, Genentech, Inc., 1 DNA Way, South San Francisco, California 94080, United States
- Jeffrey K Holden
- Department of Early Discovery Biochemistry, Genentech, Inc., South San Francisco, California 94080, United States
- Jeffrey Y K Tom
- Department of Early Discovery Biochemistry, Genentech, Inc., South San Francisco, California 94080, United States
- Anastasia C Murthy
- Department of Early Discovery Biochemistry, Genentech, Inc., South San Francisco, California 94080, United States
- Nataliya Popovych
- Department of Early Discovery Biochemistry, Genentech, Inc., South San Francisco, California 94080, United States
- Sereina Riniker
- Laboratory of Physical Chemistry, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
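The central step in the noeETKDG abstract above is entering NOE-derived interproton distances into the distance bounds matrix as loose bounds. A minimal NumPy sketch of that step follows. It assumes a bounds-matrix layout in which the upper triangle stores upper bounds and the lower triangle stores lower bounds, a hypothetical `slack` tolerance, and a simple triangle-inequality smoothing pass. It is not the customETKDG code; a real workflow would take the initial bounds from the conformer generator.

```python
import numpy as np


def add_noe_upper_bounds(bounds, noe_distances, slack=0.5):
    """Tighten upper bounds with NOE distances (Å) plus a loose tolerance `slack` (hypothetical)."""
    b = np.array(bounds, dtype=float)  # copy; the caller's matrix is left untouched
    for (i, j), d in noe_distances.items():
        i, j = min(i, j), max(i, j)    # upper bounds live in the upper triangle
        b[i, j] = min(b[i, j], d + slack)
    return b


def triangle_smooth(bounds):
    """Propagate the triangle inequality through a bounds matrix; raise if it becomes inconsistent."""
    n = bounds.shape[0]
    upper = np.triu(bounds, 1)
    upper = upper + upper.T            # symmetric upper bounds, zero diagonal
    lower = np.tril(bounds, -1)
    lower = lower + lower.T            # symmetric lower bounds, zero diagonal
    for k in range(n):
        # u_ij <= u_ik + u_kj
        upper = np.minimum(upper, upper[:, [k]] + upper[[k], :])
        # l_ij >= l_ik - u_kj  and  l_ij >= l_kj - u_ik
        lower = np.maximum(lower, lower[:, [k]] - upper[[k], :])
        lower = np.maximum(lower, lower[[k], :] - upper[:, [k]])
    np.fill_diagonal(lower, 0.0)
    if np.any(lower > upper + 1e-9):
        raise ValueError("bounds are inconsistent with the triangle inequality")
    return np.triu(upper, 1) + np.tril(lower, -1)
```

Applying the NOE bound before smoothing lets the tightened distance propagate to neighbouring pairs, and the consistency check catches restraint sets that no 3D structure could satisfy.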
34
Velez C, Acevedo O. Simulation of deep eutectic solvents: Progress to promises. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2022. [DOI: 10.1002/wcms.1598] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 12/16/2022]
Affiliation(s)
- Caroline Velez
- Department of Chemistry, University of Miami, Coral Gables, Florida, USA
- Orlando Acevedo
- Department of Chemistry, University of Miami, Coral Gables, Florida, USA
35
Gallegos M, Guevara-Vela JM, Pendás ÁM. NNAIMQ: A neural network model for predicting QTAIM charges. J Chem Phys 2022; 156:014112. [PMID: 34998318 DOI: 10.1063/5.0076896] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/15/2022]
Abstract
Atomic charges provide crucial information about the electronic structure of a molecular system. Among the different definitions of these descriptors, the one proposed by the Quantum Theory of Atoms in Molecules (QTAIM) is particularly attractive given its invariance against orbital transformations, although the computational cost of calculating these charges limits its applicability. Given that machine learning (ML) techniques have been shown to accelerate the computation of a number of quantum mechanical observables by orders of magnitude, in this work we take advantage of ML to develop an intuitive and fast neural network model (NNAIMQ) for computing QTAIM charges of C, H, O, and N atoms with high accuracy. Our model has been trained and tested using data from quantum chemical calculations on more than 45 000 molecular environments of the near-equilibrium CHON chemical space. The reliability and performance of NNAIMQ have been analyzed in a variety of scenarios, from equilibrium geometries to molecular dynamics simulations. Altogether, NNAIMQ yields remarkably small prediction errors, generally well below 0.03 electrons, while accelerating the calculation of QTAIM charges by several orders of magnitude.
Affiliation(s)
- Miguel Gallegos
- Depto. Química Física y Analítica, Universidad de Oviedo, 33006 Oviedo, Spain
- José Manuel Guevara-Vela
- Institute of Chemistry, National Autonomous University of Mexico, Circuito Exterior, Ciudad Universitaria, Delegación Coyoacán, Mexico City C.P. 04510, Mexico
- Ángel Martín Pendás
- Depto. Química Física y Analítica, Universidad de Oviedo, 33006 Oviedo, Spain
36
Ries B, Normak K, Weiß RG, Rieder S, Barros EP, Champion C, König G, Riniker S. Relative free-energy calculations for scaffold hopping-type transformations with an automated RE-EDS sampling procedure. J Comput Aided Mol Des 2022; 36:117-130. [PMID: 34978000 PMCID: PMC8907147 DOI: 10.1007/s10822-021-00436-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 06/03/2021] [Accepted: 11/23/2021] [Indexed: 11/29/2022]
Abstract
The calculation of relative free-energy differences between different compounds plays an important role in drug design to identify potent binders for a given protein target. Most rigorous methods based on molecular dynamics simulations estimate the free-energy difference between pairs of ligands. Thus, the comparison of multiple ligands requires the construction of a “state graph”, in which the compounds are connected by alchemical transformations. The computational cost can be optimized by reducing the state graph to a minimal set of transformations. However, this may require individual adaptation of the sampling strategy if a transformation process does not converge in a given simulation time. In contrast, path-free methods like replica-exchange enveloping distribution sampling (RE-EDS) allow the sampling of multiple states within a single simulation without the pre-definition of alchemical transition paths. To optimize sampling and convergence, a set of RE-EDS parameters needs to be estimated in a pre-processing step. Here, we present an automated procedure for this step that determines all required parameters, improving the robustness and ease of use of the methodology. To illustrate the performance, the relative binding free energies are calculated for a series of checkpoint kinase 1 inhibitors containing challenging transformations in ring size, opening/closing, and extension, which reflect changes observed in scaffold hopping. The simulation of such transformations with RE-EDS can be conducted with conventional force fields and, in particular, without soft bond-stretching terms.
Affiliation(s)
- Benjamin Ries
- Laboratory of Physical Chemistry, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Karl Normak
- Laboratory of Physical Chemistry, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- R Gregor Weiß
- Laboratory of Physical Chemistry, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Salomé Rieder
- Laboratory of Physical Chemistry, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Emília P Barros
- Laboratory of Physical Chemistry, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Candide Champion
- Laboratory of Physical Chemistry, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Gerhard König
- Laboratory of Physical Chemistry, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Sereina Riniker
- Laboratory of Physical Chemistry, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
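The RE-EDS abstract above mentions parameters that must be estimated before sampling. In the usual enveloping distribution sampling formulation, the reference state is controlled by a smoothness parameter s and one energy offset E_i^R per end state, through V_R = -1/(βs) ln Σ_i exp[-βs(V_i - E_i^R)]. The sketch below evaluates that expression stably with the log-sum-exp trick. The energies, offsets, and temperature are hypothetical, and the automated parameter search described in the abstract is not shown.

```python
import numpy as np

K_B = 0.008314462618  # Boltzmann constant in kJ/(mol K)


def eds_reference_energy(state_energies, offsets, s, temperature=298.15):
    """EDS reference-state energy (kJ/mol) for one configuration.

    state_energies: energies V_i of the N end states at this configuration (kJ/mol)
    offsets:        energy offsets E_i^R, one per end state (kJ/mol)
    s:              smoothness parameter (s > 0)
    """
    beta = 1.0 / (K_B * temperature)
    x = -beta * s * (np.asarray(state_energies, dtype=float) - np.asarray(offsets, dtype=float))
    x_max = np.max(x)
    log_sum = x_max + np.log(np.sum(np.exp(x - x_max)))  # stable log-sum-exp
    return float(-log_sum / (beta * s))
```

For large s the reference energy approaches the lowest offset-shifted end-state energy, while small s smooths the envelope and lowers barriers between end states, which is why the choice of s and offsets matters for sampling.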
37
Caldeweyher E, Bauer C, Tehrani AS. An open-source framework for fast-yet-accurate calculation of quantum mechanical features. Phys Chem Chem Phys 2022; 24:10599-10610. [DOI: 10.1039/d2cp01165d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
We present the open-source framework kallisto that enables the efficient and robust calculation of quantum mechanical features for atoms and molecules. For a benchmark set of 49 experimental molecular polarizabilities,...
38
Kahle L, Zipoli F. Quality of uncertainty estimates from neural network potential ensembles. Phys Rev E 2022; 105:015311. [PMID: 35193257 DOI: 10.1103/physreve.105.015311] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/13/2021] [Accepted: 01/03/2022] [Indexed: 06/14/2023]
Abstract
Neural network potentials (NNPs) combine the computational efficiency of classical interatomic potentials with the high accuracy and flexibility of the ab initio methods used to create the training set, but they can also produce unphysical predictions when employed outside their training set distribution. Estimating the epistemic uncertainty of an NNP is required for active learning or on-the-fly generation of potentials. Inspired by their use in other machine-learning applications, NNP ensembles have been used for uncertainty prediction in several studies, with the caveat that ensembles do not provide a rigorous Bayesian estimate of the uncertainty. To test whether NNP ensembles provide accurate uncertainty estimates, we train such ensembles in four different case studies and compare the predicted uncertainty with the errors on out-of-distribution validation sets. Our results indicate that NNP ensembles are often overconfident, underestimating the uncertainty of the model, and need to be calibrated for each system and architecture. We also provide evidence that Bayesian NNPs, obtained by sampling the posterior distribution of the model parameters using Monte Carlo techniques, can provide better uncertainty estimates.
Affiliation(s)
- Leonid Kahle
- National Centre for Computational Design and Discovery of Novel Materials MARVEL, IBM Research Europe, Zurich, Switzerland
- Federico Zipoli
- National Centre for Computational Design and Discovery of Novel Materials MARVEL, IBM Research Europe, Zurich, Switzerland
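The ensemble abstract above compares ensemble-predicted uncertainties with actual errors and finds that they often need per-system calibration. One simple way to express this, sketched here as an assumption rather than the paper's procedure, is to take the ensemble mean and standard deviation as prediction and uncertainty, then fit a single variance-scaling factor on held-out errors: the value that minimises the Gaussian negative log-likelihood. A factor well above 1 signals an overconfident ensemble.

```python
import numpy as np


def ensemble_mean_std(predictions):
    """predictions: (M, N) array of M >= 2 ensemble members' predictions for N samples."""
    predictions = np.asarray(predictions, dtype=float)
    return predictions.mean(axis=0), predictions.std(axis=0, ddof=1)


def variance_scaling_factor(errors, sigmas):
    """Scale s minimising the Gaussian NLL of errors under N(0, (s*sigma)^2): s^2 = mean(e^2 / sigma^2)."""
    errors = np.asarray(errors, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    return float(np.sqrt(np.mean((errors / sigmas) ** 2)))


def coverage(errors, sigmas, k=1.0):
    """Fraction of samples with |error| <= k*sigma (about 0.68 for k=1 if calibrated and Gaussian)."""
    return float(np.mean(np.abs(np.asarray(errors, dtype=float)) <= k * np.asarray(sigmas, dtype=float)))
```

A single global factor is the crudest possible recalibration; the per-system and per-architecture dependence reported in the abstract suggests such a factor would have to be refitted for each new case.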
39
Lykhin AO, Truhlar DG, Gagliardi L. Dipole Moment Calculations Using Multiconfiguration Pair-Density Functional Theory and Hybrid Multiconfiguration Pair-Density Functional Theory. J Chem Theory Comput 2021; 17:7586-7601. [PMID: 34793166 DOI: 10.1021/acs.jctc.1c00915] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
Abstract
The dipole moment is the molecular property that most directly indicates molecular polarity. The accuracy of computed dipole moments depends strongly on the quality of the calculated electron density, and the breakdown of single-reference methods for strongly correlated systems can lead to poor dipole moment predictions in those cases. Here, we derive the analytical expression for the electric dipole moment in multiconfiguration pair-density functional theory (MC-PDFT), and we assess the accuracy of MC-PDFT for predicting dipole moments at equilibrium and nonequilibrium geometries. We show that MC-PDFT dipole moment curves behave reasonably even for stretched geometries and significantly improve upon the CASSCF results by capturing more electron correlation. The analysis of a dataset consisting of 18 first-row transition-metal diatomics and 6 main-group polyatomic molecules with multireference character suggests that MC-PDFT and its hybrid extension (HMC-PDFT) perform comparably to the CASPT2 and MRCISD+Q methods and have a mean unsigned deviation of 0.2-0.3 D with respect to the best available dipole moment reference values. We explored how the predicted dipole moments depend on the choice of on-top density functional and active space, and we recommend the tPBE and hybrid tPBE0 on-top functionals combined with the moderate correlated-participating-orbitals scheme for selecting the active space. With these choices, the mean unsigned deviations (in debyes) of the calculated equilibrium dipole moments from the best estimates are 0.77 for CASSCF, 0.29 for MC-PDFT, 0.24 for HMC-PDFT, 0.28 for CASPT2, and 0.25 for MRCISD+Q. These results are encouraging because the computational cost of MC-PDFT or HMC-PDFT is much lower than that of the CASPT2 and MRCISD+Q methods.
Affiliation(s)
- Aleksandr O Lykhin
- Department of Chemistry, Pritzker School of Molecular Engineering, The James Franck Institute and Chicago Center for Theoretical Chemistry, The University of Chicago, Chicago, Illinois 60637, United States
- Donald G Truhlar
- Department of Chemistry, Chemical Theory Center, and Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Laura Gagliardi
- Department of Chemistry, Pritzker School of Molecular Engineering, The James Franck Institute and Chicago Center for Theoretical Chemistry, The University of Chicago, Chicago, Illinois 60637, United States
- Argonne National Laboratory, Lemont, Illinois 60439, United States
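The dipole-moment abstract above summarises accuracy as a mean unsigned deviation (MUD) in debyes: the average absolute difference between computed and reference values. The short sketch below spells out that arithmetic; the numbers are made up and are not data from the paper.

```python
def mean_unsigned_deviation(computed, reference):
    """Mean of |computed - reference| over paired values in the same units (e.g. debye)."""
    if len(computed) != len(reference) or len(computed) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(c - r) for c, r in zip(computed, reference)) / len(computed)


# Made-up dipole moments (debye) for three hypothetical molecules.
mud = mean_unsigned_deviation([1.2, 2.9, 0.4], [1.0, 3.1, 0.5])
```

Because signs are discarded, a MUD cannot reveal whether a method systematically over- or underestimates dipoles; a mean signed deviation would be needed for that.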
40
Muller C, Rabal O, Diaz Gonzalez C. Artificial Intelligence, Machine Learning, and Deep Learning in Real-Life Drug Design Cases. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2021; 2390:383-407. [PMID: 34731478 DOI: 10.1007/978-1-0716-1787-8_16] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 02/06/2023]
Abstract
The discovery and development of drugs is a long and expensive process with a high attrition rate. Computational drug discovery contributes to ligand discovery and optimization, by using models that describe the properties of ligands and their interactions with biological targets. In recent years, artificial intelligence (AI) has made remarkable modeling progress, driven by new algorithms and by the increase in computing power and storage capacities, which allow the processing of large amounts of data in a short time. This review provides the current state of the art of AI methods applied to drug discovery, with a focus on structure- and ligand-based virtual screening, library design and high-throughput analysis, drug repurposing and drug sensitivity, de novo design, chemical reactions and synthetic accessibility, ADMET, and quantum mechanics.
Affiliation(s)
- Christophe Muller
- Evotec (France) SAS, Computational Drug Discovery, Integrated Drug Discovery, Toulouse, France
- Obdulia Rabal
- Evotec (France) SAS, Computational Drug Discovery, Integrated Drug Discovery, Toulouse, France
41
Zubatyuk R, Smith JS, Nebgen BT, Tretiak S, Isayev O. Teaching a neural network to attach and detach electrons from molecules. Nat Commun 2021; 12:4870. [PMID: 34381051 PMCID: PMC8357920 DOI: 10.1038/s41467-021-24904-0] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 08/25/2020] [Accepted: 07/01/2021] [Indexed: 02/07/2023]
Abstract
Interatomic potentials derived with machine learning algorithms such as deep neural networks (DNNs) achieve the accuracy of high-fidelity quantum mechanical (QM) methods in areas traditionally dominated by empirical force fields and allow massive simulations to be performed. Most DNN potentials were parametrized for neutral molecules or closed-shell ions due to architectural limitations. In this work, we propose an improved machine learning framework for simulating open-shell anions and cations. We introduce the AIMNet-NSE (Neural Spin Equilibration) architecture, which can predict molecular energies for an arbitrary combination of molecular charge and spin multiplicity with errors of about 2-3 kcal/mol, and spin charges with errors of ~0.01 e, for small and medium-sized organic molecules compared with the reference QM simulations. The AIMNet-NSE model makes it possible to bypass QM calculations entirely and to derive the ionization potential, electron affinity, and conceptual density functional theory quantities such as electronegativity, hardness, and condensed Fukui functions. We show that these descriptors, along with learned atomic representations, could be used to model chemical reactivity, using regioselectivity in electrophilic aromatic substitution reactions as an example.
Affiliation(s)
- Roman Zubatyuk
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, PA, USA
- Justin S Smith
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Benjamin T Nebgen
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Sergei Tretiak
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA
- Center for Integrated Nanotechnologies, Los Alamos National Laboratory, Los Alamos, NM, USA
- Olexandr Isayev
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, PA, USA
| |
Collapse
|
42
|
Kulichenko M, Smith JS, Nebgen B, Li YW, Fedik N, Boldyrev AI, Lubbers N, Barros K, Tretiak S. The Rise of Neural Networks for Materials and Chemical Dynamics. J Phys Chem Lett 2021; 12:6227-6243. [PMID: 34196559 DOI: 10.1021/acs.jpclett.1c01357] [Indexed: 06/13/2023]
Abstract
Machine learning (ML) is quickly becoming a premier tool for modeling chemical processes and materials. ML-based force fields, trained on large data sets of high-quality electronic structure calculations, are particularly attractive due to their unique combination of computational efficiency and physical accuracy. This Perspective summarizes some recent advances in the development of neural network-based interatomic potentials. Designing high-quality training data sets is crucial to overall model accuracy. One strategy is active learning, in which new data are automatically collected for atomic configurations that produce large ML uncertainties. Another strategy is to use the highest levels of quantum theory possible. Transfer learning allows training to a data set of mixed fidelity. A model initially trained to a large data set of density functional theory calculations can be significantly improved by retraining to a relatively small data set of expensive coupled cluster theory calculations. These advances are exemplified by applications to molecules and materials.
Affiliation(s)
- Maksim Kulichenko
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Department of Chemistry and Biochemistry, Utah State University, Logan, Utah 84322, United States
- Justin S Smith
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Benjamin Nebgen
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Ying Wai Li
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Nikita Fedik
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Department of Chemistry and Biochemistry, Utah State University, Logan, Utah 84322, United States
- Alexander I Boldyrev
- Department of Chemistry and Biochemistry, Utah State University, Logan, Utah 84322, United States
- Nicholas Lubbers
- Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Kipton Barros
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Sergei Tretiak
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Center for Integrated Nanotechnologies, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States

43
Schindler O, Raček T, Maršavelski A, Koča J, Berka K, Svobodová R. Optimized SQE atomic charges for peptides accessible via a web application. J Cheminform 2021; 13:45. [PMID: 34193251 PMCID: PMC8243439 DOI: 10.1186/s13321-021-00528-w] [Received: 03/11/2021] [Accepted: 06/18/2021] [Indexed: 12/03/2022]
Abstract
Background: Partial atomic charges find many applications in computational chemistry, chemoinformatics, bioinformatics, and nanoscience. Currently, frequently used methods for charge calculation are the Electronegativity Equalization Method (EEM), Charge Equilibration method (QEq), and Extended QEq (EQeq). All of them are fast, even for large molecules, but require empirical parameters. However, even these advanced methods have limitations; for example, their application to peptides, proteins, and other macromolecules is problematic. An empirical charge calculation method that is promising for peptides and other macromolecular systems is the Split-charge Equilibration method (SQE) and its extension SQE+q0. Unfortunately, only one parameter set is available for these methods, and their implementation is not easily accessible. Results: In this article, we present for the first time an optimized guided minimization method (optGM) for the fast parameterization of empirical charge calculation methods and compare it with the currently available guided minimization (GDMIN) method. Then, we introduce a further extension to SQE, SQE+qp, adapted for peptide datasets, and compare it with the common approaches EEM, QEq, EQeq, SQE, and SQE+q0. Finally, we integrate SQE and SQE+qp into the web application Atomic Charge Calculator II (ACC II), including several parameter sets. Conclusion: The main contribution of the article is that it makes SQE methods with their parameters accessible to users via the ACC II web application (https://acc2.ncbr.muni.cz) and also via a command-line application. Furthermore, our improvement, SQE+qp, provides an excellent solution for peptide datasets. Additionally, optGM provides parameters comparable to those of GDMIN in a markedly shorter time. Therefore, optGM allows us to perform parameterizations for charge calculation methods with more parameters (e.g., SQE and its extensions) using large datasets.
Affiliation(s)
- Ondřej Schindler
- CEITEC-Central European Institute of Technology, Masaryk University, Kamenice 5, 602 00, Brno, Czech Republic
- National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Kamenice 5, 625 00, Brno, Czech Republic
- Tomáš Raček
- CEITEC-Central European Institute of Technology, Masaryk University, Kamenice 5, 602 00, Brno, Czech Republic
- National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Kamenice 5, 625 00, Brno, Czech Republic
- Faculty of Informatics, Masaryk University, Botanická 68a, 602 00, Brno, Czech Republic
- Aleksandra Maršavelski
- Division of Biochemistry, Department of Chemistry, Faculty of Science, University of Zagreb, Horvatovac 102a, 10000, Zagreb, Croatia
- Jaroslav Koča
- CEITEC-Central European Institute of Technology, Masaryk University, Kamenice 5, 602 00, Brno, Czech Republic
- National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Kamenice 5, 625 00, Brno, Czech Republic
- Karel Berka
- Department of Physical Chemistry, Faculty of Science, Palacký University Olomouc, 17. listopadu 1192/12, 771 46, Olomouc, Czech Republic
- Radka Svobodová
- CEITEC-Central European Institute of Technology, Masaryk University, Kamenice 5, 602 00, Brno, Czech Republic
- National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Kamenice 5, 625 00, Brno, Czech Republic

44
Glick ZL, Koutsoukas A, Cheney DL, Sherrill CD. Cartesian message passing neural networks for directional properties: Fast and transferable atomic multipoles. J Chem Phys 2021; 154:224103. [PMID: 34241239 DOI: 10.1063/5.0050444] [Indexed: 12/22/2022]
Abstract
The message passing neural network (MPNN) framework is a promising tool for modeling atomic properties but was, until recently, incompatible with directional properties, such as Cartesian tensors. We propose a modified Cartesian MPNN (CMPNN) suitable for predicting atom-centered multipoles, an essential component of ab initio force fields. The efficacy of this model is demonstrated on a newly developed dataset consisting of 46 623 chemical structures and corresponding high-quality atomic multipoles, which was deposited into the publicly available Molecular Sciences Software Institute QCArchive server. We show that the CMPNN accurately predicts atom-centered charges, dipoles, and quadrupoles and that errors in the predicted atomic multipoles have a negligible effect on multipole-multipole electrostatic energies. The CMPNN is accurate enough to model conformational dependencies of a molecule's electronic structure. This opens up the possibility of recomputing atomic multipoles on the fly throughout a simulation in which they might exhibit strong conformational dependence.
Affiliation(s)
- Zachary L Glick
- Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry, and School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, USA
- Alexios Koutsoukas
- Molecular Structure and Design, Bristol Myers Squibb Company, P.O. Box 5400, Princeton, New Jersey 08543, USA
- Daniel L Cheney
- Molecular Structure and Design, Bristol Myers Squibb Company, P.O. Box 5400, Princeton, New Jersey 08543, USA
- C David Sherrill
- Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry, and School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, USA

45
Wang J, Sun H, Chen J, Jiang D, Wang Z, Wu Z, Chen X, Cao D, Hou T. DeepChargePredictor: A web server for predicting QM-based atomic charges via state-of-the-art machine-learning algorithms. Bioinformatics 2021; 37:4255-4257. [PMID: 34009308 DOI: 10.1093/bioinformatics/btab389] [Received: 12/28/2020] [Revised: 04/28/2021] [Accepted: 05/17/2021] [Indexed: 11/14/2022]
Abstract
SUMMARY: High-level quantum mechanics (QM) methods are undoubtedly the most reliable approaches for the prediction of atomic charges, but they usually require very large computational resources, which hinders the use of high-quality atomic charges in large-scale molecular modeling, such as high-throughput virtual screening. To solve this problem, several algorithms based on machine learning (ML) have been developed to fit high-level QM atomic charges. Here, we propose DeepChargePredictor, a web server that is able to generate high-level QM atomic charges for small molecules based on two state-of-the-art ML algorithms developed in our group, namely AtomPathDescriptor and DeepAtomicCharge. These two algorithms were seamlessly integrated into the platform with the capability to predict three kinds of charges (i.e., RESP, AM1-BCC and DDEC) widely used in structure-based drug design. Moreover, we have comprehensively evaluated the performance of these charges generated by DeepChargePredictor for large-scale drug design applications, such as end-point binding free energy calculations and virtual screening, all of which show reliable or even better performance compared with the baseline methods. AVAILABILITY AND IMPLEMENTATION: DeepChargePredictor server is accessible at http://cadd.zju.edu.cn/deepchargepredictor/.
Affiliation(s)
- Jike Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- School of Computer Science, Wuhan University, Wuhan 430072, China
- Huiyong Sun
- Department of Medicinal Chemistry, China Pharmaceutical University, Nanjing 210009, Jiangsu, China
- Jiawen Chen
- Wuhan Institute of Physics and Mathematics, Chinese Academy of Sciences, Wuhan 430071, Hubei, China
- Dejun Jiang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Zhe Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Zhenxing Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Xi Chen
- School of Computer Science, Wuhan University, Wuhan 430072, China
- Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410004, Hunan, China
- Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China

46
Affiliation(s)
- Jörg Behler
- Universität Göttingen, Institut für Physikalische Chemie, Theoretische Chemie, Tammannstraße 6, 37077 Göttingen, Germany

47
Sifain AE, Rice BM, Yalkowsky SH, Barnes BC. Machine learning transition temperatures from 2D structure. J Mol Graph Model 2021; 105:107848. [PMID: 33667863 DOI: 10.1016/j.jmgm.2021.107848] [Received: 11/17/2020] [Revised: 01/11/2021] [Accepted: 01/19/2021] [Indexed: 10/22/2022]
Abstract
A priori knowledge of physicochemical properties such as melting and boiling points could expedite materials discovery. However, theoretical modeling from first principles poses a challenge for efficient virtual screening of potential candidates. As an alternative, the tools of data science are becoming increasingly important for exploring chemical datasets and predicting material properties. Herein, we extend a molecular representation, or set of descriptors, first developed for quantitative structure-property relationship modeling by Yalkowsky and coworkers, known as the Unified Physicochemical Property Estimation Relationships (UPPER). This molecular representation has group-constitutive and geometrical descriptors that map to enthalpy and entropy, two thermodynamic quantities that drive thermal phase transitions. We extend the UPPER representation to include additional information about sp2-bonded fragments. Additionally, instead of using the UPPER descriptors in a series of thermodynamically inspired calculations, as per Yalkowsky, we use the descriptors to construct a vector representation for use with machine learning techniques. The concise and easy-to-compute representation, combined with a gradient-boosting decision tree model, provides an appealing framework for predicting experimental transition temperatures in a diverse chemical space. An application to energetic materials shows that the method is predictive, despite a relatively modest energetics reference dataset. We also report competitive results on diverse public datasets of melting points (i.e., OCHEM, Enamine, Bradley, and Bergström) comprising over 47k structures. Open source software is available at https://github.com/USArmyResearchLab/ARL-UPPER.
Affiliation(s)
- Andrew E Sifain
- CCDC Army Research Laboratory, Aberdeen Proving Ground, MD, 21005, USA
- Betsy M Rice
- CCDC Army Research Laboratory, Aberdeen Proving Ground, MD, 21005, USA
- Samuel H Yalkowsky
- Department of Pharmaceutics, College of Pharmacy, University of Arizona, Tucson, AZ, 85721, USA
- Brian C Barnes
- CCDC Army Research Laboratory, Aberdeen Proving Ground, MD, 21005, USA

48
Automated discovery of a robust interatomic potential for aluminum. Nat Commun 2021; 12:1257. [PMID: 33623036 PMCID: PMC7902823 DOI: 10.1038/s41467-021-21376-0] [Received: 03/12/2020] [Accepted: 01/15/2021] [Indexed: 11/22/2022]
Abstract
Machine learning, trained on quantum mechanics (QM) calculations, is a powerful tool for modeling potential energy surfaces. A critical factor is the quality and diversity of the training dataset. Here we present a highly automated approach to dataset construction and demonstrate the method by building a potential for elemental aluminum (ANI-Al). In our active learning scheme, the ML potential under development is used to drive non-equilibrium molecular dynamics simulations with time-varying applied temperatures. Whenever a configuration is reached for which the ML uncertainty is large, new QM data are collected. The ML model is periodically retrained on all available QM data. The final ANI-Al potential makes very accurate predictions of the radial distribution function in the melt, the liquid-solid coexistence curve, and crystal properties such as defect energies and barriers. We perform a 1.3M-atom shock simulation and show that ANI-Al force predictions agree closely with new reference DFT calculations. The accuracy of a machine-learned potential is limited by the quality and diversity of the training dataset. Here the authors propose an active learning approach to automatically construct general-purpose machine-learning potentials, demonstrated here for aluminum.
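The active-learning loop summarized in this abstract (the potential under development drives sampling, new reference data are collected where the model is uncertain, and the model is retrained periodically) can be illustrated with a toy, pure-Python sketch. Everything concrete here is an assumption for illustration and not taken from ANI-Al: ensemble disagreement (query by committee) serves as the uncertainty measure, nearest-neighbour regressors stand in for the neural networks, a one-dimensional double-well function stands in for DFT, and Metropolis Monte Carlo on the model energy with a sawtooth "temperature" stands in for non-equilibrium MD with time-varying temperature. All names (`reference_energy`, `active_learning`, and so on) are hypothetical.

```python
import math
import random
import statistics


def reference_energy(x: float) -> float:
    """Hypothetical stand-in for an expensive QM calculation: a 1D double well."""
    return (x * x - 1.0) ** 2


class NearestNeighborModel:
    """Toy regressor standing in for one neural network of the ensemble."""

    def __init__(self, data: list[tuple[float, float]]):
        self.data = data

    def predict(self, x: float) -> float:
        return min(self.data, key=lambda p: abs(p[0] - x))[1]


def train_ensemble(data, n_models, rng):
    # Bootstrap resampling makes members disagree where training data are sparse.
    return [NearestNeighborModel(rng.choices(data, k=len(data))) for _ in range(n_models)]


def mean_and_spread(ensemble, x: float) -> tuple[float, float]:
    preds = [m.predict(x) for m in ensemble]
    return statistics.mean(preds), statistics.pstdev(preds)


def active_learning(n_steps=300, threshold=0.05, retrain_every=25, seed=0):
    rng = random.Random(seed)
    data = [(x0, reference_energy(x0)) for x0 in (-1.0, 0.0, 1.0)]
    labelled = {x0 for x0, _ in data}
    ensemble = train_ensemble(data, n_models=4, rng=rng)
    x, queried = -1.0, []
    for step in range(n_steps):
        # Time-varying (sawtooth) temperature alternates gentle and aggressive sampling.
        temperature = 0.05 + 0.5 * (step % 100) / 100
        x_new = min(2.0, max(-2.0, x + rng.gauss(0.0, 0.2)))
        e_old, _ = mean_and_spread(ensemble, x)
        e_new, _ = mean_and_spread(ensemble, x_new)
        # Metropolis step on the *model* energy: the potential being trained drives sampling.
        if e_new <= e_old or rng.random() < math.exp(-(e_new - e_old) / temperature):
            x = x_new
        _, spread = mean_and_spread(ensemble, x)
        if spread > threshold and x not in labelled:
            # Uncertain configuration reached: collect a new reference ("QM") label.
            data.append((x, reference_energy(x)))
            labelled.add(x)
            queried.append(x)
        if (step + 1) % retrain_every == 0:
            # Periodic retraining on all reference data collected so far.
            ensemble = train_ensemble(data, n_models=4, rng=rng)
    return data, queried
```

Calling `active_learning()` returns the grown training set and the list of configurations that triggered a reference calculation; with a fixed `seed` the run is reproducible. Raising `threshold` makes the loop query fewer points, which mirrors the trade-off between reference-calculation cost and model coverage.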

49
Ko TW, Finkler JA, Goedecker S, Behler J. General-Purpose Machine Learning Potentials Capturing Nonlocal Charge Transfer. Acc Chem Res 2021; 54:808-817. [PMID: 33513012 DOI: 10.1021/acs.accounts.0c00689] [Indexed: 02/06/2023]
Abstract
The development of first-principles-quality machine learning potentials (MLP) has seen tremendous progress, now enabling computer simulations of complex systems for which sufficiently accurate interatomic potentials have not been available. These advances and the increasing use of MLPs for more and more diverse systems gave rise to new questions regarding their applicability and limitations, which has constantly driven new developments. The resulting MLPs can be classified into several generations depending on the types of systems they are able to describe. First-generation MLPs, as introduced 25 years ago, have been applicable to low-dimensional systems such as small molecules. MLPs became a practical tool for complex systems in chemistry and materials science with the introduction of high-dimensional neural network potentials (HDNNP) in 2007, which represented the first MLP of the second generation. Second-generation MLPs are based on the concept of locality and express the total energy as a sum of environment-dependent atomic energies, which allows applications to very large systems containing thousands of atoms with linearly scaling computational costs. Since second-generation MLPs do not consider interactions beyond the local chemical environments, a natural extension has been the inclusion of long-range interactions without truncation, mainly electrostatics, employing environment-dependent charges; this established the third MLP generation. A variety of second- and, to some extent, also third-generation MLPs are currently the standard methods in ML-based atomistic simulations. In spite of countless successful applications, in recent years it has been recognized that the accuracy of MLPs relying on local atomic energies and charges is still insufficient for systems with long-ranged dependencies in the electronic structure.
These can, for instance, result from nonlocal charge transfer or ionization and are omnipresent in many important types of systems and chemical processes such as the protonation and deprotonation of organic and biomolecules, redox reactions, and defects and doping in materials. In all of these situations, small local modifications can change the system globally, resulting in different equilibrium structures, charge distributions, and reactivity. These phenomena cannot be captured by second- and third-generation MLPs. Consequently, the inclusion of nonlocal phenomena has been identified as a next key step in the development of a new fourth generation of MLPs. While a first fourth-generation MLP, the charge equilibration neural network technique (CENT), was introduced in 2015, only very recently have a range of new general-purpose methods applicable to a broad range of physical scenarios emerged. In this Account, we show how fourth-generation HDNNPs can be obtained by combining the concepts of CENT and second-generation HDNNPs. These new MLPs allow for a highly accurate description of systems where nonlocal charge transfer is important.
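The difference between the second and third MLP generations described in this abstract can be written schematically. This is a simplified sketch, not the exact functional form of any specific method: $\mathbf{G}_i$ denotes a descriptor of atom $i$'s environment within a cutoff radius $r_c$ (for example, atom-centred symmetry functions), $E_i$ the environment-dependent atomic energy, and $q_i$ an environment-dependent charge; any short-range screening of the Coulomb term is omitted.

```latex
% Second generation: locality, total energy as a sum of atomic energies
E_{\text{2G}} = \sum_{i=1}^{N} E_i\!\left(\mathbf{G}_i\right),
\qquad
\mathbf{G}_i = \mathbf{G}\!\left(\{\mathbf{r}_{ij} : r_{ij} \le r_c\}\right)

% Third generation: add untruncated electrostatics from environment-dependent charges
E_{\text{3G}} = \sum_{i=1}^{N} E_i\!\left(\mathbf{G}_i\right)
  + \frac{1}{2} \sum_{i=1}^{N} \sum_{j \ne i}
    \frac{q_i\!\left(\mathbf{G}_i\right) q_j\!\left(\mathbf{G}_j\right)}{4 \pi \varepsilon_0 \, r_{ij}}
```

Because each $q_i$ in the third-generation form still depends only on the local environment $\mathbf{G}_i$, a modification far from atom $i$ cannot redistribute its charge; this is the limitation that the fourth-generation approach discussed here addresses by combining CENT-style charge equilibration with second-generation HDNNPs.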
Affiliation(s)
- Tsz Wai Ko
- Universität Göttingen, Institut für Physikalische Chemie, Theoretische Chemie, Tammannstraße 6, 37077 Göttingen, Germany
- Jonas A Finkler
- Department of Physics, Universität Basel, Klingelbergstrasse 82, 4056 Basel, Switzerland
- Stefan Goedecker
- Department of Physics, Universität Basel, Klingelbergstrasse 82, 4056 Basel, Switzerland
- Jörg Behler
- Universität Göttingen, Institut für Physikalische Chemie, Theoretische Chemie, Tammannstraße 6, 37077 Göttingen, Germany

50
A comprehensive comparison of molecular feature representations for use in predictive modeling. Comput Biol Med 2021; 130:104197. [PMID: 33429140 DOI: 10.1016/j.compbiomed.2020.104197] [Received: 08/18/2020] [Revised: 12/21/2020] [Accepted: 12/21/2020] [Indexed: 11/23/2022]
Abstract
Machine learning methods are commonly used for predicting molecular properties to accelerate material and drug design. An important part of this process is deciding how to represent the molecules. Typically, machine learning methods expect examples represented by vectors of values, and many methods for calculating molecular feature representations have been proposed. In this paper, we perform a comprehensive comparison of different molecular features, including traditional methods such as fingerprints and molecular descriptors, and recently proposed learnable representations based on neural networks. Feature representations are evaluated on 11 benchmark datasets, used for predicting properties and measures such as mutagenicity, melting points, activity, solubility, and IC50. Our experiments show that several molecular features work similarly well over all benchmark datasets. The ones that stand out most are Spectrophores, which give significantly worse performance than other features on most datasets. Molecular descriptors from the PaDEL library seem very well suited for predicting physical properties of molecules. Despite their simplicity, MACCS fingerprints performed very well overall. The results show that learnable representations achieve competitive performance compared to expert based representations. However, task-specific representations (graph convolutions and Weave methods) rarely offer any benefits, even though they are computationally more demanding. Lastly, combining different molecular feature representations typically does not give a noticeable improvement in performance compared to individual feature representations.