1
Dral PO, Ge F, Hou YF, Zheng P, Chen Y, Barbatti M, Isayev O, Wang C, Xue BX, Pinheiro Jr M, Su Y, Dai Y, Chen Y, Zhang L, Zhang S, Ullah A, Zhang Q, Ou Y. MLatom 3: A Platform for Machine Learning-Enhanced Computational Chemistry Simulations and Workflows. J Chem Theory Comput 2024; 20:1193-1213. [PMID: 38270978 PMCID: PMC10867807 DOI: 10.1021/acs.jctc.3c01203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 12/29/2023] [Accepted: 01/03/2024] [Indexed: 01/26/2024]
Abstract
Machine learning (ML) is increasingly becoming a common tool in computational chemistry. At the same time, the rapid development of ML methods requires a flexible software framework for designing custom workflows. MLatom 3 is a program package designed to leverage the power of ML to enhance typical computational chemistry simulations and to create complex workflows. This open-source package provides plenty of choice to the users who can run simulations with the command-line options, input files, or with scripts using MLatom as a Python package, both on their computers and on the online XACS cloud computing service at XACScloud.com. Computational chemists can calculate energies and thermochemical properties, optimize geometries, run molecular and quantum dynamics, and simulate (ro)vibrational, one-photon UV/vis absorption, and two-photon absorption spectra with ML, quantum mechanical, and combined models. The users can choose from an extensive library of methods containing pretrained ML models and quantum mechanical approximations such as AIQM1 approaching coupled-cluster accuracy. The developers can build their own models using various ML algorithms. The great flexibility of MLatom is largely due to the extensive use of the interfaces to many state-of-the-art software packages and libraries.
Affiliation(s)
- Pavlo O. Dral
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
  - Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Xiamen, Fujian 361005, China
- Fuchun Ge
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
  - Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Xiamen, Fujian 361005, China
- Yi-Fan Hou
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
  - Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Xiamen, Fujian 361005, China
- Peikun Zheng
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
  - Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Xiamen, Fujian 361005, China
- Yuxinxin Chen
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
  - Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Xiamen, Fujian 361005, China
- Mario Barbatti
  - Aix Marseille University, CNRS, ICR, Marseille 13013, France
  - Institut Universitaire de France, Paris 75231, France
- Olexandr Isayev
  - Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Cheng Wang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
  - iChem, Xiamen University, Xiamen, Fujian 361005, China
- Bao-Xin Xue
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
  - Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Xiamen, Fujian 361005, China
- Max Pinheiro Jr
  - Aix Marseille University, CNRS, ICR, Marseille 13013, France
- Yuming Su
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
  - iChem, Xiamen University, Xiamen, Fujian 361005, China
- Yiheng Dai
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
  - iChem, Xiamen University, Xiamen, Fujian 361005, China
- Yangtao Chen
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
  - iChem, Xiamen University, Xiamen, Fujian 361005, China
- Lina Zhang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
  - Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Xiamen, Fujian 361005, China
- Shuang Zhang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
  - Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Xiamen, Fujian 361005, China
- Arif Ullah
  - School of Physics and Optoelectronic Engineering, Anhui University, Hefei 230601, China
- Quanhao Zhang
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
  - Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Xiamen, Fujian 361005, China
- Yanchi Ou
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, and Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen University, Xiamen, Fujian 361005, China
  - Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Xiamen, Fujian 361005, China
2
Modee R, Mehta S, Laghuvarapu S, Priyakumar UD. MolOpt: Autonomous Molecular Geometry Optimization Using Multiagent Reinforcement Learning. J Phys Chem B 2023; 127:10295-10303. [PMID: 38013420 DOI: 10.1021/acs.jpcb.3c04771] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Most optimization problems require the user to select an algorithm and, to some extent, also tune it for better performance. Although intuition and knowledge about the problem can speed up these selection and fine-tuning processes, users often use trial-and-error methodologies, which can be time-consuming and inefficient. With all of that in mind and much more, the concept of "learned optimizers", "learning to learn", and "meta-learning" has been gathering attention in recent years. In this article, we propose MolOpt that uses multiagent reinforcement learning (MARL) for autonomous molecular geometry optimization (MGO). Typically MGO algorithms are hand-designed, but MolOpt uses MARL to learn a learned optimizer (policy) that can perform MGO without the need for other hand-designed optimizers. We cast MGO as a MARL problem, where each agent corresponds to a single atom in the molecule. MolOpt performs MGO by minimizing the forces on each atom of the molecule. Our experiments demonstrate the generalizing ability of MolOpt for the MGO of propane, pentane, heptane, hexane, and octane when trained on ethane, butane, and isobutane. In terms of performance, MolOpt outperforms the MDMin optimizer and demonstrates performance similar to that of the FIRE optimizer. However, it does not surpass the BFGS optimizer. The results demonstrate that MolOpt has the potential to introduce innovative advancements in MGO by providing a novel approach using reinforcement learning (RL), which may open up new research directions for MGO. Overall, this work serves as a proof-of-concept for the potential of MARL in MGO.
Affiliation(s)
- Rohit Modee
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500032, India
- Sarvesh Mehta
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500032, India
- Siddhartha Laghuvarapu
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500032, India
- U Deva Priyakumar
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500032, India
3
Hagg A, Kirschner KN. Open-Source Machine Learning in Computational Chemistry. J Chem Inf Model 2023; 63:4505-4532. [PMID: 37466636 PMCID: PMC10430767 DOI: 10.1021/acs.jcim.3c00643] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/04/2023] [Indexed: 07/20/2023]
Abstract
The field of computational chemistry has seen a significant increase in the integration of machine learning concepts and algorithms. In this Perspective, we surveyed 179 open-source software projects, with corresponding peer-reviewed papers published within the last 5 years, to better understand the topics within the field being investigated by machine learning approaches. For each project, we provide a short description, the link to the code, the accompanying license type, and whether the training data and resulting models are made publicly available. Based on those deposited in GitHub repositories, the most popular employed Python libraries are identified. We hope that this survey will serve as a resource to learn about machine learning or specific architectures thereof by identifying accessible codes with accompanying papers on a topic basis. To this end, we also include computational chemistry open-source software for generating training data and fundamental Python libraries for machine learning. Based on our observations and considering the three pillars of collaborative machine learning work, open data, open source (code), and open models, we provide some suggestions to the community.
Affiliation(s)
- Alexander Hagg
  - Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
  - Department of Electrical Engineering, Mechanical Engineering and Technical Journalism, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
- Karl N. Kirschner
  - Institute of Technology, Resource and Energy-Efficient Engineering (TREE), University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
  - Department of Computer Science, University of Applied Sciences Bonn-Rhein-Sieg, 53757 Sankt Augustin, Germany
4
Chen BWJ, Zhang X, Zhang J. Accelerating explicit solvent models of heterogeneous catalysts with machine learning interatomic potentials. Chem Sci 2023; 14:8338-8354. [PMID: 37564405 PMCID: PMC10411631 DOI: 10.1039/d3sc02482b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Accepted: 07/11/2023] [Indexed: 08/12/2023] [Open Access]
Abstract
Realistically modelling how solvents affect catalytic reactions is a longstanding challenge due to its prohibitive computational cost. Typically, an explicit atomistic treatment of the solvent molecules is needed together with molecular dynamics (MD) simulations and enhanced sampling methods. Here, we demonstrate the utility of machine learning interatomic potentials (MLIPs), coupled with active learning, to enable fast and accurate explicit solvent modelling of adsorption and reactions on heterogeneous catalysts. MLIPs trained on-the-fly were able to accelerate ab initio MD simulations by up to 4 orders of magnitude while reproducing with high fidelity the geometrical features of water in the bulk and at metal-water interfaces. Using these ML-accelerated simulations, we accurately predicted key catalytic quantities such as the adsorption energies of CO*, OH*, COH*, HCO*, and OCCHO* on Cu surfaces and the free energy barriers of C-H scission of ethylene glycol over Cu and Pd surfaces, as validated with ab initio calculations. We envision that such simulations will pave the way towards detailed and realistic studies of solvated catalysts at large time- and length-scales.
Affiliation(s)
- Benjamin W J Chen
  - Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, #16-16 Connexis, Singapore 138632, Singapore
- Xinglong Zhang
  - Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, #16-16 Connexis, Singapore 138632, Singapore
- Jia Zhang
  - Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), 1 Fusionopolis Way, #16-16 Connexis, Singapore 138632, Singapore
5
Mehta S, Goel M, Priyakumar UD. MO-MEMES: A method for accelerating virtual screening using multi-objective Bayesian optimization. Front Med (Lausanne) 2022; 9:916481. [PMID: 36213671 PMCID: PMC9537730 DOI: 10.3389/fmed.2022.916481] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/09/2022] [Accepted: 08/29/2022] [Indexed: 11/13/2022] [Open Access]
Abstract
The pursuit of potential inhibitors for novel targets has become a very important problem, especially over the last 2 years with the world in the midst of the COVID-19 pandemic. This entails performing high throughput screening exercises on drug libraries to identify potential “hits”. These hits are identified using analysis of their physical properties like binding affinity to the target receptor, octanol-water partition coefficient (LogP) and more. However, drug libraries can be extremely large, and it is infeasible to calculate and analyze the physical properties for each of those molecules within acceptable time; moreover, each molecule must possess a multitude of properties apart from just the binding affinity. To address this problem, in this study, we propose an extension to the Machine learning framework for Enhanced MolEcular Screening (MEMES) framework for multi-objective Bayesian optimization. This approach is capable of identifying over 90% of the most desirable molecules with respect to all required properties while explicitly calculating the values of each of those properties on only 6% of the entire drug library. This framework would provide an immense boost in identifying potential hits that possess all properties required for a drug molecule.
6
Xu Y, Huang X, Li C, Wei Z, Wang M. Predicting Structure-dependent Properties Directly from the 3D Molecular Images via Convolutional Neural Networks. AIChE J 2022. [DOI: 10.1002/aic.17721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Affiliation(s)
- Yunhao Xu
  - School of Chemistry and Chemical Engineering, Chongqing University, Chongqing 400044, China
- Xun Huang
  - School of Chemistry and Chemical Engineering, Chongqing University, Chongqing 400044, China
- Cunpu Li
  - School of Chemistry and Chemical Engineering, Chongqing University, Chongqing 400044, China
- Zidong Wei
  - School of Chemistry and Chemical Engineering, Chongqing University, Chongqing 400044, China
- Meng Wang
  - School of Chemistry and Chemical Engineering, Chongqing University, Chongqing 400044, China
7
Karthikeyan A, Priyakumar UD. Artificial intelligence: machine learning for chemical sciences. J Chem Sci 2021; 134:2. [PMID: 34955617 PMCID: PMC8691161 DOI: 10.1007/s12039-021-01995-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 07/09/2021] [Revised: 09/08/2021] [Accepted: 09/14/2021] [Indexed: 12/05/2022]
Abstract
Research in molecular sciences witnessed the rise and fall of Artificial Intelligence (AI)/Machine Learning (ML) methods, especially artificial neural networks, a few decades ago. However, we see a major resurgence in the use of modern ML methods in scientific research during the last few years. These methods have had phenomenal success in the areas of computer vision, speech recognition, natural language processing (NLP), etc. This has inspired chemists and biologists to apply these algorithms to problems in natural sciences. Availability of high performance Graphics Processing Unit (GPU) accelerators, large datasets, new algorithms, and libraries has enabled this surge. ML algorithms have successfully been applied to various domains in molecular sciences by providing much faster and sometimes more accurate solutions compared to traditional methods like Quantum Mechanical (QM) calculations, Density Functional Theory (DFT) or Molecular Mechanics (MM) based methods, etc. Some of the areas where ML methods have been shown to be effective are drug design, prediction of high-level quantum mechanical energies, molecular design, molecular dynamics materials, and retrosynthesis of organic compounds, etc. This article intends to conceptually introduce various modern ML methods and their relevance and applications in computational natural sciences.
Synopsis: The recent surge in the application of machine learning (ML) methods in fundamental sciences has led to a perspective that these methods may become important tools in chemical science. This perspective provides an overview of the modern ML methods and their successful applications in chemistry during the last few years.
Affiliation(s)
- Akshaya Karthikeyan
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India
- U Deva Priyakumar
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India
8
Modee R, Laghuvarapu S, Priyakumar UD. Benchmark study on deep neural network potentials for small organic molecules. J Comput Chem 2021; 43:308-318. [PMID: 34870332 DOI: 10.1002/jcc.26790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2021] [Revised: 11/13/2021] [Accepted: 11/15/2021] [Indexed: 11/06/2022]
Abstract
There has been tremendous advancement in machine learning (ML) applications in computational chemistry, particularly in neural network potentials (NNP). NNPs can approximate potential energy surface (PES) as a high dimensional function by learning from existing reference data, thereby circumventing the need to solve the electronic Schrödinger equation explicitly. As a result, ML accelerates chemical space exploration and property prediction compared to quantum mechanical methods. Novel ML methods have the potential to provide efficient means for predicting the properties of molecules. However, this potential has been limited by the lack of standard comparative evaluations. In this work, we compare four selected models, that is, ANI, PhysNet, SchNet, and BAND-NN, developed to represent the PES of small organic molecules. We evaluate these models for their accuracy and transferability on two different test sets: (i) small organic molecules of up to eight heavy atoms, on which ANI and SchNet achieve root mean square errors (RMSE) of 0.55 and 0.60 kcal/mol, respectively; and (ii) a random selection of molecules from the GDB-11 database with 10 heavy atoms, on which ANI achieves an RMSE of 1.17 kcal/mol and SchNet achieves an RMSE of 1.89 kcal/mol. We examine their ability to produce a smooth, meaningful surface by performing PES scans for bond stretch, angle bend, and dihedral rotations on relatively large molecules to assess their possible application in molecular dynamics simulations. We also evaluate their performance for yielding minimum energy structures via geometry optimization using various minimization algorithms. All these models were also able to accurately differentiate different isomers of the same empirical formula C10H20. ANI and PhysNet achieve an RMSE of 0.29 and 0.52 kcal/mol, respectively, on C10H20 isomers.
Affiliation(s)
- Rohit Modee
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
- Siddhartha Laghuvarapu
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
- U Deva Priyakumar
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
9
Pinheiro M, Ge F, Ferré N, Dral PO, Barbatti M. Choosing the right molecular machine learning potential. Chem Sci 2021; 12:14396-14413. [PMID: 34880991 PMCID: PMC8580106 DOI: 10.1039/d1sc03564a] [Citation(s) in RCA: 55] [Impact Index Per Article: 18.3] [Received: 06/29/2021] [Accepted: 09/14/2021] [Indexed: 11/21/2022] [Open Access]
Abstract
Quantum-chemistry simulations based on potential energy surfaces of molecules provide invaluable insight into the physicochemical processes at the atomistic level and yield such important observables as reaction rates and spectra. Machine learning potentials promise to significantly reduce the computational cost and hence enable otherwise unfeasible simulations. However, the surging number of such potentials begs the question of which one to choose or whether we still need to develop yet another one. Here, we address this question by evaluating the performance of popular machine learning potentials in terms of accuracy and computational cost. In addition, we deliver structured information for non-specialists in machine learning to guide them through the maze of acronyms, recognize each potential's main features, and judge what they could expect from each one.
Affiliation(s)
- Max Pinheiro
  - Aix Marseille University, CNRS, ICR, Marseille, France
- Fuchun Ge
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, China
- Nicolas Ferré
  - Aix Marseille University, CNRS, ICR, Marseille, France
- Pavlo O Dral
  - State Key Laboratory of Physical Chemistry of Solid Surfaces, Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, China
- Mario Barbatti
  - Aix Marseille University, CNRS, ICR, Marseille, France
  - Institut Universitaire de France, 75231 Paris, France
10
Modee R, Agarwal S, Verma A, Joshi K, Priyakumar UD. DART: deep learning enabled topological interaction model for energy prediction of metal clusters and its application in identifying unique low energy isomers. Phys Chem Chem Phys 2021; 23:21995-22003. [PMID: 34569568 DOI: 10.1039/d1cp02956h] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/21/2022]
Abstract
Recently, machine learning (ML) has proven to yield fast and accurate predictions of chemical properties to accelerate the discovery of novel molecules and materials. The majority of the work is on organic molecules, and much more work needs to be done for inorganic molecules, especially clusters. In the present work, we introduce a simple topological atomic descriptor called TAD, which encodes chemical environment information of each atom in the cluster. TAD is a simple and interpretable descriptor where each value represents the atom count in three shells. We also introduce the DART deep learning enabled topological interaction model, which uses TAD as a feature vector to predict energies of metal clusters, in our case gallium clusters with sizes ranging from 31 to 70 atoms. The DART model is designed based on the principle that the energy is a function of atomic interactions and allows us to model these complex atomic interactions to predict the energy. We further introduce a new dataset called GNC_31-70, which comprises structures and DFT optimized energies of gallium clusters with sizes ranging from 31 to 70 atoms. We show how DART can be used to accelerate the process of identification of low energy structures without geometry optimization. Albeit using a topological descriptor, DART achieves a mean absolute error (MAE) of 3.59 kcal mol⁻¹ (0.15 eV) on the test set. We also show that our model can distinguish core and surface atoms in the Ga-70 cluster, which the model has never encountered earlier. Finally, we demonstrate the transferability of the DART model by predicting energies for about 6k unseen configurations picked up from molecular dynamics (MD) data for three cluster sizes (46, 57, and 60) within seconds. The DART model was able to reduce the load on DFT optimizations while identifying unique low energy structures from MD data.
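The abstract describes TAD only as an atom count in three shells around each atom. A minimal sketch of that idea follows. It is not the authors' implementation: the function name `shell_counts`, the shell radii, and the toy coordinates are all assumptions chosen for illustration, since the abstract gives no cutoffs.

```python
import math


def shell_counts(coords, radii=(3.0, 5.0, 7.0)):
    """Count neighbours of each atom in concentric distance shells.

    `radii` are hypothetical outer shell boundaries (same length unit as
    `coords`); the abstract does not state the values actually used.
    """
    edges = (0.0, *radii)
    descriptors = []
    for i, center in enumerate(coords):
        counts = [0] * len(radii)
        for j, other in enumerate(coords):
            if i == j:
                continue  # an atom is not its own neighbour
            d = math.dist(center, other)
            for k in range(len(radii)):
                # shell k covers the half-open interval (edges[k], edges[k + 1]]
                if edges[k] < d <= edges[k + 1]:
                    counts[k] += 1
                    break
        descriptors.append(counts)
    return descriptors


# Toy input: four atoms on a line, spaced 2 units apart (not real cluster data).
chain = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
tad = shell_counts(chain)
print(tad)
```

Each atom gets a three-component vector. In this toy chain the end atoms see one neighbour per shell, while the inner atoms see two neighbours in the first shell and one in the second, which gives a rough picture of how such counts could separate differently coordinated atoms.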
Affiliation(s)
- Rohit Modee
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500032, India
- Sheena Agarwal
  - Physical and Materials Chemistry Division, CSIR-National Chemical Laboratory, Dr Homi Bhabha Road, Pune-411008, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh-201002, India
- Ashwini Verma
  - Physical and Materials Chemistry Division, CSIR-National Chemical Laboratory, Dr Homi Bhabha Road, Pune-411008, India
- Kavita Joshi
  - Physical and Materials Chemistry Division, CSIR-National Chemical Laboratory, Dr Homi Bhabha Road, Pune-411008, India
  - Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, Uttar Pradesh-201002, India
- U Deva Priyakumar
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500032, India
11
Samaga YBL, Raghunathan S, Priyakumar UD. SCONES: Self-Consistent Neural Network for Protein Stability Prediction Upon Mutation. J Phys Chem B 2021; 125:10657-10671. [PMID: 34546056 DOI: 10.1021/acs.jpcb.1c04913] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 01/17/2023]
Abstract
Engineering proteins to have desired properties by mutating amino acids at specific sites is commonplace. Such engineered proteins must be stable to function. Experimental methods used to determine stability at throughputs required to scan the protein sequence space thoroughly are laborious. To this end, many machine learning based methods have been developed to predict thermodynamic stability changes upon mutation. These methods have been evaluated for symmetric consistency by testing with hypothetical reverse mutations. In this work, we propose transitive data augmentation, evaluating transitive consistency with our new Stransitive data set, and a new machine learning based method, the first of its kind, that incorporates both symmetric and transitive properties into the architecture. Our method, called SCONES, is an interpretable neural network that predicts small relative protein stability changes for missense mutations that do not significantly alter the structure. It estimates a residue's contributions toward protein stability (ΔG) in its local structural environment, and the difference between independently predicted contributions of the reference and mutant residues is reported as ΔΔG. We show that this self-consistent machine learning architecture is immune to many common biases in data sets, relies less on data than existing methods, is robust to overfitting, and can explain a substantial portion of the variance in experimental data.
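The abstract states that ΔΔG is reported as the difference between independently predicted contributions of the reference and mutant residues. The sketch below illustrates why a difference of this form is symmetric and transitive by construction. It is not the SCONES network: the contribution values are invented, and the sign convention (mutant minus reference) is an assumption.

```python
# Hypothetical per-residue stability contributions (ΔG, kcal/mol) at one site.
# These numbers are made up; a real model would predict them from the
# residue's local structural environment.
contribution = {"ALA": -1.2, "GLY": -0.4, "VAL": -2.1}


def ddg(reference: str, mutant: str) -> float:
    """ΔΔG for reference -> mutant as a difference of independent ΔG terms.

    Sign convention (mutant minus reference) is assumed for illustration.
    """
    return contribution[mutant] - contribution[reference]


# Symmetric consistency: the reverse mutation has the opposite sign.
forward = ddg("ALA", "GLY")
reverse = ddg("GLY", "ALA")

# Transitive consistency: A -> C equals A -> B followed by B -> C.
direct = ddg("ALA", "VAL")
two_step = ddg("ALA", "GLY") + ddg("GLY", "VAL")

print(forward, reverse, direct, two_step)
```

Because each residue's contribution is predicted on its own, both properties hold for any values in the table, so they do not have to be learned from augmented data.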
Affiliation(s)
- Yashas B L Samaga
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India
- Shampa Raghunathan
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India
- U Deva Priyakumar
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India
12
Mehta S, Laghuvarapu S, Pathak Y, Sethi A, Alvala M, Priyakumar UD. MEMES: Machine learning framework for Enhanced MolEcular Screening. Chem Sci 2021; 12:11710-11721. [PMID: 34659706 PMCID: PMC8442698 DOI: 10.1039/d1sc02783b] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 05/22/2021] [Accepted: 07/24/2021] [Indexed: 01/20/2023] [Open Access]
Abstract
In drug discovery applications, high throughput virtual screening exercises are routinely performed to determine an initial set of candidate molecules referred to as "hits". In such an experiment, each molecule from a large small-molecule drug library is evaluated in terms of physical properties such as the docking score against a target receptor. In real-life drug discovery experiments, drug libraries are extremely large but still there is only a minor representation of the essentially infinite chemical space, and evaluation of physical properties for each molecule in the library is not computationally feasible. In the current study, a novel Machine learning framework for Enhanced MolEcular Screening (MEMES) based on Bayesian optimization is proposed for efficient sampling of the chemical space. The proposed framework is demonstrated to identify 90% of the top-1000 molecules from a molecular library of size about 100 million, while calculating the docking score only for about 6% of the complete library. We believe that such a framework would tremendously help to reduce the computational effort in not only drug-discovery but also areas that require such high-throughput experiments.
Affiliation(s)
- Sarvesh Mehta
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India, +91 40 6653 1413, +91 40 6653 1161
| | - Siddhartha Laghuvarapu
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India, +91 40 6653 1413, +91 40 6653 1161
| | - Yashaswi Pathak
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India, +91 40 6653 1413, +91 40 6653 1161
| | - Aaftaab Sethi
- Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research, Hyderabad 500 037, India
| | - Mallika Alvala
- School of Pharmacy and Technology Management, Narsee Monjee Institute of Management Sciences, Hyderabad, India
| | - U Deva Priyakumar
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India, +91 40 6653 1413, +91 40 6653 1161
| |
|
13
|
Folmsbee DL, Koes DR, Hutchison GR. Evaluation of Thermochemical Machine Learning for Potential Energy Curves and Geometry Optimization. J Phys Chem A 2021; 125:1987-1993. [PMID: 33630611 DOI: 10.1021/acs.jpca.0c10147] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/23/2022]
Abstract
While many machine learning (ML) methods, particularly deep neural networks, have been trained for density functional and quantum chemical energies and properties, the vast majority of these methods focus on single-point energies. In principle, such ML methods, once trained, offer thermochemical accuracy on par with density functional and wave function methods but at speeds comparable to traditional force fields or approximate semiempirical methods. So far, most efforts have focused on optimized equilibrium single-point energies and properties. In this work, we evaluate the accuracy of several leading ML methods across a range of bond potential energy curves and torsional potentials. The methods were trained on the existing ANI-1 training set, calculated using the ωB97X/6-31G(d) single points at nonequilibrium geometries. We find that across a range of small molecules, several methods offer both qualitative accuracy (e.g., correct minima, both repulsive and attractive bond regions, anharmonic shape, and single minima) and quantitative accuracy in terms of the mean absolute percent error near the minima. At the moment, ANI-2x, FCHL, and a new libmolgrid-based convolutional neural net, the Colorful CNN, show good performance.
Affiliation(s)
- Dakota L Folmsbee
- Department of Chemistry, University of Pittsburgh, 219 Parkman Avenue, Pittsburgh, Pennsylvania 15260, United States
| | - David R Koes
- Department of Computational & Systems Biology, School of Medicine, University of Pittsburgh, 3420 Forbes Avenue, Pittsburgh, Pennsylvania 15260, United States
| | - Geoffrey R Hutchison
- Department of Chemistry, University of Pittsburgh, 219 Parkman Avenue, Pittsburgh, Pennsylvania 15260, United States; Department of Chemical and Petroleum Engineering, University of Pittsburgh, 3700 O'Hara Street, Pittsburgh, Pennsylvania 15261, United States
| |
|
14
|
Pathak Y, Mehta S, Priyakumar UD. Learning Atomic Interactions through Solvation Free Energy Prediction Using Graph Neural Networks. J Chem Inf Model 2021; 61:689-698. [PMID: 33546556 DOI: 10.1021/acs.jcim.0c01413] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 12/20/2022]
Abstract
Solvation free energy is a fundamental property that influences various chemical and biological processes, such as reaction rates, protein folding, drug binding, and bioavailability of drugs. In this work, we present a deep learning method based on graph networks to accurately predict solvation free energies of small organic molecules. The proposed model, comprising three phases, namely, message passing, interaction, and prediction, is able to predict solvation free energies in any generic organic solvent with a mean absolute error of 0.16 kcal/mol. In terms of accuracy, the current model outperforms all of the proposed machine learning-based models so far. The atomic interactions predicted in an unsupervised manner are able to explain the trends of free energies consistent with chemical wisdom. Further, the robustness of the machine learning-based model has been tested thoroughly, and its capability to interpret the predictions has been verified with several examples.
Affiliation(s)
- Yashaswi Pathak
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500032, India
| | - Sarvesh Mehta
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500032, India
| | - U Deva Priyakumar
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500032, India
| |
|
15
|
Rahaman O, Gagliardi A. Deep Learning Total Energies and Orbital Energies of Large Organic Molecules Using Hybridization of Molecular Fingerprints. J Chem Inf Model 2020; 60:5971-5983. [PMID: 33118351 DOI: 10.1021/acs.jcim.0c00687] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 12/17/2022]
Abstract
The ability to predict material properties without the need for resource-consuming experimental efforts can immensely accelerate material and drug discovery. Although ab initio methods can be reliable and accurate in making such predictions, they are computationally too expensive on a large scale. The recent advancements in artificial intelligence and machine learning as well as the availability of large quantum mechanics derived datasets enable us to train models on these datasets as a benchmark and to make fast predictions on much larger datasets. The success of these machine learning models highly depends on the machine-readable fingerprints of the molecules that capture their chemical properties as well as topological information. In this work, we propose a common deep learning-based framework to combine different types of molecular fingerprints to enhance prediction accuracy. A graph neural network (GNN), many-body tensor representation (MBTR), and a set of simple molecular descriptors (MD) were used to predict the total energies, highest occupied molecular orbital (HOMO) energies, and lowest unoccupied molecular orbital (LUMO) energies of a dataset containing ∼62k large organic molecules with complex aromatic rings and remarkably diverse functional groups. The results demonstrate that a combination of best performing molecular fingerprints can produce better results than the individual ones. The simple and flexible deep learning framework developed in this work can be easily adapted to incorporate other types of molecular fingerprints.
Affiliation(s)
- Obaidur Rahaman
- Technische Universität München, Karlstr. 45, 80333 Munich, Germany
| | | |
|
16
|
Pattnaik P, Raghunathan S, Kalluri T, Bhimalapuram P, Jawahar CV, Priyakumar UD. Machine Learning for Accurate Force Calculations in Molecular Dynamics Simulations. J Phys Chem A 2020; 124:6954-6967. [DOI: 10.1021/acs.jpca.0c03926] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Indexed: 12/13/2022]
Affiliation(s)
- Punyaslok Pattnaik
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India
| | - Shampa Raghunathan
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India
| | - Tarun Kalluri
- Center for Visual Information Technology, KCIS, International Institute of Information Technology, Hyderabad 500 032, India
| | - Prabhakar Bhimalapuram
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India
| | - C. V. Jawahar
- Center for Visual Information Technology, KCIS, International Institute of Information Technology, Hyderabad 500 032, India
| | - U. Deva Priyakumar
- Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500 032, India
| |
|
17
|
Pathak Y, Juneja KS, Varma G, Ehara M, Priyakumar UD. Deep learning enabled inorganic material generator. Phys Chem Chem Phys 2020; 22:26935-26943. [DOI: 10.1039/d0cp03508d] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 12/17/2022]
Abstract
A machine learning framework that generates material compositions exhibiting properties desired by the user.
Affiliation(s)
- Yashaswi Pathak
- International Institute of Information Technology, Hyderabad 500 032, India
| | | | - Girish Varma
- International Institute of Information Technology, Hyderabad 500 032, India
| | - Masahiro Ehara
- Research Center for Computational Science, Institute for Molecular Science, Okazaki 444-8585, Japan
| | | |
|