1
Spiekermann KA, Dong X, Menon A, Green WH, Pfeifle M, Sandfort F, Welz O, Bergeler M. Accurately Predicting Barrier Heights for Radical Reactions in Solution Using Deep Graph Networks. J Phys Chem A 2024; 128:8384-8403. [PMID: 39298746 DOI: 10.1021/acs.jpca.4c04121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/22/2024]
Abstract
Quantitative estimates of reaction barriers and solvent effects are essential for developing kinetic mechanisms and predicting reaction outcomes. Here, we create a new data set of 5,600 unique elementary radical reactions calculated using the M06-2X/def2-QZVP//B3LYP-D3(BJ)/def2-TZVP level of theory. A conformer search is done for each species using TPSS/def2-TZVP. Gibbs free energies of activation and of reaction for these radical reactions in 40 common solvents are obtained using COSMO-RS for solvation effects. These balanced reactions involve the elements H, C, N, O, and S, contain up to 19 heavy atoms, and have atom-mapped SMILES. All transition states are verified by an intrinsic reaction coordinate calculation. We next train a deep graph network to directly estimate the Gibbs free energy of activation and of reaction in both gas and solution phases using only the atom-mapped SMILES of the reactant and product and the SMILES of the solvent. This simple input representation avoids computationally expensive optimizations for the reactant, transition state, and product structures during inference, making our model well-suited for high-throughput predictive chemistry and quickly providing information for (retro-)synthesis planning tools. To properly measure model performance, we report results on both interpolative and extrapolative data splits and also compare to several baseline models. During training and testing, the data set is augmented by including the reverse direction of each reaction and variants with different resonance structures. After data augmentation, we have around 2 million entries to train the model, which achieves a testing set mean absolute error of 1.16 kcal mol⁻¹ for the Gibbs free energy of activation in solution. We anticipate this model will accelerate predictions for high-throughput screening to quickly identify relevant reactions in solution, and our data set will serve as a benchmark for future studies.
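The reverse-direction augmentation mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the `ReactionRecord` class, its field names, and the example values are hypothetical. It relies only on the fact that the forward and reverse reactions share one transition state, so the reverse barrier equals the forward barrier minus the reaction free energy. The resonance-structure variants are not covered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReactionRecord:
    """One labeled reaction (hypothetical container, not from the paper)."""
    rxn_smiles: str      # "reactants>>products"
    solvent_smiles: str  # SMILES of the solvent
    dg_act: float        # Gibbs free energy of activation, kcal/mol
    dg_rxn: float        # Gibbs free energy of reaction, kcal/mol


def reverse_record(rec: ReactionRecord) -> ReactionRecord:
    """Build the reverse reaction, which passes through the same transition state."""
    reactants, products = rec.rxn_smiles.split(">>")
    return ReactionRecord(
        rxn_smiles=f"{products}>>{reactants}",
        solvent_smiles=rec.solvent_smiles,
        dg_act=rec.dg_act - rec.dg_rxn,  # reverse barrier
        dg_rxn=-rec.dg_rxn,              # reaction free energy changes sign
    )


def augment_with_reverse(records):
    """Return each record followed by its reverse-direction counterpart."""
    augmented = []
    for rec in records:
        augmented.append(rec)
        augmented.append(reverse_record(rec))
    return augmented


# Illustrative values only: hydrogen abstraction from methane by OH in water.
forward = ReactionRecord("C.[OH]>>[CH3].O", "O", dg_act=12.0, dg_rxn=-5.0)
augmented = augment_with_reverse([forward])
```

An exergonic forward step therefore yields a higher reverse barrier, which is why both directions carry independent training signal.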
Affiliation(s)
- Kevin A Spiekermann
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Xiaorui Dong
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Angiras Menon
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- William H Green
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Mark Pfeifle
  - BASF Digital Solutions GmbH, Ludwigshafen am Rhein 67061, Germany
- Frederik Sandfort
  - BASF SE, Scientific Modeling, Group Research, Ludwigshafen am Rhein 67056, Germany
- Oliver Welz
  - BASF SE, Scientific Modeling, Group Research, Ludwigshafen am Rhein 67056, Germany
- Maike Bergeler
  - BASF SE, Scientific Modeling, Group Research, Ludwigshafen am Rhein 67056, Germany
2
Ferraz-Caetano J, Teixeira F, Cordeiro MNDS. Explainable Supervised Machine Learning Model To Predict Solvation Gibbs Energy. J Chem Inf Model 2024; 64:2250-2262. [PMID: 37603608 PMCID: PMC11005042 DOI: 10.1021/acs.jcim.3c00544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Indexed: 08/23/2023]
Abstract
Many challenges persist in developing accurate computational models for predicting solvation free energy (ΔGsol). Despite recent developments in Machine Learning (ML) methodologies that outperformed traditional quantum mechanical models, several issues remain concerning explanatory insights for broad chemical predictions with an acceptable speed-accuracy trade-off. To overcome this, we present a novel supervised ML model to predict the ΔGsol for an array of solvent-solute pairs. Using two different ensemble regressor algorithms, we made fast and accurate property predictions using open-source chemical features, encoding complex electronic, structural, and surface area descriptors for every solvent and solute. By integrating molecular properties and chemical interaction features, we have analyzed individual descriptor importance and optimized our model through explanatory information from feature groups. On aqueous and organic solvent databases, ML models revealed the predictive relevance of solutes with increasing polar surface area and decreasing polarizability, yielding better results than state-of-the-art benchmark Neural Network methods (without complex quantum mechanical or molecular dynamics simulations). Both algorithms successfully outperformed previous ΔGsol prediction methods, with a maximum absolute error of 0.22 ± 0.02 kcal mol⁻¹, further validated in an external benchmark database and with solvent hold-out tests. These explanatory and statistical insights allow a thoughtful application of this method for predicting other thermodynamic properties, stressing the relevance of ML modeling for further complex computational chemistry problems.
Affiliation(s)
- José Ferraz-Caetano
  - Department of Chemistry and Biochemistry, Faculty of Sciences, University of Porto, Rua do Campo Alegre, S/N, 4169-007 Porto, Portugal
- Filipe Teixeira
  - Centre of Chemistry, University of Minho, Campus de Gualtar, 4710-057 Braga, Portugal
- M. Natália D. S. Cordeiro
  - Department of Chemistry and Biochemistry, Faculty of Sciences, University of Porto, Rua do Campo Alegre, S/N, 4169-007 Porto, Portugal
3
Kim Y, Jung H, Kumar S, Paton RS, Kim S. Designing solvent systems using self-evolving solubility databases and graph neural networks. Chem Sci 2024; 15:923-939. [PMID: 38239675 PMCID: PMC10793204 DOI: 10.1039/d3sc03468b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Accepted: 12/04/2023] [Indexed: 01/22/2024] [Open Access]
Abstract
Designing solvent systems is key to achieving the facile synthesis and separation of desired products from chemical processes, so many machine learning models have been developed to predict solubilities. However, breakthroughs are needed to address deficiencies in the model's predictive accuracy and generalizability; this can be addressed by expanding and integrating experimental and computational solubility databases. To maximize predictive accuracy, these two databases should not be trained separately, and they should not be simply combined without reconciling the discrepancies from different magnitudes of errors and uncertainties. Here, we introduce self-evolving solubility databases and graph neural networks developed through semi-supervised self-training approaches. Solubilities from quantum-mechanical calculations are referred to during semi-supervised learning, but they are not directly added to the experimental database. Dataset augmentation is performed from 11 637 experimental solubilities to >900 000 data points in the integrated database, while correcting for the discrepancies between experiment and computation. Our model was successfully applied to study solvent selection in organic reactions and separation processes. The accuracy (mean absolute error around 0.2 kcal mol⁻¹ for the test set) is quantitatively useful in exploring Linear Free Energy Relationships between reaction rates and solvation free energies for 11 organic reactions. Our model also accurately predicted the partition coefficients of lignin-derived monomers and drug-like molecules. While there is room for expanding solubility predictions to transition states, radicals, charged species, and organometallic complexes, this approach will be attractive to predictive chemistry areas where experimental, computational, and other heterogeneous data should be combined.
Affiliation(s)
- Yeonjoon Kim
  - Department of Chemistry, Colorado State University Fort Collins CO 80523 USA
  - Department of Chemistry, Pukyong National University Busan 48513 Republic of Korea
- Hojin Jung
  - Department of Chemistry, Colorado State University Fort Collins CO 80523 USA
- Sabari Kumar
  - Department of Chemistry, Colorado State University Fort Collins CO 80523 USA
- Robert S Paton
  - Department of Chemistry, Colorado State University Fort Collins CO 80523 USA
- Seonah Kim
  - Department of Chemistry, Colorado State University Fort Collins CO 80523 USA
4
Pattanaik L, Menon A, Settels V, Spiekermann KA, Tan Z, Vermeire FH, Sandfort F, Eiden P, Green WH. ConfSolv: Prediction of Solute Conformer-Free Energies across a Range of Solvents. J Phys Chem B 2023; 127:10151-10170. [PMID: 37966798 DOI: 10.1021/acs.jpcb.3c05904] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 11/16/2023]
Abstract
Predicting Gibbs free energy of solution is key to understanding the solvent effects on thermodynamics and reaction rates for kinetic modeling. Accurately computing solution free energies requires the enumeration and evaluation of relevant solute conformers in solution. However, even after generation of relevant conformers, determining their free energy of solution requires an expensive workflow consisting of several ab initio computational chemistry calculations. To help address this challenge, we generate a large data set of solution free energies for nearly 44,000 solutes with almost 9 million conformers calculated in 41 different solvents using density functional theory and COSMO-RS and quantify the impact of solute conformers on the solution free energy. We then train a message passing neural network to predict the relative solution free energies of a set of solute conformers, enabling the identification of a small subset of thermodynamically relevant conformers. The model offers substantial computational time savings, with predictions usually well within 1 kcal/mol of the free energy of solution calculated using computational chemistry methods.
Affiliation(s)
- Lagnajit Pattanaik
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Angiras Menon
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Volker Settels
  - BASF SE, Scientific Modeling, Group Research, Ludwigshafen am Rhein 67056, Germany
- Kevin A Spiekermann
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Zipei Tan
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Florence H Vermeire
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemical Engineering, KU Leuven, Celestijnenlaan 200F, Leuven 3001, Belgium
- Frederik Sandfort
  - BASF SE, Scientific Modeling, Group Research, Ludwigshafen am Rhein 67056, Germany
- Philipp Eiden
  - BASF SE, Scientific Modeling, Group Research, Ludwigshafen am Rhein 67056, Germany
- William H Green
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
5
Kubečka J, Knattrup Y, Engsvang M, Jensen AB, Ayoubi D, Wu H, Christiansen O, Elm J. Current and future machine learning approaches for modeling atmospheric cluster formation. Nature Computational Science 2023; 3:495-503. [PMID: 38177415 DOI: 10.1038/s43588-023-00435-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/21/2022] [Accepted: 03/16/2023] [Indexed: 01/06/2024]
Abstract
The formation of strongly bound atmospheric molecular clusters is the first step towards forming new aerosol particles. Recent advances in the application of machine learning models open an enormous opportunity for complementing expensive quantum chemical calculations with efficient machine learning predictions. In this Perspective, we present how data-driven approaches can be applied to accelerate cluster configurational sampling, thereby greatly increasing the number of chemically relevant systems that can be covered.
Affiliation(s)
- Jakub Kubečka
  - Department of Chemistry, Aarhus University, Aarhus, Denmark
- Yosef Knattrup
  - Department of Chemistry, Aarhus University, Aarhus, Denmark
- Daniel Ayoubi
  - Department of Chemistry, Aarhus University, Aarhus, Denmark
- Haide Wu
  - Department of Chemistry, Aarhus University, Aarhus, Denmark
- Jonas Elm
  - Department of Chemistry, Aarhus University, Aarhus, Denmark
  - iCLIMATE Aarhus University Interdisciplinary Centre for Climate Change, Aarhus, Denmark
6
Liao M, Wu F, Yu X, Zhao L, Wu H, Zhou J. Random Forest Algorithm-Based Prediction of Solvation Gibbs Energies. J Solution Chem 2023. [DOI: 10.1007/s10953-023-01247-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/29/2023]
7
Yao S, Van R, Pan X, Park JH, Mao Y, Pu J, Mei Y, Shao Y. Machine learning based implicit solvent model for aqueous-solution alanine dipeptide molecular dynamics simulations. RSC Adv 2023; 13:4565-4577. [PMID: 36760282 PMCID: PMC9900604 DOI: 10.1039/d2ra08180f] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/23/2022] [Accepted: 01/20/2023] [Indexed: 02/05/2023] [Open Access]
Abstract
Inspired by the recent work from Noé and coworkers on the development of a machine learning based implicit solvent model for the simulation of solvated peptides [Chen et al., J. Chem. Phys., 2021, 155, 084101], here we report another investigation of the possibility of using machine learning (ML) techniques to "derive" an implicit solvent model directly from explicit solvent molecular dynamics (MD) simulations. For alanine dipeptide, a machine learning potential (MLP) based on the DeepPot-SE representation of the molecule was trained to capture its interactions with its average solvent environment configuration (ASEC). The predicted forces on the solute deviated only by an RMSD of 0.4 kcal mol⁻¹ Å⁻¹ from the reference values, and the MLP-based free energy surface differed from that obtained from explicit solvent MD simulations by an RMSD of less than 0.9 kcal mol⁻¹. Our MLP training protocol could also accurately reproduce combined quantum mechanical molecular mechanical (QM/MM) forces on the quantum mechanical (QM) solute in the ASEC environment, thus enabling the development of accurate ML-based implicit solvent models for ab initio-QM MD simulations. Such ML-based implicit solvent models for QM calculations are cost-effective in both the training stage, where the use of ASEC reduces the number of data points to be labelled, and the inference stage, where the MLP can be evaluated at a relatively small additional cost on top of the QM calculation of the solute.
Affiliation(s)
- Songyuan Yao
  - Department of Chemistry and Biochemistry, University of Oklahoma Norman OK 73019 USA
- Richard Van
  - Department of Chemistry and Biochemistry, University of Oklahoma Norman OK 73019 USA
- Xiaoliang Pan
  - Department of Chemistry and Biochemistry, University of Oklahoma Norman OK 73019 USA
- Ji Hwan Park
  - School of Computer Science, University of Oklahoma Norman OK 73019 USA
- Yuezhi Mao
  - Department of Chemistry and Biochemistry, San Diego State University San Diego CA 92182 USA
- Jingzhi Pu
  - Department of Chemistry and Chemical Biology, Indiana University-Purdue University Indianapolis Indianapolis IN 46202 USA
- Ye Mei
  - State Key Laboratory of Precision Spectroscopy, School of Physics and Electronic Science, East China Normal University Shanghai 200062 China
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai Shanghai 200062 China
  - Collaborative Innovation Center of Extreme Optics, Shanxi University Taiyuan Shanxi 030006 China
- Yihan Shao
  - Department of Chemistry and Biochemistry, University of Oklahoma Norman OK 73019 USA
8
Low K, Coote ML, Izgorodina EI. Explainable Solvation Free Energy Prediction Combining Graph Neural Networks with Chemical Intuition. J Chem Inf Model 2022; 62:5457-5470. [PMID: 36317829 DOI: 10.1021/acs.jcim.2c01013] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/29/2022]
Abstract
The prediction of a molecule's solvation Gibbs free energy (ΔGsolv) in a given solvent is an important task which has traditionally been carried out via quantum chemical continuum methods or force field-based molecular simulations. Machine learning (ML) and graph neural networks in particular have emerged as powerful techniques for elucidating structure-property relationships. This work presents a graph neural network (GNN) for the prediction of ΔGsolv which, in addition to encoding typical atom and bond-level features, incorporates chemically intuitive, solvation-relevant parameters into the featurization process: semiempirical partial atomic charges and solvent dielectric constant. Solute-solvent interactions are included via an interaction map layer which can be visualized to examine solubility-enhancing or -decreasing interactions learnt by the model. On a test set of small organic molecules, our GNN predicts ΔGsolv in water and cyclohexane with an accuracy comparable to polarizable and ab initio generated force field methods [mean absolute error (MAE) = 0.4 and 0.2 kcal mol⁻¹, respectively], without the need for any molecular simulation. For the FreeSolv data set of hydration free energies, the test MAE is 0.7 kcal mol⁻¹. Interpretability and applicability of the model are highlighted through several examples including rationalizing the increased solubility of modified diaminoanthraquinones in organic solvents. The clear explanations afforded by our GNN allow for easy understanding of the model's predictions, giving the experimental chemist confidence in employing ML models toward more optimized synthetic routes.
Affiliation(s)
- Kaycee Low
  - Monash Computational Chemistry Group, School of Chemistry, Monash University, Clayton, Victoria 3800, Australia
- Michelle L Coote
  - Institute for Nanoscale Science and Technology, College of Science and Engineering, Flinders University, Bedford Park, South Australia 5042, Australia
- Ekaterina I Izgorodina
  - Monash Computational Chemistry Group, School of Chemistry, Monash University, Clayton, Victoria 3800, Australia
9
Kanamaru Y, Matsui T. Factor analysis of error in oxidation potential calculation: A machine learning study. J Comput Chem 2022; 43:1504-1512. [PMID: 35762851 DOI: 10.1002/jcc.26953] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2022] [Revised: 05/23/2022] [Accepted: 06/06/2022] [Indexed: 11/11/2022]
Abstract
The conductor-like polarizable continuum model (C-PCM), which is a low-cost solvation model, cannot treat characteristic interactions between the solvent and substructure(s) of the solute. Moreover, the error in a charged system is significant. Using machine learning, we clarified that the systematic error of the oxidation potential calculated by the G3B3/C-PCM was correlated with the molecular size of a solute. The G3B3/C-PCM overestimated the Gibbs oxidation energy by an average of 6.94 kcal/mol. According to the performance of related methods reported in previous studies, this error is mainly due to the solvation energy of the charged solute. Additionally, we succeeded in reducing the error to between 2.27 kcal/mol (32%) and 3.2 kcal/mol (40%) by correction based on the substructure information of the solute. To modify the C-PCM, effects that correlate with the molecular size of the solute in the charged system should be incorporated.
Affiliation(s)
- Yuki Kanamaru
  - Department of Chemistry, Graduate School of Pure and Applied Science, University of Tsukuba, Tsukuba, Japan
- Toru Matsui
  - Department of Chemistry, Graduate School of Pure and Applied Science, University of Tsukuba, Tsukuba, Japan
10
Vermeire FH, Chung Y, Green WH. Predicting Solubility Limits of Organic Solutes for a Wide Range of Solvents and Temperatures. J Am Chem Soc 2022; 144:10785-10797. [PMID: 35687887 DOI: 10.1021/jacs.2c01768] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Indexed: 12/18/2022]
Abstract
The solubility of organic molecules is crucial in organic synthesis and industrial chemistry; it is important in the design of many phase separation and purification units, and it controls the migration of many species into the environment. To decide which solvents and temperatures can be used in the design of new processes, trial and error is often used, as the choice is restricted by unknown solid solubility limits. Here, we present a fast and convenient computational method for estimating the solubility of solid neutral organic molecules in water and many organic solvents for a broad range of temperatures. The model is developed by combining fundamental thermodynamic equations with machine learning models for solvation free energy, solvation enthalpy, Abraham solute parameters, and aqueous solid solubility at 298 K. We provide free open-source and online tools for the prediction of solid solubility limits and a curated data collection (SolProp) that includes more than 5000 experimental solid solubility values for validation of the model. The model predictions are accurate for aqueous systems and for a huge range of organic solvents up to 550 K or higher. Methods to further improve solid solubility predictions by providing experimental data on the solute of interest in another solvent, or on the solute's sublimation enthalpy, are also presented.
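As an illustration of how a thermodynamic relation can combine an aqueous solubility with solvation free energies, the sketch below transfers a solubility from water to another solvent through a thermodynamic cycle. This is an assumption for illustration, not necessarily the formulation used in the paper: it presumes the same solid phase in both solvents, dilute solutions, and molar concentration standard states. The function and parameter names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def log10_solubility_in_solvent(log10_s_aq, dg_solv_aq, dg_solv_solvent,
                                temperature=298.15):
    """Estimate log10 solubility in a solvent from the aqueous value.

    log10_s_aq       log10 of the aqueous solubility (mol/L)
    dg_solv_aq       solvation free energy in water, kcal/mol
    dg_solv_solvent  solvation free energy in the target solvent, kcal/mol

    Assumes the solid phase is identical in both cases, so only the
    difference in solvation free energy shifts the solubility.
    """
    shift = (dg_solv_aq - dg_solv_solvent) / (R_KCAL * temperature * math.log(10.0))
    return log10_s_aq + shift


# Illustrative values only: a solute solvated 2 kcal/mol more strongly
# in the organic solvent than in water.
estimate = log10_solubility_in_solvent(-3.0, dg_solv_aq=-5.0, dg_solv_solvent=-7.0)
```

At room temperature each 1.36 kcal/mol of additional solvation stabilization corresponds to roughly one log unit of solubility under these assumptions.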
Affiliation(s)
- Florence H Vermeire
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Yunsie Chung
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- William H Green
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
11
Singh S, Sunoj RB. A Transfer Learning Approach for Reaction Discovery in Small Data Situations Using Generative Model. iScience 2022; 25:104661. [PMID: 35832891 PMCID: PMC9272387 DOI: 10.1016/j.isci.2022.104661] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/03/2022] [Revised: 05/20/2022] [Accepted: 06/16/2022] [Indexed: 11/01/2022] [Open Access]
Abstract
Sustainable practices in chemical sciences can be better realized by adopting interdisciplinary approaches that combine the advantages of machine learning (ML) on the initially acquired small data in reaction discovery. Developing new reactions generally remains heuristic and even time and resource intensive. For instance, synthesis of fluorine-containing compounds, which constitute ∼20% of the marketed drugs, relies on deoxyfluorination of abundantly available alcohols. Herein, we demonstrate the use of a recurrent neural network-based deep generative model built on a library of just 37 alcohols for effective learning and exploration of the chemical space. The proof-of-concept ML model is able to generate good quality, synthetically accessible, higher-yielding novel alcohol molecules. This protocol would have superior utility for deployment into a practical reaction discovery pipeline.
Highlights
- Dual pronged transfer learning, both to generate and predict yields of new molecules
- Demonstrated the utility for an important family of deoxyfluorination of alcohols
- Applicable for practically more likely situations with relatively smaller data
- Extendable to other reaction manifolds to facilitate expedited reaction discovery
12
Lou C, Yang H, Wang J, Huang M, Li W, Liu G, Lee PW, Tang Y. IDL-PPBopt: A Strategy for Prediction and Optimization of Human Plasma Protein Binding of Compounds via an Interpretable Deep Learning Method. J Chem Inf Model 2022; 62:2788-2799. [PMID: 35607907 DOI: 10.1021/acs.jcim.2c00297] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 01/29/2023]
Abstract
The prediction and optimization of pharmacokinetic properties are essential in lead optimization. Traditional strategies mainly depend on the empirical chemical rules from medicinal chemists. However, with the rising amount of data, it is getting more difficult to manually extract useful medicinal chemistry knowledge. To this end, we introduced IDL-PPBopt, a computational strategy for predicting and optimizing the plasma protein binding (PPB) property based on an interpretable deep learning method. At first, a curated PPB data set was used to construct an interpretable deep learning model, which showed excellent predictive performance with a root mean squared error of 0.112 for the entire test set. Then, we designed a detection protocol based on the model and Wilcoxon test to identify the PPB-related substructures (named privileged substructures, PSubs) for each molecule. In total, 22 general privileged substructures (GPSubs) were identified, which shared some common features such as nitrogen-containing groups, diamines with two carbon units, and azetidine. Furthermore, a series of second-level chemical rules for each GPSub were derived through a statistical test and then summarized into substructure pairs. We demonstrated that these substructure pairs were equally applicable outside the training set and accordingly customized the structural modification schemes for each GPSub, which provided alternatives for the optimization of the PPB property. Therefore, IDL-PPBopt provides a promising scheme for the prediction and optimization of the PPB property and would be helpful for lead optimization of other pharmacokinetic properties.
Affiliation(s)
- Chaofeng Lou
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Hongbin Yang
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Jiye Wang
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Mengting Huang
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Weihua Li
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Guixia Liu
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Philip W Lee
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Yun Tang
  - Shanghai Frontiers Science Center of Optogenetic Techniques for Cell Metabolism, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
13
Modeling of the Crystallization Conditions for Organic Synthesis Product Purification Using Deep Learning. Electronics 2022. [DOI: 10.3390/electronics11091360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Crystallization is an important purification technique for solid products in a chemical laboratory. However, the correct selection of a solvent is important for the success of the procedure. In order to accelerate the solvent or solvent mixture search process, we offer an in silico alternative, i.e., a never previously demonstrated approach that can model the reaction mixture crystallization conditions which are invariant to the reaction type. The offered deep learning-based method is trained to directly predict the solvent labels used in the crystallization steps of the synthetic procedure. Our solvent label prediction task is a multi-label multi-class classification task during which the method must correctly choose one or several solvents from 13 possible examples. During the experimental investigation, we tested two multi-label classifiers (i.e., Feed-Forward and Long Short-Term Memory neural networks) applied on top of vectors. For the vectorization, we used two methods (i.e., extended-connectivity fingerprints and autoencoders) with various parameters. Our optimized technique was able to reach the accuracy of 0.870 ± 0.004 (which is 0.693 above the baseline) on the testing dataset. This allows us to assume that the proposed approach can help to accelerate manual R&D processes in chemical laboratories.
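The multi-label setup described in this abstract (choosing one or several solvents out of 13) can be sketched as a multi-hot target encoding plus an accuracy measure. The abstract does not list the 13 solvents or state which accuracy definition was used, so the vocabulary below and the exact-match accuracy are assumptions chosen for illustration.

```python
# Hypothetical 13-solvent vocabulary; the real label set is not given in the abstract.
SOLVENTS = [
    "water", "methanol", "ethanol", "isopropanol", "acetone",
    "acetonitrile", "ethyl acetate", "diethyl ether", "dichloromethane",
    "chloroform", "toluene", "hexane", "heptane",
]


def to_multi_hot(labels, vocabulary=SOLVENTS):
    """Encode a set of solvent names as a 0/1 vector over the vocabulary."""
    unknown = set(labels) - set(vocabulary)
    if unknown:
        raise ValueError(f"unknown solvent labels: {sorted(unknown)}")
    return [1 if name in labels else 0 for name in vocabulary]


def exact_match_accuracy(y_true, y_pred):
    """Fraction of samples whose full label vector is predicted correctly."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


# Two crystallization steps: one uses a solvent mixture, one a single solvent.
y_true = [to_multi_hot({"ethanol", "water"}), to_multi_hot({"hexane"})]
y_pred = [to_multi_hot({"ethanol", "water"}), to_multi_hot({"heptane"})]
score = exact_match_accuracy(y_true, y_pred)
```

Exact-match accuracy is strict, since a prediction counts only if every one of the 13 labels is right; a per-label accuracy would give a higher number on the same predictions.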
14
Zhang D, Xia S, Zhang Y. Accurate Prediction of Aqueous Free Solvation Energies Using 3D Atomic Feature-Based Graph Neural Network with Transfer Learning. J Chem Inf Model 2022; 62:1840-1848. [PMID: 35422122 PMCID: PMC9038704 DOI: 10.1021/acs.jcim.2c00260] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Indexed: 11/28/2022]
Abstract
Graph neural network (GNN)-based deep learning (DL) models have been widely implemented to predict the experimental aqueous solvation free energy, while its prediction accuracy has reached a plateau partly due to the scarcity of available experimental data. In order to tackle this challenge, we first build a large and diverse calculated data set Frag20-Aqsol-100K of aqueous solvation free energy with reasonable computational cost and accuracy via electronic structure calculations with continuum solvent models. Then, we develop a novel 3D atomic feature-based GNN model with the principal neighborhood aggregation (PNAConv) and demonstrate that 3D atomic features obtained from molecular mechanics-optimized geometries can significantly improve the learning power of GNN models in predicting calculated solvation free energies. Finally, we employ a transfer learning strategy by pre-training our DL model on Frag20-Aqsol-100K and fine-tuning it on the small experimental data set, and the fine-tuned model A3D-PNAConv-FT achieves the state-of-the-art prediction on the FreeSolv data set with a root-mean-squared error of 0.719 kcal/mol and a mean-absolute error of 0.417 kcal/mol using random data splits. These results indicate that integrating molecular modeling and DL would be a promising strategy to develop robust prediction models in molecular science. The source code and data are accessible at: https://yzhang.hpc.nyu.edu/IMA.
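The two error metrics quoted in this abstract can be computed as in the minimal sketch below, which uses made-up numbers rather than the paper's data. Because squaring weights large deviations more heavily, the root-mean-squared error is never smaller than the mean absolute error on the same predictions, consistent with the reported 0.719 versus 0.417 kcal/mol.

```python
import math


def mean_absolute_error(predicted, reference):
    """Average of |prediction - reference| over all samples."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def root_mean_squared_error(predicted, reference):
    """Square root of the average squared deviation."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_sq = sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(predicted)
    return math.sqrt(mean_sq)


# Illustrative solvation free energies in kcal/mol (not from the paper).
reference = [-4.0, -2.5, -6.0, -1.0]
predicted = [-3.0, -3.5, -4.0, -3.0]
mae = mean_absolute_error(predicted, reference)
rmse = root_mean_squared_error(predicted, reference)
```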
Affiliation(s)
- Dongdong Zhang
  - Department of Chemistry, New York University, New York, New York 10003, United States
- Song Xia
  - Department of Chemistry, New York University, New York, New York 10003, United States
- Yingkai Zhang
  - Department of Chemistry, New York University, New York, New York 10003, United States
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
15
Parakkal S, Datta R, Das D. DeepBBBP: High accuracy Blood-Brain-Barrier Permeability Prediction with a Mixed Deep Learning Model. Mol Inform 2022; 41:e2100315. [PMID: 35393777 DOI: 10.1002/minf.202100315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2021] [Accepted: 04/07/2022] [Indexed: 11/05/2022]
Abstract
Blood-brain-barrier permeability (BBBP) is an important property that is used to establish the drug-likeness of a molecule, as it establishes whether the molecule can cross the BBB when desired. It also eliminates those molecules which are not supposed to cross the barrier, as doing so would lead to toxicity. BBBP can be measured in vivo, in vitro or in silico. With the advent and subsequent rise of in silico methods for virtual drug screening, considerable work has been done to predict this feature using statistical machine learning (ML) and deep learning (DL) based methods. In this work, a mixed DL-based model, consisting of Multi-layer Perceptron (MLP) and Convolutional Neural Network layers, has been paired with Mol2vec. Mol2vec is a convenient, unsupervised machine learning technique which produces high-dimensional vector representations of molecules and their substructures. These succinct vector representations are used as inputs to the mixed DL model for BBBP prediction. Several well-known benchmarks incorporating BBBP data have been used for supervised training and prediction by our mixed DL model, which demonstrates superior results when compared to existing ML and DL techniques used for predicting BBBP.
16
Harada Y, Hatakeyama M, Maeda S, Gao Q, Koizumi K, Sakamoto Y, Ono Y, Nakamura S. Molecular Design Learned from the Natural Product Porphyra-334: Molecular Generation via Chemical Variational Autoencoder versus Database Mining via Similarity Search, A Comparative Study. ACS Omega 2022; 7:8581-8590. [PMID: 35309498 PMCID: PMC8928499 DOI: 10.1021/acsomega.1c06453] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/16/2021] [Accepted: 02/18/2022] [Indexed: 06/14/2023]
Abstract
We present a comparative study of two methods, molecular generation via a chemical variational autoencoder (VAE) and database mining via similarity search, focusing on their ability to generate new functional molecular designs. Using the natural product porphyra-334 as a model molecule, we generated three groups: mycosporine-like amino acids (MAAs) used as seeds (G_SEEDS), molecules generated via chemical VAE (G_VAE), and molecules gathered via similarity search (G_SIM). The numbers of molecules that satisfy the condition for the light absorption ability of porphyra-334 in G_SEEDS, G_VAE, and G_SIM are 52, 138, and 6, respectively. The method via chemical VAE shows promising potential for future molecular design. By using quantum chemistry wave function properties for chemical VAE, we find new molecules that are comparable to porphyra-334, including some with unexpected geometries. Finally, we show a group of molecules found with this method.
Affiliation(s)
- Yuki Harada
  - Cluster for Science, Technology, and Innovation Hub, Nakamura Laboratory, RIKEN, 2-1, Hirosawa, Wako, Saitama 351-0198, Japan
- Makoto Hatakeyama
  - Cluster for Science, Technology, and Innovation Hub, Nakamura Laboratory, RIKEN, 2-1, Hirosawa, Wako, Saitama 351-0198, Japan
  - Sanyo-Onoda City University, 1-1-1 Daigakudori, Sanyo-Onoda, Yamaguchi 756-0884, Japan
- Shuichi Maeda
  - Cluster for Science, Technology, and Innovation Hub, Nakamura Laboratory, RIKEN, 2-1, Hirosawa, Wako, Saitama 351-0198, Japan
- Qi Gao
  - Mitsubishi Chemical Corporation Science & Innovation Center, 1000 Kamoshida-cho, Yokohama, Kanagawa 227-8502, Japan
- Kenichi Koizumi
  - Cluster for Science, Technology, and Innovation Hub, Nakamura Laboratory, RIKEN, 2-1, Hirosawa, Wako, Saitama 351-0198, Japan
- Yuki Sakamoto
  - Cluster for Science, Technology, and Innovation Hub, Nakamura Laboratory, RIKEN, 2-1, Hirosawa, Wako, Saitama 351-0198, Japan
- Yuuki Ono
  - Mitsubishi Chemical Corporation Science & Innovation Center, 1000 Kamoshida-cho, Yokohama, Kanagawa 227-8502, Japan
- Shinichiro Nakamura
  - Cluster for Science, Technology, and Innovation Hub, Nakamura Laboratory, RIKEN, 2-1, Hirosawa, Wako, Saitama 351-0198, Japan
17
Bensberg M, Türtscher PL, Unsleber JP, Reiher M, Neugebauer J. Solvation Free Energies in Subsystem Density Functional Theory. J Chem Theory Comput 2022; 18:723-740. [PMID: 34985890 DOI: 10.1021/acs.jctc.1c00864] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/29/2022]
Abstract
For many chemical processes, the accurate description of solvent effects is vitally important. Here, we describe a hybrid ansatz for the explicit quantum mechanical description of solute-solvent and solvent-solvent interactions based on subsystem density functional theory and continuum solvation schemes. Since explicit solvent molecules may compromise the scalability of the model and the transferability of the predicted solvent effect, we aim to retain both, for different solutes as well as for different solvents. The key to transferability is the consistent subsystem decomposition of solute and solvent. The key to scalability is the performance of subsystem DFT for increasing numbers of subsystems. We investigate molecular dynamics and stationary point sampling of solvent configurations and compare the resulting (Gibbs) free energies to experiment and theoretical methods. We show that our hybrid model accurately reproduces reaction barriers and reaction energies compared to experimental data.
Affiliation(s)
- Moritz Bensberg
  - Theoretische Organische Chemie, Organisch-Chemisches Institut and Center for Multiscale Theory and Computation, Westfälische Wilhelms-Universität Münster, Corrensstraße 36, 48149 Münster, Germany
- Paul L Türtscher
  - ETH Zürich, Laboratorium für Physikalische Chemie, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Jan P Unsleber
  - ETH Zürich, Laboratorium für Physikalische Chemie, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Markus Reiher
  - ETH Zürich, Laboratorium für Physikalische Chemie, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland
- Johannes Neugebauer
  - Theoretische Organische Chemie, Organisch-Chemisches Institut and Center for Multiscale Theory and Computation, Westfälische Wilhelms-Universität Münster, Corrensstraße 36, 48149 Münster, Germany
18
Komp E, Janulaitis N, Valleau S. Progress towards machine learning reaction rate constants. Phys Chem Chem Phys 2021; 24:2692-2705. [PMID: 34935798 DOI: 10.1039/d1cp04422b] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Indexed: 01/12/2023]
Abstract
Quantum and classical reaction rate constant calculations come at the cost of exploring potential energy surfaces. Due to the "curse of dimensionality", their evaluation quickly becomes unfeasible as the system size grows. Machine learning algorithms can accelerate the calculation of reaction rate constants by predicting them using low cost input features. In this perspective, we briefly introduce supervised machine learning algorithms in the context of reaction rate constant prediction. We discuss existing and recently created kinetic datasets and input feature representations as well as the use and design of machine learning algorithms to predict reaction rate constants or quantities required for their computation. Amongst these, we first describe the use of machine learning to predict activation, reaction, solvation and dissociation energies. We then look at the use of machine learning to predict reactive force field parameters, reaction rate constants as well as to help accelerate the search for minimum energy paths. Lastly, we provide an outlook on areas which have yet to be explored so as to improve and evaluate the use of machine learning algorithms for chemical reaction rate constants.
Affiliation(s)
- Evan Komp
  - Department of Chemical Engineering, University of Washington, Seattle, Washington 98195, USA.
- Nida Janulaitis
  - Department of Chemical Engineering, University of Washington, Seattle, Washington 98195, USA.
- Stéphanie Valleau
  - Department of Chemical Engineering, University of Washington, Seattle, Washington 98195, USA.
19
Gao P, Yang X, Tang YH, Zheng M, Andersen A, Murugesan V, Hollas A, Wang W. Graphical Gaussian process regression model for aqueous solvation free energy prediction of organic molecules in redox flow batteries. Phys Chem Chem Phys 2021; 23:24892-24904. [PMID: 34724700 DOI: 10.1039/d1cp04475c] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/21/2022]
Abstract
The solvation free energy of organic molecules is a critical parameter in determining emergent properties such as solubility, liquid-phase equilibrium constants, pKa and redox potentials in an organic redox flow battery. In this work, we present a machine learning (ML) model that can learn and predict the aqueous solvation free energy of an organic molecule using the Gaussian process regression method based on a new molecular graph kernel. To investigate how the ML model handles the electrostatic interaction, the nonpolar interaction contribution of the solvent, and the conformational entropy of the solute in the solvation free energy, we test three data sets with implicit or explicit water solvent models and with the contribution of the conformational entropy of the solute. We demonstrate that our ML model can predict the solvation free energy of molecules at chemical accuracy, with a mean absolute error of less than 1 kcal mol⁻¹ for subsets of the QM9 dataset and the FreeSolv database. To address the general data scarcity problem for a graph-based ML model, we propose a dimension reduction algorithm based on the distance between molecular graphs, which can be used to examine the diversity of the molecular data set. It provides a promising way to build a minimum training set to improve prediction for certain test sets where the space of molecular structures is predetermined.
Affiliation(s)
- Peiyuan Gao
  - Pacific Northwest National Laboratory, Richland 99352, USA.
- Xiu Yang
  - Department of Industrial and Systems Engineering, Lehigh University, Bethlehem, PA 18015, USA.
- Yu-Hang Tang
  - Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
- Muqing Zheng
  - Department of Industrial and Systems Engineering, Lehigh University, Bethlehem, PA 18015, USA.
- Amity Andersen
  - Pacific Northwest National Laboratory, Richland 99352, USA.
- Aaron Hollas
  - Pacific Northwest National Laboratory, Richland 99352, USA.
- Wei Wang
  - Pacific Northwest National Laboratory, Richland 99352, USA.
20
Panwar A, Shirazian S, Singh M, Walker GM. Comprehensive modelling of pharmaceutical solvation energy in different solvents. J Mol Liq 2021. [DOI: 10.1016/j.molliq.2021.117390] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/25/2022]
21
Ye S, Liang J, Zhu X. Catalyst deep neural networks (Cat-DNNs) in singlet fission property prediction. Phys Chem Chem Phys 2021; 23:20835-20840. [PMID: 34505584 DOI: 10.1039/d1cp03594k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
Many current deep neural network (DNN) models only focus on straightforward optimization over the given database. However, most numerical fitting procedures depart from physical laws. By introducing the concept of "catalysis" from physical chemistry, we propose that the physical correlations among molecular properties could spontaneously act as a catalyst in the DNNs, which increases the accuracy, and more importantly, guides the DNNs in the right way. These Catalysis-DNNs (Cat-DNNs) could precisely predict both the ground and excited-state properties, especially the molecules' screening with singlet fission character. We show that traditional machine learning metrics are not suitable for evaluating model accuracy in physical-chemical tasks and issue new physical errors. We believe that the agile transfer of fundamental physics or chemistry domain knowledge, like the catalyst, could significantly benefit both the architecture and application of artificial intelligence technology in the future.
Affiliation(s)
- Shuqian Ye
  - School of Science and Engineering (SSE), Shenzhen Institute of Artificial Intelligence and Robotics for Society (AIRS), The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), 14-15F, Tower G2, Xinghe World, Rd Yabao, Longgang District, Shenzhen, Guangdong, 518172, China.
- Jiechun Liang
  - School of Science and Engineering (SSE), Shenzhen Institute of Artificial Intelligence and Robotics for Society (AIRS), The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), 14-15F, Tower G2, Xinghe World, Rd Yabao, Longgang District, Shenzhen, Guangdong, 518172, China.
- Xi Zhu
  - School of Science and Engineering (SSE), Shenzhen Institute of Artificial Intelligence and Robotics for Society (AIRS), The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), 14-15F, Tower G2, Xinghe World, Rd Yabao, Longgang District, Shenzhen, Guangdong, 518172, China.
22
Nandy A, Duan C, Taylor MG, Liu F, Steeves AH, Kulik HJ. Computational Discovery of Transition-metal Complexes: From High-throughput Screening to Machine Learning. Chem Rev 2021; 121:9927-10000. [PMID: 34260198 DOI: 10.1021/acs.chemrev.1c00347] [Citation(s) in RCA: 70] [Impact Index Per Article: 23.3] [Indexed: 12/13/2022]
Abstract
Transition-metal complexes are attractive targets for the design of catalysts and functional materials. The behavior of the metal-organic bond, while very tunable for achieving target properties, is challenging to predict and necessitates searching a wide and complex space to identify needles in haystacks for target applications. This review will focus on the techniques that make high-throughput search of transition-metal chemical space feasible for the discovery of complexes with desirable properties. The review will cover the development, promise, and limitations of "traditional" computational chemistry (i.e., force field, semiempirical, and density functional theory methods) as it pertains to data generation for inorganic molecular discovery. The review will also discuss the opportunities and limitations in leveraging experimental data sources. We will focus on how advances in statistical modeling, artificial intelligence, multiobjective optimization, and automation accelerate discovery of lead compounds and design rules. The overall objective of this review is to showcase how bringing together advances from diverse areas of computational chemistry and computer science have enabled the rapid uncovering of structure-property relationships in transition-metal chemistry. We aim to highlight how unique considerations in motifs of metal-organic bonding (e.g., variable spin and oxidation state, and bonding strength/nature) set them and their discovery apart from more commonly considered organic molecules. We will also highlight how uncertainty and relative data scarcity in transition-metal chemistry motivate specific developments in machine learning representations, model training, and in computational chemistry. Finally, we will conclude with an outlook of areas of opportunity for the accelerated discovery of transition-metal complexes.
Affiliation(s)
- Aditya Nandy
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Chenru Duan
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Michael G Taylor
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Fang Liu
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Adam H Steeves
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Heather J Kulik
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
23
Lim H, Jung Y. MLSolvA: solvation free energy prediction from pairwise atomistic interactions by machine learning. J Cheminform 2021; 13:56. [PMID: 34332634 PMCID: PMC8325294 DOI: 10.1186/s13321-021-00533-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 02/04/2021] [Accepted: 07/15/2021] [Indexed: 01/04/2023] [Open Access]
Abstract
Recent advances in machine learning technologies and their applications have led to the development of diverse structure-property relationship models for crucial chemical properties. The solvation free energy is one of them. Here, we introduce a novel ML-based solvation model, which calculates the solvation energy from pairwise atomistic interactions. The novelty of the proposed model consists of a simple architecture: two encoding functions extract atomic feature vectors from the given chemical structure, while the inner product between the two atomistic feature vectors calculates their interactions. The results of 6239 experimental measurements achieve outstanding performance and transferability for enlarging training data owing to its solvent-non-specific nature. An analysis of the interaction map shows that our model has significant potential for producing group contributions on the solvation energy, which indicates that the model provides not only predictions of target properties but also more detailed physicochemical insights.
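The pairwise-interaction architecture described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the random feature vectors stand in for the outputs of the two encoding functions, and the feature dimension, atom counts, and function names are all assumptions. Only the structure follows the abstract: inner products between solute and solvent atomic feature vectors form an interaction map, which is summed to a scalar energy.

```python
import random

def dot(u, v):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))

def interaction_map(solute_feats, solvent_feats):
    """Matrix of inner products: one row per solute atom, one column per solvent atom."""
    return [[dot(u, v) for v in solvent_feats] for u in solute_feats]

def solvation_energy(imap):
    """Scalar prediction obtained by summing every pairwise interaction."""
    return sum(sum(row) for row in imap)

# Stand-ins for encoder outputs (hypothetical: 5 solute atoms, 3 solvent atoms, 8 features).
random.seed(0)
solute_feats = [[random.gauss(0, 1) for _ in range(8)] for _ in range(5)]
solvent_feats = [[random.gauss(0, 1) for _ in range(8)] for _ in range(3)]

imap = interaction_map(solute_feats, solvent_feats)
energy = solvation_energy(imap)

# Row sums give a per-solute-atom decomposition of the predicted energy.
atom_contributions = [sum(row) for row in imap]
```

Because the prediction is a plain sum over the map, row sums decompose it atom by atom, which is the kind of group-contribution reading the abstract attributes to the interaction map.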
Affiliation(s)
- Hyuntae Lim
  - Department of Chemistry, Seoul National University, Seoul, 08826, South Korea
- YounJoon Jung
  - Department of Chemistry, Seoul National University, Seoul, 08826, South Korea.
24
Abstract
Machine learning (ML) techniques applied to chemical reactions have a long history. The present contribution discusses applications ranging from small molecule reaction dynamics to computational platforms for reaction planning. ML-based techniques can be particularly relevant for problems involving both computation and experiments. For one, Bayesian inference is a powerful approach to develop models consistent with knowledge from experiments. Second, ML-based methods can also be used to handle problems that are formally intractable using conventional approaches, such as exhaustive characterization of state-to-state information in reactive collisions. Finally, the explicit simulation of reactive networks as they occur in combustion has become possible using machine-learned neural network potentials. This review provides an overview of the questions that can and have been addressed using machine learning techniques, and an outlook discusses challenges in this diverse and stimulating field. It is concluded that ML applied to chemistry problems as practiced and conceived today has the potential to transform the way with which the field approaches problems involving chemical reactions, in both research and academic teaching.
Affiliation(s)
- Markus Meuwly
  - Department of Chemistry, University of Basel, Klingelbergstrasse 80, 4056 Basel, Switzerland
  - Department of Chemistry, Brown University, Providence, Rhode Island 02912, United States
25
Magdău IB, Miller TF. Machine Learning Solvation Environments in Conductive Polymers: Application to ProDOT-2Hex with Solvent Swelling. Macromolecules 2021. [DOI: 10.1021/acs.macromol.0c02132] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022]
Affiliation(s)
- Ioan-Bogdan Magdău
  - Division of Chemistry & Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
- Thomas F. Miller
  - Division of Chemistry & Chemical Engineering, California Institute of Technology, Pasadena, California 91125, United States
26
Ding J, Xu N, Nguyen MT, Qiao Q, Shi Y, He Y, Shao Q. Machine learning for molecular thermodynamics. Chin J Chem Eng 2021. [DOI: 10.1016/j.cjche.2020.10.044] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/12/2022]
27
Pathak Y, Mehta S, Priyakumar UD. Learning Atomic Interactions through Solvation Free Energy Prediction Using Graph Neural Networks. J Chem Inf Model 2021; 61:689-698. [PMID: 33546556 DOI: 10.1021/acs.jcim.0c01413] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 12/20/2022]
Abstract
Solvation free energy is a fundamental property that influences various chemical and biological processes, such as reaction rates, protein folding, drug binding, and bioavailability of drugs. In this work, we present a deep learning method based on graph networks to accurately predict solvation free energies of small organic molecules. The proposed model, comprising three phases, namely, message passing, interaction, and prediction, is able to predict solvation free energies in any generic organic solvent with a mean absolute error of 0.16 kcal/mol. In terms of accuracy, the current model outperforms all of the proposed machine learning-based models so far. The atomic interactions predicted in an unsupervised manner are able to explain the trends of free energies consistent with chemical wisdom. Further, the robustness of the machine learning-based model has been tested thoroughly, and its capability to interpret the predictions has been verified with several examples.
Affiliation(s)
- Yashaswi Pathak
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500032, India
- Sarvesh Mehta
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500032, India
- U Deva Priyakumar
  - Center for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad 500032, India
28
Abstract
The unprecedented ability of computations to probe atomic-level details of catalytic systems holds immense promise for the fundamentals-based bottom-up design of novel heterogeneous catalysts, which are at the heart of the chemical and energy sectors of industry. Here, we critically analyze recent advances in computational heterogeneous catalysis. First, we will survey the progress in electronic structure methods and atomistic catalyst models employed, which have enabled the catalysis community to build increasingly intricate, realistic, and accurate models of the active sites of supported transition-metal catalysts. We then review developments in microkinetic modeling, specifically mean-field microkinetic models and kinetic Monte Carlo simulations, which bridge the gap between nanoscale computational insights and macroscale experimental kinetics data with increasing fidelity. We finally review the advancements in theoretical methods for accelerating catalyst design and discovery. Throughout the review, we provide ample examples of applications, discuss remaining challenges, and provide our outlook for the near future.
Affiliation(s)
- Benjamin W J Chen
  - Department of Chemical and Biological Engineering, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Lang Xu
  - Department of Chemical and Biological Engineering, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
- Manos Mavrikakis
  - Department of Chemical and Biological Engineering, University of Wisconsin-Madison, Madison, Wisconsin 53706, United States
29
Wang XY, Chen BB, Zhang J, Zhou ZR, Lv J, Geng XP, Qian RC. Exploiting deep learning for predictable carbon dot design. Chem Commun (Camb) 2020; 57:532-535. [PMID: 33336670 DOI: 10.1039/d0cc07882d] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/17/2022]
Abstract
In this study, we developed a deep convolution neural network (DCNN) model for predicting the optical properties of carbon dots (CDs), including spectral properties and fluorescence color under ultraviolet irradiation. These results demonstrate the powerful potential of DCNN for guiding the synthesis of CDs.
Affiliation(s)
- Xiao-Yuan Wang
  - Key Laboratory for Advanced Materials, School of Chemistry & Molecular Engineering, East China University of Science and Technology, Shanghai, 200237, P. R. China.
30
Yang J, Knape MJ, Burkert O, Mazzini V, Jung A, Craig VSJ, Miranda-Quintana RA, Bluhmki E, Smiatek J. Artificial neural networks for the prediction of solvation energies based on experimental and computational data. Phys Chem Chem Phys 2020; 22:24359-24364. [PMID: 33084665 DOI: 10.1039/d0cp03701j] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 11/21/2022]
Abstract
The knowledge of thermodynamic properties for novel electrolyte formulations is of fundamental interest for industrial applications as well as academic research. Herewith, we present an artificial neural networks (ANN) approach for the prediction of solvation energies and entropies for distinct ion pairs in various protic and aprotic solvents. The considered feed-forward ANN is trained either by experimental data or computational results from conceptual density functional theory calculations. The proposed concept of mapping computed values to experimental data lowers the amount of time-consuming and costly experiments and helps to overcome certain limitations. Our findings reveal high correlation coefficients between predicted and experimental values which demonstrate the validity of our approach.
Affiliation(s)
- Jiyoung Yang
  - Boehringer Ingelheim Pharma GmbH & Co. KG, Analytical Development Biologicals, Birkendorfer Strasse 65, D-88397 Biberach (Riss), Germany
31
Scheen J, Wu W, Mey ASJS, Tosco P, Mackey M, Michel J. Hybrid Alchemical Free Energy/Machine-Learning Methodology for the Computation of Hydration Free Energies. J Chem Inf Model 2020; 60:5331-5339. [PMID: 32639733 DOI: 10.1021/acs.jcim.0c00600] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Indexed: 12/19/2022]
Abstract
A methodology that combines alchemical free energy calculations (FEP) with machine learning (ML) has been developed to compute accurate absolute hydration free energies. The hybrid FEP/ML methodology was trained on a subset of the FreeSolv database and retrospectively shown to outperform most submissions from the SAMPL4 competition. Compared to pure machine-learning approaches, FEP/ML yields more precise estimates of free energies of hydration and requires a fraction of the training set size to outperform standalone FEP calculations. The ML-derived correction terms are further shown to be transferable to a range of related FEP simulation protocols. The approach may be used to inexpensively improve the accuracy of FEP calculations and to flag molecules which will benefit the most from bespoke force field parametrization efforts.
Affiliation(s)
- Jenke Scheen
  - EaStCHEM School of Chemistry, University of Edinburgh, David Brewster Road, Edinburgh EH9 3FJ, United Kingdom
- Wilson Wu
  - EaStCHEM School of Chemistry, University of Edinburgh, David Brewster Road, Edinburgh EH9 3FJ, United Kingdom
- Antonia S J S Mey
  - EaStCHEM School of Chemistry, University of Edinburgh, David Brewster Road, Edinburgh EH9 3FJ, United Kingdom
- Paolo Tosco
  - Cresset Group, New Cambridge House, Bassingbourn Road, Litlington, Cambridgeshire SG8 0SS, United Kingdom
- Mark Mackey
  - Cresset Group, New Cambridge House, Bassingbourn Road, Litlington, Cambridgeshire SG8 0SS, United Kingdom
- Julien Michel
  - EaStCHEM School of Chemistry, University of Edinburgh, David Brewster Road, Edinburgh EH9 3FJ, United Kingdom
32
Rauer C, Bereau T. Hydration free energies from kernel-based machine learning: Compound-database bias. J Chem Phys 2020; 153:014101. [DOI: 10.1063/5.0012230] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 11/14/2022] [Open Access]
Affiliation(s)
- Clemens Rauer
  - Max Planck Institute for Polymer Research, 55128 Mainz, Germany
- Tristan Bereau
  - Max Planck Institute for Polymer Research, 55128 Mainz, Germany
  - Van ’t Hoff Institute for Molecular Sciences and Informatics Institute, University of Amsterdam, Amsterdam 1098 XH, The Netherlands
33
Sels H, De Smet H, Geuens J. SUSSOL-Using Artificial Intelligence for Greener Solvent Selection and Substitution. Molecules 2020; 25:molecules25133037. [PMID: 32635177 PMCID: PMC7411708 DOI: 10.3390/molecules25133037] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 06/09/2020] [Revised: 06/29/2020] [Accepted: 07/01/2020] [Indexed: 12/23/2022] [Open Access]
Abstract
Solvents come in many shapes and types. Looking for solvents for a specific application can be hard, and looking for green alternatives for currently used nonbenign solvents can be even harder. We describe a new methodology for solvent selection and substitution, by applying Artificial Intelligence (AI) software to cluster a database of solvents based on their physical properties. The solvents are processed by a neural network, the Self-organizing Map of Kohonen, which results in a 2D map of clusters. The resulting clusters are validated both chemically and statistically and are presented in user-friendly visualizations by the SUSSOL (Sustainable Solvents Selection and Substitution Software) software. The software helps the user in exploring the solvent space and in generating and evaluating a list of possible alternatives for a specific solvent. The alternatives are ranked based on their safety, health, and environment scores. Cases are discussed to demonstrate the possibilities of our approach and to show that it can help in the search for more sustainable and greener solvents. The SUSSOL software makes intuitive sense and in most case studies, the software confirms the findings in literature, thus providing a sound platform for selecting the most sustainable solvent candidate.
Affiliation(s)
- Hannes Sels
- Correspondence: (H.S.); (J.G.); Tel.: +32-3-502-22-16 (J.G.)
| | | | - Jeroen Geuens
- Correspondence: (H.S.); (J.G.); Tel.: +32-3-502-22-16 (J.G.)
| |
|
34
|
Subramanian V, Ratkova E, Palmer D, Engkvist O, Fedorov M, Llinas A. Multisolvent Models for Solvation Free Energy Predictions Using 3D-RISM Hydration Thermodynamic Descriptors. J Chem Inf Model 2020; 60:2977-2988. [PMID: 32311268 DOI: 10.1021/acs.jcim.0c00065] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/25/2022]
Abstract
The potential to predict solvation free energies (SFEs) in any solvent using a machine learning (ML) model based on thermodynamic output extracted exclusively from 3D-RISM simulations in water is investigated. The multisolvent models take into account both the solute and solvent description and offer the possibility to predict SFEs of any solute in any solvent with root mean squared errors less than 1 kcal/mol. Validations that involve exclusion of fractions or clusters of the solutes or solvents exemplify the model's capability to predict SFEs of novel solutes and solvents with diverse chemical profiles. In addition to being predictive, our models can identify the solute and solvent features that influence SFE predictions. Furthermore, using 3D-RISM hydration thermodynamic output to predict SFEs in any organic solvent reduces the need to run 3D-RISM simulations in all these solvents. Altogether, our multisolvent models for SFE predictions that take advantage of the solvation effects are expected to have an impact in the property prediction space.
Affiliation(s)
- Vigneshwari Subramanian
- Drug Metabolism and Pharmacokinetics, Research and Early Development-Respiratory, Inflammation and Autoimmune, Biopharmaceuticals R&D, AstraZeneca, Pepparedsleden 1, SE-431 83, Mölndal, Sweden; Department of Pure and Applied Chemistry, University of Strathclyde, Thomas Graham Building, 295 Cathedral Street, Glasgow, Scotland G1 1XL, U.K.
| | - Ekaterina Ratkova
- Medicinal Chemistry, Research and Early Development - Cardiovascular, Renal and Metabolism, Biopharmaceuticals R&D, AstraZeneca, Pepparedsleden 1, SE-431 83, Mölndal, Sweden
| | - David Palmer
- Department of Pure and Applied Chemistry, University of Strathclyde, Thomas Graham Building, 295 Cathedral Street, Glasgow, Scotland G1 1XL, U.K
| | - Ola Engkvist
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Pepparedsleden 1, SE-431 83, Mölndal, Sweden
| | - Maxim Fedorov
- Center for Computational and Data-Intensive Science and Engineering, Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Moscow, 143026, Russia; Department of Physics, Scottish Universities Physics Alliance (SUPA), University of Strathclyde, John Anderson Building, 107 Rottenrow, Glasgow, Scotland G4 0NG, U.K.
| | - Antonio Llinas
- Drug Metabolism and Pharmacokinetics, Research and Early Development-Respiratory, Inflammation and Autoimmune, Biopharmaceuticals R&D, AstraZeneca, Pepparedsleden 1, SE-431 83, Mölndal, Sweden
| |
|
35
|
Basdogan Y, Groenenboom MC, Henderson E, De S, Rempe SB, Keith JA. Machine Learning-Guided Approach for Studying Solvation Environments. J Chem Theory Comput 2019; 16:633-642. [PMID: 31809056 DOI: 10.1021/acs.jctc.9b00605] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Indexed: 12/19/2022]
Abstract
Molecular-level understanding and characterization of solvation environments are often needed across chemistry, biology, and engineering. Toward practical modeling of local solvation effects of any solute in any solvent, we report a static and all-quantum mechanics-based cluster-continuum approach for calculating single-ion solvation free energies. This approach uses a global optimization procedure to identify low-energy molecular clusters with different numbers of explicit solvent molecules and then employs the smooth overlap of atomic positions (SOAP) learning kernel to quantify the similarity between different low-energy solute environments. From these data, we use sketch maps, a nonlinear dimensionality reduction algorithm, to obtain a two-dimensional visual representation of the similarity between solute environments in differently sized microsolvated clusters. After testing this approach on different ions having charges 2+, 1+, 1-, and 2-, we find that the solvation environment around each ion usually becomes more similar hand in hand with its calculated single-ion solvation free energy. Without needing either dynamics simulations or a priori knowledge of the local solvation structure of the ions, this approach can be used to calculate solvation free energies within 5% of experimental measurements for most cases, and it should be transferable to the study of other systems where dynamics simulations are not easily carried out.
Affiliation(s)
- Yasemin Basdogan
- Department of Chemical and Petroleum Engineering, Swanson School of Engineering, University of Pittsburgh, Pittsburgh 15261, Pennsylvania, United States
| | - Mitchell C Groenenboom
- Department of Chemical and Petroleum Engineering, Swanson School of Engineering, University of Pittsburgh, Pittsburgh 15261, Pennsylvania, United States
| | - Ethan Henderson
- Department of Chemical and Petroleum Engineering, Swanson School of Engineering, University of Pittsburgh, Pittsburgh 15261, Pennsylvania, United States
| | - Sandip De
- Laboratory of Computational Science and Modelling, Institute of Materials, École Polytechnique Fédérale de Lausanne, Lausanne 1015, Switzerland
| | - Susan B Rempe
- Department of Nanobiology, Sandia National Laboratories, Albuquerque 87185, New Mexico, United States
| | - John A Keith
- Department of Chemical and Petroleum Engineering, Swanson School of Engineering, University of Pittsburgh, Pittsburgh 15261, Pennsylvania, United States
| |
|