1
Liang T, Liu W, Tan K, Wu A, Lu X. Advancing Ionic Liquid Research with pSCNN: A Novel Approach for Accurate Normal Melting Temperature Predictions. ACS Omega 2024; 9:31694-31702. [PMID: 39072063 PMCID: PMC11270577 DOI: 10.1021/acsomega.4c02393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Revised: 04/12/2024] [Accepted: 06/25/2024] [Indexed: 07/30/2024]
Abstract
Ionic liquids (ILs), known for their distinct and tunable properties, offer a broad spectrum of potential applications across various fields, including chemistry, materials science, and energy storage. However, practical applications of ILs are often limited by their unfavorable physicochemical properties. Experimental screening becomes impractical due to the vast number of potential IL combinations. Therefore, the development of a robust and efficient model for predicting IL properties is imperative. Because the melting point is a defining feature of ILs, it is of practical significance to establish an accurate yet efficient model to predict the normal melting point of ILs (Tm), which may facilitate the discovery and design of novel ILs for specific applications. In this study, we present a pseudo-Siamese convolutional neural network (pSCNN), inspired by the SCNN and focused on Tm. Utilizing a data set of 3098 ILs, we systematically assess various deep learning models (ANN, pSCNN, and Transformer-CNF), along with molecular descriptors (ECFP fingerprint and Mordred properties), for their performance in predicting the Tm of ILs. Among the investigated modeling schemes, the pSCNN coupled with filtered Mordred descriptors demonstrates superior performance, yielding mean absolute error (MAE) and root-mean-square error (RMSE) values of 24.36 and 31.56 °C, respectively. Feature analysis further highlights the effectiveness of the pSCNN model. Moreover, the pSCNN method, which takes a pair of inputs, can be extended beyond ionic liquid melting point prediction.
Affiliation(s)
- Tao Liang
  - State Key Laboratory of Physical Chemistry of Solid Surface, Fujian Provincial Key Laboratory for Theoretical and Computational Chemistry, Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Wei Liu
  - State Key Laboratory of Physical Chemistry of Solid Surface, Fujian Provincial Key Laboratory for Theoretical and Computational Chemistry, Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Kai Tan
  - State Key Laboratory of Physical Chemistry of Solid Surface, Fujian Provincial Key Laboratory for Theoretical and Computational Chemistry, Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Anan Wu
  - State Key Laboratory of Physical Chemistry of Solid Surface, Fujian Provincial Key Laboratory for Theoretical and Computational Chemistry, Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
- Xin Lu
  - State Key Laboratory of Physical Chemistry of Solid Surface, Fujian Provincial Key Laboratory for Theoretical and Computational Chemistry, Department of Chemistry, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen 361005, P. R. China
2
Feng H, Qin L, Zhang B, Zhou J. Prediction and Interpretability of Melting Points of Ionic Liquids Using Graph Neural Networks. ACS Omega 2024; 9:16016-16025. [PMID: 38617653 PMCID: PMC11007696 DOI: 10.1021/acsomega.3c09543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 03/13/2024] [Accepted: 03/15/2024] [Indexed: 04/16/2024]
Abstract
Ionic liquids (ILs) have wide and promising applications in fields such as chemical engineering, energy, and the environment. However, the melting points (MPs) of ILs are one of the most crucial properties affecting their applications. The MPs of ILs are affected by various factors, and tuning these in a laboratory is time-consuming and costly. Therefore, an accurate and efficient method is required to predict the desired MPs in the design of novel targeted ILs. In this study, three descriptor-based machine learning (DBML) models and eight graph neural network (GNN) models were proposed to predict the MPs of ILs. Fingerprints and molecular graphs were used to represent molecules for the DBML and GNNs, respectively. The GNN models demonstrated performance superior to that of the DBML models. Among all of the examined models, the graph convolutional model exhibited the best performance with high accuracy (root-mean-squared error = 37.06, mean absolute error = 28.79, and correlation coefficient = 0.76). Benefiting from molecular graph representation, we built a GNN-based interpretable model to reveal the atomistic contribution to the MPs of ILs using a data-driven procedure. According to our interpretable model, amino groups, S+, N+, and P+ would increase the MPs of ILs, while the negatively charged halogen atoms, S-, and N- would decrease the MPs of ILs. The results of this study provide new insight into the rapid screening and synthesis of targeted ILs with appropriate MPs.
Affiliation(s)
- Haijun Feng
  - School of Computer Sciences, Shenzhen Institute of Information Technology, Shenzhen, Guangdong 518172, China
- Lanlan Qin
  - School of Chemistry and Chemical Engineering, South China University of Technology, Guangzhou, Guangdong 510640, China
- Bingxuan Zhang
  - School of Computer Sciences, Shenzhen Institute of Information Technology, Shenzhen, Guangdong 518172, China
- Jian Zhou
  - School of Chemistry and Chemical Engineering, South China University of Technology, Guangzhou, Guangdong 510640, China
3
Toropov AA, Toropova AP, Roncaglioni A, Benfenati E, Leszczynska D, Leszczynski J. The System of Self-Consistent Models: The Case of Henry's Law Constants. Molecules 2023; 28:7231. [PMID: 37894710 PMCID: PMC10609047 DOI: 10.3390/molecules28207231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 10/07/2023] [Accepted: 10/21/2023] [Indexed: 10/29/2023] Open
Abstract
Data on Henry's law constants make it possible to systematize geochemical conditions affecting atmosphere status and consequently triggering climate changes. The constants of Henry's law are desired for assessing the processes related to atmospheric contaminations caused by pollutants. The most important are those that are capable of long-term movements over long distances. This ability is closely related to the values of Henry's law constants. Chemical changes in gaseous mixtures affect the fate of atmospheric pollutants and ecology, climate, and human health. Since the number of organic compounds present in the atmosphere is extremely large, it is desirable to develop models suitable for predictions for the large pool of organic molecules that may be present in the atmosphere. Here, we report the development of such a model for Henry's law constants predictions of 29,439 compounds using the CORAL software (2023). The statistical quality of the model is characterized by the value of the coefficient of determination for the training and validation sets of about 0.81 (on average).
Affiliation(s)
- Andrey A. Toropov
  - Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Science, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via Mario Negri 2, 20156 Milano, Italy
- Alla P. Toropova
  - Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Science, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via Mario Negri 2, 20156 Milano, Italy
- Alessandra Roncaglioni
  - Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Science, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via Mario Negri 2, 20156 Milano, Italy
- Emilio Benfenati
  - Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Science, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via Mario Negri 2, 20156 Milano, Italy
- Danuta Leszczynska
  - Interdisciplinary Nanotoxicity Center, Department of Civil and Environmental Engineering, Jackson State University, 1325 Lynch Street, Jackson, MS 39217-0510, USA
- Jerzy Leszczynski
  - Interdisciplinary Nanotoxicity Center, Department of Chemistry, Physics and Atmospheric Sciences, Jackson State University, 1400 J. R. Lynch Street, Jackson, MS 39217-0510, USA
4
|
|
5
Characterising a Protic Ionic Liquid Library with Applied Machine Learning Algorithms. J Mol Liq 2022. [DOI: 10.1016/j.molliq.2022.120453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
6
Baskin I, Epshtein A, Ein-Eli Y. Benchmarking machine learning methods for modeling physical properties of ionic liquids. J Mol Liq 2022. [DOI: 10.1016/j.molliq.2022.118616] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/21/2023]
7
Machine-Learning Model Prediction of Ionic Liquids Melting Points. Applied Sciences-Basel 2022. [DOI: 10.3390/app12052408] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/31/2022]
Abstract
Ionic liquids (ILs) have great potential for application in energy storage and conversion devices. They have been identified as promising electrolyte candidates in various battery systems. However, the practical application of many ionic liquids remains limited by unfavorable melting points (Tm), which constrain the operating temperatures of batteries, and by unfavorable transport properties. To fine-tune the Tm of ILs, a systematic study and accurate prediction of the Tm of ILs is highly desirable. However, the Tm of an IL can change considerably depending on the molecular structures of the anion and cation and their combination. Thus, fine control of the Tm of ILs can be challenging. In this study, we employed a deep-learning model to predict the Tm of various ILs that consist of different cation and anion classes. Based on this model, the melting point of ILs can be predicted with reasonably high accuracy, achieving an R² score of 0.90 with an RMSE of ~32 K. The Tm of ILs is mostly dictated by a few important molecular descriptors, which can be used as a set of design rules to fine-tune the Tm of ILs.
8
|
|
9
Makarov D, Fadeeva Y, Shmukler L, Tetko I. Beware of proper validation of models for ionic Liquids! J Mol Liq 2021. [DOI: 10.1016/j.molliq.2021.117722] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 01/03/2023]
10
Carrera GVSM, Inês J, Bernardes CES, Klimenko K, Shimizu K, Canongia Lopes JN. The Solubility of Gases in Ionic Liquids: A Chemoinformatic Predictive and Interpretable Approach. Chemphyschem 2021; 22:2190-2200. [PMID: 34464013 DOI: 10.1002/cphc.202100632] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/30/2021] [Indexed: 11/07/2022]
Abstract
This work studies the solubilities of gases in ionic liquids (ILs) using a chemoinformatic approach. It is based on the codification of the atomic inter-component interactions, cation/gas and anion/gas, which are used to obtain a pattern of activation in a Kohonen neural network (MOLMAP descriptors). A robust predictive model was obtained with the Random Forest algorithm, using the maximum proximity as a confidence measure of a given chemical system compared to the training set. The encoding method has been validated with molecular dynamics. This encoding approach is a valuable estimator of attractive/repulsive interactions of a generic IL+gas chemical system. The method has been used as a fast, visual way to identify the reasons behind the differences observed between the solubility of CO2 and O2 in 1-butyl-3-methylimidazolium hexafluorophosphate (BMIM PF6) at identical temperature and pressure (TP) conditions. The effect of varying the cation and anion has also been evaluated.
Affiliation(s)
- Gonçalo V S M Carrera
  - Chemistry Department LAQV-REQUIMTE, NOVA School of Science and Technology, 2829-516, Caparica, Portugal
- João Inês
  - Chemistry Department LAQV-REQUIMTE, NOVA School of Science and Technology, 2829-516, Caparica, Portugal
- Carlos E S Bernardes
  - Centro de Química Estrutural, Faculdade de Ciências, Universidade de Lisboa, 1749-016, Lisboa, Portugal
- Kyrylo Klimenko
  - Chemistry Department LAQV-REQUIMTE, NOVA School of Science and Technology, 2829-516, Caparica, Portugal
- Karina Shimizu
  - Centro de Química Estrutural, Department of Chemical and Biological Engineering, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais, 1049-001, Lisboa, Portugal
- José N Canongia Lopes
  - Centro de Química Estrutural, Department of Chemical and Biological Engineering, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais, 1049-001, Lisboa, Portugal
11
De Jesus K, Rodriguez R, Baek D, Fox R, Pashikanti S, Sharma K. Extraction of lanthanides and actinides present in spent nuclear fuel and in electronic waste. J Mol Liq 2021. [DOI: 10.1016/j.molliq.2021.116006] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/28/2022]
12
Kerner J, Dogan A, von Recum H. Machine learning and big data provide crucial insight for future biomaterials discovery and research. Acta Biomater 2021; 130:54-65. [PMID: 34087445 DOI: 10.1016/j.actbio.2021.05.053] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 11/30/2020] [Revised: 05/24/2021] [Accepted: 05/25/2021] [Indexed: 02/06/2023]
Abstract
Machine learning has been widely adopted in a variety of fields including engineering, science, and medicine, revolutionizing how data is collected, used, and stored. Its implementation has led to a drastic increase in the number of computational models for the prediction of various numerical, categorical, or association events given input variables. We aim to examine recent advances in the use of machine learning when applied to the biomaterial field. Specifically, quantitative structure-property relationships offer the unique ability to correlate microscale molecular descriptors to larger macroscale material properties. These new models can be broken down further into four categories: regression, classification, association, and clustering. We examine recent approaches and new uses of machine learning in the three major categories of biomaterials (metals, polymers, and ceramics) for rapid property prediction and trend identification. While current research is promising, limitations in the form of a lack of standardized reporting and available databases complicate the implementation of the described models. Herein, we hope to provide a snapshot of the current state of the field and a beginner's guide to navigating the intersection of biomaterials research and machine learning. STATEMENT OF SIGNIFICANCE: Machine learning and its methods have found a variety of uses beyond the field of computer science but have largely been neglected by those in the realm of biomaterials. Through the use of more computational methods, biomaterials development can be expedited while reducing the need for standard trial-and-error methods. Within, we introduce four basic models that readers can potentially apply to their current research, as well as current applications within the field. Furthermore, we hope that this article may act as a "call to action" for readers to realize and address the current lack of implementation within the biomaterials field.
Affiliation(s)
- Jacob Kerner
  - Case Western Reserve University; 10900 Euclid Ave., Cleveland Ohio 44106.
- Alan Dogan
  - Case Western Reserve University; 10900 Euclid Ave., Cleveland Ohio 44106.
- Horst von Recum
  - Case Western Reserve University; 10900 Euclid Ave., Cleveland Ohio 44106.
13
He H, Pan Y, Meng J, Li Y, Zhong J, Duan W, Jiang J. Predicting Thermal Decomposition Temperature of Binary Imidazolium Ionic Liquid Mixtures from Molecular Structures. ACS Omega 2021; 6:13116-13123. [PMID: 34056461 PMCID: PMC8158806 DOI: 10.1021/acsomega.1c00846] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/16/2021] [Accepted: 04/27/2021] [Indexed: 06/12/2023]
Abstract
Ionic liquids (ILs) have been regarded as "designer solvents" because of their satisfactory physicochemical properties. The 5% onset decomposition temperature (Td,5%onset) is one of the most conservative but reliable indicators for characterizing the possible fire hazard of engineered ILs. This study is devoted to developing a quantitative structure-property relationship model for predicting the Td,5%onset of binary imidazolium IL mixtures. Both in silico design and data analysis descriptors and the norm index were employed to encode the structural characteristics of binary IL mixtures. The subset of optimal descriptors was screened by combining the genetic algorithm with the multiple linear regression method. The resulting optimal prediction model was a four-variable multiple linear equation, with an average absolute error (AAE) of 12.673 K for the external test set. The results of rigorous model validations also demonstrated satisfactory model robustness and predictivity. The present study would provide a new reliable approach for predicting the thermal stability of binary IL mixtures.
Affiliation(s)
- Hongpeng He
  - Jiangsu Key Laboratory of Hazardous Chemicals Safety and Control, College of Safety Science and Engineering, Nanjing Tech University, Nanjing 211816, China
- Yong Pan
  - Jiangsu Key Laboratory of Hazardous Chemicals Safety and Control, College of Safety Science and Engineering, Nanjing Tech University, Nanjing 211816, China
- Jianwen Meng
  - Jiangsu Key Laboratory of Hazardous Chemicals Safety and Control, College of Safety Science and Engineering, Nanjing Tech University, Nanjing 211816, China
- Yongheng Li
  - Jiangsu Key Laboratory of Hazardous Chemicals Safety and Control, College of Safety Science and Engineering, Nanjing Tech University, Nanjing 211816, China
- Junhong Zhong
  - Jiangsu Key Laboratory of Hazardous Chemicals Safety and Control, College of Safety Science and Engineering, Nanjing Tech University, Nanjing 211816, China
- Weijia Duan
  - Jiangsu Key Laboratory of Hazardous Chemicals Safety and Control, College of Safety Science and Engineering, Nanjing Tech University, Nanjing 211816, China
- Juncheng Jiang
  - Jiangsu Key Laboratory of Hazardous Chemicals Safety and Control, College of Safety Science and Engineering, Nanjing Tech University, Nanjing 211816, China
  - School of Environment & Safety Engineering, Changzhou University, Changzhou 213164, China
14
Mital DK, Nancarrow P, Zeinab S, Jabbar NA, Ibrahim TH, Khamis MI, Taha A. Group Contribution Estimation of Ionic Liquid Melting Points: Critical Evaluation and Refinement of Existing Models. Molecules 2021; 26:2454. [PMID: 33922374 PMCID: PMC8122861 DOI: 10.3390/molecules26092454] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/02/2021] [Revised: 04/19/2021] [Accepted: 04/20/2021] [Indexed: 11/17/2022] Open
Abstract
While several group contribution method (GCM) models have been developed in recent years for the prediction of ionic liquid (IL) properties, some challenges exist in their effective application. Firstly, the models have been developed and tested based on different datasets; therefore, direct comparison based on reported statistical measures is not reliable. Secondly, many of the existing models are limited in the range of ILs for which they can be used due to the lack of functional group parameters. In this paper, we examine two of the most diverse GCMs for the estimation of IL melting point, a key property in the selection and design of ILs for materials and energy applications. A comprehensive database consisting of over 1300 data points for 933 unique ILs has been compiled and used to critically evaluate the two GCMs. One of the GCMs has been refined by introducing new functional groups and reparametrized to give improved performance for melting point estimation over a wider range of ILs. This work will aid in the targeted design of ILs for materials and energy applications.
Affiliation(s)
- Dhruve Kumar Mital
  - Department of Chemical Engineering, American University of Sharjah, Sharjah 26666, United Arab Emirates
- Paul Nancarrow
  - Department of Chemical Engineering, American University of Sharjah, Sharjah 26666, United Arab Emirates
- Samira Zeinab
  - Department of Chemical Engineering, American University of Sharjah, Sharjah 26666, United Arab Emirates
- Nabil Abdel Jabbar
  - Department of Chemical Engineering, American University of Sharjah, Sharjah 26666, United Arab Emirates
- Taleb Hassan Ibrahim
  - Department of Chemical Engineering, American University of Sharjah, Sharjah 26666, United Arab Emirates
- Mustafa I. Khamis
  - Department of Biology, Chemistry and Environmental Sciences, American University of Sharjah, Sharjah 26666, United Arab Emirates
- Alnoman Taha
  - Department of Chemical Engineering, American University of Sharjah, Sharjah 26666, United Arab Emirates
  - Department of Chemical Engineering, University of Birmingham, SW Campus, Birmingham B15 2TT, UK
15
Sifain AE, Rice BM, Yalkowsky SH, Barnes BC. Machine learning transition temperatures from 2D structure. J Mol Graph Model 2021; 105:107848. [PMID: 33667863 DOI: 10.1016/j.jmgm.2021.107848] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/17/2020] [Revised: 01/11/2021] [Accepted: 01/19/2021] [Indexed: 10/22/2022]
Abstract
A priori knowledge of physicochemical properties such as melting and boiling could expedite materials discovery. However, theoretical modeling from first principles poses a challenge for efficient virtual screening of potential candidates. As an alternative, the tools of data science are becoming increasingly important for exploring chemical datasets and predicting material properties. Herein, we extend a molecular representation, or set of descriptors, first developed for quantitative structure-property relationship modeling by Yalkowsky and coworkers known as the Unified Physicochemical Property Estimation Relationships (UPPER). This molecular representation has group-constitutive and geometrical descriptors that map to enthalpy and entropy; two thermodynamic quantities that drive thermal phase transitions. We extend the UPPER representation to include additional information about sp2-bonded fragments. Additionally, instead of using the UPPER descriptors in a series of thermodynamically-inspired calculations, as per Yalkowsky, we use the descriptors to construct a vector representation for use with machine learning techniques. The concise and easy-to-compute representation, combined with a gradient-boosting decision tree model, provides an appealing framework for predicting experimental transition temperatures in a diverse chemical space. An application to energetic materials shows that the method is predictive, despite a relatively modest energetics reference dataset. We also report competitive results on diverse public datasets of melting points (i.e., OCHEM, Enamine, Bradley, and Bergström) comprised of over 47k structures. Open source software is available at https://github.com/USArmyResearchLab/ARL-UPPER.
Affiliation(s)
- Andrew E Sifain
  - CCDC Army Research Laboratory, Aberdeen Proving Ground, MD, 21005, USA
- Betsy M Rice
  - CCDC Army Research Laboratory, Aberdeen Proving Ground, MD, 21005, USA
- Samuel H Yalkowsky
  - Department of Pharmaceutics, College of Pharmacy, University of Arizona, Tucson, AZ, 85721, USA
- Brian C Barnes
  - CCDC Army Research Laboratory, Aberdeen Proving Ground, MD, 21005, USA
16
Ding Y, Chen M, Guo C, Zhang P, Wang J. Molecular fingerprint-based machine learning assisted QSAR model development for prediction of ionic liquid properties. J Mol Liq 2021. [DOI: 10.1016/j.molliq.2020.115212] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 11/26/2022]
17
Quantitative structure-property relationship for melting and freezing points of deep eutectic solvents. J Mol Liq 2021. [DOI: 10.1016/j.molliq.2020.114744] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 11/23/2022]
18
Venkatraman V, Evjen S, Knuutila HK, Fiksdahl A, Alsberg BK. Predicting ionic liquid melting points using machine learning. J Mol Liq 2020. [DOI: 10.1016/j.molliq.2020.114686] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/19/2022]
19
Sivaraman G, Jackson NE, Sanchez-Lengeling B, Vázquez-Mayagoitia Á, Aspuru-Guzik A, Vishwanath V, de Pablo JJ. A machine learning workflow for molecular analysis: application to melting points. Machine Learning: Science and Technology 2020. [DOI: 10.1088/2632-2153/ab8aa3] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/17/2022] Open
Abstract
Computational tools encompassing integrated molecular prediction, analysis, and generation are key for molecular design in a variety of critical applications. In this work, we develop a workflow for molecular analysis (MOLAN) that integrates an ensemble of supervised and unsupervised machine learning techniques to analyze molecular data sets. The MOLAN workflow combines molecular featurization, clustering algorithms, uncertainty analysis, low-bias dataset construction, high-performance regression models, graph-based molecular embeddings and attribution, and a semi-supervised variational autoencoder based on the novel SELFIES representation to enable molecular design. We demonstrate the utility of the MOLAN workflow in the context of a challenging multi-molecule property prediction problem: the determination of melting points solely from single molecule structure. This application serves as a case study for how to employ the MOLAN workflow in the context of molecular property prediction.
20
Development of quantitative structure-property relationship (QSPR) models for predicting the thermal hazard of ionic liquids: A review of methods and models. J Mol Liq 2020. [DOI: 10.1016/j.molliq.2020.112471] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 01/31/2023]
21
A review on created QSPR models for predicting ionic liquids properties and their reliability from chemometric point of view. J Mol Liq 2020. [DOI: 10.1016/j.molliq.2019.112013] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Indexed: 11/22/2022]
22
alvaDesc: A Tool to Calculate and Analyze Molecular Descriptors and Fingerprints. Methods in Pharmacology and Toxicology 2020. [DOI: 10.1007/978-1-0716-0150-1_32] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Indexed: 12/16/2022]
23
Predicting Melting Points of Biofriendly Choline-Based Ionic Liquids with Molecular Dynamics. Applied Sciences-Basel 2019. [DOI: 10.3390/app9245367] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 01/17/2023]
Abstract
In this work, we introduce a simulation-based method for predicting the melting point of ionic liquids without prior knowledge of their crystal structure. We run molecular dynamics simulations of biofriendly, choline cation-based ionic liquids and apply the method to predict their melting point. The root-mean-square error of the predicted values is below 24 K. We advocate that such precision is sufficient for designing ionic liquids with relatively low melting points. The workflow for simulations is available for everyone and can be adopted for any species from the wide chemical space of ionic liquids.
24
Carrera GVSM, Nunes da Ponte M, Rebelo LPN. Chemoinformatic Approaches To Predict the Viscosities of Ionic Liquids and Ionic Liquid-Containing Systems. Chemphyschem 2019; 20:2767-2773. [PMID: 31424158 DOI: 10.1002/cphc.201900593] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 06/12/2019] [Revised: 07/26/2019] [Indexed: 12/11/2022]
Abstract
Modelling, predicting, and understanding the factors influencing the viscosities of ionic liquids and related mixtures are sequentially checked in this work. The molecular maps of atom-level properties (MOLMAP codification system) is adapted for a straightforward inclusion of ionic liquids and mixtures containing ionic liquids. Random Forest models have been tested in this context and an optimal model was selected. The interpretability of the selected Random Forest model is highlighted with selected structural features that might contribute to identify low viscosities. The constructed model is able to recognize the influence of different structural variables, temperature, and pressure for a correct classification of the different systems. The codification and interpretation systems are highlighted in this work.
Affiliation(s)
- Gonçalo V S M Carrera
  - LAQV, Requimte, Departamento de Química Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa Faculdade de Ciências e Tecnologia, 2829-516, Caparica, Portugal
- Manuel Nunes da Ponte
  - LAQV, Requimte, Departamento de Química Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa Faculdade de Ciências e Tecnologia, 2829-516, Caparica, Portugal
- Luís P N Rebelo
  - LAQV, Requimte, Departamento de Química Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa Faculdade de Ciências e Tecnologia, 2829-516, Caparica, Portugal
25
|
Cerecedo-Cordoba JA, González Barbosa JJ, Frausto Solís J, Gallardo-Rivas NV. Melting Temperature Estimation of Imidazole Ionic Liquids with Clustering Methods. J Chem Inf Model 2019; 59:3144-3153. [PMID: 31199647 DOI: 10.1021/acs.jcim.9b00203] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
Ionic liquids (ILs) are ionic compounds with low melting points that can be designed to be used in an extensive set of commercial and industrial applications. However, the design of ILs is limited by the quantity and quality of the available data in the literature; therefore, the estimation of physicochemical properties of ILs by computational methods is a promising way of solving this problem, since it provides approximations of the real values, resulting in savings in both time and money. We studied two data sets of 281 and 134 liquids based on the molecule imidazole that were analyzed with QSPR techniques. This paper presents a software architecture that uses clustering techniques to improve the robustness of estimation models of the melting point of ILs. These results indicate an error of 6.25% in the previously unmodeled data set and an error of 4.43% in the second data set. We have an improvement with the second data set of 1.81% over the last results previously found.
Affiliation(s)
- Jorge Alberto Cerecedo-Cordoba
  - Tecnológico Nacional de México/Instituto Tecnológico de Ciudad Madero, Avenida Primero de Mayo, 89440, Ciudad Madero, Tamaulipas, México
- Juan Javier González Barbosa
  - Tecnológico Nacional de México/Instituto Tecnológico de Ciudad Madero, Avenida Primero de Mayo, 89440, Ciudad Madero, Tamaulipas, México
- Juan Frausto Solís
  - Tecnológico Nacional de México/Instituto Tecnológico de Ciudad Madero, Avenida Primero de Mayo, 89440, Ciudad Madero, Tamaulipas, México
- Nohra Violeta Gallardo-Rivas
  - Tecnológico Nacional de México/Instituto Tecnológico de Ciudad Madero, Avenida Primero de Mayo, 89440, Ciudad Madero, Tamaulipas, México
|
26
|
Yeadon DJ, Jacquemin J, Plechkova NV, Gomes MC, Seddon KR. Using Thermodynamics to Assess the Molecular Interactions of Tetrabutylphosphonium Carboxylate–Water Mixtures. Aust J Chem 2019. [DOI: 10.1071/ch18481] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
Abstract
Densities, ρ, viscosities, η, and enthalpies of mixing, , of binary [P4 4 4 4][CnCOO]–water mixtures (with n=1, 2 or 7) were determined at atmospheric pressure as a function of temperature. The excess, , apparent, , and partial, , molar volumes were deduced from experimental data, as well as fragilities, m*, and excess Gibbs free energies of activation of viscous flow, . exhibited predominantly negative deviation from ideality, with a minimum at approximately ~0.8 for all three systems, indicating strong hydrogen-bonding interactions. All three binary systems were found to be fragile, with [P4 4 4 4][C7COO] showing the smallest deviations in fragility with the addition of water. values of the systems were exothermic over the entire composition range, having the following trend: [P4 4 4 4][C2COO]>[P4 4 4 4][C7COO]>[P4 4 4 4][C1COO].
|
27
|
Venkatraman V, Evjen S, Knuutila HK, Fiksdahl A, Alsberg BK. Predicting ionic liquid melting points using machine learning. J Mol Liq 2018. [DOI: 10.1016/j.molliq.2018.03.090] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022]
|
28
|
Baghban A, Sasanipour J, Sarafbidabad M, Piri A, Razavi R. On the prediction of critical micelle concentration for sugar-based non-ionic surfactants. Chem Phys Lipids 2018; 214:46-57. [DOI: 10.1016/j.chemphyslip.2018.05.008] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/08/2018] [Revised: 05/08/2018] [Accepted: 05/26/2018] [Indexed: 12/30/2022]
|
29
|
Venkatraman V, Raj JJ, Evjen S, Lethesh KC, Fiksdahl A. In silico prediction and experimental verification of ionic liquid refractive indices. J Mol Liq 2018. [DOI: 10.1016/j.molliq.2018.05.067] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]
|
30
|
Chen L, Bryantsev VS. A density functional theory based approach for predicting melting points of ionic liquids. Phys Chem Chem Phys 2017; 19:4114-4124. [PMID: 28111666 DOI: 10.1039/c6cp08403f] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]
Abstract
Accurate prediction of melting points of ILs is important both from the fundamental point of view and from the practical perspective for screening ILs with low melting points and broadening their utilization in a wider temperature range. In this work, we present an ab initio approach to calculate melting points of ILs with known crystal structures and illustrate its application for a series of 11 ILs containing imidazolium/pyrrolidinium cations and halide/polyatomic fluoro-containing anions. The melting point is determined as the temperature at which the Gibbs free energy of fusion is zero. The Gibbs free energy of fusion can be expressed through the use of the Born-Fajans-Haber cycle via the lattice free energy of forming a solid IL from gaseous phase ions and the sum of the solvation free energies of the ions comprising the IL. Dispersion-corrected density functional theory (DFT) involving (semi)local (PBE-D3) and hybrid exchange-correlation (HSE06-D3) functionals is applied to estimate the lattice enthalpy, entropy, and free energy. The ion solvation free energies are calculated with the SMD-generic-IL solvation model at the M06-2X/6-31+G(d) level of theory under standard conditions. The melting points of ILs computed with the HSE06-D3 functional are in good agreement with the experimental data, with a mean absolute error of 30.5 K and a mean relative error of 8.5%. The model is capable of accurately reproducing the trends in melting points upon variation of alkyl substituents in organic cations and replacement of one anion by another. The results verify that the lattice energies of ILs containing polyatomic fluoro-containing anions can be approximated reasonably well using the volume-based thermodynamic approach. However, there is no correlation of the computed lattice energies with molecular volume for ILs containing halide anions. Moreover, entropies of solid ILs follow two different linear relationships with molecular volume for halides and polyatomic fluoro-containing anions. Continuous progress in predicting crystal structures of organic salts with halide anions will be a key factor for successful prediction of melting points with no prior knowledge of the crystal structure.
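The criterion stated in this abstract, that the melting point is the temperature at which the Gibbs free energy of fusion vanishes, can be illustrated with a small root-finding sketch. This is not the authors' workflow: the function `melting_point`, the linear model `linear_delta_g`, and the enthalpy and entropy values are hypothetical placeholders chosen only to show the idea. With a temperature-independent enthalpy and entropy of fusion, the root reduces to T_m = ΔH_fus / ΔS_fus.

```python
from typing import Callable


def melting_point(
    delta_g_fus: Callable[[float], float],
    t_low: float,
    t_high: float,
    tol: float = 1e-6,
) -> float:
    """Find T in [t_low, t_high] where delta_g_fus(T) = 0 by bisection."""
    g_low = delta_g_fus(t_low)
    g_high = delta_g_fus(t_high)
    if g_low * g_high > 0:
        raise ValueError("delta_g_fus does not change sign on the bracket")
    while t_high - t_low > tol:
        t_mid = 0.5 * (t_low + t_high)
        g_mid = delta_g_fus(t_mid)
        if g_low * g_mid <= 0:
            # The sign change (or an exact zero) lies in the lower half.
            t_high = t_mid
        else:
            # The sign change lies in the upper half.
            t_low, g_low = t_mid, g_mid
    return 0.5 * (t_low + t_high)


# Hypothetical, temperature-independent fusion quantities (illustrative only).
DELTA_H_FUS = 20_000.0  # J/mol
DELTA_S_FUS = 50.0      # J/(mol K)


def linear_delta_g(temperature: float) -> float:
    """Delta G_fus(T) = Delta H_fus - T * Delta S_fus for the toy model."""
    return DELTA_H_FUS - temperature * DELTA_S_FUS


t_m = melting_point(linear_delta_g, 200.0, 600.0)
print(f"Estimated melting point: {t_m:.2f} K")
```

Bisection is used here only because it needs nothing beyond a sign change on the bracket; any free-energy model that returns ΔG_fus as a function of temperature could be passed in place of the toy linear one.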
Affiliation(s)
- Lihua Chen
  - Department of Materials Science and Engineering, University of Connecticut, Storrs, CT 06269, USA and Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
|
31
|
3D molecular fragment descriptors for structure–property modeling: predicting the free energies for the complexation between antipodal guests and β-cyclodextrins. J INCL PHENOM MACRO 2017. [DOI: 10.1007/s10847-017-0739-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
|
32
|
Affiliation(s)
- Kun Dong
  - State Key Laboratory of Multiphase Complex Systems, Beijing Key Laboratory of Ionic Liquids Clean Process, Institute of Process Engineering, Chinese Academy of Sciences, Beijing 100190, China
- Xiaomin Liu
  - State Key Laboratory of Multiphase Complex Systems, Beijing Key Laboratory of Ionic Liquids Clean Process, Institute of Process Engineering, Chinese Academy of Sciences, Beijing 100190, China
- Haifeng Dong
  - State Key Laboratory of Multiphase Complex Systems, Beijing Key Laboratory of Ionic Liquids Clean Process, Institute of Process Engineering, Chinese Academy of Sciences, Beijing 100190, China
- Xiangping Zhang
  - State Key Laboratory of Multiphase Complex Systems, Beijing Key Laboratory of Ionic Liquids Clean Process, Institute of Process Engineering, Chinese Academy of Sciences, Beijing 100190, China
- Suojiang Zhang
  - State Key Laboratory of Multiphase Complex Systems, Beijing Key Laboratory of Ionic Liquids Clean Process, Institute of Process Engineering, Chinese Academy of Sciences, Beijing 100190, China
|
33
|
Martin S, Pratt HD, Anderson TM. Screening for High Conductivity/Low Viscosity Ionic Liquids Using Product Descriptors. Mol Inform 2017; 36. [DOI: 10.1002/minf.201600125] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2016] [Accepted: 02/14/2017] [Indexed: 11/07/2022]
Affiliation(s)
- Shawn Martin
  - Sandia National Laboratories, Albuquerque, New Mexico 87185, USA
- Harry D. Pratt
  - Sandia National Laboratories, Albuquerque, New Mexico 87185, USA
|
34
|
Mehraein I, Riahi S. The QSPR models to predict the solubility of CO2 in ionic liquids based on least-squares support vector machines and genetic algorithm-multi linear regression. J Mol Liq 2017. [DOI: 10.1016/j.molliq.2016.10.133] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
|
35
|
Gaudin T, Rotureau P, Pezron I, Fayet G. New QSPR Models to Predict the Critical Micelle Concentration of Sugar-Based Surfactants. Ind Eng Chem Res 2016. [DOI: 10.1021/acs.iecr.6b02890] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Affiliation(s)
- Théophile Gaudin
  - Sorbonne Universités, Université de Technologie de Compiègne, EA 4297 TIMR, rue du Dr Schweitzer, 60200 Compiègne, France
  - INERIS, Parc Technologique Alata, BP2, 60550 Verneuil-en-Halatte, France
- Patricia Rotureau
  - INERIS, Parc Technologique Alata, BP2, 60550 Verneuil-en-Halatte, France
- Isabelle Pezron
  - Sorbonne Universités, Université de Technologie de Compiègne, EA 4297 TIMR, rue du Dr Schweitzer, 60200 Compiègne, France
- Guillaume Fayet
  - INERIS, Parc Technologique Alata, BP2, 60550 Verneuil-en-Halatte, France
|
36
|
Yosipof A, Shimanovich K, Senderowitz H. Materials Informatics: Statistical Modeling in Material Science. Mol Inform 2016; 35:568-579. [DOI: 10.1002/minf.201600047] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2016] [Accepted: 07/11/2016] [Indexed: 01/01/2023]
Affiliation(s)
- Abraham Yosipof
  - Department of Business Administration, Peres Academic Center, Rehovot 76102, Israel
  - College of Law & Business, Ramat-Gan, 26 Ben Gurion Street, Israel
- Klimentiy Shimanovich
  - Department of Chemistry, Bar Ilan University, Ramat-Gan 5290002, Israel
  - Department of Physical Electronics, School of Electrical Engineering, Faculty of Engineering, Tel Aviv University, Ramat Aviv 69978, Israel
|
37
|
Tetko IV, Lowe DM, Williams AJ. The development of models to predict melting and pyrolysis point data associated with several hundred thousand compounds mined from PATENTS. J Cheminform 2016; 8:2. [PMID: 26807157 PMCID: PMC4724158 DOI: 10.1186/s13321-016-0113-y] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2015] [Accepted: 01/08/2016] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND: Melting point (MP) is an important property with regard to the solubility of chemical compounds. Its prediction from chemical structure remains a highly challenging task for quantitative structure-activity relationship studies. Success in this area of research critically depends on the availability of high quality MP data as well as accurate chemical structure representations in order to develop models. Currently, available datasets for MP predictions have been limited to around 50k molecules, while much more data are routinely generated following the synthesis of novel materials. Significant amounts of MP data are freely available within the patent literature and, if they were available in the appropriate form, could potentially be used to develop predictive models.
RESULTS: We have developed a pipeline for the automated extraction and annotation of chemical data from published PATENTS. Almost 300,000 data points have been collected and used to develop models to predict melting and pyrolysis (decomposition) points using tools available on the OCHEM modeling platform (http://ochem.eu). A number of technical challenges were simultaneously solved to develop models based on these data. These included the handling of sparse data matrices with >200,000,000,000 entries and parallel calculations using 32 × 6 cores per task using 13 descriptor sets totaling more than 700,000 descriptors. We showed that models developed using data collected from PATENTS had similar or better prediction accuracy compared to the highly curated data used in previous publications. The separation of data for chemicals that decomposed rather than melting, from compounds that did undergo a normal melting transition, was performed and models for both pyrolysis and MPs were developed. The accuracy of the consensus MP models for molecules from the drug-like region of chemical space was similar to their estimated experimental accuracy, 32 °C. Last but not least, important structural features related to the pyrolysis of chemicals were identified, and a model to predict whether a compound will decompose instead of melting was developed.
CONCLUSIONS: We have shown that automated tools for the analysis of chemical information have reached a mature stage allowing for the extraction and collection of high quality data to enable the development of structure-activity relationship models. The developed models and data are publicly available at http://ochem.eu/article/99826.
Affiliation(s)
- Igor V. Tetko
  - Institute of Structural Biology, Helmholtz Zentrum München für Gesundheit und Umwelt (HMGU), Ingolstädter Landstraße 1, b. 60w, 85764 Neuherberg, Germany
  - BigChem GmbH, 85764 Neuherberg, Germany
- Daniel M. Lowe
  - NextMove Software Limited, Innovation Centre (Unit 23), Cambridge Science Park, Cambridge, CB4 0EY UK
|
38
|
Chen B, Zhang T, Bond T, Gan Y. Development of quantitative structure activity relationship (QSAR) model for disinfection byproduct (DBP) research: A review of methods and resources. JOURNAL OF HAZARDOUS MATERIALS 2015; 299:260-79. [PMID: 26142156 DOI: 10.1016/j.jhazmat.2015.06.054] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/31/2015] [Revised: 06/17/2015] [Accepted: 06/21/2015] [Indexed: 05/19/2023]
Abstract
Quantitative structure-activity relationship (QSAR) models are tools for linking chemical activities with molecular structures and compositions. Due to the concern about the proliferating number of disinfection byproducts (DBPs) in water and the associated financial and technical burden, researchers have recently begun to develop QSAR models to investigate the toxicity, formation, property, and removal of DBPs. However, there are no standard procedures or best practices regarding how to develop QSAR models, which potentially limit their wide acceptance. In order to facilitate more frequent use of QSAR models in future DBP research, this article reviews the processes required for QSAR model development, summarizes recent trends in QSAR-DBP studies, and shares some important resources for QSAR development (e.g., free databases and QSAR programs). The paper follows the four steps of QSAR model development, i.e., data collection, descriptor filtration, algorithm selection, and model validation; and finishes by highlighting several research needs. Because QSAR models may have an important role in progressing our understanding of DBP issues, it is hoped that this paper will encourage their future use for this application.
Affiliation(s)
- Baiyang Chen
  - Harbin Institute of Technology Shenzhen Graduate School, Shenzhen Key Laboratory of Water Resource Utilization and Environmental Pollution Control, Shenzhen 518055, China
- Tian Zhang
  - Harbin Institute of Technology Shenzhen Graduate School, Shenzhen Key Laboratory of Water Resource Utilization and Environmental Pollution Control, Shenzhen 518055, China
- Tom Bond
  - Department of Civil and Environmental Engineering, Imperial College, London SW7 2AZ, United Kingdom
- Yiqun Gan
  - Harbin Institute of Technology Shenzhen Graduate School, Shenzhen Key Laboratory of Water Resource Utilization and Environmental Pollution Control, Shenzhen 518055, China
|
39
|
Nekoeinia M, Yousefinejad S, Abdollahi-Dezaki A. Prediction of ETN Polarity Scale of Ionic Liquids Using a QSPR Approach. Ind Eng Chem Res 2015. [DOI: 10.1021/acs.iecr.5b02982] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022]
Affiliation(s)
- Mohsen Nekoeinia
  - Department of Chemistry, Payame Noor University, P.O. BOX 19395-3697, Tehran, Iran
|
40
|
Nieto-Draghi C, Fayet G, Creton B, Rozanska X, Rotureau P, de Hemptinne JC, Ungerer P, Rousseau B, Adamo C. A General Guidebook for the Theoretical Prediction of Physicochemical Properties of Chemicals for Regulatory Purposes. Chem Rev 2015; 115:13093-164. [PMID: 26624238 DOI: 10.1021/acs.chemrev.5b00215] [Citation(s) in RCA: 72] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Affiliation(s)
- Carlos Nieto-Draghi
  - IFP Energies nouvelles, 1 et 4 avenue de Bois-Préau, 92852 Rueil-Malmaison, France
- Guillaume Fayet
  - INERIS, Parc Technologique Alata, BP2, 60550 Verneuil-en-Halatte, France
- Benoit Creton
  - IFP Energies nouvelles, 1 et 4 avenue de Bois-Préau, 92852 Rueil-Malmaison, France
- Xavier Rozanska
  - Materials Design S.A.R.L., 18, rue de Saisset, 92120 Montrouge, France
- Patricia Rotureau
  - INERIS, Parc Technologique Alata, BP2, 60550 Verneuil-en-Halatte, France
- Philippe Ungerer
  - Materials Design S.A.R.L., 18, rue de Saisset, 92120 Montrouge, France
- Bernard Rousseau
  - Laboratoire de Chimie-Physique, Université Paris Sud, UMR 8000 CNRS, Bât. 349, 91405 Orsay Cedex, France
- Carlo Adamo
  - Institut de Recherche Chimie Paris, PSL Research University, CNRS, Chimie Paristech, 11 rue P. et M. Curie, F-75005 Paris, France
  - Institut Universitaire de France, 103 Boulevard Saint Michel, F-75005 Paris, France
|
41
|
Lazzús JA, Pulgar-Villarroel G. Estimation of thermal conductivity of ionic liquids using quantitative structure–property relationship calculations. J Mol Liq 2015. [DOI: 10.1016/j.molliq.2015.08.037] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]
|
42
|
Lindenberg EK, Patey GN. Melting point trends and solid phase behaviors of model salts with ion size asymmetry and distributed cation charge. J Chem Phys 2015; 143:024508. [DOI: 10.1063/1.4923344] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Affiliation(s)
- E. K. Lindenberg
  - Department of Chemistry, University of British Columbia, Vancouver, British Columbia V6T 1Z1, Canada
- G. N. Patey
  - Department of Chemistry, University of British Columbia, Vancouver, British Columbia V6T 1Z1, Canada
|
43
|
Alexander DLJ, Tropsha A, Winkler DA. Beware of R²: Simple, Unambiguous Assessment of the Prediction Accuracy of QSAR and QSPR Models. J Chem Inf Model 2015; 55:1316-22. [PMID: 26099013 DOI: 10.1021/acs.jcim.5b00206] [Citation(s) in RCA: 333] [Impact Index Per Article: 37.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
Abstract
The statistical metrics used to characterize the external predictivity of a model, i.e., how well it predicts the properties of an independent test set, have proliferated over the past decade. This paper clarifies some apparent confusion over the use of the coefficient of determination, R², as a measure of model fit and predictive power in QSAR and QSPR modeling. R² (or r²) has been used in various contexts in the literature in conjunction with training and test data for both ordinary linear regression and regression through the origin as well as with linear and nonlinear regression models. We analyze the widely adopted model fit criteria suggested by Golbraikh and Tropsha (J. Mol. Graphics Modell. 2002, 20, 269-276) in a strict statistical manner. Shortcomings in these criteria are identified, and a clearer and simpler alternative method to characterize model predictivity is provided. The intent is not to repeat the well-documented arguments for model validation using test data but rather to guide the application of R² as a model fit statistic. Examples are used to illustrate both correct and incorrect uses of R². Reporting the root-mean-square error or equivalent measures of dispersion, which are typically of more practical importance than R², is also encouraged, and important challenges in addressing the needs of different categories of users such as computational chemists, experimental scientists, and regulatory decision support specialists are outlined.
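The abstract's point that RMSE can be more informative than R² can be shown with a short sketch. It assumes one common definition, R² = 1 − SS_res/SS_tot with SS_tot taken about the mean of the observed test values; the helper names and the toy data below are illustrative and are not taken from the paper. Both toy test sets carry errors of the same size, yet R² differs sharply because it also depends on the spread of the observed values.

```python
import math
from typing import Sequence


def rmse(y_obs: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between observed and predicted values."""
    if len(y_obs) != len(y_pred) or len(y_obs) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(y_obs, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_obs: Sequence[float], y_pred: Sequence[float]) -> float:
    """R^2 = 1 - SS_res / SS_tot, with SS_tot about the observed mean."""
    if len(y_obs) != len(y_pred) or len(y_obs) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(y_obs) / len(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot


# Toy test sets (hypothetical numbers): same error magnitudes, different spread.
wide_obs = [1.0, 2.0, 3.0, 4.0]
wide_pred = [1.1, 1.9, 3.2, 3.8]
narrow_obs = [2.0, 2.2, 2.4, 2.6]
narrow_pred = [2.1, 2.1, 2.6, 2.4]

for label, obs, pred in [
    ("wide", wide_obs, wide_pred),
    ("narrow", narrow_obs, narrow_pred),
]:
    print(f"{label}: RMSE = {rmse(obs, pred):.3f}, R^2 = {r_squared(obs, pred):.3f}")
```

Reporting both numbers side by side, as in the loop above, makes it clear when a low R² reflects a narrow property range rather than large prediction errors.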
Affiliation(s)
- D L J Alexander
  - CSIRO Digital Productivity Flagship, Private Bag 10, Clayton South, VIC 3169, Australia
- A Tropsha
  - UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
- David A Winkler
  - CSIRO Manufacturing Flagship, Clayton, VIC 3168, Australia
  - Monash Institute of Pharmaceutical Sciences, Parkville, VIC 3052, Australia
  - Latrobe Institute for Molecular Science, Bundoora, VIC 3046, Australia
  - School of Chemical and Physical Sciences, Flinders University, Bedford Park, SA 5042, Australia
|
44
|
Wicker JGP, Cooper RI. Will it crystallise? Predicting crystallinity of molecular materials. CrystEngComm 2015. [DOI: 10.1039/c4ce01912a] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]
Abstract
Machine learning algorithms can be used to create models which separate molecular materials which will form good-quality crystals from those that will not, and predict how synthetic modifications will change the crystallinity.
|
45
|
Tetko IV, Sushko Y, Novotarskyi S, Patiny L, Kondratov I, Petrenko AE, Charochkina L, Asiri AM. How accurately can we predict the melting points of drug-like compounds? J Chem Inf Model 2014; 54:3320-9. [PMID: 25489863 PMCID: PMC4702524 DOI: 10.1021/ci5005288] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/31/2023]
Abstract
This article contributes a highly accurate model for predicting the melting points (MPs) of medicinal chemistry compounds. The model was developed using the largest published data set, comprising more than 47k compounds. The distributions of MPs in drug-like and drug lead sets showed that >90% of molecules melt within [50, 250] °C. The final model calculated an RMSE of less than 33 °C for molecules from this temperature interval, which is the most important for medicinal chemistry users. This performance was achieved using a consensus model that performed calculations to a significantly higher accuracy than the individual models. We found that compounds with reactive and unstable groups were overrepresented among outlying compounds. These compounds could decompose during storage or measurement, thus introducing experimental errors. While filtering the data by removing outliers generally increased the accuracy of individual models, it did not significantly affect the results of the consensus models. The three distance-to-model measures analyzed did not allow us to flag molecules whose MP values fell outside the applicability domain of the model. We believe that this negative result and the public availability of data from this article will encourage future studies to develop better approaches to define the applicability domain of models. The final model, MP data, and identified reactive groups are available online at http://ochem.eu/article/55638.
Affiliation(s)
- Igor V Tetko
  - Helmholtz-Zentrum München - German Research Centre for Environmental Health (GmbH), Institute of Structural Biology, Munich 85764, Germany
|
46
|
Yan F, Lartey M, Jariwala K, Bowser S, Damodaran K, Albenze E, Luebke DR, Nulwala HB, Smit B, Haranczyk M. Toward a Materials Genome Approach for Ionic Liquids: Synthesis Guided by Ab Initio Property Maps. J Phys Chem B 2014; 118:13609-20. [DOI: 10.1021/jp506972w] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/01/2023]
Affiliation(s)
- Fangyong Yan
  - Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Michael Lartey
  - National Energy Technology Laboratory, P.O. Box 10940, Pittsburgh, Pennsylvania 15236, United States
- Kuldeep Jariwala
  - Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Sage Bowser
  - Department of Chemistry, University of Pittsburgh, 219 Parkman Avenue, Pittsburgh, Pennsylvania 15260, United States
- Krishnan Damodaran
  - Department of Chemistry, University of Pittsburgh, 219 Parkman Avenue, Pittsburgh, Pennsylvania 15260, United States
- Erik Albenze
  - National Energy Technology Laboratory, P.O. Box 10940, Pittsburgh, Pennsylvania 15236, United States
  - URS Corporation, P.O. Box 618, South Park, Pennsylvania 15129, United States
- David R. Luebke
  - National Energy Technology Laboratory, P.O. Box 10940, Pittsburgh, Pennsylvania 15236, United States
- Hunaid B. Nulwala
  - National Energy Technology Laboratory, P.O. Box 10940, Pittsburgh, Pennsylvania 15236, United States
  - Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Berend Smit
  - Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
  - Department of Chemical and Biomolecular Engineering, University of California, Berkeley, California 94720, United States
  - Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
- Maciej Haranczyk
  - Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
|
47
|
Ruggiu F, Solov'ev V, Marcou G, Horvath D, Graton J, Le Questel JY, Varnek A. Individual Hydrogen-Bond Strength QSPR Modelling with ISIDA Local Descriptors: a Step Towards Polyfunctional Molecules. Mol Inform 2014; 33:477-87. [PMID: 27485986 DOI: 10.1002/minf.201400032] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2014] [Accepted: 05/15/2014] [Indexed: 11/09/2022]
Abstract
Here, we introduce new ISIDA fragment descriptors able to describe "local" properties related to selected atoms or molecular fragments. These descriptors have been applied for QSPR modelling of the H-bond basicity scale pKBHX, measured by the 1:1 complexation constant of a series of organic acceptors (H-bond bases) with 4-fluorophenol as the reference H-bond donor in CCl4 at 298 K. Unlike previous QSPR studies of H-bond complexation, the models based on these new descriptors are able to predict the H-bond basicity of different acceptor centres on the same polyfunctional molecule. QSPR models were obtained using support vector machine and ensemble multiple linear regression methods on a set of 537 organic compounds including 5 bifunctional molecules. They were validated with cross-validation procedures and with two external test sets. The best model displays good predictive performance on a large test set of 451 mono- and bifunctional molecules: a root-mean-squared error RMSE = 0.26 and a determination coefficient R² = 0.91. It is implemented on our website (http://infochim.u-strasbg.fr/webserv/VSEngine.html) together with the estimation of its applicability domain and an automatic detection of potential H-bond acceptors.
Affiliation(s)
- Fiorella Ruggiu
  - Laboratoire de Chémoinformatique, UMR 7140 CNRS, Université de Strasbourg, 1, rue Blaise Pascal, 67000 Strasbourg, France (phone: +33368851560)
- Vitaly Solov'ev
  - Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, Leninskiy prospect, 31a, 119991, Moscow, Russian Federation
- Gilles Marcou
  - Laboratoire de Chémoinformatique, UMR 7140 CNRS, Université de Strasbourg, 1, rue Blaise Pascal, 67000 Strasbourg, France (phone: +33368851560)
- Dragos Horvath
  - Laboratoire de Chémoinformatique, UMR 7140 CNRS, Université de Strasbourg, 1, rue Blaise Pascal, 67000 Strasbourg, France (phone: +33368851560)
- Jérôme Graton
  - Université de Nantes, UMR CNRS 6230, Chimie Et Interdisciplinarité: Synthèse, Analyse, Modélisation (CEISAM), UFR Sciences & Techniques, 2, rue de la Houssinière, BP 92208, 44322 NANTES Cedex 3, France
- Jean-Yves Le Questel
  - Université de Nantes, UMR CNRS 6230, Chimie Et Interdisciplinarité: Synthèse, Analyse, Modélisation (CEISAM), UFR Sciences & Techniques, 2, rue de la Houssinière, BP 92208, 44322 NANTES Cedex 3, France
- Alexandre Varnek
  - Laboratoire de Chémoinformatique, UMR 7140 CNRS, Université de Strasbourg, 1, rue Blaise Pascal, 67000 Strasbourg, France (phone: +33368851560)
|
48
|
Solov’ev V, Varnek A, Tsivadze A. QSPR ensemble modelling of the 1:1 and 1:2 complexation of Co2+, Ni2+, and Cu2+ with organic ligands: relationships between stability constants. J Comput Aided Mol Des 2014; 28:549-64. [DOI: 10.1007/s10822-014-9741-3] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2014] [Accepted: 04/01/2014] [Indexed: 12/01/2022]
|
49
|
Geppert T, Beck B. Fuzzy Matched Pairs: A Means To Determine the Pharmacophore Impact on Molecular Interaction. J Chem Inf Model 2014; 54:1093-102. [DOI: 10.1021/ci400694q] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/03/2023]
Affiliation(s)
- Tim Geppert
  - Department of Lead Identification and Optimization Support, Boehringer-Ingelheim Pharma GmbH & Co. KG, Birkendorferstrasse 65, 88397 Biberach an der Riss, Germany
- Bernd Beck
  - Department of Lead Identification and Optimization Support, Boehringer-Ingelheim Pharma GmbH & Co. KG, Birkendorferstrasse 65, 88397 Biberach an der Riss, Germany
|
50
|
Lindenberg EK, Patey GN. How distributed charge reduces the melting points of model ionic salts. J Chem Phys 2014; 140:104504. [DOI: 10.1063/1.4867275] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
|