1
van Tilborg D, Alenicheva A, Grisoni F. Exposing the Limitations of Molecular Machine Learning with Activity Cliffs. J Chem Inf Model 2022; 62:5938-5951. [PMID: 36456532 PMCID: PMC9749029 DOI: 10.1021/acs.jcim.2c01073] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 08/23/2022] [Indexed: 12/03/2022]
Abstract
Machine learning has become a crucial tool in drug discovery and chemistry at large, e.g., to predict molecular properties, such as bioactivity, with high accuracy. However, activity cliffs (pairs of molecules that are highly similar in their structure but exhibit large differences in potency) have received limited attention for their effect on model performance. Not only are these edge cases informative for molecule discovery and optimization, but models that are well equipped to accurately predict the potency of activity cliffs also have increased potential for prospective applications. Our work aims to fill the current knowledge gap on best-practice machine learning methods in the presence of activity cliffs. We benchmarked a total of 24 machine and deep learning approaches on curated bioactivity data from 30 macromolecular targets for their performance on activity cliff compounds. While all methods struggled in the presence of activity cliffs, machine learning approaches based on molecular descriptors outperformed more complex deep learning methods. Our findings highlight large case-by-case differences in performance, advocating for (a) the inclusion of dedicated "activity-cliff-centered" metrics during model development and evaluation and (b) the development of novel algorithms to better predict the properties of activity cliffs. To this end, the methods, metrics, and results of this study have been encapsulated into an open-access benchmarking platform named MoleculeACE (Activity Cliff Estimation, available on GitHub at: https://github.com/molML/MoleculeACE). MoleculeACE is designed to steer the community toward addressing the pressing but overlooked limitation of molecular machine learning models posed by activity cliffs.
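The activity-cliff definition in this abstract (high structural similarity, large potency difference) can be illustrated with a minimal sketch. It is not the MoleculeACE implementation: the use of Tanimoto similarity on sets of fingerprint bits, the thresholds `sim_threshold` and `potency_gap`, and the toy data are all assumptions made for illustration.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_activity_cliffs(mols, sim_threshold=0.9, potency_gap=1.0):
    """Return id pairs that are structurally similar but differ in potency.

    mols maps an id to (fingerprint bit set, potency in log units).
    Both thresholds are hypothetical values, not taken from the paper.
    """
    cliffs = []
    for (i, (fp_i, p_i)), (j, (fp_j, p_j)) in combinations(mols.items(), 2):
        similar = tanimoto(fp_i, fp_j) >= sim_threshold
        if similar and abs(p_i - p_j) >= potency_gap:
            cliffs.append((i, j))
    return cliffs


# Toy data: A and B share 10 of 11 bits but differ by 2.5 log units.
toy = {
    "A": (frozenset(range(10)), 8.5),
    "B": (frozenset(range(10)) | {10}, 6.0),
    "C": (frozenset(range(20, 30)), 8.4),
}
cliff_pairs = find_activity_cliffs(toy)
```

With real molecules the bit sets would come from a cheminformatics toolkit; plain Python sets are used here only to keep the sketch self-contained.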
Affiliation(s)
- Derek van Tilborg
- Institute for Complex Molecular Systems and Dept. Biomedical Engineering, Eindhoven University of Technology, 5612 AZ Eindhoven, The Netherlands
- Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, 3584 CB Utrecht, The Netherlands
- Francesca Grisoni
- Institute for Complex Molecular Systems and Dept. Biomedical Engineering, Eindhoven University of Technology, 5612 AZ Eindhoven, The Netherlands
- Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, 3584 CB Utrecht, The Netherlands
2
Diéguez-Santana K, Nachimba-Mayanchi MM, Puris A, Gutiérrez RT, González-Díaz H. Prediction of acute toxicity of pesticides for Americamysis bahia using linear and nonlinear QSTR modelling approaches. Environmental Research 2022; 214:113984. [PMID: 35981614 DOI: 10.1016/j.envres.2022.113984] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2022] [Revised: 06/19/2022] [Accepted: 07/22/2022] [Indexed: 06/15/2023]
Abstract
Globally, pesticides are toxic substances with wide applications. However, the widespread use of pesticides has received increasing attention from regulatory agencies due to their various acute and chronic effects on multiple organisms. In this study, Quantitative Structure-Toxicity Relationship (QSTR) models were established using Multiple Linear Regression (MLR) and five Machine Learning (ML) algorithms to predict pesticide toxicity in Americamysis bahia. The most influential descriptors included in the MLR model are RBF, JGI2, nCbH, nRCOOR, nRSR, nPO4 and 'Cl-090', with positive contributions to the dependent variable (negative decimal logarithm of median lethal concentration at 96-h). The Random Forest (RF) regression model was superior amongst the five ML models. We observed higher values of R2 (0.812) and lower values of RMSE (0.595) and MAE (0.462) in the cross-validation training set and external validation set. Similarly, this study had a high level of fitness and was internally robust and externally predictive compared to models presented in similar studies. The results suggest that the developed QSTR models are suitable for reliably predicting the aquatic toxicity of structurally diverse pesticides and can be used for screening, prioritising new pesticides, filling data gaps and overcoming the limitations of in vivo and in vitro tests.
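The three statistics reported in this abstract (R2, RMSE, MAE) can be computed as in the sketch below. It assumes the usual definitions (coefficient of determination as 1 minus the ratio of residual to total sum of squares); the abstract does not state which variant of R2 the authors used, and the numbers in the example are made up.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE and MAE for paired observed and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"r2": r2, "rmse": rmse, "mae": mae}


# Hypothetical observed vs. predicted values on a log scale.
metrics = regression_metrics([4.0, 5.0, 6.0, 7.0], [4.5, 5.0, 5.5, 7.0])
```

A higher R2 together with lower RMSE and MAE indicates a better fit, which is the sense in which the abstract ranks the Random Forest model first.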
Affiliation(s)
- Karel Diéguez-Santana
- Department of Organic and Inorganic Chemistry, University of the Basque Country UPV/EHU, 48940, Leioa, Spain; Universidad Regional Amazónica Ikiam, Tena, Ecuador
- Amilkar Puris
- Facultad de Ciencias de la Ingeniería, Universidad Técnica Estatal de Quevedo, Ecuador
- Humberto González-Díaz
- Department of Organic and Inorganic Chemistry, University of the Basque Country UPV/EHU, 48940, Leioa, Spain; Basque Center for Biophysics CSIC-UPVEH, University of Basque Country UPV/EHU, 48940, Leioa, Spain; IKERBASQUE, Basque Foundation for Science, 48011, Bilbao, Biscay, Spain
3
Kim JY, Kim KB, Lee BM. Validation of Quantitative Structure-Activity Relationship (QSAR) and Quantitative Structure-Property Relationship (QSPR) approaches as alternatives to skin sensitization risk assessment. Journal of Toxicology and Environmental Health. Part A 2021; 84:945-959. [PMID: 34338166 DOI: 10.1080/15287394.2021.1956660] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/13/2023]
Abstract
This study was conducted to validate the physicochemical properties of a total of 362 chemicals [305 skin sensitizers (212 in the previous study + 93 additional new chemicals), 57 non-skin sensitizers (38 in the previous study + 19 additional new chemicals)] for skin sensitization risk assessment using quantitative structure-activity relationship (QSAR)/quantitative structure-property relationship (QSPR) approaches. The average melting point (MP), surface tension (ST), and density (DS) of the 305 skin sensitizers and 57 non-sensitizers were used to determine the cutoff values distinguishing positive and negative sensitization, and correlation coefficients were employed to derive effective 3-fold concentration (EC3 (%)) values. QSAR models were also utilized to assess skin sensitization. The sensitivity, specificity, and accuracy were 80, 15, and 70%, respectively, for the Toxtree QSAR model; 88, 46, and 81%, respectively, for Vega; and 56, 61, and 56%, respectively, for Danish EPA QSAR. Surprisingly, the sensitivity, specificity, and accuracy were 60, 80, and 64%, respectively, when MP, ST, and DS (MP+ST+DS) were used in this study. Further, MP+ST+DS exhibited a sensitivity of 77%, specificity of 57%, and accuracy of 73% when the derived EC3 values were classified into local lymph node assay (LLNA) skin sensitizer and non-sensitizer categories. Thus, MP, ST, and DS may prove useful in predicting EC3 values, not only as an alternative approach to animal testing but also for skin sensitization risk assessment.
Affiliation(s)
- Ji Yun Kim
- Division of Toxicology, College of Pharmacy, Sungkyunkwan University, Suwon, Gyeonggi-do, South Korea
- Kyu-Bong Kim
- College of Pharmacy, Dankook University Dandae-ro, Cheonan, Chungnam, South Korea
- Byung-Mu Lee
- Division of Toxicology, College of Pharmacy, Sungkyunkwan University, Suwon, Gyeonggi-do, South Korea
4
Ding X, Cui C, Wang D, Zhao J, Zheng M, Luo X, Jiang H, Chen K. Bioactivity Prediction Based on Matched Molecular Pair and Matched Molecular Series Methods. Curr Pharm Des 2021; 26:4195-4205. [PMID: 32338210 DOI: 10.2174/1381612826666200427111309] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/06/2019] [Accepted: 04/08/2020] [Indexed: 11/22/2022]
Abstract
BACKGROUND Enhancing a compound's biological activity is the central task of lead optimization in small-molecule drug discovery. However, it is laborious to perform many iterative rounds of compound synthesis and bioactivity tests. To address this issue, there is strong demand for high-quality in silico bioactivity prediction approaches that prioritize more active compound derivatives and reduce the trial-and-error process. METHODS Two kinds of bioactivity prediction models based on a large-scale structure-activity relationship (SAR) database were constructed. The first is based on the similarity of substituents and realized by matched molecular pair analysis, including SA, SA_BR, SR, and SR_BR. The second is based on SAR transferability and realized by matched molecular series analysis, including Single MMS pair, Full MMS series, and Multi single MMS pairs. Moreover, we also defined the application domain of the models by using a distance-based threshold. RESULTS Among the seven individual models, the Multi single MMS pairs bioactivity prediction model showed the best performance (R2 = 0.828, MAE = 0.406, RMSE = 0.591), and the baseline model (SA) produced the lowest prediction accuracy (R2 = 0.798, MAE = 0.446, RMSE = 0.637). The predictive accuracy could be further improved by consensus modeling (R2 = 0.842, MAE = 0.397 and RMSE = 0.563). CONCLUSION An accurate prediction model for bioactivity was built with a consensus method, which was superior to all individual models. Our model should be a valuable tool for lead optimization.
Affiliation(s)
- Xiaoyu Ding
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Chen Cui
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Dingyan Wang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Jihui Zhao
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Xiaomin Luo
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Hualiang Jiang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- Kaixian Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
5
Andries JPM, Goodarzi M, Heyden YV. Improvement of quantitative structure-retention relationship models for chromatographic retention prediction of peptides applying individual local partial least squares models. Talanta 2020; 219:121266. [PMID: 32887157 DOI: 10.1016/j.talanta.2020.121266] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/07/2020] [Revised: 06/02/2020] [Accepted: 06/03/2020] [Indexed: 10/24/2022]
Abstract
In Reversed-Phase Liquid Chromatography, Quantitative Structure-Retention Relationship (QSRR) models for retention prediction of peptides can be built, starting from large sets of theoretical molecular descriptors. Good predictive QSRR models can be obtained after selecting the most informative descriptors. Reliable retention prediction may be an aid in the correct identification of proteins/peptides in proteomics and in chromatographic method development. Traditionally, global QSRR models are built, using a calibration set containing a representative range of analytes. In this study, a strategy is presented to build individual local Partial Least Squares (PLS) models for peptides, based on selected local calibration samples, most similar to the specific query peptide to be predicted. Similar local calibration peptides are selected from a possible calibration set. The calibration samples with the lowest Euclidian distances to the query peptide are considered as most similar. Two Euclidian distances are investigated as similarity parameter, (i) in the autoscaled descriptor space and, (ii) in the PLS factor space of the global calibration samples, both after variable selection by the Final Complexity Adapted Models (FCAM) method. The predictive abilities of individual local QSRR PLS models for peptides, developed with both Euclidian distances, are found significantly better than those of two global models, i.e. before and after FCAM variable selection. The predictive abilities of the local models, developed with distances calculated in the PLS factor space, were best.
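The selection step described in this abstract, choosing the calibration samples with the lowest Euclidean distance to the query in the autoscaled descriptor space, could look roughly like the sketch below. The function names, the neighbourhood size `k`, and the toy descriptors are hypothetical; the FCAM variable selection, the PLS factor-space variant, and the local PLS fit itself are not reproduced here.

```python
import math


def autoscale(rows):
    """Mean-center each column and scale it to unit standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1)) or 1.0
        for c, m in zip(cols, means)
    ]
    scaled = [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]
    return scaled, means, stds


def nearest_calibration(calib_X, query_x, k):
    """Indices of the k calibration samples closest to the query."""
    scaled, means, stds = autoscale(calib_X)
    q = [(v - m) / s for v, m, s in zip(query_x, means, stds)]
    dists = [(math.dist(row, q), idx) for idx, row in enumerate(scaled)]
    return [idx for _, idx in sorted(dists)[:k]]


# Toy descriptor matrix (4 calibration peptides, 2 descriptors).
calib_X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [10.0, 100.0]]
local_idx = nearest_calibration(calib_X, [2.2, 22.0], k=2)
```

A local model would then be fitted on the rows in `local_idx` only and used to predict the retention of that single query peptide.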
Affiliation(s)
- Jan P M Andries
- Research Group Analysis Techniques in the Life Sciences, Avans Hogeschool, University of Professional Education, P.O. Box 90116, 4800 RA Breda, the Netherlands
- Mohammad Goodarzi
- Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX, 75390, United States
- Yvan Vander Heyden
- Department of Analytical Chemistry, Applied Chemometrics and Molecular Modelling (FABI), Vrije Universiteit Brussel (VUB), Laarbeeklaan 103, B-1090, Brussels, Belgium
6
Nearest Neighbor Gaussian Process for Quantitative Structure-Activity Relationships. J Chem Inf Model 2020; 60:4653-4663. [PMID: 33022174 DOI: 10.1021/acs.jcim.0c00678] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
Abstract
While Gaussian process models are typically restricted to smaller data sets, we propose a variation which extends its applicability to the larger data sets common in the industrial drug discovery space, making it relatively novel in the quantitative structure-activity relationship (QSAR) field. By incorporating locality-sensitive hashing for fast nearest neighbor searches, the nearest neighbor Gaussian process model makes predictions with time complexity that is sub-linear with the sample size. The model can be efficiently built, permitting rapid updates to prevent degradation as new data is collected. Given its small number of hyperparameters, it is robust against overfitting and generalizes about as well as other common QSAR models. Like the usual Gaussian process model, it natively produces principled and well-calibrated uncertainty estimates on its predictions. We compare this new model with implementations of random forest, light gradient boosting, and k-nearest neighbors to highlight these promising advantages. The code for the nearest neighbor Gaussian process is available at https://github.com/Merck/nngp.
7
Kim JY, Kim MK, Kim KB, Kim HS, Lee BM. Quantitative structure-activity and quantitative structure-property relationship approaches as alternative skin sensitization risk assessment methods. Journal of Toxicology and Environmental Health. Part A 2019; 82:447-472. [PMID: 31104613 DOI: 10.1080/15287394.2019.1616437] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 06/09/2023]
Abstract
This study aimed to predict skin sensitization potency of selected chemicals by quantitatively analyzing their physicochemical properties by employing quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) approaches as alternative risk assessment methods to animal testing. Correlations between effective concentration for a stimulation index of 3 (EC3) (%), the amount of a chemical required to elicit a threefold increase in lymph node cell proliferative activity (stimulation index, ≥3), were calculated using local lymph node assay (LLNA) and physicochemical properties of 212 skin sensitizers and 38 non-sensitizers were investigated. The correlation coefficients between melting point (MP) and EC3 and between surface tension (ST) and EC3 were 0.65 and 0.69, respectively. The correlation coefficient for MP + ST and EC3 was estimated to be 0.72. Thus, correlation coefficients between EC3 and MP, ST, and MP + ST reliably predicted the skin sensitization potential of the chemicals with sensitivities of 72% (126/175), 70% (122/174), and 73% (116/158); specificities of 77% (27/35), 69% (22/32), and 81% (26/32); and accuracies of 73% (153/210), 70% (144/206), and 75% (142/190), respectively. Our findings suggest that the EC3 value may be more accurately predicted using the ST values of chemicals as opposed to MP values. Thus, information on MP and ST parameters of chemicals might be useful for predicting the EC3 values as not only an alternative approach to animal testing, but as a risk assessment method for skin sensitization.
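The percentages in this abstract follow directly from the counts given in parentheses. A short sketch, assuming the standard definitions (sensitivity = correctly classified sensitizers / all sensitizers, specificity = correctly classified non-sensitizers / all non-sensitizers, accuracy = all correct / all chemicals), reproduces them; the function and variable names are illustrative.

```python
def classification_rates(tp, n_pos, tn, n_neg):
    """Sensitivity, specificity and accuracy from correct counts and totals."""
    return {
        "sensitivity": tp / n_pos,
        "specificity": tn / n_neg,
        "accuracy": (tp + tn) / (n_pos + n_neg),
    }


# Counts as reported in the abstract for MP, ST and MP + ST.
mp = classification_rates(tp=126, n_pos=175, tn=27, n_neg=35)
st = classification_rates(tp=122, n_pos=174, tn=22, n_neg=32)
mp_st = classification_rates(tp=116, n_pos=158, tn=26, n_neg=32)

# Express each rate as a whole-number percentage.
percentages = {
    name: {k: round(v * 100) for k, v in rates.items()}
    for name, rates in (("MP", mp), ("ST", st), ("MP+ST", mp_st))
}
```

The accuracy denominators (210, 206, 190) are the sums of the sensitizer and non-sensitizer totals for each predictor, which is why they differ between MP, ST and MP + ST.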
Affiliation(s)
- Ji Yun Kim
- Division of Toxicology, College of Pharmacy, Sungkyunkwan University, Suwon, Gyeonggi-do, South Korea
- Min Kook Kim
- Division of Toxicology, College of Pharmacy, Sungkyunkwan University, Suwon, Gyeonggi-do, South Korea
- Kyu-Bong Kim
- College of Pharmacy, Dankook University, Cheonan, Chungnam, South Korea
- Hyung Sik Kim
- Division of Toxicology, College of Pharmacy, Sungkyunkwan University, Suwon, Gyeonggi-do, South Korea
- Byung-Mu Lee
- Division of Toxicology, College of Pharmacy, Sungkyunkwan University, Suwon, Gyeonggi-do, South Korea
8
Vukovic K, Gadaleta D, Benfenati E. Methodology of aiQSAR: a group-specific approach to QSAR modelling. J Cheminform 2019; 11:27. [PMID: 30945010 PMCID: PMC6446381 DOI: 10.1186/s13321-019-0350-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 01/08/2019] [Accepted: 03/25/2019] [Indexed: 12/26/2022]
Abstract
Background Several QSAR methodology developments have shown promise in recent years. These include the consensus approach to generate the final prediction of a model, utilizing new, advanced machine learning algorithms and streamlining, standardization and automation of various QSAR steps. One approach that seems under-explored is at-the-runtime generation of local models specific to individual compounds. This approach was quite likely limited by the computational requirements, but with current increases in processing power and the widespread availability of cluster-computing infrastructure, this limitation is no longer that severe. Results We propose a new QSAR methodology: aiQSAR, whose aim is to generate endpoint predictions directly from the input dataset by building an array of local models generated at-the-runtime and specific for each compound in the dataset. The local group of each compound is selected on the basis of fingerprint similarities and the final prediction is calculated by integrating the results of a number of autonomous mathematical models. The method is applicable to regression, binary classification and multi-class classification and was tested on one dataset for each endpoint type: bioconcentration factor (BCF) for regression, Ames test for binary classification and Environmental Protection Agency (EPA) acute rat oral toxicity ranking for multi-class classification. As part of this method, the applicability domain of each prediction is assessed through the applicability domain measure, calculated on the basis of the fingerprint similarities in each local group of compounds. Conclusions We outline the methodology for a new QSAR-based predictive tool whose advantages are automation, group-specific approach to modelling and simplicity of execution. Our aim now will be to develop this method into a stand-alone software tool. We hope that eventual adoption of our tool would make QSAR modelling more accessible and transparent. 
Our methodology could be used as an initial modelling step, to predict new compounds by simply loading the training dataset as an input. Predictions could then be further evaluated and refined either by other tools or through optimization of aiQSAR parameters. Electronic supplementary material The online version of this article (10.1186/s13321-019-0350-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Kristijan Vukovic
- Istituto di Ricerche Farmacologiche Mario Negri-IRCCS, Via Mario Negri 2, 20156, Milan, Italy
- Jozef Stefan International Postgraduate School, Jamova cesta 39, 1000, Ljubljana, Slovenia
- Domenico Gadaleta
- Istituto di Ricerche Farmacologiche Mario Negri-IRCCS, Via Mario Negri 2, 20156, Milan, Italy
- Emilio Benfenati
- Istituto di Ricerche Farmacologiche Mario Negri-IRCCS, Via Mario Negri 2, 20156, Milan, Italy
9
Raevsky OA, Grigorev VY, Polianczyk DE, Raevskaja OE, Dearden JC. Aqueous Drug Solubility: What Do We Measure, Calculate and QSPR Predict? Mini Rev Med Chem 2019; 19:362-372. [PMID: 30058484 DOI: 10.2174/1389557518666180727164417] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/06/2018] [Revised: 07/06/2018] [Accepted: 07/20/2018] [Indexed: 01/07/2023]
Abstract
Detailed critical analysis of publications devoted to QSPR of aqueous solubility is presented in the review with discussion of four types of aqueous solubility (three different thermodynamic solubilities with unknown solute structure, intrinsic solubility, solubility in physiological media at pH=7.4 and kinetic solubility), variety of molecular descriptors (from topological to quantum chemical), traditional statistical and machine learning methods as well as original QSPR models.
Affiliation(s)
- Oleg A Raevsky
- Department of Computer-Aided Molecular Design, Institute of Physiologically Active Compounds, Russian Academy of Science, Chernogolovka, Russian Federation
- Veniamin Y Grigorev
- Department of Computer-Aided Molecular Design, Institute of Physiologically Active Compounds, Russian Academy of Science, Chernogolovka, Russian Federation
- Daniel E Polianczyk
- Department of Computer-Aided Molecular Design, Institute of Physiologically Active Compounds, Russian Academy of Science, Chernogolovka, Russian Federation
- Olga E Raevskaja
- Department of Computer-Aided Molecular Design, Institute of Physiologically Active Compounds, Russian Academy of Science, Chernogolovka, Russian Federation
- John C Dearden
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, United Kingdom
10
Kaneko H. Discussion on Regression Methods Based on Ensemble Learning and Applicability Domains of Linear Submodels. J Chem Inf Model 2018; 58:480-489. [PMID: 29425038 DOI: 10.1021/acs.jcim.7b00649] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/10/2023]
Abstract
To develop a new ensemble learning method and construct highly predictive regression models in chemoinformatics and chemometrics, applicability domains (ADs) are introduced into the ensemble learning process of prediction. When estimating values of an objective variable using subregression models, only the submodels with ADs that cover a query sample, i.e., the sample is inside the model's AD, are used. By constructing submodels and changing a list of selected explanatory variables, the union of the submodels' ADs, which defines the overall AD, becomes large, and the prediction performance is enhanced for diverse compounds. By analyzing a quantitative structure-activity relationship data set and a quantitative structure-property relationship data set, it is confirmed that the ADs can be enlarged and the estimation performance of regression models is improved compared with traditional methods.
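The prediction rule in this abstract, using only the submodels whose applicability domain (AD) covers the query sample, can be sketched as follows. Several details are assumptions made for illustration and are not taken from the paper: the AD is modelled as a simple per-variable range, the covering submodels are combined by a plain average, and the dictionary layout (`features`, `ad`, `coef`, `intercept`) is hypothetical.

```python
def covers(sub, x):
    """True if x lies inside the submodel's AD on its selected variables."""
    return all(lo <= x[f] <= hi for f, (lo, hi) in zip(sub["features"], sub["ad"]))


def predict_sub(sub, x):
    """Prediction of one linear submodel using only its own variables."""
    return sub["intercept"] + sum(c * x[f] for f, c in zip(sub["features"], sub["coef"]))


def ad_ensemble_predict(submodels, x):
    """Average the submodels whose AD covers x; None if no submodel does."""
    preds = [predict_sub(s, x) for s in submodels if covers(s, x)]
    return sum(preds) / len(preds) if preds else None


# Three linear submodels built on different explanatory variables.
submodels = [
    {"features": [0], "ad": [(0.0, 1.0)], "coef": [2.0], "intercept": 0.0},
    {"features": [1], "ad": [(0.0, 5.0)], "coef": [1.0], "intercept": 1.0},
    {"features": [0, 1], "ad": [(2.0, 3.0), (0.0, 5.0)], "coef": [1.0, 1.0], "intercept": 0.0},
]
inside = ad_ensemble_predict(submodels, [0.5, 2.0])
outside = ad_ensemble_predict(submodels, [9.0, 9.0])
```

Because each submodel sees a different subset of variables, a query can fall inside some ADs and outside others, so the union of the ADs is larger than any single one, as the abstract notes.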
Affiliation(s)
- Hiromasa Kaneko
- Department of Applied Chemistry, School of Science and Technology, Meiji University, 1-1-1 Higashi-Mita, Tama-ku, Kawasaki, Kanagawa 214-8571, Japan
11
Raevsky OA, Grigorev VY, Polianczyk DE, Raevskaja OE, Dearden JC. Six global and local QSPR models of aqueous solubility at pH = 7.4 based on structural similarity and physicochemical descriptors. SAR and QSAR in Environmental Research 2017; 28:661-676. [PMID: 28891683 DOI: 10.1080/1062936x.2017.1368704] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 06/14/2017] [Accepted: 08/14/2017] [Indexed: 06/07/2023]
Abstract
Aqueous solubility at pH = 7.4 is a very important property for medicinal chemists because this is the pH value of physiological media. The present work describes the application of three different methods (support vector machine (SVM), random forest (RF) and multiple linear regression (MLR)) and three local quantitative structure-property relationship (QSPR) models (regression corrected by nearest neighbours (RCNN), arithmetic mean property (AMP) and local regression property (LoReP)) to construct stable QSPRs with clear mechanistic interpretation. Our data set contained experimental values of aqueous solubility at pH = 7.4 of 387 chemicals (349 in the training set and 38 in the test set including 16 own measurements). The initial descriptor pool contained 210 physicochemical descriptors, calculated from the HYBOT, DRAGON, SYBYL and VolSurf+ programs. Six QSPRs with good statistics based on fundamentals of aqueous solubility and optimization of descriptor space were obtained. Those models have an RMSE close to experimental error (0.70), and are amenable to physical interpretation. The QSPR models developed in this study may be useful for medicinal chemists. Global MLR, RF and SVM models may be valuable for consideration of common factors that influence solubility. The RCNN, AMP and LoReP local models may be helpful for the optimization of aqueous solubility in small sets of related chemicals.
Affiliation(s)
- O A Raevsky
- Department of Computer-Aided Molecular Design, Russian Academy of Science, Chernogolovka, Russia
- V Y Grigorev
- Department of Computer-Aided Molecular Design, Russian Academy of Science, Chernogolovka, Russia
- D E Polianczyk
- Department of Computer-Aided Molecular Design, Russian Academy of Science, Chernogolovka, Russia
- O E Raevskaja
- Department of Computer-Aided Molecular Design, Russian Academy of Science, Chernogolovka, Russia
- J C Dearden
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, UK
12
García-Jacas CR, Marrero-Ponce Y, Barigye SJ, Hernández-Ortega T, Cabrera-Leyva L, Fernández-Castillo A. N-tuple topological/geometric cutoffs for 3D N-linear algebraic molecular codifications: variability, linear independence and QSAR analysis. SAR and QSAR in Environmental Research 2016; 27:949-975. [PMID: 27707004 DOI: 10.1080/1062936x.2016.1231714] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 06/07/2016] [Accepted: 08/30/2016] [Indexed: 06/06/2023]
Abstract
Novel N-tuple topological/geometric cutoffs to consider specific inter-atomic relations in the QuBiLS-MIDAS framework are introduced in this manuscript. These molecular cutoffs permit the taking into account of relations between more than two atoms by using (dis-)similarity multi-metrics and the concepts related with topological and Euclidean-geometric distances. To this end, the kth two-, three- and four-tuple topological and geometric neighbourhood quotient (NQ) total (or local-fragment) spatial-(dis)similarity matrices are defined, to represent 3D information corresponding to the relations between two, three and four atoms of the molecular structures that satisfy certain cutoff criteria. First, an analysis of a diverse chemical space for the most common values of topological/Euclidean-geometric distances, bond/dihedral angles, triangle/quadrilateral perimeters, triangle area and volume was performed in order to determine the intervals to take into account in the cutoff procedures. A variability analysis based on Shannon's entropy reveals that better distribution patterns are attained with the descriptors based on the cutoffs proposed (QuBiLS-MIDAS NQ-MDs) with regard to the results obtained when all inter-atomic relations are considered (QuBiLS-MIDAS KA-MDs - 'Keep All'). A principal component analysis shows that the novel molecular cutoffs codify chemical information captured by the respective QuBiLS-MIDAS KA-MDs, as well as information not captured by the latter. Lastly, a QSAR study to obtain deeper knowledge of the contribution of the proposed methods was carried out, using four molecular datasets (steroids (STER), angiotensin converting enzyme (ACE), thermolysin inhibitors (THER) and thrombin inhibitors (THR)) widely used as benchmarks in the evaluation of several methodologies. One to four variable QSAR models based on multiple linear regression were developed for each compound dataset following the original division into training and test sets. 
The results obtained reveal that the novel cutoff procedures yield superior performances relative to those of the QuBiLS-MIDAS KA-MDs in the prediction of the biological activities considered. From the results achieved, it can be suggested that the proposed N-tuple topological/geometric cutoffs constitute relevant criteria for generating MDs codifying particular atomic relations, ultimately useful in enhancing the modelling capacity of the QuBiLS-MIDAS 3D-MDs.
Collapse
Affiliation(s)
- C R García-Jacas
- a Escuela de Sistemas y Computación, Pontificia Universidad Católica del Ecuador Sede Esmeraldas (PUCESE), Esmeraldas, Ecuador
- b Grupo de Investigación de Bioinformática, Instituto de Química, Universidad Nacional Autónoma de México (UNAM), Ciudad de México, D.F, México
- c Grupo de Investigacion de Bioinformatica, Universidad de las Ciencias Informaticas (UCI), La Habana, Cuba
- Y Marrero-Ponce
- d Grupo de Medicina Molecular y Traslacional (MeM&T), Universidad San Francisco de Quito (USFQ), Quito, Ecuador
- e Instituto de Simulación Computacional (ISC-USFQ), Universidad San Francisco de Quito (USFQ), Quito, Ecuador
- S J Barigye
- g Department of Chemistry, McGill University, Montréal, Québec, Canada
- T Hernández-Ortega
- c Grupo de Investigacion de Bioinformatica, Universidad de las Ciencias Informaticas (UCI), La Habana, Cuba
- L Cabrera-Leyva
- f Grupo de Investigación de Inteligencia Artificial (AIRES), Universidad de Camagüey, Camagüey, Cuba
- A Fernández-Castillo
- c Grupo de Investigacion de Bioinformatica, Universidad de las Ciencias Informaticas (UCI), La Habana, Cuba
|
13
|
Carrió P, Sanz F, Pastor M. Toward a unifying strategy for the structure-based prediction of toxicological endpoints. Arch Toxicol 2015; 90:2445-60. [PMID: 26553148 DOI: 10.1007/s00204-015-1618-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2015] [Accepted: 10/19/2015] [Indexed: 01/13/2023]
Abstract
Most computational methods used for the prediction of toxicity endpoints are based on the assumption that similar compounds have similar biological properties. This principle can be exploited using computational methods like read across or quantitative structure-activity relationships. However, there is no general agreement about which method is the most appropriate for quantifying compound similarity, nor for exploiting the similarity principle to obtain reliable estimates of compound properties. Moreover, optimal similarity metrics and modeling methods might depend on the characteristics of the endpoints and training series used in each case. This study describes a comparative analysis of the predictive performance of diverse similarity metrics and modeling methods in toxicological applications. A collection of two quantitative (n = 660, n = 1114) and three qualitative (n = 447, n = 905, n = 1220) datasets representing very different endpoints of interest in drug safety evaluation was used, and rigorous methods were applied to estimate the external predictive ability in each case. The results confirm that no single approach produces the best results in all instances, and the best predictions were obtained using different tools in different situations. The trends observed in this study were exploited to propose a unifying strategy allowing the use of the most suitable method for every compound. A comparison of the quality of the predictions obtained by the unifying strategy with those obtained by standard prediction methods confirmed the usefulness of the proposed approach.
Affiliation(s)
- Pau Carrió
- Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences, Hospital del Mar Medical Research Institute (IMIM), Universitat Pompeu Fabra, Carrer Dr. Aiguader 88, 08003, Barcelona, Spain
- Ferran Sanz
- Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences, Hospital del Mar Medical Research Institute (IMIM), Universitat Pompeu Fabra, Carrer Dr. Aiguader 88, 08003, Barcelona, Spain
- Manuel Pastor
- Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences, Hospital del Mar Medical Research Institute (IMIM), Universitat Pompeu Fabra, Carrer Dr. Aiguader 88, 08003, Barcelona, Spain.
|
14
|
Shaik B, Gupta R, Louis B, Agrawal VK. Prediction of permeability of drug-like compounds across polydimethylsiloxane membranes by machine learning methods. JOURNAL OF PHARMACEUTICAL INVESTIGATION 2015. [DOI: 10.1007/s40005-015-0194-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|
15
|
Li L, Hu J, Ho YS. Global Performance and Trend of QSAR/QSPR Research: A Bibliometric Analysis. Mol Inform 2014; 33:655-68. [PMID: 27485301 DOI: 10.1002/minf.201300180] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2013] [Accepted: 08/04/2014] [Indexed: 11/08/2022]
Abstract
A bibliometric analysis based on the Science Citation Index Expanded was conducted to provide insights into the publication performance and research trends of quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) studies from 1993 to 2012. The results show that the number of articles per year quadrupled from 1993 to 2006 and has plateaued since 2007. Journal of Chemical Information and Modeling was the most prolific journal. Internal methodological innovations in acquiring molecular descriptors and in modeling stimulated the increase in articles in the research fields of drug design and synthesis, and chemoinformatics, while external regulatory demands on model validation and reliability fueled the increase in environmental sciences. "Prediction endpoints", "statistical algorithms", and "molecular descriptors" were identified as three research hotspots. The articles from developed countries were larger in number and more influential in citation, whereas those from developing countries were higher in output growth rates.
Affiliation(s)
- Li Li
- State Key Joint Laboratory for Environmental Simulation and Pollution Control, College of Environmental Sciences and Engineering, Peking University, Beijing 100871, People's Republic of China
- Jianxin Hu
- State Key Joint Laboratory for Environmental Simulation and Pollution Control, College of Environmental Sciences and Engineering, Peking University, Beijing 100871, People's Republic of China
- Yuh-Shan Ho
- Trend Research Centre, Asia University, Taichung 41354, Taiwan; Department of Environmental Engineering, Peking University, Beijing 100871, People's Republic of China. Tel: +886 4 2332 3456 x 1797; fax: +886 4 2330 5834.
|
16
|
Poulsen KL, Olivero-Verbel J, Beggs KM, Ganey PE, Roth RA. Trovafloxacin enhances lipopolysaccharide-stimulated production of tumor necrosis factor-α by macrophages: role of the DNA damage response. J Pharmacol Exp Ther 2014; 350:164-70. [PMID: 24817034 DOI: 10.1124/jpet.114.214189] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022] Open
Abstract
Trovafloxacin (TVX) is a drug that has caused idiosyncratic, drug-induced liver injury (IDILI) in humans. In a murine model of IDILI, otherwise nontoxic doses of TVX and the inflammagen lipopolysaccharide (LPS) interacted to produce pronounced hepatocellular injury. The liver injury depended on a TVX-induced, small but significant prolongation of tumor necrosis factor-α (TNF) appearance in the plasma. The enhancement of TNF expression by TVX was reproduced in vitro in RAW 264.7 murine macrophages (RAW cells) stimulated with LPS. The current study was designed to identify the molecular target of TVX responsible for this response in RAW cells. An in silico analysis suggested a favorable binding profile of TVX to eukaryotic topoisomerase II-α (TopIIα), and a cell-free assay revealed that TVX inhibited eukaryotic TopIIα activity. Topoisomerase inhibition is known to lead to DNA damage, and TVX increased the DNA damage marker phosphorylated histone 2A.X in RAW cells. Moreover, TVX induced activation of the DNA damage sensor kinases, ataxia telangiectasia mutated (ATM) and Rad3-related (ATR). The ATR inhibitor NU6027 [6-(cyclohexylmethoxy)-5-nitrosopyrimidine-2,4-diamine] prevented the TVX-mediated increases in LPS-induced TNF mRNA and protein release, whereas a selective ATM inhibitor [2-(4-morpholinyl)-6-(1-thianthrenyl)-4H-pyran-4-one (KU55933)] was without effect. TVX prolonged TNF mRNA stability, and this effect was largely attenuated by NU6027. These results suggest that TVX can inhibit eukaryotic topoisomerase, leading to activation of ATR and potentiation of TNF release by macrophages, at least in part through increased mRNA stability. This off-target effect might contribute to the ability of TVX to precipitate IDILI in humans.
Affiliation(s)
- Kyle L Poulsen
- Department of Pharmacology & Toxicology, Center for Integrative Toxicology, Michigan State University, East Lansing, Michigan (K.L.P., K.M.B., P.E.G., and R.A.R.); and Environmental and Computational Chemistry Group, School of Pharmaceutical Sciences, University of Cartagena, Cartagena, Colombia (J.O.-V.)
- Jesus Olivero-Verbel
- Department of Pharmacology & Toxicology, Center for Integrative Toxicology, Michigan State University, East Lansing, Michigan (K.L.P., K.M.B., P.E.G., and R.A.R.); and Environmental and Computational Chemistry Group, School of Pharmaceutical Sciences, University of Cartagena, Cartagena, Colombia (J.O.-V.)
- Kevin M Beggs
- Department of Pharmacology & Toxicology, Center for Integrative Toxicology, Michigan State University, East Lansing, Michigan (K.L.P., K.M.B., P.E.G., and R.A.R.); and Environmental and Computational Chemistry Group, School of Pharmaceutical Sciences, University of Cartagena, Cartagena, Colombia (J.O.-V.)
- Patricia E Ganey
- Department of Pharmacology & Toxicology, Center for Integrative Toxicology, Michigan State University, East Lansing, Michigan (K.L.P., K.M.B., P.E.G., and R.A.R.); and Environmental and Computational Chemistry Group, School of Pharmaceutical Sciences, University of Cartagena, Cartagena, Colombia (J.O.-V.)
- Robert A Roth
- Department of Pharmacology & Toxicology, Center for Integrative Toxicology, Michigan State University, East Lansing, Michigan (K.L.P., K.M.B., P.E.G., and R.A.R.); and Environmental and Computational Chemistry Group, School of Pharmaceutical Sciences, University of Cartagena, Cartagena, Colombia (J.O.-V.)
|
17
|
Sheridan RP. Global Quantitative Structure–Activity Relationship Models vs Selected Local Models as Predictors of Off-Target Activities for Project Compounds. J Chem Inf Model 2014; 54:1083-92. [DOI: 10.1021/ci500084w] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/23/2023]
Affiliation(s)
- Robert P. Sheridan
- Cheminformatics Department, RY800-D133, Merck Research Laboratories, Rahway, New Jersey 07065, United States
|
18
|
Raevsky OA, Grigor'ev VY, Polianczyk DE, Raevskaja OE, Dearden JC. Calculation of aqueous solubility of crystalline un-ionized organic chemicals and drugs based on structural similarity and physicochemical descriptors. J Chem Inf Model 2014; 54:683-91. [PMID: 24456022 DOI: 10.1021/ci400692n] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
Solubilities of crystalline organic compounds calculated according to AMP (arithmetic mean property) and LoReP (local one-parameter regression) models based on structural and physicochemical similarities are presented. We used data on water solubility of 2615 compounds in un-ionized form measured at 25±5 °C. The calculation results were compared with the equation based on the experimental data for lipophilicity and melting point. According to statistical criteria, the model based on structural and physicochemical similarities showed a better fit with the experimental data. An additional advantage of this model is that it uses only theoretical descriptors, and this provides means for calculating water solubility for both existing and not yet synthesized compounds.
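As a rough illustration of the arithmetic mean property (AMP) idea, the sketch below predicts a property as the mean over the most similar training compounds. Euclidean distance in a descriptor space, the neighbour count `k`, the toy data, and the function name `amp_predict` are all assumptions; the abstract says only that similarity is structural and physicochemical, without giving the measure.

```python
import numpy as np

def amp_predict(query, train_X, train_y, k=3):
    """Arithmetic mean of the property over the k nearest training compounds.

    Assumption: similarity is approximated by Euclidean distance between
    descriptor vectors; the paper's own similarity measure may differ.
    """
    dists = np.linalg.norm(train_X - query, axis=1)
    nearest = np.argsort(dists)[:k]  # indices of the k closest compounds
    return float(train_y[nearest].mean())

# Toy one-descriptor training set with made-up log-solubility values.
train_X = np.array([[0.0], [1.0], [2.0], [10.0]])
train_y = np.array([-1.0, -2.0, -3.0, -8.0])

print(amp_predict(np.array([1.0]), train_X, train_y, k=3))
```

Because the prediction depends only on descriptors computed from structure, a scheme of this kind can in principle be applied to compounds that have not yet been synthesized, which is the advantage the abstract points to.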
Affiliation(s)
- Oleg A Raevsky
- Institute of Physiologically Active Compounds, Russian Academy of Science , Chernogolovka, Russia
|
19
|
Maunz A, Gütlein M, Rautenberg M, Vorgrimmler D, Gebele D, Helma C. lazar: a modular predictive toxicology framework. Front Pharmacol 2013; 4:38. [PMID: 23761761 PMCID: PMC3669891 DOI: 10.3389/fphar.2013.00038] [Citation(s) in RCA: 93] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2012] [Accepted: 03/19/2013] [Indexed: 11/16/2022] Open
Abstract
lazar (lazy structure–activity relationships) is a modular framework for predictive toxicology. Similar to the read across procedure in toxicological risk assessment, lazar creates local QSAR (quantitative structure–activity relationship) models for each compound to be predicted. Model developers can choose between a large variety of algorithms for descriptor calculation and selection, chemical similarity indices, and model building. This paper presents a high level description of the lazar framework and discusses the performance of example classification and regression models.
Affiliation(s)
- Andreas Maunz
- Institute for Physics, Albert-Ludwigs-Universität Freiburg Freiburg, Germany
|
20
|
Piir G, Sild S, Maran U. Comparative analysis of local and consensus quantitative structure-activity relationship approaches for the prediction of bioconcentration factor. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2013; 24:175-199. [PMID: 23410132 DOI: 10.1080/1062936x.2012.762426] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/01/2023]
Abstract
Quantitative structure-activity relationships (QSARs) are broadly classified as global or local, depending on their molecular constitution. Global models use large and diverse training sets covering a wide range of chemical space. Local models focus on smaller, structurally or chemically similar subsets that are conventionally selected by human experts or, alternatively, by clustering analysis. The current study focuses on the comparative analysis of different clustering algorithms (expectation-maximization, K-means and hierarchical) for seven different descriptor sets as structural characteristics and two rule-based approaches to select subsets for designing local QSAR models. A total of 111 local QSAR models were developed for predicting bioconcentration factor. Predictions from local models were compared with corresponding predictions from the global model. The comparison of coefficients of determination (r²) and standard deviations for local models with similar subsets from the global model shows improved prediction quality in 97% of cases. The descriptor content of the derived QSARs is discussed and analyzed. Local QSAR models were further consolidated within the framework of a consensus approach. All the consensus approaches increased performance over the global and local models. The consensus approach reduced the number of strongly deviating predictions by evening out the prediction errors produced by some local QSARs.
Affiliation(s)
- G Piir
- Institute of Chemistry, University of Tartu, Tartu, Estonia
|
21
|
Luo X, Krumrine JR, Shenvi AB, Pierson ME, Bernstein PR. Calculation and application of activity discriminants in lead optimization. J Mol Graph Model 2010; 29:372-81. [PMID: 20800520 DOI: 10.1016/j.jmgm.2010.07.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2010] [Revised: 07/10/2010] [Accepted: 07/14/2010] [Indexed: 11/18/2022]
Abstract
We present a technique for computing activity discriminants of in vitro (pharmacological, DMPK, and safety) assays and the application to the prediction of in vitro activities of proposed synthetic targets during the lead optimization phase of drug discovery projects. This technique emulates how medicinal chemists perform SAR analysis and activity prediction. The activity discriminants that are functions of 6 commonly used medicinal chemistry descriptors can be interpreted easily by medicinal chemists. Further, visualization with Spotfire allows medicinal chemists to analyze how the query molecule is related to compounds tested previously, and to evaluate easily the relevance of the activity discriminants to the activities of the query molecule. Validation with all compounds synthesized and tested in AstraZeneca Wilmington since 2006 demonstrates that this approach is useful for prioritizing new synthetic targets for synthesis.
Affiliation(s)
- Xincai Luo
- Department of Chemistry, AstraZeneca Pharmaceuticals, 1800 Concord Pike, Wilmington, DE 19850, USA.
|
22
|
Sheppard D, Henkelman G, von Lilienfeld OA. Alchemical derivatives of reaction energetics. J Chem Phys 2010; 133:084104. [DOI: 10.1063/1.3474502] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022] Open
|
23
|
Xi L, Li S, Liu H, Li J, Lei B, Yao X. Global and local prediction of protein folding rates based on sequence autocorrelation information. J Theor Biol 2010; 264:1159-68. [DOI: 10.1016/j.jtbi.2010.03.042] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2009] [Revised: 03/28/2010] [Accepted: 03/29/2010] [Indexed: 11/24/2022]
|
24
|
Li J, Li S, Lei B, Liu H, Yao X, Liu M, Gramatica P. A new strategy to improve the predictive ability of the local lazy regression and its application to the QSAR study of melanin-concentrating hormone receptor 1 antagonists. J Comput Chem 2010; 31:973-85. [PMID: 19670228 DOI: 10.1002/jcc.21383] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]
Abstract
In quantitative structure-activity relationship (QSAR) studies, local lazy regression (LLR) can predict the activity of a query molecule by using the information in its local neighborhood, without the need to produce QSAR models a priori. When a prediction is required for a query compound, a set of local models including different numbers of nearest neighbors is identified. The leave-one-out cross-validation (LOO-CV) procedure is usually used to assess the prediction ability of each model, and the model giving the lowest LOO-CV error or highest LOO-CV correlation coefficient is chosen as the best model. However, it has been shown that a good statistical value from LOO cross-validation is a necessary but not a sufficient condition for the model to have high predictive power. In this work, a new strategy is proposed to improve the predictive ability of LLR models and to assess the accuracy of a query prediction. The bandwidth of the k neighbor value for LLR is optimized by considering the predictive ability of local models using an external validation set. This approach was applied to the QSAR study of a series of thienopyrimidinone antagonists of melanin-concentrating hormone receptor 1. The results obtained with the new strategy show an evident improvement compared with the commonly used LOO-CV LLR methods and the traditional global linear model.
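The strategy described in this abstract can be sketched as follows: fit a linear model on the k nearest neighbours of each query, then choose k by its error on an external validation set rather than by LOO-CV. The Euclidean neighbour search, the ordinary least-squares local model, the RMSE criterion, the toy data, and the names `llr_predict` and `select_k` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def llr_predict(query, X, y, k):
    """Fit a linear model on the k nearest neighbours of `query` and predict."""
    idx = np.argsort(np.linalg.norm(X - query, axis=1))[:k]
    A = np.hstack([X[idx], np.ones((k, 1))])  # local design matrix with intercept
    coef, *_ = np.linalg.lstsq(A, y[idx], rcond=None)
    return float(np.append(query, 1.0) @ coef)

def select_k(X_train, y_train, X_val, y_val, candidates):
    """Pick the neighbourhood size with the lowest RMSE on the validation set."""
    def rmse(k):
        preds = np.array([llr_predict(q, X_train, y_train, k) for q in X_val])
        return float(np.sqrt(np.mean((preds - y_val) ** 2)))
    return min(candidates, key=rmse)

# Toy nonlinear relationship: small neighbourhoods follow the curve more closely.
X_train = np.linspace(0.0, 4.0, 41).reshape(-1, 1)
y_train = (X_train ** 2).ravel()
X_val = np.array([[0.55], [2.35], [3.15]])
y_val = (X_val ** 2).ravel()

best_k = select_k(X_train, y_train, X_val, y_val, candidates=[3, 10, 41])
print(best_k)
```

With k equal to the full training set size the local model reduces to a single global linear model, so the candidate list above also covers the global baseline the abstract compares against.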
Affiliation(s)
- Jiazhong Li
- State Key Laboratory of Applied Organic Chemistry, Department of Chemistry, Lanzhou University, Lanzhou 730000, China
|
25
|
Tromelin A, Andriot I, Kopjar M, Guichard E. Thermodynamic and structure-property study of liquid-vapor equilibrium for aroma compounds. JOURNAL OF AGRICULTURAL AND FOOD CHEMISTRY 2010; 58:4372-4387. [PMID: 20222661 DOI: 10.1021/jf904146c] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/28/2023]
Abstract
Thermodynamic parameters (T, ΔH°, ΔS°, K) were collected from the literature and/or calculated for five esters, four ketones, two aldehydes, and three alcohols, pure compounds and compounds in aqueous solution. Examination of correlations between these parameters and the range values of ΔH° and ΔS° puts forward the key roles of enthalpy for vaporization of pure compounds and of entropy in liquid-vapor equilibrium of compounds in aqueous solution. A structure-property relationship (SPR) study was performed using molecular descriptors on aroma compounds to better understand their vaporization behavior. In addition to the role of polarity for vapor-liquid equilibrium of compounds in aqueous solution, the structure-property study points out the role of chain length and branching, illustrated by the correlation between the connectivity index CHI-V-1 and the difference between T and log K for vaporization of pure compounds and compounds in aqueous solution. Moreover, examination of the esters' enthalpy values allowed a probable conformation adopted by ethyl octanoate in aqueous solution to be proposed.
Affiliation(s)
- Anne Tromelin
- Centre des Sciences du Gout et de l'Alimentation, UMR1324 INRA, UMR6265 CNRS Universite de Bourgogne, Agrosup Dijon, Dijon.
|
26
|
Helgee EA, Carlsson L, Boyer S, Norinder U. Evaluation of Quantitative Structure−Activity Relationship Modeling Strategies: Local and Global Models. J Chem Inf Model 2010; 50:677-89. [DOI: 10.1021/ci900471e] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Affiliation(s)
- Ernst Ahlberg Helgee
- Safety Assessment, AstraZeneca Research & Development, 43183 Mölndal, Sweden, and Medicinal Chemistry, AstraZeneca Research & Development 15185 Södertälje, Sweden
- Lars Carlsson
- Safety Assessment, AstraZeneca Research & Development, 43183 Mölndal, Sweden, and Medicinal Chemistry, AstraZeneca Research & Development 15185 Södertälje, Sweden
- Scott Boyer
- Safety Assessment, AstraZeneca Research & Development, 43183 Mölndal, Sweden, and Medicinal Chemistry, AstraZeneca Research & Development 15185 Södertälje, Sweden
- Ulf Norinder
- Safety Assessment, AstraZeneca Research & Development, 43183 Mölndal, Sweden, and Medicinal Chemistry, AstraZeneca Research & Development 15185 Södertälje, Sweden
|
27
|
Ewing T, Feher M. Forecasting CYP2D6 and CYP3A4 Risk with a Global/Local Fusion Model of CYP450 Inhibition. Mol Inform 2010; 29:127-41. [DOI: 10.1002/minf.200900040] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2009] [Accepted: 09/23/2009] [Indexed: 11/12/2022]
|
28
|
Gedeck P, Kramer C, Ertl P. Computational analysis of structure-activity relationships. PROGRESS IN MEDICINAL CHEMISTRY 2010; 49:113-60. [PMID: 20855040 DOI: 10.1016/s0079-6468(10)49004-9] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Affiliation(s)
- Peter Gedeck
- Novartis Institutes for BioMedical Research, Novartis Pharma AG, Forum 1, Novartis Campus, CH-4056 Basel, Switzerland
|
29
|
Mensch J, Oyarzabal J, Mackie C, Augustijns P. In vivo, in vitro and in silico methods for small molecule transfer across the BBB. J Pharm Sci 2009; 98:4429-68. [DOI: 10.1002/jps.21745] [Citation(s) in RCA: 116] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023]
|
30
|
Carlsson L, Helgee EA, Boyer S. Interpretation of Nonlinear QSAR Models Applied to Ames Mutagenicity Data. J Chem Inf Model 2009; 49:2551-8. [DOI: 10.1021/ci9002206] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/29/2023]
Affiliation(s)
- Lars Carlsson
- Safety Assessment, AstraZeneca Research & Development, 43183 Mölndal, Sweden
- Scott Boyer
- Safety Assessment, AstraZeneca Research & Development, 43183 Mölndal, Sweden
|
31
|
Lei B, Xi L, Li J, Liu H, Yao X. Global, local and novel consensus quantitative structure-activity relationship studies of 4-(Phenylaminomethylene) isoquinoline-1, 3 (2H, 4H)-diones as potent inhibitors of the cyclin-dependent kinase 4. Anal Chim Acta 2009; 644:17-24. [DOI: 10.1016/j.aca.2009.04.019] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2008] [Revised: 02/23/2009] [Accepted: 04/15/2009] [Indexed: 10/20/2022]
|
32
|
Current mathematical methods used in QSAR/QSPR studies. Int J Mol Sci 2009; 10:1978-1998. [PMID: 19564933 PMCID: PMC2695261 DOI: 10.3390/ijms10051978] [Citation(s) in RCA: 124] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2009] [Accepted: 04/28/2009] [Indexed: 02/07/2023] Open
Abstract
This paper gives an overview of the mathematical methods currently used in quantitative structure-activity/property relationship (QSAR/QSPR) studies. Recently, the mathematical methods applied to the regression of QSAR/QSPR models have been developing very fast, and new methods, such as Gene Expression Programming (GEP), Projection Pursuit Regression (PPR) and Local Lazy Regression (LLR), have appeared on the QSAR/QSPR stage. At the same time, the earlier methods, including Multiple Linear Regression (MLR), Partial Least Squares (PLS), Neural Networks (NN), Support Vector Machine (SVM) and so on, are being upgraded to improve their performance in QSAR/QSPR studies. These new and upgraded methods and algorithms are described in detail, and their advantages and disadvantages are evaluated and discussed, to show their application potential in future QSAR/QSPR studies.
|
33
|
Prediction of hERG Potassium Channel Blockade Using kNN-QSAR and Local Lazy Regression Methods. QSAR Comb Sci 2008. [DOI: 10.1002/qsar.200810072] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
|
34
|
Du H, Zhang X, Wang J, Yao X, Hu Z. Novel approaches to predict the retention of histidine-containing peptides in immobilized metal-affinity chromatography. Proteomics 2008; 8:2185-95. [PMID: 18446801 DOI: 10.1002/pmic.200700788] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
A new lazy learning method, local lazy regression (LLR), was used for the first time to model the quantitative structure-retention relationship (QSRR) for predicting and explaining the retention behavior of peptides on the nickel column in immobilized metal-affinity chromatography (IMAC). The best multilinear regression (BMLR) method implemented in CODESSA was used to select the most appropriate molecular descriptors from a large set and to build a linear regression model. Based on the five selected descriptors, two further approaches, projection pursuit regression (PPR) and LLR, were used to build more accurate QSRR models. The coefficients of determination (R²) of the best model developed with LLR were 0.9446 and 0.9252 for the training set and the test set, respectively. The comparison showed that the novel local learning method LLR is a very promising tool for QSRR modeling, with excellent predictive capability for the imidazole concentration (IMC) values of histidine-containing peptides in IMAC. It could be used in other chromatography research fields and should facilitate the design and purification of peptides and proteins.
Affiliation(s)
- Hongying Du
- Department of Chemistry, Lanzhou University, Lanzhou, China
|
35
|
Du H, Watzl J, Wang J, Zhang X, Yao X, Hu Z. Prediction of retention indices of drugs based on immobilized artificial membrane chromatography using Projection Pursuit Regression and Local Lazy Regression. J Sep Sci 2008; 31:2325-33. [DOI: 10.1002/jssc.200700665] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
|
36
|
Utilizing high throughput screening data for predictive toxicology models: protocols and application to MLSCN assays. J Comput Aided Mol Des 2008; 22:367-84. [DOI: 10.1007/s10822-008-9192-9] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2007] [Accepted: 01/30/2008] [Indexed: 01/27/2023]
|
37
|
Maunz A, Helma C. Prediction of chemical toxicity with local support vector regression and activity-specific kernels. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2008; 19:413-431. [PMID: 18853295 DOI: 10.1080/10629360802358430] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/26/2023]
Abstract
We propose a new kernel, based on 2-D structural chemical similarity, that integrates activity-specific information from the training data, and a new approach to applicability domain estimation that takes feature significances and activity distributions into consideration. The new kernel provides better results than the well-established Tanimoto kernel, and activity-sensitive feature selection enhances prediction quality. Validation of local support vector regression models based on this kernel has been performed with three publicly available datasets from the DSSTox project. One of them (Fathead Minnow Acute Toxicity) has already been modelled by other groups and serves as a benchmark dataset; the other two (Maximum Recommended Therapeutic Dose, IRIS Lifetime Cancer Risk) have, to the knowledge of the authors, been modelled for the first time. For all three models, predictive accuracies increase with the prediction confidences that indicate the applicability domain. Depending on the confidence cutoff for acceptable predictions, we were able to achieve > 90% predictions within 1 log unit of the experimental data for all datasets.
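For orientation, the Tanimoto kernel used as the baseline in this abstract can be sketched on binary fingerprints as below. This is the standard Tanimoto similarity only, not the authors' activity-specific kernel, whose definition is not given in the abstract. The toy fingerprints, the convention of returning 0 for two all-zero fingerprints, and the function name `tanimoto_kernel` are assumptions.

```python
import numpy as np

def tanimoto_kernel(A, B):
    """Pairwise Tanimoto similarity between rows of binary matrices A and B."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    common = A @ B.T  # number of bits set in both fingerprints
    total = A.sum(axis=1)[:, None] + B.sum(axis=1)[None, :] - common
    # Assumption: two all-zero fingerprints get similarity 0 (no bits to compare).
    return np.divide(common, total, out=np.zeros_like(common), where=total > 0)

# Three toy 4-bit fingerprints; the first and third are identical.
fps = np.array([[1, 1, 0, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 0]])
K = tanimoto_kernel(fps, fps)
print(K)
```

A matrix of this form can be supplied to a support vector regressor that accepts precomputed kernels, which is one way local models of the kind described here could consume it.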
Affiliation(s)
- A Maunz
- Freiburg Center for Data Analysis and Modelling, Freiburg, Germany.
|