1
Solov'ev V, Baulin D, Tsivadze A. Design of phosphoryl containing podands with Li+/Na+ selectivity using machine learning. SAR and QSAR in Environmental Research 2021; 32:521-539. [PMID: 34105425 DOI: 10.1080/1062936x.2021.1929462] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/30/2021] [Accepted: 05/09/2021] [Indexed: 06/12/2023]
Abstract
In this work, we demonstrate that machine learning opens a way to the practical design of ligands with a required metal ion selectivity. We performed ensemble QSPR modelling of the Li+/Na+ complexation selectivity and of the stability constants for the Li+L and Na+L complexes of phosphoryl podands in the nonaqueous solvent THF/CHCl3 (4:1 v/v). The models were built and cross-validated using MLR with the ISIDA QSPR program and SVM with the libSVM package. The program SVMsmf was implemented to carry out ensemble modelling using libSVM and Substructural Molecular Fragments (SMF) descriptors. SMF were used as descriptors for the ensemble modelling, for property predictions by consensus models, and for the design of a combinatorial library of new ligands. SMF such as the P=O group, and the ether and P=O groups bound through an aromatic ring, contribute significantly to the Li+/Na+ selectivity. The developed models were applied to predict the studied properties for a focused virtual library of 3057 phosphoryl podands generated using SMF contributions promising for selective binding of lithium. Consensus models selected hits for synthesis by screening the combinatorial library. Among the selective ligands identified as hits, three new podands were synthesized, for which the experimentally estimated selectivity is in satisfactory agreement with the predicted one.
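The consensus screening step described in this abstract can be illustrated with a minimal sketch. It assumes that a consensus prediction is the plain average of the individual model predictions and that selectivity is expressed as logK(Li+) - logK(Na+). The ligand names, values, threshold, and function names are hypothetical and are not taken from the paper; the code does not reproduce ISIDA QSPR or SVMsmf.

```python
from statistics import mean


def consensus(predictions):
    """Average the predictions of the individual models for one ligand."""
    return mean(predictions)


def selectivity(logk_li_models, logk_na_models):
    """Consensus Li+/Na+ selectivity as a difference of consensus logK values."""
    return consensus(logk_li_models) - consensus(logk_na_models)


def rank_hits(library, threshold):
    """Return ligand names whose selectivity reaches the threshold, best first."""
    scored = {name: selectivity(li, na) for name, (li, na) in library.items()}
    hits = [name for name, score in scored.items() if score >= threshold]
    return sorted(hits, key=lambda name: scored[name], reverse=True)


# Hypothetical per-model logK predictions (Li+ models, Na+ models) per ligand.
library = {
    "podand_A": ([5.0, 5.2, 4.8], [3.0, 3.1, 2.9]),
    "podand_B": ([4.0, 4.1, 3.9], [3.9, 4.0, 4.1]),
}

hits = rank_hits(library, threshold=1.0)
print(hits)
```

Averaging is only one possible consensus rule; a weighted average or the exclusion of outlying models would fit the same interface.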
Affiliation(s)
- V Solov'ev
  - Laboratory of Novel Physicochemical Problems, A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, Moscow, Russian Federation
- D Baulin
  - Laboratory of Novel Physicochemical Problems, A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, Moscow, Russian Federation
- A Tsivadze
  - Laboratory of Novel Physicochemical Problems, A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, Moscow, Russian Federation
2
Chemical Graph Theory for Property Modeling in QSAR and QSPR: Charming QSAR & QSPR. Mathematics 2020. [DOI: 10.3390/math9010060] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/16/2022]
Abstract
Quantitative structure-activity relationship (QSAR) and quantitative structure-property relationship (QSPR) models are mathematical models for predicting the chemical, physical or biological properties of chemical compounds. Usually, they are based on structural parameters (grounded on fragment contributions) or calculated parameters (centered on three-dimensional QSAR (QSAR-3D) or chemical descriptors). Here, we describe a graph theory approach for generating and mining molecular fragments to be used in QSAR or QSPR modeling based exclusively on fragment contributions. Merging molecular graph theory, Simplified Molecular Input Line Entry Specification (SMILES) notation, and connection table data provides a precise way to differentiate and count the molecular fragments. Machine learning strategies generated models with outstanding root mean square error (RMSE) and R2 values. We also present the software Charming QSAR & QSPR, written in Python, for the property prediction of chemical compounds using this approach.
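Counting fragments from a connection table can be sketched as below. This is a generic path enumeration written for illustration, not the algorithm of Charming QSAR & QSPR: the input format (a list of element symbols plus a list of bonds), the fragment definition (linear atom/bond sequences), and the function name are all assumptions.

```python
from collections import Counter


def count_path_fragments(atoms, bonds, max_atoms=3):
    """Count linear atom/bond sequences containing 2..max_atoms atoms.

    atoms: list of element symbols, indexed by atom number.
    bonds: list of (i, j, bond_symbol) tuples, e.g. (0, 1, "-").
    """
    neighbours = {i: [] for i in range(len(atoms))}
    for i, j, bond in bonds:
        neighbours[i].append((j, bond))
        neighbours[j].append((i, bond))

    counts = Counter()

    def extend(path, tokens):
        # Every undirected path is reached from both ends; count it only
        # when the first atom index is smaller than the last one.
        if len(path) >= 2 and path[0] < path[-1]:
            canonical = min(tokens, tokens[::-1])  # direction-independent label
            counts["".join(canonical)] += 1
        if len(path) == max_atoms:
            return
        for nxt, bond in neighbours[path[-1]]:
            if nxt not in path:
                extend(path + [nxt], tokens + [bond, atoms[nxt]])

    for start in range(len(atoms)):
        extend([start], [atoms[start]])
    return counts


# Heavy-atom graph of ethanol: C-C-O
ethanol = count_path_fragments(["C", "C", "O"], [(0, 1, "-"), (1, 2, "-")])
print(dict(ethanol))
```

The resulting counts can serve directly as fragment descriptors; hydrogens are left implicit here, which is a simplification.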
3
Costa PC, Barsottini MR, Vieira ML, Pires BA, Evangelista JS, Zeri AC, Nascimento AF, Silva JS, Carazzolle MF, Pereira GA, Sforça ML, Miranda PC, Rocco SA. N-Phenylbenzamide derivatives as alternative oxidase inhibitors: Synthesis, molecular properties, 1H-STD NMR, and QSAR. J Mol Struct 2020. [DOI: 10.1016/j.molstruc.2020.127903] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/24/2022]
4
Solov'ev VP, Ustynyuk YA, Zhokhova NI, Karpov KV. Predictive Models for HOMO and LUMO Energies of N-Donor Heterocycles as Ligands for Lanthanides Separation. Mol Inform 2018; 37:e1800025. [PMID: 29971949 DOI: 10.1002/minf.201800025] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 03/03/2018] [Accepted: 06/20/2018] [Indexed: 11/11/2022]
Abstract
Quantum chemical calculations combined with QSPR methodology offer challenging prospects for solving a number of fundamental and applied problems. In this work, we performed PM7 and DFT calculations and QSPR modeling of HOMO and LUMO energies for polydentate N-heterocyclic ligands that are promising for the extraction separation of lanthanides, because these values are related to the ligands' selectivity with respect to the target cations. Data for QSPR modeling comprised the PM7-calculated HOMO and LUMO energies of N-donor heterocycles, including several types of both known and virtual, not yet described polydentate ligands. Ensemble modeling included various molecular fragments as descriptors and different variable selection techniques to build consensus models (CMs) on a training set of 388 ligands using external cross-validation. The CMs were then used to make predictions for two external test sets: 45 ligands (T1) that were similar to the ligands of the training set, and 1546 structures (T2) that were substantially different from them. The consensus models predict well in 5-fold cross-validation (RMSE(HOMO) = 0.097 eV, RMSE(LUMO) = 0.064 eV) and on the external test sets (T1: RMSE(HOMO) = 0.26 eV, RMSE(LUMO) = 0.24 eV; T2: RMSE(HOMO) = 0.26 eV, RMSE(LUMO) = 0.17 eV). An analysis of the results reveals that substituents in the heteroaromatic rings of the ligands and at the amide nitrogens can strongly influence their metal binding properties.
Affiliation(s)
- Vitaly P Solov'ev
  - A.N. Frumkin Institute of Physical Chemistry and Electrochemistry, Russian Academy of Sciences, Leninskiy prosp., 31, 119071, Moscow, Russia
- Yuri A Ustynyuk
  - Chemistry Department, M.V. Lomonosov Moscow State University, 119991, Moscow, Russia
- Nelly I Zhokhova
  - Faculty of Physics, M.V. Lomonosov Moscow State University, 119991, Moscow, Russia
- Kirill V Karpov
  - Faculty of Physics, M.V. Lomonosov Moscow State University, 119991, Moscow, Russia
5
Shahid K, Wang Q, Jia Q, Li L, Cui X, Xia S, Ma P. Proposal and evaluation of a new norm index-based QSAR model to predict pEC50 and pCC50 activities of HEPT derivatives. Chin J Chem Eng 2016. [DOI: 10.1016/j.cjche.2016.04.010] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/11/2022]
6
Jingjie S, Liping C, Wanghua C. Prediction on the Auto-ignition Temperature Using Substructural Molecular Fragments. Procedia Engineering 2014. [DOI: 10.1016/j.proeng.2014.10.510] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/24/2022]
7
Srivastava HK, Sastry GN. Molecular dynamics investigation on a series of HIV protease inhibitors: assessing the performance of MM-PBSA and MM-GBSA approaches. J Chem Inf Model 2012; 52:3088-98. [PMID: 23121465 DOI: 10.1021/ci300385h] [Citation(s) in RCA: 91] [Impact Index Per Article: 7.6] [Indexed: 01/04/2023]
Abstract
The binding free energies (ΔG(Bind)) obtained from molecular mechanics with Poisson-Boltzmann surface area (MM-PBSA) or molecular mechanics with Generalized Born surface area (MM-GBSA) calculations using molecular dynamics (MD) trajectories are the most popular procedures to measure the strength of interactions between a ligand and its receptor. Several attempts have been made to correlate the ΔG(Bind) and experimental IC(50) values in order to observe the relationship between binding strength of a ligand (with its receptor) and its inhibitory activity. The duration of MD simulations seems very important for getting acceptable correlation. Here, we are presenting a systematic study to estimate the reasonable MD simulation time for acceptable correlation between ΔG(Bind) and experimental IC(50) values. A comparison between MM-PBSA and MM-GBSA approaches is also presented at various time scales. MD simulations (10 ns) for 14 HIV protease inhibitors have been carried out by using the Amber program. MM-PBSA/GBSA based ΔG(Bind) have been calculated and correlated with experimental IC(50) values at different time scales (0-1 to 0-10 ns). This study clearly demonstrates that the MM-PBSA based ΔG(Bind) (ΔG(Bind)-PB) values provide very good correlation with experimental IC(50) values (quantitative and qualitative) when MD simulation is carried out for a longer time; however, MM-GBSA based ΔG(Bind) (ΔG(Bind)-GB) values show acceptable correlation for shorter time of simulation also. The accuracy of ΔG(Bind)-PB increases and ΔG(Bind)-GB remains almost constant with the increasing time of simulation.
Affiliation(s)
- Hemant Kumar Srivastava
  - Centre for Molecular Modelling, CSIR-Indian Institute of Chemical Technology, Tarnaka, Hyderabad 500 607, India
8
Bonachera F, Marcou G, Kireeva N, Varnek A, Horvath D. Using self-organizing maps to accelerate similarity search. Bioorg Med Chem 2012; 20:5396-409. [DOI: 10.1016/j.bmc.2012.04.024] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 01/31/2012] [Revised: 04/03/2012] [Accepted: 04/10/2012] [Indexed: 10/28/2022]
9
Akyüz L, Sarıpınar E. Conformation depends on 4D-QSAR analysis using EC-GA method: pharmacophore identification and bioactivity prediction of TIBOs as non-nucleoside reverse transcriptase inhibitors. J Enzyme Inhib Med Chem 2012; 28:776-91. [PMID: 22591319 DOI: 10.3109/14756366.2012.684051] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022]
Abstract
The electron conformational and genetic algorithm methods (EC-GA) were integrated for the identification of the pharmacophore group and predicting the anti HIV-1 activity of tetrahydroimidazo[4,5,1-jk][1,4]benzodiazepinone (TIBO) derivatives. To reveal the pharmacophore group, each conformation of all compounds was arranged by electron conformational matrices of congruity. Multiple comparisons of these matrices, within given tolerances for high active and low active TIBO derivatives, allow the identification of the pharmacophore group that refers to the electron conformational submatrix of activity. The effects of conformations, internal and external validation were investigated by four different models based on an ensemble of conformers and a single conformer, both with and without a test set. Model 1 using an ensemble of conformers for the training (39 compounds) and test sets (13 compounds), obtained by the optimum seven parameters, gave satisfactory results (R²(training) = 0.878, R²(test)= 0.910, q² = 0.840, q²(ext1) = 0.926 and q²(ext2) = 0.900).
Affiliation(s)
- Lalehan Akyüz
  - Faculty of Science, Department of Chemistry, Erciyes University, Kayseri, Turkey
10
Liu Q, Zhou H, Liu L, Chen X, Zhu R, Cao Z. Multi-target QSAR modelling in the analysis and design of HIV-HCV co-inhibitors: an in-silico study. BMC Bioinformatics 2011; 12:294. [PMID: 21774796 PMCID: PMC3167801 DOI: 10.1186/1471-2105-12-294] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Received: 12/13/2010] [Accepted: 07/20/2011] [Indexed: 12/13/2022]
Abstract
Background: HIV and HCV infections have become leading global public-health threats. Even more remarkably, HIV-HCV co-infection is rapidly emerging as a major cause of morbidity and mortality throughout the world, owing to the rapid mutation common to the two viruses and their similarly complex influence on the immune system. Although considerable progress has been made in the study of HIV and HCV infection individually, little research has addressed the molecular mechanism of their co-infection or the design of multi-target co-inhibitors for the two viruses simultaneously. Results: In our study, a multi-target Quantitative Structure-Activity Relationship (QSAR) study of inhibitors for HIV-HCV co-infection was carried out with an in-silico machine learning technique, multi-task learning, to help guide co-inhibitor design. First, an integrated dataset was compiled with 3 HIV inhibitor subsets targeting protease, integrase and reverse transcriptase, respectively, together with another 6 subsets of 2 HCV inhibitors targeting NS3 serine protease and NS5B polymerase, respectively. Second, an efficient multi-target QSAR modelling of HIV-HCV co-inhibitors was performed by applying an accelerated gradient method based multi-task learning to all 9 datasets. Furthermore, by solving the L-1-infinity regularized optimization, the Drug-like index features for compound description were ranked according to their joint importance in multi-target QSAR modelling of HIV and HCV. Finally, a drug structure-activity simulation for investigating the relationships between compound structures and binding affinities was presented based on our multiple target analysis, which provides several novel clues for the design of multi-target HIV-HCV co-inhibitors with an increased likelihood of successful therapies for HIV, HCV and HIV-HCV co-infection.
Conclusions: The framework presented in our study provides an efficient way to identify and design inhibitors that simultaneously and selectively bind to multiple targets from multiple viruses with high affinity, and will shed new light on future work on inhibitor synthesis for multi-target HIV, HCV, and HIV-HCV co-infection treatments.
Affiliation(s)
- Qi Liu
  - College of Life Science and Biotechnology, Tongji University, 200092, China
11
Abstract
This chapter reviews the application of fragment descriptors at different stages of virtual screening: filtering, similarity search, and direct activity assessment using QSAR/QSPR models. Several case studies are considered. It is demonstrated that the power of fragment descriptors stems from their universality, very high computational efficiency, simplicity of interpretation, and versatility.
Affiliation(s)
- Alexandre Varnek
  - Laboratory of Chemoinformatics, UMR7177 CNRS, University of Strasbourg, Strasbourg, France
12
MIA–QSAR coupled to principal component analysis-adaptive neuro-fuzzy inference systems (PCA–ANFIS) for the modeling of the anti-HIV reverse transcriptase activities of TIBO derivatives. Eur J Med Chem 2010; 45:1352-8. [DOI: 10.1016/j.ejmech.2009.12.028] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Received: 03/24/2009] [Revised: 12/09/2009] [Accepted: 12/14/2009] [Indexed: 11/18/2022]
13
Varnek A, Fourches D, Solov'ev V, Klimchuk O, Ouadi A, Billard I. Successful “In Silico” Design of New Efficient Uranyl Binders. Solvent Extraction and Ion Exchange 2007. [DOI: 10.1080/07366290701415820] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 10/23/2022]
14
Horvath D, Bonachera F, Solov'ev V, Gaudin C, Varnek A. Stochastic versus Stepwise Strategies for Quantitative Structure-Activity Relationship Generation: How Much Effort May the Mining for Successful QSAR Models Take? J Chem Inf Model 2007; 47:927-39. [PMID: 17480052 DOI: 10.1021/ci600476r] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Indexed: 11/28/2022]
Abstract
Descriptor selection in QSAR typically relies on a set of upfront working hypotheses in order to boil down the initial descriptor set to a tractable size. Stepwise regression, computationally cheap and therefore widely used in spite of its potential caveats, is most aggressive in reducing the effectively explored problem space by adopting a greedy variable pick strategy. This work explores an antipodal approach, incarnated by an original Genetic Algorithm (GA)-based Stochastic QSAR Sampler (SQS) that favors unbiased model search over computational cost. Independent of a priori descriptor filtering and, most important, not limited to linear models only, it was benchmarked against the ISIDA Stepwise Regression (SR) tool. SQS was run under various premises, varying the training/validation set splitting scheme, the nonlinearity policy, and the used descriptors. With the considered three anti-HIV compound sets, repeated SQS runs generate sometimes poorly overlapping but nevertheless equally well validating model sets. Enabling SQS to apply nonlinear descriptor transformations increases the problem space: nevertheless, nonlinear models tend to be more robust validators. Model validation benchmarking showed SQS to match the performance of SR or outperform it in cases when the upfront simplifications of SR "backfire", even though the robust SR got trapped in local minima only once in six cases. Consensus models from large SQS model sets validate well, but not outstandingly better than SR consensus equations. SQS is thus a robust QSAR building tool according to standard validation tests against external sets of compounds (of same families as used for training), but many of its benefits/drawbacks may yet not be revealed by such tests.
SQS results are a challenge to the traditional way to interpret and exploit QSAR: how to deal with thousands of well validating models, nonetheless providing potentially diverging applicability ranges and predicted values for external compounds. SR does not impose such burden on the user, but is "betting" on a single equation or a narrow consensus model to behave properly in virtual screening a sound strategy? By posing these questions, this article will hopefully act as an incentive for the long-haul studies needed to get them answered.
Affiliation(s)
- Dragos Horvath
  - UGSF-UMR 8576 CNRS/USTL, Université de Lille 1, Bât C9., 59650 Villeneuve d'Ascq, France.
15
Varnek A, Kireeva N, Tetko IV, Baskin II, Solov'ev VP. Exhaustive QSPR Studies of a Large Diverse Set of Ionic Liquids: How Accurately Can We Predict Melting Points? J Chem Inf Model 2007; 47:1111-22. [PMID: 17381081 DOI: 10.1021/ci600493x] [Citation(s) in RCA: 116] [Impact Index Per Article: 6.8] [Indexed: 11/28/2022]
Abstract
Several popular machine learning methods, namely Associative Neural Networks (ASNN), Support Vector Machines (SVM), k Nearest Neighbors (kNN), a modified version of partial least-squares analysis (PLSM), backpropagation neural network (BPNN), and Multiple Linear Regression Analysis (MLR), implemented in the ISIDA, NASAWIN, and VCCLAB software, have been used to perform QSPR modeling of the melting point of a structurally diverse data set of 717 bromides of nitrogen-containing organic cations (FULL), including 126 pyridinium bromides (PYR), 384 imidazolium and benzoimidazolium bromides (IMZ), and 207 quaternary ammonium bromides (QUAT). Several types of descriptors were tested: E-state indices, counts of atoms determined for E-state atom types, molecular descriptors generated by the DRAGON program, and different types of substructural molecular fragments. The predictive ability of the models was analyzed using a 5-fold external cross-validation procedure in which every compound in the parent set was included in one of five test sets. Among the 16 types of developed structure-melting point models, the nonlinear SVM, ASNN, and BPNN techniques demonstrate slightly better performance than the other methods. For the full set, the accuracy of predictions does not significantly change as a function of the type of descriptors. For the other sets, the performance of descriptors varies as a function of the method and data set used. The root-mean-squared error (RMSE) of prediction calculated on independent test sets is in the range of 37.5-46.4 °C (FULL), 26.2-34.8 °C (PYR), 38.8-45.9 °C (IMZ), and 34.2-49.3 °C (QUAT). The moderate accuracy of predictions can be related to the quality of the experimental data used for obtaining the models as well as to the difficulty of taking into account the structural features of ionic liquids in the solid state (polymorphic effects, eutectics, glass formation).
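The 5-fold external cross-validation described here, in which every compound falls into exactly one test set, can be sketched as follows. The helper names, the fixed random seed, the example values, and the mean-value baseline standing in for a real QSPR model are assumptions made for illustration; the software used in the paper (ISIDA, NASAWIN, VCCLAB) is not reproduced.

```python
import random
from math import sqrt


def k_fold_splits(n_items, n_folds=5, seed=0):
    """Yield (train, test) index lists; each index is tested exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[k::n_folds] for k in range(n_folds)]
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


def rmse(observed, predicted):
    """Root-mean-square error between two equally long sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(squared) / len(squared))


def baseline_cv_rmse(values, n_folds=5, seed=0):
    """Pooled cross-validated RMSE of a placeholder model (training-set mean)."""
    predicted = [0.0] * len(values)
    for train, test in k_fold_splits(len(values), n_folds, seed):
        train_mean = sum(values[i] for i in train) / len(train)
        for i in test:
            predicted[i] = train_mean
    return rmse(values, predicted)


# Hypothetical melting points in degrees C (not data from the paper).
melting_points = [85.0, 120.0, 64.0, 201.0, 150.0, 98.0, 132.0, 77.0, 110.0, 175.0]
print(round(baseline_cv_rmse(melting_points), 1))
```

Replacing the training-set mean with a fitted model keeps the same loop: fit on `train`, predict for `test`, then pool the predictions before computing the RMSE.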
Affiliation(s)
- Alexandre Varnek
  - Laboratoire d'Infochimie, UMR 7551 CNRS, Université Louis Pasteur, 4, rue B. Pascal, Strasbourg 67000, France.
16
Goulon A, Picot T, Duprat A, Dreyfus G. Predicting activities without computing descriptors: graph machines for QSAR. SAR and QSAR in Environmental Research 2007; 18:141-53. [PMID: 17365965 DOI: 10.1080/10629360601054313] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 05/14/2023]
Abstract
We describe graph machines, an alternative approach to traditional machine-learning-based QSAR, which circumvents the problem of designing, computing and selecting molecular descriptors. In that approach, which is similar in spirit to recursive networks, molecules are considered as structured data, represented as graphs. For each example of the data set, a mathematical function (graph machine) is built, whose structure reflects the structure of the molecule under consideration; it is the combination of identical parameterised functions, called "node functions" (e.g. a feedforward neural network). The parameters of the node functions, shared both within and across the graph machines, are adjusted during training with the "shared weights" technique. Model selection is then performed by traditional cross-validation. Therefore, the designer's main task consists in finding the optimal complexity for the node function. The efficiency of this new approach has been demonstrated in many QSAR or QSPR tasks, as well as in modelling the activities of complex chemicals (e.g. the toxicity of a family of phenols or the anti-HIV activities of HEPT derivatives). It generally outperforms traditional techniques without requiring the selection and computation of descriptors.
Affiliation(s)
- A Goulon
  - Laboratoire d'Electronique, Ecole Supérieure de Physique et de Chimie Industrielles de la Ville de Paris (ESPCI-ParisTech), 10 rue Vauquelin, 75005 Paris, France
17
Tetko IV, Solov'ev VP, Antonov AV, Yao X, Doucet JP, Fan B, Hoonakker F, Fourches D, Jost P, Lachiche N, Varnek A. Benchmarking of linear and nonlinear approaches for quantitative structure-property relationship studies of metal complexation with ionophores. J Chem Inf Model 2006; 46:808-19. [PMID: 16563012 DOI: 10.1021/ci0504216] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.1] [Indexed: 11/30/2022]
Abstract
A benchmark of several popular methods, Associative Neural Networks (ANN), Support Vector Machines (SVM), k Nearest Neighbors (kNN), Maximal Margin Linear Programming (MMLP), Radial Basis Function Neural Network (RBFNN), and Multiple Linear Regression (MLR), is reported for quantitative structure-property relationships (QSPR) of stability constants logK1 for the 1:1 (M:L) and logβ2 for the 1:2 complexes of metal cations Ag+ and Eu3+ with diverse sets of organic molecules in water at 298 K and ionic strength 0.1 M. The methods were tested on three types of descriptors: molecular descriptors including E-state values, counts of atoms determined for E-state atom types, and substructural molecular fragments (SMF). Comparison of the models was performed using a 5-fold external cross-validation procedure. Robust statistical tests (bootstrap and Kolmogorov-Smirnov statistics) were employed to evaluate the significance of calculated models. The Wilcoxon signed-rank test was used to compare the performance of methods. Individual structure-complexation property models obtained with nonlinear methods demonstrated a significantly better performance than the models built using multilinear regression analysis (MLRA). However, the averaging of several MLRA models based on SMF descriptors provided as good a prediction as the most efficient nonlinear techniques. Support Vector Machines and Associative Neural Networks contributed the largest number of significant models. Models based on fragments (SMF descriptors and E-state counts) had higher prediction ability than those based on E-state indices. The use of SMF descriptors and E-state counts provided similar results, whereas E-state indices led to less significant models. The current study illustrates the difficulties of quantitative comparison of different methods: conclusions based only on one data set without appropriate statistical tests could be wrong.
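The paired comparison of two methods with the Wilcoxon signed-rank test can be sketched as below. Only the rank sums W+ and W- are computed; zero differences are dropped and tied absolute differences share an average rank, which are common conventions assumed here rather than details taken from the paper. The function name and the per-data-set error values are hypothetical.

```python
def wilcoxon_rank_sums(errors_a, errors_b):
    """Return (W_plus, W_minus) for paired samples of two methods."""
    # Drop zero differences, then order by absolute difference.
    diffs = sorted((a - b for a, b in zip(errors_a, errors_b) if a != b), key=abs)
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(diffs):
        j = i
        # Extend j over the run of tied absolute differences.
        while j + 1 < len(diffs) and abs(diffs[j + 1]) == abs(diffs[i]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w_plus = sum(rank for diff, rank in zip(diffs, ranks) if diff > 0)
    w_minus = sum(rank for diff, rank in zip(diffs, ranks) if diff < 0)
    return w_plus, w_minus


# Hypothetical prediction errors of two methods on the same data sets.
method_a = [0.92, 1.10, 0.85, 1.30, 0.97, 1.05]
method_b = [0.80, 1.15, 0.70, 1.00, 0.90, 0.88]
print(wilcoxon_rank_sums(method_a, method_b))
```

A small W- relative to W+ suggests that method A tends to have the larger errors. Turning the rank sums into a p-value requires the null distribution of the statistic, which a statistics library such as SciPy (`scipy.stats.wilcoxon`) provides.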
Affiliation(s)
- Igor V Tetko
  - Institute of Bioorganic & Petrochemistry, Kiev, Ukraine
18
Katritzky AR, Dobchev DA, Fara DC, Hür E, Tämm K, Kurunczi L, Karelson M, Varnek A, Solov'ev VP. Skin Permeation Rate as a Function of Chemical Structure. J Med Chem 2006; 49:3305-14. [PMID: 16722649 DOI: 10.1021/jm051031d] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.3] [Indexed: 11/30/2022]
Abstract
Multilinear and nonlinear QSAR models were built for the skin permeation rate (Log K(p)) of a set of 143 diverse compounds. Satisfactory models were obtained by three approaches applied: (i) CODESSA PRO, (ii) Neural Network modeling using large pools of theoretical molecular descriptors, and (iii) ISIDA modeling based on fragment descriptors. The predictive abilities of the models were assessed by internal and external validations. The descriptors involved in the equations are discussed from the physicochemical point of view to illuminate the factors that influence skin permeation.
Affiliation(s)
- Alan R Katritzky
  - Center for Heterocyclic Compounds, Department of Chemistry, University of Florida, Gainesville, Florida 32611, USA.
19
Solov’ev VP, Kireeva NV, Tsivadze AY, Varnek AA. Structure-property modelling of complex formation of strontium with organic ligands in water. J Struct Chem 2006. [DOI: 10.1007/s10947-006-0300-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 11/30/2022]
20
Graph Machines and Their Applications to Computer-Aided Drug Design: A New Approach to Learning from Structured Data. 2006. [DOI: 10.1007/11839132_1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3]
21
Varnek A, Fourches D, Solov'ev VP, Baulin VE, Turanov AN, Karandashev VK, Fara D, Katritzky AR. "In silico" design of new uranyl extractants based on phosphoryl-containing podands: QSPR studies, generation and screening of virtual combinatorial library, and experimental tests. J Chem Inf Comput Sci 2005; 44:1365-82. [PMID: 15272845 DOI: 10.1021/ci049976b] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.8] [Indexed: 11/28/2022]
Abstract
This paper is devoted to computer-aided design of new extractants of the uranyl cation involving three main steps: (i) a QSPR study, (ii) generation and screening of a virtual combinatorial library, and (iii) synthesis of several predicted compounds and their experimental extraction studies. First, we performed a QSPR modeling of the distribution coefficient (logD) of uranyl extracted by phosphoryl-containing podands from water to 1,2-dichloroethane. Two different approaches were used: one based on classical structural and physicochemical descriptors (implemented in the CODESSA PRO program) and another one based on fragment descriptors (implemented in the TRAIL program). Three statistically significant models obtained with TRAIL involve as descriptors either sequences of atoms and bonds or atoms with their close environment (augmented atoms). The best models of CODESSA PRO include its own molecular descriptors as well as fragment descriptors obtained with TRAIL. At the second step, a virtual combinatorial library of 2024 podands has been generated with the CombiLib program, followed by the assessment of logD values using developed QSPR models. At the third step, eight of these hypothetical compounds were synthesized and tested experimentally. Comparison with experiment shows that developed QSPR models successfully predict logD values for 7 of 8 compounds from that "blind test" set.
Affiliation(s)
- A Varnek
  - Laboratoire d'Infochimie, UMR 7551 CNRS, Université Louis Pasteur, 4, rue B. Pascal, Strasbourg 67000, France.
22
Katritzky AR, Kuanar M, Fara DC, Karelson M, Acree WE, Solov'ev VP, Varnek A. QSAR modeling of blood:air and tissue:air partition coefficients using theoretical descriptors. Bioorg Med Chem 2005; 13:6450-63. [PMID: 16202613 DOI: 10.1016/j.bmc.2005.06.066] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Received: 04/27/2005] [Revised: 06/29/2005] [Accepted: 06/30/2005] [Indexed: 11/21/2022]
Abstract
Human blood:air, human and rat tissue (fat, brain, liver, muscle, and kidney):air partition coefficients of a diverse set of organic compounds were correlated and predicted using structural descriptors by employing CODESSA-PRO and ISIDA programs. Four and five descriptor regression models developed using CODESSA-PRO were validated on three different test sets. Overall, these models have reasonable values of correlation coefficients (R²) and leave-one-out correlation coefficients (R²cv): R² = 0.881-0.983; R²cv = 0.826-0.962. Calculations with ISIDA resulted in models based on atom/bond sequences involving two to three atoms with statistical parameters that were similar to those of models obtained with CODESSA-PRO (R² = 0.911-0.974; R²cv = 0.831-0.936). A mixed pool of molecular and fragment descriptors did not lead to significant improvement of the models.
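The difference between the fitted correlation coefficient and its leave-one-out counterpart can be illustrated with a minimal sketch. It assumes a single descriptor and ordinary least squares, whereas the models in the paper use four or five descriptors; the function names and data points are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(observed, predicted):
    """1 - SS_res / SS_tot, with SS_tot taken about the observed mean."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def fitted_r_squared(x, y):
    """R2 of a model fitted to and evaluated on all points."""
    slope, intercept = fit_line(x, y)
    return r_squared(y, [slope * xi + intercept for xi in x])


def loo_r_squared(x, y):
    """Leave-one-out R2: each point is predicted by a model fitted without it."""
    predicted = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        predicted.append(slope * x[i] + intercept)
    return r_squared(y, predicted)


# Hypothetical descriptor values and property values.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.2, 4.9, 7.1, 9.0]
print(round(fitted_r_squared(x, y), 3), round(loo_r_squared(x, y), 3))
```

For least-squares models the leave-one-out value cannot exceed the fitted one, which is why the R²cv ranges quoted in such studies sit below the R² ranges.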
Affiliation(s)
- Alan R Katritzky
  - Center for Heterocyclic Compounds, Department of Chemistry, University of Florida, Gainesville, 32611, USA.
23
Varnek A, Fourches D, Hoonakker F, Solov'ev VP. Substructural fragments: an universal language to encode reactions, molecular and supramolecular structures. J Comput Aided Mol Des 2005; 19:693-703. [PMID: 16292611 DOI: 10.1007/s10822-005-9008-0] [Citation(s) in RCA: 136] [Impact Index Per Article: 7.2] [Received: 05/17/2005] [Accepted: 07/28/2005] [Indexed: 10/25/2022]
Abstract
Substructural fragments are proposed as a simple and safe way to encode molecular structures in a matrix containing the occurrence of fragments of a given type. The knowledge retrieved from QSPR modelling can also be stored in that matrix in addition to the information about fragments. Complex supramolecular systems (using special bond types) and chemical reactions (represented as Condensed Graphs of Reactions, CGR) can be treated similarly. The efficiency of fragments as descriptors has been demonstrated in QSPR studies of aqueous solubility for a diverse set of organic compounds as well as in the analysis of thermodynamic parameters for hydrogen-bonding in some supramolecular complexes. It has also been shown that CGR may be an interesting opportunity to perform similarity searches for chemical reactions. The relationship between the density of information in descriptors/knowledge matrices and the robustness of QSPR models is discussed.
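The idea of encoding structures in a matrix of fragment occurrences can be sketched as follows. The fragment labels, the counts, and the function name are hypothetical; how fragments are detected in a structure is outside the scope of this sketch.

```python
def fragment_matrix(fragment_counts):
    """Return (column labels, rows) of a molecules x fragments count matrix.

    fragment_counts maps a molecule name to a {fragment: occurrences} dict.
    """
    columns = sorted({frag for counts in fragment_counts.values() for frag in counts})
    rows = {
        name: [counts.get(frag, 0) for frag in columns]
        for name, counts in fragment_counts.items()
    }
    return columns, rows


# Hypothetical heavy-atom fragment counts for three small molecules.
counts = {
    "ethanol": {"C-C": 1, "C-O": 1},
    "dimethyl ether": {"C-O": 2},
    "propane": {"C-C": 2},
}

columns, rows = fragment_matrix(counts)
for name, row in rows.items():
    print(name, dict(zip(columns, row)))
```

Each row is then a descriptor vector that a regression or classification method can consume directly; fragments absent from a molecule simply appear as zeros.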
Affiliation(s)
- A Varnek
  - Laboratoire d'Infochimie, UMR 7551 CNRS, Université Louis Pasteur, 4, rue B. Pascal, 67000 Strasbourg, France.
24
Katritzky AR, Fara DC, Yang H, Karelson M, Suzuki T, Solov'ev VP, Varnek A. Quantitative Structure-Property Relationship Modeling of β-Cyclodextrin Complexation Free Energies. J Chem Inf Comput Sci 2004; 44:529-41. [PMID: 15032533 DOI: 10.1021/ci034190j] [Citation(s) in RCA: 58] [Impact Index Per Article: 2.9] [Indexed: 11/30/2022]
Abstract
CODESSA-PRO was used to model binding energies for 1:1 complexation systems between 218 organic guest molecules and β-cyclodextrin, using a seven-parameter equation with R² = 0.796 and R²cv = 0.779. Fragment-based TRAIL calculations gave a better fit with R² = 0.943 and R²cv = 0.848 for 195 data points in the database. The advantages and disadvantages of each approach are discussed, and it is concluded that a combination of the two approaches has much promise from a practical viewpoint.
Affiliation(s)
- Alan R Katritzky
  - Center for Heterocyclic Compounds, Department of Chemistry, University of Florida, Gainesville, Florida 32611, USA.