1
|
Identification of a Family of Glycoside Derivatives Biologically Active against Acinetobacter baumannii and Other MDR Bacteria Using a QSPR Model. Pharmaceuticals (Basel) 2023. [DOI: 10.3390/ph16020250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/10/2023]
Abstract
As the rate of discovery of new antibacterial compounds for multidrug-resistant bacteria is declining, there is an urgent need to search for molecules that could reverse this trend. Acinetobacter baumannii has emerged as a highly virulent Gram-negative bacterium that has acquired multiple resistance mechanisms against antibiotics and is considered a critical-priority pathogen. In this work, we developed a quantitative structure-property relationship (QSPR) model with 592 compounds to identify structural parameters related to their antibacterial activity against A. baumannii. The mathematical validation of the QSPR model (R2 = 70.27, RN = −0.008, a(R2) = 0.014, and δK = 0.021) and its predictive ability (Q2LMO = 67.89, Q2EXT = 67.75, a(Q2) = −0.068, δQ = 0.0, average rm2 = 0.229, and Δrm2 = 0.522) were assessed with different statistical parameters; additional validation was done using three sets of external molecules (R2 = 72.89, 71.64 and 71.56). We used the QSPR model to perform a virtual screening of the BIOFACQUIM natural product database. In this screening, our model indicated that molecules 32 to 35 and 54 to 68, isolated from different extracts of plants of Ipomoea sp., are potential antibacterials against A. baumannii. Furthermore, biological assays showed that molecules 56 and 60 to 64 have broad antibacterial activity against clinically isolated strains of A. baumannii, as well as other multidrug-resistant bacteria, including Staphylococcus aureus, Escherichia coli, Klebsiella pneumoniae, and Pseudomonas aeruginosa. Finally, we propose 60 as a potential lead compound due to its broad-spectrum activity and its structural simplicity. Therefore, our QSPR model can be used as a tool in the search for new antibacterial compounds against A. baumannii.
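The fit and prediction statistics quoted in this abstract can be illustrated with a minimal sketch. It assumes the usual definitions of R2 (1 minus residual over total sum of squares) and one common form of external Q2 in which test-set deviations are measured from the training-set mean; the abstract does not say which Q2 variant the authors used, and it appears to report the values as percentages. The function names and toy numbers are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def q2_external(train_observed, test_observed, test_predicted):
    """External Q2 (assumed form): test deviations taken from the training mean."""
    mean_train = sum(train_observed) / len(train_observed)
    press = sum((o - p) ** 2 for o, p in zip(test_observed, test_predicted))
    ss_ext = sum((o - mean_train) ** 2 for o in test_observed)
    return 1.0 - press / ss_ext


# Toy activities, not data from the paper.
fit_r2 = r_squared([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
ext_q2 = q2_external([1.0, 2.0, 3.0, 4.0], [2.0, 3.0], [2.5, 2.5])
```

Predicting the training mean for every external compound gives an external Q2 of zero under this definition, which is why values well above zero are read as evidence of predictive ability.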
|
2
|
Spiers RC, Kalivas JH. Reliable Model Selection without Reference Values by Utilizing Model Diversity with Prediction Similarity. J Chem Inf Model 2021; 61:2220-2230. [PMID: 33900749 DOI: 10.1021/acs.jcim.0c01493] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/29/2022]
Abstract
Predictive modeling (calibration or training) with various data formats, such as near-infrared (NIR) spectra and quantitative structure-activity relationship (QSAR) data, provides essential information if a proper model is selected. Similarly, with a general model selection approach, spectral model maintenance (updating) from original modeling conditions to new conditions can be performed for dynamic modeling. Fundamental modeling (partial least-squares (PLS) and others) and maintenance processes (domain adaptation or transfer learning and others) require selection of tuning parameter values to isolate models that can accurately predict new samples or molecules, e.g., the number of PLS latent variables to predict analyte concentration. Regardless of the modeling task, model selection is complex and lacks a reliable protocol. Tuning parameter selection typically depends on only one model quality measure assessing model bias using prediction accuracy. Developed in this paper is a generic model selection process using concepts from consensus modeling and QSAR activity landscapes. It is a consensus filtering approach that prioritizes model diversity (MD) while conserving prediction similarity (PS) fused with a common bias-variance trade-off measure. A significant feature of MDPS is that a cross-validation scheme is not needed because models are selected relative to predicting new samples or molecules, i.e., model selection uses unlabeled samples (without reference values) for active predictions. The versatility and reliability of MDPS model selection are shown using four NIR data sets and a QSAR data set. The study also substantiates the Rashomon effect where there is not one best model tuning parameter value that provides accurate predictions.
Affiliation(s)
- Robert C Spiers: Department of Chemistry, Idaho State University, Pocatello, Idaho 83209, United States
- John H Kalivas: Department of Chemistry, Idaho State University, Pocatello, Idaho 83209, United States
|
3
|
Ranjan P, Athar M, Jha PC, Krishna KV. Probing the opportunities for designing anthelmintic leads by sub-structural topology-based QSAR modelling. Mol Divers 2018; 22:669-683. [PMID: 29611020 DOI: 10.1007/s11030-018-9825-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 08/29/2017] [Accepted: 03/16/2018] [Indexed: 12/30/2022]
Abstract
A quantitative structure-activity relationship (QSAR) model has been developed for enriched tubulin inhibitors, which were retrieved from sequence similarity searches and applicability domain analysis. Using the partial least squares (PLS) method and a leave-one-out (LOO) validation approach, the model was generated with correlation statistics [Formula: see text] and [Formula: see text] of 0.68 and 0.69, respectively. The present study indicates that the topological descriptors BIC, CH_3_C, IC, JX and Kappa_2 correlate well with biological activity. ADME and toxicity (ADME/T) assessment showed that 255 of 260 molecules passed the ADME/T test and exhibited drug-likeness attributes. These results showed that topological indices and the colchicine binding domain directly influence the aetiology of helminthic infections. Further, we anticipate that our model can be applied for guiding and designing potential anthelmintic inhibitors.
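Leave-one-out validation, as mentioned in this abstract, can be sketched as follows. This is only an illustration: ordinary least squares on a single hypothetical descriptor stands in for the PLS model of the paper, and the cross-validated statistic is computed as 1 - PRESS / SS_tot, which is one common definition. All names and numbers are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one descriptor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_q2(xs, ys):
    """Leave-one-out q2: each compound is predicted by a model fit without it."""
    press = 0.0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot


# Toy descriptor values and activities, not data from the paper.
q2 = loo_q2([1.0, 2.0, 3.0, 4.0, 5.0], [1.2, 1.9, 3.1, 3.8, 5.3])
```

Because every prediction comes from a model that never saw the held-out compound, q2 is never larger than the fitted r2 of the same model family on the same data, and a large gap between the two is a common warning sign of overfitting.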
Affiliation(s)
- Prabodh Ranjan: CCG@CUG, School of Chemical Sciences, Central University of Gujarat, Sector-30, Gandhinagar, Gujarat, 382030, India
- Mohd Athar: CCG@CUG, School of Chemical Sciences, Central University of Gujarat, Sector-30, Gandhinagar, Gujarat, 382030, India
- Prakash Chandra Jha: CCG@CUG, Centre for Applied Chemistry, Central University of Gujarat, Sector-30, Gandhinagar, Gujarat, 382030, India
- Kari Vijaya Krishna: Department of Chemistry, School of Advanced Sciences, VIT University, Vellore, Tamil Nadu, 632014, India
|
4
|
Kalivas JH, Héberger K, Andries E. Sum of ranking differences (SRD) to ensemble multivariate calibration model merits for tuning parameter selection and comparing calibration methods. Anal Chim Acta 2015; 869:21-33. [DOI: 10.1016/j.aca.2014.12.056] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 02/13/2014] [Revised: 10/16/2014] [Accepted: 12/09/2014] [Indexed: 10/24/2022]
|
5
|
Lin N, Zou Y, Zhang H. Kinetic migration studies of bisphenol-A-related compounds from can coatings into food simulant and oily foods. Eur Food Res Technol 2013. [DOI: 10.1007/s00217-013-2003-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/24/2022]
|
6
|
Szaleniec M. Prediction of enzyme activity with neural network models based on electronic and geometrical features of substrates. Pharmacol Rep 2012; 64:761-81. [DOI: 10.1016/s1734-1140(12)70873-3] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 02/07/2012] [Revised: 04/16/2012] [Indexed: 11/26/2022]
|
7
|
Ghosh J, Lewitus DY, Chandra P, Joy A, Bushman J, Knight D, Kohn J. Computational modeling of in vitro biological responses on polymethacrylate surfaces. Polymer 2011; 52:2650-2660. [PMID: 21779132 PMCID: PMC3138629 DOI: 10.1016/j.polymer.2011.04.014] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/23/2022]
Abstract
The objective of this research was to examine the capabilities of quantitative structure-property relationship (QSPR) modeling to predict specific biological responses (fibrinogen adsorption, cell attachment and cell proliferation index) on thin films of different polymethacrylates. Using 33 commercially available monomers, it is theoretically possible to construct a library of over 40,000 distinct polymer compositions. A subset of these polymers was synthesized, and solvent-cast surfaces were prepared in 96-well plates for the measurement of fibrinogen adsorption. NIH 3T3 cell attachment and proliferation index were measured on spin-coated thin films of these polymers. Based on the experimental results for these polymers, separate models were built for homo-, co-, and terpolymers in the library, with good correlation between experimental and predicted values. The ability to predict biological responses by simple QSPR models for large numbers of polymers has important implications in designing biomaterials for specific biological or medical applications.
Affiliation(s)
- Jayeeta Ghosh: New Jersey Center for Biomaterials, Rutgers, The State University of New Jersey, New Brunswick, NJ 08854-8087, United States
- Dan Y Lewitus: New Jersey Center for Biomaterials, Rutgers, The State University of New Jersey, New Brunswick, NJ 08854-8087, United States
- Prafulla Chandra: New Jersey Center for Biomaterials, Rutgers, The State University of New Jersey, New Brunswick, NJ 08854-8087, United States
- Abraham Joy: New Jersey Center for Biomaterials, Rutgers, The State University of New Jersey, New Brunswick, NJ 08854-8087, United States
- Jared Bushman: New Jersey Center for Biomaterials, Rutgers, The State University of New Jersey, New Brunswick, NJ 08854-8087, United States
- Doyle Knight: Department of Mechanical and Aerospace Engineering, Rutgers, The State University of New Jersey, New Brunswick, NJ 08854-8058, United States
- Joachim Kohn: New Jersey Center for Biomaterials, Rutgers, The State University of New Jersey, New Brunswick, NJ 08854-8087, United States
|
8
|
Gunturi SB, Theerthala SS, Patel NK, Bahl J, Narayanan R. Prediction of skin sensitization potential using D-optimal design and GA-kNN classification methods. SAR and QSAR in Environmental Research 2010; 21:305-335. [PMID: 20544553 DOI: 10.1080/10629361003773955] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 05/29/2023]
Abstract
Modelling of skin sensitization data of 255 diverse compounds and 450 calculated descriptors was performed to develop global predictive classification models that are applicable to the whole chemical space. With this aim, we employed two automated procedures: (a) D-optimal design to select optimal members of the training and test sets and (b) the k-Nearest Neighbour (kNN) classification method along with Genetic Algorithms (GA-kNN classification) to select significant and independent descriptors in order to build the models. This methodology helped us to derive multiple models, M1-M5, that are stable and robust. The best among them, model M1 (CCR(train) = 84.3%, CCR(test) = 87.2% and CCR(ext) = 80.4%), is based on six neighbours and nine descriptors and further suggests that (a) it is stable and robust and performs better than the models reported in the literature, and (b) the combination of D-optimal design and GA-kNN classification is a very promising approach. Consensus prediction based on the models M1-M5 improved the CCR of the training, test and external validation datasets by 3.8%, 4.45% and 3.85%, respectively, over M1. From the analysis of the physical meaning of the selected descriptors, it is inferred that the skin sensitization potential of small organic compounds can be accurately predicted using calculated descriptors that code for the following fundamental properties: (i) lipophilicity, (ii) atomic polarizability, (iii) shape, (iv) electrostatic interactions, and (v) chemical reactivity.
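The kNN classification step and the CCR statistic can be sketched as below. This is a minimal illustration, not the GA-kNN procedure of the paper: there is no descriptor selection, Euclidean distance is assumed, and CCR is taken here to be the mean of the per-class recalls, which is an assumption because the abstract does not define it. All names and data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k):
    """Label a query compound by majority vote among its k nearest neighbours."""
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def correct_classification_rate(true_labels, predicted_labels):
    """Assumed CCR definition: average of the recall obtained for each class."""
    recalls = []
    for cls in sorted(set(true_labels)):
        members = [i for i, t in enumerate(true_labels) if t == cls]
        hits = sum(1 for i in members if predicted_labels[i] == cls)
        recalls.append(hits / len(members))
    return sum(recalls) / len(recalls)


# Toy two-descriptor data: 0 = non-sensitizer, 1 = sensitizer (hypothetical).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = [0, 0, 0, 1, 1, 1]
predicted = knn_predict(points, labels, (5.5, 5.5), k=3)
```

An odd k avoids tied votes for two-class problems; the six-neighbour model described in the abstract would need an explicit tie-breaking rule, which this sketch does not attempt to reproduce.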
Affiliation(s)
- S B Gunturi: Innovation Labs Hyderabad, Tata Consultancy Services Limited, #1, Software Units Layout, Madhapur, Hyderabad - 500 081, India
|
9
|
Fernandez M, Caballero J, Fernandez L, Sarai A. Genetic algorithm optimization in drug design QSAR: Bayesian-regularized genetic neural networks (BRGNN) and genetic algorithm-optimized support vectors machines (GA-SVM). Mol Divers 2010; 15:269-89. [PMID: 20306130 DOI: 10.1007/s11030-010-9234-9] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.4] [Received: 05/14/2009] [Accepted: 01/25/2010] [Indexed: 10/19/2022]
Abstract
Many articles in "in silico" drug design have implemented genetic algorithms (GA) for feature selection, model optimization, conformational search, or docking studies. Some of these articles described GA applications to quantitative structure-activity relationship (QSAR) modeling in combination with regression and/or classification techniques. We reviewed the implementation of GA in drug design QSAR and specifically its performance in the optimization of robust mathematical models such as Bayesian-regularized artificial neural networks (BRANNs) and support vector machines (SVMs) on different drug design problems. Modeled data sets encompassed ADMET and solubility properties, cancer target inhibitors, acetylcholinesterase inhibitors, HIV-1 protease inhibitors, ion-channel and calcium entry blockers, and antiprotozoan compounds as well as protein classes, functional, and conformational stability data. The GA-optimized predictors were often more accurate and robust than previously published models on the same data sets and explained more than 65% of data variances in validation experiments. In addition, feature selection over large pools of molecular descriptors provided insights into the structural and atomic properties ruling ligand-target interactions.
Affiliation(s)
- Michael Fernandez: Department of Bioscience and Bioinformatics, Kyushu Institute of Technology (KIT), 680-4 Kawazu, Iizuka, 820-8502, Japan
|
10
|
Han L, Wang Y, Bryant SH. Developing and validating predictive decision tree models from mining chemical structural fingerprints and high-throughput screening data in PubChem. BMC Bioinformatics 2008; 9:401. [PMID: 18817552 PMCID: PMC2572623 DOI: 10.1186/1471-2105-9-401] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.1] [Received: 02/26/2008] [Accepted: 09/25/2008] [Indexed: 01/17/2023]
Abstract
Background: Recent advances in high-throughput screening (HTS) techniques and readily available compound libraries generated using combinatorial chemistry or derived from natural products enable the testing of millions of compounds in a matter of days. Due to the amount of information produced by HTS assays, it is a very challenging task to mine the HTS data for potential interest in drug development research. Computational approaches for the analysis of HTS results face great challenges due to the large quantity of information and significant amounts of erroneous data produced. Results: In this study, Decision Tree (DT) based models were developed to discriminate compound bioactivities by using their chemical structure fingerprints provided in the PubChem system. The DT models were examined for filtering biological activity data contained in four assays deposited in the PubChem Bioassay Database, including assays tested for 5HT1a agonists, antagonists, and HIV-1 RT-RNase H inhibitors. The 10-fold cross-validation (CV) sensitivity, specificity and Matthews Correlation Coefficient (MCC) for the models are 57.2–80.5%, 97.3–99.0%, and 0.4–0.5, respectively. A further evaluation was also performed for DT models built for two independent bioassays, where inhibitors for the same HIV RNase target were screened using different compound libraries; this experiment yielded enrichment factors of 4.4 and 9.7. Conclusion: Our results suggest that the designed DT models can be used as a virtual screening technique as well as a complement to traditional approaches for hit selection.
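The screening statistics reported in this abstract all follow from a confusion matrix, as the sketch below shows. Sensitivity, specificity and MCC use their standard definitions; the enrichment factor is taken here to be the hit rate among predicted actives divided by the hit rate of the whole library, which is a common convention and an assumption about what the authors computed. The counts are invented for illustration.

```python
import math


def screening_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


def enrichment_factor(tp, fp, tn, fn):
    """Assumed definition: active fraction in the selected set over that in the library."""
    hit_rate_selected = tp / (tp + fp)
    hit_rate_library = (tp + fn) / (tp + fp + tn + fn)
    return hit_rate_selected / hit_rate_library


# Hypothetical library: 1000 compounds, 100 true actives, 80 predicted active.
sens, spec, mcc = screening_metrics(tp=60, fp=20, tn=880, fn=40)
ef = enrichment_factor(tp=60, fp=20, tn=880, fn=40)
```

With heavily imbalanced HTS data, specificity can be very high while sensitivity stays moderate, which is why MCC and the enrichment factor are reported alongside them.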
Affiliation(s)
- Lianyi Han: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
|
11
|
Considerations and recent advances in QSAR models for cytochrome P450-mediated drug metabolism prediction. J Comput Aided Mol Des 2008; 22:843-55. [DOI: 10.1007/s10822-008-9225-4] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Received: 12/19/2007] [Accepted: 06/08/2008] [Indexed: 02/07/2023]
|
12
|
Quantitative Series Enrichment Analysis (QSEA): a novel procedure for 3D-QSAR analysis. J Comput Aided Mol Des 2008; 22:541-51. [DOI: 10.1007/s10822-008-9195-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 10/31/2007] [Accepted: 02/07/2008] [Indexed: 10/22/2022]
|
13
|
Dureja H, Madan AK. Superaugmented eccentric connectivity indices: new-generation highly discriminating topological descriptors for QSAR/QSPR modeling. Med Chem Res 2007. [DOI: 10.1007/s00044-007-9032-9] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.5] [Indexed: 11/25/2022]
|
14
|
Sadat Hayatshahi SH, Abdolmaleki P, Ghiasi M, Safarian S. QSARs and activity predicting models for competitive inhibitors of adenosine deaminase. FEBS Lett 2007; 581:506-14. [PMID: 17250831 DOI: 10.1016/j.febslet.2006.12.050] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 08/30/2006] [Revised: 12/16/2006] [Accepted: 12/25/2006] [Indexed: 10/23/2022]
Abstract
Combinations of multiple linear regressions, genetic algorithms and artificial neural networks were utilized to develop models for seeking quantitative structure-activity relationships that correlate structural descriptors and inhibition activity of adenosine deaminase competitive inhibitors. Many quantitative descriptors were generated to express the physicochemical properties of 70 compounds with optimized structures in aqueous solution. Multiple linear regressions were used to linearly select different subsets of descriptors and develop linear models for prediction of log(k(i)). The best subset then fed artificial neural networks to develop nonlinear predictors. A committee of six hybrid models - that included genetic algorithm routines together with neural networks - was also utilized to nonlinearly select most efficient subsets of descriptors in a cross-validation procedure for nonlinear log(k(i)) prediction. The best prediction model was found to be an 8-3-1 artificial neural network which was fed by the most frequently selected descriptors among these subsets. This prediction model resulted in train set root mean sum square error (RMSE) of 0.84 log(k(i)) and prediction set RMSE of 0.85 log(k(i)) (both equivalent of 0.10 in normal range of log(k(i))) and correlation coefficient (r(2)) of 0.91.
|
15
|
Clare BW, Supuran CT. A perspective on quantitative structure–activity relationships and carbonic anhydrase inhibitors. Expert Opin Drug Metab Toxicol 2006; 2:113-37. [PMID: 16863473 DOI: 10.1517/17425255.2.1.113] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.7] [Indexed: 11/05/2022]
Abstract
Carbonic anhydrases (CAs, EC 4.2.1.1) are wide-spread enzymes, present in mammals in at least 15 different isoforms. The 12 catalytically active isoforms play important physiological and pathophysiological functions and are strongly inhibited by aromatic/heterocyclic sulfonamides, sulfamides and sulfamates, among others. The catalytic and inhibition mechanisms of these enzymes are understood in great detail, and this greatly helped the design of potent inhibitors, some of which possess important clinical applications as antiglaucoma drugs, or in the management of some neuromuscular disorders. A recent discovery is connected with the involvement of CAs and their sulfonamide inhibitors in cancer: many potent CA inhibitors were shown to inhibit the growth of several tumour cell lines in vitro and in vivo, thus constituting interesting leads for developing novel antitumour therapies. The field of quantitative structure-activity relationship (QSAR), formalised by Hansch and others in the early 1960s, is the discovery of empirical relationships between the chemical structure of drugs and their biological activity. The emphasis is on empirical. Extending a QSAR to drugs other than those used to formulate it is always a new hypothesis, and although these extensions are often successful, it should be no cause for surprise if they break down in particular cases. With CA, as with other targets, the descriptor variables that have been used include topological indices, physical properties such as solvent partition coefficients and Hammett constants from reaction rate studies, and quantum theoretical parameters, such as orbital energies, atomic charges, polarisabilities and recently the orientation of nodes in pi-orbitals. This review deals only with the physical and quantum theoretical descriptors.
Affiliation(s)
- Brian W Clare: University of Western Australia, School of Biomedical and Chemical Science, Crawley, WA 6009, Australia
|
16
|
Adams N, Clauss J, Meunier M, Schubert U. Predicting thermochemical parameters of oxygen-containing heterocycles using simple QSPR models. Molecular Simulation 2006. [DOI: 10.1080/08927020500474300] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 10/24/2022]
|
17
|
Sadat Hayatshahi SH, Abdolmaleki P, Safarian S, Khajeh K. Non-linear quantitative structure–activity relationship for adenine derivatives as competitive inhibitors of adenosine deaminase. Biochem Biophys Res Commun 2005; 338:1137-42. [PMID: 16256072 DOI: 10.1016/j.bbrc.2005.10.049] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 10/03/2005] [Accepted: 10/11/2005] [Indexed: 11/22/2022]
Abstract
Logistic regression and artificial neural networks have been developed as two non-linear models to establish quantitative structure-activity relationships between structural descriptors and the biochemical activity of adenosine-based competitive inhibitors of adenosine deaminase. The training set included 24 compounds with known k(i) values. The models were trained to solve two-class problems. Unlike in the previous work, in which multiple linear regression was used, the highest positive charge on the molecules was found to be closely related to their inhibition activity, while the electric charge on atom N1 of adenosine was found to be a poor descriptor. Consequently, the previously developed equation was improved, and the newly formed one could predict the class of 91.66% of compounds correctly. Optimized 2-3-1 and 3-4-1 neural networks increased this rate to 95.83%.
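The two classification rates are consistent with 22 and 23 correctly classified compounds out of 24, as the short check below shows. This reading assumes the rates were computed on the 24-compound set mentioned in the abstract, which the abstract does not state explicitly; the helper name is hypothetical.

```python
def classification_rate(n_correct, n_total):
    """Percentage of compounds assigned to the correct class."""
    return 100.0 * n_correct / n_total


for n_correct in (22, 23):
    print(f"{n_correct}/24 -> {classification_rate(n_correct, 24):.2f}%")
# prints "22/24 -> 91.67%" and "23/24 -> 95.83%"
```

The abstract's 91.66% matches 22/24 with the last digit truncated rather than rounded, so the gain from the neural networks would correspond to one additional compound classified correctly.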
|
18
|
Milac AL, Avram S, Petrescu AJ. Evaluation of a neural networks QSAR method based on ligand representation using substituent descriptors. Application to HIV-1 protease inhibitors. J Mol Graph Model 2005; 25:37-45. [PMID: 16325439 DOI: 10.1016/j.jmgm.2005.09.014] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 04/01/2005] [Revised: 06/17/2005] [Accepted: 09/29/2005] [Indexed: 11/18/2022]
Abstract
We present here a neural networks method designed to predict biological activity based on a local representation of the ligand. The compounds of the series are represented by a vector mapping for each of four substituent properties: volume, log P, dipole moment and a simple 'steric' parameter relating to its shape. This ligand representation was tested using neural networks on a set of 42 cyclic-urea derivatives, inhibiting HIV-1 protease. The leave-one-out cross-validation using all descriptors in the input gave a correlation factor between prediction and experiment of 0.76 for the overall set and 0.88 when three outliers were left out. To rank the significance of the four descriptors, we further tested all combinations of two and three parameters for each substituent, using two disjunctive testing sets of five inhibitors. In these sets, vectors with extreme descriptor values were used either in the training or the testing set (sets A and B, respectively). The method is a very good interpolator (set A, 95 ± 2% accuracy) but a less effective extrapolator (set B, 85 ± 2% accuracy). Generally, the combinations including the 'steric' parameter predict better than average, while those containing the volume are less effective. The best prediction, 98.8 ± 1.2%, was obtained when log P, the dipole and the steric parameter were used on set A. At the opposite end, the lowest ranked descriptor set was obtained when replacing log P with the volume, giving 92.3 ± 6.7% accuracy over the set A.
Affiliation(s)
- Adina-Luminiţa Milac: Institute of Biochemistry, Splaiul Independenţei 296, Sector 6, Bucharest, Romania
|
19
|
Clare BW, Supuran CT. A physically interpretable quantum-theoretic QSAR for some carbonic anhydrase inhibitors with diverse aromatic rings, obtained by a new QSAR procedure. Bioorg Med Chem 2005; 13:2197-211. [PMID: 15727872 DOI: 10.1016/j.bmc.2004.12.055] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Received: 11/04/2004] [Revised: 12/27/2004] [Accepted: 12/27/2004] [Indexed: 11/17/2022]
Abstract
A QSAR based almost entirely on quantum theoretically calculated descriptors has been developed for a large and heterogeneous group of aromatic and heteroaromatic carbonic anhydrase inhibitors, using orbital energies, nodal angles, atomic charges, and some other intuitively appealing descriptors. Most calculations have been done at the B3LYP/6-31G* level of theory. For the first time we have treated five-membered rings by the same means that we have used for benzene rings in the past. Our flip regression technique has been expanded to encompass automatic variable selection. The statistical quality of the results, while not equal to those we have had with benzene derivatives, is very good considering the noncongeneric nature of the compounds. The most significant correlation was with charge on the atoms of the sulfonamide group, followed by the nodal orientation and the solvation energy calculated by COSMO and the charge polarization of the molecule calculated as the mean absolute Mulliken charge over all atoms.
Affiliation(s)
- Brian W Clare: School of Biomedical and Chemical Science, The University of Western Australia, 35 Stirling Highway, Crawley WA 6009, Australia
|
20
|
He L, Jurs PC, Kreatsoulas C, Custer LL, Durham SK, Pearl GM. Probabilistic Neural Network Multiple Classifier System for Predicting the Genotoxicity of Quinolone and Quinoline Derivatives. Chem Res Toxicol 2005; 18:428-40. [PMID: 15777083 DOI: 10.1021/tx049742m] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
Abstract
Quinolone and quinoline are known to be liver carcinogens in rodents, and a number of their derivatives have been shown to exhibit mutagenicity in the Ames test, using Salmonella typhimurium strain TA 100 in the presence of S9. Both the carcinogenicity and the mutagenicity of quinolone and quinoline derivatives, as determined by SAS, can be attributed to their genotoxicity potential. This potential, which is measured by genotoxicity tests, is a good indication of carcinogenicity and mutagenicity because compounds that are positive in these tests have the potential to be human carcinogens and/or mutagens. In this study, a collection of quinolone and quinoline derivatives' carcinogenicity is determined by qualitatively predicting their genotoxicity potential with predictive PNN (probabilistic neural network) classification models. In addition, a multiple classifier system is also developed to improve the predictability of genotoxicity. Superior results are seen with the multiple classifier system over the individual PNN classification models. With the multiple classifier system, 89.4% of the quinolone derivatives were predicted correctly, and higher predictability is seen with the quinoline derivatives at 92.2% correct. The multiple classifier system not only is able to accurately predict the genotoxicity but also provides an insight about the main determinants of genotoxicity of the quinolone and quinoline derivatives. Thus, the PNN multiple classifier system generated in this study is a beneficial contributor toward predictive toxicology in the design of less carcinogenic bioactive compounds.
Affiliation(s)
- Linnan He: Department of Chemistry, Penn State University, University Park, Pennsylvania 16802, USA
|
21
|
|
22
|
Todeschini R, Consonni V, Mauri A, Pavan M. Detecting “bad” regression models: multicriteria fitness functions in regression analysis. Anal Chim Acta 2004. [DOI: 10.1016/j.aca.2003.12.010] [Citation(s) in RCA: 118] [Impact Index Per Article: 5.9] [Indexed: 10/26/2022]
|
23
|
Kalivas JH, Forrester JB, Seipel HA. QSAR modeling based on the bias/variance compromise: a harmonious. J Comput Aided Mol Des 2004; 18:537-47. [PMID: 15729853 DOI: 10.1007/s10822-004-4063-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 12/01/2022]
Abstract
Modeling quantitative structure-activity relationships (QSAR) is considered with an emphasis on prediction. An abundance of methods are available to develop such models. Using a harmonious approach that balances the bias and variance of predictions, the best calibration models are identified relative to the bias and variance criteria used. Criteria utilized to determine the adequacy of models are the root mean square error of calibration (RMSEC) and validation (RMSEV), respective R2 values, and the norm of the regression vector. QSAR data from the literature are used to demonstrate concepts. For these data sets and criteria used, it is suggested that models obtained by ridge regression (RR) are more harmonious and parsimonious than models obtained by partial least squares (PLS) and principal component regression (PCR) when the data is mean-centered. The most harmonious RR models have the best bias/variance tradeoff, reflected by the smallest RMSEC, RMSEV, and regression vector norms and the largest calibration and validation R2 values. The most parsimonious RR models have the smallest effective rank.
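The bias/variance compromise described in this abstract can be illustrated with ridge regression on one mean-centered descriptor, where the closed-form coefficient is b = sum(x*y) / (sum(x*x) + lambda). The sketch below is a simplified illustration under that one-descriptor assumption, not the authors' multivariate procedure; RMSEC is computed by dividing by n, and the names and data are hypothetical.

```python
import math


def ridge_path(xs, ys, lambdas):
    """For each ridge parameter return (lambda, |b|, RMSEC) on mean-centered data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    xc = [x - mean_x for x in xs]
    yc = [y - mean_y for y in ys]
    sxx = sum(x * x for x in xc)
    sxy = sum(x * y for x, y in zip(xc, yc))
    rows = []
    for lam in lambdas:
        b = sxy / (sxx + lam)  # closed-form ridge coefficient for one descriptor
        rmsec = math.sqrt(sum((y - b * x) ** 2 for x, y in zip(xc, yc)) / n)
        rows.append((lam, abs(b), rmsec))
    return rows


# Toy calibration data, not taken from the paper.
for lam, norm_b, rmsec in ridge_path([1.0, 2.0, 3.0, 4.0, 5.0],
                                     [1.1, 1.9, 3.2, 3.9, 5.1],
                                     [0.0, 1.0, 10.0]):
    print(f"lambda={lam:5.1f}  |b|={norm_b:.3f}  RMSEC={rmsec:.3f}")
```

Increasing lambda shrinks the regression vector norm (a variance indicator) while the calibration error grows (a bias indicator); a harmonious model in the sense of the abstract is one that keeps both acceptably small.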
Affiliation(s)
- John H Kalivas
- Department of Chemistry, Idaho State University, Pocatello, ID 83209, USA.
|
24
|
Weaver DC. Applying data mining techniques to library design, lead generation and lead optimization. Curr Opin Chem Biol 2004; 8:264-70. [PMID: 15183324 DOI: 10.1016/j.cbpa.2004.04.005] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.7] [Indexed: 01/04/2023]
Abstract
Many data mining techniques have been applied to activity and ADMET datasets, and the resulting models are being used to understand quantitative structure-activity relationships and design new libraries. This review summarizes data mining concepts and discusses their application to library design, lead generation (particularly for sequential screening) and lead optimization (specifically for generating and interpreting QSAR models). Also, this review discusses recent comparative studies between data mining techniques and draws some conclusions about the patterns emerging in the drug discovery data mining field.
Affiliation(s)
- Daniel C Weaver
- Array Biopharma, Inc., 3200 Walnut Street, Boulder, Colorado 80303, USA.
|
25
|
He L, Jurs PC, Custer LL, Durham SK, Pearl GM. Predicting the Genotoxicity of Polycyclic Aromatic Compounds from Molecular Structure with Different Classifiers. Chem Res Toxicol 2003; 16:1567-80. [PMID: 14680371 DOI: 10.1021/tx030032a] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.9] [Indexed: 11/29/2022]
Abstract
Classification models were developed to provide accurate prediction of genotoxicity of 277 polycyclic aromatic compounds (PACs) directly from their molecular structures. Numerical descriptors encoding the topological, geometric, electronic, and polar surface area properties of the compounds were calculated to represent the structural information. Each compound's genotoxicity was represented with IMAX (maximal SOS induction factor) values measured by the SOS Chromotest in the presence and absence of S9 rat liver homogenate. The compounds' class identity was determined by a cutoff IMAX value of 1.25: compounds with IMAX > 1.25 in either test were classified as genotoxic, and the ones with IMAX ≤ 1.25 were nongenotoxic. Several binary classification models were generated to predict genotoxicity: k-nearest neighbor (k-NN), linear discriminant analysis, and probabilistic neural network. The study showed k-NN to provide the highest predictive ability among the three classifiers with a training set classification rate of 93.5%. A consensus model was also developed that incorporated the three classifiers and correctly predicted 81.2% of the 277 compounds. It also provided a higher prediction rate on the genotoxic class than any other single model.
Affiliation(s)
- Linnan He
- Department of Chemistry, The Pennsylvania State University, 152 Davey Laboratory, University Park, Pennsylvania 16802, USA
|
26
|
Tropsha A, Gramatica P, Gombar V. The Importance of Being Earnest: Validation is the Absolute Essential for Successful Application and Interpretation of QSPR Models. QSAR Comb Sci 2003. [DOI: 10.1002/qsar.200390007] [Citation(s) in RCA: 1437] [Impact Index Per Article: 68.4] [Indexed: 11/08/2022]
|