1
Lambert FN, Vivian DN, Raimondo S, Tebes-Stevens CT, Barron MG. Relationships Between Aquatic Toxicity, Chemical Hydrophobicity, and Mode of Action: Log Kow Revisited. Archives of Environmental Contamination and Toxicology 2022; 83:326-338. [PMID: 35864329 DOI: 10.1007/s00244-022-00944-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/13/2022] [Accepted: 06/24/2022] [Indexed: 06/15/2023]
Abstract
Relationships between toxicity and chemical hydrophobicity have been known for nearly 100 years in mammals and fish, typically using the log of the octanol:water partition coefficient (Kow). The current study reassessed the influence of mode of action (MOA) on acute aquatic toxicity-log Kow relationships using a comprehensive database of 617 organic chemicals with curated and standardized acute toxicity data that did not exceed solubility limits, their consensus log Kow values, and weight of evidence-based MOA classifications (including 6 broad and 26 specific MOAs). A total of 166 significant (p < 0.05) log Kow-toxicity models were developed across six taxa groups that included QSARs for 5 of the broad and 13 of the specific MOAs. In this study, we demonstrate that QSARs based on MOAs can significantly increase LC50 prediction accuracy for specific acting chemicals. Prediction accuracy increases when QSARs are built based on highly specific MOAs, rather than broad MOA classifications. Additionally, we demonstrate that building QSAR models with chemicals in specific MOA groupings, rather than broader MOA groups leads to significantly better estimates. We also evaluated the differences between models developed from mass-based (µg/L) and mole-based (µmol/L) toxicity data and demonstrate that both are suitable for QSAR development with no clear trend in greater model accuracy. Overall, the results reveal that, despite high variance in all taxa and MOA groups, specific MOA-based models can improve the accuracy of aquatic toxicity predictions over more general groupings.
Affiliation(s)
- Faith N Lambert
  - Office of Research and Development, U.S. EPA, 1 Sabine Island Drive, Gulf Breeze, FL, USA
  - Syngenta, Research Triangle Park, NC, 27709, USA
- Deborah N Vivian
  - Office of Research and Development, U.S. EPA, 1 Sabine Island Drive, Gulf Breeze, FL, USA
- Sandy Raimondo
  - Office of Research and Development, U.S. EPA, 1 Sabine Island Drive, Gulf Breeze, FL, USA
- Mace G Barron
  - Office of Research and Development, U.S. EPA, 1 Sabine Island Drive, Gulf Breeze, FL, USA
2
Király P, Kiss R, Kovács D, Ballaj A, Tóth G. The Relevance of Goodness-of-fit, Robustness and Prediction Validation Categories of OECD-QSAR Principles with Respect to Sample Size and Model Type. Mol Inform 2022; 41:e2200072. [PMID: 35773201 PMCID: PMC9787734 DOI: 10.1002/minf.202200072] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/31/2022] [Accepted: 06/30/2022] [Indexed: 12/30/2022]
Abstract
We investigated the relevance of the validation principles on the Quantitative Structure Activity Relationship models issued by Organization for Economic and Co-operation and Development. We checked the goodness-of-fit, robustness and predictivity categories in linear and nonlinear models using benchmark datasets. Most of our conclusions are drawn using the sample size dependence of the different validation parameters. We found that the goodness-of-fit parameters misleadingly overestimate the models on small samples. In the case of neural network and support vector models, the feasibility of the goodness-of-fit parameters often might be questioned. We propose to use the simplest y-scrambling method to estimate chance correlation. We found that the leave-one-out and leave-many-out cross-validation parameters can be rescaled to each other in all models and the computationally feasible method should be chosen depending on the model type. We assessed the interdependence of the validation parameters by calculating their rank correlations. Goodness of fit and robustness correlate quite well over a sample size for linear models and one of the approaches might be redundant. In the rank correlation between internal and external validation parameters, we found that the assignment of good and bad modellable data to the training or the test causes negative correlations.
Affiliation(s)
- Péter Király
  - Institute of Chemistry, Loránd Eötvös University, Pázmány S. 1/A, 1117 Budapest, Hungary
- Ramóna Kiss
  - Institute of Chemistry, Loránd Eötvös University, Pázmány S. 1/A, 1117 Budapest, Hungary
- Dániel Kovács
  - Institute of Chemistry, Loránd Eötvös University, Pázmány S. 1/A, 1117 Budapest, Hungary
- Amine Ballaj
  - Institute of Chemistry, Loránd Eötvös University, Pázmány S. 1/A, 1117 Budapest, Hungary
- Gergely Tóth
  - Institute of Chemistry, Loránd Eötvös University, Pázmány S. 1/A, 1117 Budapest, Hungary
3
Kovács D, Király P, Tóth G. Sample-size dependence of validation parameters in linear regression models and in QSAR. SAR and QSAR in Environmental Research 2021; 32:247-268. [PMID: 33749419 DOI: 10.1080/1062936x.2021.1890208] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/09/2020] [Accepted: 02/10/2021] [Indexed: 06/12/2023]
Abstract
The dependence of statistical validation parameters was investigated on the size of the sample taken in fit of multivariate linear curves. We observed that R² and related internal parameters were misleading as they overestimated the goodness-of-fit of models at small sample size. Cross-validation metrics showed correct trends. It was possible to scale the leave-one-out and the leave-many-out results close to identical by correcting the degrees of freedom of the models. y and x-randomized validation parameters were calculated and the methods provided close to identical results. We suggest to use the simplest methods in both cases. The external parameters followed correct trends with respect to the sample size, but their sensitivity differed. We plotted the Roy-Ojha metrics in 2D and we coloured them with respect to other external parameters to provide an easy classification of models. The rank correlations were calculated between the performance parameters. Up to a sample size, goodness-of-fit and robustness were distinguishable, but above a certain sample size, the parameters were redundant. The external-internal pairs were weakly correlated. Our data show that all the three aspects of validation are necessary at small sample sizes, but the internal check of robustness is not informative above a given sample size.
Affiliation(s)
- D Kovács
  - Institute of Chemistry, Loránd Eötvös University, Budapest, Hungary
- P Király
  - Institute of Chemistry, Loránd Eötvös University, Budapest, Hungary
- G Tóth
  - Institute of Chemistry, Loránd Eötvös University, Budapest, Hungary
4
Sarfraz M, Rauf A, Keller P, Qureshi AM. N,N′-dialkyl-2-thiobarbituric acid based sulfonamides as potential SARS-CoV-2 main protease inhibitors. Can J Chem 2021. [DOI: 10.1139/cjc-2020-0332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
An efficient methodology was developed to generate novel N,N′-dialkyl-2-thiobarbituric acid based sulfonamides S1–S4 in good to excellent yields (84%–95%). The synthesized compounds S1–S4 were docked to screen their in silico activities against two enzymes i.e., SARS-CoV-2 main protease enzyme with unliganded active site (2019-nCoV, coronavirus disease 2019, COVID-19) PDB ID: 6Y84 and SARS-CoV-2 Mpro PDB ID: 6LU7. Furthermore, some in silico physicochemical and physicokinetic properties were evaluated using the OSIRIS Property Explorer, Molinspiration property calculator, ADMET property calculator, and GUSAR to assess these compounds as potential candidates as lead compounds for the quest of SARS-CoV-2 main protease inhibitors. Molecular docking analyses of the synthesized compounds predicted that compound S3 is more potent as SARS-CoV-2 main protease inhibitor with binding energy –11.65 kcal/mol in comparison with reference inhibitor N3 (–10.95 kcal/mol), whereas compounds S1, S2, and S4 recorded comparable binding energies –9.89, –10.84, and –10.94 kcal/mol with reference inhibitor N3, which were much better than remdesivir (–9.85 kcal/mol). In case of SARS-CoV-2 Mpro, all compounds S1–S4 with docking energy values of –7.28, –8.38, –8.31, and –7.34 kcal/mol, respectively, were found to be potent in comparison with reference inhibitor N3 (–6.31 kcal/mol) and remdesivir (–6.33 kcal/mol). Ligand efficiency values against the target SARS-CoV-2 proteins, as well as α-glucosidase and DNA-(apurinic or apyrimidinic site) lyase inhibition results of these newly synthesized compounds, were also found to be promising.
Affiliation(s)
- Muhammad Sarfraz
  - Department of Chemistry, The Islamia University of Bahawalpur, 63100, Pakistan
- Abdul Rauf
  - Department of Chemistry, The Islamia University of Bahawalpur, 63100, Pakistan
- Paul Keller
  - School of Chemistry and Molecular Bioscience, Molecular Horizons, Illawarra Health and Medical Research Institute, University of Wollongong, 2522, Australia
5
Yu X. Prediction of chemical toxicity to Tetrahymena pyriformis with four-descriptor models. Ecotoxicology and Environmental Safety 2020; 190:110146. [PMID: 31923753 DOI: 10.1016/j.ecoenv.2019.110146] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 11/09/2019] [Revised: 12/27/2019] [Accepted: 12/28/2019] [Indexed: 06/10/2023]
Abstract
A quantitative structure-toxicity relationship (QSTR) model based on four descriptors was successfully developed for 1163 chemical toxicants against Tetrahymena pyriformis by applying general regression neural network (GRNN). The training set consisting of 600 organic compounds was used to train GRNN models that were evaluated with the test set of 563 compounds. For the optimal GRNN model, the training set possesses the coefficient of determination R² of 0.86 and root mean square (rms) error of 0.41, and the test set has R² of 0.80 and rms of 0.41. Investigated results indicate that the optimal GRNN model is accurate, although the GRNN model has only four descriptor and more samples in the test set.
Affiliation(s)
- Xinliang Yu
  - Hunan Provincial Key Laboratory of Environmental Catalysis & Waste Regeneration, College of Chemistry and Chemical Engineering, Hunan Institute of Engineering, Xiangtan, Hunan, 411104, China
6
Grenet I, Yin Y, Comet JP. G-Networks to Predict the Outcome of Sensing of Toxicity. Sensors 2018; 18:s18103483. [PMID: 30332807 PMCID: PMC6210391 DOI: 10.3390/s18103483] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/14/2018] [Revised: 10/05/2018] [Accepted: 10/12/2018] [Indexed: 01/09/2023]
Abstract
G-Networks and their simplified version known as the Random Neural Network have often been used to classify data. In this paper, we present a use of the Random Neural Network to the early detection of potential of toxicity chemical compounds through the prediction of their bioactivity from the compounds' physico-chemical structure, and propose that it be automated using machine learning (ML) techniques. Specifically the Random Neural Network is shown to be an effective analytical tool to this effect, and the approach is illustrated and compared with several ML techniques.
Affiliation(s)
- Ingrid Grenet
  - University Côte d'Azur, I3S laboratory, UMR CNRS 7271, CS 40121, 06903 Sophia Antipolis CEDEX, France
- Yonghua Yin
  - Intelligent Systems and Networks Group, Department of Electrical and Electronic Engineering, Imperial College, London SW7 2AZ, UK
- Jean-Paul Comet
  - University Côte d'Azur, I3S laboratory, UMR CNRS 7271, CS 40121, 06903 Sophia Antipolis CEDEX, France
7
A Round Trip from Medicinal Chemistry to Predictive Toxicology. Methods Mol Biol 2017. [PMID: 27311477 DOI: 10.1007/978-1-4939-3609-0_19] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
Predictive toxicology is a new emerging multifaceted research field aimed at protecting human health and environment from risks posed by chemicals. Such issue is of extreme public relevance and requires a multidisciplinary approach where the experience in medicinal chemistry is of utmost importance. Herein, we will survey some basic recommendations to gather good data and then will review three recent case studies to show how strategies of ligand- and structure-based molecular design, widely applied in medicinal chemistry, can be adapted to meet the more restrictive scientific and regulatory goals of predictive toxicology. In particular, we will report: Docking-based classification models to predict the estrogenic potentials of chemicals. Predicting the bioconcentration factor using biokinetics descriptors. Modeling oral sub-chronic toxicity using a customized k-nearest neighbors (k-NN) approach.
8
Mathieu D. Physics-Based Modeling of Chemical Hazards in a Regulatory Framework: Comparison with Quantitative Structure–Property Relationship (QSPR) Methods for Impact Sensitivities. Ind Eng Chem Res 2016. [DOI: 10.1021/acs.iecr.6b01536] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Indexed: 11/28/2022]
9
Wilson RE, Groskreutz SR, Weber SG. Improving the Sensitivity, Resolution, and Peak Capacity of Gradient Elution in Capillary Liquid Chromatography with Large-Volume Injections by Using Temperature-Assisted On-Column Solute Focusing. Anal Chem 2016; 88:5112-21. [PMID: 27033165 PMCID: PMC4940048 DOI: 10.1021/acs.analchem.5b04793] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Indexed: 01/27/2023]
Abstract
Capillary HPLC (cLC) with gradient elution is the separation method of choice for the fields of proteomics and metabolomics. This is due to the complementary nature of cLC flow rates and electrospray or nanospray ionization mass spectrometry (ESI-MS). The small column diameters result in good mass sensitivity. Good concentration sensitivity is also possible by injection of relatively large volumes of solution and relying on solvent-based solute focusing. However, if the injection volume is too large or solutes are poorly retained during injection, volume overload occurs which leads to altered peak shapes, decreased sensitivity, and lower peak capacity. Solutes that elute early even with the use of a solvent gradient are especially vulnerable to this problem. In this paper, we describe a simple, automated instrumental method, temperature-assisted on-column solute focusing (TASF), that is capable of focusing large volume injections of small molecules and peptides under gradient conditions. By injecting a large sample volume while cooling a short segment of the column inlet at subambient temperatures, solutes are concentrated into narrow bands at the head of the column. Rapidly raising the temperature of this segment of the column leads to separations with less peak broadening in comparison to solvent focusing alone. For large volume injections of both mixtures of small molecules and a bovine serum albumin tryptic digest, TASF improved the peak shape and resolution in chromatograms. TASF showed the most dramatic improvements with shallow gradients, which is particularly useful for biological applications. Results demonstrate the ability of TASF with gradient elution to improve the sensitivity, resolution, and peak capacity of volume overloaded samples beyond gradient compression alone. Additionally, we have developed and validated a double extrapolation method for predicting retention factors at extremes of temperature and mobile phase composition. 
Using this method, the effects of TASF can be predicted, allowing determination of the usefulness of this technique for a particular application.
Affiliation(s)
- Rachael E. Wilson
  - Department of Chemistry, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
- Stephen R. Groskreutz
  - Department of Chemistry, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
- Stephen G. Weber
  - Department of Chemistry, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
10
Toropova AP, Schultz TW, Toropov AA. Building up a QSAR model for toxicity toward Tetrahymena pyriformis by the Monte Carlo method: A case of benzene derivatives. Environmental Toxicology and Pharmacology 2016; 42:135-145. [PMID: 26851376 DOI: 10.1016/j.etap.2016.01.010] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 11/23/2015] [Revised: 01/12/2016] [Accepted: 01/14/2016] [Indexed: 06/05/2023]
Abstract
Data on toxicity toward Tetrahymena pyriformis is indicator of applicability of a substance in ecologic and pharmaceutical aspects. Quantitative structure-activity relationships (QSARs) between the molecular structure of benzene derivatives and toxicity toward T. pyriformis (expressed as the negative logarithms of the population growth inhibition dose, mmol/L) are established. The available data were randomly distributed three times into the visible training and calibration sets, and invisible validation sets. The statistical characteristics for the validation set are the following: r² = 0.8179 and s = 0.338 (first distribution); r² = 0.8682 and s = 0.341 (second distribution); r² = 0.8435 and s = 0.323 (third distribution). These models are built up using only information on the molecular structure: no data on physicochemical parameters, 3D features of the molecular structure and quantum mechanics descriptors are involved in the modeling process.
Affiliation(s)
- Alla P Toropova
  - IRCCS-Istituto di Ricerche Farmacologiche Mario Negri, Via La Masa 19, Milano, Italy
- Terry W Schultz
  - College of Veterinary Medicine, The University of Tennessee, 2407 River Drive, Knoxville, TN 37996-4543, United States
- Andrey A Toropov
  - IRCCS-Istituto di Ricerche Farmacologiche Mario Negri, Via La Masa 19, Milano, Italy
11
Raies AB, Bajic VB. In silico toxicology: computational methods for the prediction of chemical toxicity. Wiley Interdisciplinary Reviews. Computational Molecular Science 2016; 6:147-172. [PMID: 27066112 PMCID: PMC4785608 DOI: 10.1002/wcms.1240] [Citation(s) in RCA: 329] [Impact Index Per Article: 41.1] [Received: 08/10/2015] [Revised: 10/27/2015] [Accepted: 11/10/2015] [Indexed: 01/08/2023]
Abstract
Determining the toxicity of chemicals is necessary to identify their harmful effects on humans, animals, plants, or the environment. It is also one of the main steps in drug design. Animal models have been used for a long time for toxicity testing. However, in vivo animal tests are constrained by time, ethical considerations, and financial burden. Therefore, computational methods for estimating the toxicity of chemicals are considered useful. In silico toxicology is one type of toxicity assessment that uses computational methods to analyze, simulate, visualize, or predict the toxicity of chemicals. In silico toxicology aims to complement existing toxicity tests to predict toxicity, prioritize chemicals, guide toxicity tests, and minimize late-stage failures in drugs design. There are various methods for generating models to predict toxicity endpoints. We provide a comprehensive overview, explain, and compare the strengths and weaknesses of the existing modeling methods and algorithms for toxicity prediction with a particular (but not exclusive) emphasis on computational tools that can implement these methods and refer to expert systems that deploy the prediction models. Finally, we briefly review a number of new research directions in in silico toxicology and provide recommendations for designing in silico models. WIREs Comput Mol Sci 2016, 6:147-172. doi: 10.1002/wcms.1240 For further resources related to this article, please visit the WIREs website.
Affiliation(s)
- Arwa B Raies
  - King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Centre (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Thuwal, Saudi Arabia
- Vladimir B Bajic
  - King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Centre (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Thuwal, Saudi Arabia
12
Omata S, Kaneko H, Funatsu K. Prediction of Membrane Resistance in Newly Constructed Membrane Bioreactor. Journal of Computer Chemistry-Japan 2016. [DOI: 10.2477/jccj.2016-0008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Affiliation(s)
- Shingo Omata
  - The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo, 113-8656, Japan
- Hiromasa Kaneko
  - The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo, 113-8656, Japan
- Kimito Funatsu
  - The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo, 113-8656, Japan
13
How to rank and discriminate artificial neural networks? Case study: prediction of anticancer activity of 17-picolyl and 17-picolinylidene androstane derivatives. Journal of the Iranian Chemical Society 2015. [DOI: 10.1007/s13738-015-0759-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 02/04/2023]
14
Yan J, Zhu WW, Kong B, Lu HB, Yun YH, Huang JH, Liang YZ. A Combinational Strategy of Model Disturbance and Outlier Comparison to Define Applicability Domain in Quantitative Structural Activity Relationship. Mol Inform 2014; 33:503-13. [PMID: 27486037 DOI: 10.1002/minf.201300161] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 10/31/2013] [Accepted: 04/16/2014] [Indexed: 01/21/2023]
Abstract
In order to define an applicability domain for quantitative structure-activity relationship modeling, a combinational strategy of model disturbance and outlier comparison is developed. An indicator named model disturbance index was defined to estimate the prediction error. Moreover, the information of the outliers in the training set was used to filter the unreliable samples in the test set based on "structural similarity". Chromatography retention indices data were used to investigate this approach. The relationship between model disturbance index and prediction error can be found. Also, the comparison between the outlier set and the test set could provide additional information about which unknown samples should be paid more attentions. A novel technique based on model population analysis was used to evaluate the validity of applicability domain. Finally, three commonly used methods, i.e. Leverage, descriptor range-based and model perturbation method, were compared with the proposed approach.
Affiliation(s)
- Jun Yan
  - Research Center of Modernization of Traditional Chinese Medicine, Central South University, Changsha 410083, P. R. China; tel: +86 731 8830831; fax: +86 731 8830831
- Wei-Wei Zhu
  - Department of Chemical and Bioscience, HeChi University, YiZhou 546300, P. R. China
- Bo Kong
  - Technology Center of China Tobacco Hunan Industrial Co., LTD, Changsha 410014, P. R. China
- Hong-Bing Lu
  - Technology Center of China Tobacco Hunan Industrial Co., LTD, Changsha 410014, P. R. China
- Yong-Huan Yun
  - Research Center of Modernization of Traditional Chinese Medicine, Central South University, Changsha 410083, P. R. China; tel: +86 731 8830831; fax: +86 731 8830831
- Jian-Hua Huang
  - Research Center of Modernization of Traditional Chinese Medicine, Central South University, Changsha 410083, P. R. China; tel: +86 731 8830831; fax: +86 731 8830831
- Yi-Zeng Liang
  - Research Center of Modernization of Traditional Chinese Medicine, Central South University, Changsha 410083, P. R. China; tel: +86 731 8830831; fax: +86 731 8830831
15
Ovchinnikova SI, Bykov AA, Tsivadze AY, Dyachkov EP, Kireeva NV. Supervised extensions of chemography approaches: case studies of chemical liabilities assessment. J Cheminform 2014; 6:20. [PMID: 24868246 PMCID: PMC4018504 DOI: 10.1186/1758-2946-6-20] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/25/2013] [Accepted: 04/28/2014] [Indexed: 12/04/2022] [Open Access]
Abstract
Chemical liabilities, such as adverse effects and toxicity, play a significant role in modern drug discovery process. In silico assessment of chemical liabilities is an important step aimed to reduce costs and animal testing by complementing or replacing in vitro and in vivo experiments. Herein, we propose an approach combining several classification and chemography methods to be able to predict chemical liabilities and to interpret obtained results in the context of impact of structural changes of compounds on their pharmacological profile. To our knowledge for the first time, the supervised extension of Generative Topographic Mapping is proposed as an effective new chemography method. New approach for mapping new data using supervised Isomap without re-building models from the scratch has been proposed. Two approaches for estimation of model's applicability domain are used in our study to our knowledge for the first time in chemoinformatics. The structural alerts responsible for the negative characteristics of pharmacological profile of chemical compounds has been found as a result of model interpretation.
Affiliation(s)
- Svetlana I Ovchinnikova
  - Frumkin Institute of Physical Chemistry and Electrochemistry RAS, Leninsky pr-t 31-4, 119071 Moscow, Russia
  - Moscow Institute of Physics and Technology, Institutsky per., 9, 141700 Dolgoprudny, Russia
- Arseniy A Bykov
  - Frumkin Institute of Physical Chemistry and Electrochemistry RAS, Leninsky pr-t 31-4, 119071 Moscow, Russia
  - Moscow Institute of Physics and Technology, Institutsky per., 9, 141700 Dolgoprudny, Russia
- Aslan Yu Tsivadze
  - Frumkin Institute of Physical Chemistry and Electrochemistry RAS, Leninsky pr-t 31-4, 119071 Moscow, Russia
- Evgeny P Dyachkov
  - Kurnakov Institute of General and Inorganic Chemistry RAS, Leninsky pr-t 31, 119071 Moscow, Russia
- Natalia V Kireeva
  - Frumkin Institute of Physical Chemistry and Electrochemistry RAS, Leninsky pr-t 31-4, 119071 Moscow, Russia
  - Moscow Institute of Physics and Technology, Institutsky per., 9, 141700 Dolgoprudny, Russia
16
Singh KP, Gupta S. In silico prediction of toxicity of non-congeneric industrial chemicals using ensemble learning based modeling approaches. Toxicol Appl Pharmacol 2014; 275:198-212. [PMID: 24463095 DOI: 10.1016/j.taap.2014.01.006] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 08/05/2013] [Revised: 01/04/2014] [Accepted: 01/13/2014] [Indexed: 02/03/2023]
Abstract
Ensemble learning approach based decision treeboost (DTB) and decision tree forest (DTF) models are introduced in order to establish quantitative structure-toxicity relationship (QSTR) for the prediction of toxicity of 1450 diverse chemicals. Eight non-quantum mechanical molecular descriptors were derived. Structural diversity of the chemicals was evaluated using Tanimoto similarity index. Stochastic gradient boosting and bagging algorithms supplemented DTB and DTF models were constructed for classification and function optimization problems using the toxicity end-point in T. pyriformis. Special attention was drawn to prediction ability and robustness of the models, investigated both in external and 10-fold cross validation processes. In complete data, optimal DTB and DTF models rendered accuracies of 98.90%, 98.83% in two-category and 98.14%, 98.14% in four-category toxicity classifications. Both the models further yielded classification accuracies of 100% in external toxicity data of T. pyriformis. The constructed regression models (DTB and DTF) using five descriptors yielded correlation coefficients (R²) of 0.945, 0.944 between the measured and predicted toxicities with mean squared errors (MSEs) of 0.059, and 0.064 in complete T. pyriformis data. The T. pyriformis regression models (DTB and DTF) applied to the external toxicity data sets yielded R² and MSE values of 0.637, 0.655; 0.534, 0.507 (marine bacteria) and 0.741, 0.691; 0.155, 0.173 (algae). The results suggest for wide applicability of the inter-species models in predicting toxicity of new chemicals for regulatory purposes. These approaches provide useful strategy and robust tools in the screening of ecotoxicological risk or environmental hazard potential of chemicals.
Affiliation(s)
- Kunwar P Singh
  - Academy of Scientific and Innovative Research, Anusandhan Bhawan, Rafi Marg, New Delhi 110 001, India; Environmental Chemistry Division, CSIR-Indian Institute of Toxicology Research, Post Box 80, Mahatma Gandhi Marg, Lucknow 226 001, India
- Shikha Gupta
  - Academy of Scientific and Innovative Research, Anusandhan Bhawan, Rafi Marg, New Delhi 110 001, India; Environmental Chemistry Division, CSIR-Indian Institute of Toxicology Research, Post Box 80, Mahatma Gandhi Marg, Lucknow 226 001, India
17
Sahlin U, Jeliazkova N, Öberg T. Applicability Domain Dependent Predictive Uncertainty in QSAR Regressions. Mol Inform 2013; 33:26-35. [DOI: 10.1002/minf.201200131] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 10/29/2012] [Accepted: 08/10/2013] [Indexed: 11/09/2022]
18
Fourches D, Tropsha A. Using Graph Indices for the Analysis and Comparison of Chemical Datasets. Mol Inform 2013; 32:827-42. [PMID: 27480235 DOI: 10.1002/minf.201300076] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 04/26/2013] [Accepted: 08/05/2013] [Indexed: 12/13/2022]
Abstract
In cheminformatics, compounds are represented as points in multidimensional space of chemical descriptors. When all pairs of points found within certain distance threshold in the original high dimensional chemistry space are connected by distance-labeled edges, the resulting data structure can be defined as Dataset Graph (DG). We show that, similarly to the conventional description of organic molecules, many graph indices can be computed for DGs as well. We demonstrate that chemical datasets can be effectively characterized and compared by computing simple graph indices such as the average vertex degree or Randic connectivity index. This approach is used to characterize and quantify the similarity between different datasets or subsets of the same dataset (e.g., training, test, and external validation sets used in QSAR modeling). The freely available ADDAGRA program has been implemented to build and visualize DGs. The approach proposed and discussed in this report could be further explored and utilized for different cheminformatics applications such as dataset diversification by acquiring external compounds, dataset processing prior to QSAR modeling, or (dis)similarity modeling of multiple datasets studied in chemical genomics applications.
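The Dataset Graph construction described in this abstract can be illustrated with a minimal sketch. It is not the ADDAGRA implementation: the Euclidean metric, the inclusive threshold test, and the names `dataset_graph` and `average_vertex_degree` are assumptions made for illustration only.

```python
import math
from itertools import combinations


def dataset_graph(points, threshold):
    """Connect every pair of compounds whose descriptor-space distance
    is within `threshold`; each edge is labelled with that distance.
    Euclidean distance and '<=' are illustrative assumptions."""
    edges = {}
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        d = math.dist(p, q)
        if d <= threshold:
            edges[(i, j)] = d
    return edges


def average_vertex_degree(n_vertices, edges):
    """Each edge contributes to the degree of two vertices."""
    return 2 * len(edges) / n_vertices if n_vertices else 0.0


# Four hypothetical compounds in a two-descriptor space.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
edges = dataset_graph(points, threshold=1.5)
avg_degree = average_vertex_degree(len(points), edges)
```

In this toy set the three clustered compounds are mutually connected while the outlier stays isolated, which lowers the average vertex degree of the whole graph.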
Affiliation(s)
- Denis Fourches
- Laboratory for Molecular Modeling, Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill NC 27599, USA.
- Alexander Tropsha
- Laboratory for Molecular Modeling, Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill NC 27599, USA.
|
19
|
Putz MV, Dudaş NA. Determining chemical reactivity driving biological activity from SMILES transformations: the bonding mechanism of anti-HIV pyrimidines. Molecules 2013; 18:9061-116. [PMID: 23903183 PMCID: PMC6270382 DOI: 10.3390/molecules18089061] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2013] [Revised: 07/22/2013] [Accepted: 07/24/2013] [Indexed: 02/08/2023] Open
Abstract
Assessing the molecular mechanism of a chemical-biological interaction and bonding stands as the ultimate goal of any modern quantitative structure-activity relationship (QSAR) study. To this end the present work employs the main chemical reactivity structural descriptors (electronegativity, chemical hardness, chemical power, electrophilicity) to unfold the variational QSAR through their min-max correspondence principles as applied to the Simplified Molecular Input Line Entry System (SMILES) transformation of selected uracil derivatives with anti-HIV potential, with the aim of establishing the main stages whereby the given compounds may inhibit HIV infection. The bonding can be completely described by explicitly considering, by means of basic indices and chemical reactivity principles, two forms of SMILES structures of the pyrimidines, the Longest SMILES Molecular Chain (LoSMoC) and the Branching SMILES (BraS), respectively, as the effective forms involved in the anti-HIV activity mechanism and, according to the present work, also necessary intermediates in molecular pathways targeting/docking biological sites of interest.
Affiliation(s)
- Mihai V Putz
- Laboratory of Computational and Structural Physical Chemistry for Nanosciences and QSAR, Biology-Chemistry Department, West University of Timişoara, Pestalozzi Str. No. 16, Timişoara 300115, Romania.
|
20
|
Ruusmann V, Maran U. From data point timelines to a well curated data set, data mining of experimental data and chemical structure data from scientific articles, problems and possible solutions. J Comput Aided Mol Des 2013; 27:583-603. [PMID: 23884706 DOI: 10.1007/s10822-013-9664-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2013] [Accepted: 07/02/2013] [Indexed: 01/23/2023]
Abstract
The scientific literature is an important source of experimental and chemical structure data. Very often this data has been harvested into smaller or bigger data collections, leaving the data quality and curation issues on the shoulders of users. The current research presents a systematic and reproducible workflow for collecting series of data points from scientific literature and assembling a database that is suitable for the purposes of high quality modelling and decision support. The quality assurance aspect of the workflow is concerned with the curation of both chemical structures and associated toxicity values at (1) single data point level and (2) collection of data points level. The assembly of a database employs a novel "timeline" approach. The workflow is implemented as a software solution and its applicability is demonstrated on the example of the Tetrahymena pyriformis acute aquatic toxicity endpoint. A literature collection of 86 primary publications for T. pyriformis was found to contain 2,072 chemical compounds and 2,498 unique toxicity values, which divide into 2,440 numerical and 58 textual values. Every chemical compound was assigned to a preferred toxicity value. Examples for most common chemical and toxicological data curation scenarios are discussed.
Affiliation(s)
- Villu Ruusmann
- Institute of Chemistry, University of Tartu, Ravila 14A, Tartu, Estonia
|
21
|
Péry ARR, Schüürmann G, Ciffroy P, Faust M, Backhaus T, Aicher L, Mombelli E, Tebby C, Cronin MTD, Tissot S, Andres S, Brignon JM, Frewer L, Georgiou S, Mattas K, Vergnaud JC, Peijnenburg W, Capri E, Marchis A, Wilks MF. Perspectives for integrating human and environmental risk assessment and synergies with socio-economic analysis. THE SCIENCE OF THE TOTAL ENVIRONMENT 2013; 456-457:307-316. [PMID: 23624004 DOI: 10.1016/j.scitotenv.2013.03.099] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/07/2013] [Revised: 03/29/2013] [Accepted: 03/29/2013] [Indexed: 06/02/2023]
Abstract
For more than a decade, the integration of human and environmental risk assessment (RA) has become an attractive vision. At the same time, existing European regulations of chemical substances such as REACH (EC Regulation No. 1907/2006), the Plant Protection Products Regulation (EC regulation 1107/2009) and Biocide Regulation (EC Regulation 528/2012) continue to ask for sector-specific RAs, each of which have their individual information requirements regarding exposure and hazard data, and also use different methodologies for the ultimate risk quantification. In response to this difference between the vision for integration and the current scientific and regulatory practice, the present paper outlines five medium-term opportunities for integrating human and environmental RA, followed by detailed discussions of the associated major components and their state of the art. Current hazard assessment approaches are analyzed in terms of data availability and quality, and covering non-test tools, the integrated testing strategy (ITS) approach, the adverse outcome pathway (AOP) concept, methods for assessing uncertainty, and the issue of explicitly treating mixture toxicity. With respect to exposure, opportunities for integrating exposure assessment are discussed, taking into account the uncertainty, standardization and validation of exposure modeling as well as the availability of exposure data. A further focus is on ways to complement RA by a socio-economic assessment (SEA) in order to better inform about risk management options. In this way, the present analysis, developed as part of the EU FP7 project HEROIC, may contribute to paving the way for integrating, where useful and possible, human and environmental RA in a manner suitable for its coupling with SEA.
Affiliation(s)
- A R R Péry
- INERIS, Parc Alata, BP2, 60550 Verneuil-en-Halatte, France.
|
22
|
Brandmaier S, Novotarskyi S, Sushko I, Tetko IV. From descriptors to predicted properties: experimental design by using applicability domain estimation. Altern Lab Anim 2013; 41:33-47. [PMID: 23614543 DOI: 10.1177/026119291304100106] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
The importance of reliable methods for representative sub-sampling in terms of experimental design and risk assessment within the European Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH) system is crucial. We developed experimental design approaches, by utilising predicted properties and the 'distance to model' parameter, to estimate the benefits of certain compounds to the quality of a resulting model. A statistical evaluation of four regression data sets and one classification data set showed that the adaptive concept of iteratively refining the representation of the chemical space contributes to a more efficient and more reliable selection in comparison to traditional approaches. The evaluation of compounds with regard to the uncertainty and the correlation of prediction is beneficial, and in particular, for regression data sets of sufficient size, whereas the use of predicted properties to define the chemical space is beneficial for classification models.
Affiliation(s)
- Stefan Brandmaier
- Helmholtz-Zentrum München - German Research Centre for Environmental Health (GmbH), Institute of Structural Biology, Munich, Germany.
|
23
|
Wood DJ, Carlsson L, Eklund M, Norinder U, Stålring J. QSAR with experimental and predictive distributions: an information theoretic approach for assessing model quality. J Comput Aided Mol Des 2013; 27:203-19. [PMID: 23504478 PMCID: PMC3639359 DOI: 10.1007/s10822-013-9639-5] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2012] [Accepted: 03/05/2013] [Indexed: 11/29/2022]
Abstract
We propose that quantitative structure–activity relationship (QSAR) predictions should be explicitly represented as predictive (probability) distributions. If both predictions and experimental measurements are treated as probability distributions, the quality of a set of predictive distributions output by a model can be assessed with Kullback–Leibler (KL) divergence: a widely used information theoretic measure of the distance between two probability distributions. We have assessed a range of different machine learning algorithms and error estimation methods for producing predictive distributions with an analysis against three of AstraZeneca’s global DMPK datasets. Using the KL-divergence framework, we have identified a few combinations of algorithms that produce accurate and valid compound-specific predictive distributions. These methods use reliability indices to assign predictive distributions to the predictions output by QSAR models so that reliable predictions have tight distributions and vice versa. Finally we show how valid predictive distributions can be used to estimate the probability that a test compound has properties that hit single- or multi- objective target profiles.
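As a minimal sketch of the scoring idea in this abstract, the KL divergence has a closed form when both the experimental measurement and the prediction are modelled as univariate Gaussians. The Gaussian form, the direction of the divergence (experimental relative to predictive), and the function name `kl_gaussian` are assumptions for illustration and are not taken from the paper.

```python
import math


def kl_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """KL(P || Q) for P = N(mu_p, sigma_p^2) and Q = N(mu_q, sigma_q^2).

    Here P stands for the experimental distribution and Q for the
    predictive distribution (an assumed convention).
    """
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sigma_q ** 2)
        - 0.5
    )


# Identical distributions give zero divergence.
same = kl_gaussian(0.0, 1.0, 0.0, 1.0)
# A prediction off by one standard deviation, with matching width.
shifted = kl_gaussian(0.0, 1.0, 1.0, 1.0)
# An overconfident (too narrow) prediction with the same error.
overconfident = kl_gaussian(0.0, 1.0, 1.0, 0.5)
```

Under these assumptions a too-narrow predictive distribution is penalized more heavily than an equally wrong but appropriately wide one, which matches the abstract's point that reliable predictions should carry tight distributions and unreliable ones wide distributions.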
|
24
|
Abstract
Understanding structure-activity relationships (SARs) for a given set of molecules allows one to rationally explore chemical space and develop a chemical series optimizing multiple physicochemical and biological properties simultaneously, for instance, improving potency, reducing toxicity, and ensuring sufficient bioavailability. In silico methods allow rapid and efficient characterization of SARs and facilitate building a variety of models to capture and encode one or more SARs, which can then be used to predict activities for new molecules. By coupling these methods with in silico modifications of structures, one can easily prioritize large screening decks or even generate new compounds de novo and ascertain whether they belong to the SAR being studied. Computational methods can provide a guide for the experienced user by integrating and summarizing large amounts of preexisting data to suggest useful structural modifications. This chapter highlights the different types of SAR modeling methods and how they support the task of exploring chemical space to elucidate and optimize SARs in a drug discovery setting. In addition to considering modeling algorithms, I briefly discuss how to use databases as a source of SAR data to inform and enhance the exploration of SAR trends. I also review common modeling techniques that are used to encode SARs, recent work in the area of structure-activity landscapes, the role of SAR databases, and alternative approaches to exploring SAR data that do not involve explicit model development.
Affiliation(s)
- Rajarshi Guha
- NIH Center for Advancing Translational Science, Rockville, MD, USA
|
25
|
Tebby C, Mombelli E. A Kernel-Based Method for Assessing Uncertainty on Individual QSAR Predictions. Mol Inform 2012; 31:741-51. [DOI: 10.1002/minf.201200053] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2012] [Accepted: 08/08/2012] [Indexed: 11/08/2022]
|
26
|
Prediction of acute mammalian toxicity using QSAR methods: a case study of sulfur mustard and its breakdown products. Molecules 2012; 17:8982-9001. [PMID: 22842643 PMCID: PMC6269063 DOI: 10.3390/molecules17088982] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2012] [Revised: 07/19/2012] [Accepted: 07/23/2012] [Indexed: 11/17/2022] Open
Abstract
Predicting toxicity quantitatively, using Quantitative Structure Activity Relationships (QSAR), has matured over recent years to the point that the predictions can be used to help identify missing comparison values in a substance's database. In this manuscript we investigate using the lethal dose that kills fifty percent of a test population (LD₅₀) for determining relative toxicity of a number of substances. In general, the smaller the LD₅₀ value, the more toxic the chemical, and the larger the LD₅₀ value, the lower the toxicity. When systemic toxicity and other specific toxicity data are unavailable for the chemical(s) of interest, during emergency responses, LD₅₀ values may be employed to determine the relative toxicity of a series of chemicals. In the present study, a group of chemical warfare agents and their breakdown products have been evaluated using four available rat oral QSAR LD₅₀ models. The QSAR analysis shows that the breakdown products of Sulfur Mustard (HD) are predicted to be less toxic than the parent compound as well as other known breakdown products that have known toxicities. The QSAR estimated break down products LD₅₀ values ranged from 299 mg/kg to 5,764 mg/kg. This evaluation allows for the ranking and toxicity estimation of compounds for which little toxicity information existed; thus leading to better risk decision making in the field.
|
27
|
Su BH, Tu YS, Esposito EX, Tseng YJ. Predictive Toxicology Modeling: Protocols for Exploring hERG Classification and Tetrahymena pyriformis End Point Predictions. J Chem Inf Model 2012; 52:1660-73. [DOI: 10.1021/ci300060b] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/24/2023]
Affiliation(s)
- Bo-Han Su
- Department of Computer Science and Information Engineering, National Taiwan University, No.1 Sec.4, Roosevelt Road, Taipei, Taiwan 106
- Yi-shu Tu
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, No.1 Sec.4, Roosevelt Road, Taipei, Taiwan 106
- Yufeng J. Tseng
- Department of Computer Science and Information Engineering, National Taiwan University, No.1 Sec.4, Roosevelt Road, Taipei, Taiwan 106
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, No.1 Sec.4, Roosevelt Road, Taipei, Taiwan 106
|
28
|
Kar S, Roy K. First report on development of quantitative interspecies structure-carcinogenicity relationship models and exploring discriminatory features for rodent carcinogenicity of diverse organic chemicals using OECD guidelines. CHEMOSPHERE 2012; 87:339-355. [PMID: 22225702 DOI: 10.1016/j.chemosphere.2011.12.019] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/25/2011] [Revised: 12/08/2011] [Accepted: 12/08/2011] [Indexed: 05/31/2023]
Abstract
Different regulatory agencies in food and drug administration and environmental protection worldwide are employing quantitative structure-activity relationship (QSAR) models to fill the data gaps related with properties of chemicals affecting the environment and human health. Carcinogenicity is a toxicity endpoint of major concern in recent times. Interspecies toxicity correlations may provide a tool for estimating sensitivity towards toxic chemical exposure with known levels of uncertainty for a diversity of wildlife species. In this background, we have developed quantitative interspecies structure-carcinogenicity correlation models for rat and mouse [rodent species according to the Organization for Economic Cooperation and Development (OECD) guidelines] based on the carcinogenic potential of 166 organic chemicals with wide diversity of molecular structures, spanning a large number of chemical classes and biological mechanisms. All the developed models have been assessed according to the OECD principles for the validation of QSAR models. Consensus predictions for carcinogenicity of the individual compounds are presented here for any one species when the data for the other species are available. Informative illustrations of the contributing structural fragments of chemicals which are responsible for specific carcinogenicity endpoints are identified by the developed models. The models have also been used to predict mouse carcinogenicities of 247 organic chemicals (for which rat carcinogenicities are present) and rat carcinogenicities of 150 chemicals (for which mouse carcinogenicities are present). Discriminatory features for rat and mouse carcinogenicity values have also been explored.
Affiliation(s)
- Supratik Kar
- Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700 032, India
|
29
|
Recent trends in statistical QSAR modeling of environmental chemical toxicity. EXPERIENTIA SUPPLEMENTUM (2012) 2012; 101:381-411. [PMID: 22945576 DOI: 10.1007/978-3-7643-8340-4_13] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]
Abstract
Quantitative cheminformatics approaches such as QSAR modeling find growing applications in chemical risk assessment. Traditional methods rely on the use of calculated chemical descriptors of molecules and relatively small training sets. However, in recent years, there is a trend toward the increased use of in vitro biological testing approaches to reduce both the length of experimental studies and the animal use for chemical risk assessment. Furthermore, there is also much greater emphasis on model validation using external datasets to enable the reliable use of computational models as part of regulatory decision making. In this chapter, recent trends emphasizing the need for both careful curation of experimental data prior to model development and rigorous model validation are investigated. Furthermore, recent approaches to chemical toxicity prediction that employ both chemical descriptors and in vitro screening data for developing novel hybrid chemical/biological models are being reviewed. Examples of respective application studies that employ novel workflows for model developments are described and recent important efforts by several academic, nonprofit, and industrial groups to start placing both data and, especially, models in the public domain are discussed.
|
30
|
Schwöbel JAH, Madden JC, Cronin MTD. Application of a computational model for Michael addition reactivity in the prediction of toxicity to Tetrahymena pyriformis. CHEMOSPHERE 2011; 85:1066-1074. [PMID: 21890172 DOI: 10.1016/j.chemosphere.2011.07.037] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/03/2011] [Revised: 07/13/2011] [Accepted: 07/18/2011] [Indexed: 05/31/2023]
Abstract
A computational model to predict acute aquatic toxicity to the ciliate Tetrahymena pyriformis has been developed. A general prediction of toxicity can be based on three consecutive steps: 1. Identification of a potential reactive mechanism via structural alerts; 2. Confirmation and quantification of (bio)chemical reactivity; 3. Establishing a relationship between calculated reactivity and toxicity. The method described herein uses a combination of a reactive toxicity (RT) model, including computed kinetic rate constants for adduct formation (log k) via a Michael acceptor mechanism of action, and baseline toxicity (BT), modelled by hydrophobicity (octanol-water partition coefficient). The maximum of the RT and BT values defines acute toxicity for a particular compound. The reactive toxicity model is based on site-specific steric and quantum chemical ground state electronic properties. The performance of the model was examined in terms of predicting the toxicity of 106 potential Michael acceptor compounds covering several classes of compounds (aldehydes, ketones, esters, heterocycles). The advantages of the computational method are described. The method allows for a closer and more transparent mechanistic insight into the molecular initiating events of toxicological endpoints.
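The combination rule stated in this abstract (the larger of the reactive and baseline toxicity values defines the predicted toxicity) can be sketched as follows. The linear forms and every coefficient below are hypothetical placeholders chosen only so the example runs; they are not the fitted values from the paper.

```python
def baseline_toxicity(log_kow, slope=0.8, intercept=-2.0):
    """Baseline (narcosis-type) toxicity from hydrophobicity.
    Slope and intercept are hypothetical placeholders."""
    return slope * log_kow + intercept


def reactive_toxicity(log_k, slope=0.5, intercept=1.0):
    """Reactive toxicity from the computed rate constant for
    adduct formation. Slope and intercept are hypothetical."""
    return slope * log_k + intercept


def predicted_toxicity(log_kow, log_k):
    """The larger of the reactive and baseline values wins."""
    return max(reactive_toxicity(log_k), baseline_toxicity(log_kow))


# A weakly hydrophobic but fairly reactive compound: reactivity dominates.
reactive_case = predicted_toxicity(log_kow=2.0, log_k=-1.0)
# A strongly hydrophobic, slow-reacting compound: baseline dominates.
baseline_case = predicted_toxicity(log_kow=5.0, log_k=-4.0)
```

Taking the maximum means a compound is never predicted to be less toxic than its hydrophobicity alone would imply, while a sufficiently reactive Michael acceptor is predicted to exceed that baseline.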
Affiliation(s)
- Johannes A H Schwöbel
- School of Pharmacy and Chemistry, Liverpool John Moores University, Liverpool L3 3AF, England, UK
|
31
|
Hewitt M, Cronin MTD, Rowe PH, Schultz TW. Repeatability analysis of the Tetrahymena pyriformis population growth impairment assay. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2011; 22:621-637. [PMID: 21830879 DOI: 10.1080/1062936x.2011.604100] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/31/2023]
Abstract
Assessments necessary to ensure the safety of both humans and the environment are challenged by the sheer number of chemicals in use today. Chemical legislation, such as REACH, aims to use alternative methods to reduce the reliance on in vivo animal testing. Consequently, databases such as the TETRATOX database, containing data from the Tetrahymena pyriformis population growth impairment assay, have been used extensively to develop computational models which aid in priority setting and initial hazard assessments. To use any toxicological data, an assessment of quality is required. One important aspect of quality is the repeatability of the assay. This study considered TETRATOX assay data for 85 structurally and mechanistically diverse compounds. The repeatability of replicate determinations was assessed and factors relating to repeatability are discussed. Despite the majority of compounds demonstrating excellent repeatability, it was found that the mechanism of action is likely to be a modulating factor, with compounds acting via electrophilic mechanisms being more likely to exhibit reduced repeatability than those acting via narcotic mechanisms. It is evident from this study that the TETRATOX assay is a robust and highly repeatable assay, suitable for use in toxicological modelling studies and priority setting.
Affiliation(s)
- M Hewitt
- School of Pharmacy and Chemistry, Liverpool John Moores University, Liverpool, UK
|
32
|
Sahlin U, Filipsson M, Öberg T. A Risk Assessment Perspective of Current Practice in Characterizing Uncertainties in QSAR Regression Predictions. Mol Inform 2011; 30:551-64. [DOI: 10.1002/minf.201000177] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2010] [Accepted: 03/25/2011] [Indexed: 11/08/2022]
|
33
|
Ellison CM, Sherhod R, Cronin MTD, Enoch SJ, Madden JC, Judson PN. Assessment of Methods To Define the Applicability Domain of Structural Alert Models. J Chem Inf Model 2011; 51:975-85. [DOI: 10.1021/ci1000967] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Affiliation(s)
- C. M. Ellison
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Byrom Street, Liverpool L3 3AF, England
- R. Sherhod
- Department of Information Studies, University of Sheffield, Regent Court, Sheffield S1 4DP, England
- M. T. D. Cronin
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Byrom Street, Liverpool L3 3AF, England
- S. J. Enoch
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Byrom Street, Liverpool L3 3AF, England
- J. C. Madden
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Byrom Street, Liverpool L3 3AF, England
- P. N. Judson
- Lhasa Limited, 22-23 Blenheim Terrace, Woodhouse Lane, Leeds LS2 9HD, England
|
34
|
Huang J, Fan X. Why QSAR fails: an empirical evaluation using conventional computational approach. Mol Pharm 2011; 8:600-8. [PMID: 21370915 DOI: 10.1021/mp100423u] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
Although a number of pitfalls of QSAR have been corrected in the past decade, the reliability of QSAR models is still insufficient. The reason why QSAR fails is still under hot debate; our study attempts to address this topic from a practical and empirical perspective, evaluating two relatively large toxicological data sets using a typical combination of support vector machine (SVM) and genetic algorithm (GA). Our results suggest that the vast number of equivalent models to be chosen and the insufficient validation strategy are primarily responsible for the failure of many QSAR models. First, a method often produces many more equivalent models than we might expect, and the corresponding descriptor sets show little overlap, indicating the unreliability of the conventional approaches. Moreover, although external validation has been considered necessary, validation on an arbitrarily selected independent set is still insufficient to guarantee the true predictability of a QSAR model. Therefore, more effective training and validation strategies are demanded to enhance the reliability of QSAR models. The present study also demonstrates that combinatorial or ensemble models can greatly reduce the variance of equivalent models, and that models built with the most frequently selected descriptors used by the equivalent models seem to yield more promising performances.
Affiliation(s)
- Jianping Huang
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
|
35
|
Hewitt M, Ellison CM. Developing the Applicability Domain of In Silico Models: Relevance, Importance and Methods. IN SILICO TOXICOLOGY 2010. [DOI: 10.1039/9781849732093-00301] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/29/2023]
Abstract
The past two decades have seen rapid growth in the development and utilisation of computational technologies to predict the toxicity of chemicals. Most notably, widespread pressure to both reduce and replace current animal testing regimes has led to in silico modelling becoming a widely utilised tool in toxicological screening. Unfortunately, given that computational models are open to misuse, there has been, and still is, significant reluctance to accept them for regulatory use. In an effort to combat this, the validation of both model and predictions is now at the forefront of research, with the concept of applicability domain being central to the validation process.
In this chapter the applicability domain concept is defined and numerous methods for its characterisation are detailed and explored with the aid of a case study example. These approaches are shown to span from relatively simple descriptor-based methods to more complex approaches based upon structural similarity or mechanism of action. Given the wealth of differing approaches available and the different information each method yields about the model, a stepwise scheme which considers numerous methods is recommended. With appreciation of model architecture and subsequent utilisation, this chapter shows that a robust and multifaceted applicability domain can be generated. Once defined, the applicability domain serves as a critical screening stage ensuring that a model is fit-for-purpose and predictions are made with maximal confidence.
Affiliation(s)
- M. Hewitt
- School of Pharmacy and Chemistry, Liverpool John Moores University, Byrom Street, Liverpool L3 3AF, UK
- C. M. Ellison
- School of Pharmacy and Chemistry, Liverpool John Moores University, Byrom Street, Liverpool L3 3AF, UK
|
36
|
Tropsha A. Best Practices for QSAR Model Development, Validation, and Exploitation. Mol Inform 2010; 29:476-88. [DOI: 10.1002/minf.201000061] [Citation(s) in RCA: 1086] [Impact Index Per Article: 77.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2010] [Accepted: 06/08/2010] [Indexed: 11/11/2022]
|
37
|
Boethling RS, Costanza J. Domain of EPI suite biotransformation models. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2010; 21:415-443. [PMID: 20818580 DOI: 10.1080/1062936x.2010.501816] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/29/2023]
Abstract
Knowledge of the interpolative region or applicability domain (AD) of structure-activity relationships is believed to improve predictive accuracy. The present work was undertaken to characterize the AD of EPI Suite biotransformation models and evaluate the performance of selected AD assessment methods. AD methods were applied to the training sets of four models representing different end-points, and the predictive accuracy was then evaluated using six independent validation sets. Two of the models estimated a continuous variable (log half-life) from fragment descriptors. For biotransformation in fish (BCFBAF) and hydrocarbon biodegradation (BioHCwin), the approach using ranges, with preprocessing by analysis of principal components, worked reasonably well in identifying subsets of validation chemicals that have higher root mean squared error than for all validation chemicals. AD methods were also applied to two classification models, Biowin3 (which predicts the time required to achieve complete aerobic biodegradation) and Biowin5 (the probability of ready biodegradation in the OECD 301C test). Structure-based AD methods (fingerprints, atom environments) showed some success, but descriptor-based AD methods were not useful in identifying misclassified chemicals. For Biowin3 the largest percentage of misclassified chemicals was obtained for chemicals for which prediction was based on molecular weight alone, which suggests the need to revise the fragment library of the model.
Affiliation(s)
- R S Boethling
- US Environmental Protection Agency, Office of Pollution Prevention and Toxics 7406M, Washington, DC 20460, USA.
|
38
|
Kaneko H, Arakawa M, Funatsu K. Applicability domains and accuracy of prediction of soft sensor models. AIChE J 2010. [DOI: 10.1002/aic.12351] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.4] [Indexed: 11/11/2022]
|
39
|
Furuhama A, Toida T, Nishikawa N, Aoki Y, Yoshioka Y, Shiraishi H. Development of an ecotoxicity QSAR model for the KAshinhou Tool for Ecotoxicity (KATE) system, March 2009 version. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2010; 21:403-13. [PMID: 20818579 PMCID: PMC2946238 DOI: 10.1080/1062936x.2010.501815] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 02/18/2008] [Accepted: 04/28/2010] [Indexed: 05/14/2023]
Abstract
The KAshinhou Tool for Ecotoxicity (KATE) system, including ecotoxicity quantitative structure-activity relationship (QSAR) models, was developed by the Japanese National Institute for Environmental Studies (NIES) using the database of aquatic toxicity results gathered by the Japanese Ministry of the Environment and the US EPA fathead minnow database. In this system chemicals can be entered according to their one-dimensional structures and classified by substructure. The QSAR equations for predicting the toxicity of a chemical compound assume a linear correlation between its log P value and its aquatic toxicity. KATE uses a structural domain called C-judgement, defined by the substructures of specified functional groups in the QSAR models. Internal validation by the leave-one-out method confirms that the QSAR equations, with r² > 0.7, RMSE 5, give acceptable q² values. Such external validation indicates that a group of chemicals with an in-domain of KATE C-judgements exhibits a lower root mean square error (RMSE). These findings demonstrate that the KATE system has the potential to enable chemicals to be categorised as potential hazards.
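The leave-one-out validation of a linear log P model described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the KATE implementation: it fits toxicity = a + b·log P by ordinary least squares and computes q² as 1 − PRESS / SS_total using the mean of all observations (other q² conventions exist). The function names and data values are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def loo_q2(x, y):
    """Leave-one-out cross-validated q2 = 1 - PRESS / SS_total."""
    press = 0.0
    for i in range(len(x)):
        # Refit the line without chemical i, then predict chemical i.
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a + b * x[i])) ** 2
    my = mean(y)
    ss_total = sum((yi - my) ** 2 for yi in y)
    return 1.0 - press / ss_total


# Hypothetical data: log P values and log(1/LC50) toxicities
log_p = [1.2, 2.0, 2.9, 3.5, 4.1, 4.8]
toxicity = [0.9, 1.8, 2.4, 3.1, 3.4, 4.3]

intercept, slope = fit_line(log_p, toxicity)
print(f"toxicity = {intercept:.2f} + {slope:.2f} * log P, q2 = {loo_q2(log_p, toxicity):.2f}")
```

A q² close to the fitted r² suggests the linear relationship is not driven by any single chemical; a much lower q² would point to influential observations.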
Affiliation(s)
- A Furuhama
- Research Center for Environmental Risk, National Institute for Environmental Studies (NIES), 16-2 Onogawa, Tsukuba 305-8506, Japan.
|
40
|
Fechner N, Jahn A, Hinselmann G, Zell A. Estimation of the applicability domain of kernel-based machine learning models for virtual screening. J Cheminform 2010; 2:2. [PMID: 20222949 PMCID: PMC2851576 DOI: 10.1186/1758-2946-2-2] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Received: 12/15/2009] [Accepted: 03/11/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND The virtual screening of large compound databases is an important application of structural-activity relationship models. Due to the high structural diversity of these data sets, it is impossible for machine learning based QSAR models, which rely on a specific training set, to give reliable results for all compounds. Thus, it is important to consider the subset of the chemical space in which the model is applicable. The approaches to this problem that have been published so far mostly use vectorial descriptor representations to define this domain of applicability of the model. Unfortunately, these cannot be extended easily to structured kernel-based machine learning models. For this reason, we propose three approaches to estimate the domain of applicability of a kernel-based QSAR model. RESULTS We evaluated three kernel-based applicability domain estimations using three different structured kernels on three virtual screening tasks. Each experiment consisted of the training of a kernel-based QSAR model using support vector regression and the ranking of a disjoint screening data set according to the predicted activity. For each prediction, the applicability of the model for the respective compound is quantitatively described using a score obtained by an applicability domain formulation. The suitability of the applicability domain estimation is evaluated by comparing the model performance on the subsets of the screening data sets obtained by different thresholds for the applicability scores. This comparison indicates that it is possible to separate the part of the chemspace, in which the model gives reliable predictions, from the part consisting of structures too dissimilar to the training set to apply the model successfully. A closer inspection reveals that the virtual screening performance of the model is considerably improved if half of the molecules, those with the lowest applicability scores, are omitted from the screening. 
CONCLUSION The proposed applicability domain formulations for kernel-based QSAR models can successfully identify compounds for which no reliable predictions can be expected from the model. The resulting reduction of the search space and the elimination of some of the active compounds should not be considered as a drawback, because the results indicate that, in most cases, these omitted ligands would not be found by the model anyway.
Affiliation(s)
- Nikolas Fechner
- Center for Bioinformatics Tübingen (ZBIT), University of Tübingen, Sand 1, 72076 Tübingen, Germany
- Andreas Jahn
- Center for Bioinformatics Tübingen (ZBIT), University of Tübingen, Sand 1, 72076 Tübingen, Germany
- Georg Hinselmann
- Center for Bioinformatics Tübingen (ZBIT), University of Tübingen, Sand 1, 72076 Tübingen, Germany
- Andreas Zell
- Center for Bioinformatics Tübingen (ZBIT), University of Tübingen, Sand 1, 72076 Tübingen, Germany
|
41
|
Helgee EA, Carlsson L, Boyer S, Norinder U. Evaluation of Quantitative Structure−Activity Relationship Modeling Strategies: Local and Global Models. J Chem Inf Model 2010; 50:677-89. [DOI: 10.1021/ci900471e] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Indexed: 11/29/2022]
Affiliation(s)
- Ernst Ahlberg Helgee
- Safety Assessment, AstraZeneca Research & Development, 43183 Mölndal, Sweden, and Medicinal Chemistry, AstraZeneca Research & Development 15185 Södertälje, Sweden
- Lars Carlsson
- Safety Assessment, AstraZeneca Research & Development, 43183 Mölndal, Sweden, and Medicinal Chemistry, AstraZeneca Research & Development 15185 Södertälje, Sweden
- Scott Boyer
- Safety Assessment, AstraZeneca Research & Development, 43183 Mölndal, Sweden, and Medicinal Chemistry, AstraZeneca Research & Development 15185 Södertälje, Sweden
- Ulf Norinder
- Safety Assessment, AstraZeneca Research & Development, 43183 Mölndal, Sweden, and Medicinal Chemistry, AstraZeneca Research & Development 15185 Södertälje, Sweden
|
42
|
Rasulev B, Kušić H, Leszczynska D, Leszczynski J, Koprivanac N. QSAR modeling of acute toxicity on mammals caused by aromatic compounds: the case study using oral LD50 for rats. J Environ Monit 2010; 12:1037-44. [DOI: 10.1039/b919489d] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Indexed: 11/21/2022]
|
43
|
Bajot F. The Use of QSAR and Computational Methods in Drug Design. CHALLENGES AND ADVANCES IN COMPUTATIONAL CHEMISTRY AND PHYSICS 2010. [DOI: 10.1007/978-1-4020-9783-6_9] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 01/15/2023]
|
44
|
Ellison CM, Enoch SJ, Cronin MT, Madden JC, Judson P. Definition of the Applicability Domains of Knowledge-based Predictive Toxicology Expert Systems by Using a Structural Fragment-based Approach. Altern Lab Anim 2009; 37:533-45. [DOI: 10.1177/026119290903700510] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/16/2022]
Abstract
The applicability domain of a (quantitative) structure–activity relationship ([Q]SAR) must be defined, if a model is to be used successfully for toxicity prediction, particularly for regulatory purposes. Previous efforts to set guidelines on the definition of applicability domains have often been biased toward quantitative, rather than qualitative, models. As a result, novel techniques are still required to define the applicability domains of structural alert models and knowledge-based systems. By using Derek for Windows as an example, this study defined the domain for the skin sensitisation structural alert rule-base. This was achieved by fragmenting the molecules within a training set of compounds, then searching the fragments for those created from a test compound. This novel method was able to highlight test chemicals which differed from those in the training set. The information was then used to designate chemicals as being either within or outside the domain of applicability for the structural alert on which that training set was based.
Affiliation(s)
- Claire M. Ellison
- School of Pharmacy and Chemistry, Liverpool John Moores University, Liverpool, UK
- Steven J. Enoch
- School of Pharmacy and Chemistry, Liverpool John Moores University, Liverpool, UK
- Mark T.D. Cronin
- School of Pharmacy and Chemistry, Liverpool John Moores University, Liverpool, UK
- Judith C. Madden
- School of Pharmacy and Chemistry, Liverpool John Moores University, Liverpool, UK
|
45
|
Hansch C, Verma RP. Overcoming tumor drug resistance with C2-modified 10-deacetyl-7-propionyl cephalomannines: a QSAR study. Mol Pharm 2009; 6:849-60. [PMID: 19334723 DOI: 10.1021/mp800138w] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 02/05/2023]
Abstract
The microtubule-stabilizing taxanes such as paclitaxel and docetaxel are the two most important anticancer drugs currently used in clinics for the treatment of various types of cancers. However, the major common drawbacks of these two drugs are drug resistance, neurotoxicity, substrate for drug transporter P-gp, cross-resistance with other chemotherapeutic agents, low oral bioavailability, and no penetration in the blood-brain barrier (BBB). These limitations have led to the search for new taxane derivatives with improved biological activity. In the present paper, we discuss the quantitative structure-activity relationship (QSAR) studies on a series of C2-modified 10-deacetyl-7-propionyl cephalomannines (IV) with respect to their binding affinities toward beta-tubulin and cytotoxic activities against both drug-sensitive and drug-resistant tumor cells, in which resistance is mediated through either P-gp overexpression or beta-tubulin mutation mechanisms, by the formulation of five QSARs. Hydrophobicity and molar refractivity of the substituents (π(X) and MR(X)) are found to be the most important determinants for the activity. Parabolic correlations in terms of MR(X) (eqs 2 and 4) are encouraging examples in which the optimum values of MR(X) are well-defined. We believe that these two QSAR models may prove to be adequate predictive models that can help to provide guidance in design and synthesis, and subsequently yield very specific cephalomannine derivatives (IV) that may have high biological activities. On the basis of these two QSAR models, 10 cephalomannine analogues (IV-21 to IV-30) are suggested as potential synthetic targets. Internal (cross-validation (q²), quality factor (Q), Fisher statistics (F), and Y-randomization) and external validation tests have validated all the QSAR models.
Affiliation(s)
- Corwin Hansch
- Department of Chemistry, Pomona College, Claremont, CA 91711, USA
|
46
|
Verma RP, Hansch C. Taxane analogues against lung cancer: a quantitative structure-activity relationship study. Chem Biol Drug Des 2009; 73:627-36. [PMID: 19635054 DOI: 10.1111/j.1747-0285.2009.00816.x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/29/2022]
Abstract
Lung cancer is the second most common cancer in both men (after prostate cancer) and women (after breast cancer). The microtubule-stabilizing taxane docetaxel is the only agent currently approved for both first- and second-line treatment of advanced non-small cell lung cancer. Although docetaxel has made significant progress in the treatment of lung cancers, either alone or in combination with various novel targeted agents, its use often results in various undesired side-effects. These limitations have led to the search for new taxane derivatives with fewer side-effects, superior pharmacological properties, and improved anticancer activity to maximize the induced benefits for lung cancer patients. Herein, four series of taxane derivatives were used to correlate their inhibitory activities against lung cancer cells with hydrophobic and steric descriptors to gain a better understanding of their chemical-biological interactions. A parabolic correlation with MR(Y) is the most encouraging example, in which the optimum value of this parameter is well defined. On the basis of this quantitative structure-activity relationship model, six compounds (3-23 to 3-28) are suggested as potential synthetic targets. Internal (cross-validation (q²), quality factor (Q), Fisher statistics (F) and Y-randomization) and external validation tests have validated all the quantitative structure-activity relationship models.
|
47
|
Clark RD. DPRESS: Localizing estimates of predictive uncertainty. J Cheminform 2009; 1:11. [PMID: 20298517 PMCID: PMC3225832 DOI: 10.1186/1758-2946-1-11] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 03/04/2009] [Accepted: 07/14/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND The need to have a quantitative estimate of the uncertainty of prediction for QSAR models is steadily increasing, in part because such predictions are being widely distributed as tabulated values disconnected from the models used to generate them. Classical statistical theory assumes that the error in the population being modeled is independent and identically distributed (IID), but this is often not actually the case. Such inhomogeneous error (heteroskedasticity) can be addressed by providing an individualized estimate of predictive uncertainty for each particular new object u: the standard error of prediction s_u can be estimated as the non-cross-validated error s_t* for the closest object t* in the training set adjusted for its separation d from u in the descriptor space relative to the size of the training set. The predictive uncertainty factor gamma_t* is obtained by distributing the internal predictive error sum of squares across objects in the training set based on the distances between them, hence the acronym: Distributed PRedictive Error Sum of Squares (DPRESS). Note that s_t* and gamma_t* are characteristic of each training set compound contributing to the model of interest. RESULTS The method was applied to partial least-squares models built using 2D (molecular hologram) or 3D (molecular field) descriptors applied to mid-sized training sets (N = 75) drawn from a large (N = 304), well-characterized pool of cyclooxygenase inhibitors. The observed variation in predictive error for the external 229 compound test sets was compared with the uncertainty estimates from DPRESS. Good qualitative and quantitative agreement was seen between the distributions of predictive error observed and those predicted using DPRESS. Inclusion of the distance-dependent term was essential to getting good agreement between the estimated uncertainties and the observed distributions of predictive error. 
The uncertainty estimates derived by DPRESS were conservative even when the training set was biased, but not excessively so. CONCLUSION DPRESS is a straightforward and powerful way to reliably estimate individual predictive uncertainties for compounds outside the training set based on their distance to the training set and the internal predictive uncertainty associated with its nearest neighbor in that set. It represents a sample-based, a posteriori approach to defining applicability domains in terms of localized uncertainty.
Affiliation(s)
- Robert D Clark
- Biochemical Infometrics, 827 Renee Lane, Creve Coeur MO 63141, USA.
|
48
|
Affiliation(s)
- Stefan Balaz
- Department of Pharmaceutical Sciences, College of Pharmacy, North Dakota State University, Fargo, North Dakota 58105, USA.
|
49
|
Affiliation(s)
- Rajeshwar P. Verma
- Department of Chemistry, Pomona College, 645 North College Avenue, Claremont, California 91711
- Corwin Hansch
- Department of Chemistry, Pomona College, 645 North College Avenue, Claremont, California 91711
|
50
|
Koleva YK, Madden JC, Cronin MTD. Formation of Categories from Structure−Activity Relationships To Allow Read-Across for Risk Assessment: Toxicity of α,β-Unsaturated Carbonyl Compounds. Chem Res Toxicol 2008; 21:2300-12. [DOI: 10.1021/tx8002438] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.8] [Indexed: 11/28/2022]
Affiliation(s)
- Yana K. Koleva
- School of Pharmacy and Chemistry, Liverpool John Moores University, Byrom Street, Liverpool, L3 3AF, England
- Judith C. Madden
- School of Pharmacy and Chemistry, Liverpool John Moores University, Byrom Street, Liverpool, L3 3AF, England
- Mark T. D. Cronin
- School of Pharmacy and Chemistry, Liverpool John Moores University, Byrom Street, Liverpool, L3 3AF, England
|