1
Gajewicz-Skretna A, Furuhama A, Yamamoto H, Suzuki N. Generating accurate in silico predictions of acute aquatic toxicity for a range of organic chemicals: Towards similarity-based machine learning methods. CHEMOSPHERE 2021; 280:130681. [PMID: 34162070 DOI: 10.1016/j.chemosphere.2021.130681] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 03/02/2021] [Revised: 04/21/2021] [Accepted: 04/22/2021] [Indexed: 06/13/2023]
Abstract
There has been an increase in the use of non-animal approaches, such as in silico and/or in vitro methods, for assessing the risks of hazardous chemicals. A number of machine learning algorithms link molecular descriptors, which encode chemical structural properties, to biological activity. These computer-aided methods face several challenges, the most significant being the heterogeneity of datasets; more efficient and inclusive computational methods that can process large and heterogeneous chemical datasets are needed. In this context, this study verifies the utility of similarity-based machine learning methods in predicting the acute aquatic toxicity of diverse organic chemicals to Daphnia magna and Oryzias latipes. Two similarity-based methods were tested; instead of using the entire dataset, which encompasses a wide range of chemicals, each uses a limited training set made up of the compounds most similar to a given fitting point. The kernel-weighted local polynomial approach had a number of advantages over the distance-weighted k-nearest neighbor (k-NN) algorithm. The results highlight the importance of lipophilicity, electrophilic reactivity, molecular polarizability, and size in determining acute toxicity. Rigorous model validation indicates that this approach is an important tool for estimating the toxicity of new or untested chemicals.
Affiliation(s)
- Agnieszka Gajewicz-Skretna
- Laboratory of Environmental Chemometrics, Faculty of Chemistry, University of Gdansk, Wita Stwosza 63, 80-308, Gdansk, Poland.
- Ayako Furuhama
- Center for Health and Environmental Risk Research, National Institute for Environmental Studies (NIES), 16-2 Onogawa, Tsukuba, 305-8506, Japan; Division of Genetics and Mutagenesis, National Institute of Health Sciences (NIHS), 3-25-26 Tonomachi, Kawasaki-ku, Kawasaki City, Kanagawa, 210-9501, Japan
- Hiroshi Yamamoto
- Center for Health and Environmental Risk Research, National Institute for Environmental Studies (NIES), 16-2 Onogawa, Tsukuba, 305-8506, Japan
- Noriyuki Suzuki
- Center for Health and Environmental Risk Research, National Institute for Environmental Studies (NIES), 16-2 Onogawa, Tsukuba, 305-8506, Japan
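The two similarity-based predictors contrasted in this abstract can be sketched in a few lines of Python. This is a minimal illustration under assumptions not taken from the paper: a single descriptor, inverse-distance weights for k-NN, a Gaussian kernel with a fixed bandwidth `h`, and a local polynomial of degree 1 (local linear). The function names and data are hypothetical.

```python
import math

# Hypothetical helpers; all settings are illustrative assumptions, not the authors' choices.

def weighted_knn(x_train, y_train, x0, k=3, eps=1e-12):
    """Distance-weighted k-NN: average of the k nearest responses, weighted by 1/distance."""
    # Sort training points by distance to the query x0 and keep the k closest.
    nearest = sorted(zip(x_train, y_train), key=lambda p: abs(p[0] - x0))[:k]
    weights = [1.0 / (abs(x - x0) + eps) for x, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)

def local_linear(x_train, y_train, x0, h=1.0):
    """Kernel-weighted local polynomial fit of degree 1, evaluated at x0."""
    # Gaussian kernel weights: compounds close to x0 dominate the local fit.
    w = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in x_train]
    sw = sum(w)
    # Weighted means of descriptor and response.
    xm = sum(wi * xi for wi, xi in zip(w, x_train)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y_train)) / sw
    # Weighted least-squares slope of the local line.
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x_train))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x_train, y_train))
    slope = sxy / sxx if sxx > 0 else 0.0
    # Evaluate the local line at the query point itself.
    return ym + slope * (x0 - xm)

# Toy data: a perfectly linear descriptor-response series.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2 * x + 1 for x in xs]
print(weighted_knn(xs, ys, 6.0, k=3))    # about 9.73
print(local_linear(xs, ys, 6.0, h=1.0))  # about 13.0
```

On this toy series the local linear fit follows the trend to the edge of the descriptor range, whereas the k-NN average is pulled toward the interior. Reduced bias at the boundaries is a commonly cited property of local polynomial fits; it is not necessarily the advantage the authors report.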
2
Gajewicz-Skretna A, Kar S, Piotrowska M, Leszczynski J. The kernel-weighted local polynomial regression (KwLPR) approach: an efficient, novel tool for development of QSAR/QSAAR toxicity extrapolation models. J Cheminform 2021; 13:9. [PMID: 33579384 PMCID: PMC7881668 DOI: 10.1186/s13321-021-00484-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/24/2020] [Accepted: 01/11/2021] [Indexed: 11/10/2022]
Abstract
The ability to accurately predict the biological response (biological activity/property/toxicity) of a given chemical makes quantitative structure-activity/property/toxicity relationship (QSAR/QSPR/QSTR) models unique among in silico tools. In addition, experimental data for selected species can be used as an independent variable, along with other structural and physicochemical variables, to predict the response of a different species; this is the quantitative activity–activity relationship (QAAR)/quantitative structure–activity–activity relationship (QSAAR) approach. Irrespective of model type, the quality and reliability of a developed model need to be checked with multiple stringent classical validation metrics. Among these, error-based metrics are particularly significant, since the basic aim of a good predictive model is to lower the prediction residuals for new query compounds. Following this concept, we have compared the predictive quality of QSAR and QSAAR models built with the kernel-weighted local polynomial regression (KwLPR) approach against traditional linear and non-linear regression tools such as multiple linear regression (MLR) and k nearest neighbors (kNN). Five datasets previously modeled with linear and non-linear regression methods were used to implement the KwLPR approach, and the resulting validation metrics were compared. In all five cases, the KwLPR-based models gave better results than the traditional approaches. The focus of the present study is not to develop a better or improved QSAR/QSAAR model than the previous ones, but to demonstrate the advantages, predictive power, and reliability of the KwLPR algorithm and to establish it as a novel, powerful cheminformatic tool.
To facilitate the use of the KwLPR algorithm for QSAR/QSPR/QSTR/QSAAR modeling, the authors provide an in-house KwLPR.RMD script written in the open-source R programming language.
Affiliation(s)
- Agnieszka Gajewicz-Skretna
- Laboratory of Environmental Chemometrics, Faculty of Chemistry, University of Gdansk, Wita Stwosza 63, 80-308, Gdansk, Poland.
- Supratik Kar
- Interdisciplinary Center for Nanotoxicity, Department of Chemistry, Physics and Atmospheric Sciences, Jackson State University, 1400 J. R. Lynch Street, P. O. Box 17910, Jackson, MS, 39217, USA
- Magdalena Piotrowska
- Laboratory of Environmental Chemometrics, Faculty of Chemistry, University of Gdansk, Wita Stwosza 63, 80-308, Gdansk, Poland
- Jerzy Leszczynski
- Interdisciplinary Center for Nanotoxicity, Department of Chemistry, Physics and Atmospheric Sciences, Jackson State University, 1400 J. R. Lynch Street, P. O. Box 17910, Jackson, MS, 39217, USA
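The abstract stresses error-based validation metrics but does not name them here. As an assumption, the sketch below uses two common ones, the root-mean-square error of prediction (RMSEP) and the mean absolute error (MAE), on a hypothetical four-compound test set. The predictions are invented for illustration and are not results from the paper.

```python
import math

def rmsep(observed, predicted):
    """Root-mean-square error of prediction over an external test set."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))

def mae(observed, predicted):
    """Mean absolute error over the same test set."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

# Hypothetical test set scored by two hypothetical models.
obs = [1.0, 2.0, 3.0, 4.0]
model_a = [1.5, 2.5, 2.5, 4.5]  # every residual is 0.5 in magnitude
model_b = [1.0, 2.0, 3.0, 2.0]  # three exact hits, one large miss

print(rmsep(obs, model_a), mae(obs, model_a))  # 0.5 0.5
print(rmsep(obs, model_b), mae(obs, model_b))  # 1.0 0.5
```

Both toy models have the same MAE, but RMSEP doubles for the model with one large miss, which is one reason to look at more than one error-based metric.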
3
Abstract
This review focuses on the multivariate statistical methods currently used in quantitative structure–activity relationship (QSAR) studies.
Affiliation(s)
- Somayeh Pirhadi
- Drug Design in Silico Lab, Chemistry Faculty, K. N. Toosi University of Technology, Tehran, Iran
- Jahan B. Ghasemi
- Drug Design in Silico Lab, Chemistry Faculty, K. N. Toosi University of Technology, Tehran, Iran
4
Dodić J, Grahovac J, Kalajdžija N, Kovačević S, Jevrić L, Podunavac Kuzmanović S. Chemometric approach to prediction of antibacterial agent production by Streptomyces hygroscopicus. Appl Biochem Biotechnol 2014; 174:534-41. [PMID: 25082769 DOI: 10.1007/s12010-014-1115-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 05/02/2014] [Accepted: 07/23/2014] [Indexed: 11/29/2022]
Abstract
The nutritional requirements for antimicrobial agent production by Streptomyces hygroscopicus were analyzed in shake flask experiments. Antimicrobial activity was tested against Staphylococcus aureus and Bacillus cereus. Relatively complex mathematical models were generated to give an adequate fit to the data. All the results suggest that the quantity of antimicrobial agent produced depends strongly on the amounts of carbon, nitrogen, and phosphorus in the cultivation medium. The statistics of the generated models indicate high predictive ability. The derived models were validated by leave-one-out cross-validation and, from a statistical point of view, give high values of the cross-validation parameters.
Affiliation(s)
- Jelena Dodić
- Department of Biotechnology and Pharmaceutical Engineering, Faculty of Technology, University of Novi Sad, Bulevar cara Lazara 1, 21000, Novi Sad, Serbia
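Leave-one-out cross-validation refits the model once per observation, each time predicting the observation that was left out, and the result is commonly summarized as Q² = 1 - PRESS/SS_tot. The sketch below assumes a straight-line model in a single variable for brevity; the published models relate antimicrobial activity to several medium components and are more complex. The function names and measurements are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    xm, ym = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xm) ** 2 for x in xs)
    sxy = sum((x - xm) * (y - ym) for x, y in zip(xs, ys))
    b = sxy / sxx
    return ym - b * xm, b

def q2_loo(xs, ys):
    """Leave-one-out cross-validated Q^2 = 1 - PRESS / total sum of squares."""
    ym = sum(ys) / len(ys)
    press = 0.0
    for i in range(len(xs)):
        # Refit without observation i, then predict it.
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    ss_tot = sum((y - ym) ** 2 for y in ys)
    return 1.0 - press / ss_tot

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1]  # hypothetical measurements
print(round(q2_loo(xs, ys), 3))
```

Because each prediction is made by a model that never saw that observation, Q² is lower than the ordinary R² of the full fit, and the gap between the two is a rough indicator of overfitting.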
5
Lewis RA, Wood D. Modern 2D QSAR for drug discovery. WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL MOLECULAR SCIENCE 2014. [DOI: 10.1002/wcms.1187] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Indexed: 01/26/2023]
Affiliation(s)
- Richard A. Lewis
- Novartis Institutes for BioMedical Research; Novartis Pharma AG; Basel Switzerland
- David Wood
- Novartis Institutes for BioMedical Research; Novartis Horsham Research Centre; Horsham UK
6
Chen B, Harrison RF, Papadatos G, Willett P, Wood DJ, Lewell XQ, Greenidge P, Stiefl N. Evaluation of machine-learning methods for ligand-based virtual screening. J Comput Aided Mol Des 2007; 21:53-62. [PMID: 17205373 DOI: 10.1007/s10822-006-9096-5] [Citation(s) in RCA: 93] [Impact Index Per Article: 5.5] [Received: 11/02/2006] [Accepted: 12/04/2006] [Indexed: 01/28/2023]
Abstract
Machine-learning methods can be used for virtual screening by analysing the structural characteristics of molecules of known (in)activity; here we discuss the use of kernel discrimination and naive Bayesian classifier (NBC) methods for this purpose. We report a kernel method that can process molecules represented by binary, integer and real-valued descriptors, and show that its screening performance differs little from that of a previously described kernel developed specifically for the analysis of binary fingerprint representations of molecular structure. We then evaluate the performance of an NBC when the training set contains very few active molecules. In such cases, a simpler approach based on group fusion would appear to provide superior screening performance, especially when structurally heterogeneous datasets are to be processed.
Affiliation(s)
- Beining Chen
- Department of Chemistry, University of Sheffield, Western Bank, Sheffield, UK
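Group fusion, which the abstract favours when only a few actives are available, scores each database molecule by combining its similarities to every known active. A minimal sketch, assuming binary fingerprints stored as Python sets of on-bit indices, the Tanimoto coefficient, and MAX fusion; other fusion rules (for example SUM) are also used, and the abstract does not say which one the authors applied. The molecules are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def group_fusion_rank(actives, database):
    """Rank database molecules by their MAX Tanimoto similarity to any reference active."""
    scores = {name: max(tanimoto(fp, ref) for ref in actives)
              for name, fp in database.items()}
    return sorted(scores, key=scores.get, reverse=True)

actives = [{1, 2, 3, 4}, {10, 11, 12}]  # hypothetical reference actives
database = {
    "mol_a": {1, 2, 3},     # close to the first active
    "mol_b": {10, 11, 20},  # moderately close to the second active
    "mol_c": {30, 31, 32},  # unrelated to both
}
print(group_fusion_rank(actives, database))  # ['mol_a', 'mol_b', 'mol_c']
```

MAX fusion lets a structurally heterogeneous set of actives act as several independent probes, so a molecule only needs to resemble one of them to rank highly.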
7
McNeany TJ, Hirst JD. Inhibition of the Tyrosine Kinase, Syk, Analyzed by Stepwise Nonparametric Regression. J Chem Inf Model 2005; 45:768-76. [PMID: 15921466 DOI: 10.1021/ci049631t] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 01/04/2023]
Abstract
A set of 538 inhibitors of the tyrosine kinase, Syk, including purines, pyrimidines, indoles, imidazoles, pyrazoles, and quinazolines, has been analyzed using a stepwise nonparametric regression (SNPR) algorithm developed for QSAR studies of pharmacological data. The algorithm couples stepwise descriptor selection with flexible, nonparametric kernel regression to generate structure-activity relationships. A further 371 molecules were used as a test set to evaluate the models generated. Descriptors were selected using an internal monitoring set, and models were assessed using 10% of the principal (538-compound) data set, selected randomly, as an external validation set. The best model had a Q² of 0.46 for the external validation set. Test set predictions were significantly less accurate, partly because of the higher mean activity of the test molecules. However, at a coarser-grained level the SNPR models classified active molecules accurately, giving good enrichments. The data sets are difficult to model accurately, and SNPR performs better than multilinear regression and a neural network analysis. In the additive implementation of SNPR, multidimensional models are treated as a sum of one-dimensional regressions, which makes the resulting models easy to interpret. For example, the most predictive SNPR models show a clear nonlinear relationship between hydrophobicity (AlogP98) and inhibitory activity.
Affiliation(s)
- T John McNeany
- School of Chemistry, University of Nottingham, University Park, Nottingham NG7 2RD, UK
8
Gilardoni F, Curcin V, Karunanayake K, Norgaard J, Guo Y. Integrated Informatics in Life and Materials Sciences: An Oxymoron? ACTA ACUST UNITED AC 2005. [DOI: 10.1002/qsar.200420056] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022]
9
Hirst JD, McNeany TJ, Howe T, Whitehead L. Application of non-parametric regression to quantitative structure-activity relationships. Bioorg Med Chem 2002; 10:1037-41. [PMID: 11836112 DOI: 10.1016/s0968-0896(01)00359-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]
Abstract
Several non-parametric regressors have been applied to modelling quantitative structure-activity relationship (QSAR) data. Their performance was benchmarked against multilinear regression and the nonlinear method of smoothing splines. Variable selection was explored through systematic combinations of different variables and combinations of principal components. For the training set examined (539 inhibitors of the tyrosine kinase, Syk), the best two-descriptor model had a 5-fold cross-validated q² of 0.43. This model was generated by a multivariate Nadaraya-Watson kernel estimator. A subsequent independent test set of 371 similar chemical entities showed that the model had some predictive power. Other approaches did not perform as well. A modest increase in predictive ability can be achieved with three descriptors, but the resulting model is harder to visualise. We conclude that non-parametric regression offers a potentially powerful approach to identifying predictive, low-dimensional QSARs.
Affiliation(s)
- Jonathan D Hirst
- School of Chemistry, University of Nottingham, University Park, NG7 2RD, Nottingham, UK.
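The Nadaraya-Watson estimator predicts a compound's activity as a kernel-weighted average of the training activities, which makes it a local fit of degree 0. A minimal multivariate sketch, assuming a product Gaussian kernel with one bandwidth per descriptor; the abstract does not specify the kernel or the bandwidths, and the data below are hypothetical.

```python
import math

def nadaraya_watson(X_train, y_train, x0, h):
    """Multivariate Nadaraya-Watson estimate at x0 using a product Gaussian kernel.

    h holds one bandwidth per descriptor (an illustrative assumption).
    """
    weights = []
    for x in X_train:
        # Product of per-descriptor Gaussians = one exponential of summed scaled squares.
        u2 = sum(((xi - ci) / hi) ** 2 for xi, ci, hi in zip(x, x0, h))
        weights.append(math.exp(-0.5 * u2))
    return sum(w * y for w, y in zip(weights, y_train)) / sum(weights)

# Hypothetical two-descriptor training set.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
y = [0.0, 1.0, 1.0, 2.0]
print(round(nadaraya_watson(X, y, (0.5, 0.5), (1.0, 1.0)), 6))  # 1.0
print(nadaraya_watson(X, y, (0.0, 0.0), (0.1, 0.1)))  # very close to 0.0
```

With a wide bandwidth the estimate approaches the global mean activity; with a narrow one it approaches the activity of the nearest training compound, so the bandwidths control the bias-variance trade-off.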
10
Agatonovic-Kustrin S, Alany R. Application of diffuse reflectance infrared Fourier transform spectroscopy combined with artificial neural networks in analysing enantiomeric purity of terbutaline sulphate bulk drug. Anal Chim Acta 2001. [DOI: 10.1016/s0003-2670(01)01234-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
11
Agatonovic-Kustrin S, Beresford R, Yusof AP. ANN modeling of the penetration across a polydimethylsiloxane membrane from theoretically derived molecular descriptors. J Pharm Biomed Anal 2001; 26:241-54. [PMID: 11470201 DOI: 10.1016/s0731-7085(01)00421-6] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.1] [Indexed: 11/19/2022]
Abstract
A quantitative structure-permeability relationship was developed using Artificial Neural Network (ANN) modeling to study penetration across a polydimethylsiloxane membrane. A set of 254 compounds and their experimentally derived maximum steady-state flux values was gathered from the literature for this study. A total of 42 molecular descriptors were calculated for each compound. A genetic algorithm was used to select important molecular descriptors, and a supervised ANN was used to correlate the selected descriptors with the experimentally derived maximum steady-state flux through the polydimethylsiloxane membrane (log J). The calculated molecular descriptors were used as the ANN inputs and log J as the output. The developed model indicates that molecular shape and size, intermolecular interactions, hydrogen-bonding capacity of drugs, and conformational stability could be used to predict drug absorption through skin. A 12-descriptor nonlinear computational neural network model has been developed for the estimation of log J values for a data set of 254 drugs. The described model does not require experimental parameters and could potentially provide useful predictions of membrane penetration for new drugs, reducing the need for actual compound synthesis and flux measurements.
Affiliation(s)
- S Agatonovic-Kustrin
- School of Pharmaceutical Sciences, Universiti Sains Malaysia, 11800 Penang, Malaysia.
12
Harper G, Bradshaw J, Gittins JC, Green DV, Leach AR. Prediction of biological activity for high-throughput screening using binary kernel discrimination. JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES 2001; 41:1295-300. [PMID: 11604029 DOI: 10.1021/ci000397q] [Citation(s) in RCA: 65] [Impact Index Per Article: 2.8] [Indexed: 11/28/2022]
Abstract
High-throughput screening has made a significant impact on drug discovery, but there is an acknowledged need for quantitative methods to analyze screening results and predict the activity of further compounds. In this paper we introduce one such method, binary kernel discrimination, and investigate its performance on two datasets; the first is a set of 1650 monoamine oxidase inhibitors, and the second a set of 101 437 compounds from an in-house enzyme assay. We compare the performance of binary kernel discrimination with a simple procedure which we call "merged similarity search", and also with a feedforward neural network. Binary kernel discrimination is shown to perform robustly with varying quantities of training data and also in the presence of noisy data. We conclude by highlighting the importance of the judicious use of general pattern recognition techniques for compound selection.
Affiliation(s)
- G Harper
- GlaxoSmithKline Research and Development, Medicines Research Centre, Gunnels Wood Road, Stevenage SG1 2NY, U.K.
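The abstract names binary kernel discrimination without giving its form. As it is commonly described, compounds are ranked by the ratio of summed kernel values over training actives to summed kernel values over training inactives, using the binary kernel λ^(n-d)(1-λ)^d, where d is the Hamming distance between two n-bit fingerprints and λ is a smoothing parameter. A minimal sketch, assuming fingerprints stored as Python sets of on-bits and a fixed λ = 0.9 (in practice λ is tuned on training data); the compounds are hypothetical.

```python
def bkd_kernel(a, b, lam):
    """Binary kernel up to a constant: ((1 - lam) / lam) ** d, d = Hamming distance.

    The full kernel lam**(n - d) * (1 - lam)**d equals lam**n times this value;
    the factor lam**n cancels in the score ratio, so n is not needed here.
    """
    d = len(a ^ b)  # bits set in exactly one of the two fingerprints
    return ((1 - lam) / lam) ** d

def bkd_score(query, actives, inactives, lam=0.9):
    """Ratio of summed kernel values to training actives versus training inactives."""
    num = sum(bkd_kernel(query, a, lam) for a in actives)
    den = sum(bkd_kernel(query, i, lam) for i in inactives)
    return num / den

actives = [{1, 2, 3}, {1, 2, 4}]  # hypothetical active fingerprints
inactives = [{7, 8, 9}, {7, 8}]   # hypothetical inactive fingerprints
print(bkd_score({1, 2, 3}, actives, inactives) > 1.0)  # True: resembles the actives
print(bkd_score({7, 8, 9}, actives, inactives) < 1.0)  # True: resembles the inactives
```

Dropping the common λ^n factor keeps the kernel values from underflowing for long fingerprints without changing the ranking.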
13
van Rhee AM, Stocker J, Printzenhoff D, Creech C, Wagoner PK, Spear KL. Retrospective Analysis of an Experimental High-Throughput Screening Data Set by Recursive Partitioning. ACTA ACUST UNITED AC 2001; 3:267-77. [PMID: 11350250 DOI: 10.1021/cc0000747] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.9] [Indexed: 11/29/2022]
Abstract
With the emergence of combinatorial chemistry, whether based on parallel, mixture, solution, or solid-phase chemistry, it is now possible to generate large numbers of diverse or focused compound libraries. In this paper we aim to demonstrate that targeted libraries can be designed by applying nonparametric statistical methods, recursive partitioning in particular, to large data sets containing thousands of compounds and their associated biological data. Moreover, our results on an experimental high-throughput screening (HTS) data set strongly suggest that this method can improve the hit rate of our primary screens about 4- to 5-fold while increasing screening efficiency: fewer than one-fifth of the complete selection needs to be screened to identify about 75% of all actives present.
Affiliation(s)
- A M van Rhee
- ICAgen, Inc., P.O. Box 14487, Research Triangle Park, North Carolina 27709, USA
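Recursive partitioning grows a decision tree by repeatedly splitting compounds on the descriptor that best separates actives from inactives; leaves with high hit rates then indicate which compounds to select for screening. A minimal sketch, assuming binary substructure descriptors stored as Python sets and a Gini impurity split criterion (a common choice; the abstract does not state the authors' splitting rule). The compounds and labels are hypothetical. Enrichment is computed as a leaf's hit rate divided by the overall hit rate, the same kind of ratio as the 4- to 5-fold improvement quoted in the abstract.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 activity labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def build_tree(rows, labels, features, min_size=2, depth=0, max_depth=3):
    """Recursively split on the binary feature that most reduces weighted Gini impurity."""
    leaf = {"hit_rate": sum(labels) / len(labels), "n": len(labels)}
    if depth == max_depth or len(labels) < min_size or gini(labels) == 0.0:
        return leaf
    best = None
    for f in features:
        has = [i for i, r in enumerate(rows) if f in r]      # compounds with feature f
        lacks = [i for i, r in enumerate(rows) if f not in r]
        if not has or not lacks:
            continue
        score = (len(has) * gini([labels[i] for i in has]) +
                 len(lacks) * gini([labels[i] for i in lacks])) / len(labels)
        if best is None or score < best[0]:
            best = (score, f, has, lacks)
    if best is None or best[0] >= gini(labels):
        return leaf  # no split improves purity
    _, f, has, lacks = best
    return {
        "feature": f,
        "has": build_tree([rows[i] for i in has], [labels[i] for i in has],
                          features, min_size, depth + 1, max_depth),
        "lacks": build_tree([rows[i] for i in lacks], [labels[i] for i in lacks],
                            features, min_size, depth + 1, max_depth),
    }

# Hypothetical screening data: substructure keys per compound and 0/1 activity.
rows = [{"amide", "ring"}, {"amide"}, {"amide", "ring"}, {"ring"},
        {"ring"}, {"halogen"}, {"halogen", "ring"}, set()]
labels = [1, 1, 1, 0, 0, 0, 0, 0]
tree = build_tree(rows, labels, ["amide", "ring", "halogen"])
overall = sum(labels) / len(labels)
print(tree["feature"], round(tree["has"]["hit_rate"] / overall, 2))  # amide 2.67
```

Real HTS applications use many more descriptors, stopping rules, and pruning; this sketch only shows the recursive split-and-score idea.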