1
Prediction of the aquatic toxicity of aromatic compounds to tetrahymena pyriformis through support vector regression. Oncotarget 2018; 8:49359-49369. [PMID: 28467816 PMCID: PMC5564774 DOI: 10.18632/oncotarget.17210] [Received: 03/10/2017] [Accepted: 03/30/2017]
Abstract
Toxicity evaluation is an extremely important process during drug development. It is usually initiated by experiments on animals, which are time-consuming and costly. To speed up this process, a quantitative structure-activity relationship (QSAR) study was performed to develop a computational model correlating the structures of 581 aromatic compounds with their aquatic toxicity to Tetrahymena pyriformis. A set of 68 molecular descriptors derived solely from the structures of the aromatic compounds was calculated using Gaussian 03, HyperChem 7.5, and TSAR V3.3. A comprehensive feature selection method, combining minimum Redundancy Maximum Relevance (mRMR), a genetic algorithm (GA), and support vector regression (SVR), was applied to select the best descriptor subset for the QSAR analysis. The SVR method was employed to model toxic potency from a training set of 500 compounds. Five-fold cross-validation was used to optimize the parameters of the SVR model. The new SVR model was tested on an independent dataset of 81 compounds. Both high internal consistency and high external predictive rates were obtained, indicating that the SVR model is a promising tool for rapid toxicity detection.
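The five-fold cross-validation step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the helper names (`k_fold_indices`, `cross_validated_mse`, `select_penalty`) are hypothetical, and a one-parameter ridge line through the origin stands in for the SVR model so that the example needs only the standard library.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validated_mse(xs, ys, fit, predict, k=5, seed=0):
    """Average held-out mean squared error over the k folds."""
    fold_errors = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [(predict(model, xs[i]) - ys[i]) ** 2 for i in held_out]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / k

def make_ridge_fit(penalty):
    """Stand-in model (assumption): ridge fit of y = slope * x."""
    def fit(xs, ys):
        return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)
    return fit

def predict(slope, x):
    return slope * x

def select_penalty(xs, ys, candidates, k=5):
    """Pick the candidate penalty with the lowest cross-validated error."""
    scores = {c: cross_validated_mse(xs, ys, make_ridge_fit(c), predict, k)
              for c in candidates}
    return min(scores, key=scores.get), scores

# Toy data: 50 points on the line y = 2x, so no penalty is needed.
xs = [i / 10 for i in range(1, 51)]
ys = [2 * x for x in xs]
best_penalty, scores = select_penalty(xs, ys, [0.0, 10.0, 1000.0])
```

With 500 training compounds, as in the abstract, `k_fold_indices(500, 5)` yields five folds of 100 compounds each; each candidate parameter setting is scored on the folds it was not trained on.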
2
Gupta S, Basant N, Singh KP. Predicting aquatic toxicities of benzene derivatives in multiple test species using local, global and interspecies QSTR modeling approaches. RSC Adv 2015. [DOI: 10.1039/c5ra12825k]
Abstract
A flow diagram showing QSTR modeling strategy for aquatic toxicity prediction of benzene derivatives in multiple test species.
Affiliation(s)
- Shikha Gupta
- Environmental Chemistry Division, CSIR-Indian Institute of Toxicology Research, Lucknow-226001, India
- Kunwar P. Singh
- Environmental Chemistry Division, CSIR-Indian Institute of Toxicology Research, Lucknow-226001, India
3
Singh KP, Gupta S. In silico prediction of toxicity of non-congeneric industrial chemicals using ensemble learning based modeling approaches. Toxicol Appl Pharmacol 2014; 275:198-212. [PMID: 24463095 DOI: 10.1016/j.taap.2014.01.006] [Received: 08/05/2013] [Revised: 01/04/2014] [Accepted: 01/13/2014]
Abstract
Ensemble-learning-based decision treeboost (DTB) and decision tree forest (DTF) models are introduced to establish a quantitative structure-toxicity relationship (QSTR) for predicting the toxicity of 1450 diverse chemicals. Eight non-quantum mechanical molecular descriptors were derived. The structural diversity of the chemicals was evaluated using the Tanimoto similarity index. DTB and DTF models, supplemented by stochastic gradient boosting and bagging algorithms, were constructed for classification and function optimization problems using the toxicity end-point in T. pyriformis. Special attention was paid to the predictive ability and robustness of the models, investigated in both external and 10-fold cross-validation. On the complete data, the optimal DTB and DTF models rendered accuracies of 98.90% and 98.83% in two-category and 98.14% and 98.14% in four-category toxicity classification. Both models further yielded classification accuracies of 100% on external toxicity data for T. pyriformis. The regression models (DTB and DTF) constructed using five descriptors yielded correlation coefficients (R²) of 0.945 and 0.944 between the measured and predicted toxicities, with mean squared errors (MSEs) of 0.059 and 0.064 on the complete T. pyriformis data. The T. pyriformis regression models (DTB and DTF) applied to the external toxicity data sets yielded R² and MSE values of 0.637, 0.655; 0.534, 0.507 (marine bacteria) and 0.741, 0.691; 0.155, 0.173 (algae). The results suggest wide applicability of the inter-species models in predicting the toxicity of new chemicals for regulatory purposes. These approaches provide a useful strategy and robust tools for screening the ecotoxicological risk or environmental hazard potential of chemicals.
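The Tanimoto similarity index used above to assess structural diversity can be sketched as follows. The abstract does not say which fingerprints were used, so this example assumes each chemical is represented by the set of its "on" fingerprint bits; the function names and the toy fingerprints are hypothetical.

```python
from itertools import combinations

def tanimoto(fp_a, fp_b):
    """Tanimoto index: shared 'on' bits divided by all 'on' bits in either set."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 1.0  # convention chosen here: two empty fingerprints count as identical
    return len(a & b) / union

def mean_pairwise_similarity(fingerprints):
    """Average Tanimoto index over all distinct pairs; low values mean a diverse set."""
    pairs = list(combinations(fingerprints, 2))
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)

# Toy fingerprints (assumed data): bit positions set for three chemicals.
chemicals = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
diversity_score = mean_pairwise_similarity(chemicals)
```

The first two toy fingerprints share two of four distinct bits, giving an index of 0.5, while the third shares nothing with either, which pulls the mean pairwise similarity down.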
Affiliation(s)
- Kunwar P Singh
- Academy of Scientific and Innovative Research, Anusandhan Bhawan, Rafi Marg, New Delhi 110 001, India; Environmental Chemistry Division, CSIR-Indian Institute of Toxicology Research, Post Box 80, Mahatma Gandhi Marg, Lucknow 226 001, India.
- Shikha Gupta
- Academy of Scientific and Innovative Research, Anusandhan Bhawan, Rafi Marg, New Delhi 110 001, India; Environmental Chemistry Division, CSIR-Indian Institute of Toxicology Research, Post Box 80, Mahatma Gandhi Marg, Lucknow 226 001, India.
4
Devillers J, Doucet JP, Doucet-Panaye A, Decourtye A, Aupinel P. Linear and non-linear QSAR modelling of juvenile hormone esterase inhibitors. SAR QSAR Environ Res 2012; 23:357-369. [PMID: 22443267 DOI: 10.1080/1062936x.2012.664562]
Abstract
Tight control of the juvenile hormone (JH) titre is crucial during the life cycle of a holometabolous insect. JH is metabolized through the action of enzymes, particularly the juvenile hormone esterase (JHE). Trifluoromethylketones (TFKs) are able to inhibit this enzyme and thereby disrupt the endocrine function of the targeted insect. In this context, a set of 96 TFKs, tested on Trichoplusia ni for their JHE inhibition, was split into a training set (n = 77) and a test set (n = 19) to derive a QSAR model. TFKs were initially described by 42 CODESSA (Comprehensive Descriptors for Structural and Statistical Analysis) descriptors, but a feature selection process allowed us to consider only five descriptors encoding the structural characteristics of the TFKs and their reactivity. Classical and spline regression analyses, a three-layer perceptron, a radial basis function network and a support vector regression were tested as statistical tools. The best results were obtained with the support vector regression (r² and r²(test) = 0.91). The model provides information on the structural features and properties responsible for the high JHE inhibition activity of TFKs.
5
Burden F, Winkler D. An Optimal Self-Pruning Neural Network and Nonlinear Descriptor Selection in QSAR. QSAR Comb Sci 2009. [DOI: 10.1002/qsar.200810202]
6
7
8
Burden FR, Polley MJ, Winkler DA. Toward Novel Universal Descriptors: Charge Fingerprints. J Chem Inf Model 2009; 49:710-5. [DOI: 10.1021/ci800290h]
Affiliation(s)
- Frank R. Burden
- CSIRO Molecular & Health Technologies, Private Bag 10, Clayton South MDC, Clayton, Victoria 3169, Australia, School of Chemistry, Monash University, Clayton, Victoria 3168, Australia, and SciMetrics Limited, 548 Canning Street, Carlton North, Victoria 3054, Australia
- Mitchell J. Polley
- CSIRO Molecular & Health Technologies, Private Bag 10, Clayton South MDC, Clayton, Victoria 3169, Australia, School of Chemistry, Monash University, Clayton, Victoria 3168, Australia, and SciMetrics Limited, 548 Canning Street, Carlton North, Victoria 3054, Australia
- David A. Winkler
- CSIRO Molecular & Health Technologies, Private Bag 10, Clayton South MDC, Clayton, Victoria 3169, Australia, School of Chemistry, Monash University, Clayton, Victoria 3168, Australia, and SciMetrics Limited, 548 Canning Street, Carlton North, Victoria 3054, Australia
9
Cruz-Monteagudo M, González-Díaz H, Borges F, Dominguez ER, Cordeiro MNDS. 3D-MEDNEs: an alternative "in silico" technique for chemical research in toxicology. 2. quantitative proteome-toxicity relationships (QPTR) based on mass spectrum spiral entropy. Chem Res Toxicol 2008; 21:619-32. [PMID: 18257557 DOI: 10.1021/tx700296t]
Abstract
Low-range mass spectra (MS) characterization of the serum proteome offers the best chance of discovering proteome-(early drug-induced cardiac toxicity) relationships, called here Pro-EDICToRs. However, because thousands of proteins are involved, finding a single disease-related protein can be a hard task. The search for a model based on general MS patterns is a more realistic choice. In our previous work (González-Díaz, H., et al. Chem. Res. Toxicol. 2003, 16, 1318-1327), we introduced the molecular structure information indices called 3D-Markovian electronic delocalization entropies (3D-MEDNEs). In that work, quantitative structure-toxicity relationship (QSTR) techniques allowed us to link 3D-MEDNEs with blood toxicological properties of drugs. In this second part, we extend 3D-MEDNEs to numerically encode, for the first time, biologically relevant information present in the MS of the serum proteome. Using the same idea behind QSTR techniques, we can now seek, by analogy, a quantitative proteome-toxicity relationship (QPTR). The new QPTR models link MS 3D-MEDNEs with drug-induced toxicological properties from blood proteome information. We first generalized Randic's spiral graph and lattice networks of protein sequences to represent the MS of 62 serum proteome samples with more than 370 100 intensity (I_i) signals with m/z bandwidth above 700-12000 each. Next, we calculated the 3D-MEDNEs for each MS using the software MARCH-INSIDE. After that, we developed several QPTR models using different machine learning and MS representation algorithms to classify samples as control or positive Pro-EDICToRs samples. The best QPTR proposed showed accuracy values ranging from 83.8% to 87.1% and leave-one-out (LOO) predictive ability of 77.4-85.5%. This work demonstrated that the idea behind classic drug QSTR models may be extended to construct QPTRs with proteome MS data.
Affiliation(s)
- Maykel Cruz-Monteagudo
- Physico-Chemical Molecular Research Unit, Department of Organic Chemistry, Faculty of Pharmacy, University of Porto, 4150-047 Porto, Portugal
10
Abstract
Artificial neural networks are increasingly used in environmental toxicology to find complex relationships between the ecotoxicity of xenobiotics and their structure and/or physicochemical properties. The raison d'être of these nonlinear tools is their ability to derive powerful QSARs for molecules presenting different mechanisms of action. In this chapter, the main QSAR models derived for aquatic and terrestrial species are reviewed. Their characteristics and modeling performances are analyzed in depth.
11
Kahn I, Sild S, Maran U. Modeling the Toxicity of Chemicals to Tetrahymena pyriformis Using Heuristic Multilinear Regression and Heuristic Back-Propagation Neural Networks. J Chem Inf Model 2007; 47:2271-9. [DOI: 10.1021/ci700231c]
Affiliation(s)
- Iiris Kahn
- Institute of Chemistry, University of Tartu, 2 Jakobi Str., Tartu 51014, Estonia
- Sulev Sild
- Institute of Chemistry, University of Tartu, 2 Jakobi Str., Tartu 51014, Estonia
- Uko Maran
- Institute of Chemistry, University of Tartu, 2 Jakobi Str., Tartu 51014, Estonia
12
Yuan H, Wang Y, Cheng Y. Local and Global Quantitative Structure-Activity Relationship Modeling and Prediction for the Baseline Toxicity. J Chem Inf Model 2006; 47:159-69. [PMID: 17238261 DOI: 10.1021/ci600299j]
Abstract
The predictive accuracy of the model is of greatest concern to computational chemists in quantitative structure-activity relationship (QSAR) investigations. It is hypothesized that a model based on analogous chemicals will exhibit better predictive performance than one derived from diverse compounds. This paper develops a novel scheme called "clustering first, and then modeling" to build local QSAR models for the subsets resulting from clustering of the training set according to structural similarity. For validation and prediction, the validation set and test set were first classified into the corresponding subsets just as those of the training set, and then the prediction was performed by the relevant local model for each subset. This approach was validated on two independent data sets by local modeling and prediction of the baseline toxicity for the fathead minnow. In this process, hierarchical clustering was employed for cluster analysis, k-nearest neighbor for classification, and partial least squares for model generation. The statistical results indicated that the predictive performances of the local models based on the subsets were much superior to those of the global model based on the whole training set, which was consistent with the hypothesis. The approach proposed here is promising for extension to QSAR modeling of various physicochemical properties, biological activities, and toxicities.
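The "clustering first, and then modeling" scheme can be outlined in a few lines. This is a simplified sketch under stated assumptions, not the authors' pipeline: cluster labels for the training set are taken as given (the hierarchical clustering step is omitted), a single descriptor is used, and an ordinary least-squares line stands in for partial least squares. All function names and data are hypothetical.

```python
from collections import Counter

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (stand-in for PLS)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return mean_y - slope * mean_x, slope

def fit_local_models(xs, ys, labels):
    """Fit one local model per cluster of the training set."""
    models = {}
    for label in set(labels):
        members = [(x, y) for x, y, l in zip(xs, ys, labels) if l == label]
        models[label] = fit_line([x for x, _ in members], [y for _, y in members])
    return models

def knn_label(x, xs, labels, k=3):
    """Assign a new compound to a cluster by majority vote of its k nearest neighbors."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

def predict_local(x, xs, labels, models, k=3):
    """Route the compound to its cluster, then predict with that cluster's model."""
    intercept, slope = models[knn_label(x, xs, labels, k)]
    return intercept + slope * x

# Toy training set (assumed data): two clusters with different trends.
train_x = [0, 1, 2, 3, 10, 11, 12, 13]
train_y = [1, 3, 5, 7, 40, 39, 38, 37]
train_labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
local_models = fit_local_models(train_x, train_y, train_labels)
prediction_a = predict_local(1.5, train_x, train_labels, local_models)
prediction_b = predict_local(11.5, train_x, train_labels, local_models)
```

A single global line fitted to all eight toy points would describe neither trend, which is the intuition behind routing each compound to a local model.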
Affiliation(s)
- Hua Yuan
- Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310027, China
13
Hawkins DM, Basak SC, Mills D. QSARs for chemical mutagens from structure: ridge regression fitting and diagnostics. Environ Toxicol Pharmacol 2004; 16:37-44. [PMID: 21782692 DOI: 10.1016/j.etap.2003.09.001] [Received: 05/01/2003] [Accepted: 09/08/2003]
Abstract
QSAR models have been developed for a diverse set of mutagens using computed molecular descriptors. Such models can be used in predicting mutagenicity from structure. All common methods (regression, neural nets, k-nearest neighbors) are 'linear smoothers': weighted averages of the activities in the calibration data with weights dependent on the descriptors. While they have been studied extensively, a vital but overlooked area is 'case diagnostics', pointers to compounds that are poorly fitted, or are unusually influential in fitting the model. This is particularly true where the measured activity is binary (present or absent). We illustrate the use of numeric and graphic diagnostics, particularly that of the FF plot, with a data set with 508 compounds and 307 structural descriptors used to predict mutagenicity.
Affiliation(s)
- Douglas M Hawkins
- School of Statistics, University of Minnesota, 313 Ford Hall, 224 Church Street S.E., Minneapolis, MN 55455, USA
14
Mattioni BE, Kauffman GW, Jurs PC, Custer LL, Durham SK, Pearl GM. Predicting the genotoxicity of secondary and aromatic amines using data subsetting to generate a model ensemble. J Chem Inf Comput Sci 2003; 43:949-63. [PMID: 12767154 DOI: 10.1021/ci034013i]
Abstract
Binary quantitative structure-activity relationship (QSAR) models are developed to classify a data set of 334 aromatic and secondary amine compounds as genotoxic or nongenotoxic based on information calculated solely from chemical structure. Genotoxic endpoints for each compound were determined using the SOS Chromotest in both the presence and absence of an S9 rat liver homogenate. Compounds were considered genotoxic if assay results indicated a positive genotoxicity hit for either the S9 inactivated or S9 activated assay. Each compound in the data set was encoded through the calculation of numerical descriptors that describe various aspects of chemical structure (e.g. topological, geometric, electronic, polar surface area). Furthermore, five additional descriptors that focused on the secondary and aromatic nitrogen atoms in each molecule were calculated specifically for this study. Descriptor subsets were examined using a genetic algorithm search engine interfaced with a k-Nearest Neighbor fitness evaluator to find the most information-rich subsets, which ultimately served as the final predictive models. Models were chosen for their ability to minimize the total number of misclassifications, with special attention given to those models that possessed fewer occurrences of positive toxicity hits being misclassified as nontoxic (false negatives). In addition, a subsetting procedure was used to form an ensemble of models using different combinations of compounds in the training and prediction sets. This was done to ensure that consistent results could be obtained regardless of training set composition. The procedure also allowed for each compound to be externally validated three times by different training set data with the resultant predictions being used in a "majority rules" voting scheme to produce a consensus prediction for each member of the data set. 
The individual models produced an average training set classification rate of 71.6% and an average prediction set classification rate of 67.7%. However, the model ensemble was able to correctly classify the genotoxicity of 72.2% of all prediction set compounds.
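The "majority rules" voting scheme described in this abstract can be sketched as below. This is an illustration only: the function names are hypothetical and the votes and true labels are invented toy data, not results from the paper. Each compound receives three external predictions, as in the abstract, so a binary vote cannot tie.

```python
from collections import Counter

def consensus_prediction(votes):
    """Return the class predicted by the majority of the models for one compound."""
    return Counter(votes).most_common(1)[0][0]

def ensemble_accuracy(votes_per_compound, true_labels):
    """Fraction of compounds whose consensus prediction matches the true class."""
    correct = sum(
        consensus_prediction(votes) == truth
        for votes, truth in zip(votes_per_compound, true_labels)
    )
    return correct / len(true_labels)

# Toy data (assumed): three external predictions for each of four compounds.
votes_per_compound = [
    ["genotoxic", "genotoxic", "nongenotoxic"],
    ["nongenotoxic", "nongenotoxic", "nongenotoxic"],
    ["genotoxic", "nongenotoxic", "nongenotoxic"],
    ["genotoxic", "genotoxic", "genotoxic"],
]
true_labels = ["genotoxic", "nongenotoxic", "genotoxic", "genotoxic"]
accuracy = ensemble_accuracy(votes_per_compound, true_labels)
```

In the toy data the third compound is outvoted two to one and misclassified as nongenotoxic, a false negative of the kind the abstract says the model selection tried to limit.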
Affiliation(s)
- Brian E Mattioni
- Department of Chemistry, The Pennsylvania State University, 152 Davey Laboratory, University Park, Pennsylvania 16802, USA
15
Kaiser K. Neural networks for effect prediction in environmental and health issues using large datasets. QSAR Comb Sci 2003. [DOI: 10.1002/qsar.200390010]
16