1. Karadžić Banjac MŽ, Kovačević SZ, Jevrić LR, Podunavac-Kuzmanović SO, Mandić AI. On the characterization of novel biologically active steroids: Selection of lipophilicity models of newly synthesized steroidal derivatives by classical and non-parametric ranking approaches. Comput Biol Chem 2019; 80:23-30. [DOI: 10.1016/j.compbiolchem.2019.03.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2018] [Accepted: 03/09/2019] [Indexed: 10/27/2022]
2. Idakwo G, Luttrell J, Chen M, Hong H, Zhou Z, Gong P, Zhang C. A review on machine learning methods for in silico toxicity prediction. Journal of Environmental Science and Health. Part C, Environmental Carcinogenesis & Ecotoxicology Reviews 2019; 36:169-191. [PMID: 30628866 DOI: 10.1080/10590501.2018.1537118] [Citation(s) in RCA: 66] [Impact Index Per Article: 13.2] [Indexed: 06/09/2023]
Abstract
In silico toxicity prediction plays an important role in regulatory decision making and in the selection of leads in drug design, as in vitro and in vivo methods are often limited by ethics, time, budget, and other resources. Many computational methods have been employed to predict the toxicity profiles of chemicals. This review provides a detailed end-to-end overview of the application of machine learning algorithms to Structure-Activity Relationship (SAR)-based predictive toxicology. From raw data to model validation, the importance of data quality is stressed, as it greatly affects the predictive power of derived models. Commonly overlooked challenges such as data imbalance, activity cliffs, model evaluation, and definition of the applicability domain are highlighted, and plausible solutions for alleviating these challenges are discussed.
Affiliation(s)
- Gabriel Idakwo
- a School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, Mississippi, USA
- Joseph Luttrell
- a School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, Mississippi, USA
- Minjun Chen
- b Division of Bioinformatics and Biostatistics, National Center for Toxicological Science, US Food and Drug Administration, Jefferson, Arkansas, USA
- Huixiao Hong
- b Division of Bioinformatics and Biostatistics, National Center for Toxicological Science, US Food and Drug Administration, Jefferson, Arkansas, USA
- Zhaoxian Zhou
- a School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, Mississippi, USA
- Ping Gong
- c Environmental Laboratory, US Army Engineer Research and Development Center, Vicksburg, Mississippi, USA
- Chaoyang Zhang
- a School of Computing Sciences and Computer Engineering, University of Southern Mississippi, Hattiesburg, Mississippi, USA
3. Jevrić LR, Karadžić MŽ, Podunavac-Kuzmanović SO, Tepić Horecki AN, Kovačević SZ, Vidović SS, Šumić ZM, Ilin ŽM. New guidelines for prediction of antioxidant activity of Lactuca sativa L. varieties based on phytochemicals content and multivariate chemometrics. J Food Process Pres 2017. [DOI: 10.1111/jfpp.13355] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 11/30/2022]
Affiliation(s)
- Lidija R. Jevrić
- Faculty of Technology Novi Sad; University of Novi Sad; Bulevar cara Lazara 1, Novi Sad, 21000 Serbia
- Milica Ž. Karadžić
- Faculty of Technology Novi Sad; University of Novi Sad; Bulevar cara Lazara 1, Novi Sad, 21000 Serbia
- Strahinja Z. Kovačević
- Faculty of Technology Novi Sad; University of Novi Sad; Bulevar cara Lazara 1, Novi Sad, 21000 Serbia
- Senka S. Vidović
- Faculty of Technology Novi Sad; University of Novi Sad; Bulevar cara Lazara 1, Novi Sad, 21000 Serbia
- Zdravko M. Šumić
- Faculty of Technology Novi Sad; University of Novi Sad; Bulevar cara Lazara 1, Novi Sad, 21000 Serbia
- Žarko M. Ilin
- Faculty of Agriculture; University of Novi Sad; Trg Dositeja Obradovića 8, Novi Sad, 21000 Serbia
4. How to compare separation selectivity of high-performance liquid chromatographic columns properly? J Chromatogr A 2017; 1488:45-56. [DOI: 10.1016/j.chroma.2017.01.066] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 11/25/2016] [Revised: 01/23/2017] [Accepted: 01/24/2017] [Indexed: 11/24/2022]
5. Rácz A, Bajusz D, Héberger K. Consistency of QSAR models: Correct split of training and test sets, ranking of models and performance parameters. SAR and QSAR in Environmental Research 2015; 26:683-700. [PMID: 26434574 DOI: 10.1080/1062936x.2015.1084647] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Received: 06/17/2015] [Accepted: 08/16/2015] [Indexed: 06/05/2023]
Abstract
Recent implementations of QSAR modelling software provide the user with numerous models and a wealth of information. In this work, we provide some guidance on how one should interpret the results of QSAR modelling, compare and assess the resulting models, and select the best and most consistent ones. Two QSAR datasets are applied as case studies for the comparison of model performance parameters and model selection methods. We demonstrate the capabilities of sum of ranking differences (SRD) in model selection and ranking, and identify the best performance indicators and models. While the exchange of the original training and (external) test sets does not affect the ranking of performance parameters, it provides improved models in certain cases (despite the lower number of molecules in the training set). Performance parameters for external validation are substantially separated from the other merits in SRD analyses, highlighting their value in data fusion.
Affiliation(s)
- A Rácz
- a Plasma Chemistry Research Group, Hungarian Academy of Sciences, Budapest, Hungary
- b Department of Applied Chemistry, Corvinus University of Budapest, Budapest, Hungary
- D Bajusz
- c Medicinal Chemistry Research Group, Hungarian Academy of Sciences, Budapest, Hungary
- K Héberger
- a Plasma Chemistry Research Group, Hungarian Academy of Sciences, Budapest, Hungary
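The abstract of entry 5 (Rácz et al.) relies on sum of ranking differences (SRD) to rank models and performance parameters. As a rough illustration only, the sketch below computes raw SRD values: each column is ranked, the ranks are compared with the ranks of a reference column, and the absolute differences are summed. The function names, the dictionary layout, and the row-mean default reference are assumptions made for this example; the normalization and randomization-based validation normally used alongside SRD are left out.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the block
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def srd(columns, reference=None):
    """Raw SRD per column.

    columns: dict mapping a (hypothetical) method name to a list of values,
             one value per row (object).
    reference: list of reference values per row; defaults to the row mean,
               which is an assumption of this sketch.
    """
    names = list(columns)
    n_rows = len(columns[names[0]])
    if reference is None:
        reference = [mean(columns[name][i] for name in names) for i in range(n_rows)]
    ref_ranks = average_ranks(reference)
    return {
        name: sum(abs(r - q) for r, q in zip(average_ranks(columns[name]), ref_ranks))
        for name in names
    }


# Invented values, for illustration only.
scores = {"model_A": [1, 2, 3, 4], "model_B": [4, 3, 2, 1], "model_C": [1, 3, 2, 4]}
print(srd(scores, reference=[1, 2, 3, 4]))
```

A smaller SRD means the column orders the rows more like the reference does, so the column with the lowest value would be preferred under this criterion.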
6. Generalized Pairwise Correlation and method comparison: Impact assessment for JAR attributes on overall liking. Food Qual Prefer 2015. [DOI: 10.1016/j.foodqual.2015.02.017] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Indexed: 11/17/2022]
7
8. Rajappan R, Shingade PD, Natarajan R, Jayaraman VK. Quantitative Structure-Property Relationship (QSPR) Prediction of Liquid Viscosities of Pure Organic Compounds Employing Random Forest Regression. Ind Eng Chem Res 2009. [DOI: 10.1021/ie8018406] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 01/13/2023]
Affiliation(s)
- Remya Rajappan
- Centre for Mathematical Sciences Pala Campus, Arunapuram, Kerala, India 686 574, Chemical Engineering and Process Development Division, National Chemical Laboratory, Pune, India 411 008, and Department of Chemical Engineering, Lakehead University, 955 Oliver Road Thunder Bay, ON, Canada P7B 5E1
- Prashant D. Shingade
- Centre for Mathematical Sciences Pala Campus, Arunapuram, Kerala, India 686 574, Chemical Engineering and Process Development Division, National Chemical Laboratory, Pune, India 411 008, and Department of Chemical Engineering, Lakehead University, 955 Oliver Road Thunder Bay, ON, Canada P7B 5E1
- Ramanathan Natarajan
- Centre for Mathematical Sciences Pala Campus, Arunapuram, Kerala, India 686 574, Chemical Engineering and Process Development Division, National Chemical Laboratory, Pune, India 411 008, and Department of Chemical Engineering, Lakehead University, 955 Oliver Road Thunder Bay, ON, Canada P7B 5E1
- Valadi K. Jayaraman
- Centre for Mathematical Sciences Pala Campus, Arunapuram, Kerala, India 686 574, Chemical Engineering and Process Development Division, National Chemical Laboratory, Pune, India 411 008, and Department of Chemical Engineering, Lakehead University, 955 Oliver Road Thunder Bay, ON, Canada P7B 5E1
9. Zagyi M, Cserháti T. Quantitative Structure-Retention Relationship Study on the Binding of Organic Solvents to the Corn Protein, Zein. J Liq Chromatogr R T 2007. [DOI: 10.1080/10826070601084795] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
Affiliation(s)
- M. Zagyi
- a Institute of Materials and Environmental Chemistry, Chemical Research Centre, Hungarian Academy of Sciences, Budapest, Hungary
- T. Cserháti
- a Institute of Materials and Environmental Chemistry, Chemical Research Centre, Hungarian Academy of Sciences, Budapest, Hungary
10. Van Gyseghem E, Dejaegher B, Put R, Forlay-Frick P, Elkihel A, Daszykowski M, Héberger K, Massart DL, Heyden YV. Evaluation of chemometric techniques to select orthogonal chromatographic systems. J Pharm Biomed Anal 2006; 41:141-51. [PMID: 16352413 DOI: 10.1016/j.jpba.2005.11.007] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Received: 10/04/2005] [Accepted: 11/06/2005] [Indexed: 11/30/2022]
Abstract
Several chemometric techniques were compared for their ability to determine the orthogonality and similarity between chromatographic systems. Color maps based on Pearson's correlation coefficient (r) had been used earlier to indicate selectivity differences between systems. These maps, in which the systems were ranked according to decreasing or increasing dissimilarities observed in the weighted-average-linkage dendrogram, were now applied as the reference method. A number of chemometric techniques were evaluated as potential alternative (visualization) methods for the same purpose. They include hierarchical clustering techniques (single, complete, unweighted-average-linkage, centroid and Ward's method), the Kennard and Stone algorithm, auto-associative multivariate regression trees (AAMRT), and the generalized pairwise correlation method (GPCM) with McNemar's statistical test. In the end, the reference method remained our preferred technique for selecting orthogonal systems and identifying similar ones.
Affiliation(s)
- E Van Gyseghem
- Department of Analytical Chemistry and Pharmaceutical Technology, A VICIM Partner, Vrije Universiteit Brussel-VUB, Laarbeeklaan 103, B-1090 Brussels, Belgium
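The abstract of entry 10 builds its reference color maps on Pearson's correlation coefficient (r) between chromatographic systems. The sketch below shows one way such pairwise values could be computed from retention data. The system names and retention values are invented, and using 1 - |r| as the dissimilarity is an assumption of this example rather than something the abstract states.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def dissimilarity_matrix(systems):
    """Map each pair of system names to 1 - |r| (assumed dissimilarity measure)."""
    names = list(systems)
    return {
        (p, q): 1 - abs(pearson_r(systems[p], systems[q]))
        for p in names
        for q in names
    }


# Hypothetical retention values of the same four solutes on three systems.
systems = {
    "system_1": [1.0, 2.0, 3.0, 4.0],
    "system_2": [2.1, 3.9, 6.2, 8.0],
    "system_3": [3.0, 1.0, 4.0, 2.0],
}
for pair, value in dissimilarity_matrix(systems).items():
    print(pair, round(value, 3))
```

Under this assumed measure, pairs with values near 1 are candidates for orthogonal systems, while pairs near 0 behave similarly.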
11. Basak SC, Natarajan R, Mills D, Hawkins DM, Kraker JJ. Quantitative Structure-Activity Relationship Modeling of Juvenile Hormone Mimetic Compounds for Culex Pipiens Larvae, with a Discussion of Descriptor-Thinning Methods. J Chem Inf Model 2005; 46:65-77. [PMID: 16426041 DOI: 10.1021/ci050215y] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Indexed: 01/13/2023]
Abstract
Quantitative structure-activity relationship (QSAR) modelers often encounter the problem of multicollinearity owing to the availability of large numbers of computable molecular descriptors. Sparsity of the variables while using descriptors such as atom pairs increases the complexity. Three different predictor-thinning methods, namely, a modified Gram-Schmidt algorithm, a marginal soft thresholding algorithm, and LASSO (least absolute shrinkage and selection operator), were utilized to reduce the number of descriptors prior to developing linear models. Juvenile hormone (JH) activity of 304 compounds on Culex pipiens larvae was taken as the model data set, and predictor trimming of a large number of diverse descriptors comprising 268 global molecular descriptors (topostructural, topochemical, and geometrical), 13 quantum chemical descriptors, and 915 atom pairs (substructural counts) was applied prior to linear regression by the ridge regression method. The data set (N = 304) was split into five calibration data sets of random samples of sizes 60/110/160/210/260, and the remaining 244/194/144/94/44 compounds were used for validations. LASSO was not found to be a very effective method in handling a large set of descriptors because the number of predictors retained could not exceed the number of observations. The results indicated that the modified Gram-Schmidt algorithm could be used to trim the number of predictors in the global molecular descriptor set where collinearity of the descriptors was the major concern. On the contrary, the soft thresholding approach was found to be an effective tool in subset selection from a diverse set of descriptors having both sparsity and multicollinearity, as in the case of the combined set of atom pairs and global molecular descriptors. The final model developed after variable selection was dominated more by atom pairs, which indicated the important structural moieties that affect JH activity of the compounds. The success of the method reiterates the fact that QSAR or quantitative structure-property relationship (QSPR) models can be developed for a diverse set of compounds using properly parametrized and diverse sets of descriptors, of course, with the selection of the appropriate statistical tools.
Affiliation(s)
- Subhash C Basak
- Natural Resources Research Institute, Center for Water and Environment, University of Minnesota-Duluth, 55811, USA.
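The abstract of entry 11 contrasts thresholding-type descriptor thinning (soft thresholding, LASSO) with ridge regression. The sketch below illustrates only the textbook difference between the two kinds of shrinkage for the special case of an orthonormal design, where each coefficient can be treated separately. It is not the marginal soft thresholding algorithm of the paper, whose details the abstract does not give; the function names, coefficient values, and penalty `lam` are hypothetical.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator: shrink z toward zero by lam, clipping at zero.

    For an orthonormal design this is the LASSO update of a least-squares
    coefficient z under the penalty lam * |beta| (with a 1/2-scaled squared loss).
    """
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def ridge_shrink(z, lam):
    """Ridge update of a least-squares coefficient z for an orthonormal design.

    The coefficient is scaled by 1 / (1 + lam); it gets smaller but is never
    set exactly to zero unless z itself is zero.
    """
    return z / (1.0 + lam)


# Hypothetical least-squares coefficients for five descriptors.
ols_coefficients = [3.0, -0.5, 0.2, -2.5, 0.9]
lam = 1.0

thinned = [soft_threshold(b, lam) for b in ols_coefficients]
shrunk = [ridge_shrink(b, lam) for b in ols_coefficients]

kept = [i for i, b in enumerate(thinned) if b != 0.0]
print("soft-thresholded:", thinned)
print("ridge-shrunk:    ", shrunk)
print("descriptors kept after thresholding:", kept)
```

The point of the contrast is that thresholding removes descriptors outright, which is what makes it usable for thinning, whereas ridge shrinkage keeps every descriptor and only damps its weight.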
12. Forlay-Frick P, Van Gyseghem E, Héberger K, Vander Heyden Y. Selection of orthogonal chromatographic systems based on parametric and non-parametric statistical tests. Anal Chim Acta 2005. [DOI: 10.1016/j.aca.2005.02.058] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.4] [Indexed: 11/24/2022]
13. Farkas O, Héberger K. Comparison of Ridge Regression, Partial Least-Squares, Pairwise Correlation, Forward- and Best Subset Selection Methods for Prediction of Retention Indices for Aliphatic Alcohols. J Chem Inf Model 2005; 45:339-46. [PMID: 15807497 DOI: 10.1021/ci049827t] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.7] [Indexed: 11/28/2022]
Abstract
A quantitative structure-retention relationship (QSRR) study based on multiple linear regression (MLR) was performed for the description and prediction of Kováts retention indices (RI) of alcohol compounds. Alcohols were of saturated, linear or branched types and contained a hydroxyl group on the primary, secondary or tertiary carbon atoms. Constitutive and weighted holistic invariant molecular (WHIM) descriptors were used to represent the structure of alcohols in the MLR models. Before the model building, five variable selection methods were applied to select the most relevant variables from a large set of descriptors. The selected molecular properties were included in the MLR models. The efficiency of the variable selection methods was also compared. The selection methods were as follows: ridge regression (RR), partial least-squares method (PLS), pair-correlation method (PCM), forward selection (FS) and best subset selection (BSS). The stability and the validity of the MLR models were tested by leave-n-out cross-validation. Neither the RR- nor the PLS-selected variables were able to describe the Kováts retention index properly, and PCM gave reliable results in the description but not for prediction. We built models with good predictive ability using FS and BSS as selection methods. The most relevant variables in the description and prediction of RIs were the mean electrotopological state index, the molecular mass, and WHIM indices characterizing size and shape.
Affiliation(s)
- Orsolya Farkas
- Institute of Chemistry, Chemical Research Center, Hungarian Academy of Sciences, H-1525 Budapest, P.O. Box 17, Hungary.
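The abstract of entry 13 validates its MLR models with a leave-n-out technique. The sketch below shows the general idea on a single-predictor linear model: every subset of n points is held out in turn, the model is refitted on the rest, and the held-out points are predicted. The exhaustive enumeration of subsets, the one-descriptor model, the function names, and the data are assumptions of this example; the abstract does not say how the left-out groups were chosen.

```python
from itertools import combinations
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single predictor.

    Assumes the training x values are not all identical.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def leave_n_out_rmse(xs, ys, n_out):
    """Refit on every subset that omits n_out points; score the omitted points."""
    indices = range(len(xs))
    squared_errors = []
    for held_out in combinations(indices, n_out):
        train = [i for i in indices if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        squared_errors.extend((ys[i] - (a + b * xs[i])) ** 2 for i in held_out)
    return sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical descriptor values and retention indices.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
retention_index = [510.0, 598.0, 707.0, 795.0, 904.0, 1001.0]
print(leave_n_out_rmse(descriptor, retention_index, n_out=2))
```

A cross-validated error that stays close to the fitting error suggests a stable model; a much larger one points to a model that describes the calibration data without predicting well, which is the distinction the abstract draws for PCM.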