1
Al-Fakih AM, Qasim MK, Algamal ZY, Alharthi AM, Zainal-Abidin MH. QSAR classification model for diverse series of antifungal agents based on binary coyote optimization algorithm. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2023; 34:285-298. [PMID: 37157994 DOI: 10.1080/1062936x.2023.2208374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/10/2023]
Abstract
One of the recently developed metaheuristic algorithms, the coyote optimization algorithm (COA), has been shown to perform well in a number of difficult optimization tasks. Its binary form, BCOA, is used in this study to address the descriptor selection problem in classifying diverse antifungal series. Z-shaped transfer functions (ZTF) are evaluated to verify their efficiency in improving BCOA performance in QSAR classification based on classification accuracy (CA), the geometric mean of sensitivity and specificity (G-mean), and the area under the curve (AUC). The Kruskal-Wallis test is also applied to show the statistical differences between the functions. The efficacy of the best suggested transfer function, ZTF4, is further assessed by comparing it to the most recent binary algorithms. The results show that ZTF, especially ZTF4, significantly improves the performance of the original BCOA. The ZTF4 function yields the best CA and G-mean of 99.03% and 0.992, respectively. It shows the fastest convergence behaviour compared to other binary algorithms. It takes the fewest iterations to reach high classification performance and selects the fewest descriptors. In conclusion, the obtained results indicate the ability of the ZTF4-based BCOA to find the smallest subset of descriptors while maintaining the best classification accuracy.
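The abstract above reports classification accuracy (CA) and the geometric mean of sensitivity and specificity (G-mean). As a minimal sketch of how these two metrics are usually computed from a binary confusion matrix, the following may help; the function names and the counts are hypothetical and are not taken from the paper.

```python
import math


def g_mean(tp: int, fn: int, tn: int, fp: int) -> float:
    """Geometric mean of sensitivity and specificity (a value in [0, 1])."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return math.sqrt(sensitivity * specificity)


def classification_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Fraction of all compounds that were classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


# Hypothetical confusion-matrix counts, chosen only for illustration.
counts = dict(tp=45, fn=5, tn=40, fp=10)
print("CA     =", classification_accuracy(**counts))
print("G-mean =", g_mean(**counts))
```

Because G-mean is a product of two rates under a square root, it is a fraction between 0 and 1 rather than a percentage, and it drops sharply when either class is predicted poorly.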
Affiliation(s)
- A M Al-Fakih
  - Department of Chemistry, Faculty of Science, Universiti Teknologi Malaysia, Johor, Malaysia
  - Department of Chemistry, Faculty of Science, Sana'a University, Sana'a, Yemen
- M K Qasim
  - Department of General Science, University of Mosul, Mosul, Iraq
- Z Y Algamal
  - Department of Statistics and Informatics, University of Mosul, Mosul, Iraq
  - College of Engineering, University of Warith Al-Anbiyaa, Karbala, Iraq
- A M Alharthi
  - Department of Mathematics, Turabah University College, Taif University, Taif, Saudi Arabia
- M H Zainal-Abidin
  - Department of Chemistry, Faculty of Science, Universiti Teknologi Malaysia, Johor, Malaysia
2
Survival Prediction Model for Patients with Esophageal Squamous Cell Carcinoma Based on the Parameter-Optimized Deep Belief Network Using the Improved Archimedes Optimization Algorithm. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2022; 2022:1924906. [PMID: 35844460 PMCID: PMC9286952 DOI: 10.1155/2022/1924906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2022] [Accepted: 06/24/2022] [Indexed: 11/27/2022]
Abstract
Esophageal squamous cell carcinoma (ESCC) is one of the cancers with the highest incidence and mortality in the world. An effective survival prediction model can improve the quality of patients' survival. Therefore, a parameter-optimized deep belief network (DBN) based on the improved Archimedes optimization algorithm is proposed in this paper for the survival prediction of patients with ESCC. Firstly, a combination of features significantly associated with the survival of patients is found by the minimum redundancy and maximum relevancy (MRMR) algorithm. Secondly, a DBN is introduced to make predictions for the survival of patients. Because the DBN model is affected by its parameters during construction, this paper uses the Archimedes optimization algorithm (AOA) to optimize the learning rate α and batch size β of the DBN. To overcome the tendency of AOA to fall into local optima and its low search accuracy, an improved Archimedes optimization algorithm (IAOA) is proposed. On this basis, a survival prediction model for patients with ESCC is constructed. Finally, accuracy comparison tests are carried out on IAOA-DBN, AOA-DBN, SSA-DBN, PSO-DBN, BES-DBN, IAOA-SVM, and IAOA-BPNN models. The results show that the IAOA-DBN model can effectively predict the five-year survival rate of patients and provide a reference for the clinical judgment of patients with ESCC.
3
Al-Fakih AM, Algamal ZY, Qasim MK. An improved opposition-based crow search algorithm for biodegradable material classification. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2022; 33:403-415. [PMID: 35469528 DOI: 10.1080/1062936x.2022.2064546] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/04/2022] [Accepted: 04/05/2022] [Indexed: 06/14/2023]
Abstract
The development of a reliable quantitative structure-activity relationship (QSAR) classification model with a small number of molecular descriptors is a crucial step in chemometrics. In this study, an improved crow search algorithm (CSA), named OBL-CSA, is proposed by adopting the opposition-based learning (OBL) approach to improve the exploration and exploitation capability of the CSA in quantitative structure-biodegradation relationship (QSBR) modelling for classifying biodegradable materials. The results reveal that OBL-CSA not only improves the classification performance but also reduces the computational time required to complete the process when compared to the standard CSA and the four other optimization algorithms tested: the particle swarm algorithm (PSO), black hole algorithm (BHA), grey wolf algorithm (GWA), and whale optimization algorithm (WOA). In conclusion, the OBL-CSA could be a valuable resource in the classification of biodegradable materials.
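The abstract above does not spell out how opposition-based learning is applied inside the crow search algorithm. The sketch below shows only the generic OBL idea in its commonly cited form, where the opposite of a point x in [lower, upper] is lower + upper - x and the better half of the combined candidates is kept. The function names, the sphere objective, and the bounds are illustrative assumptions, not the authors' implementation.

```python
import random


def opposite(x, lower, upper):
    """Opposite point of x within per-dimension bounds: lower + upper - x."""
    return [lo + hi - xi for xi, lo, hi in zip(x, lower, upper)]


def obl_initialize(fitness, lower, upper, pop_size, rng):
    """Draw a random population, add each member's opposite, keep the best half."""
    population = [
        [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        for _ in range(pop_size)
    ]
    candidates = population + [opposite(x, lower, upper) for x in population]
    # Lower fitness is better here (minimization is assumed).
    return sorted(candidates, key=fitness)[:pop_size]


def sphere(x):
    """Toy objective used only for this illustration."""
    return sum(v * v for v in x)


lower, upper = [-5.0, -5.0], [5.0, 5.0]
best = obl_initialize(sphere, lower, upper, pop_size=4, rng=random.Random(0))
print(best[0], sphere(best[0]))
```

Evaluating a candidate together with its opposite doubles the number of fitness evaluations at that step, which is the usual trade-off for the wider initial coverage of the search space.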
Affiliation(s)
- A M Al-Fakih
  - Department of Chemistry, Faculty of Science, Universiti Teknologi Malaysia, Johor, Malaysia
  - Department of Chemistry, Faculty of Science, Sana'a University, Sana'a, Yemen
- Z Y Algamal
  - Department of Statistics and Informatics, University of Mosul, Mosul, Iraq
- M K Qasim
  - Department of General Science, University of Mosul, Mosul, Iraq
4
A multi-leader Harris hawk optimization based on differential evolution for feature selection and prediction influenza viruses H1N1. Artif Intell Rev 2021. [DOI: 10.1007/s10462-021-10075-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/14/2022]
5
Kaneko H. Examining variable selection methods for the predictive performance of regression models and the proportion of selected variables and selected random variables. Heliyon 2021; 7:e07356. [PMID: 34195450 PMCID: PMC8237311 DOI: 10.1016/j.heliyon.2021.e07356] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/22/2021] [Revised: 05/02/2021] [Accepted: 06/16/2021] [Indexed: 11/24/2022] Open
Abstract
The selection of a descriptor, X, is crucial for improving the interpretation and prediction accuracy of a regression model. In this study, the prediction accuracy of models constructed using the selected X was determined and the results of variable selection, according to the number of selected X and the number of selected variables that are unrelated to an objective variable, such as activities and properties (y), were investigated to evaluate the variable or feature selection methods. The variable selection methods include the least absolute shrinkage and selection operator, genetic algorithm-based partial least squares, genetic algorithm-based support vector regression, and Boruta. Several regression analysis methods were used to test the prediction accuracy of the model constructed using the selected X. The characteristics of each variable selection method were analyzed using eight datasets. The results showed that even when variables unrelated to y were selected by variable selection and the number of unrelated variables was the same as the number of the original variables, a regression model with good accuracy, which ignores the influence of such noise variables, can be constructed by applying various regression analysis methods. Additionally, the variables related to y must not be deleted. These findings provide a basis for improving the variable selection methods.
Affiliation(s)
- Hiromasa Kaneko
  - Department of Applied Chemistry, School of Science and Technology, Meiji University, 1-1-1 Higashi-Mita, Tama-ku, Kawasaki, Kanagawa 214-8571, Japan
6
Algamal ZY, Qasim MK, Lee MH, Ali HTM. QSAR model for predicting neuraminidase inhibitors of influenza A viruses (H1N1) based on adaptive grasshopper optimization algorithm. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2020; 31:803-814. [PMID: 32938208 DOI: 10.1080/1062936x.2020.1818616] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/07/2020] [Accepted: 08/31/2020] [Indexed: 06/11/2023]
Abstract
High dimensionality is one of the major problems affecting the quality of quantitative structure-activity relationship (QSAR) modelling. Obtaining a reliable QSAR model with few descriptors is an essential procedure in chemometrics. The binary grasshopper optimization algorithm (BGOA) is a new meta-heuristic optimization algorithm, which has been used successfully to perform feature selection. In this paper, four new transfer functions were adapted to improve the exploration and exploitation capability of the BGOA in QSAR modelling of influenza A viruses (H1N1). The QSAR model with these new quadratic transfer functions was internally and externally validated based on MSEtrain, the Y-randomization test, MSEtest, and the applicability domain (AD). The validation results indicate that the model is robust and not due to chance correlation. In addition, the results indicate that the descriptor selection and prediction performance of the QSAR model for the training dataset outperform those obtained with the other S-shaped and V-shaped transfer functions. The QSAR model using the quadratic transfer function shows the lowest MSEtrain. For the test dataset, the proposed QSAR model shows a lower MSEtest value than the other methods, indicating its higher predictive ability. In conclusion, the results reveal that the proposed QSAR model is an efficient approach for modelling high-dimensional QSAR data and that it is useful for the estimation of IC50 values of neuraminidase inhibitors that have not been experimentally tested.
Affiliation(s)
- Z Y Algamal
  - Department of Statistics and Informatics, University of Mosul, Mosul, Iraq
- M K Qasim
  - Department of General Science, University of Mosul, Mosul, Iraq
- M H Lee
  - Department of Mathematical Sciences, Faculty of Science, Universiti Teknologi Malaysia, Johor, Malaysia
- H T M Ali
  - College of Computers and Information Technology, Nawroz University, Dahuk, Iraq
7
Xia Y, Zhang H. 13C NMR chemical shift prediction of diverse chemical compounds. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2019; 30:477-490. [PMID: 31155931 DOI: 10.1080/1062936x.2019.1619621] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/14/2019] [Accepted: 05/13/2019] [Indexed: 06/09/2023]
Abstract
Selection of key descriptors is very important in QSPR analysis. Presence of noise in the subset of descriptors reduces the quality of predictions. A complete set is considered as perfect when it does not include irrelevant or redundant elements. This paper reports complete sets of descriptors used to develop QSPR models for 1786 13C NMR chemical shifts (δC parameters) of carbon atoms in 125 diverse chemical compounds. PBE1PBE/6-311G(2d,2p) and B3LYP/6-31G(d) basis sets were used for quantum chemistry calculations after the molecular structures were optimized with semi-empirical AM1 and B3LYP/6-31G(d). The two complete sets consisting of magnetic shielding elements (σXX, σYY, σZZ) and the chemical shift principal values (σ11, σ22, σ33) were used as the inputs for support vector machine (SVM) models of δC parameters. The four SVM models obtained have the mean root mean square (rms) errors of about 4.5-4.6 ppm. The results suggest that SVM models are accurate and acceptable compared with previous models, although our models are based on a relatively large set of compounds. Our approach is valuable in the selection of important descriptors for QSPR studies of δC parameters.
Affiliation(s)
- Y Xia
  - China Key Laboratory of Advanced Packaging Materials and Technology of Hunan Province, School of Packaging and Materials Engineering, Hunan University of Technology, Zhuzhou, China
- H Zhang
  - Chinese Mechanical Engineering Society, Beijing, China
8
Golbraikh A. Value of p-Value. Mol Inform 2019; 38:e1800152. [PMID: 31188542 DOI: 10.1002/minf.201800152] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2018] [Accepted: 05/07/2019] [Indexed: 11/09/2022]
Abstract
The goal of this manuscript is to discuss important aspects of external validation of classification and category Quantitative Structure - Activity/Property/Toxicity Relationship QS/A/P/T/R models that, to the best of the author's knowledge, are not addressed in publications. Statistical significance (in terms of p-value) and accuracy of prediction (in terms of Correct Classification Rate (CCR)) of external validation set compounds are among the most important characteristics of the models. We assert that in most cases the models built for a classification or category response variable should be statistically significant and predictive for each class or category. We show that three thresholds of the number of compounds in each class or category of the external validation sets should be satisfied. 1) The p-value criterion can never be satisfied if the number of compounds is below the first threshold. 2) If the number of compounds is between the first and the second thresholds, the p-value criterion should be used. 3) If it is higher than the third threshold, the classification or category accuracy criterion should be used. 4) If the number of compounds is between the second and third thresholds, either one or the other criterion should be used depending on the value of the p-value. 5) When the number of compounds in the class approaches infinity, the maximum relative error of prediction approaches the relative expected error. The results are of interest in other areas of multidimensional data analysis.
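Point 1 in the abstract above says that a p-value criterion cannot be met when a class contains too few compounds. The sketch below illustrates that idea with a one-sided binomial test against random guessing. The chance probability of 0.5, the 0.05 significance level, and the function name are assumptions made for this illustration; the paper's own thresholds may be defined differently.

```python
from math import comb


def binomial_p_value(n_correct: int, n_total: int, p_chance: float = 0.5) -> float:
    """One-sided probability of at least n_correct hits out of n_total by chance."""
    return sum(
        comb(n_total, k) * p_chance**k * (1 - p_chance) ** (n_total - k)
        for k in range(n_correct, n_total + 1)
    )


# Even a perfect prediction cannot reach p < 0.05 with only four compounds
# in a class, while five correctly predicted compounds can.
for n in (4, 5):
    print(n, binomial_p_value(n, n))
```

Under these assumptions, a class with four compounds gives p = 0.0625 even when all four are predicted correctly, so the significance criterion is unreachable no matter how good the model is.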
Affiliation(s)
- Alexander Golbraikh
  - Laboratory for Molecular Modeling, University of North Carolina at Chapel Hill, CB #7360, Chapel Hill, NC 27599
9
Al-Dabbagh ZT, Algamal ZY. A robust quantitative structure-activity relationship modelling of influenza neuraminidase a/PR/8/34 (H1N1) inhibitors based on the rank-bridge estimator. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2019; 30:417-428. [PMID: 31122071 DOI: 10.1080/1062936x.2019.1613261] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 03/04/2019] [Accepted: 04/26/2019] [Indexed: 06/09/2023]
Abstract
The linear regression model is frequently encountered in quantitative structure-activity relationship (QSAR) modelling. The traditional estimation of regression model parameters is based on the normality assumption of the response variable (biological activity) and is therefore sensitive to outliers or heavy-tailed distributions. Robust penalized regression methods have been given considerable attention because they combine the robust estimation method with penalty terms to perform QSAR parameter estimation and variable selection (descriptor selection) simultaneously. In this paper, based on the bridge penalty, a robust QSAR model of the influenza neuraminidase a/PR/8/34 (H1N1) inhibitors is proposed as a method resistant to outliers or heavy-tailed errors. The basic idea is to combine rank regression and the bridge penalty to produce the rank-bridge method. The rank-bridge model is internally and externally validated based on Q²int, Q²LGO, Q²Boot, MSEtrain, the Y-randomization test, Q²ext, MSEtest and the applicability domain (AD). The validation results indicate that the rank-bridge model is robust and not due to chance correlation. In addition, the results indicate that the descriptor selection and prediction performance of the rank-bridge model for the training dataset outperform those of the other two modelling methods used. The rank-bridge model shows the highest Q²int, Q²LGO and Q²Boot, and the lowest MSEtrain. For the test dataset, the rank-bridge model shows a higher external validation value (Q²ext = 0.824) and a lower value of MSEtest compared with the other methods, indicating its higher predictive ability.
Affiliation(s)
- Z T Al-Dabbagh
  - Department of Operations Research and Artificial Intelligence, University of Mosul, Mosul, Iraq
- Z Y Algamal
  - Department of Statistics and Informatics, University of Mosul, Mosul, Iraq
10
Al-Fakih AM, Algamal ZY, Lee MH, Aziz M, Ali HTM. QSAR classification model for diverse series of antifungal agents based on improved binary differential search algorithm. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2019; 30:131-143. [PMID: 30734580 DOI: 10.1080/1062936x.2019.1568298] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/31/2018] [Accepted: 01/08/2019] [Indexed: 06/09/2023]
Abstract
An improved binary differential search (improved BDS) algorithm is proposed for QSAR classification of diverse series of antimicrobial compounds against Candida albicans inhibitors. The transfer function is the most important component of the BDS algorithm, and converts continuous values of the donor into discrete values. In this paper, eight types of transfer functions are investigated to verify their efficiency in improving BDS algorithm performance in QSAR classification. The performance was evaluated using three metrics: classification accuracy (CA), geometric mean of sensitivity and specificity (G-mean), and area under the curve. The Kruskal-Wallis test was also applied to show the statistical differences between the functions. Two functions, S1 and V4, show the best classification achievement, with V4 performing slightly better than S1. The V4 function takes the fewest iterations and selects the fewest descriptors. In addition, the V4 function yields the best CA and G-mean of 98.07% and 0.977, respectively. The results show that the V4 transfer function significantly improves the performance of the original BDS.
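The abstract above describes transfer functions as the step that converts continuous values into discrete (0/1) values for descriptor selection. A minimal sketch of that step follows, using one common S-shaped function (the logistic sigmoid) and one common V-shaped function (the absolute hyperbolic tangent) with their usual update conventions. These particular functions and all names are illustrative assumptions and are not necessarily the S1 or V4 variants evaluated in the paper.

```python
import math
import random


def s_shaped(x: float) -> float:
    """Logistic sigmoid: probability that the bit is set to 1."""
    return 1.0 / (1.0 + math.exp(-x))


def v_shaped(x: float) -> float:
    """Absolute tanh: probability that the current bit is flipped."""
    return abs(math.tanh(x))


def binarize_s(position, rng):
    """S-shaped convention: each bit becomes 1 with probability s_shaped(x)."""
    return [1 if rng.random() < s_shaped(x) else 0 for x in position]


def binarize_v(position, current_bits, rng):
    """V-shaped convention: each bit is flipped with probability v_shaped(x)."""
    return [
        1 - bit if rng.random() < v_shaped(x) else bit
        for x, bit in zip(position, current_bits)
    ]


rng = random.Random(0)
position = [2.5, -0.3, 0.0, -4.0]  # hypothetical continuous search position
print(binarize_s(position, rng))   # 1 means the descriptor is selected
print(binarize_v(position, [1, 0, 1, 0], rng))
```

With the V-shaped convention, a continuous value near zero leaves the current bit unchanged, whereas the S-shaped convention still sets it to 1 about half of the time.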
Affiliation(s)
- A M Al-Fakih
  - Department of Chemistry, Universiti Teknologi Malaysia, Johor, Malaysia
  - Department of Chemistry, Sana'a University, Sana'a, Yemen
- Z Y Algamal
  - Department of Statistics and Informatics, University of Mosul, Mosul, Iraq
- M H Lee
  - Department of Mathematical Sciences, Universiti Teknologi Malaysia, Johor, Malaysia
- M Aziz
  - Department of Chemistry, Universiti Teknologi Malaysia, Johor, Malaysia
  - Advanced Membrane Technology Centre, Universiti Teknologi Malaysia, Johor, Malaysia
- H T M Ali
  - College of Computers and Information Technology, Nawroz University, Kurdistan region, Iraq
11
Application of Multivariate Adaptive Regression Splines (MARSplines) for Predicting Hansen Solubility Parameters Based on 1D and 2D Molecular Descriptors Computed from SMILES String. J CHEM-NY 2019. [DOI: 10.1155/2019/9858371] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/18/2022] Open
Abstract
A new method of Hansen solubility parameters (HSPs) prediction was developed by combining the multivariate adaptive regression splines (MARSplines) methodology with a simple multivariable regression involving 1D and 2D PaDEL molecular descriptors. In order to adopt the MARSplines approach to QSPR/QSAR problems, several optimization procedures were proposed and tested. The effectiveness of the obtained models was checked via standard QSPR/QSAR internal validation procedures provided by the QSARINS software and by predicting the solubility classification of polymers and drug-like solid solutes in collections of solvents. By utilizing information derived only from SMILES strings, the obtained models allow for computing all of the three Hansen solubility parameters including dispersion, polarization, and hydrogen bonding. Although several descriptors are required for proper parameters estimation, the proposed procedure is simple and straightforward and does not require a molecular geometry optimization. The obtained HSP values are highly correlated with experimental data, and their application for solving solubility problems leads to essentially the same quality as for the original parameters. Based on provided models, it is possible to characterize any solvent and liquid solute for which HSP data are unavailable.
12
Application of Support Vector Machines in Viral Biology. GLOBAL VIROLOGY III: VIROLOGY IN THE 21ST CENTURY 2019. [PMCID: PMC7114997 DOI: 10.1007/978-3-030-29022-1_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Novel experimental and sequencing techniques have led to an exponential explosion and spiraling of data in viral genomics. To analyse such data, rapidly gain information, and transform this information to knowledge, interdisciplinary approaches involving several different types of expertise are necessary. Machine learning has been in the forefront of providing models with increasing accuracy due to development of newer paradigms with strong fundamental bases. Support Vector Machines (SVM) is one such robust tool, based rigorously on statistical learning theory. SVM provides very high quality and robust solutions to classification and regression problems. Several studies in virology employ high performance tools including SVM for identification of potentially important gene and protein functions. This is mainly due to the highly beneficial aspects of SVM. In this chapter we briefly provide lucid and easy to understand details of SVM algorithms along with applications in virology.