1
Çelik Ş, Tutar H, Gönülal E, Er H. Prediction of fresh herbage yield using data mining techniques with limited plant quality parameters. Sci Rep 2024; 14:21396. [PMID: 39271726 PMCID: PMC11399138 DOI: 10.1038/s41598-024-72746-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2024] [Accepted: 09/10/2024] [Indexed: 09/15/2024] Open
Abstract
The purpose of this study was to ascertain the fresh herbage yield, fertilizer dosage, and plant characteristics of the Sorghum-Sudangrass hybrid grown in arid and semi-arid regions, as well as their interrelationships. For this reason, data from the Sorghum-Sudangrass hybrid were used to assess the predictive performance of several data mining techniques, including CHAID, CART, MARS, and Bagging MARS. Plant traits were measured in Konya and Sanliurfa during 2021 and 2022. The descriptive statistical values were calculated as follows: plant height 306.27 cm, stem diameter 9.47 mm, fresh herbage yield 10852.51 kg/da, crude protein ratio 9.66%, acid detergent fiber 33.39%, neutral detergent fiber 51.85%, acid detergent lignin 9.76%, dry matter digestibility 62.88%, dry matter intake 2.34%, and relative feed value 114.68 (average values). The predictive capacities of the fitted models were assessed using model fit statistics such as the coefficient of determination (R²), adjusted R², root mean square error (RMSE), mean absolute percentage error (MAPE), standard deviation ratio (SD ratio), and Akaike Information Criterion (AIC). With the lowest values for RMSE, MAPE, SD ratio, and AIC (246, 1.926, 0.085, and 845, respectively), and the highest R² value (0.993) and adjusted R² value (0.989), the MARS algorithm was determined to be the best model for characterizing fresh herbage yield. As a solid alternative to other data mining techniques, the MARS algorithm was shown to be the most appropriate model for forecasting fresh herbage production.
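The fit statistics named in this abstract can be illustrated with a minimal sketch. The definitions below are common conventions and are assumptions, not taken from the paper: the SD ratio is computed as the standard deviation of the residuals divided by the standard deviation of the observed values, and AIC as n·ln(RSS/n) + 2k, where `n_params` (k) is a hypothetical count of fitted model terms. The function name and the sample numbers are illustrative only.

```python
import math


def fit_statistics(observed, predicted, n_params):
    """Return common goodness-of-fit statistics for a regression model.

    n_params is the assumed number of fitted model terms (k). The SD ratio
    and AIC formulas are common conventions, assumed here for illustration.
    """
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    mean_res = sum(residuals) / n

    rss = sum(r * r for r in residuals)               # residual sum of squares
    tss = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares

    r2 = 1.0 - rss / tss
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_params - 1)
    rmse = math.sqrt(rss / n)
    # MAPE in percent; observed values must be non-zero.
    mape = 100.0 / n * sum(abs(r / o) for r, o in zip(residuals, observed))
    sd_res = math.sqrt(sum((r - mean_res) ** 2 for r in residuals) / (n - 1))
    sd_obs = math.sqrt(tss / (n - 1))
    # AIC is undefined for a perfect fit (rss == 0).
    aic = n * math.log(rss / n) + 2 * n_params

    return {
        "r2": r2,
        "adj_r2": adj_r2,
        "rmse": rmse,
        "mape": mape,
        "sd_ratio": sd_res / sd_obs,
        "aic": aic,
    }


# Hypothetical yields (kg/da) and model predictions, for illustration only.
observed = [100.0, 200.0, 300.0, 400.0, 500.0, 600.0]
predicted = [110.0, 190.0, 310.0, 390.0, 510.0, 590.0]
stats = fit_statistics(observed, predicted, n_params=1)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

Under these definitions, a better model has lower RMSE, MAPE, SD ratio, and AIC and higher R² and adjusted R², which is the ranking logic the abstract applies when it selects MARS.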
Affiliation(s)
- Şenol Çelik
  - Biometry and Genetic Unit, Department of Animal Science, Faculty of Agriculture, Bingol University, 12000, Bingöl, Turkey
- Halit Tutar
  - Department of Field Crops, Faculty of Agriculture, Bingol University, 12000, Bingöl, Turkey
- Erdal Gönülal
  - Bahri Dagdas International Agriculture Research Institute, 42000, Konya, Turkey
- Hasan Er
  - Department of Biosystems Engineering, Faculty of Agriculture, Bingol University, 12000, Bingöl, Turkey
2
Kaveh S, Mani-Varnosfaderani A, Neiband MS. Deriving general structure-activity/selectivity relationship patterns for different subfamilies of cyclin-dependent kinase inhibitors using machine learning methods. Sci Rep 2024; 14:15315. [PMID: 38961127 PMCID: PMC11222421 DOI: 10.1038/s41598-024-66173-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2024] [Accepted: 06/27/2024] [Indexed: 07/05/2024] Open
Abstract
Cyclin-dependent kinases (CDKs) play essential roles in regulating the cell cycle and are among the most critical targets for cancer therapy and drug discovery. The primary objective of this research is to derive general structure-activity relationship (SAR) patterns for modeling the selectivity and activity levels of CDK inhibitors using machine learning methods. To accomplish this, 8592 small molecules with different binding affinities to CDK1, CDK2, CDK4, CDK5, and CDK9 were collected from Binding DB, and a diverse set of descriptors was calculated for each molecule. The supervised Kohonen networks (SKN) and counter propagation artificial neural networks (CPANN) models were trained to predict the activity levels and therapeutic targets of the molecules. The validity of models was confirmed through tenfold cross-validation and external test sets. Using selected sets of molecular descriptors (e.g. hydrophilicity and total polar surface area) we derived activity and selectivity maps to elucidate local regions in chemical space for active and selective CDK inhibitors. The SKN models exhibited prediction accuracies ranging from 0.75 to 0.94 for the external test sets. The developed multivariate classifiers were used for ligand-based virtual screening of 2 million random molecules of the PubChem database, yielding areas under the receiver operating characteristic curves ranging from 0.72 to 1.00 for the SKN model. Considering the persistent challenge of achieving CDK selectivity, this research significantly contributes to addressing the issue and underscores the paramount importance of developing drugs with minimized side effects.
Affiliation(s)
- Sara Kaveh
  - Chemometrics and Cheminformatics Laboratory, Department of Analytical Chemistry, Tarbiat Modares University, Tehran, Iran
- Ahmad Mani-Varnosfaderani
  - Chemometrics and Cheminformatics Laboratory, Department of Analytical Chemistry, Tarbiat Modares University, Tehran, Iran
- Marzieh Sadat Neiband
  - Department of Chemistry, Payame Noor University (PNU), P.O. Box 19395-4697, Tehran, Iran
3
Çelik Ş, Yılmaz O. Investigation of the Relationships between Coat Colour, Sex, and Morphological Characteristics in Donkeys Using Data Mining Algorithms. Animals (Basel) 2023; 13:2366. [PMID: 37508143 PMCID: PMC10376350 DOI: 10.3390/ani13142366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Revised: 07/02/2023] [Accepted: 07/19/2023] [Indexed: 07/30/2023] Open
Abstract
This study was carried out to determine the morphological characteristics, body coat colour distribution, and body dimensions of donkeys raised in Turkey, as well as the relationships between these factors. For this reason, the predictive performance of various machine learning algorithms (i.e., CHAID, Random Forest, ALM, MARS, and Bagging MARS) was compared, utilising the biometric data of donkeys. In particular, mean measurements were taken from a total of 371 donkeys (252 male and 119 female) with descriptive statistical values as follows: height at withers, 100.7 cm; rump height, 103.1 cm; body length, 103.8 cm; chest circumference, 112.8 cm; chest depth, 45.7 cm; chest width, 29.1 cm; front shin circumference, 13.5 cm; head length, 55 cm; and ear length, 22 cm. The body colour distribution of the donkeys considered in this study was calculated as 39.35% grey, 19.95% white, 21.83% black, and 18.87% brown. Model fit statistics, including the coefficient of determination (R²), mean square error, root-mean-square error (RMSE), mean absolute percentage error (MAPE), and standard deviation ratio (SD ratio), were calculated to measure the predictive ability of the fitted models. The MARS algorithm was found to be the best model for defining the body length of donkeys, with the highest R² value (0.916) and the lowest RMSE, MAPE, and SD ratio values (2.173, 1.615, and 0.291, respectively). The experimental results indicate that the most suitable model is the MARS algorithm, which provides a good alternative to other data mining algorithms for predicting the body length of donkeys.
Affiliation(s)
- Şenol Çelik
  - Biometry and Genetic Unit, Department of Animal Science, Faculty of Agriculture, Bingol University, Bingol 12000, Turkey
- Orhan Yılmaz
  - Plant and Animal Production Department, Posof Vocational School, Ardahan University, Ardahan 75000, Turkey
4
Application of Multivariate Adaptive Regression Splines (MARSplines) for Predicting Antitumor Activity of Anthrapyrazole Derivatives. Int J Mol Sci 2022; 23:5132. [PMID: 35563523 PMCID: PMC9104800 DOI: 10.3390/ijms23095132] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/02/2022] [Revised: 04/26/2022] [Accepted: 04/29/2022] [Indexed: 02/01/2023] Open
Abstract
An approach using multivariate adaptive regression splines (MARSplines) was applied for quantitative structure–activity relationship studies of the antitumor activity of anthrapyrazoles. At the first stage, the structures of anthrapyrazole derivatives were subjected to geometrical optimization by the AM1 method using the Polak–Ribiere algorithm. In the next step, a data set of 73 compounds was coded over 2500 calculated molecular descriptors. It was shown that fourteen independent variables appearing in the statistically significant MARS model (i.e., descriptors belonging to 3D-MoRSE, 2D autocorrelations, GETAWAY, burden eigenvalues and RDF descriptors), significantly affect the antitumor activity of anthrapyrazole compounds. The study confirmed the benefit of using a modern machine learning algorithm, since the high predictive power of the obtained model had proven to be useful for the prediction of antitumor activity against murine leukemia L1210. It could certainly be considered as a tool for predicting activity against other cancer cell lines.
5
Quantitative structure-critical micelle concentration modeling of anionic gemini surfactants, comparison of MLR, PLS, WNN, and ANFIS models with eigenvalue and correlation ranking methods. Journal of the Iranian Chemical Society 2021. [DOI: 10.1007/s13738-021-02225-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/27/2022]
6
Towards a Comprehensive Assessment of Statistical versus Soft Computing Models in Hydrology: Application to Monthly Pan Evaporation Prediction. Water 2021. [DOI: 10.3390/w13172451] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/16/2022]
Abstract
This paper evaluates six soft computational models along with three statistical data-driven models for the prediction of pan evaporation (EP). Accordingly, improved kriging—as a novel statistical model—is proposed for accurate predictions of EP for two meteorological stations in Turkey. In the standard kriging model, the input data nonlinearity effects are increased by using a nonlinear map and transferring input data from a polynomial to an exponential basic function. The accuracy, precision, and over/under prediction tendencies of the response surface method, kriging, improved kriging, multilayer perceptron neural network using the Levenberg–Marquardt (MLP-LM) as well as a conjugate gradient (MLP-CG), radial basis function neural network (RBFNN), multivariate adaptive regression spline (MARS), M5Tree and support vector regression (SVR) were compared. Overall, all the applied models were highly capable of predicting monthly EP in both stations with a mean absolute error (MAE) < 0.77 mm and a Willmott index (d) > 0.95. Considering periodicity as an input parameter, the MLP-LM provided better results than the other methods among the soft computing models (MAE = 0.492 mm and d = 0.981). However, the improved kriging method surpassed all the other models based on the statistical measures (MAE = 0.471 mm and d = 0.983). Finally, the outcomes of the Mann–Whitney test indicated that the applied soft computational models do not have significant superiority over the statistical ones (p-value > 0.65 at α = 0.01 and α = 0.05).
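The two accuracy measures quoted in this abstract, the mean absolute error (MAE) and the Willmott index of agreement (d), can be sketched as follows. The formula for d is the standard index of agreement, d = 1 − Σ(O − P)² / Σ(|P − Ō| + |O − Ō|)²; the function names and the sample values are hypothetical and are not taken from the paper.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def willmott_index(observed, predicted):
    """Willmott index of agreement d, which equals 1 for a perfect match."""
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    potential_error = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, predicted)
    )
    return 1.0 - squared_error / potential_error


# Hypothetical monthly pan evaporation values (mm), for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
mae = mean_absolute_error(observed, predicted)
d = willmott_index(observed, predicted)
print(f"MAE = {mae:.3f} mm, d = {d:.3f}")
```

A lower MAE and a d closer to 1 indicate closer agreement, which is how the abstract ranks improved kriging ahead of MLP-LM.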
7
Ghaleb A, Aouidate A, Ayouchia HBE, Aarjane M, Anane H, Stiriba SE. In silico molecular investigations of pyridine N-Oxide compounds as potential inhibitors of SARS-CoV-2: 3D QSAR, molecular docking modeling, and ADMET screening. J Biomol Struct Dyn 2020; 40:143-153. [PMID: 32799761 DOI: 10.1080/07391102.2020.1808530] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 10/23/2022]
Abstract
The new coronavirus SARS-CoV-2 causes severe pneumonia in humans and has provoked a serious epidemic outbreak. Since its appearance in Wuhan, China, in December 2019, CoV-2 has become the biggest challenge the world is facing today, including the discovery of antiviral drugs for SARS-CoV-2. In this study, the inhibitory potential of a class of human SARS inhibitors, namely pyridine N-oxide derivatives, against CoV-2 was addressed by three-dimensional quantitative structure-activity relationship (3D-QSAR) analysis. The reliable CoMSIA model developed for 110 pyridine N-oxide-based antiviral compounds showed Q² = 0.54 and r²ext = 0.71. Surflex molecular docking was applied to the crystal structure of the CoV-2 main protease 3CLpro (PDB: 6LU7) and to two potential, widely used antiviral molecules, namely chloroquine and hydroxychloroquine. The obtained free energy affinity and ADMET properties indicate that, among the series of model antiviral compounds examined, the new antiviral compound A5 could be an excellent antiviral drug inhibitor against COVID-19. The inhibition activity of pyridine N-oxide compounds against CoV-2 was compared with the activity of two common antiviral drugs, namely chloroquine (CQ) and hydroxychloroquine (HCQ). The DFT method was also used to define the sites of reactivity of pyridine N-oxide derivatives as well as CQ and HCQ. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Adib Ghaleb
  - Laboratoire de Chimie Analytique et Moléculaire/LCAM, Faculté Polydisciplinaire de Safi, Université Cadi Ayyad, Safi, Morocco
- Adnane Aouidate
  - MCNSL, School of Sciences, Moulay Ismail University, Meknes, Morocco
- Hicham Ben El Ayouchia
  - Laboratoire de Chimie Analytique et Moléculaire/LCAM, Faculté Polydisciplinaire de Safi, Université Cadi Ayyad, Safi, Morocco
- Mohammed Aarjane
  - LCBAE, Equipe Chimie Moléculaire et Molécules Bioactives, Université Moulay Ismail, Faculté des Sciences, Meknès, Morocco
- Hafid Anane
  - Laboratoire de Chimie Analytique et Moléculaire/LCAM, Faculté Polydisciplinaire de Safi, Université Cadi Ayyad, Safi, Morocco
- Salah-Eddine Stiriba
  - Laboratoire de Chimie Analytique et Moléculaire/LCAM, Faculté Polydisciplinaire de Safi, Université Cadi Ayyad, Safi, Morocco
  - Instituto de Ciencia Molecular/ICMol, Universidad de Valencia, Valencia, Spain
8
Tabaraki R, Khodabakhshi M. Performance comparison of wavelet neural network and adaptive neuro-fuzzy inference system with small data sets. J Mol Graph Model 2020; 100:107698. [PMID: 32739637 DOI: 10.1016/j.jmgm.2020.107698] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 03/13/2020] [Revised: 07/10/2020] [Accepted: 07/13/2020] [Indexed: 11/15/2022]
Abstract
In this work, the performance of wavelet neural network (WNN) and adaptive neuro-fuzzy inference system (ANFIS) models was compared on small data sets using different criteria, such as the second-order corrected Akaike information criterion (AICc), Bayesian information criterion (BIC), root mean squared error (RMSE), mean absolute relative error (MARE), coefficient of determination (R²), external Q² function ( [Formula: see text] ) and concordance correlation coefficient (CCC). Another criterion was over-fitting. Ten data sets were selected from the literature and their data were divided into training, test, and validation sets. Network parameters were optimized for the WNN and ANFIS models and the best architectures with the lowest errors were selected for each data set. A precise survey of the number of permitted adjustable parameters (NPAP) and the total number of adjustable parameters (TNAP) in the WNN and ANFIS models showed that 60% of the ANFIS models and 30% of the WNN models were over-fitted. As a rule of thumb, to avoid over-fitting it is suggested that the ratio of the number of observations in the training set to the number of input neurons must be greater than 10 for WNN and greater than 20 for ANFIS. The smaller ratio required by WNN indicates its flexibility relative to ANFIS, which relates to differences in structure and connections between the two networks.
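The rule of thumb stated at the end of this abstract can be written as a small check. This is a minimal sketch: the thresholds (10 for WNN, 20 for ANFIS) come from the abstract, while the function name, the dictionary, and the sample sizes are hypothetical.

```python
# Thresholds from the abstract: the ratio of training observations to input
# neurons should be greater than these values to avoid over-fitting.
MIN_RATIO = {"WNN": 10, "ANFIS": 20}


def overfitting_risk(model, n_train, n_inputs):
    """Return True when the observations-to-inputs ratio is not above the threshold."""
    ratio = n_train / n_inputs
    return ratio <= MIN_RATIO[model]


# Hypothetical data set: 60 training observations and 5 input neurons (ratio 12).
print(overfitting_risk("WNN", 60, 5))
print(overfitting_risk("ANFIS", 60, 5))
```

With the same data set, the check passes for WNN but flags ANFIS, which mirrors the abstract's point that ANFIS needs more observations per input.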
Affiliation(s)
- Reza Tabaraki
  - Department of Chemistry, Faculty of Basic Science, Ilam University, Ilam, Iran
- Mina Khodabakhshi
  - Department of Chemistry, Faculty of Basic Science, Ilam University, Ilam, Iran
9
Przybyłek M, Recki Ł, Mroczyńska K, Jeliński T, Cysewski P. Experimental and theoretical solubility advantage screening of bi-component solid curcumin formulations. J Drug Deliv Sci Technol 2019. [DOI: 10.1016/j.jddst.2019.01.023] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 02/07/2023]
10
Application of Multivariate Adaptive Regression Splines (MARSplines) for Predicting Hansen Solubility Parameters Based on 1D and 2D Molecular Descriptors Computed from SMILES String. J CHEM-NY 2019. [DOI: 10.1155/2019/9858371] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/18/2022] Open
Abstract
A new method of Hansen solubility parameters (HSPs) prediction was developed by combining the multivariate adaptive regression splines (MARSplines) methodology with a simple multivariable regression involving 1D and 2D PaDEL molecular descriptors. In order to adopt the MARSplines approach to QSPR/QSAR problems, several optimization procedures were proposed and tested. The effectiveness of the obtained models was checked via standard QSPR/QSAR internal validation procedures provided by the QSARINS software and by predicting the solubility classification of polymers and drug-like solid solutes in collections of solvents. By utilizing information derived only from SMILES strings, the obtained models allow for computing all of the three Hansen solubility parameters including dispersion, polarization, and hydrogen bonding. Although several descriptors are required for proper parameters estimation, the proposed procedure is simple and straightforward and does not require a molecular geometry optimization. The obtained HSP values are highly correlated with experimental data, and their application for solving solubility problems leads to essentially the same quality as for the original parameters. Based on provided models, it is possible to characterize any solvent and liquid solute for which HSP data are unavailable.
11
A Model for Shovel Capital Cost Estimation, Using a Hybrid Model of Multivariate Regression and Neural Networks. Symmetry (Basel) 2017. [DOI: 10.3390/sym9120298] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 11/16/2022] Open
12
Zarei K, Atabati M, Ahmadi M. Shuffling cross-validation-bee algorithm as a new descriptor selection method for retention studies of pesticides in biopartitioning micellar chromatography. Journal of Environmental Science and Health. Part B, Pesticides, Food Contaminants, and Agricultural Wastes 2017; 52:346-352. [PMID: 28277080 DOI: 10.1080/03601234.2017.1283139] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 06/06/2023]
Abstract
The bee algorithm (BA) is an optimization algorithm inspired by the natural foraging behaviour of honey bees that searches for an optimal solution and can be applied to feature selection. In this paper, shuffling cross-validation-BA (CV-BA) was applied to select the best descriptors that could describe the retention factor (log k) in the biopartitioning micellar chromatography (BMC) of 79 heterogeneous pesticides. Six descriptors were obtained using BA and then the selected descriptors were applied for model development using multiple linear regression (MLR). The descriptor selection was also performed using stepwise, genetic algorithm and simulated annealing methods, MLR was applied to model development, and the results were compared with those obtained from shuffling CV-BA. The results showed that shuffling CV-BA can be applied as a powerful descriptor selection method. Support vector machine (SVM) was also applied for model development using the six descriptors selected by BA. The statistical results obtained using SVM were better than those obtained using MLR: the root mean square error (RMSE) and correlation coefficient (R) for the whole data set (training and test) were 0.1863 and 0.9426, respectively, using shuffling CV-BA-MLR, while the corresponding values for the shuffling CV-BA-SVM method were 0.0704 and 0.9922, respectively.
Affiliation(s)
- Kobra Zarei
  - School of Chemistry, Damghan University, Damghan, Iran
- Monire Ahmadi
  - School of Chemistry, Damghan University, Damghan, Iran
13
Zarei K, Atabati M, Kor K. Bee algorithm and adaptive neuro-fuzzy inference system as tools for QSAR study toxicity of substituted benzenes to Tetrahymena pyriformis. Bulletin of Environmental Contamination and Toxicology 2014; 92:642-649. [PMID: 24638918 DOI: 10.1007/s00128-014-1253-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 09/28/2013] [Accepted: 03/08/2014] [Indexed: 06/03/2023]
Abstract
A quantitative structure-activity relationship (QSAR) was developed to predict the toxicity of substituted benzenes to Tetrahymena pyriformis. A set of 1,497 zero- to three-dimensional descriptors were used for each molecule in the data set. A major problem of QSAR is the high dimensionality of the descriptor space; therefore, descriptor selection is one of the most important steps. In this paper, bee algorithm was used to select the best descriptors. Three descriptors were selected and used as inputs for adaptive neuro-fuzzy inference system (ANFIS). Then the model was corrected for unstable compounds (the compounds that can be ionized in the aqueous solutions or can easily metabolize under some conditions). Finally squared correlation coefficients were obtained as 0.8769, 0.8649 and 0.8301 for training, test and validation sets, respectively. The results showed bee-ANFIS can be used as a powerful model for prediction of toxicity of substituted benzenes to T. pyriformis.
Affiliation(s)
- Kobra Zarei
  - School of Chemistry, Damghan University, Damghan, Iran
14
Structure-activity relationship for Fe(III)-salen-like complexes as potent anticancer agents. ScientificWorldJournal 2014; 2014:745649. [PMID: 24955417 PMCID: PMC3997896 DOI: 10.1155/2014/745649] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 01/08/2014] [Accepted: 02/19/2014] [Indexed: 11/17/2022] Open
Abstract
The quantitative structure-activity relationship (QSAR) for the anticancer activity of Fe(III)-salen and salen-like complexes was studied. Density functional theory (B3LYP/LANL2DZ) was used to optimize the structures. A pool of descriptors was calculated: 1497 theoretical descriptors and quantum-chemical parameters, shielding NMR, and electronic descriptors. The study of the structure-activity relationship was performed with multiple linear regression (MLR) and artificial neural network (ANN). In the nonlinear method, the adaptive neuro-fuzzy inference system (ANFIS) was applied in order to choose the most effective descriptors. The ANN-ANFIS model, with high statistical significance (R²train = 0.99, RMSE = 0.138, and Q²LOO = 0.82), has better capability to predict the anticancer activity of new compound series of this family. Based on this study, the anticancer activity of these compounds is mainly dependent on the geometrical parameters, position, and the nature of the substituent of the salen ligand.
15
Shao YE. Body fat percentage prediction using intelligent hybrid approaches. ScientificWorldJournal 2014; 2014:383910. [PMID: 24723804 PMCID: PMC3958757 DOI: 10.1155/2014/383910] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 12/24/2013] [Accepted: 01/15/2014] [Indexed: 11/25/2022] Open
Abstract
Excess body fat often leads to obesity. Obesity is typically associated with serious medical diseases, such as cancer, heart disease, and diabetes. Accordingly, knowing the body fat is an extremely important issue since it affects everyone's health. Although there are several ways to measure the body fat percentage (BFP), the accurate methods are often associated with hassle and/or high costs. Traditional single-stage approaches may use certain body measurements or explanatory variables to predict the BFP. Diverging from existing approaches, this study proposes new intelligent hybrid approaches to obtain fewer explanatory variables, and the proposed forecasting models are able to effectively predict the BFP. The proposed hybrid models consist of multiple regression (MR), artificial neural network (ANN), multivariate adaptive regression splines (MARS), and support vector regression (SVR) techniques. The first stage of the modeling includes the use of MR and MARS to obtain fewer but more important sets of explanatory variables. In the second stage, the remaining important variables serve as inputs for the other forecasting methods. A real dataset was used to demonstrate the development of the proposed hybrid models. The prediction results revealed that the proposed hybrid schemes outperformed the typical, single-stage forecasting models.
Affiliation(s)
- Yuehjen E. Shao
  - Department of Statistics and Information Science, Fu Jen Catholic University, 510, Chung-Cheng Road, Xinzhuang District, New Taipei City 24205, Taiwan
16
Shao YE, Hou CD, Chiu CC. Hybrid intelligent modeling schemes for heart disease classification. Appl Soft Comput 2014. [DOI: 10.1016/j.asoc.2013.09.020] [Citation(s) in RCA: 71] [Impact Index Per Article: 6.5] [Indexed: 11/29/2022]
17
Asadollahi-Baboli M. Aquatic toxicity assessment of esters towards the Daphnia magna through PCA-ANFIS. Bulletin of Environmental Contamination and Toxicology 2013; 91:450-454. [PMID: 23884170 DOI: 10.1007/s00128-013-1066-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/21/2013] [Accepted: 07/13/2013] [Indexed: 06/02/2023]
Abstract
The widespread production of esters, combined with their ability to migrate between different compartments, makes their environmental toxicity important. Against this background, the multivariate image analysis-quantitative structure-toxicity relationship (MIA-QSTR) method coupled to principal component analysis-adaptive neuro-fuzzy inference systems (PCA-ANFIS) was applied to assess the toxicity of esters to Daphnia magna. In MIA-QSTR, pixels of chemical structures (2D images) stand for descriptors, and structural changes account for the variance in toxicities. The ANFIS procedure was capable of correlating the inputs (PCA scores) with the toxicities accurately. The PCA-ANFIS model was also statistically validated for its predictive power using cross-validation, applicability domain and Y-scrambling evaluation procedures. The satisfactory results (R²p = 0.926, Q²(LOO) = 0.887, R²(L25%O) = 0.843, RMSE(LOO) = 0.320 and RMSE(L25%O) = 0.379) suggest that the QSTR model could be proposed as an alternative method for aquatic toxicity assessment of esters, allowing possible application in the European Union regulation REACH.
Affiliation(s)
- M Asadollahi-Baboli
  - Department of Science, Babol University of Technology, P.O. Box 47148-71167, Babol, Mazandaran, Iran
18
|
|
19
Asadollahi-Baboli M, Mani-Varnosfaderani A. Application of computational methods to predict absorption maxima of organic dyes used in solar cells. Journal of Theoretical & Computational Chemistry 2013. [DOI: 10.1142/s0219633612501143] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/18/2022]
Abstract
A quantitative structure-property relationship (QSPR) study for the prediction of the absorption maxima (λmax) of organic dyes used in solar cells was carried out using different computational methods. Three-dimensional (3D) descriptors were calculated using Codessa and Dragon softwares to represent the dye molecules. Then, different chemometric tools such as multivariate adaptive regression splines (MARS) and adaptive neuro-fuzzy inference system (ANFIS) combined with Monte Carlo (MC) sampling technique were utilized for selecting the most important descriptors and predicting the absorption maxima of the dyes. Various evaluation techniques such as leave-one-out, leave-multiple-out cross-validation procedures, randomization tests, and validation through the external test set were performed to validate the performance of the model. The results revealed that the calculated absorption maxima values are in good agreement with the experimental ones. This theoretical method provides an accurate and alternative method to obtain λmax of dyes before they are actually synthesized.
Affiliation(s)
- M. Asadollahi-Baboli
  - Department of Science, Babol University of Technology, P. O. Box 47148-71167, Babol, Mazandaran, Iran
- A. Mani-Varnosfaderani
  - Department of Chemistry and Biochemistry, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
20
|
|
21
Shuffling multivariate adaptive regression splines as a predictive method for modeling of novel pyridylmethylthio derivatives as VEGFR2 inhibitors. Med Chem Res 2012. [DOI: 10.1007/s00044-012-0266-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/01/2022]
22
Molecular docking, molecular dynamics simulation, and QSAR model on potent thiazolidine-4-carboxylic acid inhibitors of influenza neuraminidase. Med Chem Res 2012. [DOI: 10.1007/s00044-012-0175-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 10/28/2022]
23
Jalali-Heravi M, Mani-Varnosfaderani A, Taherinia D, Mahmoodi MM. The use of Bayesian nonlinear regression techniques for the modelling of the retention behaviour of volatile components of Artemisia species. SAR and QSAR in Environmental Research 2012; 23:461-483. [PMID: 22452344 DOI: 10.1080/1062936x.2012.665083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]
Abstract
The main aim of this work was to assess the ability of Bayesian multivariate adaptive regression splines (BMARS) and Bayesian radial basis function (BRBF) techniques for modelling the gas chromatographic retention indices of volatile components of Artemisia species. A diverse set of molecular descriptors was calculated and used as descriptor pool for modelling the retention indices. The ability of BMARS and BRBF techniques was explored for the selection of the most relevant descriptors and proper basis functions for modelling. The results revealed that BRBF technique is more reproducible than BMARS for modelling the retention indices and can be used as a method for variable selection and modelling in quantitative structure-property relationship (QSPR) studies. It is also concluded that the Markov chain Monte Carlo (MCMC) search engine, implemented in BRBF algorithm, is a suitable method for selecting the most important features from a vast number of them. The values of correlation between the calculated retention indices and the experimental ones for the training and prediction sets (0.935 and 0.902, respectively) revealed the prediction power of the BRBF model in estimating the retention index of volatile components of Artemisia species.
Affiliation(s)
- M Jalali-Heravi
- Department of Chemistry, Sharif University of Technology, Tehran, Iran
|
24
|
Asadollahi-Baboli M. Quantitative structure-activity relationship analysis of human neutrophil elastase inhibitors using shuffling classification and regression trees and adaptive neuro-fuzzy inference systems. SAR QSAR Environ Res 2012; 23:505-520. [PMID: 22452268 DOI: 10.1080/1062936x.2012.665811] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 05/31/2023]
Abstract
The purpose of this study was to develop quantitative structure-activity relationship models for N-benzoylindazole derivatives as inhibitors of human neutrophil elastase. These models were developed with the aid of classification and regression trees (CART) and an adaptive neuro-fuzzy inference system (ANFIS) combined with a shuffling cross-validation technique using interpretable descriptors. More than one hundred meaningful descriptors, representing various structural characteristics for all 51 N-benzoylindazole derivatives in the data set, were calculated and used as the original variables for shuffling CART modelling. Five descriptors selected by the shuffling CART technique (average Wiener index, Kier benzene-likeliness index, subpolarity parameter, average shape profile index of order 2 and folding degree index) were used as inputs to the ANFIS to predict the inhibition behaviour of N-benzoylindazole derivatives. The results of the developed shuffling CART-ANFIS model compared to other techniques, such as genetic algorithm (GA)-partial least square (PLS)-ANFIS and stepwise multiple linear regression (MLR)-ANFIS, are promising and descriptive. The satisfactory results (r2p = 0.845, Q2(LOO) = 0.861, r2(L25%O) = 0.829, RMSE(LOO) = 0.305 and RMSE(L25%O) = 0.336) demonstrate that shuffling CART-ANFIS models capture the relationship between human neutrophil elastase inhibitor activity and molecular descriptors, and that they yield predictions in excellent agreement with the experimental values.
|
25
|
Zarei K, Salehabadi Z. The shuffling multivariate adaptive regression splines and adaptive neuro-fuzzy inference system as tools for QSPR study bioconcentration factors of polychlorinated biphenyls (PCBs). Struct Chem 2012. [DOI: 10.1007/s11224-012-9987-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/24/2022]
|
26
|
Muñoz C, Adasme F, Alzate-Morales JH, Vergara-Jaque A, Kniess T, Caballero J. Study of differences in the VEGFR2 inhibitory activities between semaxanib and SU5205 using 3D-QSAR, docking, and molecular dynamics simulations. J Mol Graph Model 2011; 32:39-48. [PMID: 22070999 DOI: 10.1016/j.jmgm.2011.10.005] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.8] [Received: 08/05/2011] [Revised: 09/30/2011] [Accepted: 10/15/2011] [Indexed: 11/28/2022]
Abstract
Semaxanib (SU5416) and 3-[4'-fluorobenzylidene]indolin-2-one (SU5205) are structurally similar drugs that are able to inhibit vascular endothelial growth factor receptor-2 (VEGFR2), but the former is 87 times more effective than the latter. Previously, SU5205 was used as a radiolabelled inhibitor (as a surrogate for SU5416) and a radiotracer for positron emission tomography (PET) imaging, but the compound exhibited poor stability and only a moderate IC50 toward VEGFR2. In the current work, the relationship between the structure and activity of these drugs as VEGFR2 inhibitors was studied using 3D-QSAR, docking and molecular dynamics (MD) simulations. First, comparative molecular field analysis (CoMFA) was performed using 48 2-indolinone derivatives and their VEGFR2 inhibitory activities. The best CoMFA model was built on a training set of 40 compounds and included steric and electrostatic fields. In addition, this model gave satisfactory cross-validation results and adequately predicted the 8 compounds in the test set. The plots of the CoMFA fields could explain the structural differences between semaxanib and SU5205. Docking and molecular dynamics simulations showed that both molecules have the same orientation and dynamics inside the VEGFR2 active site. However, the hydrophobic pocket of VEGFR2 was more exposed to the solvent media when it was complexed with SU5205. An energetic analysis, including Embrace and MM-GBSA calculations, revealed that the potency of ligand binding is governed by van der Waals contacts.
Affiliation(s)
- Camila Muñoz
- Centro de Bioinformática y Simulación Molecular, Universidad de Talca, 2 Norte 685, Casilla 721, Talca, Chile
|
27
|
Afiuni-Zadeh S, Azimi G. A QSAR Study for Modeling of 8-Azaadenine Analogues Proposed as A1 Adenosine Receptor Antagonists Using Genetic Algorithm Coupling Adaptive Neuro-Fuzzy Inference System (ANFIS). Anal Sci 2010; 26:897-902. [DOI: 10.2116/analsci.26.897] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/23/2022]
|