1
|
Mafarja M, Thaher T, Al-Betar MA, Too J, Awadallah MA, Abu Doush I, Turabieh H. Classification framework for faulty-software using enhanced exploratory whale optimizer-based feature selection scheme and random forest ensemble learning. APPL INTELL 2023; 53:1-43. [PMID: 36785593 PMCID: PMC9909674 DOI: 10.1007/s10489-022-04427-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Accepted: 12/23/2022] [Indexed: 02/11/2023]
Abstract
Software Fault Prediction (SFP) is an important process for detecting faulty classes or modules early in the software development life cycle. In this paper, a machine learning (ML) framework is proposed for SFP. Initially, pre-processing and re-sampling techniques are applied to make the SFP datasets ready to be used by ML techniques. Thereafter, seven classifiers are compared, namely K-Nearest Neighbors (KNN), Naive Bayes (NB), Linear Discriminant Analysis (LDA), Linear Regression (LR), Decision Tree (DT), Support Vector Machine (SVM), and Random Forest (RF). The RF classifier outperforms all other classifiers in terms of eliminating irrelevant/redundant features. The performance of RF is improved further using a dimensionality reduction method called binary whale optimization algorithm (BWOA) to eliminate the irrelevant/redundant features. Finally, the performance of BWOA is enhanced by hybridizing the exploration strategies of the grey wolf optimizer (GWO) and Harris hawks optimization (HHO) algorithms. The proposed method is called SBEWOA. Sixteen SFP datasets are selected from the PROMISE repository, covering software projects of different sizes and complexity. The algorithms' performance is compared in terms of accuracy, the number of features, and fitness function. The comparative evaluation against nine well-established feature selection methods shows that the proposed SBEWOA produces significantly better results for several of the evaluated datasets. This is also supported by the two-tailed P-values of the Wilcoxon signed-rank statistical test. In conclusion, the proposed method is an efficient alternative ML method for SFP that can be used for similar problems in the software engineering domain.
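The abstract reports accuracy, number of features, and a fitness function, but does not state how the fitness is computed. The sketch below shows one formulation commonly used in wrapper feature selection, a weighted sum of classification error and the fraction of features kept. The weight `alpha = 0.99`, the function names, and the toy error function are all assumptions for illustration, not the authors' method.

```python
from typing import Callable, Sequence


def wrapper_fitness(
    mask: Sequence[int],
    error_rate: Callable[[Sequence[int]], float],
    alpha: float = 0.99,  # assumed weight; not taken from the abstract
) -> float:
    """Weighted sum of error and fraction of selected features (lower is better)."""
    n_selected = sum(mask)
    if n_selected == 0:
        return float("inf")  # an empty feature subset cannot be evaluated
    return alpha * error_rate(mask) + (1.0 - alpha) * n_selected / len(mask)


def toy_error(mask: Sequence[int]) -> float:
    # Hypothetical stand-in for a cross-validated classifier error:
    # features 0 and 2 are "informative"; each one missing adds 0.2 error.
    return 0.1 + 0.2 * ((1 - mask[0]) + (1 - mask[2]))


compact = wrapper_fitness([1, 0, 1, 0], toy_error)  # only the informative features
full = wrapper_fitness([1, 1, 1, 1], toy_error)     # all four features
```

With this toy error, both subsets reach the same error, so the smaller subset gets the lower (better) fitness. In a real wrapper, `toy_error` would be replaced by the error of a classifier such as RF trained on the selected columns.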
Affiliation(s)
- Majdi Mafarja
- Department of Computer Science, Birzeit University, Birzeit, Palestine
| | - Thaer Thaher
- Department of Computer Systems Engineering, Arab American University, Jenin, Palestine
- Information Technology Engineering, Al-Quds University, Abu Dies, Jerusalem, Palestine
| | - Mohammed Azmi Al-Betar
- Artificial Intelligence Research Center (AIRC), College of Engineering and Information Technology, Ajman University, Ajman, United Arab Emirates
- Irbid, Jordan
| | - Jingwei Too
- Faculty of Electrical Engineering, Universiti Teknikal Malaysia Melaka, Hang Tuah Jaya, 76100 Durian Tunggal Melaka, Malaysia
| | - Mohammed A. Awadallah
- Department of Computer Science, Al-Aqsa University, P.O. Box 4051, Gaza, Palestine
- Artificial Intelligence Research Center (AIRC), Ajman University, Ajman, United Arab Emirates
| | - Iyad Abu Doush
- Department of Computing, College of Engineering and Applied Sciences, American University of Kuwait, Salmiya, Kuwait
- Computer Science Department, Yarmouk University, Irbid, Jordan
| | - Hamza Turabieh
- Department of Health Management and Informatics, University of Missouri, Columbia, 5 Hospital Drive, Columbia, MO 65212 USA
| |
|
2
|
Luo C, Xu Y, Shao Y, Wang Z, Hu J, Yuan J, Liu Y, Duan M, Huang L, Zhou F. EvaGoNet: an integrated network of variational autoencoder and Wasserstein generative adversarial network with gradient penalty for binary classification tasks. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2023.01.133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
|
3
|
Feature Selection Using Diversity-Based Multi-objective Binary Differential Evolution. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2022.12.117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2023]
|
4
|
A Robust Feature Construction for Fish Classification Using Grey Wolf Optimizer. CYBERNETICS AND INFORMATION TECHNOLOGIES 2022. [DOI: 10.2478/cait-2022-0045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
The low quality of fish image data collected directly from the habitat degrades the quality of the extracted features. Previous studies tended to be more concerned with finding the best method than with feature quality. This article proposes a new fish classification workflow using a combination of Contrast-Adaptive Color Correction (NCACC) image enhancement and optimization-based feature construction using the Grey Wolf Optimizer (GWO). This approach improves the image feature extraction results to obtain new and more meaningful features. This article compares GWO-based fish classification with classification based on other optimization methods on the newly generated features. The comparison results show that GWO-based classification had 0.22% lower accuracy than GA-based classification but 1.13% higher accuracy than PSO-based classification. Based on ANOVA tests, the accuracies of GA and GWO were not statistically different, whereas those of GWO and PSO were statistically different. On the other hand, GWO-based classification performed 0.61 times faster than GA-based classification and 1.36 minutes faster than the other.
|
5
|
Koosha M, Khodabandelou G, Ebadzadeh MM. A hierarchical estimation of multi-modal distribution programming for regression problems. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.110129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
|
6
|
Feature Encoding and Selection for Iris Recognition Based on Variable Length Black Hole Optimization. COMPUTERS 2022. [DOI: 10.3390/computers11090140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Iris recognition is one of the most reliable biometric human identification methods. It exploits the distinctive pattern of the iris area. Typically, several steps are performed for iris recognition, namely, pre-processing, segmentation, normalization, extraction, coding and classification. In this article, we present a novel algorithm for iris recognition that adds a feature selection step to iris feature extraction and coding. Furthermore, it enables selecting a variable number of features for iris recognition by adapting our recent algorithm, variable length black hole optimization (VLBHO). It is the first variable length feature selection method for iris recognition. Our proposed algorithm enables segment-based decomposition of features according to their relevance, which makes the optimization more efficient in terms of both memory and computation and more promising in terms of convergence. For classification, the article uses the support vector machine (SVM) and the Logistic model. The proposed algorithm has been evaluated on two iris datasets, namely, IITD and CASIA. The finding is that optimizing feature encoding and selection based on VLBHO is superior to the benchmarks, with an improvement of 0.21%.
|
7
|
Rao C, Liu Y, Goh M. Credit risk assessment mechanism of personal auto loan based on PSO-XGBoost Model. COMPLEX INTELL SYST 2022. [DOI: 10.1007/s40747-022-00854-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
As online P2P loans in automotive financing grow, there is a need to manage and control the credit risk of personal auto loans. In this paper, personal auto loan data sets from the Kaggle platform are used to build a machine learning-based credit risk assessment mechanism for personal auto loans. An integrated Smote-Tomek Link algorithm is proposed to convert the data set into a balanced data set. Then, an improved Filter-Wrapper feature selection method is presented to select credit risk assessment indexes for the loans. Combining Particle Swarm Optimization (PSO) with the eXtreme Gradient Boosting (XGBoost) model, a PSO-XGBoost model is formed to assess the credit risk of the loans. The PSO-XGBoost model is compared against the XGBoost, Random Forest, and Logistic Regression models on the standard performance evaluation indexes of accuracy, precision, ROC curve, and AUC value. The PSO-XGBoost model is found to be superior in classification performance and classification effect.
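The abstract does not say which XGBoost parameters PSO tunes or how the swarm is configured. The sketch below shows a standard global-best PSO loop in plain Python, minimizing a toy two-parameter loss that stands in for a validation loss. The inertia and acceleration values (`w`, `c1`, `c2`), the parameter names, and `toy_loss` are assumptions chosen for illustration only.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO over a box-constrained search space."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best so far

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to bounds
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_loss(params):
    # Hypothetical stand-in for a validation loss over (learning_rate, max_depth);
    # its minimum is placed at learning_rate = 0.1, max_depth = 6.
    lr, depth = params
    return (lr - 0.1) ** 2 + ((depth - 6.0) / 10.0) ** 2


best_params, best_loss = pso_minimize(toy_loss, bounds=[(0.01, 0.5), (2.0, 12.0)])
```

In a PSO-tuned model, `toy_loss` would be replaced by a function that trains the model with the candidate parameters and returns a validation error; integer parameters such as tree depth would need rounding before use.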
|
8
|
Ghanei Ghooshkhaneh N, Golzarian MR, Mollazade K. VIS-NIR spectroscopy for detection of citrus core rot caused by Alternaria alternata. Food Control 2022. [DOI: 10.1016/j.foodcont.2022.109320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
|
9
|
|
10
|
Xue Y, Cai X, Neri F. A multi-objective evolutionary algorithm with interval based initialization and self-adaptive crossover operator for large-scale feature selection in classification. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109420] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/05/2023]
|
11
|
Genetic Programming-Based Feature Construction for System Setting Recognition and Component-Level Prognostics. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12094749] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/17/2022]
Abstract
Extracting representative feature sets from raw signals is crucial in Prognostics and Health Management (PHM) for understanding component behavior. The literature proposes various methods, including signal processing in the time, frequency, and time–frequency domains, feature selection, and unsupervised feature learning. An emerging task in data science is Feature Construction (FC), which has the advantages of both feature selection and feature learning. In particular, the constructed features address a specific objective function without requiring a label during the construction process. Genetic Programming (GP) is a powerful tool for FC in the PHM context, as it yields distinct feature sets depending on the analysis goal, i.e., diagnostics and prognostics. This paper adopts GP to extract system-level features for machinery setting recognition and component-level features for prognostics. Three distinct fitness functions are considered for the GP training, which requires a set of statistical time-domain features as input. The methodology is applied to vibration signals extracted from a test rig during run-to-failure tests under different settings. The performance of the constructed features is evaluated through the classification accuracy and the Remaining Useful Life (RUL) prediction error. Results demonstrate that GP-based features classify known and novel machinery operating conditions better than feature selection and learning methods.
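To make the idea of GP-based feature construction concrete, the sketch below computes a few statistical time-domain features from a signal and evaluates a constructed feature encoded as a small expression tree, which is the kind of individual a GP run would evolve. The feature set, the tuple-based tree encoding, and the protected division are assumptions for illustration; the abstract does not specify the paper's primitives or fitness functions.

```python
import math
import operator


def time_domain_features(signal):
    """Basic statistical time-domain features of a 1-D signal."""
    n = len(signal)
    mean = sum(signal) / n
    rms = math.sqrt(sum(x * x for x in signal) / n)
    peak = max(abs(x) for x in signal)
    std = math.sqrt(sum((x - mean) ** 2 for x in signal) / n)
    return {"mean": mean, "rms": rms, "peak": peak, "std": std}


# Binary operators available to a tree; division is "protected" so that a
# near-zero denominator returns 1.0 instead of raising an error.
OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "div": lambda a, b: a / b if abs(b) > 1e-12 else 1.0,
}


def evaluate(tree, features):
    """Evaluate a tree of the form (op, left, right), a feature name, or a constant."""
    if isinstance(tree, str):
        return features[tree]
    if isinstance(tree, (int, float)):
        return float(tree)
    op, left, right = tree
    return OPS[op](evaluate(left, features), evaluate(right, features))


# Example individual: peak / rms, i.e. the crest factor.
crest_factor_tree = ("div", "peak", "rms")
feats = time_domain_features([1.0, -1.0, 1.0, -1.0])
constructed = evaluate(crest_factor_tree, feats)
```

A GP run would generate many such trees, score each constructed feature with a fitness function, and recombine the best ones; only the representation and evaluation step is shown here.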
|
12
|
Self-paced non-convex regularized analysis-synthesis dictionary learning for unsupervised feature selection. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108279] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/21/2022]
|
13
|
Sang B, Chen H, Yang L, Zhou D, Li T, Xu W. Incremental attribute reduction approaches for ordered data with time-evolving objects. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2020.106583] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 12/31/2022]
|
14
|
Fu W, Xue B, Gao X, Zhang M. Output-based transfer learning in genetic programming for document classification. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2020.106597] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/25/2022]
|
15
|
Ma J, Gao X. Designing genetic programming classifiers with feature selection and feature construction. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2020.106826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
|