1
Khushal R, Fatima U. Fuzzy machine learning logic utilization on hormonal imbalance dataset. Comput Biol Med 2024; 174:108429. [PMID: 38631116 DOI: 10.1016/j.compbiomed.2024.108429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Revised: 04/04/2024] [Accepted: 04/04/2024] [Indexed: 04/19/2024]
Abstract
In this research work, a novel fuzzy data transformation technique is proposed and applied to a hormonal imbalance dataset. Hormonal imbalance is widespread, principally among females of reproductive age, and leads to numerous related medical conditions, of which Polycystic Ovary Syndrome (PCOS) is one. Treatment, along with adopting a healthy lifestyle, is advised to mitigate its consequences on quality of life. The biological dataset of hormonal imbalance ("PCOS") provides a limited result, namely whether the syndrome is present or not. Some input variables also contain binary responses only. To deal with this limitation, the fuzzy data transformation technique is applied to these variables, which provides a broader spectrum for diagnosing PCOS. As a consequence, the output variable is also transformed, turning the binary classification output into three classes. An adaptive fuzzy machine learning logic model is developed in which inference on the transformed biological dataset is performed by machine learning techniques that provide the fuzzy output. Machine learning techniques have also been applied to the untransformed biological dataset, and both implementations are compared by computing the relevant metrics. Machine learning on the untransformed dataset indicates only whether the syndrome is present or absent, whereas machine learning on the fuzzy-transformed dataset provides a broader spectrum of diagnosis, with a third class indicating that PCOS might be present, which would alert a patient to take preventive measures to minimize the chances of developing the syndrome in the future.
Affiliation(s)
- Rabia Khushal
  - Department of Mathematics, NED University of Engineering & Technology, Pakistan.
- Ubaida Fatima
  - Department of Mathematics, NED University of Engineering & Technology, Pakistan.
2
Rebalance Weights AdaBoost-SVM Model for Imbalanced Data. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2023. [DOI: 10.1155/2023/4860536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/21/2023]
Abstract
Classification of imbalanced data is a challenging task that has captured considerable interest in numerous scientific fields by virtue of the great practical value of minority-class accuracy. Some methods for improving generalization performance have been developed to address this classification situation. Here, we propose a cost-sensitive ensemble learning method using a support vector machine as a base learner of AdaBoost for classifying imbalanced data. Considering that the existing methods are not well studied in terms of how to precisely control the classification accuracy of the minority class, we developed a novel way to rebalance the weights of AdaBoost, and the weights influence the base learner training. This weighting strategy increases the sample weight of the misclassified minority while decreasing the sample weight of the misclassified majority until their distributions are even in each round. Furthermore, we included P-mean as one of the assessment markers and discussed why it is necessary. Experiments were conducted to compare the proposed model with 10 comparison models on 18 datasets in terms of six different metrics. A statistical study based on the comprehensive experimental findings verifies the efficacy and usability of the proposed model.
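The abstract does not give the update rule itself, so the following is only a rough sketch of what "making the class weight distributions even" could look like. It rescales all sample weights so that each class holds half of the total weight, which is a simplification assumed here; the paper applies its adjustment to misclassified samples within AdaBoost rounds. The function name `rebalance_weights` and the `minority_label` parameter are hypothetical.

```python
def rebalance_weights(weights, labels, minority_label=1):
    """Rescale sample weights so both classes carry equal total weight.

    Assumption for illustration only: this is NOT the paper's update rule,
    just a simple way to equalise the per-class weight mass.
    """
    min_total = sum(w for w, y in zip(weights, labels) if y == minority_label)
    maj_total = sum(w for w, y in zip(weights, labels) if y != minority_label)
    if min_total == 0 or maj_total == 0:
        raise ValueError("both classes need positive total weight")
    # Each class is normalised to sum to 0.5, so the result sums to 1.
    return [
        0.5 * w / (min_total if y == minority_label else maj_total)
        for w, y in zip(weights, labels)
    ]


# Hypothetical data: 2 minority samples, 8 majority samples, uniform weights.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
weights = [0.1] * 10
balanced = rebalance_weights(weights, labels)
```

With uniform starting weights, each minority sample ends up with a larger weight than each majority sample, which is the qualitative effect the abstract describes.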
3
Fu S, Tian Y, Tang J, Liu X. Cost-sensitive learning with modified Stein loss function. Neurocomputing 2023. [DOI: 10.1016/j.neucom.2023.01.052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/13/2023]
4
Non-parallel bounded support matrix machine and its application in roller bearing fault diagnosis. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2022.12.090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2023]
5
Active Learning by Extreme Learning Machine with Considering Exploration and Exploitation Simultaneously. Neural Process Lett 2022. [DOI: 10.1007/s11063-022-11089-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2022]
6
Least squares structural twin bounded support vector machine on class scatter. APPL INTELL 2022. [DOI: 10.1007/s10489-022-04237-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
7
Nonparallel Support Vector Machine with L2-norm Loss and its DCD-type Solver. Neural Process Lett 2022. [DOI: 10.1007/s11063-022-11067-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
8
Evaluation modeling of highway collapse hazard based on rough set and support vector machine. Sci Rep 2022; 12:18723. [PMCID: PMC9636135 DOI: 10.1038/s41598-022-23567-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2022] [Accepted: 11/02/2022] [Indexed: 11/06/2022] Open
Abstract
The prediction of possibility and risk classification of collapse is an important issue in the process of highway construction in mountain area. Based on the principle of rough set and support vector machine, a landslide hazard prediction model was established. First of all, according to field investigation, an evaluation index system and a sample set of evaluation index data were established, the rough set decision table was constructed by preprocessing the original data based on the function classification of standard evaluation index, and then, the influence indexes of the collapse activity were reduced by rough set theory, and the main 9 indexes affecting the collapse activity as the key discriminant factors of support vector machine model, namely slope shape of slope, aspect of slope, slope of slope, height of slope, exposed structural face, stratum lithology, relationship between weakness face and free face, vegetation cover rate and weathering degree of rock were extracted. Then, taking the data of 13 post earthquake collapses in Yingxiu-Wolong highway of Hanchuan County measured by the authors in the field as training samples, the optimal model parameters were analyzed and calculated. When the penalty parameter C is 8 and the kernel parameter σ is 0.5, the correct rate of cross-validation is 100%, and the model is optimal. At last, 4 other landslide data were tested, the discriminant results of the test sample data were compared with the results obtained by uncertainty measure and distance discriminant analysis. The results show that the discriminant results of the test sample data by RS-SVM were consistent with the results obtained by uncertainty measure and distance discriminant analysis, the accurate rate is 100%. The collapse hazard analysis model based on rough set and support vector machine can reduce the computation while ensuring the accuracy of evaluation, and better solve the small sample and nonlinear problems, can provide certain a good idea for collapse hazard evaluation in the future.
9
An Intuitionistic Fuzzy Random Vector Functional Link Classifier. Neural Process Lett 2022. [DOI: 10.1007/s11063-022-11043-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
10
Kim M, Hwang KB. An empirical evaluation of sampling methods for the classification of imbalanced data. PLoS One 2022; 17:e0271260. [PMID: 35901023 PMCID: PMC9333262 DOI: 10.1371/journal.pone.0271260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2022] [Accepted: 06/28/2022] [Indexed: 11/18/2022] Open
Abstract
In numerous classification problems, class distribution is not balanced. For example, positive examples are rare in the fields of disease diagnosis and credit card fraud detection. General machine learning methods are known to be suboptimal for such imbalanced classification. One popular solution is to balance training data by oversampling the underrepresented (or undersampling the overrepresented) classes before applying machine learning algorithms. However, despite its popularity, the effectiveness of sampling has not been rigorously and comprehensively evaluated. This study assessed combinations of seven sampling methods and eight machine learning classifiers (56 varieties in total) using 31 datasets with varying degrees of imbalance. We used the areas under the precision-recall curve (AUPRC) and receiver operating characteristics curve (AUROC) as the performance measures. The AUPRC is known to be more informative for imbalanced classification than the AUROC. We observed that sampling significantly changed the performance of the classifier (paired t-tests P < 0.05) only for few cases (12.2% in AUPRC and 10.0% in AUROC). Surprisingly, sampling was more likely to reduce rather than improve the classification performance. Moreover, the adverse effects of sampling were more pronounced in AUPRC than in AUROC. Among the sampling methods, undersampling performed worse than others. Also, sampling was more effective for improving linear classifiers. Most importantly, we did not need sampling to obtain the optimal classifier for most of the 31 datasets. In addition, we found two interesting examples in which sampling significantly reduced AUPRC while significantly improving AUROC (paired t-tests P < 0.05). In conclusion, the applicability of sampling is limited because it could be ineffective or even harmful. Furthermore, the choice of the performance measure is crucial for decision making. Our results provide valuable insights into the effect and characteristics of sampling for imbalanced classification.
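To make the two performance measures concrete, here is a minimal pure-Python sketch. It computes AUROC as the fraction of positive/negative pairs ranked correctly and uses average precision as one common estimate of AUPRC; the study's own estimator may differ. The function names and the toy data are assumptions for illustration, and the sketch assumes distinct scores and at least one example of each class.

```python
def auroc(labels, scores):
    """AUROC as the probability that a positive outscores a negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Ties count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical imbalanced toy set: 2 positives among 10 examples.
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
roc = auroc(labels, scores)              # 15 of 16 pairs ordered correctly
ap = average_precision(labels, scores)   # (1/1 + 2/3) / 2
```

On this toy set a single negative ranked above one positive costs far more in average precision than in AUROC, which illustrates why the two measures can disagree on imbalanced data.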
Affiliation(s)
- Misuk Kim
  - Department of Computer Science and Engineering, Graduate School, Soongsil University, Seoul, Korea
- Kyu-Baek Hwang
  - Department of Computer Science and Engineering, Graduate School, Soongsil University, Seoul, Korea
11
Double-kernelized weighted broad learning system for imbalanced data. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07534-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
12
Cost-Sensitive Metaheuristic Optimization-Based Neural Network with Ensemble Learning for Financial Distress Prediction. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12146918] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Financial distress prediction is crucial in the financial domain because of its implications for banks, businesses, and corporations. Serious financial losses may occur because of poor financial distress prediction. As a result, significant efforts have been made to develop prediction models that can assist decision-makers to anticipate events before they occur and avoid bankruptcy, thereby helping to improve the quality of such tasks. Because of the usual highly imbalanced distribution of data, financial distress prediction is a challenging task. Hence, a wide range of methods and algorithms have been developed over recent decades to address the classification of imbalanced datasets. Metaheuristic optimization-based artificial neural networks have shown exciting results in a variety of applications, as well as classification problems. However, less consideration has been paid to using a cost sensitivity fitness function in metaheuristic optimization-based artificial neural networks to solve the financial distress prediction problem. In this work, we propose ENS_PSONNcost and ENS_CSONNcost: metaheuristic optimization-based artificial neural networks that utilize a particle swarm optimizer and a competitive swarm optimizer and five cost sensitivity fitness functions as the base learners in a majority voting ensemble learning paradigm. Three extremely imbalanced datasets from Spanish, Taiwanese, and Polish companies were considered to avoid dataset bias. The results showed significant improvements in the g-mean (the geometric mean of sensitivity and specificity) metric and the F1 score (the harmonic mean of precision and sensitivity) while maintaining adequately high accuracy.
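The two metrics defined at the end of this abstract can be computed directly from confusion-matrix counts. The sketch below follows those definitions (g-mean as the geometric mean of sensitivity and specificity, F1 as the harmonic mean of precision and sensitivity); the function name and the example counts are hypothetical and not taken from the paper.

```python
import math


def g_mean_and_f1(tp, fp, tn, fn):
    """Return (g_mean, f1) from binary confusion-matrix counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    g_mean = math.sqrt(sensitivity * specificity)
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return g_mean, f1


# Hypothetical imbalanced case: 10 positives, 990 negatives.
g, f1 = g_mean_and_f1(tp=8, fp=40, tn=950, fn=2)
accuracy = (8 + 950) / 1000
```

In this made-up case accuracy is 0.958 while F1 is far lower, which is why accuracy alone is a weak guide on heavily imbalanced data.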
13
Song X, Chen Y, Liang P, Wan X, Cui Y. A novel adaptive boundary weighted and synthetic minority oversampling algorithm for imbalanced datasets. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2022. [DOI: 10.3233/jifs-220937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
In recent years, imbalanced data learning has attracted a lot of attention from academia and industry as a new challenge. To address problems such as imbalances between and within classes, this paper proposes an adaptive boundary weighted synthetic minority oversampling algorithm (ABWSMO) for unbalanced datasets. ABWSMO calculates the sample-space clustering density based on the distribution of the underlying data and the K-Means clustering algorithm, and incorporates local and global weighting strategies to improve the data generation mechanism of the SMOTE algorithm, enhancing the learning of important samples at the boundary of unbalanced datasets and avoiding the unnecessary noise generated by traditional oversampling algorithms. The effectiveness of this sampling algorithm in handling data imbalance is verified experimentally by comparison with five traditional oversampling algorithms on 16 datasets from the UCI database with different imbalance ratios, using 3 classifiers.
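For context, the basic SMOTE step that ABWSMO builds on creates a synthetic minority sample by interpolating between a minority point and one of its minority-class neighbours. The sketch below shows only that plain interpolation step; the clustering density and the local and global weighting described in the abstract are not reproduced. The function name `smote_sample` and the example points are hypothetical.

```python
import random


def smote_sample(x, neighbor, rng=random):
    """Return a synthetic point on the segment between x and neighbor.

    Plain SMOTE interpolation only; ABWSMO's weighting is not modelled here.
    """
    lam = rng.random()  # interpolation factor in [0, 1)
    return [xi + lam * (ni - xi) for xi, ni in zip(x, neighbor)]


# Hypothetical minority point and one of its minority-class neighbours.
rng = random.Random(0)
synthetic = smote_sample([1.0, 2.0], [3.0, 6.0], rng)
```

The synthetic point always lies on the line segment between the two inputs, so it stays inside the region already occupied by the minority class.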
Affiliation(s)
- Xudong Song
  - Big Data & Intelligent System Research Group, Dalian Jiaotong University, Dalian, China
- Yilin Chen
  - Big Data & Intelligent System Research Group, Dalian Jiaotong University, Dalian, China
- Pan Liang
  - Big Data & Intelligent System Research Group, Dalian Jiaotong University, Dalian, China
- Xiaohui Wan
  - Big Data & Intelligent System Research Group, Dalian Jiaotong University, Dalian, China
- Yunxian Cui
  - Big Data & Intelligent System Research Group, Dalian Jiaotong University, Dalian, China
14
Ramp loss KNN-weighted multi-class twin support vector machine. Soft comput 2022. [DOI: 10.1007/s00500-022-07040-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
15
Huang K, Wang X. CCR-GSVM: A boundary data generation algorithm for support vector machine in imbalanced majority noise problem. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03408-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
16
Hazarika BB, Gupta D. 1-Norm random vector functional link networks for classification problems. COMPLEX INTELL SYST 2022. [DOI: 10.1007/s40747-022-00668-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
This paper presents a novel random vector functional link (RVFL) formulation called the 1-norm RVFL (1N RVFL) network for solving binary classification problems. The solution to the optimization problem of 1N RVFL is obtained by solving its exterior dual penalty problem using a Newton technique. The 1-norm makes the model robust and delivers sparse outputs, which is the fundamental advantage of this model. The sparse output indicates that most of the elements in the output matrix are zero; hence, the decision function can be achieved by incorporating fewer hidden nodes compared to the conventional RVFL model. 1N RVFL produces a classifier that is based on a smaller number of input features. To put it another way, this method will suppress the neurons in the hidden layer. Statistical analyses have been carried out on several real-world benchmark datasets. The proposed 1N RVFL is used in this work with two activation functions, viz., ReLU and sine. The classification accuracies of 1N RVFL are compared with the extreme learning machine (ELM), kernel ridge regression (KRR), RVFL, kernel RVFL (K-RVFL) and generalized Lagrangian twin RVFL (GLTRVFL) networks. The experimental results with comparable or better accuracy indicate the effectiveness and usability of 1N RVFL for solving binary classification problems.
17
18
Gupta D, Natarajan N. Prediction of uniaxial compressive strength of rock samples using density weighted least squares twin support vector regression. Neural Comput Appl 2021. [DOI: 10.1007/s00521-021-06204-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 10/21/2022]
19
Hazarika BB, Gupta D, Borah P. An intuitionistic fuzzy kernel ridge regression classifier for binary classification. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107816] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 10/20/2022]
20
Kumar B, Gupta D. Universum based Lagrangian twin bounded support vector machine to classify EEG signals. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2021; 208:106244. [PMID: 34216880 DOI: 10.1016/j.cmpb.2021.106244] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 09/23/2020] [Accepted: 06/15/2021] [Indexed: 06/13/2023]
Abstract
BACKGROUND AND OBJECTIVE The detection of brain-related problems and neurological disorders like epilepsy, sleep disorder, and so on is done by using electroencephalogram (EEG) signals, which contain noisy signals and outliers. Universum data contains a set of samples that do not belong to any of the concerned classes and serves as prior knowledge about the data distribution. Such prior information has been utilized effectively in improving classification performance. Recently a novel universum support vector machine (USVM) was proposed for EEG signal classification and further, a universum twin support vector machine (UTWSVM) was proposed based on USVM to improve the performance. Inspired by USVM and UTWSVM, this paper suggests a novel method called universum based Lagrangian twin bounded support vector machine (ULTBSVM), where universum data is utilized to incorporate the prior information about the data distribution to classify healthy and seizure EEG signals. METHODS In the proposed ULTBSVM the square of the 2-norm of the slack variables is used to make the objective function strongly convex; hence it always gives unique solutions. Unlike twin support vector machine (TWSVM) and universum twin support vector machine (UTWSVM), the proposed ULTBSVM has regularization terms that follow the structural risk minimization (SRM) principle, enhance the stability of the dual formulations, make the model well-posed and prevent overfitting. Here, interictal EEG data have been considered as universum data to classify healthy and seizure signals. Several feature extraction techniques have been implemented to get important noiseless features. RESULTS Several EEG datasets, as well as publicly available UCI datasets, are utilized to assess the performance of the proposed method. The proposed method is compared analytically with USVM and UTWSVM for detecting seizure and healthy signals; for real-world data, ULTBSVM is compared with the universum based models as well as TWSVM, and the proposed method gives better results in most cases compared to the other methods. CONCLUSION The results clearly show that ULTBSVM is a potential method for the classification of EEG signals as well as real-world datasets having interictal data as universum data. Here we have used universum points for the binary class classification problem, but one can extend and use it for multi-class classification problems as well.
Affiliation(s)
- Bikram Kumar
  - Department of Computer Science and Engineering, National Institute of Technology, Arunachal Pradesh 791112, India
- Deepak Gupta
  - Department of Computer Science and Engineering, National Institute of Technology, Arunachal Pradesh 791112, India.
21
Data-driven mechanism based on fuzzy Lagrangian twin parametric-margin support vector machine for biomedical data analysis. Neural Comput Appl 2021. [DOI: 10.1007/s00521-021-05866-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/25/2022]
22
Sensor data-driven structural damage detection based on deep convolutional neural networks and continuous wavelet transform. APPL INTELL 2021. [DOI: 10.1007/s10489-020-02092-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 10/22/2022]