1. Khushal R, Fatima U. Fuzzy machine learning logic utilization on hormonal imbalance dataset. Comput Biol Med 2024;174:108429. [PMID: 38631116] [DOI: 10.1016/j.compbiomed.2024.108429]
Abstract
In this work, a novel fuzzy data transformation technique is proposed and applied to a hormonal imbalance dataset. Hormonal imbalance occurs predominantly in females of reproductive age and leads to numerous related medical conditions; Polycystic Ovary Syndrome (PCOS) is one of them. Treatment, together with a healthy lifestyle, is advised to mitigate its consequences for quality of life. The biological "PCOS" dataset provides only a limited, binary outcome, namely whether the syndrome is present or not, and several input variables contain binary responses only. To address this limitation, the proposed fuzzy transformation is applied to these variables, yielding fuzzy-valued features that provide a broader spectrum for diagnosing PCOS; the output variable is transformed accordingly, so the binary classification output becomes three classes. An adaptive fuzzy machine learning logic model is developed in which inference on the transformed biological dataset is performed by machine learning techniques that produce a fuzzy output. The same machine learning techniques are also applied to the untransformed biological dataset, and both implementations are compared by computing the relevant metrics. Machine learning on the untransformed dataset indicates only whether the syndrome is present or absent, whereas machine learning on the fuzzy-transformed dataset provides a broader diagnostic spectrum with a third class indicating that PCOS might be present, which alerts a patient to take preventive measures and minimize the chance of developing the syndrome in the future.
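
The abstract describes mapping binary symptom indicators to fuzzy membership degrees and widening the binary PCOS label into three classes (absent, might be present, present). The Python sketch below only illustrates that general idea and is not the authors' transformation: the membership values, the jitter, the aggregation rule, the thresholds and the classifier are all assumptions chosen for demonstration on synthetic data.

```python
# Illustrative sketch only: the paper's exact fuzzy transformation is not
# reproduced here; membership values, thresholds and classifier are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_bin = rng.integers(0, 2, size=(200, 6))          # six binary symptom indicators
y_bin = (X_bin.sum(axis=1) >= 4).astype(int)       # synthetic binary PCOS label

def fuzzify(X, low=0.2, high=0.8):
    """Map binary responses to fuzzy membership degrees in [0, 1]."""
    # 0 -> low membership, 1 -> high membership, plus a small jitter so the
    # transformed features are no longer strictly binary.
    return np.clip(np.where(X == 1, high, low) + rng.normal(0, 0.05, X.shape), 0, 1)

def three_class_label(X_fuzzy, t_absent=0.4, t_present=0.6):
    """Aggregate memberships into {0: absent, 1: might be present, 2: present}."""
    score = X_fuzzy.mean(axis=1)
    return np.digitize(score, [t_absent, t_present])

X_fuzzy = fuzzify(X_bin)
y_fuzzy = three_class_label(X_fuzzy)

clf_bin = RandomForestClassifier(random_state=0).fit(X_bin, y_bin)      # binary baseline
clf_fuz = RandomForestClassifier(random_state=0).fit(X_fuzzy, y_fuzzy)  # three-class model
print("binary predictions:     ", clf_bin.predict(X_bin[:5]))
print("three-class predictions:", clf_fuz.predict(X_fuzzy[:5]))
```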
Affiliation(s)
- Rabia Khushal, Department of Mathematics, NED University of Engineering & Technology, Pakistan.
- Ubaida Fatima, Department of Mathematics, NED University of Engineering & Technology, Pakistan.

2. Rebalance Weights AdaBoost-SVM Model for Imbalanced Data. Comput Intell Neurosci 2023. [DOI: 10.1155/2023/4860536]
Abstract
Classification of imbalanced data is a challenging task that has attracted considerable interest in numerous scientific fields owing to the great practical value of minority-class accuracy. Some methods for improving generalization performance have been developed to address this classification situation. Here, we propose a cost-sensitive ensemble learning method that uses a support vector machine as the base learner of AdaBoost for classifying imbalanced data. Because existing methods have not been well studied in terms of how to precisely control the classification accuracy of the minority class, we developed a novel way to rebalance the AdaBoost weights, which in turn influence base learner training. This weighting strategy increases the sample weights of misclassified minority examples while decreasing those of misclassified majority examples until their distributions are even in each round. Furthermore, we include the P-mean as one of the assessment metrics and discuss why it is necessary. Experiments compare the proposed model with 10 comparison models on 18 datasets in terms of six different metrics, and a statistical study of the comprehensive experimental findings verifies the efficacy and usability of the proposed model.
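
As a rough sketch of cost-sensitive boosting with an SVM base learner, the loop below upweights misclassified minority samples more strongly than misclassified majority samples. The boost factor, number of rounds and kernel settings are assumptions for illustration and do not reproduce the authors' exact rebalancing rule.

```python
# Simplified cost-sensitive AdaBoost with an SVM base learner. The asymmetric
# weight update (misclassified minority samples get a larger boost) only
# illustrates the general idea, not the paper's exact weighting scheme.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, weights=[0.9, 0.1], random_state=0)
y_pm = np.where(y == 1, 1, -1)                 # minority class encoded as +1
w = np.full(len(y), 1.0 / len(y))              # initial sample weights
learners, alphas = [], []

for _ in range(10):
    clf = SVC(kernel="rbf", gamma="scale").fit(X, y_pm, sample_weight=w)
    pred = clf.predict(X)
    err = np.clip(np.sum(w * (pred != y_pm)) / np.sum(w), 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)
    # Asymmetric update: misclassified minority samples (true label +1) are
    # boosted more than misclassified majority samples (true label -1).
    boost = np.where(y_pm == 1, 1.5, 1.0)
    w *= np.exp(-alpha * y_pm * pred * np.where(pred != y_pm, boost, 1.0))
    w /= w.sum()
    learners.append(clf)
    alphas.append(alpha)

score = sum(a * c.predict(X) for a, c in zip(alphas, learners))
print("training accuracy:", np.mean(np.sign(score) == y_pm))
```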

3. Online Learning Approach Based on Recursive Formulation for Twin Support Vector Machine and Sparse Pinball Twin Support Vector Machine. Neural Process Lett 2022. [DOI: 10.1007/s11063-022-11084-1]

4. Distance-based arranging oversampling technique for imbalanced data. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07828-8]

5. Frequency component Kernel for SVM. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07632-4]

6. Zhou K, Zhang Q, Li J. TSVMPath: Fast Regularization Parameter Tuning Algorithm for Twin Support Vector Machine. Neural Process Lett 2022. [DOI: 10.1007/s11063-022-10870-1]

7. Pan H, Sheng L, Xu H, Tong J, Zheng J, Liu Q. Pinball transfer support matrix machine for roller bearing fault diagnosis under limited annotation data. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109209]

8. Xie F, Xu Y, Ma M, Pang X. A safe acceleration method for multi-task twin support vector machine. Int J Mach Learn Cybern 2022. [DOI: 10.1007/s13042-021-01481-8]

9. Inequality distance hyperplane multiclass support vector machines. Int J Intell Syst 2021. [DOI: 10.1002/int.22764]

11. Laxmi S, Gupta SK, Kumar S. Intuitionistic fuzzy proximal support vector machine for multicategory classification problems. Soft Comput 2021. [DOI: 10.1007/s00500-021-06193-3]

12. Zhang B, Shang P. Cumulative Permuted Fractional Entropy and its Applications. IEEE Trans Neural Netw Learn Syst 2021;32:4946-4955. [PMID: 33021947] [DOI: 10.1109/tnnls.2020.3026424]
Abstract
Fractional calculus and entropy are two essential mathematical tools, and their concepts support a productive interplay in the study of system dynamics and machine learning. In this article, we modify the fractional entropy and propose the cumulative permuted fractional entropy (CPFE). A theoretical analysis is provided to prove that CPFE not only meets the basic properties of the Shannon entropy but also has unique characteristics of its own. We apply it to typical discrete distributions, simulated data, and real-world data to demonstrate its efficiency in practice. This article demonstrates that CPFE can measure the complexity and uncertainty of complex systems, so it can support reliable and accurate classification. Finally, we introduce CPFE to support vector machines (SVMs), obtaining CPFE-SVM. CPFE can be used to preprocess data so that irregular data become linearly separable. Compared with five other state-of-the-art algorithms, CPFE-SVM has significantly higher accuracy and a lower computational burden; it is therefore especially suitable for the classification of irregular large-scale datasets and is also insensitive to noise. Implications of the results and future research directions are also presented.
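
The abstract's CPFE-SVM pipeline computes an entropy-based representation of the data and feeds it to an SVM. The sketch below uses ordinary permutation entropy of ordinal patterns as the feature, not the paper's cumulative permuted fractional entropy, purely to illustrate that kind of pipeline; the synthetic series and the chosen pattern orders are assumptions.

```python
# Illustration of an "entropy feature -> SVM" pipeline using standard
# permutation entropy. This is NOT the paper's CPFE.
import numpy as np
from itertools import permutations
from sklearn.svm import SVC

def permutation_entropy(x, order=3):
    """Normalised Shannon entropy of ordinal patterns of length `order`."""
    patterns = list(permutations(range(order)))
    counts = np.zeros(len(patterns))
    for i in range(len(x) - order + 1):
        counts[patterns.index(tuple(np.argsort(x[i:i + order])))] += 1
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p)) / np.log(len(patterns))

rng = np.random.default_rng(0)
# Two synthetic classes: noisy sine waves (more regular) vs. white noise.
series = [np.sin(np.linspace(0, 20, 300)) + rng.normal(0, 0.3, 300) for _ in range(50)]
series += [rng.normal(0, 1, 300) for _ in range(50)]
labels = np.array([0] * 50 + [1] * 50)

features = np.array([[permutation_entropy(s, o) for o in (3, 4, 5)] for s in series])
clf = SVC(kernel="rbf").fit(features, labels)
print("training accuracy:", clf.score(features, labels))
```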

13. Selecting the Suitable Resampling Strategy for Imbalanced Data Classification Regarding Dataset Properties. An Approach Based on Association Models. Appl Sci (Basel) 2021. [DOI: 10.3390/app11188546]
Abstract
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class. This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples. Thus, the prediction model is unreliable although the overall model accuracy can be acceptable. Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class. However, their effectiveness depends on several factors mainly related to data intrinsic characteristics, such as imbalance ratio, dataset size and dimensionality, overlapping between classes or borderline examples. In this work, the impact of these factors is analyzed through a comprehensive comparative study involving 40 datasets from different application areas. The objective is to obtain models for automatic selection of the best resampling strategy for any dataset based on its characteristics. These models allow us to check several factors simultaneously considering a wide range of values since they are induced from very varied datasets that cover a broad spectrum of conditions. This differs from most studies that focus on the individual analysis of the characteristics or cover a small range of values. In addition, the study encompasses both basic and advanced resampling strategies that are evaluated by means of eight different performance metrics, including new measures specifically designed for imbalanced data classification. The general nature of the proposal allows the choice of the most appropriate method regardless of the domain, avoiding the search for special purpose techniques that could be valid for the target data.
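
As a small, self-contained illustration of comparing resampling strategies of the kind the study evaluates, the sketch below (assuming the third-party imbalanced-learn package is installed) fits the same classifier after different resampling steps and reports one imbalance-aware metric. The synthetic dataset, the handful of strategies and the single metric are stand-ins for the 40 datasets, advanced methods and eight metrics of the study.

```python
# Compare basic resampling strategies on a synthetic imbalanced dataset.
# Requires the third-party `imbalanced-learn` package (imblearn).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import balanced_accuracy_score
from imblearn.over_sampling import SMOTE, RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler

X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

strategies = {
    "none": None,
    "random-over": RandomOverSampler(random_state=0),
    "SMOTE": SMOTE(random_state=0),
    "random-under": RandomUnderSampler(random_state=0),
}
for name, sampler in strategies.items():
    X_res, y_res = (X_tr, y_tr) if sampler is None else sampler.fit_resample(X_tr, y_tr)
    clf = DecisionTreeClassifier(random_state=0).fit(X_res, y_res)
    bacc = balanced_accuracy_score(y_te, clf.predict(X_te))
    print(f"{name:12s} balanced accuracy = {bacc:.3f}")
```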

14. Kumar B, Gupta D. Universum based Lagrangian twin bounded support vector machine to classify EEG signals. Comput Methods Programs Biomed 2021;208:106244. [PMID: 34216880] [DOI: 10.1016/j.cmpb.2021.106244]
Abstract
BACKGROUND AND OBJECTIVE: Brain-related problems and neurological disorders such as epilepsy and sleep disorders are detected using electroencephalogram (EEG) signals, which contain noisy segments and outliers. Universum data consist of samples that do not belong to any of the concerned classes and serve as prior knowledge about the data distribution; such prior information has been used effectively to improve classification performance. Recently, a universum support vector machine (USVM) was proposed for EEG signal classification, and a universum twin support vector machine (UTWSVM) based on USVM was subsequently proposed to improve performance. Inspired by USVM and UTWSVM, this paper suggests a universum based Lagrangian twin bounded support vector machine (ULTBSVM), in which universum data incorporate prior information about the data distribution to classify healthy and seizure EEG signals.
METHODS: In the proposed ULTBSVM, the square of the 2-norm of the slack variables is used to make the objective function strongly convex, so it always yields a unique solution. Unlike the twin support vector machine (TWSVM) and UTWSVM, the proposed ULTBSVM has regularization terms that follow the structural risk minimization (SRM) principle, enhance the stability of the dual formulations, make the model well posed, and prevent overfitting. Here, interictal EEG data are taken as universum data to classify healthy and seizure signals. Several feature extraction techniques have been implemented to obtain informative, noise-free features.
RESULTS: Several EEG datasets, as well as publicly available UCI datasets, are used to assess the performance of the proposed method. An analytical comparison of the proposed method with USVM and UTWSVM is performed for detecting seizure and healthy signals; for real-world data, ULTBSVM is compared with universum based models as well as TWSVM, and the proposed method gives better results than the other methods in most cases.
CONCLUSION: The results clearly show that ULTBSVM is a promising method for classifying EEG signals as well as real-world datasets when interictal data are used as universum data. Universum points are used here for the binary classification problem, but the approach can be extended to multi-class classification problems as well.
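
For context, the standard TWSVM primal problems on which twin bounded and universum variants such as ULTBSVM build can be written as follows, with $A$ and $B$ holding the samples of the two classes and $e_1$, $e_2$ denoting vectors of ones; the additional universum constraints and regularization terms of the proposed ULTBSVM are not reproduced here.

$$
\min_{w_1, b_1, \xi} \ \tfrac{1}{2}\lVert A w_1 + e_1 b_1 \rVert^2 + c_1 e_2^{\top}\xi
\quad \text{s.t.} \quad -(B w_1 + e_2 b_1) + \xi \ge e_2,\ \xi \ge 0,
$$

$$
\min_{w_2, b_2, \eta} \ \tfrac{1}{2}\lVert B w_2 + e_2 b_2 \rVert^2 + c_2 e_1^{\top}\eta
\quad \text{s.t.} \quad (A w_2 + e_1 b_2) + \eta \ge e_1,\ \eta \ge 0.
$$

A new point is assigned to the class whose hyperplane $x^{\top} w_i + b_i = 0$ lies nearer.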
Affiliation(s)
- Bikram Kumar, Department of Computer Science and Engineering, National Institute of Technology, Arunachal Pradesh 791112, India.
- Deepak Gupta, Department of Computer Science and Engineering, National Institute of Technology, Arunachal Pradesh 791112, India.

15.
Abstract
The twin support vector machine improves the classification performance of the support vector machine by solving two small quadratic programming problems. However, this method has the following defects: (1) For the twin support vector machine and some of its variants, the constructed models use a hinge loss function, which is sensitive to noise and unstable in resampling. (2) The models need to be converted from the original space to the dual space, and their time complexity is high. To further enhance the performance of the twin support vector machine, the pinball loss function is introduced into the twin bounded support vector machine, and the problem of the pinball loss function not being differentiable at zero is solved by constructing a smooth approximation function. Based on this, a smooth twin bounded support vector machine model with pinball loss is obtained. The model is solved iteratively in the original space using the Newton-Armijo method. A smooth twin bounded support vector machine algorithm with pinball loss is proposed, and the convergence of the iterative algorithm is proven theoretically. In the experiments, the proposed algorithm is validated on UCI datasets and artificial datasets. Furthermore, the performance of the presented algorithm is compared with those of other representative algorithms, thereby demonstrating the effectiveness of the proposed algorithm.
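
For reference, one common form of the pinball loss used in pin-SVM style classifiers, and a standard smoothing of the plus function that makes Newton-type methods such as Newton-Armijo applicable, are given below; this is a textbook sketch and not necessarily the exact smooth approximation constructed in the cited work.

$$
L_\tau(u) = \begin{cases} u, & u \ge 0, \\ -\tau u, & u < 0, \end{cases} \qquad \tau \in [0, 1],
$$

which reduces to the hinge loss at $\tau = 0$ and can be decomposed as $L_\tau(u) = (u)_+ + \tau(-u)_+$. Each plus function may then be replaced by the differentiable approximation

$$
p(u, \alpha) = u + \frac{1}{\alpha}\ln\!\left(1 + e^{-\alpha u}\right), \qquad \alpha > 0,
$$

which converges to $(u)_+$ as $\alpha \to \infty$.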

16. Xu W, Huang D, Zhou S. Universal consistency of twin support vector machines. Int J Mach Learn Cybern 2021. [DOI: 10.1007/s13042-021-01281-0]
Abstract
A classification problem aims at constructing the best classifier, the one with the smallest risk. When the sample size approaches infinity, learning algorithms for a classification problem are characterized by an asymptotic property, universal consistency. It plays a crucial role in assessing the construction of classification rules. A universally consistent algorithm ensures that the larger the sample size, the more accurately the distribution of the samples can be reconstructed. Support vector machines (SVMs) are regarded as one of the most important models for binary classification problems. How to effectively extend SVMs to twin support vector machines (TWSVMs) so as to improve classification performance has gained increasing interest in many research areas recently, and many variants of TWSVMs have been proposed and used in practice. In this paper, we therefore focus on the universal consistency of TWSVMs in a binary classification setting. We first give a general framework for TWSVM classifiers that unifies most of the variants of TWSVMs for binary classification problems. Based on it, we then investigate the universal consistency of TWSVMs. To do this, we give definitions of risk, Bayes risk and universal consistency for TWSVMs. Theoretical results indicate that universal consistency holds for various TWSVM classifiers under certain conditions involving the covering number, localized covering number and stability. As applications of our general framework, several variants of TWSVMs are considered.
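
The standard notions of risk, Bayes risk and universal consistency referred to in the abstract can be stated as follows; the TWSVM-specific risk functionals defined in the paper are not reproduced here.

$$
R(f) = \mathbb{P}\big(f(X) \neq Y\big), \qquad R^{*} = \inf_{f} R(f),
$$

and a learning rule producing classifiers $f_n$ from $n$ i.i.d. samples is universally consistent if $R(f_n) \to R^{*}$ as $n \to \infty$ for every distribution of $(X, Y)$.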

17. Xie F, Pang X, Xu Y. Pinball loss-based multi-task twin support vector machine and its safe acceleration method. Neural Comput Appl 2021. [DOI: 10.1007/s00521-021-06173-6]

18. Wang Z, Tsai CF, Lin WC. Data cleaning issues in class imbalanced datasets: instance selection and missing values imputation for one-class classifiers. Data Technol Appl 2021. [DOI: 10.1108/dta-01-2021-0027]
Abstract
Purpose: Class imbalance learning, which exists in many domain problem datasets, is an important research topic in data mining and machine learning. One-class classification techniques, which aim to identify anomalies as the minority class from the normal data as the majority class, are one representative solution for class imbalanced datasets. Since one-class classifiers are trained using only normal data to create a decision boundary for later anomaly detection, the quality of the training set, i.e. the majority class, is one key factor that affects the performance of one-class classifiers.
Design/methodology/approach: In this paper, we focus on two data cleaning or preprocessing methods to address class imbalanced datasets. The first method examines whether performing instance selection to remove some noisy data from the majority class can improve the performance of one-class classifiers. The second method combines instance selection and missing value imputation, where the latter is used to handle incomplete datasets that contain missing values.
Findings: The experimental results are based on 44 class imbalanced datasets; three instance selection algorithms, including IB3, DROP3 and the GA, the CART decision tree for missing value imputation, and three one-class classifiers, which include OCSVM, IFOREST and LOF, show that if the instance selection algorithm is carefully chosen, performing this step could improve the quality of the training data, which makes one-class classifiers outperform the baselines without instance selection. Moreover, when class imbalanced datasets contain some missing values, combining missing value imputation and instance selection, regardless of which step is first performed, can maintain similar data quality as datasets without missing values.
Originality/value: The novelty of this paper is to investigate the effect of performing instance selection on the performance of one-class classifiers, which has never been done before. Moreover, this study is the first attempt to consider the scenario of missing values that exist in the training set for training one-class classifiers. In this case, performing missing value imputation and instance selection with different orders are compared.
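
A rough sketch of the pipeline the abstract studies (impute missing values, select instances from the majority class, then train a one-class classifier) is given below. The components are stand-ins chosen from scikit-learn: SimpleImputer instead of CART-based imputation and a LocalOutlierFactor filter instead of the IB3/DROP3/GA instance selection algorithms.

```python
# Impute missing values, clean the majority (normal) class, then train a
# one-class classifier on it. Stand-ins are used for the paper's components.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1500, weights=[0.95, 0.05], random_state=0)
X[rng.random(X.shape) < 0.05] = np.nan                 # inject 5% missing values

X = SimpleImputer(strategy="median").fit_transform(X)  # missing-value imputation
X_norm = X[y == 0]                                     # majority (normal) class only

keep = LocalOutlierFactor(n_neighbors=20).fit_predict(X_norm) == 1  # instance selection
ocsvm = OneClassSVM(nu=0.05, gamma="scale").fit(X_norm[keep])

scores = -ocsvm.decision_function(X)                   # higher score = more anomalous
print("ROC AUC against the minority class:", round(roc_auc_score(y, scores), 3))
```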

19. Borah P, Gupta D. Robust twin bounded support vector machines for outliers and imbalanced data. Appl Intell 2021. [DOI: 10.1007/s10489-020-01847-5]

20. State of Charge Estimation for Lithium-Ion Power Battery Based on H-Infinity Filter Algorithm. Appl Sci (Basel) 2020. [DOI: 10.3390/app10186371]
Abstract
To accurately estimate the state of charge (SOC) of lithium-ion power batteries in the presence of battery-model errors or unknown external noise, an SOC estimation method based on the H-infinity filter (HIF) algorithm is proposed in this paper. Firstly, a fractional-order battery model based on a dual polarization equivalent circuit model is established. Then, the parameters of the fractional-order battery model are identified by a hybrid particle swarm optimization (HPSO) algorithm based on a genetic crossover factor. Finally, the accuracy of the SOC estimation results for the lithium-ion batteries, using the HIF algorithm and the extended Kalman filter (EKF) algorithm, is verified and compared under three conditions: uncertain measurement accuracy, uncertain initial SOC value, and uncertain application conditions. The simulation results show that the HIF-based SOC estimation method keeps the SOC estimation error within ±0.02 in every case and is only slightly affected by environmental and other factors. It provides a way to improve the accuracy of SOC estimation in a battery management system.
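
The sketch below implements one common textbook form of the discrete a-priori H-infinity filter recursion on a toy two-state linear system. It is not the paper's fractional-order dual polarization battery model, and the system matrices, noise levels and performance bound theta are assumptions chosen only to show the structure of the recursion.

```python
# Generic discrete a-priori H-infinity filter on a toy 2-state linear system
# (textbook form); NOT the paper's fractional-order battery model.
import numpy as np

F = np.array([[1.0, 0.1], [0.0, 0.95]])      # assumed state transition matrix
H = np.array([[1.0, 0.0]])                    # only the first state is measured
Q = 0.01 * np.eye(2)
R = np.array([[0.1]])
S = np.eye(2)                                 # weight on the estimation error
theta = 0.01                                  # performance bound (kept small)

rng = np.random.default_rng(0)
x = np.array([1.0, 0.0])                      # true state
x_hat = np.zeros(2)                           # filter estimate
P = np.eye(2)

for k in range(100):
    y = H @ x + rng.normal(0, np.sqrt(R[0, 0]), 1)                  # noisy measurement
    M = np.linalg.inv(np.eye(2) - theta * S @ P + H.T @ np.linalg.inv(R) @ H @ P)
    K = P @ M @ H.T @ np.linalg.inv(R)                              # filter gain
    x_hat = F @ x_hat + F @ K @ (y - H @ x_hat)                     # a-priori estimate update
    P = F @ P @ M @ F.T + Q
    x = F @ x + rng.multivariate_normal(np.zeros(2), Q)             # propagate true state

print("final estimation error:", np.round(x - x_hat, 3))
```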

21. Borah P, Gupta D. Unconstrained convex minimization based implicit Lagrangian twin extreme learning machine for classification (ULTELMC). Appl Intell 2020. [DOI: 10.1007/s10489-019-01596-0]

22. Cho P, Lee M, Chang W. Instance-based entropy fuzzy support vector machine for imbalanced data. Pattern Anal Appl 2019. [DOI: 10.1007/s10044-019-00851-x]

23. Borah P, Gupta D. Functional iterative approaches for solving support vector classification problems based on generalized Huber loss. Neural Comput Appl 2019. [DOI: 10.1007/s00521-019-04436-x]

24. Unconstrained convex minimization based implicit Lagrangian twin random vector functional-link networks for binary classification (ULTRVFLC). Appl Soft Comput 2019. [DOI: 10.1016/j.asoc.2019.105534]

25. Entropy-Based Fuzzy Least Squares Twin Support Vector Machine for Pattern Classification. Neural Process Lett 2019. [DOI: 10.1007/s11063-019-10078-w]