1. Rezvani S, Wu J. Handling Multi-Class Problem by Intuitionistic Fuzzy Twin Support Vector Machines Based on Relative Density Information. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023; 45:14653-14664. PMID: 37651498. DOI: 10.1109/tpami.2023.3310908.
Abstract
The intuitionistic fuzzy twin support vector machine (IFTSVM) merges the idea of the intuitionistic fuzzy set (IFS) with the twin support vector machine (TSVM), which can reduce the negative impact of noise and outliers. However, this technique is not suitable for multi-class problems or high-dimensional feature spaces. Furthermore, the computational complexity of IFTSVM is high because it uses membership and non-membership functions to build a score function. We propose a new version of IFTSVM based on relative density information. The idea is to approximate the probability density distribution in multi-dimensional continuous space by computing the k-nearest-neighbor distance of each training sample. All training points are then evaluated under a one-versus-one-versus-rest strategy to construct the k-class classification hyperplanes, and a coordinate descent scheme is used to reduce the computational complexity of training. A bootstrap technique with 95% confidence intervals and the Friedman test quantify the significance of the performance improvements observed in the numerical evaluations. Experiments on 24 benchmark datasets demonstrate that the proposed method produces promising results compared with other support vector machine models reported in the literature.
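The relative-density step this abstract describes — approximating local density from the k-nearest-neighbor distance of each training sample — can be sketched roughly as follows. This is an illustrative reading of the idea, not the authors' code; function and parameter names are made up for the example.

```python
import numpy as np

def relative_density(X, k=3):
    """Approximate each sample's local density by the inverse of its
    k-nearest-neighbor distance, then normalize by the mean density so
    values are relative to the dataset as a whole."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    np.fill_diagonal(dist, np.inf)        # exclude self-distance
    # Distance to the k-th nearest neighbor of each point.
    dk = np.sort(dist, axis=1)[:, k - 1]
    density = 1.0 / (dk + 1e-12)          # eps avoids division by zero
    return density / density.mean()       # relative to the average density

rng = np.random.default_rng(0)
cluster = rng.normal(0, 0.1, size=(20, 2))   # dense cluster near the origin
outlier = np.array([[5.0, 5.0]])             # isolated point
rd = relative_density(np.vstack([cluster, outlier]), k=3)
# Points inside the cluster get a higher relative density than the outlier,
# which is what lets density-based fuzzy weighting down-weight outliers.
```

For larger datasets the O(n²) distance matrix would be replaced by a k-d tree or ball-tree neighbor search, but the quantity computed is the same.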
2. Fu S, Su D, Li S, Sun S, Tian Y. Linear-exponential loss incorporated deep learning for imbalanced classification. ISA Transactions 2023; 140:279-292. PMID: 37385859. DOI: 10.1016/j.isatra.2023.06.016.
Abstract
The class imbalance issue is a common and persistent topic. When faced with an imbalanced data distribution, conventional methods are prone to classifying minority samples as majority ones, which can have severe consequences in practice. Coping with such problems is crucial yet challenging. In this paper, inspired by our previous work, we bring the linear-exponential (LINEX) loss function from statistics into deep learning for the first time and extend it to a multi-class form, denoted DLINEX. Compared with existing loss functions in class imbalance learning (e.g., the weighted cross-entropy loss and the focal loss), DLINEX has an asymmetric geometric interpretation that can adaptively focus on the minority and hard-to-classify samples by adjusting a single parameter. It also achieves both between-class and within-class diversity by attending to the inherent properties of each instance. As a result, DLINEX achieves 42.08% G-means on the CIFAR-10 dataset at an imbalance ratio of 200, 79.06% G-means on the HAM10000 dataset, 82.74% F1 on the DRIVE dataset, 83.93% F1 on the CHASEDB1 dataset, and 79.55% F1 on the STARE dataset. The quantitative and qualitative experiments convincingly demonstrate that DLINEX works favorably in imbalanced classification, at both the image level and the pixel level.
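The classical one-parameter LINEX loss this abstract builds on has a standard form in statistics, b(e^{aΔ} − aΔ − 1); the multi-class deep-learning variant DLINEX is the paper's own extension and is not reproduced here. A minimal sketch of the base loss, with illustrative parameter names:

```python
import numpy as np

def linex_loss(delta, a=1.0, b=1.0):
    """Linear-exponential (LINEX) loss: b * (exp(a*delta) - a*delta - 1).
    For a > 0 it grows exponentially for positive errors but only roughly
    linearly for negative ones -- the asymmetry that lets one side of the
    error (e.g., misclassified minority samples) be penalized more."""
    delta = np.asarray(delta, dtype=float)
    return b * (np.exp(a * delta) - a * delta - 1.0)

# Asymmetry: an error of +1 costs more than an error of -1.
pos = linex_loss(1.0, a=1.0)   # e - 2, about 0.718
neg = linex_loss(-1.0, a=1.0)  # 1/e, about 0.368
```

Note that the loss is zero at Δ = 0 and convex, so it slots into gradient-based training the same way squared error does, with `a` controlling how lopsided the two sides are.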
Affiliation(s)
- Saiji Fu
- School of Economics and Management, Beijing University of Posts and Telecommunications, No. 10 Xitucheng Road, Haidian District, Beijing, 100876, China.
- Duo Su
- School of Computer Science and Technology, University of Chinese Academy of Sciences, No. 19 (A) Yuquan Road, Shijingshan District, Beijing, 100049, China; Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, No. 80 of Zhongguancun East Road, Haidian District, Beijing, 100190, China; Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, No. 80 of Zhongguancun East Road, Haidian District, Beijing, 100190, China.
- Shilin Li
- School of Mathematics, Renmin University of China, No. 59 Zhongguancun Street, Haidian District, Beijing, 100872, China.
- Shiding Sun
- School of Mathematical Sciences, University of Chinese Academy of Sciences, No. 19 (A) Yuquan Road, Shijingshan District, Beijing, 100049, China; Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, No. 80 of Zhongguancun East Road, Haidian District, Beijing, 100190, China; Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, No. 80 of Zhongguancun East Road, Haidian District, Beijing, 100190, China.
- Yingjie Tian
- School of Economics and Management, University of Chinese Academy of Sciences, No. 80 of Zhongguancun East Road, Haidian District, Beijing, 100190, China; Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, No. 80 of Zhongguancun East Road, Haidian District, Beijing, 100190, China; Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, No. 80 of Zhongguancun East Road, Haidian District, Beijing, 100190, China; MOE Social Science Laboratory of Digital Economic Forecasts and Policy Simulation at UCAS, No. 3 of Zhongguancun South Street 1, Haidian District, Beijing, 100190, China.
3. Dixit A, Jain S. Intuitionistic fuzzy time series forecasting method for non-stationary time series data with suitable number of clusters and different window size for fuzzy rule generation. Information Sciences 2023. DOI: 10.1016/j.ins.2022.12.015.
5. Fu C, Zhou S, Zhang D, Chen L. Relative Density-Based Intuitionistic Fuzzy SVM for Class Imbalance Learning. Entropy (Basel, Switzerland) 2022; 25:34. PMID: 36673175. PMCID: PMC9857943. DOI: 10.3390/e25010034.
Abstract
The support vector machine (SVM) has been combined with the intuitionistic fuzzy set to suppress the negative impact of noise and outliers in classification. However, it has some inherent defects that lead to inaccurate prior-distribution estimates for datasets, especially imbalanced datasets with non-normally distributed data, which further reduces the performance of the classification model in imbalance learning. To solve these problems, we propose a novel relative density-based intuitionistic fuzzy support vector machine (RIFSVM) algorithm for imbalanced learning in the presence of noise and outliers. In the proposed algorithm, the relative density, estimated from k-nearest-neighbor distances, is used to calculate the intuitionistic fuzzy numbers. The fuzzy values of majority-class instances are obtained by multiplying the score function of the intuitionistic fuzzy number by the imbalance ratio, while minority-class instances are assigned the intuitionistic fuzzy membership degree directly. Thanks to the strong ability of relative density to capture prior information and of the intuitionistic fuzzy score function to recognize noise and outliers, the proposed RIFSVM reduces the influence of class imbalance, suppresses the impact of noise and outliers, and further improves classification performance. Experiments on synthetic and public imbalanced datasets show that our approach performs better in terms of G-means, F-measure, and AUC than other class-imbalance classification algorithms.
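The weighting scheme this abstract describes — majority-class weights from a score function scaled by the imbalance ratio, minority-class weights from the membership degree — can be illustrated as below. The score formula (μ − ν) and the direction of the imbalance ratio (minority count over majority count, so majority weights shrink) are common conventions used here as placeholders; the paper's exact definitions should be taken from the paper itself.

```python
import numpy as np

def fuzzy_weights(mu, nu, y, minority_label=1):
    """Per-sample fuzzy weights in the spirit of RIFSVM:
    - minority-class samples keep their membership degree mu;
    - majority-class samples get a score of the intuitionistic fuzzy
      number (placeholder: mu - nu, clipped at 0) scaled by the
      imbalance ratio, shrinking the majority's influence."""
    mu, nu, y = map(np.asarray, (mu, nu, y))
    n_min = (y == minority_label).sum()
    n_maj = len(y) - n_min
    ir = n_min / n_maj                       # < 1 for an imbalanced set
    score = np.clip(mu - nu, 0.0, None)      # placeholder score function
    return np.where(y == minority_label, mu, score * ir)

mu = np.array([0.9, 0.8, 0.7, 0.6])   # membership degrees (illustrative)
nu = np.array([0.05, 0.1, 0.2, 0.3])  # non-membership degrees
y = np.array([1, 0, 0, 0])            # one minority sample, three majority
w = fuzzy_weights(mu, nu, y)
```

These weights would then multiply the per-sample slack penalties in the SVM objective, so a noisy majority point contributes little to the decision boundary.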
Affiliation(s)
- Cui Fu
- School of Mathematics and Statistics, Xidian University, Xi'an 710071, China
- Shuisheng Zhou
- School of Mathematics and Statistics, Xidian University, Xi'an 710071, China
- Dan Zhang
- School of Mathematics and Statistics, Xidian University, Xi'an 710071, China
- Li Chen
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
11. Selecting the Suitable Resampling Strategy for Imbalanced Data Classification Regarding Dataset Properties. An Approach Based on Association Models. Applied Sciences (Basel) 2021. DOI: 10.3390/app11188546.
Abstract
In many application domains, such as medicine, information retrieval, cybersecurity, and social media, the datasets used to induce classification models often have an unequal distribution of instances per class. This situation, known as imbalanced data classification, causes low predictive performance on the minority-class examples, so the prediction model is unreliable even when its overall accuracy is acceptable. Oversampling and undersampling techniques are well-known strategies for dealing with this problem by balancing the number of examples of each class. However, their effectiveness depends on several factors mainly related to intrinsic data characteristics, such as the imbalance ratio, dataset size and dimensionality, overlap between classes, and borderline examples. In this work, the impact of these factors is analyzed through a comprehensive comparative study involving 40 datasets from different application areas. The objective is to obtain models that automatically select the best resampling strategy for any dataset based on its characteristics. These models make it possible to check several factors simultaneously over a wide range of values, since they are induced from varied datasets covering a broad spectrum of conditions. This differs from most studies, which focus on analyzing the characteristics individually or cover only a small range of values. In addition, the study encompasses both basic and advanced resampling strategies, evaluated by means of eight performance metrics, including new measures specifically designed for imbalanced data classification. The general nature of the proposal allows the most appropriate method to be chosen regardless of the domain, avoiding the search for special-purpose techniques that may be valid only for the target data.
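The simplest of the resampling strategies this study compares, random oversampling, can be sketched as follows for the binary case; names here are illustrative, and libraries such as imbalanced-learn provide production versions of this and the more advanced strategies.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Balance a binary dataset by resampling minority instances with
    replacement until both classes have equal counts -- the most basic
    oversampling strategy."""
    rng = np.random.default_rng(seed)
    labels, counts = np.unique(y, return_counts=True)
    minority = labels[counts.argmin()]
    idx_min = np.flatnonzero(y == minority)
    # Draw enough minority indices (with replacement) to match the majority.
    extra = rng.choice(idx_min, size=counts.max() - counts.min(), replace=True)
    keep = np.concatenate([np.arange(len(y)), extra])
    return X[keep], y[keep]

X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)      # imbalance ratio 4:1
Xb, yb = random_oversample(X, y)     # both classes now have 8 samples
```

Undersampling is the mirror image (drop majority samples instead of duplicating minority ones); which of the two works better is exactly the dataset-dependent question the study's association models try to answer.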