1. ESMOTE: an overproduce-and-choose synthetic examples generation strategy based on evolutionary computation. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-08004-8]
3. García-Pedrajas N, Cerruela-García G. MABUSE: A margin optimization based feature subset selection algorithm using boosting principles. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109529]
4. Gong C, Su ZG, Wang PH, Wang Q, You Y. Evidential instance selection for K-nearest neighbor classification of big data. Int J Approx Reason 2021. [DOI: 10.1016/j.ijar.2021.08.006]
5. Comparison of Instance Selection and Construction Methods with Various Classifiers. Appl Sci (Basel) 2020. [DOI: 10.3390/app10113933]
Abstract
Instance selection and construction methods were originally designed to improve the performance of the k-nearest neighbors classifier by increasing its speed and improving its classification accuracy. These goals were achieved by eliminating redundant and noisy samples, thus reducing the size of the training set. In this paper, the performance of instance selection methods is investigated in terms of classification accuracy and reduction of training set size. The classification accuracy of the following classifiers is evaluated: decision trees, random forest, Naive Bayes, linear model, support vector machine, and k-nearest neighbors. The obtained results indicate that for most of the classifiers compressing the training set affects prediction performance, and only a small group of instance selection methods can be recommended as a general-purpose preprocessing step. These are the learning vector quantization based algorithms, along with Drop2 and Drop3. The other methods are less efficient or provide a low compression ratio.
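As a concrete illustration of the noise-filtering side of instance selection described in this abstract, here is a minimal pure-Python sketch of Wilson's Edited Nearest Neighbour (ENN) rule. It is a classic method of the same family, not code from this paper, and the toy data set is invented for the example:

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority label among the k nearest training points (squared Euclidean)."""
    neighbors = sorted(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], query)))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def enn_select(train, k=3):
    """Wilson's ENN: drop every instance misclassified by its k nearest peers."""
    kept = []
    for i, (x, y) in enumerate(train):
        others = train[:i] + train[i + 1:]  # leave-one-out neighborhood
        if knn_predict(others, x, k) == y:
            kept.append((x, y))
    return kept

# Two well-separated clusters plus one mislabeled point inside the first cluster.
data = [((0.0, 0.0), "a"), ((0.1, 0.0), "a"), ((0.0, 0.1), "a"), ((0.1, 0.1), "a"),
        ((5.0, 5.0), "b"), ((5.1, 5.0), "b"), ((5.0, 5.1), "b"),
        ((0.05, 0.05), "b")]  # noisy instance
reduced = enn_select(data, k=3)
```

ENN drops the mislabeled point sitting inside the "a" cluster while keeping all clean instances, which is exactly the noise-removal behavior that benefits the downstream classifiers compared in the paper.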
6. Zhu Z, Wang Z, Li D, Du W. NearCount: Selecting critical instances based on the cited counts of nearest neighbors. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2019.105196]
7. A parameter-free hybrid instance selection algorithm based on local sets with natural neighbors. Appl Intell 2020. [DOI: 10.1007/s10489-019-01598-y]
8. Feng W, Dauphin G, Huang W, Quan Y, Liao W. New margin-based subsampling iterative technique in modified random forests for classification. Knowl Based Syst 2019. [DOI: 10.1016/j.knosys.2019.07.016]
9. Kasemtaweechok C, Suwannik W. Adaptive geometric median prototype selection method for k-nearest neighbors classification. Intell Data Anal 2019. [DOI: 10.3233/ida-184190]
10. Yang L, Zhu Q, Huang J, Wu Q, Cheng D, Hong X. Constraint nearest neighbor for instance reduction. Soft Comput 2019. [DOI: 10.1007/s00500-019-03865-z]
11. Rodriguez-Mier P, Mucientes M, Bugarín A. Feature Selection and Evolutionary Rule Learning for Big Data in Smart Building Energy Management. Cognit Comput 2019. [DOI: 10.1007/s12559-019-09630-6]
12. Kordos M, Łapa K. Multi-Objective Evolutionary Instance Selection for Regression Tasks. Entropy 2018; 20:e20100746. [PMID: 33265835] [PMCID: PMC7512309] [DOI: 10.3390/e20100746]
Abstract
The purpose of instance selection is to reduce the data size while preserving as much of the useful information stored in the data as possible, and to detect and remove erroneous and redundant information. In this work, we analyze instance selection in regression tasks, applying the NSGA-II multi-objective evolutionary algorithm to direct the search for the optimal subset of the training dataset and the k-NN algorithm to evaluate the solutions during the selection process. A key advantage of the method is that it yields a pool of solutions situated on the Pareto front, each of which is the best for a certain RMSE-compression balance. We discuss the different parameters of the process and their influence on the results, and put special effort into reducing the computational complexity of our approach. The experimental evaluation shows that the proposed method achieves good performance in terms of minimizing both prediction error and dataset size.
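The Pareto-front idea at the heart of this method can be illustrated with a plain non-dominated filter over the two objectives, RMSE and the fraction of the training set retained. The candidate values below are invented, and this sketches only the dominance test, not NSGA-II itself:

```python
def pareto_front(solutions):
    """Keep the non-dominated solutions when minimizing both objectives.

    Each solution is (rmse, retained_fraction); a dominates b when it is
    no worse on both objectives and differs from b.
    """
    def dominates(a, b):
        return a[0] <= b[0] and a[1] <= b[1] and a != b
    return [s for s in solutions if not any(dominates(t, s) for t in solutions)]

# Hypothetical (RMSE, fraction of training set kept) pairs for candidate subsets.
candidates = [(0.30, 0.10), (0.25, 0.20), (0.24, 0.60), (0.22, 0.55), (0.40, 0.05)]
front = sorted(pareto_front(candidates))
```

Here (0.24, 0.60) is dominated by (0.22, 0.55), which is both more accurate and smaller, so it falls off the front; the remaining four solutions each represent a different error-compression trade-off, which is the pool of solutions the abstract refers to.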
Affiliation(s)
- Mirosław Kordos
- Department of Computer Science and Automatics, University of Bielsko-Biała, ul. Willowa 2, 43-309 Bielsko-Biała, Poland
- Krystian Łapa
- Institute of Computational Intelligence, Częstochowa University of Technology, 42-201 Częstochowa, Poland
15. Class Imbalance Ensemble Learning Based on the Margin Theory. Appl Sci (Basel) 2018. [DOI: 10.3390/app8050815]
16. Instance Selection for Classifier Performance Estimation in Meta Learning. Entropy 2017. [DOI: 10.3390/e19110583]
17. Prototype Generation Using Self-Organizing Maps for Informativeness-Based Classifier. Comput Intell Neurosci 2017; 2017:4263064. [PMID: 28811818] [PMCID: PMC5547710] [DOI: 10.1155/2017/4263064]
Abstract
The k-nearest neighbor (kNN) rule is one of the most important and simple procedures for data classification. It requires only two parameters: the value of k and a similarity measure. However, the algorithm has weaknesses that hinder its use in real-world problems. Since it builds no model, classifying an object requires an exhaustive comparison against the entire training dataset. Another weakness is the choice of an optimal k when the analyzed object lies in an overlap region. To mitigate these negative aspects, this work proposes a hybrid algorithm that combines the Self-Organizing Map (SOM) artificial neural network with a classifier whose similarity measure is based on informativeness. Because the SOM has vector quantization properties, it is used as a Prototype Generation approach to produce a reduced training dataset for the classifier based on the nearest neighbor rule with an informativeness measure, named iNN. The SOMiNN combination was evaluated exhaustively, and the results show that the proposed approach achieves competitive accuracy on databases where the object classes in the border region are not well defined.
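To make the vector-quantization role of the SOM concrete, the following is a minimal 1-D SOM sketch in plain Python. The network size, decay schedule, and toy data are invented for illustration; the paper's SOM and its iNN classifier are more elaborate:

```python
import math
import random

def train_som(data, n_units=2, epochs=100, lr=0.5, radius=1.0, seed=0):
    """Minimal 1-D SOM: units lie on a line; the winner and its line
    neighbors move toward each sample, so the codebook quantizes the data."""
    rng = random.Random(seed)
    units = [list(rng.choice(data)) for _ in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        for x in data:
            # Best-matching unit by squared Euclidean distance.
            win = min(range(n_units),
                      key=lambda i: sum((u - v) ** 2 for u, v in zip(units[i], x)))
            for i, unit in enumerate(units):
                # Gaussian neighborhood on the 1-D lattice; both the
                # neighborhood radius and the learning rate decay linearly.
                h = math.exp(-((i - win) ** 2) / (2 * max(radius * (1 - frac), 1e-3) ** 2))
                step = lr * (1 - frac) * h
                for d in range(len(unit)):
                    unit[d] += step * (x[d] - unit[d])
    return units

# Two clusters; the trained units should settle near them.
data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.2), (5.0, 5.0), (5.2, 5.1), (5.1, 5.2)]
prototypes = train_som(data, n_units=2, epochs=100)
```

After training, each unit settles near one cluster; in a prototype generation pipeline the units would then be labeled, for instance by the majority class of the training samples they win.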
18. Arnaiz-González Á, Díez-Pastor JF, Rodríguez JJ, García-Osorio C. Instance selection of linear complexity for big data. Knowl Based Syst 2016. [DOI: 10.1016/j.knosys.2016.05.056]
19. Rodríguez-Fdez I, Mucientes M, Bugarín A. FRULER: Fuzzy Rule Learning through Evolution for Regression. Inf Sci (N Y) 2016. [DOI: 10.1016/j.ins.2016.03.012]
20. Arnaiz-González Á, Díez-Pastor JF, Rodríguez JJ, García-Osorio C. Instance selection for regression: Adapting DROP. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2016.04.003]
21. Stimuli-Magnitude-Adaptive Sample Selection for Data-Driven Haptic Modeling. Entropy 2016. [DOI: 10.3390/e18060222]
22. Xia W, Mita Y, Shibata T. A Nearest Neighbor Classifier Employing Critical Boundary Vectors for Efficient On-Chip Template Reduction. IEEE Trans Neural Netw Learn Syst 2016; 27:1094-1107. [PMID: 26080388] [DOI: 10.1109/tnnls.2015.2437901]
Abstract
Aiming at efficient data condensation and improved accuracy, this paper presents a hardware-friendly template reduction (TR) method for nearest neighbor (NN) classifiers by introducing the concept of critical boundary vectors. A hardware system is also implemented to demonstrate the feasibility of using a field-programmable gate array (FPGA) to accelerate the proposed method. Initially, k-means centers are used as substitutes for the entire template set. Then, to enhance the classification performance, critical boundary vectors are selected by a novel learning algorithm that completes within a single iteration. Moreover, to remove noisy boundary vectors that can mislead the classification, a global categorization scheme has been explored and applied: it automatically categorizes each classification problem and rapidly selects the boundary vectors according to the nature of the problem. Finally, only the critical boundary vectors and the k-means centers are used as the new template set for classification. Experimental results on 24 data sets show that the proposed algorithm can effectively reduce the number of template vectors for classification with a high learning speed. At the same time, it improves the accuracy by an average of 2.17% compared with traditional NN classifiers and also shows greater accuracy than seven other TR methods. We have shown the feasibility of accelerating the method in hardware with a proof-of-concept FPGA system of 256 64-D vectors. At a 50-MHz clock frequency, the proposed system achieves a 3.86 times higher learning speed than a 3.4-GHz PC, while consuming only 1% of the power used by the PC.
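The first stage described above, replacing each class's full template set with its k-means centers, can be sketched in a few lines of plain Python. This is Lloyd's algorithm on invented toy data; the paper's boundary-vector selection stage is not reproduced here:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on lists of coordinate tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(centers[i], p)))
            clusters[j].append(p)
        # Recompute centers as cluster means; keep old center if a cluster empties.
        centers = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers

def condense_per_class(train, k=1):
    """Replace each class's templates with that class's k-means centers."""
    by_label = {}
    for x, y in train:
        by_label.setdefault(y, []).append(x)
    return [(c, y) for y, xs in by_label.items() for c in kmeans(xs, k)]

train = [((0.0, 0.0), "a"), ((0.2, 0.0), "a"), ((0.1, 0.3), "a"),
         ((4.0, 4.0), "b"), ((4.2, 4.0), "b"), ((4.1, 4.3), "b")]
templates = condense_per_class(train, k=1)
```

With k=1 per class, the six templates collapse to two class centroids, which is the kind of drastic condensation the boundary vectors are then added to repair near the decision border.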
23. Tutorial on practical tips of the most influential data preprocessing algorithms in data mining. Knowl Based Syst 2016. [DOI: 10.1016/j.knosys.2015.12.006]
24. Li X, Ouyang J, Zhou X. A kernel-based centroid classifier using hypothesis margin. J Exp Theor Artif Intell 2015. [DOI: 10.1080/0952813x.2015.1042924]
27. Leyva E, Caises Y, González A, Pérez R. On the use of meta-learning for instance selection: An architecture and an experimental study. Inf Sci (N Y) 2014. [DOI: 10.1016/j.ins.2014.01.007]
28. Hamidzadeh J, Monsefi R, Sadoghi Yazdi H. Large symmetric margin instance selection algorithm. Int J Mach Learn Cybern 2014. [DOI: 10.1007/s13042-014-0239-z]
29. Nikolaidis K, Mu T, Goulermas J. Prototype reduction based on Direct Weighted Pruning. Pattern Recognit Lett 2014. [DOI: 10.1016/j.patrec.2013.08.022]
30. Leyva E, González A, Pérez R. Knowledge-based instance selection: A compromise between efficiency and versatility. Knowl Based Syst 2013. [DOI: 10.1016/j.knosys.2013.04.005]
31. García S, Derrac J, Cano JR, Herrera F. Prototype selection for nearest neighbor classification: taxonomy and empirical study. IEEE Trans Pattern Anal Mach Intell 2012; 34:417-35. [PMID: 21768651] [DOI: 10.1109/tpami.2011.142]
Abstract
The nearest neighbor classifier is one of the most widely used and well-known techniques for recognition tasks, and despite its simplicity it has proven to be one of the most useful algorithms in data mining. However, it suffers from several drawbacks, such as high storage requirements, low efficiency in classification response, and low noise tolerance. These weaknesses have been studied by many researchers, and many solutions have been proposed. Among the most promising is reducing the data used for establishing the classification rule (the training data) by selecting relevant prototypes. Many prototype selection methods exist in the literature, and research in this area is still advancing; their definitions exhibit different properties, but no formal categorization had been established. This paper surveys the prototype selection methods proposed in the literature from both a theoretical and an empirical point of view. On the theoretical side, we propose a taxonomy based on the main characteristics of prototype selection and analyze the advantages and drawbacks of each family. Empirically, we conduct an experimental study involving data sets of different sizes, measuring performance in terms of accuracy, reduction capability, and runtime. The results obtained by all of the studied methods have been verified by nonparametric statistical tests. Several remarks, guidelines, and recommendations are made for the use of prototype selection in nearest neighbor classification.
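Among the method families covered by such a taxonomy, the earliest condensation method, Hart's Condensed Nearest Neighbor (CNN) rule, is simple enough to sketch in full. This is pure Python on invented toy data, an illustration of the survey's subject matter rather than code from the paper:

```python
def nn_label(prototypes, x):
    """Label of the single nearest prototype (squared Euclidean distance)."""
    return min(prototypes, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[1]

def cnn_select(train):
    """Hart's CNN: grow a prototype set until it classifies all of train correctly."""
    prototypes = [train[0]]
    changed = True
    while changed:
        changed = False
        for x, y in train:
            # Add any instance the current prototype set misclassifies.
            if (x, y) not in prototypes and nn_label(prototypes, x) != y:
                prototypes.append((x, y))
                changed = True
    return prototypes

train = [((0.0, 0.0), "a"), ((0.1, 0.1), "a"), ((0.2, 0.0), "a"),
         ((3.0, 3.0), "b"), ((3.1, 3.1), "b"), ((3.2, 3.0), "b")]
selected = cnn_select(train)
```

On this toy set CNN keeps one prototype per well-separated cluster, a threefold reduction, while still classifying every training instance correctly; its known weakness, discussed in surveys of this kind, is sensitivity to noise and to the order in which instances are presented.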
32. Triguero I, Derrac J, García S, Herrera F. A Taxonomy and Experimental Study on Prototype Generation for Nearest Neighbor Classification. IEEE Trans Syst Man Cybern C Appl Rev 2012. [DOI: 10.1109/tsmcc.2010.2103939]
34. Li Y, Maguire L. Selecting critical patterns based on local geometrical and statistical information. IEEE Trans Pattern Anal Mach Intell 2011; 33:1189-1201. [PMID: 21493967] [DOI: 10.1109/tpami.2010.188]
Abstract
Pattern selection methods have traditionally been developed with a dependency on a specific classifier. In contrast, this paper presents a method that selects critical patterns, deemed to carry essential information, applicable to training those classifiers that require spatial information about the training data set. Critical patterns include the edge patterns that define the boundary of a class and the border patterns that separate classes. The proposed method selects patterns from a new perspective, primarily based on their location in input space. It determines class edge patterns with the assistance of an approximated tangent hyperplane to the class surface, and it identifies border patterns between classes using local probability. The method is evaluated on benchmark problems using popular classifiers, including multilayer perceptrons, radial basis functions, support vector machines, and nearest neighbors. Compared with four state-of-the-art approaches, it is shown to provide similar but more consistent accuracy from a reduced data set. Experimental results demonstrate that it selects patterns sufficient to represent the class boundary and to preserve the decision surface.
Affiliation(s)
- Yuhua Li
- School of Computing and Intelligent Systems, University of Ulster, Londonderry BT487JL, UK.