1. Du M, Zhao J, Sun J, Dong Y. M3W: Multistep Three-Way Clustering. IEEE Transactions on Neural Networks and Learning Systems 2024; 35:5627-5640. [PMID: 36173778] [DOI: 10.1109/tnnls.2022.3208418]
Abstract
Three-way clustering has been an active research topic in the field of cluster analysis in recent years, and many efforts have focused on the technique because of its feasibility and rationality. We observe, however, that existing three-way clustering algorithms struggle to gather sufficient information and restrict fault tolerance excessively. Moreover, although a one-step three-way allocation based on a pair of fixed, global thresholds is the most straightforward way to generate three-way cluster representations, the clusters derived from a pair of global thresholds cannot exactly reveal the inherent clustering structure of the dataset, and the threshold values are often difficult to determine beforehand. Inspired by sequential three-way decisions, we propose an algorithm, called multistep three-way clustering (M3W), to address these issues. Specifically, we first use a progressive erosion strategy to construct a multilevel structure of the data, so that lower levels (or external layers) can gather more available information from higher levels (or internal layers). We then propose a multistep three-way allocation strategy that sufficiently considers the neighborhood information of every eroded instance. We use this allocation strategy in combination with the multilevel structure to ensure that more information is gradually obtained, increasing the probability of correct assignment and adaptively capturing the inherent clustering structure of the dataset. The proposed algorithm is compared with eight competitors on 18 benchmark datasets. Experimental results show that M3W achieves superior performance, verifying its advantages and effectiveness.
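As an illustration of the one-step allocation that this abstract contrasts with its multistep approach, here is a minimal sketch of a threshold-based three-way split. The membership score and the threshold names `alpha`/`beta` are assumptions for illustration, not the paper's notation:

```python
import numpy as np

def three_way_assign(membership, alpha=0.7, beta=0.3):
    """One-step three-way allocation from a membership score in [0, 1].

    Scores >= alpha go to the cluster core, scores <= beta to the
    exterior, and everything in between to the fringe (boundary).
    alpha and beta are the fixed global thresholds the abstract
    argues are hard to choose in advance.
    """
    membership = np.asarray(membership, dtype=float)
    core = membership >= alpha
    exterior = membership <= beta
    fringe = ~core & ~exterior
    return core, fringe, exterior

core, fringe, ext = three_way_assign([0.95, 0.5, 0.1, 0.72, 0.3])
# core -> [True, False, False, True, False]
```

M3W's point is precisely that a single fixed (alpha, beta) pair like this cannot adapt to the local structure of the data, which motivates the multistep, erosion-based allocation.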
2. Wang Z, Wang H, Du H, Chen S, Shi X. A novel density peaks clustering algorithm for automatic selection of clustering centers based on K-nearest neighbors. Mathematical Biosciences and Engineering 2023; 20:11875-11894. [PMID: 37501424] [DOI: 10.3934/mbe.2023528]
Abstract
The density peak clustering algorithm (DPC) requires manual determination of cluster centers and performs poorly on complex datasets with varying densities or non-convex shapes. Hence, a novel density peak clustering algorithm with automatic selection of clustering centers based on K-nearest neighbors (AKDPC) is proposed. First, AKDPC classifies samples into core and non-core points according to their mutual K-nearest-neighbor values. Second, AKDPC uses the average distance to a sample's K nearest neighbors as its density: the smaller the average distance, the higher the density. It then selects the highest-density sample among all unclassified core points as the center of a new cluster and adds core points that satisfy the merging condition to the cluster until none remain that satisfy it. These steps are repeated until all core points are clustered. Lastly, AKDPC assigns each unclassified non-core point the label of its nearest classified point. To demonstrate the validity of AKDPC, experiments on synthetic and real datasets are conducted. By comparing AKDPC with classical clustering algorithms and strong DPC variants, the paper shows that AKDPC achieves higher accuracy.
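The kNN-based density described in this abstract (smaller average neighbor distance means higher density) can be sketched as follows. This is an illustrative reading of the abstract, not the authors' code:

```python
import numpy as np

def knn_density(X, k=5):
    """Density proxy in the AKDPC style: a sample's density is the
    inverse of the mean distance to its k nearest neighbors, so a
    smaller average distance yields a higher density."""
    X = np.asarray(X, dtype=float)
    # pairwise Euclidean distances via broadcasting
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude self-distance
    knn = np.sort(d, axis=1)[:, :k]      # k smallest distances per row
    return 1.0 / knn.mean(axis=1)

dens = knn_density([[0, 0], [0.1, 0], [0, 0.1], [5, 5]], k=2)
# the tight clump scores far higher density than the outlier at (5, 5)
```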
Affiliations
- Zhihe Wang, Huan Wang, Hui Du, Shiyin Chen, Xinxin Shi: School of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China
3. Chen Y, Zhu P, Li Q, Yao Y. Granularity-driven trisecting-and-learning models for interval-valued rule induction. Applied Intelligence 2023. [DOI: 10.1007/s10489-023-04468-w]
4. Shah A, Ali B, Habib M, Frnda J, Ullah I, Shahid Anwar M. An Ensemble Face Recognition Mechanism Based on Three-way Decisions. Journal of King Saud University - Computer and Information Sciences 2023. [DOI: 10.1016/j.jksuci.2023.03.016]
5. Sun C, Du M, Sun J, Li K, Dong Y. A three-way clustering method based on improved density peaks algorithm and boundary detection graph. International Journal of Approximate Reasoning 2023. [DOI: 10.1016/j.ijar.2022.12.002]
6. Stepaniuk J, Skowron A. Three-way approximation of decision granules based on the rough set approach. International Journal of Approximate Reasoning 2023. [DOI: 10.1016/j.ijar.2023.01.003]
7. Xu J, Xin P, Zhang Y. Three-way neighborhood based stream computing for incomplete hybrid information system. Knowledge-Based Systems 2022. [DOI: 10.1016/j.knosys.2022.110232]
8. A review of sequential three-way decision and multi-granularity learning. International Journal of Approximate Reasoning 2022. [DOI: 10.1016/j.ijar.2022.11.007]
9.
10. Fan J, Wang P, Jiang C, Yang X, Song J. Ensemble learning using three-way density-sensitive spectral clustering. International Journal of Approximate Reasoning 2022. [DOI: 10.1016/j.ijar.2022.07.003]
11.
12. An Improved Three-Way K-Means Algorithm by Optimizing Cluster Centers. Symmetry (Basel) 2022. [DOI: 10.3390/sym14091821]
Abstract
Most data sets can be represented as an asymmetric matrix, and mining the uncertain information in such a matrix is a primary task of data processing. As a typical unsupervised learning method, the three-way k-means clustering algorithm uses a core region and a fringe region to represent each cluster, which can effectively deal with inaccurate decision-making caused by inaccurate information or insufficient data. However, like k-means, three-way k-means depends on the random selection of initial clustering centers and easily falls into local optima. To solve this problem, this paper presents an improved three-way k-means algorithm that integrates the ant colony algorithm with three-way k-means. Using the random-probability selection strategy and the positive and negative pheromone feedback mechanism of the ant colony algorithm, the sensitivity of three-way k-means to the initial clustering centers is reduced through continuous update iterations, so that the clustering results do not easily fall into local optima. The weights of the core region and the fringe region are adjusted dynamically to avoid the influence of manually set parameters on the clustering results. Experiments on UCI data sets show that the proposed algorithm improves the performance of three-way k-means clustering and is effective in revealing cluster structures.
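The core/fringe representation this abstract builds on can be sketched as a post-processing step on an ordinary k-means result. The threshold rule and the `epsilon` parameter below are illustrative assumptions; `epsilon` merely stands in for the core/fringe weight the paper tunes dynamically:

```python
import numpy as np

def three_way_regions(X, centers, labels, epsilon=1.5):
    """After a k-means pass, split each cluster into a core region
    (points within epsilon times the cluster's mean center distance)
    and a fringe region (the rest). Sketch only, under the stated
    assumptions about the threshold rule."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    labels = np.asarray(labels)
    cores, fringes = {}, {}
    for c in range(len(centers)):
        pts = np.where(labels == c)[0]
        dist = np.linalg.norm(X[pts] - centers[c], axis=1)
        cut = epsilon * dist.mean() if len(pts) else 0.0
        cores[c] = pts[dist <= cut]      # confidently inside the cluster
        fringes[c] = pts[dist > cut]     # uncertain boundary members
    return cores, fringes
```

A usage example: with points [[0,0],[1,0],[10,0]], centers [[0.5,0],[10,0]] and labels [0,0,1], both near points land in cluster 0's core and the fringe stays empty.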
13. Ma J, Hao Z, Hu M. TMsDP: two-stage density peak clustering based on multi-strategy optimization. Data Technologies and Applications 2022. [DOI: 10.1108/dta-08-2021-0222]
Abstract
Purpose: The density peak clustering algorithm (DP) identifies cluster centers by two parameters: the ρ value (local density) and the δ value (the distance between a point and the nearest point with a higher ρ value). According to the center-identifying principle of the DP, potential cluster centers should have both a higher ρ value and a higher δ value than other points. However, this principle may prevent the DP from identifying categories with multiple centers or centers in lower-density regions. In addition, the DP's assignment strategy can produce wrong assignments for non-center points. This paper aims to address these issues and improve the clustering performance of the DP.
Design/methodology/approach: First, to identify as many potential cluster centers as possible, the authors construct a point-domain by introducing a pinhole-imaging strategy to extend the search range for potential cluster centers. Second, they design novel methods for calculating the domain distance, point-domain density, and domain similarity. Third, they use domain similarity to drive the domain-merging process and optimize the final clustering results.
Findings: Experiments on 12 synthetic data sets and 12 real-world data sets show that two-stage density peak clustering based on multi-strategy optimization (TMsDP) outperforms the DP and other state-of-the-art algorithms.
Originality/value: The authors propose a novel DP-based clustering method, TMsDP, which transforms the relationship between points into a relationship between domains to further optimize the clustering performance of the DP.
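The ρ and δ quantities at the heart of the DP center-identifying principle can be computed as follows. This is the plain cutoff-kernel formulation of the original DP, not TMsDP's domain-based extensions:

```python
import numpy as np

def dp_rho_delta(X, dc=1.0):
    """Compute DP's two parameters: rho is the number of neighbors
    within the cutoff distance dc, and delta is the distance to the
    nearest point of strictly higher density (for the densest point,
    the maximum distance to any point)."""
    X = np.asarray(X, dtype=float)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    rho = (d < dc).sum(axis=1) - 1        # exclude the point itself
    delta = np.empty(len(X))
    for i in range(len(X)):
        higher = np.where(rho > rho[i])[0]
        delta[i] = d[i, higher].min() if len(higher) else d[i].max()
    return rho, delta
```

Points with both high ρ and high δ are the center candidates; the abstract's critique is that multi-center categories and low-density centers fail this joint test.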
14. Adaptive Correlation Integration for Deep Image Clustering. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.06.059]
15.
Abstract
The complexity of data types and distributions increases the uncertainty in the relationships between samples, which makes it challenging to mine the potential cluster structure of data effectively. Ensemble clustering aims to obtain a unified cluster division by fusing multiple different base clustering results. This paper proposes a three-way ensemble clustering algorithm based on sample perturbation to address inaccurate decision-making caused by inaccurate information or insufficient data. The algorithm first uses the natural nearest neighbor algorithm to generate two sets of perturbed data sets, randomly extracts feature subsets of the samples, and applies traditional clustering algorithms to obtain different base clusterings. Each sample's stability is obtained using the co-association matrix and a determinacy function, and the samples are then divided into a stable region and an unstable region according to a threshold on the sample's stability. The stable region consists of high-stability samples and is divided into the core region of each cluster using the k-means algorithm. The unstable region consists of low-stability samples and is assigned to the fringe regions of the clusters, yielding a three-way clustering result. Experimental results show that the proposed algorithm obtains better clustering results than other ensemble clustering algorithms on UCI Machine Learning Repository data sets and can effectively reveal the clustering structure.
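The co-association-based stability step described above can be sketched as follows. The determinacy function |2a - 1| used here is an illustrative choice consistent with the abstract's description (high when the ensemble consistently agrees, low when it is split), not necessarily the paper's exact definition:

```python
import numpy as np

def coassociation_stability(base_labels):
    """Sketch of the stability step: the co-association matrix holds,
    for each pair of samples, the fraction of base clusterings that
    put them in the same cluster; a sample's stability is the mean
    determinacy |2a - 1| of its off-diagonal entries."""
    L = np.asarray(base_labels)              # shape (n_clusterings, n_samples)
    # co-association matrix: average pairwise same-cluster indicator
    A = np.mean([(l[:, None] == l[None, :]).astype(float) for l in L], axis=0)
    n = A.shape[0]
    off = ~np.eye(n, dtype=bool)             # ignore the diagonal
    return np.array([np.abs(2 * A[i, off[i]] - 1).mean() for i in range(n)])
```

With base labelings [[0,0,1],[0,0,1],[0,1,1]], the middle sample (the one the ensemble disagrees on) gets the lowest stability, so it would land in the unstable region and be routed to a fringe.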
16. Yang Y, Cai J, Yang H, Zhao X. Density clustering with divergence distance and automatic center selection. Information Sciences 2022. [DOI: 10.1016/j.ins.2022.03.027]
17. Chen Y, Zhu P. Three-way recommendation for a node and a community on social networks. International Journal of Machine Learning and Cybernetics 2022. [DOI: 10.1007/s13042-022-01571-1]
18.
19. Location algorithm of transfer stations based on density peak and outlier detection. Applied Intelligence 2022. [DOI: 10.1007/s10489-022-03206-y]
20. Shi CL, Xin XW, Zhang JC. Domain adaptation based on rough adjoint inconsistency and optimal transport for identifying autistic patients. Computer Methods and Programs in Biomedicine 2022; 215:106615. [PMID: 35016084] [DOI: 10.1016/j.cmpb.2021.106615]
Abstract
BACKGROUND AND OBJECTIVE: Computer-aided diagnosis technology has been widely used to diagnose autism spectrum disorder (ASD) from neural images. Model performance usually depends on a sufficient number of training samples that reflect the true sample distribution. Because labelled neural imaging data are scarce, multisite data are often pooled to expand the sample size; however, heterogeneity among sites inevitably degrades model generalization. To solve this problem, we propose a multisource unsupervised domain adaptation method using rough adjoint inconsistency and optimal transport.
METHODS: First, we define the concept of rough adjoint inconsistency and propose a double quantization method based on rough adjoint inconsistency and Dempster-Shafer (D-S) evidence theory to estimate the weight coefficient of each source domain, accurately describing each source domain's importance to the target domain. Second, using optimal transport theory, we reduce the data distribution differences between domains and address class imbalance by adjusting the sampling weights among classes.
RESULTS: The ASD recognition accuracy of the proposed method improves on all eight tasks, reaching 70.67%, 64.86%, 62.50%, 70.80%, 73.08%, 71.19%, 75.41% and 75.76%, respectively. The proposed model achieves superior performance compared with traditional machine learning methods and other recently proposed deep learning models.
CONCLUSIONS: Our method demonstrates that the fusion of rough adjoint inconsistency and optimal transport can be a powerful tool for identifying ASD and quantifying the correlations between domains.
Affiliations
- Chun-Lei Shi, Xian-Wei Xin: School of Artificial Intelligence, Beijing Normal University, Beijing, 100875, China
- Jia-Cai Zhang: School of Artificial Intelligence, Beijing Normal University, Beijing, 100875, China; Engineering Research Center of Intelligent Technology and Educational Application, Ministry of Education, Beijing, 100875, China
21. Sun L, Qin X, Ding W, Xu J. Nearest neighbors-based adaptive density peaks clustering with optimized allocation strategy. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2021.12.019]
22.
23.
24. Wang T, Sun B, Jiang C, Weng H, Chu X. Kernel alignment-based three-way clustering on attribute space and its application in stroke risk identification. International Journal of Machine Learning and Cybernetics 2021. [DOI: 10.1007/s13042-021-01478-3]
25. An Improved K-Means Algorithm Based on Evidence Distance. Entropy 2021; 23:e23111550. [PMID: 34828248] [PMCID: PMC8625371] [DOI: 10.3390/e23111550]
Abstract
The main factors influencing the clustering effect of the k-means algorithm are the selection of the initial cluster centers and the distance measure between sample points. The traditional k-means algorithm uses the Euclidean distance to measure the distance between sample points; as a result, it suffers from low differentiation between sample-point attributes and is prone to local optima. To address this, this paper proposes an improved k-means algorithm based on evidence distance. First, the attribute values of sample points are modelled as the basic probability assignment (BPA) of the sample points. Then, the traditional Euclidean distance is replaced by the evidence distance for measuring the distance between sample points, and k-means clustering is carried out on UCI data. Experimental comparisons are made with the traditional k-means algorithm, a k-means algorithm based on an aggregation distance parameter, and the Gaussian mixture model. The results show that the proposed algorithm achieves a better clustering effect and better convergence.
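The swap this abstract describes (replacing Euclidean distance inside k-means) hinges on the assignment step being parameterized by a distance function. A minimal sketch of that hook follows; the `l1` stand-in below is only an illustration, not the Dempster-Shafer evidence distance the paper actually uses:

```python
import numpy as np

def assign_step(X, centers, dist):
    """One k-means assignment step with a pluggable distance: the
    hook where an evidence (BPA) distance can replace Euclidean.
    `dist` takes two vectors and returns a scalar distance."""
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    D = np.array([[dist(x, c) for c in centers] for x in X])
    return D.argmin(axis=1)   # index of the nearest center per sample

euclid = lambda a, b: np.linalg.norm(a - b)
# Toy stand-in for a distance between normalized BPA vectors
# (illustrative only; the paper uses a proper D-S evidence metric):
l1 = lambda a, b: np.abs(a - b).sum()
```

Both metrics produce the same assignment on well-separated toy data; the paper's claim is that on data with poorly differentiated attributes, the evidence distance separates samples better than Euclidean distance does.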
26. Wu C, Zhang Q, Cheng Y, Gao M, Wang G. Novel three-way generative classifier with weighted scoring distribution. Information Sciences 2021. [DOI: 10.1016/j.ins.2021.08.025]
27.
28. Zhang R, Song X, Ying S, Ren H, Zhang B, Wang H. CA-CSM: a novel clustering algorithm based on cluster center selection model. Soft Computing 2021. [DOI: 10.1007/s00500-021-05835-w]