1
|
Cheng D, Huang J, Zhang S, Xia S, Wang G, Xie J. K-Means Clustering With Natural Density Peaks for Discovering Arbitrary-Shaped Clusters. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:11077-11090. [PMID: 37027748 DOI: 10.1109/tnnls.2023.3248064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]
Abstract
Due to simplicity, K-means has become a widely used clustering method. However, its clustering result is seriously affected by the initial centers and the allocation strategy makes it hard to identify manifold clusters. Many improved K-means are proposed to accelerate it and improve the quality of initialize cluster centers, but few researchers pay attention to the shortcoming of K-means in discovering arbitrary-shaped clusters. Using graph distance (GD) to measure the dissimilarity between objects is a good way to solve this problem, but computing the GD is time-consuming. Inspired by the idea that granular ball uses a ball to represent the local data, we select representatives from a local neighborhood, called natural density peaks (NDPs). On the basis of NDPs, we propose a novel K-means algorithm for identifying arbitrary-shaped clusters, called NDP-Kmeans. It defines neighbor-based distance between NDPs and takes advantage of the neighbor-based distance to compute the GD between NDPs. Afterward, an improved K-means with high-quality initial centers and GD is used to cluster NDPs. Finally, each remaining object is assigned according to its representative. The experimental results show that our algorithms can not only recognize spherical clusters but also manifold clusters. Therefore, NDP-Kmeans has more advantages in detecting arbitrary-shaped clusters than other excellent algorithms.
Collapse
|
2
|
Wang TG, Shang JL, Liu JX, Li F, Yuan S, Wang J. Joint L 2,p-norm and random walk graph constrained PCA for single-cell RNA-seq data. Comput Methods Biomech Biomed Engin 2024; 27:498-511. [PMID: 36912759 DOI: 10.1080/10255842.2023.2188106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2022] [Accepted: 03/02/2023] [Indexed: 03/14/2023]
Abstract
The development and widespread utilization of high-throughput sequencing technologies in biology has fueled the rapid growth of single-cell RNA sequencing (scRNA-seq) data over the past decade. The development of scRNA-seq technology has significantly expanded researchers' understanding of cellular heterogeneity. Accurate cell type identification is the prerequisite for any research on heterogeneous cell populations. However, due to the high noise and high dimensionality of scRNA-seq data, improving the effectiveness of cell type identification remains a challenge. As an effective dimensionality reduction method, Principal Component Analysis (PCA) is an essential tool for visualizing high-dimensional scRNA-seq data and identifying cell subpopulations. However, traditional PCA has some defects when used in mining the nonlinear manifold structure of the data and usually suffers from over-density of principal components (PCs). Therefore, we present a novel method in this paper called joint L 2 , p -norm and random walk graph constrained PCA (RWPPCA). RWPPCA aims to retain the data's local information in the process of mapping high-dimensional data to low-dimensional space, to more accurately obtain sparse principal components and to then identify cell types more precisely. Specifically, RWPPCA combines the random walk (RW) algorithm with graph regularization to more accurately determine the local geometric relationships between data points. Moreover, to mitigate the adverse effects of dense PCs, the L 2 , p -norm is introduced to make the PCs sparser, thus increasing their interpretability. Then, we evaluate the effectiveness of RWPPCA on simulated data and scRNA-seq data. The results show that RWPPCA performs well in cell type identification and outperforms other comparison methods.
Collapse
Affiliation(s)
- Tai-Ge Wang
- School of Computer Science, Qufu Normal University, Rizhao 276826, China
| | - Jun-Liang Shang
- School of Computer Science, Qufu Normal University, Rizhao 276826, China
| | - Jin-Xing Liu
- School of Computer Science, Qufu Normal University, Rizhao 276826, China
| | - Feng Li
- School of Computer Science, Qufu Normal University, Rizhao 276826, China
| | - Shasha Yuan
- School of Computer Science, Qufu Normal University, Rizhao 276826, China
| | - Juan Wang
- School of Computer Science, Qufu Normal University, Rizhao 276826, China
| |
Collapse
|
3
|
Chen J, Zhu X, Liu H. A mutual neighbor-based clustering method and its medical applications. Comput Biol Med 2022; 150:106184. [PMID: 37859282 DOI: 10.1016/j.compbiomed.2022.106184] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2022] [Revised: 09/23/2022] [Accepted: 10/08/2022] [Indexed: 11/03/2022]
Abstract
Clustering analysis has been widely used in various real-world applications. Due to the simplicity of K-means, it has become the most popular clustering analysis technique in reality. Unfortunately, the performance of K-means heavily relies on initial centers, which should be specified in prior. Besides, it cannot effectively identify manifold clusters. In this paper, we propose a novel clustering algorithm based on representative data objects derived from mutual neighbors to identify different shaped clusters. Specifically, it first obtains mutual neighbors to estimate the density for each data object, and then identifies representative objects with high densities to represent the whole data. Moreover, a concept of path distance, deriving from a minimum spanning tree, is introduced to measure the similarities of representative objects for manifold structures. Finally, an improved K-means with initial centers and path-based distances is proposed to group the representative objects into clusters. For non-representative objects, their cluster labels are determined by neighborhood information. To verify the effectiveness of the proposed method, we conducted comparison experiments on synthetic data and further applied it to medical scenarios. The results show that our clustering method can effectively identify arbitrary-shaped clusters and disease types in comparing to the state-of-the-art clustering ones.
Collapse
Affiliation(s)
- Jun Chen
- Zhejiang Industry Polytechnic College, Shaoxing 312000, PR China.
| | - Xinzhong Zhu
- Zhejiang Normal University, Jinhua 321000, PR China.
| | - Huawen Liu
- Shaoxing University, Shaoxing 312000, PR China.
| |
Collapse
|
4
|
|
5
|
Yang Y, Hsu JH, Löfgren K, Cho W. Cross-platform comparison of framed topics in Twitter and Weibo: machine learning approaches to social media text mining. SOCIAL NETWORK ANALYSIS AND MINING 2021. [DOI: 10.1007/s13278-021-00772-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
|
6
|
Modelling gene interaction networks from time-series gene expression data using evolving spiking neural networks. EVOLVING SYSTEMS 2020. [DOI: 10.1007/s12530-019-09269-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]
|
7
|
|
8
|
Liu Q, Zhang R, Liu X, Liu Y, Zhao Z, Hu R. A novel clustering algorithm based on PageRank and minimax similarity. Neural Comput Appl 2019. [DOI: 10.1007/s00521-018-3607-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]
|
9
|
Cheng D, Zhu Q, Huang J, Wu Q, Yang L. A Novel Cluster Validity Index Based on Local Cores. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2019; 30:985-999. [PMID: 30072347 DOI: 10.1109/tnnls.2018.2853710] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
It is critical to evaluate the quality of clusters for most cluster analysis. A number of cluster validity indexes have been proposed, such as the Silhouette and Davies-Bouldin indexes. However, these validity indexes cannot be used to process clusters with arbitrary shapes. Some researchers employ graph-based distance to cluster nonspherical data sets, but the computation of graph-based distances between all pairs of points in a data set is time-consuming. A potential solution is to select some representative points. Inspired by this idea, we propose a novel Local Cores-based Cluster Validity (LCCV) index to improve the performance of Silhouette index. Local cores, with local maximum density, are selected as representative points. Since graph-based distance is used to evaluate the dissimilarity between local cores, the LCCV index is effective for obtaining the optimal cluster number for data sets containing clusters with arbitrary shapes. Moreover, a hierarchical clustering algorithm based on the LCCV index is proposed. The experimental results on synthetic and real data sets indicate that the new index outperforms existing ones.
Collapse
|
10
|
|
11
|
Liu Q, Zhang R, Zhao Z, Wang Z, Jiao M, Wang G. Robust MST-Based Clustering Algorithm. Neural Comput 2018; 30:1624-1646. [PMID: 29652582 DOI: 10.1162/neco_a_01081] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Minimax similarity stresses the connectedness of points via mediating elements rather than favoring high mutual similarity. The grouping principle yields superior clustering results when mining arbitrarily-shaped clusters in data. However, it is not robust against noises and outliers in the data. There are two main problems with the grouping principle: first, a single object that is far away from all other objects defines a separate cluster, and second, two connected clusters would be regarded as two parts of one cluster. In order to solve such problems, we propose robust minimum spanning tree (MST)-based clustering algorithm in this letter. First, we separate the connected objects by applying a density-based coarsening phase, resulting in a low-rank matrix in which the element denotes the supernode by combining a set of nodes. Then a greedy method is presented to partition those supernodes through working on the low-rank matrix. Instead of removing the longest edges from MST, our algorithm groups the data set based on the minimax similarity. Finally, the assignment of all data points can be achieved through their corresponding supernodes. Experimental results on many synthetic and real-world data sets show that our algorithm consistently outperforms compared clustering algorithms.
Collapse
Affiliation(s)
- Qidong Liu
- School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, 730000 China
| | - Ruisheng Zhang
- School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, 730000 China
| | - Zhili Zhao
- School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, 730000 China
| | - Zhenghai Wang
- School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, 730000 China
| | - Mengyao Jiao
- School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, 730000 China
| | - Guangjing Wang
- School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, 730000 China
| |
Collapse
|
12
|
Zhang Y, Mańdziuk J, Quek CH, Goh BW. Curvature-based method for determining the number of clusters. Inf Sci (N Y) 2017. [DOI: 10.1016/j.ins.2017.05.024] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
|
13
|
Tu E, Kasabov N, Yang J. Mapping Temporal Variables Into the NeuCube for Improved Pattern Recognition, Predictive Modeling, and Understanding of Stream Data. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2017; 28:1305-1317. [PMID: 26992179 DOI: 10.1109/tnnls.2016.2536742] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
This paper proposes a new method for an optimized mapping of temporal variables, describing a temporal stream data, into the recently proposed NeuCube spiking neural network (SNN) architecture. This optimized mapping extends the use of the NeuCube, which was initially designed for spatiotemporal brain data, to work on arbitrary stream data and to achieve a better accuracy of temporal pattern recognition, a better and earlier event prediction, and a better understanding of complex temporal stream data through visualization of the NeuCube connectivity. The effect of the new mapping is demonstrated on three benchmark problems. The first one is the early prediction of patient sleep stage event from temporal physiological data. The second one is the pattern recognition of dynamic temporal patterns of traffic in the Bay Area of California and the last one is the Challenge 2012 contest data set. In all the cases, the use of the proposed mapping leads to an improved accuracy of pattern recognition and event prediction and a better understanding of the data when compared with traditional machine learning techniques or SNN reservoirs with an arbitrary mapping of the variables.
Collapse
|
14
|
Cheng D, Zhu Q, Huang J, Yang L, Wu Q. Natural neighbor-based clustering algorithm with local representatives. Knowl Based Syst 2017. [DOI: 10.1016/j.knosys.2017.02.027] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
|
15
|
Doborjeh MG, Kasabov N, Doborjeh ZG. Evolving, dynamic clustering of spatio/spectro-temporal data in 3D spiking neural network models and a case study on EEG data. EVOLVING SYSTEMS 2017. [DOI: 10.1007/s12530-017-9178-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
|
16
|
Tu E, Zhang Y, Zhu L, Yang J, Kasabov N. A graph-based semi-supervised k nearest-neighbor method for nonlinear manifold distributed data classification. Inf Sci (N Y) 2016. [DOI: 10.1016/j.ins.2016.07.016] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|
17
|
|
18
|
Ferreira JC, Vural E, Guillemot C. Geometry-Aware Neighborhood Search for Learning Local Models for Image Superresolution. IEEE TRANSACTIONS ON IMAGE PROCESSING : A PUBLICATION OF THE IEEE SIGNAL PROCESSING SOCIETY 2016; 25:1354-1367. [PMID: 26829789 DOI: 10.1109/tip.2016.2522303] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
Local learning of sparse image models has proved to be very effective to solve inverse problems in many computer vision applications. To learn such models, the data samples are often clustered using the K-means algorithm with the Euclidean distance as a dissimilarity metric. However, the Euclidean distance may not always be a good dissimilarity measure for comparing data samples lying on a manifold. In this paper, we propose two algorithms for determining a local subset of training samples from which a good local model can be computed for reconstructing a given input test sample, where we consider the underlying geometry of the data. The first algorithm, called adaptive geometry-driven nearest neighbor search (AGNN), is an adaptive scheme, which can be seen as an out-of-sample extension of the replicator graph clustering method for local model learning. The second method, called geometry-driven overlapping clusters (GOCs), is a less complex nonadaptive alternative for training subset selection. The proposed AGNN and GOC methods are evaluated in image superresolution and shown to outperform spectral clustering, soft clustering, and geodesic distance-based subset selection in most settings.
Collapse
|
19
|
Research of uniformity evaluation model based on entropy clustering in the microwave heating processes. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2015.07.034] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
|