1
Zhang J, Qin Y, Tian R, Bai X, Liu J. Similarity measure method of near-infrared spectrum combined with multi-attribute information. Spectrochim Acta A Mol Biomol Spectrosc 2024; 322:124783. [PMID: 38972098 DOI: 10.1016/j.saa.2024.124783] [Received: 02/18/2024] [Revised: 07/01/2024] [Accepted: 07/03/2024]
Abstract
The high dimensionality, redundancy, and non-linearity of near-infrared (NIR) spectral data, together with sample attributes such as producing area and grade, all affect the similarity measure between samples. This paper proposes a t-distributed stochastic neighbor embedding algorithm based on the Sinkhorn distance (St-SNE) that incorporates multi-attribute information. First, the Sinkhorn distance is introduced to overcome the asymmetry of the KL divergence and the sparsity of data distributions in high-dimensional space, and is used to construct probability distributions that make the low-dimensional space similar to the high-dimensional space. Second, to account for the effect of multiple sample attributes on the similarity measure, a multi-attribute distance matrix is constructed using information entropy and combined with the numerical matrix of spectral data to obtain a mixed data matrix. To validate St-SNE, NIR spectral data were projected into a low-dimensional space and the results were compared with those of PCA, LPP, and t-SNE. St-SNE effectively distinguished samples with different attribute information and produced more distinct category boundaries in the low-dimensional space. The classification performance of St-SNE for different attributes was then tested on tobacco and mango datasets and compared with LPP, t-SNE, UMAP, and Fisher t-SNE; St-SNE achieved the highest classification accuracy for the different attributes. Finally, when searching cigarette formulas for the sample most similar to a target tobacco, St-SNE showed the highest consistency with expert recommendations among the compared algorithms. The method can therefore support the maintenance and design of product formulas.
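The abstract contrasts the asymmetric KL divergence with the Sinkhorn distance. As a generic illustration only (not the authors' St-SNE code), the sketch below computes the entropy-regularized optimal-transport cost between two discrete distributions with Sinkhorn iterations, next to the KL divergence for comparison. The function names, the regularization parameter `epsilon`, and the iteration count are assumptions chosen for the example.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def sinkhorn_distance(a, b, cost, epsilon=0.5, n_iter=1000):
    """Entropy-regularized optimal-transport cost between histograms a and b.

    a, b    : non-negative weights, each summing to 1
    cost    : cost[i][j] is the cost of moving mass from bin i of a to bin j of b
    epsilon : regularization strength (smaller is closer to exact OT, converges slower)
    """
    n, m = len(a), len(b)
    # Gibbs kernel K = exp(-C / epsilon)
    K = [[math.exp(-cost[i][j] / epsilon) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns so the plan matches both marginals
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P = diag(u) K diag(v); return its cost <P, C>
    return sum(u[i] * K[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))


p, q = [0.5, 0.5], [0.9, 0.1]
C = [[0.0, 1.0], [1.0, 0.0]]
print(kl_divergence(p, q), kl_divergence(q, p))                # differ: KL is asymmetric
print(sinkhorn_distance(p, q, C), sinkhorn_distance(q, p, C))  # equal for a symmetric cost
```

With a symmetric cost matrix, the regularized plan for `(q, p)` is the transpose of the plan for `(p, q)`, which is why the two Sinkhorn values agree while the two KL values do not.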
Affiliation(s)
- Jinfeng Zhang: College of Information Science and Technology, Qingdao University of Science and Technology, China
- Yuhua Qin: College of Information Science and Technology, Qingdao University of Science and Technology, China
- Rongkun Tian: College of Information Science and Technology, Qingdao University of Science and Technology, China
- Xiaoli Bai: R&D Center, China Tobacco Yunnan Industrial Co., Ltd, No. 367 Hongjin Road, Kunming 650231, China
- Jing Liu: R&D Center, China Tobacco Yunnan Industrial Co., Ltd, No. 367 Hongjin Road, Kunming 650231, China
2
An improved density peaks clustering algorithm based on natural neighbor with a merging strategy. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.12.078]
3
Guo L, Wang L, Han X, Yue L, Zhang Y, Gao M. ROCM: A Rolling Iteration Clustering Model Via Extracting Data Features. Neural Process Lett 2022. [DOI: 10.1007/s11063-022-10972-w]
4
A spectral clustering algorithm based on attribute fluctuation and density peaks clustering algorithm. Appl Intell 2022. [DOI: 10.1007/s10489-022-04058-2]
5
Zhang X. Emotional Intervention and Education System Construction for Rural Children Based on Semantic Analysis. Occup Ther Int 2022; 2022:1073717. [PMID: 35874601 PMCID: PMC9273381 DOI: 10.1155/2022/1073717] [Received: 05/18/2022] [Revised: 06/14/2022] [Accepted: 06/17/2022]
Abstract
Objective. In the context of the policy of caring for the healthy growth of left-behind children, this study examines common negative emotional problems of left-behind children in rural areas, focusing on how those negative emotions can be guided. Through observation and research, we analyze the causes of these negative emotions. Method. We designed a platform for acquiring emotional semantic data of scene images in an open behavioral experimental environment; because it is not limited by time or place, it collects a large amount of such data, and principal component analysis is then used to evaluate the validity of the data analysis. Psychological tests measured parent-child affinity, adversity beliefs, and positive/negative emotions in three groups: children with both parents away, children with the father away, and non-left-behind children. We examined the characteristics of these measures in each group and the direct predictive effects of parent-child affinity and adversity beliefs on the positive/negative emotions of each group. Results/Discussion. Adversity beliefs partially mediated the relationship between parent-child affinity and positive emotions, and their predictive effect on emotional adaptation differed by emotion type. The main effect of left-behind category was significant for both positive and negative emotions. The main effect of gender on negative emotion was significant, with girls showing significantly higher levels of negative emotion than boys.
The main effect of left-behind category on adversity beliefs was also significant: rural children with both parents away had significantly lower adversity belief levels than children with the father away and non-left-behind children. The negative emotions of rural left-behind children were channeled and, to some extent, improved and alleviated. Emotional counseling of the rural left-behind children at the research site helped them achieve better emotional states and promoted their mental health and healthy growth, and the study provides a theoretical basis for establishing a local care system for left-behind children.
Affiliation(s)
- Xiaobo Zhang: School of Education Science, Xinyang Normal University, Xinyang, Henan 464000, China
6
Zhou W, Wang L, Han X, Parmar M, Li M. A novel density deviation multi-peaks automatic clustering algorithm. Complex Intell Syst 2022. [DOI: 10.1007/s40747-022-00798-3]
Abstract
The density peaks clustering (DPC) algorithm is a classical and widely used clustering method. However, DPC requires manual selection of cluster centers, relies on a single way of computing density, and cannot effectively handle low-density points. To address these issues, this paper proposes a novel density deviation multi-peaks automatic clustering method (AmDPC). First, we propose a new local density and use the deviation to measure the relationship between data points and the cut-off distance ($d_c$). Second, we divide the density deviation equally into multiple density levels and extract the points with larger distances in each level. Third, the multi-peak points with larger distances at low-density levels are merged according to the size difference of their density deviations. Finally, overall automatic clustering is achieved by processing the low-density points. To verify the performance of the method, we test it on synthetic and real-world datasets and on the Olivetti Face dataset. The experimental results indicate that AmDPC handles low-density points more effectively and shows a degree of effectiveness and robustness.
7
Hsu CC, Tsao WC, Chang A, Chang CY. Analyzing mixed-type data by using word embedding for handling categorical features. Intell Data Anal 2021. [DOI: 10.3233/ida-205453]
Abstract
Most real-world datasets are of mixed type, containing both numeric and categorical attributes. Unlike numbers, categorical values support only limited operations, and the degree of similarity between distinct values cannot be measured directly. Analyzing mixed-type data properly therefore requires dedicated methods for handling categorical values. The main limitation of most existing methods is the lack of an appropriate numeric representation of categorical values, which prevents some analysis algorithms from being applied. In this paper, we address this deficiency by transforming categorical values into numeric representations so as to facilitate various analyses of mixed-type data. In particular, the proposed transformation preserves the semantics of categorical values with respect to the other values in the dataset, resulting in better performance on analyses including classification and clustering. The proposed method is verified and compared with other methods on extensive real-world datasets.
Affiliation(s)
- Chung-Chian Hsu: Department of Information Management, National Yunlin University of Science and Technology, Douliu, Yunlin, Taiwan
- Wei-Cyun Tsao: Department of Information Management, National Yunlin University of Science and Technology, Douliu, Yunlin, Taiwan
- Arthur Chang: Department of Information Management, National Yunlin University of Science and Technology, Douliu, Yunlin, Taiwan
- Chuan-Yu Chang: Department of Computer Science and Information Engineering, National Yunlin University of Science and Technology, Douliu, Yunlin, Taiwan
8
Du M, Wang R, Ji R, Wang X, Dong Y. ROBP a robust border-peeling clustering using Cauchy kernel. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.04.089]
9
Zhang R, Song X, Ying S, Ren H, Zhang B, Wang H. CA-CSM: a novel clustering algorithm based on cluster center selection model. Soft Comput 2021. [DOI: 10.1007/s00500-021-05835-w]
10
Xu X, Ding S, Wang Y, Wang L, Jia W. A fast density peaks clustering algorithm with sparse search. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2020.11.050]
11
Shi T, Ding S, Xu X, Ding L. A community detection algorithm based on Quasi-Laplacian centrality peaks clustering. Appl Intell 2021. [DOI: 10.1007/s10489-021-02278-6]
12
Li Q, Xiong Q, Ji S, Yu Y, Wu C, Yi H. A method for mixed data classification base on RBF-ELM network. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.12.032]
13
Wang L, Sun W, Han X, Hao Z, Zhou R, Yu J, Parmar M. An Improved Integrated Clustering Learning Strategy Based on Three-Stage Affinity Propagation Algorithm with Density Peak Optimization Theory. Complexity 2021; 2021:1-12. [DOI: 10.1155/2021/6666619]
Abstract
To obtain more accurate clustering results from the affinity propagation (AP) clustering algorithm on data samples with different shapes and densities, this paper proposes an improved integrated clustering learning strategy based on a three-stage affinity propagation algorithm with density peak optimization theory (DPKT-AP). DPKT-AP combines the idea of integrated clustering with the AP algorithm, introducing density peak theory and the k-means algorithm to carry out a three-stage clustering process. In the first stage, cluster centers are selected by density peak clustering. Because a cluster center is surrounded by neighbors with lower local density and lies relatively far from other points of higher density, these centers help the k-means algorithm in the second stage avoid local optima. In the second stage, k-means clusters the data samples into several relatively small spherical subgroups, each containing a local density maximum called the center point of the subgroup. In the third stage, DPKT-AP uses the AP algorithm to merge and cluster the spherical subgroups. Experiments on UCI and synthetic datasets show that DPKT-AP improves clustering performance and accuracy.
Affiliation(s)
- Limin Wang: School of Internet Finance and Information Engineering, Guangdong University of Finance, Guangzhou 510520, China
- Wenjing Sun: School of Management Science and Information Engineering, Jilin University of Finance and Economics, Changchun 130117, Jilin, China
- Xuming Han: College of Information Science and Technology, Jinan University, Guangzhou 510632, China
- Zhiyuan Hao: School of Management, Jilin University, Changchun 130022, Jilin, China
- Ruihong Zhou: School of Internet Finance and Information Engineering, Guangdong University of Finance, Guangzhou 510520, China
- Jinglin Yu: School of Internet Finance and Information Engineering, Guangdong University of Finance, Guangzhou 510520, China
- Milan Parmar: School of Management Science and Information Engineering, Jilin University of Finance and Economics, Changchun 130117, Jilin, China
14
15
Balaji K, Lavanya K, Mary AG. Machine learning algorithm for clustering of heart disease and chemoinformatics datasets. Comput Chem Eng 2020. [DOI: 10.1016/j.compchemeng.2020.107068]
16
Wang L, Ding S, Wang Y, Ding L. A robust spectral clustering algorithm based on grid-partition and decision-graph. Int J Mach Learn Cyb 2020. [DOI: 10.1007/s13042-020-01231-2]
17
18
Xu X, Ding S, Wang L, Wang Y. A robust density peaks clustering algorithm with density-sensitive similarity. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.106028]
19
Boundary Matching and Interior Connectivity-Based Cluster Validity Anlysis. Appl Sci (Basel) 2020. [DOI: 10.3390/app10041337]
Abstract
The evaluation of clustering results plays an important role in clustering analysis. In practice, however, existing validity indices are limited to a specific clustering algorithm, clustering parameter, or set of assumptions. In this paper, we propose a novel validity index that addresses these problems using two complementary measures: boundary points matching and interior points connectivity. First, after any clustering algorithm is run on a dataset, all boundary points of the dataset and of its partitioned clusters are extracted using a nonparametric metric, and the boundary points matching measure is computed. Second, the interior points connectivity of both the dataset and all partitioned clusters is measured. The proposed index can evaluate clustering results on a dataset obtained from different clustering algorithms, which existing validity indices cannot evaluate at all. Experimental results demonstrate that the proposed index can evaluate clustering results obtained with an arbitrary clustering algorithm and find the optimal clustering parameters.
20
Wang Y, Wang D, Zhang X, Pang W, Miao C, Tan AH, Zhou Y. McDPC: multi-center density peak clustering. Neural Comput Appl 2020. [DOI: 10.1007/s00521-020-04754-5]
21
Azhar M, Huang JZ, Masud MA, Li MJ, Cui L. A hierarchical Gamma Mixture Model-based method for estimating the number of clusters in complex data. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2019.105891]
22
Fuzzy neighborhood-based differential evolution with orientation for nonlinear equation systems. Knowl Based Syst 2019. [DOI: 10.1016/j.knosys.2019.06.004]
23
24
Abstract
The Density Peak Clustering (DPC) algorithm is a new density-based clustering method. It spends most of its execution time on calculating the local density and the separation distance for each data point in a dataset. The purpose of this study is to accelerate its computation. On average, the DPC algorithm scans half of the dataset to calculate the separation distance of each data point. We propose an approach to calculate the separation distance of a data point by scanning only the neighbors of the data point. Additionally, the purpose of the separation distance is to assist in choosing the density peaks, which are the data points with both high local density and high separation distance. We propose an approach to identify non-peak data points at an early stage to avoid calculating their separation distances. Our experimental results show that most of the data points in a dataset can benefit from the proposed approaches to accelerate the DPC algorithm.
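The abstract above refers to the two quantities DPC spends most of its time on. For reference, here is a naive O(n²) sketch of standard DPC: cut-off kernel local density, separation distance to the nearest denser point, centers chosen by the largest rho * delta, and labels propagated from each point's nearest denser neighbor. It is not the accelerated method proposed in the paper. The function name, the strict `< d_c` cut-off, index-based tie-breaking, and picking centers by rho * delta instead of inspecting a decision graph are assumptions made for the example.

```python
import math


def dpc(points, d_c, n_clusters):
    """Naive density peaks clustering; returns (rho, delta, labels)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density: number of other points closer than the cut-off distance d_c
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]

    # Visit points from densest to sparsest; ties are broken by index
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    rank = {i: r for r, i in enumerate(order)}
    delta = [0.0] * n
    nearest_higher = [-1] * n
    # The densest point has no denser neighbor: use its largest distance instead
    delta[order[0]] = max(dist[order[0]])
    for r in range(1, n):
        i = order[r]
        # Separation distance: distance to the nearest point ranked ahead of i
        # (on average this scans half of the dataset, as the abstract notes)
        j = min(order[:r], key=lambda k: dist[i][k])
        delta[i], nearest_higher[i] = dist[i][j], j

    # Cluster centers: the points with the largest rho * delta
    gamma = [rho[i] * delta[i] for i in range(n)]
    centers = sorted(range(n), key=lambda i: (-gamma[i], rank[i]))[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Every other point inherits the label of its nearest denser neighbor
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return rho, delta, labels


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
       (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
rho, delta, labels = dpc(pts, d_c=1.0, n_clusters=2)
print(labels)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

The densest point always has the largest rho * delta (every other point's delta is at most its distance to that point), so it is always selected as a center and the label propagation never reads an unlabeled neighbor.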
25
Abstract
With mixed data containing numerical and categorical attributes ubiquitous in the real world, a variety of clustering algorithms have been developed to discover the information hidden in such data. Most existing algorithms compute distances or similarities between data objects directly from the original data, which can make clustering results unstable in the presence of noise. In this paper, a clustering framework is proposed to explore the grouping structure of mixed data. First, categorical attributes transformed by one-hot encoding and normalized numerical attributes are fed into stacked denoising autoencoders to learn internal feature representations. Second, based on these representations, the distances between all data objects in the feature space are calculated, and the local density and relative distance of each data object are computed. Third, an improved density peaks clustering algorithm is employed to allocate all data objects to clusters. Finally, experiments on several UCI datasets demonstrate that the proposed algorithm outperforms three baseline algorithms in terms of clustering accuracy and the Rand index.
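The input step described above (one-hot encoding of categorical attributes and normalization of numerical ones) can be sketched as follows. Min-max scaling is assumed as the normalization, and the record layout (a list of dicts) and the function name are hypothetical; the stacked denoising autoencoder that the paper trains on the result is omitted.

```python
def encode_mixed(rows, numeric_cols, categorical_cols):
    """Turn mixed-type records into numeric vectors.

    Numeric attributes are min-max scaled to [0, 1]; categorical attributes are
    one-hot encoded with one indicator per distinct value (sorted for determinism).
    """
    lo = {c: min(r[c] for r in rows) for c in numeric_cols}
    hi = {c: max(r[c] for r in rows) for c in numeric_cols}
    vocab = {c: sorted({r[c] for r in rows}) for c in categorical_cols}
    encoded = []
    for r in rows:
        vec = []
        for c in numeric_cols:
            span = hi[c] - lo[c]
            # A constant column carries no information; map it to 0.0
            vec.append((r[c] - lo[c]) / span if span else 0.0)
        for c in categorical_cols:
            vec.extend(1.0 if r[c] == v else 0.0 for v in vocab[c])
        encoded.append(vec)
    return encoded


rows = [
    {"age": 20, "income": 1000, "city": "A"},
    {"age": 40, "income": 3000, "city": "B"},
    {"age": 30, "income": 2000, "city": "A"},
]
X = encode_mixed(rows, ["age", "income"], ["city"])
print(X)  # [[0.0, 0.0, 1.0, 0.0], [1.0, 1.0, 0.0, 1.0], [0.5, 0.5, 1.0, 0.0]]
```

Scaling numeric columns to [0, 1] keeps them on the same range as the 0/1 indicators, so neither attribute type dominates Euclidean distances computed on the encoded vectors.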
26
27
Xu X, Ding S, Shi Z. An improved density peaks clustering algorithm with fast finding cluster centers. Knowl Based Syst 2018. [DOI: 10.1016/j.knosys.2018.05.034]
28
An Efficient Grid-Based K-Prototypes Algorithm for Sustainable Decision-Making on Spatial Objects. Sustainability 2018. [DOI: 10.3390/su10082614]
Abstract
Data mining plays a critical role in sustainable decision-making. Although the k-prototypes algorithm is one of the best-known algorithms for clustering both numeric and categorical data, clustering a large number of spatial objects with mixed numeric and categorical attributes remains inefficient because of its computational complexity. In this paper, we propose an efficient grid-based k-prototypes algorithm, GK-prototypes, which achieves high performance when clustering spatial objects. The first proposed algorithm uses both the maximum and the minimum distance between cluster centers and a grid cell to avoid unnecessary distance calculations. The second, an extension of the first, exploits spatial dependence: spatial objects tend to be similar to objects that are close to them. Each cell has a bitmap index that stores, for each attribute, the categorical values of all objects within the cell; this index can improve performance when the categorical data are skewed. Experimental results show that the proposed algorithms perform better than existing pruning techniques for the k-prototypes algorithm.
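To make the min/max-distance idea concrete, here is a simplified sketch, not the GK-prototypes algorithm itself. It assumes Huang's k-prototypes dissimilarity (squared Euclidean distance on numeric attributes plus a weight `gamma` times the number of categorical mismatches), treats the numeric attributes as the spatial coordinates gridded into axis-aligned cells, and ignores the bitmap index. All function names are hypothetical. Because categorical mismatches add between 0 and `gamma * n_cat`, a center whose minimum distance to a cell exceeds another center's maximum distance plus `gamma * n_cat` can never be nearest for any object in that cell, so it is skipped.

```python
def kproto_dissimilarity(x_num, x_cat, c_num, c_cat, gamma):
    """Squared Euclidean distance on numeric attributes plus gamma * categorical mismatches."""
    num = sum((a - b) ** 2 for a, b in zip(x_num, c_num))
    cat = sum(1 for a, b in zip(x_cat, c_cat) if a != b)
    return num + gamma * cat


def cell_bounds(center_num, cell_lo, cell_hi):
    """Min and max squared Euclidean distance from a center to an axis-aligned cell."""
    dmin = dmax = 0.0
    for c, lo, hi in zip(center_num, cell_lo, cell_hi):
        gap = max(lo - c, 0.0, c - hi)  # 0 when c lies inside [lo, hi] on this axis
        dmin += gap ** 2
        dmax += max((c - lo) ** 2, (c - hi) ** 2)
    return dmin, dmax


def candidate_centers(centers, cell_lo, cell_hi, n_cat, gamma):
    """Indices of centers that may still be nearest for some object in the cell."""
    bounds = [cell_bounds(c_num, cell_lo, cell_hi) for c_num, _ in centers]
    best_upper = min(dmax + gamma * n_cat for _, dmax in bounds)
    return [j for j, (dmin, _) in enumerate(bounds) if dmin <= best_upper]


def assign_cell(objects, centers, cell_lo, cell_hi, gamma):
    """Assign each (x_num, x_cat) object in one cell, comparing only surviving centers."""
    n_cat = len(centers[0][1])
    cand = candidate_centers(centers, cell_lo, cell_hi, n_cat, gamma)
    return [min(cand, key=lambda j: kproto_dissimilarity(x_num, x_cat,
                                                         centers[j][0], centers[j][1], gamma))
            for x_num, x_cat in objects]


centers = [((0.5, 0.5), ("a",)), ((10.0, 10.0), ("b",))]
print(candidate_centers(centers, (0.0, 0.0), (1.0, 1.0), n_cat=1, gamma=1.0))  # [0]
```

The center that attains the smallest upper bound always survives (its lower bound cannot exceed its own upper bound), so pruning never removes the true nearest center.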
29
Quasi-cluster centers clustering algorithm based on potential entropy and t-distributed stochastic neighbor embedding. Soft Comput 2018. [DOI: 10.1007/s00500-018-3221-y]
30
Xu X, Ding S, Xu H, Liao H, Xue Y. A feasible density peaks clustering algorithm with a merging strategy. Soft Comput 2018. [DOI: 10.1007/s00500-018-3183-0]