51
|
Collaborative fuzzy clustering of distributed concept-drifting dynamic data using a gossip-based approach. APPL INTELL 2018. [DOI: 10.1007/s10489-018-1260-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/28/2022]
|
53
|
Liu CL, Hsaio WH, Lin CY. Bayesian exploratory clustering with entropy Chinese restaurant process. INTELL DATA ANAL 2018. [DOI: 10.3233/ida-163332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Affiliation(s)
- Chien-Liang Liu
  - Department of Industrial Engineering and Management, NCTU, Hsinchu, Taiwan
- Che-Yuan Lin
  - Department of Computer Science, NCTU, Hsinchu, Taiwan
|
54
|
López-Rubio E, Palomo EJ, Ortega-Zamorano F. Unsupervised learning by cluster quality optimization. Inf Sci (N Y) 2018. [DOI: 10.1016/j.ins.2018.01.007] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 10/18/2022]
|
55
|
Sparse Multi-view Task-Centralized Learning for ASD Diagnosis. ACTA ACUST UNITED AC 2018; 10541:159-167. [PMID: 29457153 DOI: 10.1007/978-3-319-67389-9_19] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/03/2023]
Abstract
It is challenging to derive early diagnosis from neuroimaging data for autism spectrum disorder (ASD). In this work, we propose a novel sparse multi-view task-centralized (Sparse-MVTC) classification method for computer-assisted diagnosis of ASD. In particular, since ASD is known to be age- and sex-related, we partition all subjects into different groups of age/sex, each of which can be treated as a classification task to learn. Meanwhile, we extract multi-view features from functional magnetic resonance imaging to describe the brain connectivity of each subject. This formulates a multi-view multi-task sparse learning problem and it is solved by a novel Sparse-MVTC method. Specifically, we treat each task as a central task and other tasks as the auxiliary ones. We then consider the task-task and view-view relations between the central task and each auxiliary task. We can use this task-centralized strategy for a highly efficient solution. The comprehensive experiments on the ABIDE database demonstrate that our proposed Sparse-MVTC method can significantly outperform the existing classification methods in ASD diagnosis.
|
57
|
Li J, Wu Y, Zhao J, Lu K. Low-Rank Discriminant Embedding for Multiview Learning. IEEE TRANSACTIONS ON CYBERNETICS 2017; 47:3516-3529. [PMID: 27244756 DOI: 10.1109/tcyb.2016.2565898] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Indexed: 06/05/2023]
Abstract
This paper focuses on the specific problem of multiview learning where samples have the same feature set but different probability distributions, e.g., different viewpoints or different modalities. Since samples lying in different distributions cannot be compared directly, this paper aims to learn a latent subspace shared by multiple views, assuming that the input views are generated from this latent subspace. Previous approaches usually learn the common subspace by either maximizing the empirical likelihood or preserving the geometric structure. However, considering the complementarity between the two objectives, this paper proposes a novel approach, named low-rank discriminant embedding (LRDE), for multiview learning by taking full advantage of both sides. By further considering the duality between data points and features of a multiview scene, i.e., data points can be grouped based on their distribution on features, while features can be grouped based on their distribution on the data points, LRDE not only deploys low-rank constraints on both the sample level and the feature level to dig out the shared factors across different views, but also preserves geometric information in both the ambient sample space and the embedding feature space by designing a novel graph structure under the framework of graph embedding. Finally, LRDE jointly optimizes low-rank representation and graph embedding in a unified framework. Comprehensive experiments in both multiview and pairwise settings demonstrate that LRDE performs much better than previous approaches proposed in the recent literature.
|
58
|
Yu D, Wang W, Zhang S, Zhang W, Liu R. Hybrid self-optimized clustering model based on citation links and textual features to detect research topics. PLoS One 2017; 12:e0187164. [PMID: 29077747 PMCID: PMC5659815 DOI: 10.1371/journal.pone.0187164] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 08/15/2017] [Accepted: 10/14/2017] [Indexed: 11/18/2022] [Open Access]
Abstract
The challenge of detecting research topics in a specific research field has attracted attention from researchers in the bibliometrics community. In this study, to solve two problems of clustering papers, i.e., the influence of different distributions of citation links and involved textual features on similarity computation, the authors propose a hybrid self-optimized clustering model to detect research topics by extending the hybrid clustering model to identify "core documents". First, the Amsler network, consisting of bibliographic coupling and co-citation links, is created to calculate the citation-based similarity based on the cosine angle of papers. Second, the cosine similarity is also used to compute the text-based similarity, which consists of the textual statistical and topological features. Then, the cosine angle of the linear combination of citation- and text-based similarity is considered as the hybrid similarity. Finally, the Louvain method is applied to cluster papers, and the terms based on term frequency are used to label clusters. To test the performance of the proposed model, a dataset related to the data envelopment analysis field is used for comparison and analysis of clustering results. Based on the benchmark built, different clustering methods with different citation links or textual features are compared according to evaluation measures. The results show that the proposed model can obtain reasonable and effective clustering results, and the research topics of data envelopment analysis field are also analyzed based on the proposed model. As different features are considered in the proposed model compared with previous hybrid clustering models, the proposed clustering model can provide inspiration for further studies on topic identification by other researchers.
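The similarity steps described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the weight `alpha`, and the toy matrices are assumptions made for illustration, NumPy is assumed to be available, and the Louvain clustering and cluster-labeling steps are omitted.

```python
import numpy as np


def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0  # avoid division by zero for all-zero rows
    U = X / norms
    return U @ U.T


def hybrid_similarity(citation_links, text_features, alpha=0.5):
    """Hybrid similarity as read from the abstract (alpha is hypothetical).

    1. citation-based similarity: cosine over citation-link vectors
    2. text-based similarity: cosine over textual feature vectors
    3. hybrid similarity: cosine over the rows of their linear combination
    """
    s_cit = cosine_similarity_matrix(citation_links)
    s_txt = cosine_similarity_matrix(text_features)
    combined = alpha * s_cit + (1.0 - alpha) * s_txt
    return cosine_similarity_matrix(combined)


# Toy data (made up): 3 papers, 4 possible citation links, 3 text features.
citation_links = np.array([[1, 1, 0, 0],
                           [1, 1, 0, 0],
                           [0, 0, 1, 1]])
text_features = np.array([[2, 0, 1],
                          [1, 0, 1],
                          [0, 3, 0]])

S = hybrid_similarity(citation_links, text_features, alpha=0.5)
print(np.round(S, 3))
```

In this toy input, papers 0 and 1 share citation links and vocabulary, so their hybrid similarity is higher than either paper's similarity to paper 2. A community-detection step such as Louvain would then run on a graph weighted by `S`.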
Affiliation(s)
- Dejian Yu
  - School of Information, Zhejiang University of Finance and Economics, Hangzhou, Zhejiang, China
- Wanru Wang
  - School of Information, Zhejiang University of Finance and Economics, Hangzhou, Zhejiang, China
- Shuai Zhang
  - School of Information, Zhejiang University of Finance and Economics, Hangzhou, Zhejiang, China
- Wenyu Zhang
  - School of Information, Zhejiang University of Finance and Economics, Hangzhou, Zhejiang, China
- Rongyu Liu
  - School of Information, Zhejiang University of Finance and Economics, Hangzhou, Zhejiang, China
|
59
|
Multiple co-clustering based on nonparametric mixture models with heterogeneous marginal distributions. PLoS One 2017; 12:e0186566. [PMID: 29049392 PMCID: PMC5648298 DOI: 10.1371/journal.pone.0186566] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 06/08/2017] [Accepted: 10/03/2017] [Indexed: 11/19/2022] [Open Access]
Abstract
We propose a novel method for multiple clustering, which is useful for analysis of high-dimensional data containing heterogeneous types of features. Our method is based on nonparametric Bayesian mixture models in which features are automatically partitioned (into views) for each clustering solution. This feature partition works as feature selection for a particular clustering solution, which screens out irrelevant features. To make our method applicable to high-dimensional data, a co-clustering structure is newly introduced for each view. Further, the outstanding novelty of our method is that we simultaneously model different distribution families, such as Gaussian, Poisson, and multinomial distributions in each cluster block, which widens areas of application to real data. We apply the proposed method to synthetic and real data, and show that our method outperforms other multiple clustering methods both in recovering true cluster structures and in computation time. Finally, we apply our method to a depression dataset with no true cluster structure available, from which useful inferences are drawn about possible clustering structures of the data.
|
61
|
Huang C, Wang S, Pan X, Bi A. v-soft margin multi-task learning logistic regression. INT J MACH LEARN CYB 2017. [DOI: 10.1007/s13042-017-0721-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 10/18/2022]
|
62
|
Chang X, Wang Q, Liu Y, Wang Y. Sparse Regularization in Fuzzy c-Means for High-Dimensional Data Clustering. IEEE TRANSACTIONS ON CYBERNETICS 2017; 47:2616-2627. [PMID: 28114050 DOI: 10.1109/tcyb.2016.2627686] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 06/06/2023]
Abstract
In high-dimensional data clustering practices, the cluster structure is commonly assumed to be confined to a limited number of relevant features, rather than the entire feature set. However, for high-dimensional data, identifying the relevant features and discovering the cluster structure are still challenging problems. To solve these problems, this paper proposes a novel fuzzy c-means (FCM) model with sparse regularization (ℓq(0<q≤1)-norm regularization), by reformulating the FCM objective function into the weighted between-cluster sum of square form and imposing the sparse regularization on the weights. An algorithm is also developed to explicitly solve the proposed model. Compared with the existing clustering models, the proposed model can shrink the weights of irrelevant features (noisy features) to exact zero, and also can be efficiently solved in analytic forms when q = 1,1/2. Experiments on both synthetic and real-world data sets show that the proposed approach outperforms the existing clustering approaches.
|
63
|
Jiang Y, Wu D, Deng Z, Qian P, Wang J, Wang G, Chung FL, Choi KS, Wang S. Seizure Classification From EEG Signals Using Transfer Learning, Semi-Supervised Learning and TSK Fuzzy System. IEEE Trans Neural Syst Rehabil Eng 2017; 25:2270-2284. [PMID: 28880184 DOI: 10.1109/tnsre.2017.2748388] [Citation(s) in RCA: 142] [Impact Index Per Article: 20.3] [Indexed: 11/10/2022]
Abstract
Recognition of epileptic seizures from offline EEG signals is very important in clinical diagnosis of epilepsy. Compared with manual labeling of EEG signals by doctors, machine learning approaches can be faster and more consistent. However, the classification accuracy is usually not satisfactory for two main reasons: the distributions of the data used for training and testing may be different, and the amount of training data may not be enough. In addition, most machine learning approaches generate black-box models that are difficult to interpret. In this paper, we integrate transductive transfer learning, semi-supervised learning and TSK fuzzy system to tackle these three problems. More specifically, we use transfer learning to reduce the discrepancy in data distribution between the training and testing data, employ semi-supervised learning to use the unlabeled testing data to remedy the shortage of training data, and adopt TSK fuzzy system to increase model interpretability. Two learning algorithms are proposed to train the system. Our experimental results show that the proposed approaches can achieve better performance than many state-of-the-art seizure classification algorithms.
|
64
|
Bi XA, Zhao J. Hierarchical trie packet classification algorithm based on expectation-maximization clustering. PLoS One 2017; 12:e0181049. [PMID: 28704476 PMCID: PMC5509293 DOI: 10.1371/journal.pone.0181049] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/16/2016] [Accepted: 06/26/2017] [Indexed: 12/03/2022] [Open Access]
Abstract
With the growth of computer network bandwidth, packet classification algorithms that can deal with large-scale rule sets are urgently needed. Among the existing algorithms, research on packet classification algorithms based on the hierarchical trie has become an important branch of packet classification research because of their wide practical use. Although the hierarchical trie saves considerable storage space, it has several shortcomings, such as backtracking and empty nodes. This paper proposes a new packet classification algorithm, Hierarchical Trie Algorithm Based on Expectation-Maximization Clustering (HTEMC). First, this paper formalizes the packet classification problem by mapping the rules and data packets into a two-dimensional space. Second, this paper uses the expectation-maximization algorithm to cluster the rules based on their aggregate characteristics, thereby forming diversified clusters. Third, this paper proposes a hierarchical trie based on the results of expectation-maximization clustering. Finally, this paper conducts both simulation experiments and real-environment experiments to compare the performance of our algorithm with other typical algorithms, and analyzes the results of the experiments. The hierarchical trie structure in our algorithm not only adopts trie path compression to eliminate backtracking, but also solves the problem of low efficiency of trie updates, which greatly improves the performance of the algorithm.
Affiliation(s)
- Xia-an Bi
  - College of Mathematics and Computer Science, Hunan Normal University, Changsha, P.R. China
- Junxia Zhao
  - College of Mathematics and Computer Science, Hunan Normal University, Changsha, P.R. China
|
65
|
Kankanhalli M. Benchmarking a Multimodal and Multiview and Interactive Dataset for Human Action Recognition. IEEE TRANSACTIONS ON CYBERNETICS 2017; 47:1781-1794. [PMID: 27429453 DOI: 10.1109/tcyb.2016.2582918] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Indexed: 06/06/2023]
Abstract
Human action recognition is an active research area in both the computer vision and machine learning communities. In the past decades, the machine learning problem has evolved from the conventional single-view learning problem to cross-view learning, cross-domain learning and multitask learning, for which a large number of algorithms have been proposed in the literature. Despite the large number of action recognition datasets, most of them are designed for a subset of the four learning problems, and comparisons between algorithms can be further limited by variances within datasets, experimental configurations, and other factors. To the best of our knowledge, there exists no dataset that allows concurrent analysis of the four learning problems. In this paper, we introduce a novel multimodal and multiview and interactive (M2I) dataset, which is designed for the evaluation of human action recognition methods under all four scenarios. This dataset consists of 1760 action samples from 22 action categories, including nine person-person interactive actions and 13 person-object interactive actions. We systematically benchmark state-of-the-art approaches on the M2I dataset on all four learning problems. Overall, we evaluated 13 approaches with nine popular feature and descriptor combinations. Our comprehensive analysis demonstrates that the M2I dataset is challenging due to significant intraclass and view variations and multiple similar action categories, and that it provides a solid foundation for the evaluation of existing state-of-the-art algorithms.
|
67
|
Knowledge-leveraged transfer fuzzy C-Means for texture image segmentation with self-adaptive cluster prototype matching. Knowl Based Syst 2017; 130:33-50. [PMID: 30050232 DOI: 10.1016/j.knosys.2017.05.018] [Citation(s) in RCA: 70] [Impact Index Per Article: 10.0] [Indexed: 11/22/2022]
Abstract
We study a novel fuzzy clustering method to improve the segmentation performance on the target texture image by leveraging the knowledge from a prior texture image. Two knowledge transfer mechanisms, i.e. knowledge-leveraged prototype transfer (KL-PT) and knowledge-leveraged prototype matching (KL-PM) are first introduced as the bases. Applying them, the knowledge-leveraged transfer fuzzy C-means (KL-TFCM) method and its three-stage-interlinked framework, including knowledge extraction, knowledge matching, and knowledge utilization, are developed. There are two specific versions: KL-TFCM-c and KL-TFCM-f, i.e. the so-called crisp and flexible forms, which use the strategies of maximum matching degree and weighted sum, respectively. The significance of our work is fourfold: 1) Owing to the adjustability of referable degree between the source and target domains, KL-PT is capable of appropriately learning the insightful knowledge, i.e. the cluster prototypes, from the source domain; 2) KL-PM is able to self-adaptively determine the reasonable pairwise relationships of cluster prototypes between the source and target domains, even if the numbers of clusters differ in the two domains; 3) The joint action of KL-PM and KL-PT can effectively resolve the data inconsistency and heterogeneity between the source and target domains, e.g. the data distribution diversity and cluster number difference. Thus, using the three-stage-based knowledge transfer, the beneficial knowledge from the source domain can be extensively, self-adaptively leveraged in the target domain. As evidence of this, both KL-TFCM-c and KL-TFCM-f surpass many existing clustering methods in texture image segmentation; and 4) In the case of different cluster numbers between the source and target domains, KL-TFCM-f proves higher clustering effectiveness and segmentation performance than does KL-TFCM-c.
|
68
|
Qian P, Jiang Y, Wang S, Su KH, Wang J, Hu L, Muzic RF. Affinity and Penalty Jointly Constrained Spectral Clustering With All-Compatibility, Flexibility, and Robustness. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2017; 28:1123-1138. [PMID: 26915134 PMCID: PMC4990515 DOI: 10.1109/tnnls.2015.2511179] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 05/13/2023]
Abstract
The existing, semisupervised, spectral clustering approaches have two major drawbacks, i.e., either they cannot cope with multiple categories of supervision or they sometimes exhibit unstable effectiveness. To address these issues, two normalized affinity and penalty jointly constrained spectral clustering frameworks as well as their corresponding algorithms, referred to as type-I affinity and penalty jointly constrained spectral clustering (TI-APJCSC) and type-II affinity and penalty jointly constrained spectral clustering (TII-APJCSC), respectively, are proposed in this paper. TI refers to type-I and TII to type-II. The significance of this paper is fourfold. First, benefiting from the distinctive affinity and penalty jointly constrained strategies, both TI-APJCSC and TII-APJCSC are substantially more effective than the existing methods. Second, both TI-APJCSC and TII-APJCSC are fully compatible with the three well-known categories of supervision, i.e., class labels, pairwise constraints, and grouping information. Third, owing to the delicate framework normalization, both TI-APJCSC and TII-APJCSC are quite flexible. With a simple tradeoff factor varying in the small fixed interval (0, 1], they can self-adapt to any semisupervised scenario. Finally, both TI-APJCSC and TII-APJCSC demonstrate strong robustness, not only to the number of pairwise constraints but also to the parameter for affinity measurement. As such, the novel TI-APJCSC and TII-APJCSC algorithms are very practical for medium- and small-scale semisupervised data sets. The experimental studies thoroughly evaluated and demonstrated these advantages on both synthetic and real-life semisupervised data sets.
Affiliation(s)
- Pengjiang Qian
  - School of Digital Media, Jiangnan University, Wuxi 214122, China
- Yizhang Jiang
  - School of Digital Media, Jiangnan University, Wuxi 214122, China
- Shitong Wang
  - School of Digital Media, Jiangnan University, Wuxi 214122, China
- Kuan-Hao Su
  - Case Center for Imaging Research, Department of Radiology, University Hospitals, Case Western Reserve University, Cleveland, OH 44106 USA
- Jun Wang
  - School of Mechanical Engineering, Jiangnan University, Wuxi 214122, China
- Lingzhi Hu
  - Philips Electronics North America, Highland Heights, OH 44143 USA
- Raymond F. Muzic
  - Case Center for Imaging Research, Department of Radiology, University Hospitals, Case Western Reserve University, Cleveland, OH 44106 USA
|
69
|
Liu Y, Wu S, Liu Z, Chao H. A fuzzy co-clustering algorithm for biomedical data. PLoS One 2017; 12:e0176536. [PMID: 28445496 PMCID: PMC5406011 DOI: 10.1371/journal.pone.0176536] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 12/02/2016] [Accepted: 04/12/2017] [Indexed: 11/23/2022] [Open Access]
Abstract
Fuzzy co-clustering extends co-clustering by assigning membership functions to both the objects and the features, and helps improve the clustering accuracy of biomedical data. In this paper, we introduce a new fuzzy co-clustering algorithm based on information bottleneck, named ibFCC. The ibFCC formulates an objective function which includes a distance function that employs information bottleneck theory to measure the distance between a feature data point and the feature cluster centroid. Many experiments were conducted on five biomedical datasets, and the ibFCC was compared with such prominent fuzzy (co-)clustering algorithms as FCM, FCCM, RFCC and FCCI. Experimental results showed that ibFCC could yield high-quality clusters and was better than all these methods in terms of accuracy.
Affiliation(s)
- Yongli Liu
  - School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, Henan, China
- Shuai Wu
  - School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, Henan, China
- Zhizhong Liu
  - School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, Henan, China
- Hao Chao
  - School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, Henan, China
|
70
|
Chen A, Wang S. A robust fuzzy clustering algorithm using mean-field-approximation based hidden Markov random field model for image segmentation. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2017. [DOI: 10.3233/jifs-151345] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/15/2022]
|
71
|
Askari S, Montazerin N, Zarandi MF, Hakimi E. Generalized entropy based possibilistic fuzzy C-Means for clustering noisy data and its convergence proof. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2016.09.025] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Indexed: 11/29/2022]
|
72
|
Bai X, Chen Z, Zhang Y, Liu Z, Lu Y. Infrared Ship Target Segmentation Based on Spatial Information Improved FCM. IEEE TRANSACTIONS ON CYBERNETICS 2016; 46:3259-3271. [PMID: 26672055 DOI: 10.1109/tcyb.2015.2501848] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 06/05/2023]
Abstract
Segmentation of infrared (IR) ship images is always a challenging task because of intensity inhomogeneity and noise. Fuzzy C-means (FCM) clustering is a classical method widely used in image segmentation. However, it has some shortcomings, such as not considering spatial information and being sensitive to noise. In this paper, an improved FCM method based on spatial information is proposed for IR ship target segmentation. The improvements include two parts: 1) adding the nonlocal spatial information based on the ship target and 2) using the spatial shape information of the contour of the ship target to refine the local spatial constraint by Markov random field. In addition, the results of K-means are used to initialize the improved FCM method. Experimental results show that the improved method is effective and performs better than the existing methods, including the existing FCM methods, for segmentation of IR ship images.
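For orientation, the classical FCM baseline that this and several other entries in the list build on can be sketched in a few lines of Python. This is the standard textbook algorithm only, not the spatially constrained variant proposed in the paper; the function name, default parameters, random initialization, and toy data are assumptions, and NumPy is assumed to be available.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Classical fuzzy C-means; returns (centers, memberships).

    m > 1 is the fuzzifier. Memberships U have shape (n_samples, n_clusters)
    and each row sums to 1.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalized per sample.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # Center update: mean of the data weighted by u^m.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]

        # Membership update: u_ik proportional to d_ik^(-2/(m-1)).
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # guard against zero distances
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)

        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


# Toy data (made up): two well-separated groups of 2-D points.
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10]], dtype=float)
centers, U = fuzzy_c_means(X, n_clusters=2)
labels = U.argmax(axis=1)
print(labels)
```

Because every pixel is treated independently here, the sketch exhibits exactly the shortcoming the abstract names: no spatial information enters the membership update.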
|
73
|
Xu S, Chan KS, Gao J, Xu X, Li X, Hua X, An J. An integrated K-means – Laplacian cluster ensemble approach for document datasets. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2016.06.034] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 10/21/2022]
|
74
|
Chen Y, Zhao Y, Qin B, Liu T. Product Aspect Clustering by Incorporating Background Knowledge for Opinion Mining. PLoS One 2016; 11:e0159901. [PMID: 27561001 PMCID: PMC4999213 DOI: 10.1371/journal.pone.0159901] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 09/28/2015] [Accepted: 07/11/2016] [Indexed: 12/02/2022] [Open Access]
Abstract
Product aspect recognition is a key task in fine-grained opinion mining. Current methods primarily focus on the extraction of aspects from the product reviews. However, it is also important to cluster synonymous extracted aspects into the same category. In this paper, we focus on the problem of product aspect clustering. The primary challenge is to properly cluster and generalize aspects that have similar meanings but different representations. To address this problem, we learn two types of background knowledge for each extracted aspect based on two types of effective aspect relations: relevant aspect relations and irrelevant aspect relations, which describe two different types of relationships between two aspects. Based on these two types of relationships, we can assign many relevant and irrelevant aspects into two different sets as the background knowledge to describe each product aspect. To obtain abundant background knowledge for each product aspect, we can enrich the available information with background knowledge from the Web. Then, we design a hierarchical clustering algorithm to cluster these aspects into different groups, in which aspect similarity is computed using the relevant and irrelevant aspect sets for each product aspect. Experimental results obtained in both camera and mobile phone domains demonstrate that the proposed product aspect clustering method based on two types of background knowledge performs better than the baseline approach without the use of background knowledge. Moreover, the experimental results also indicate that expanding the available background knowledge using the Web is feasible.
Affiliation(s)
- Yiheng Chen
  - Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Yanyan Zhao
  - Department of Media Technology and Art, Harbin Institute of Technology, Harbin, China
- Bing Qin
  - Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Ting Liu
  - Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
|
77
|
Bi A, Chung F, Wang S, Jiang Y, Huang C. Bayesian Enhanced α-Expansion Move Clustering with Loose Link Constraints. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2016.02.054] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/22/2022]
|
78
|
Qian P, Sun S, Jiang Y, Su KH, Ni T, Wang S, Muzic RF. Cross-domain, soft-partition clustering with diversity measure and knowledge reference. PATTERN RECOGNITION 2016; 50:155-177. [PMID: 27275022 PMCID: PMC4892128 DOI: 10.1016/j.patcog.2015.08.009] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 05/13/2023]
Abstract
Conventional, soft-partition clustering approaches, such as fuzzy c-means (FCM), maximum entropy clustering (MEC) and fuzzy clustering by quadratic regularization (FC-QR), are usually incompetent in those situations where the data are quite insufficient or much polluted by underlying noise or outliers. In order to address this challenge, the quadratic weights and Gini-Simpson diversity based fuzzy clustering model (QWGSD-FC), is first proposed as a basis of our work. Based on QWGSD-FC and inspired by transfer learning, two types of cross-domain, soft-partition clustering frameworks and their corresponding algorithms, referred to as type-I/type-II knowledge-transfer-oriented c-means (TI-KT-CM and TII-KT-CM), are subsequently presented, respectively. The primary contributions of our work are four-fold: (1) The delicate QWGSD-FC model inherits the most merits of FCM, MEC and FC-QR. With the weight factors in the form of quadratic memberships, similar to FCM, it can more effectively calculate the total intra-cluster deviation than the linear form recruited in MEC and FC-QR. Meanwhile, via Gini-Simpson diversity index, like Shannon entropy in MEC, and equivalent to the quadratic regularization in FC-QR, QWGSD-FC is prone to achieving the unbiased probability assignments, (2) owing to the reference knowledge from the source domain, both TI-KT-CM and TII-KT-CM demonstrate high clustering effectiveness as well as strong parameter robustness in the target domain, (3) TI-KT-CM refers merely to the historical cluster centroids, whereas TII-KT-CM simultaneously uses the historical cluster centroids and their associated fuzzy memberships as the reference. 
This indicates that TII-KT-CM features more comprehensive knowledge learning capability than TI-KT-CM and TII-KT-CM consequently exhibits more perfect cross-domain clustering performance and (4) neither the historical cluster centroids nor the historical cluster centroid based fuzzy memberships involved in TI-KT-CM or TII-KT-CM can be inversely mapped into the raw data. This means that both TI-KT-CM and TII-KT-CM can work without disclosing the original data in the source domain, i.e. they are of good privacy protection for the source domain. In addition, the convergence analyses regarding both TI-KT-CM and TII-KT-CM are conducted in our research. The experimental studies thoroughly evaluated and demonstrated our contributions on both synthetic and real-life data scenarios.
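The two diversity measures that the abstract compares, the Gini-Simpson index and the Shannon entropy, can be computed on a single membership vector as in the short Python sketch below. It only illustrates the standard definitions of the two measures; it is not the QWGSD-FC objective, and the function names and example vectors are assumptions (NumPy is assumed to be available).

```python
import numpy as np


def gini_simpson(p):
    """Gini-Simpson diversity index: 1 - sum(p_i^2)."""
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)


def shannon_entropy(p):
    """Shannon entropy in nats, with 0 * log(0) treated as 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return -np.sum(nz * np.log(nz))


# Hypothetical fuzzy memberships of one sample over three clusters.
uniform = [1 / 3, 1 / 3, 1 / 3]   # maximally uncertain assignment
skewed = [0.8, 0.1, 0.1]          # mostly one cluster
crisp = [1.0, 0.0, 0.0]           # hard assignment

for name, p in [("uniform", uniform), ("skewed", skewed), ("crisp", crisp)]:
    print(name, gini_simpson(p), shannon_entropy(p))
```

Both measures are zero for a crisp assignment and largest for the uniform one, which is why either can serve as a regularizer that discourages overconfident memberships.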
Affiliation(s)
- Pengjiang Qian
  - School of Digital Media, Jiangnan University, Wuxi, Jiangsu 214122, China
  - Case Center for Imaging Research, Case Western Reserve University, Cleveland, OH 44106, USA
  - Department of Radiology, University Hospitals Case Medical Center, Case Western Reserve University, Cleveland, OH 44106, USA
  - Corresponding author at: School of Digital Media, Jiangnan University, Wuxi, Jiangsu, China. Tel.: +86 137 71510961. (P. Qian)
- Shouwei Sun
  - School of Digital Media, Jiangnan University, Wuxi, Jiangsu 214122, China
- Yizhang Jiang
  - School of Digital Media, Jiangnan University, Wuxi, Jiangsu 214122, China
- Kuan-Hao Su
  - Case Center for Imaging Research, Case Western Reserve University, Cleveland, OH 44106, USA
  - Department of Radiology, University Hospitals Case Medical Center, Case Western Reserve University, Cleveland, OH 44106, USA
- Tongguang Ni
  - School of Digital Media, Jiangnan University, Wuxi, Jiangsu 214122, China
  - School of Information Science and Engineering, Changzhou University, Changzhou, Jiangsu 213164, China
- Shitong Wang
  - School of Digital Media, Jiangnan University, Wuxi, Jiangsu 214122, China
- Raymond F. Muzic
  - Case Center for Imaging Research, Case Western Reserve University, Cleveland, OH 44106, USA
  - Department of Radiology, University Hospitals Case Medical Center, Case Western Reserve University, Cleveland, OH 44106, USA
|
79
|
Robust fuzzy clustering using nonsymmetric Student's t finite mixture model for MR image segmentation. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2015.10.087] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 11/20/2022]
|