1
Tao Z, Li J, Fu H, Kong Y, Fu Y. From Ensemble Clustering to Subspace Clustering: Cluster Structure Encoding. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:2670-2681. [PMID: 34495848 DOI: 10.1109/tnnls.2021.3107354] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/04/2023]
Abstract
In this study, we propose a novel algorithm to encode the cluster structure by incorporating ensemble clustering (EC) into subspace clustering (SC). First, the low-rank representation (LRR) is learned from a higher order data relationship induced by ensemble K-means coding, which exploits the cluster structure in a co-association matrix of basic partitions (i.e., clustering results). Second, to provide a fast predictive coding mechanism, an encoding function parameterized by neural networks is introduced to predict the LRR derived from partitions. These two steps are jointly proceeded to seamlessly integrate partition information and original features and thus deliver better representations than the ones obtained from each single source. Moreover, an alternating optimization framework is developed to learn the LRR, train the encoding function, and fine-tune the higher order relationship. Extensive experiments on eight benchmark datasets validate the effectiveness of the proposed algorithm on several clustering tasks compared with state-of-the-art EC and SC methods.
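The co-association matrix mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common definition (entry (i, j) is the fraction of basic partitions that place samples i and j in the same cluster); the function name `co_association` and the toy partitions are hypothetical and are not taken from the paper, which builds further steps (ensemble K-means coding, LRR) on top of this matrix.

```python
from typing import List, Sequence


def co_association(partitions: Sequence[Sequence[int]]) -> List[List[float]]:
    """Return the fraction of basic partitions in which samples i and j co-cluster."""
    n = len(partitions[0])
    m = len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    # Normalize the agreement counts by the number of partitions.
    return [[counts[i][j] / m for j in range(n)] for i in range(n)]


# Three hypothetical basic partitions (clustering results) of four samples.
basic_partitions = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
C = co_association(basic_partitions)
```

Cluster label values are only compared within a single partition, so the third partition (which swaps the label names) still counts as agreeing with the first.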

2
Wang Y, Zou J, Wang K, Liu C, Yuan X. Semi-supervised deep embedded clustering with pairwise constraints and subset allocation. Neural Netw 2023; 164:310-322. [PMID: 37163847 DOI: 10.1016/j.neunet.2023.04.016] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/29/2022] [Revised: 03/08/2023] [Accepted: 04/11/2023] [Indexed: 05/12/2023]
Abstract
Semi-supervised deep clustering methods attract much attention due to their excellent performance on the end-to-end clustering task. However, it is hard to obtain satisfactory clustering results because the many overlapping samples in industrial text datasets strongly and incorrectly influence the learning process. Existing methods incorporate prior knowledge in the form of pairwise constraints or class labels, which not only largely ignores the correlation between these two kinds of supervision information but also causes the problem of weakly supervised constraints or incorrect strongly supervised label guidance. To tackle these problems, we propose a semi-supervised method based on pairwise constraints and subset allocation (PCSA-DEC). We redefine the similarity-based constraint loss by forcing the similarity of samples in the same class to be much higher than that of other samples, and we design a novel subset allocation loss to precisely learn the strongly supervised information contained in labels that is consistent with the unlabeled data. Experimental results on two industrial text datasets show that our method yields an 8.2%-8.7% improvement in accuracy and a 13.4%-19.8% improvement in normalized mutual information over the state-of-the-art method.
Affiliation(s)
- Yalin Wang
- School of Automation, Central South University, Changsha, 410083, Hunan, China.
- Jiangfeng Zou
- School of Automation, Central South University, Changsha, 410083, Hunan, China.
- Kai Wang
- School of Automation, Central South University, Changsha, 410083, Hunan, China.
- Chenliang Liu
- School of Automation, Central South University, Changsha, 410083, Hunan, China.
- Xiaofeng Yuan
- School of Automation, Central South University, Changsha, 410083, Hunan, China.

3
Ren X, Jia L, Zhao Z, Qiang Y, Wu W, Han P, Zhao J, Sun J. Weakly supervised label propagation algorithm classifies lung cancer imaging subtypes. Sci Rep 2023; 13:5167. [PMID: 36997586 PMCID: PMC10063585 DOI: 10.1038/s41598-023-32301-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2022] [Accepted: 03/25/2023] [Indexed: 04/01/2023] Open
Abstract
To address the long turnaround time, high cost, invasive sampling damage, and easy emergence of drug resistance associated with lung cancer gene detection, a reliable and non-invasive prognostic method is proposed. Under the guidance of weakly supervised learning, deep metric learning and graph clustering methods are used to learn higher-level abstract features from CT imaging features. The unlabeled data are dynamically updated through the k-nearest label update strategy: unlabeled data are transformed into weakly labeled data and then progressively into strongly labeled data, which optimizes the clustering results and establishes a classification model for predicting new subtypes of lung cancer imaging. Five imaging subtypes are confirmed on the lung cancer dataset containing CT, clinical and genetic information downloaded from the TCIA lung cancer database. The new model achieves high accuracy for subtype classification (ACC = 0.9793), and the use of CT sequence images, gene expression, DNA methylation and gene mutation data from the cooperative hospital in Shanxi Province demonstrates the biomedical value of this method. The proposed method can also comprehensively evaluate intratumoral heterogeneity based on the correlation between the final lung CT imaging features and specific molecular subtypes.
Affiliation(s)
- Xueting Ren
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, Shanxi, China
- Liye Jia
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, Shanxi, China
- Zijuan Zhao
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, Shanxi, China
- Yan Qiang
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, Shanxi, China
- Wei Wu
- Department of Clinical Laboratory, Affiliated People's Hospital of Shanxi Medical University, Shanxi Provincial People's Hospital, Taiyuan, Shanxi, China
- Peng Han
- North Automatic Control Technology Institute, Taiyuan, Shanxi, China
- Juanjuan Zhao
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, Shanxi, China
- Jingyu Sun
- College of Information and Computer, Taiyuan University of Technology, Taiyuan, Shanxi, China

4
Cai J, Hao J, Yang H, Zhao X, Yang Y. A Review on Semi-supervised Clustering. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2023.02.088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/07/2023]

5
Yang Z, Ren Y, Wu Z, Zeng M, Xu J, Yang Y, Pu X, Yu PS, He L. DC-FUDA: Improving deep clustering via fully unsupervised domain adaptation. Neurocomputing 2023. [DOI: 10.1016/j.neucom.2023.01.058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/15/2023]

6
Jingnan L, Chuan L, Ruizhang H, Yongbin Q, Yanping C. Intention-guided Deep Semi-supervised Document Clustering Via Metric Learning. JOURNAL OF KING SAUD UNIVERSITY - COMPUTER AND INFORMATION SCIENCES 2022. [DOI: 10.1016/j.jksuci.2022.12.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]

7
KnAC: an approach for enhancing cluster analysis with background knowledge and explanations. APPL INTELL 2022. [DOI: 10.1007/s10489-022-04310-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
Abstract
Pattern discovery in multidimensional data sets has been the subject of research for decades. There exists a wide spectrum of clustering algorithms that can be used for this purpose. However, their practical applications share a common post-clustering phase, which concerns expert-based interpretation and analysis of the obtained results. We argue that this can be the bottleneck in the process, especially in cases where domain knowledge exists prior to clustering. Such a situation requires not only a proper analysis of automatically discovered clusters but also conformance checking with existing knowledge. In this work, we present Knowledge Augmented Clustering (KnAC). Its main goal is to confront expert-based labelling with automated clustering for the sake of updating and refining the former. Our solution is not restricted to any existing clustering algorithm. Instead, KnAC can serve as an augmentation of an arbitrary clustering algorithm, making the approach a robust and model-agnostic improvement of any state-of-the-art clustering method. We demonstrate the feasibility of our method on artificial, reproducible examples and in a real-life use case scenario. In both cases, we achieved better results than classic clustering algorithms without augmentation.

8
A Generalization of Sigmoid Loss Function Using Tsallis Statistics for Binary Classification. Neural Process Lett 2022. [DOI: 10.1007/s11063-022-11087-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]

9
Sun B, Zhou P, Du L, Li X. Active deep image clustering. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109346] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022]

10
Castellano G, Vessio G. A Deep Learning Approach to Clustering Visual Arts. Int J Comput Vis 2022. [DOI: 10.1007/s11263-022-01664-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Abstract
Clustering artworks is difficult for several reasons. On the one hand, recognizing meaningful patterns based on domain knowledge and visual perception is extremely hard. On the other hand, applying traditional clustering and feature reduction techniques to the highly dimensional pixel space can be ineffective. To address these issues, in this paper we propose DELIUS: a DEep learning approach to cLustering vIsUal artS. The method uses a pre-trained convolutional network to extract features and then feeds these features into a deep embedded clustering model, where the task of mapping the input data to a latent space is jointly optimized with the task of finding a set of cluster centroids in this latent space. Quantitative and qualitative experimental results show the effectiveness of the proposed method. DELIUS can be useful for several tasks related to art analysis, in particular visual link retrieval and historical knowledge discovery in painting datasets.
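The joint optimization of latent mapping and cluster centroids can be made concrete with a small sketch. The abstract does not give the objective, so this assumes the widely used deep embedded clustering (DEC) formulation: soft assignments from a Student's t kernel, a sharpened target distribution, and a KL divergence between the two. All function names and the toy latent points are hypothetical, and the network that produces the latent vectors is omitted.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def soft_assign(z: Sequence[Sequence[float]],
                centroids: Sequence[Sequence[float]],
                alpha: float = 1.0) -> Matrix:
    """Soft assignment q_ij of latent point i to centroid j (Student's t kernel)."""
    q = []
    for point in z:
        row = []
        for mu in centroids:
            d2 = sum((a - b) ** 2 for a, b in zip(point, mu))
            row.append((1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([v / total for v in row])
    return q


def target_distribution(q: Matrix) -> Matrix:
    """Sharpened targets p_ij proportional to q_ij^2 / f_j, with f_j = sum_i q_ij."""
    k = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(k)]
    p = []
    for row in q:
        weights = [row[j] ** 2 / freq[j] for j in range(k)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p


def kl_divergence(p: Matrix, q: Matrix) -> float:
    """KL(P || Q) summed over all points and clusters."""
    return sum(pij * math.log(pij / qij)
               for p_row, q_row in zip(p, q)
               for pij, qij in zip(p_row, q_row) if pij > 0.0)


# Hypothetical 2-D latent points and two centroids.
latent = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.2, 5.0]]
centers = [[0.0, 0.0], [5.0, 5.0]]
Q = soft_assign(latent, centers)
P = target_distribution(Q)
loss = kl_divergence(P, Q)
```

In a full training loop under this assumption, the gradient of `loss` would update both the encoder weights and the centroids, and `P` would be recomputed periodically rather than at every step.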

11
Chen R, Tang Y, Zhang W, Feng W. Deep multi-view semi-supervised clustering with sample pairwise constraints. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.05.091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]

12
Sui Y, Feng S, Zhang H, Cao J, Hu L, Zhu N. Causality-aware Enhanced Model for Multi-hop Question Answering over Knowledge Graphs. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108943] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]

13
Huang D, Hu J, Li T, Du S, Chen H. Consistency regularization for deep semi-supervised clustering with pairwise constraints. INT J MACH LEARN CYB 2022. [DOI: 10.1007/s13042-022-01599-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]

14
EDCWRN: efficient deep clustering with the weight of representations and the help of neighbors. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03895-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]

15
Hazratgholizadeh R, Balafar MA, Derakhshi MRF. Active constrained deep embedded clustering with dual source. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03752-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]

16
Evaluation and Prediction of Landslide Susceptibility in Yichang Section of Yangtze River Basin Based on Integrated Deep Learning Algorithm. REMOTE SENSING 2022. [DOI: 10.3390/rs14112717] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/22/2023]
Abstract
Landslide susceptibility evaluation (LSE) refers to the probability of landslide occurrence in a region under a specific geological environment and trigger conditions, which is crucial to preventing and controlling landslide risk. The mainstream of the Yangtze River in Yichang City belongs to the largest basin in the Three Gorges Reservoir area and is prone to landslides. Affected by global climate change, seismic activity, and accelerated urbanization, geological disasters such as landslide collapses and debris flows in the study area have increased significantly. Therefore, it is urgent to carry out the LSE in the Yichang section of the Yangtze River Basin. The main results are as follows: (1) Based on historical landslide catalog, geological data, geographic data, hydrological data, remote sensing data, and other multi-source spatial-temporal big data, we construct the LSE index system; (2) In this paper, unsupervised Deep Embedding Clustering (DEC) algorithm and deep integration network (Capsule Neural Network based on SENet: SE-CapNet) are used for the first time to participate in non-landslide sample selection, and LSE in the study area and the accuracy of the algorithm is 96.29; (3) Based on the constructed sensitivity model and rainfall forecast data, the main driving mechanisms of landslides in the Yangtze River Basin were revealed. In this paper, the study area’s mid-long term LSE prediction and trend analysis are carried out. (4) The complete results show that the method has good performance and high precision, providing a reference for subsequent LSE, landslide susceptibility prediction (LSP), and change rule research, and providing a scientific basis for landslide disaster prevention.

17
Ajay P, Nagaraj B, Kumar RA, Huang R, Ananthi P. Unsupervised Hyperspectral Microscopic Image Segmentation Using Deep Embedded Clustering Algorithm. SCANNING 2022; 2022:1200860. [PMID: 35800209 PMCID: PMC9192273 DOI: 10.1155/2022/1200860] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/18/2022] [Accepted: 05/23/2022] [Indexed: 06/15/2023]
Abstract
Hyperspectral microscopy in biology and minerals, unsupervised deep learning neural network denoising SRS photos: hyperspectral resolution enhancement and denoising one hyperspectral picture is enough to teach unsupervised method. An intuitive chemical species map for a lithium ore sample is produced using k-means clustering. Many researchers are now interested in biosignals. Uncertainty limits the algorithms' capacity to evaluate these signals for further information. Even while AI systems can answer puzzles, they remain limited. Deep learning is used when machine learning is inefficient. Supervised learning needs a lot of data. Deep learning is vital in modern AI. Supervised learning requires a large labeled dataset. The selection of parameters prevents over- or underfitting. Unsupervised learning is used to overcome the challenges outlined above (performed by the clustering algorithm). To accomplish this, two processing processes were used: (1) utilizing nonlinear deep learning networks to turn data into a latent feature space (Z). The Kullback-Leibler divergence is used to test the objective function convergence. This article explores a novel research on hyperspectral microscopic picture using deep learning and effective unsupervised learning.
Affiliation(s)
- P. Ajay
- Faculty of Information and Communication Engineering, Anna University, Chennai, India
- B. Nagaraj
- Department of ECE, Rathinam Technical Campus, India
- R. Arun Kumar
- Rathinam Technical Campus, Department of Electronics and Communication Engineering, India
- P. Ananthi
- Department of Artificial Intelligence and Data Science, Rathinam Technical Campus, India

18
Moens S, Cule B, Goethals B. RASCL: a randomised approach to subspace clusters. INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS 2022. [DOI: 10.1007/s41060-022-00327-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Subspace clustering aims to discover clusters in projections of highly dimensional numerical data. In this paper, we focus on discovering small collections of highly interesting subspace clusters that do not try to cluster all data points, leaving noisy data points unclustered. To this end, we propose a randomised method that first converts the highly dimensional database to a binarised one using projected samples of the original database. Subsequently, this database is mined for frequent itemsets, which we show can be translated back to subspace clusters. In this way, we are able to explore multiple subspaces of different sizes at the same time. In our extensive experimental analysis, we show on synthetic as well as real-world data that our method is capable of discovering highly interesting subspace clusters efficiently.

19
Jing X, Yan Z, Shen Y, Pedrycz W, Yang J. A Group-Based Distance Learning Method for Semisupervised Fuzzy Clustering. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:3083-3096. [PMID: 33027030 DOI: 10.1109/tcyb.2020.3023373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Learning a proper distance for clustering from prior knowledge falls into the realm of semisupervised fuzzy clustering. Although most existing learning methods take prior knowledge (e.g., pairwise constraints) into account, they pay little attention to local knowledge of data, which, however, can be utilized to optimize the distance. In this article, we propose a novel distance learning method, which learns from the Group-level information, for semisupervised fuzzy clustering. We first present a new format of constraint information, called Group-level constraints, by elevating the pairwise constraints (must-links and cannot-links) from point level to Group level. The Groups, generated around data points contained in the pairwise constraints, carry not only the local information of data (the relation between close data points) but also more background information under some given limited prior knowledge. Then, we propose a novel method to learn a distance by using the Group-level constraints, namely, Group-based distance learning, in order to optimize the performance of fuzzy clustering. The distance learning process aims to pull must-link Groups as close as possible while pushing cannot-link Groups as far as possible. We formulate the learning process with the weights of constraints by invoking some linear and nonlinear transformations. The linear Group-based distance learning method is realized by means of semidefinite programming, and the nonlinear learning method is realized by using the neural network, which can explicitly provide nonlinear mappings. Experimental results based on both synthetic and real-world datasets show that the proposed methods yield much better performance compared to other distance learning methods using pairwise constraints.

20
Wu S, Zheng WS. Semisupervised Feature Learning by Deep Entropy-Sparsity Subspace Clustering. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; 33:774-788. [PMID: 33493120 DOI: 10.1109/tnnls.2020.3029033] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/12/2023]
Abstract
While feature learning by deep neural networks is currently widely used, it is still very challenging to perform this task, given the very limited quantity of labeled data. To solve this problem, we propose to unite subspace clustering with deep semisupervised feature learning to form a unified learning framework to pursue feature learning by subspace clustering. More specifically, we develop a deep entropy-sparsity subspace clustering (deep ESSC) model, which forces a deep neural network to learn features using subspace clustering constrained by our designed entropy-sparsity scheme. The model can inherently harmonize deep semisupervised feature learning and subspace clustering simultaneously by the proposed self-similarity preserving strategy. To optimize the deep ESSC model, we introduce two unconstrained variables to eliminate the two constraints via softmax functions. We provide a general algebraic-treatment scheme for solving the proposed deep ESSC model. Extensive experiments with comprehensive analysis substantiate that our deep ESSC model is more effective than the related methods.

21
U-Vectors: Generating Clusterable Speaker Embedding from Unlabeled Data. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app112110079] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022]
Abstract
Speaker recognition deals with recognizing speakers by their speech. Most speaker recognition systems are built upon two stages: the first stage extracts low-dimensional correlation embeddings from speech, and the second performs the classification task. The robustness of a speaker recognition system mainly depends on the extraction process of speech embeddings, which are primarily pre-trained on a large-scale dataset. As the embedding systems are pre-trained, the performance of speaker recognition models greatly depends on the domain adaptation policy, and performance may degrade if the models are trained using inadequate data. This paper introduces a speaker recognition strategy dealing with unlabeled data, which generates clusterable embedding vectors from small fixed-size speech frames. The unsupervised training strategy involves an assumption that a small speech segment should include a single speaker. Based on this assumption, a pairwise constraint is constructed with noise augmentation policies and used to train an AutoEmbedder architecture that generates speaker embeddings. Without relying on a domain adaptation policy, the process produces clusterable speaker embeddings in an unsupervised manner, termed unsupervised vectors (u-vectors). The evaluation is conducted on two popular speaker recognition datasets for the English language, TIMIT and LibriSpeech. Also, a Bengali dataset is included to illustrate the diversity of the domain shifts for speaker recognition systems. Finally, we conclude that the proposed approach achieves satisfactory performance using pairwise architectures.

22
Dambha-Miller H, Simpson G, Akyea RK, Hounkpatin H, Morrison L, Gibson J, Stokes J, Islam N, Chapman A, Stuart B, Zaccardi F, Zlatev Z, Jones K, Roderick P, Boniface M, Santer M, Farmer A. The development and validation of population clusters for integrating health and social care: A protocol for a mixed-methods study in Multiple Long-Term Conditions (Cluster-AIM) (Preprint). JMIR Res Protoc 2021; 11:e34405. [PMID: 35708751 PMCID: PMC9247810 DOI: 10.2196/34405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2021] [Revised: 03/18/2022] [Accepted: 04/21/2022] [Indexed: 11/13/2022] Open
Abstract
Background: Multiple long-term health conditions (multimorbidity) (MLTC-M) are increasingly prevalent and associated with high rates of morbidity, mortality, and health care expenditure. Strategies to address this have primarily focused on the biological aspects of disease, but MLTC-M also result from and are associated with additional psychosocial, economic, and environmental barriers. A shift toward more personalized, holistic, and integrated care could be effective. This could be made more efficient by identifying groups of populations based on their health and social needs. In turn, these will contribute to evidence-based solutions supporting delivery of interventions tailored to address the needs pertinent to each cluster. Evidence is needed on how to generate clusters based on health and social needs and quantify the impact of clusters on long-term health and costs. Objective: We intend to develop and validate population clusters that consider determinants of health and social care needs for people with MLTC-M using data-driven machine learning (ML) methods compared to expert-driven approaches within primary care national databases, followed by evaluation of cluster trajectories and their association with health outcomes and costs. Methods: The mixed methods program of work with parallel work streams includes the following: (1) qualitative semistructured interview studies exploring patient, caregiver, and professional views on clinical and socioeconomic factors influencing experiences of living with or seeking care in MLTC-M; (2) modified Delphi with relevant stakeholders to generate variables on health and social (wider) determinants and to examine the feasibility of including these variables within existing primary care databases; and (3) cohort study with expert-driven segmentation, alongside data-driven algorithms. Outputs will be compared, clusters characterized, and trajectories over time examined to quantify associations with mortality, additional long-term conditions, worsening frailty, disease severity, and 10-year health and social care costs. Results: The study will commence in October 2021 and is expected to be completed by October 2023. Conclusions: By studying MLTC-M clusters, we will assess how more personalized care can be developed, how accurate costs can be provided, and how to better understand the personal and medical profiles and environment of individuals within each cluster. Integrated care that considers “whole persons” and their environment is essential in addressing the complex, diverse, and individual needs of people living with MLTC-M. International Registered Report Identifier (IRRID): PRR1-10.2196/34405
Affiliation(s)
- Glenn Simpson
- Primary Care Research Centre, Southampton, United Kingdom
- Ralph K Akyea
- University of Nottingham, Nottingham, United Kingdom
- Jon Gibson
- Division of Population Health, Health Services Research & Primary Care, University of Manchester, Manchester, United Kingdom
- Jonathan Stokes
- Division of Population Health, Health Services Research & Primary Care, University of Manchester, Manchester, United Kingdom
- Adriane Chapman
- Electronic and Computer Science Centre for Health Technologies, University of Southampton, Southampton, United Kingdom
- Beth Stuart
- Primary Care Research Centre, Southampton, United Kingdom
- Francesco Zaccardi
- Diabetes Research Centre, University of Leicester, Leicester, United Kingdom
- Zlatko Zlatev
- Electronic and Computer Science Centre for Health Technologies, University of Southampton, Southampton, United Kingdom
- Karen Jones
- Centre for the Study of Health, Science and Environment, University of Kent, Kent, United Kingdom
- Paul Roderick
- Public Health, University of Southampton, Southampton, United Kingdom
- Michael Boniface
- Electronic and Computer Science Centre for Health Technologies, University of Southampton, Southampton, United Kingdom
- Miriam Santer
- Primary Care Research Centre, Southampton, United Kingdom
- Andrew Farmer
- Nuffield Department of Primary Care Health Sciences, University of Oxford, Oxford, United Kingdom

23
Deep Semi-Supervised Algorithm for Learning Cluster-Oriented Representations of Medical Images Using Partially Observable DICOM Tags and Images. Diagnostics (Basel) 2021; 11:diagnostics11101920. [PMID: 34679618 PMCID: PMC8534981 DOI: 10.3390/diagnostics11101920] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2021] [Revised: 10/13/2021] [Accepted: 10/15/2021] [Indexed: 11/16/2022] Open
Abstract
The task of automatically extracting large homogeneous datasets of medical images based on detailed criteria and/or semantic similarity can be challenging because the acquisition and storage of medical images in clinical practice is not fully standardised and can be prone to errors, which are often made unintentionally by medical professionals during manual input. In this paper, we propose an algorithm for learning cluster-oriented representations of medical images by fusing images with partially observable DICOM tags. Pairwise relations are modelled by thresholding the Gower distance measure which is calculated using eight DICOM tags. We trained the models using 30,000 images, and we tested them using a disjoint test set consisting of 8000 images, gathered retrospectively from the PACS repository of the Clinical Hospital Centre Rijeka in 2017. We compare our method against the standard and deep unsupervised clustering algorithms, as well as the popular semi-supervised algorithms combined with the most commonly used feature descriptors. Our model achieves an NMI score of 0.584 with respect to the anatomic region, and an NMI score of 0.793 with respect to the modality. The results suggest that DICOM data can be used to generate pairwise constraints that can help improve medical images clustering, even when using only a small number of constraints.
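The idea of turning partially observed metadata into pairwise constraints by thresholding the Gower distance can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the three tags used (`Modality`, `BodyPartExamined`, `SliceThickness`), the value range, the single threshold of 0.5, and the rule that maps distances to must-link or cannot-link are all hypothetical. The abstract says only that eight DICOM tags and a thresholded Gower distance are used.

```python
from typing import Dict, Optional, Union

Value = Union[str, float, None]


def gower_distance(a: Dict[str, Value], b: Dict[str, Value],
                   ranges: Dict[str, float]) -> Optional[float]:
    """Gower distance averaged over the tags observed in both records."""
    total, used = 0.0, 0
    for key, x in a.items():
        y = b.get(key)
        if x is None or y is None:
            continue  # Skip tags missing in either record.
        if key in ranges:
            total += abs(x - y) / ranges[key]  # Range-normalized numeric tag.
        else:
            total += 0.0 if x == y else 1.0    # Categorical tag: simple mismatch.
        used += 1
    return total / used if used else None


def pairwise_relation(distance: Optional[float], threshold: float = 0.5) -> str:
    """Hypothetical rule: small distance gives must-link, large gives cannot-link."""
    if distance is None:
        return "unknown"
    return "must-link" if distance <= threshold else "cannot-link"


# Hypothetical records; None marks an unobserved tag.
r1 = {"Modality": "CT", "BodyPartExamined": "CHEST", "SliceThickness": 1.0}
r2 = {"Modality": "CT", "BodyPartExamined": None, "SliceThickness": 3.0}
r3 = {"Modality": "MR", "BodyPartExamined": "HEAD", "SliceThickness": 5.0}
tag_ranges = {"SliceThickness": 4.0}  # Assumed max - min over the dataset.

d12 = gower_distance(r1, r2, tag_ranges)
d13 = gower_distance(r1, r3, tag_ranges)
```

Averaging only over tags present in both records is what lets the distance tolerate partially observable metadata; a pair with no shared tags yields no constraint at all.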

24
Xu J, Ren Y, Li G, Pan L, Zhu C, Xu Z. Deep embedded multi-view clustering with collaborative training. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2020.12.073] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Indexed: 11/26/2022]

25
Multi-view Clustering Based on Low-rank Representation and Adaptive Graph Learning. Neural Process Lett 2021. [DOI: 10.1007/s11063-021-10634-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]

26
Arellano-Espitia F, Delgado-Prieto M, Gonzalez-Abreu AD, Saucedo-Dorantes JJ, Osornio-Rios RA. Deep-Compact-Clustering Based Anomaly Detection Applied to Electromechanical Industrial Systems. SENSORS (BASEL, SWITZERLAND) 2021; 21:5830. [PMID: 34502724 PMCID: PMC8433707 DOI: 10.3390/s21175830] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/19/2021] [Revised: 08/26/2021] [Accepted: 08/26/2021] [Indexed: 11/17/2022]
Abstract
The rapid growth of the industrial sector has required the development of more productive and reliable machinery, which in turn leads to more complex systems. In this regard, the automatic detection of unknown events in machinery represents a greater challenge, since uncharacterized catastrophic faults can occur. However, the existing methods for anomaly detection present limitations when dealing with highly complex industrial systems. To address this, a novel fault diagnosis methodology is developed for anomaly detection. An unsupervised anomaly detection framework named deep-autoencoder-compact-clustering one-class support-vector machine (DAECC-OC-SVM) is presented, which aims to exploit the representations learnt automatically by a deep neural network to improve anomaly detection performance. The method combines the training of a deep autoencoder with a compact clustering model and a one-class support-vector-machine-based outlier detection method. The methodology is applied to a public rolling bearing fault experimental test bench and to a multi-fault experimental test bench. The results show that the proposed methodology is able to accurately detect unknown defects, outperforming other state-of-the-art methods.
Affiliation(s)
- Francisco Arellano-Espitia
  - MCIA Department of Electronic Engineering, Technical University of Catalonia (UPC), 08034 Barcelona, Spain
- Miguel Delgado-Prieto
  - MCIA Department of Electronic Engineering, Technical University of Catalonia (UPC), 08034 Barcelona, Spain
- Artvin-Darien Gonzalez-Abreu
  - HSPdigital CA-Mecatronica Engineering Faculty, Autonomous University of Queretaro, San Juan del Rio 76806, Mexico
- Juan Jose Saucedo-Dorantes
  - HSPdigital CA-Mecatronica Engineering Faculty, Autonomous University of Queretaro, San Juan del Rio 76806, Mexico
- Roque Alfredo Osornio-Rios
  - HSPdigital CA-Mecatronica Engineering Faculty, Autonomous University of Queretaro, San Juan del Rio 76806, Mexico
|
28
|
Lima BVA, Neto ADD, Silva LES, Machado VP. Deep semi-supervised classification based in deep clustering and cross-entropy. INT J INTELL SYST 2021. [DOI: 10.1002/int.22446] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/07/2022]
Affiliation(s)
- Bruno Vicente Alves Lima
  - Department of Computer and Automation, Federal University of Rio Grande do Norte, Natal, Rio Grande do Norte, Brazil
- Adrião Duarte Dória Neto
  - Department of Computer and Automation, Federal University of Rio Grande do Norte, Natal, Rio Grande do Norte, Brazil
|
29
|
Zurn J, Burgard W, Valada A. Self-Supervised Visual Terrain Classification From Unsupervised Acoustic Feature Learning. IEEE T ROBOT 2021. [DOI: 10.1109/tro.2020.3031214] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 11/07/2022]
|
32
|
Huang Z, Ren Y, Pu X, Pan L, Yao D, Yu G. Dual self-paced multi-view clustering. Neural Netw 2021; 140:184-192. [PMID: 33770727 DOI: 10.1016/j.neunet.2021.02.022] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 11/30/2020] [Revised: 01/17/2021] [Accepted: 02/18/2021] [Indexed: 10/22/2022]
Abstract
By utilizing the complementary information from multiple views, multi-view clustering (MVC) algorithms typically achieve much better clustering performance than conventional single-view methods. Although great progress has been made in this field in the past few years, most existing multi-view clustering methods still suffer from the following shortcomings: (1) most MVC methods are non-convex and thus easily get stuck in suboptimal local minima; (2) the effectiveness of these methods is sensitive to the presence of noise or outliers; and (3) the qualities of different features and views are usually ignored, which can also influence the clustering result. To address these issues, we propose dual self-paced multi-view clustering (DSMVC) in this paper. Specifically, DSMVC takes advantage of self-paced learning to tackle the non-convex issue. By applying a soft-weighting scheme of self-paced learning to instances, the negative impact caused by noise and outliers can be significantly reduced. Moreover, to alleviate the feature and view quality issues, we develop a novel feature selection approach in a self-paced manner and a weighting term for views. Experimental results on real-world data sets demonstrate the effectiveness of the proposed method.
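The soft-weighting idea mentioned in this abstract can be sketched as follows. The abstract does not state which weighting function DSMVC uses, so this example assumes the linear soft-weighting rule that is common in the self-paced learning literature: an instance with loss `l` receives weight `max(0, 1 - l / lam)`, and the "age" parameter `lam` is raised over iterations so that harder instances are admitted gradually. The function name and the loss values are hypothetical.

```python
def soft_weights(losses, lam):
    """Linear self-paced soft weights: easy (low-loss) instances get weight near 1,
    instances with loss >= lam get weight 0 and are effectively ignored."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    return [max(0.0, 1.0 - loss / lam) for loss in losses]


# Hypothetical per-instance clustering losses; the last one looks like an outlier.
losses = [0.0, 0.5, 1.0, 2.0]

early = soft_weights(losses, lam=1.0)  # strict: only the easiest instances count
late = soft_weights(losses, lam=4.0)   # relaxed: harder instances now contribute
```

With the strict setting the outlier-like instance is excluded entirely, and with the relaxed setting it is still down-weighted relative to the others, which is how such a scheme can limit the influence of noise.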
Affiliation(s)
- Zongmo Huang
  - School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
- Yazhou Ren
  - School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
- Xiaorong Pu
  - School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
- Lili Pan
  - School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Dezhong Yao
  - The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Laboratory for Neuroinformation, University of Electronic Science and Technology of China, Chengdu 611731, China; Research Unit of NeuroInformation, Chinese Academy of Medical Sciences, 2019RU035, Chengdu, China; School of Electrical Engineering, Zhengzhou University, Zhengzhou 450001, China
- Guoxian Yu
  - School of Software, Shandong University, Jinan 250101, China
|
33
|
Guérin J, Thiery S, Nyiri E, Gibaru O, Boots B. Combining pretrained CNN feature extractors to enhance clustering of complex natural images. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.10.068] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 10/23/2022]
|
34
|
Jiang Y, Gu X, Wu D, Hang W, Xue J, Qiu S, Lin CT. A Novel Negative-Transfer-Resistant Fuzzy Clustering Model With a Shared Cross-Domain Transfer Latent Space and its Application to Brain CT Image Segmentation. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:40-52. [PMID: 31905144 DOI: 10.1109/tcbb.2019.2963873] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Indexed: 06/10/2023]
Abstract
Traditional clustering algorithms for medical image segmentation can only achieve satisfactory clustering performance under relatively ideal conditions, in which there is adequate data from the same distribution, and the data is rarely disturbed by noise or outliers. However, a sufficient number of medical images with representative manual labels is often not available, because medical images are frequently acquired with different scanners (or different scan protocols) or corrupted by various kinds of noise. Transfer learning improves learning in the target domain by leveraging knowledge from related domains. Given some target data, the performance of transfer learning is determined by the degree of relevance between the source and target domains. To achieve positive transfer and avoid negative transfer, a negative-transfer-resistant mechanism is proposed that computes the weight of transferred knowledge. A negative-transfer-resistant fuzzy clustering model with a shared cross-domain transfer latent space (called NTR-FC-SCT) is then proposed by integrating the negative-transfer-resistant mechanism and maximum mean discrepancy (MMD) into the framework of fuzzy c-means clustering. Experimental results show that the proposed NTR-FC-SCT model outperformed several traditional non-transfer and related transfer clustering algorithms.
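For readers unfamiliar with the fuzzy c-means framework that this abstract builds on, the sketch below shows the standard membership update for one-dimensional points. It is plain fuzzy c-means only: the negative-transfer-resistant weighting and the MMD term of NTR-FC-SCT are not specified in the abstract and are not reproduced here. The function name, the fuzzifier default `m=2.0`, and the sample data are assumptions made for illustration.

```python
def fcm_memberships(points, centers, m=2.0):
    """Standard fuzzy c-means membership update for 1-D points.

    Returns u[i][j], the membership of point i in cluster j; each row sums to 1.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [abs(x - c) for c in centers]
        if 0.0 in dists:
            # The point coincides with one or more centers: share membership among them.
            hits = [d == 0.0 for d in dists]
            memberships.append([1.0 / sum(hits) if h else 0.0 for h in hits])
            continue
        # u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        row = [1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists]
        memberships.append(row)
    return memberships


# Hypothetical intensities and two cluster centers.
u = fcm_memberships([0.0, 1.0, 3.0], centers=[0.0, 4.0])
```

Unlike hard k-means assignment, every point keeps a graded membership in each cluster, which is the property that lets transfer terms reweight contributions rather than flip assignments outright.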
|
35
|
Araújo AFR, Antonino VO, Ponce-Guevara KL. Self-organizing subspace clustering for high-dimensional and multi-view data. Neural Netw 2020; 130:253-268. [PMID: 32711348 DOI: 10.1016/j.neunet.2020.06.022] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 12/10/2019] [Revised: 04/30/2020] [Accepted: 06/28/2020] [Indexed: 12/14/2022]
Abstract
A surge in the availability of data from multiple sources and modalities is correlated with advances in how to obtain, compress, store, transfer, and process large amounts of complex high-dimensional data. The clustering challenge increases with the growth of data dimensionality, which decreases the discriminative power of distance metrics. Subspace clustering aims to group data drawn from a union of subspaces. There is a large number of state-of-the-art approaches, which we divide into families according to the method used in the clustering. We introduce a soft subspace clustering algorithm, a Self-organizing Map (SOM) with a time-varying structure, to cluster data without any prior knowledge of the number of categories or of the neural network topology, both of which are determined during the training process. The model also assigns proper relevancies (weights) to different dimensions, capturing from the learning process the influence of each dimension on uncovering clusters. We employ a number of real-world datasets to validate the model. The algorithm presents competitive performance in a diverse range of contexts, among them data mining, gene expression, multi-view, computer vision, and text clustering problems that include high-dimensional data. Extensive experiments suggest that our method very often outperforms the state-of-the-art approaches in all types of problems considered.
Affiliation(s)
- Aluizio F R Araújo
  - Centro de Informática, Universidade Federal de Pernambuco, 50740560, Recife, Brazil
- Victor O Antonino
  - Centro de Informática, Universidade Federal de Pernambuco, 50740560, Recife, Brazil
|
36
|
Ohi AQ, Mridha M, Safir FB, Hamid MA, Monowar MM. AutoEmbedder: A semi-supervised DNN embedding system for clustering. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.106190] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 11/24/2022]
|
37
|
Dos Santos FP, Zor C, Kittler J, Ponti MA. Learning image features with fewer labels using a semi-supervised deep convolutional network. Neural Netw 2020; 132:131-143. [PMID: 32871338 DOI: 10.1016/j.neunet.2020.08.016] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/19/2019] [Revised: 05/04/2020] [Accepted: 08/13/2020] [Indexed: 11/16/2022]
Abstract
Learning feature embeddings for pattern recognition is a relevant task for many applications. Deep learning methods such as convolutional neural networks can be employed for this task with different training strategies: leveraging pre-trained models as baselines; training from scratch with the target dataset; or fine-tuning from the pre-trained model. Although there are separate systems for learning features from labelled and unlabelled data, there are few models combining all available information. Therefore, in this paper, we present a novel semi-supervised deep network training strategy that comprises a convolutional network and an autoencoder using a joint classification and reconstruction loss function. We show that our network improves the learned feature embedding when the unlabelled data are included in the training process. The feature embedding obtained by our network achieves better classification accuracy than competing methods and offers good generalisation in the context of transfer learning. Furthermore, the proposed network ensemble and loss function are highly extensible and applicable to many recognition tasks.
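A joint classification and reconstruction loss of the kind mentioned in this abstract is often written as a weighted sum. The form below is a generic sketch, not the authors' exact objective: the symbols $f_\theta$ (classifier), $g_\phi$ (autoencoder), the labelled set $\mathcal{D}_L$, the unlabelled set $\mathcal{D}_U$, and the trade-off weight $\lambda$ are all assumptions introduced for illustration.

```latex
\mathcal{L}(\theta, \phi)
  = \underbrace{-\frac{1}{|\mathcal{D}_L|} \sum_{(x, y) \in \mathcal{D}_L} \log f_\theta(x)_y}_{\text{classification (labelled data)}}
  \; + \; \lambda \,
    \underbrace{\frac{1}{|\mathcal{D}_L \cup \mathcal{D}_U|} \sum_{x \in \mathcal{D}_L \cup \mathcal{D}_U} \lVert x - g_\phi(x) \rVert_2^2}_{\text{reconstruction (all data)}}
```

Under this reading, unlabelled images influence the shared features only through the reconstruction term, which is one way such a model can benefit from data without labels.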
Affiliation(s)
- Fernando P Dos Santos
  - Institute of Mathematical and Computer Sciences (ICMC), University of São Paulo (USP), São Carlos/SP, 13566-590, Brazil
- Cemre Zor
  - Centre for Medical Image Computing (CMIC), University College London, WC1E 7JE, United Kingdom
- Josef Kittler
  - Centre for Vision, Speech and Signal Processing (CVSSP), University of Surrey, GU2 7XH, United Kingdom
- Moacir A Ponti
  - Institute of Mathematical and Computer Sciences (ICMC), University of São Paulo (USP), São Carlos/SP, 13566-590, Brazil
|
38
|
Uma Priya D, Santhi Thilagam P. Dynamic Data Retrieval Using Incremental Clustering and Indexing. INTERNATIONAL JOURNAL OF INFORMATION RETRIEVAL RESEARCH 2020. [DOI: 10.4018/ijirr.2020070105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
The evolution of the Internet and real-time applications has contributed to the growth of massive unstructured data, which increases the complexity of efficiently retrieving dynamic data. Extant research uses clustering methods and indexes to speed up retrieval. However, the quality of clustering methods depends on data representation models, and existing models suffer from dimensionality explosion and sparsity problems. As documents evolve, reconstructing the index from scratch is expensive. In this work, compact document vectors generated by the Doc2Vec model are used to cluster the documents, and the indexes are incrementally updated with less complexity using the diff method. The probabilistic ranking scheme BM25+ is used to improve the quality of retrieval for user queries. The experimental analysis demonstrates that the proposed system significantly improves the clustering performance and reduces the retrieval time needed to obtain top-k results.
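The BM25+ ranking scheme named in this abstract can be illustrated with a short sketch. It assumes pre-tokenised documents, the commonly quoted defaults `k1=1.2`, `b=0.75`, `delta=1.0`, and the IDF form `log((N + 1) / df)`. None of these settings are stated in the abstract, and the toy corpus is hypothetical. The `delta` term is what distinguishes BM25+ from BM25: it gives every matching term a lower-bounded contribution, so long documents are not over-penalised.

```python
import math
from collections import Counter


def bm25_plus_scores(query, docs, k1=1.2, b=0.75, delta=1.0):
    """Score each tokenised document against the query with BM25+."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # only terms present in the document contribute
            idf = math.log((n_docs + 1) / df[term])
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * (tf[term] * (k1 + 1) / norm + delta)
        scores.append(score)
    return scores


# Hypothetical toy corpus.
docs = [
    ["deep", "clustering", "index"],
    ["incremental", "index", "update", "index"],
    ["ranking", "documents"],
]
scores = bm25_plus_scores(["index"], docs)
# Indices of documents sorted by descending score (top-k would slice this list).
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

In a system like the one described, such scores would presumably be computed only within the cluster or index partition selected for the query, which is where the retrieval-time savings would come from.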
Affiliation(s)
- Uma Priya D
  - Department of Computer Science and Engineering, National Institute of Technology Karnataka, Surathkal, India
- Santhi Thilagam P
  - Department of Computer Science and Engineering, National Institute of Technology Karnataka, Surathkal, India
|
40
|
Śmieja M, Struski Ł, Figueiredo MAT. A classification-based approach to semi-supervised clustering with pairwise constraints. Neural Netw 2020; 127:193-203. [PMID: 32387926 DOI: 10.1016/j.neunet.2020.04.017] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 01/16/2020] [Revised: 04/06/2020] [Accepted: 04/16/2020] [Indexed: 11/30/2022]
Abstract
In this paper, we introduce a neural network framework for semi-supervised clustering with pairwise (must-link or cannot-link) constraints. In contrast to existing approaches, we decompose semi-supervised clustering into two simpler classification tasks: the first stage uses a pair of Siamese neural networks to label the unlabeled pairs of points as must-link or cannot-link; the second stage uses the fully pairwise-labeled dataset produced by the first stage in a supervised neural-network-based clustering method. The proposed approach is motivated by the observation that binary classification (such as assigning pairwise relations) is usually easier than multi-class clustering with partial supervision. On the other hand, being classification-based, our method solves only well-defined classification problems, rather than less well specified clustering tasks. Extensive experiments on various datasets demonstrate the high performance of the proposed method.
Affiliation(s)
- Marek Śmieja
  - Faculty of Mathematics and Computer Science, Jagiellonian University, Kraków, Poland
- Łukasz Struski
  - Faculty of Mathematics and Computer Science, Jagiellonian University, Kraków, Poland
- Mário A T Figueiredo
  - Instituto de Telecomunicações, Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
|
41
|
Kang Z, Pan H, Hoi SCH, Xu Z. Robust Graph Learning From Noisy Data. IEEE TRANSACTIONS ON CYBERNETICS 2020; 50:1833-1843. [PMID: 30629527 DOI: 10.1109/tcyb.2018.2887094] [Citation(s) in RCA: 96] [Impact Index Per Article: 24.0] [Indexed: 06/09/2023]
Abstract
Learning graphs from data automatically has shown encouraging performance on clustering and semisupervised learning tasks. However, real data are often corrupted, which may cause the learned graph to be inexact or unreliable. In this paper, we propose a novel robust graph learning scheme to learn reliable graphs from real-world noisy data by adaptively removing noise and errors in the raw data. We show that our proposed model can also be viewed as a robust version of manifold regularized robust principal component analysis (RPCA), where the quality of the graph plays a critical role. The proposed model is able to boost the performance of data clustering, semisupervised classification, and data recovery significantly, primarily due to two key factors: 1) enhanced low-rank recovery by exploiting the graph smoothness assumption and 2) improved graph construction by exploiting clean data recovered by RPCA. Extensive experiments on image/document clustering, object recognition, image shadow removal, and video background subtraction reveal that our model outperforms the previous state-of-the-art methods.
|
43
|
Gu Y, Wang S, Zhang H, Yao Y, Yang W, Liu L. Clustering-driven unsupervised deep hashing for image retrieval. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2019.08.050] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 11/16/2022]
|