1
|
Genetic algorithms: theory, genetic operators, solutions, and applications. EVOLUTIONARY INTELLIGENCE 2023. [DOI: 10.1007/s12065-023-00822-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
|
2
|
K-means Clustering Algorithms: A Comprehensive Review, Variants Analysis, and Advances in the Era of Big Data. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.11.139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/03/2022]
|
3
|
Blending multiple algorithmic granular components: a recipe for clustering. SWARM INTELLIGENCE 2022. [DOI: 10.1007/s11721-022-00219-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
|
4
|
Boosting k-means clustering with symbiotic organisms search for automatic clustering problems. PLoS One 2022; 17:e0272861. [PMID: 35951672 PMCID: PMC9371361 DOI: 10.1371/journal.pone.0272861] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/08/2022] [Accepted: 07/28/2022] [Indexed: 11/19/2022] Open
Abstract
The K-means clustering algorithm is an iterative unsupervised learning algorithm that tries to partition a given dataset into k pre-defined, distinct, non-overlapping clusters in which each data point belongs to only one group. However, its performance is affected by its sensitivity to the initial cluster centroids, with the possibility of convergence to a local optimum, and by the need to specify the number of clusters as an input parameter. Recently, the hybridization of metaheuristic algorithms with the K-means algorithm has been explored to address these problems and effectively improve the algorithm’s performance. Nonetheless, most metaheuristic algorithms require rigorous parameter tuning to achieve an optimum result. This paper proposes a hybrid clustering method that combines the well-known symbiotic organisms search (SOS) algorithm with K-means, using SOS as a global search metaheuristic to generate the optimum initial cluster centroids for K-means. The SOS algorithm is close to a parameter-free metaheuristic with excellent search quality that only requires initialising a single control parameter. The performance of the proposed algorithm is investigated by comparing it with the classical SOS, classical K-means and other existing hybrid clustering algorithms on eleven (11) UCI Machine Learning Repository datasets and one artificial dataset. The results from the extensive computational experimentation show improved performance of the hybrid SOSK-Means for solving automatic clustering compared to the standard K-means, symbiotic organisms search clustering methods and other hybrid clustering approaches.
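The sensitivity to initial centroids mentioned in this abstract, and the idea of letting an outer search choose the starting centroids for K-means, can be illustrated with a small sketch. This is not the SOSK-Means method: plain random restarts stand in for the symbiotic organisms search, and the function names (`kmeans`, `best_of_restarts`), the toy data and the restart count are all assumptions made for illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, iters=50):
    """Lloyd's algorithm from the given initial centroids.

    Returns (final_centroids, sse), where sse is the sum of squared
    distances from each point to its nearest centroid.
    """
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            groups[j].append(p)
        # Recompute each centroid as the mean of its group; keep the old
        # centroid if a group happens to be empty.
        new = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if new == centroids:
            break
        centroids = new
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse


def best_of_restarts(points, k, restarts=20, seed=0):
    """Stand-in for a global search: try several random initialisations
    and keep the K-means result with the lowest SSE."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        init = rng.sample(points, k)
        result = kmeans(points, init)
        if best is None or result[1] < best[1]:
            best = result
    return best


# Toy data (hypothetical): two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, sse = best_of_restarts(points, k=2)
```

A metaheuristic such as SOS would replace the random sampling in `best_of_restarts` with a guided population-based search over candidate centroid sets, using a fitness such as the SSE computed here.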
|
5
|
Sheng W, Wang X, Wang Z, Li Q, Zheng Y, Chen S. A Differential Evolution Algorithm With Adaptive Niching and K-Means Operation for Data Clustering. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:6181-6195. [PMID: 33284774 DOI: 10.1109/tcyb.2020.3035887] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/12/2023]
Abstract
Clustering, as an important part of data mining, is inherently a challenging problem. This article proposes a differential evolution algorithm with adaptive niching and k-means operation (denoted as DE_ANS_AKO) for partitional data clustering. Within the proposed algorithm, an adaptive niching scheme, which can dynamically adjust the size of each niche in the population, is devised and integrated to prevent premature convergence of evolutionary search, thus appropriately searching the space to identify the optimal or near-optimal solution. Furthermore, to improve the search efficiency, an adaptive k-means operation has been designed and employed at the niche level of population. The performance of the proposed algorithm has been evaluated on synthetic as well as real datasets and compared with related methods. The experimental results reveal that the proposed algorithm is able to reliably and efficiently deliver high quality clustering solutions and generally outperforms related methods implemented for comparisons.
|
6
|
Kaur A, Kumar Y. Neighborhood search based improved bat algorithm for data clustering. APPL INTELL 2022. [DOI: 10.1007/s10489-021-02934-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
|
7
|
Integrating the Eigendecomposition Approach and k-Means Clustering for Inferring Building Functions with Location-Based Social Media Data. ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION 2021. [DOI: 10.3390/ijgi10120834] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022]
Abstract
Understanding the relationship between human activity patterns and urban spatial structure planning is one of the core research topics in urban planning. Since a building is the basic spatial unit of the urban spatial structure, identifying building function types, according to human activities, is essential but challenging. This study presented a novel approach that integrated the eigendecomposition method and k-means clustering for inferring building function types according to location-based social media data, Tencent User Density (TUD) data. The eigendecomposition approach was used to extract the effective principal components (PCs) to characterize the temporal patterns of human activities at building level. This was combined with k-means clustering for building function identification. The proposed method was applied to the study area of Tianhe district, Guangzhou, one of the largest cities in China. The building inference results were verified through the random sampling of AOI data and street views in Baidu Maps. The accuracy for all building clusters exceeded 83.00%. The results indicated that the eigendecomposition approach is effective for revealing the temporal structure inherent in human activities, and the proposed eigendecomposition-k-means clustering approach is reliable for building function identification based on social media data.
|
8
|
K-Means-Based Nature-Inspired Metaheuristic Algorithms for Automatic Data Clustering Problems: Recent Advances and Future Directions. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app112311246] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/16/2022]
Abstract
K-means clustering algorithm is a partitional clustering algorithm that has been used widely in many applications for traditional clustering due to its simplicity and low computational complexity. This clustering technique depends on the user specification of the number of clusters generated from the dataset, which affects the clustering results. Moreover, random initialization of cluster centers results in its local minimal convergence. Automatic clustering is a recent approach to clustering where the specification of cluster number is not required. In automatic clustering, natural clusters existing in datasets are identified without any background information of the data objects. Nature-inspired metaheuristic optimization algorithms have been deployed in recent times to overcome the challenges of the traditional clustering algorithm in handling automatic data clustering. Some nature-inspired metaheuristics algorithms have been hybridized with the traditional K-means algorithm to boost its performance and capability to handle automatic data clustering problems. This study aims to identify, retrieve, summarize, and analyze recently proposed studies related to the improvements of the K-means clustering algorithm with nature-inspired optimization techniques. A quest approach for article selection was adopted, which led to the identification and selection of 147 related studies from different reputable academic avenues and databases. More so, the analysis revealed that although the K-means algorithm has been well researched in the literature, its superiority over several well-established state-of-the-art clustering algorithms in terms of speed, accessibility, simplicity of use, and applicability to solve clustering problems with unlabeled and nonlinearly separable datasets has been clearly observed in the study. 
The current study also evaluated and discussed some of the well-known weaknesses of the K-means clustering algorithm, for which the existing improvement methods were conceptualized. It is noteworthy to mention that the current systematic review and analysis of existing literature on K-means enhancement approaches presents possible perspectives in the clustering analysis research domain and serves as a comprehensive source of information regarding the K-means algorithm and its variants for the research community.
|
9
|
Gene Expression Analysis through Parallel Non-Negative Matrix Factorization. COMPUTATION 2021. [DOI: 10.3390/computation9100106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Genetic expression analysis is a principal tool to explain the behavior of genes in an organism when exposed to different experimental conditions. In the state of the art, many clustering algorithms have been proposed. The amount of biological data is overwhelming, and its high-dimensional structure mostly exceeds current computational architectures. Computational time and memory consumption therefore become decisive factors in choosing clustering algorithms. We propose a clustering algorithm based on Non-negative Matrix Factorization and K-means that reduces data dimensionality while preserving the biological context and prioritizing gene selection; it is implemented within parallel GPU-based environments through the CUDA library. A well-known dataset is used in our tests and the quality of the results is measured through the Rand and Accuracy Index. The results show an acceleration of 6.22× compared to the sequential version. The algorithm is competitive in the analysis of biological datasets and is invariant with respect to the number of classes and the size of the gene expression matrix.
|
10
|
Wang X, Wang Z, Sheng M, Li Q, Sheng W. An adaptive and opposite K-means operation based memetic algorithm for data clustering. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.01.056] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 10/22/2022]
|
11
|
Sabah A, Tiun S, Sani NS, Ayob M, Taha AY. Enhancing web search result clustering model based on multiview multirepresentation consensus cluster ensemble (mmcc) approach. PLoS One 2021; 16:e0245264. [PMID: 33449949 PMCID: PMC7810326 DOI: 10.1371/journal.pone.0245264] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 09/23/2020] [Accepted: 12/26/2020] [Indexed: 11/18/2022] Open
Abstract
Existing text clustering methods utilize only one representation at a time (single view), whereas multiple views can represent documents. The multiview multirepresentation method enhances clustering quality. Moreover, existing clustering methods that utilize more than one representation at a time (multiview) use representation with the same nature. Hence, using multiple views that represent data in a different representation with clustering methods is reasonable to create a diverse set of candidate clustering solutions. On this basis, an effective dynamic clustering method must consider combining multiple views of data including semantic view, lexical view (word weighting), and topic view as well as the number of clusters. The main goal of this study is to develop a new method that can improve the performance of web search result clustering (WSRC). An enhanced multiview multirepresentation consensus clustering ensemble (MMCC) method is proposed to create a set of diverse candidate solutions and select a high-quality overlapping cluster. The overlapping clusters are obtained from the candidate solutions created by different clustering methods. The framework to develop the proposed MMCC includes numerous stages: (1) acquiring the standard datasets (MORESQUE and Open Directory Project-239), which are used to validate search result clustering algorithms, (2) preprocessing the dataset, (3) applying multiview multirepresentation clustering models, (4) using the radius-based cluster number estimation algorithm, and (5) employing the consensus clustering ensemble method. Results show an improvement in clustering methods when multiview multirepresentation is used. More importantly, the proposed MMCC model improves the overall performance of WSRC compared with all single-view clustering models.
Affiliation(s)
- Ali Sabah
- Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia
| | - Sabrina Tiun
- Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia
| | - Nor Samsiah Sani
- Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia
| | - Masri Ayob
- Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia
| | - Adil Yaseen Taha
- Center for Artificial Intelligence Technology (CAIT), Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia
| |
|
12
|
|
13
|
Xing X. Bottleneck prediction of urban road network based on improved PSO algorithms and fuzzy control. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2020. [DOI: 10.3233/jifs-189059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
A traffic bottleneck is a road section or node that often causes traffic congestion to propagate or spread through the road network, making it the source of congestion across the whole network. In this paper, the author analyzes the bottleneck prediction of urban road networks based on improved PSO algorithms and fuzzy control. The paper analyzes the factors and characteristics of the main road of the system, proposes traffic coordination control of the main road based on a delay model, runs a statistical simulation on actual traffic data, and develops a basic theory of traffic coordination control that is more effective than the traditional timing control strategy. Compared with the traditional model, the algorithm considers the waiting time at red lights at intersections. For congested road sections, it can better calculate vehicle travel time, making the results more accurate and more applicable. The results of this study can provide a strong theoretical basis and prediction scheme for traffic management and control of the road network in the target area.
Affiliation(s)
- Xue Xing
- College of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin Jilin, China
| |
|
14
|
Gong C. Analysis of athletes’ stadium stress source based on improved layered K-means algorithm. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2020. [DOI: 10.3233/jifs-189065] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/15/2022]
Abstract
Most of the research on stressors is in the medical field, and there are few analyses of athletes’ stressors, so it cannot provide a reference for the analysis of athletes’ stressors. Based on this, this study uses machine learning algorithms to analyze the sources of athletes’ stadium stress. The data are mainly obtained through questionnaire surveys and interviews and are used as experimental data after passing the test. In order to improve the performance of the algorithm, this paper combines the known K-Means algorithm with the layering algorithm to form a new improved layered K-Means algorithm. At the same time, this paper analyzes the performance of the improved hierarchical K-Means algorithm through experimental comparison and compares the clustering results. In addition, the analysis system corresponding to the algorithm is constructed based on the actual situation, the algorithm is applied in practice, and a user preference model is constructed. Finally, this article helps athletes find stressors and find ways to reduce them through personalized recommendations. The research shows that the algorithm of this study is reliable, has certain practical effects, and can provide a theoretical reference for subsequent related research.
Affiliation(s)
- Chen Gong
- School of Physical Education, Northeast Electric Power University, Jilin, China
| |
|
15
|
Ezugwu AE, Shukla AK, Agbaje MB, Oyelade ON, José-García A, Agushaka JO. Automatic clustering algorithms: a systematic review and bibliometric analysis of relevant literature. Neural Comput Appl 2020. [DOI: 10.1007/s00521-020-05395-4] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Indexed: 12/22/2022]
|
16
|
Genetic algorithm-based fuzzy clustering applied to multivariate time series. EVOLUTIONARY INTELLIGENCE 2020. [DOI: 10.1007/s12065-020-00422-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
|
17
|
Ezugwu AE. Nature-inspired metaheuristic techniques for automatic clustering: a survey and performance study. SN APPLIED SCIENCES 2020. [DOI: 10.1007/s42452-020-2073-0] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 11/28/2022] Open
|
18
|
Ezugwu AES, Agbaje MB, Aljojo N, Els R, Chiroma H, Elaziz MA. A Comparative Performance Study of Hybrid Firefly Algorithms for Automatic Data Clustering. IEEE ACCESS 2020; 8:121089-121118. [DOI: 10.1109/access.2020.3006173] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 09/02/2023]
|
19
|
A novel web page recommender using data automatic clustering and Markov process. SN APPLIED SCIENCES 2019. [DOI: 10.1007/s42452-019-1719-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 10/25/2022] Open
|
20
|
A mathematical morphology based method for hierarchical clustering analysis of spatial points on street networks. Appl Soft Comput 2019. [DOI: 10.1016/j.asoc.2019.105785] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022]
|
21
|
Fletcher S, Verma B. Pruning High-Similarity Clusters to Optimize Data Diversity when Building Ensemble Classifiers. INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE AND APPLICATIONS 2019. [DOI: 10.1142/s1469026819500275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Diversity is a key component for building a successful ensemble classifier. One approach to diversifying the base classifiers in an ensemble classifier is to diversify the data they are trained on. While sampling approaches such as bagging have been used for this task in the past, we argue that since they maintain the global distribution, they do not create diversity. Instead, we make a principled argument for the use of k-means clustering to create diversity. Expanding on previous work, we observe that when creating multiple clusterings with multiple k values, there is a risk of different clusterings discovering the same clusters, which would in turn train the same base classifiers. This would bias the ensemble voting process. We propose a new approach that uses the Jaccard Index to detect and remove similar clusters before training the base classifiers, not only saving computation time, but also reducing classification error by removing repeated votes. We empirically demonstrate the effectiveness of the proposed approach compared to the state of the art on 19 UCI benchmark datasets.
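The pruning idea in this abstract, using the Jaccard Index to drop clusters that largely repeat one another, can be sketched as follows. This is only an illustration under assumptions, not the authors' procedure: clusters are represented as sets of record ids, the similarity threshold of 0.5 is arbitrary, and the greedy keep-first order and the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index |a ∩ b| / |a ∪ b| of two collections of record ids."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def prune_similar_clusters(clusters, threshold=0.5):
    """Greedily keep a cluster only if its Jaccard index with every
    cluster already kept is below the (assumed) threshold."""
    kept = []
    for c in clusters:
        if all(jaccard(c, k) < threshold for k in kept):
            kept.append(set(c))
    return kept


# Hypothetical clusters pooled from several clusterings with different k.
clusters = [{1, 2, 3, 4}, {1, 2, 3, 4}, {1, 2, 3, 5}, {6, 7, 8}]
pruned = prune_similar_clusters(clusters)
```

Here the exact duplicate and the near-duplicate `{1, 2, 3, 5}` (Jaccard index 3/5 with the first cluster) are dropped, so only two base classifiers would be trained instead of four.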
Affiliation(s)
- Sam Fletcher
- Centre for Intelligent Systems, School of Engineering and Technology, Central Queensland University, Brisbane, QLD 4000, Australia
| | - Brijesh Verma
- Centre for Intelligent Systems, School of Engineering and Technology, Central Queensland University, Brisbane, QLD 4000, Australia
| |
|
22
|
Qaddoura R, Faris H, Aljarah I. An efficient clustering algorithm based on the k-nearest neighbors with an indexing ratio. INT J MACH LEARN CYB 2019. [DOI: 10.1007/s13042-019-01027-z] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Indexed: 11/28/2022]
|
23
|
|
24
|
Singh H, Kumar Y. Cellular Automata Based Model for E-Healthcare Data Analysis. INTERNATIONAL JOURNAL OF INFORMATION SYSTEM MODELING AND DESIGN 2019. [DOI: 10.4018/ijismd.2019070101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
E-healthcare is an active area of research, and a number of algorithms have been applied to classify healthcare data. In the healthcare field, a large amount of clinical data is generated through MRI, CT scans, and other diagnostic tools. Healthcare analytics are used to analyze the clinical data of patient records, disease diagnosis, cost, hospital management, etc. Analytical techniques and data visualization are used to obtain real-time information. Further, this information can be used for decision making and for the better treatment of patients. In this work, an improved big bang-big crunch (BB-BC) based clustering algorithm is applied to analyze healthcare data. Cluster analysis is an important task in the field of data analysis and can be used to understand the organization of data. In this work, two healthcare datasets, CMC and cancer, are used and the proposed algorithm obtains better results when compared to MEBB-BC, BB-BC, GA, PSO and K-means algorithms. The performance of the improved BB-BC algorithm is also examined against benchmark clustering datasets. The simulation results showed that the proposed algorithm improves the clustering results significantly when compared to other algorithms.
Affiliation(s)
- Hakam Singh
- Department of Computer Science and Engineering, Jaypee University of Information Technology Waknaghat, Solan, Himachal Pradesh, India
| | - Yugal Kumar
- Department of Computer Science and Engineering, Jaypee University of Information Technology Waknaghat, Solan, Himachal Pradesh, India
| |
|
25
|
Ma H, Zhou X. A GPS location data clustering approach based on a niche genetic algorithm and hybrid K-means. INTELL DATA ANAL 2019. [DOI: 10.3233/ida-192791] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/15/2022]
Affiliation(s)
- Hongjiang Ma
- School of Computer Science, Chengdu University of Information Technology, Chengdu, Sichuan 610225, China
| | - Xiangbing Zhou
- School of Information and Engineering, Sichuan Tourism University, Chengdu, Sichuan 610100, China
- School of Mathematics and Computer Science, Aba Teachers University, Wenchuan, Sichuan 623002, China
| |
|
26
|
Atasever UH. A novel unsupervised change detection approach based on reconstruction independent component analysis and ABC-Kmeans clustering for environmental monitoring. ENVIRONMENTAL MONITORING AND ASSESSMENT 2019; 191:447. [PMID: 31214850 DOI: 10.1007/s10661-019-7591-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 11/28/2018] [Accepted: 06/04/2019] [Indexed: 06/09/2023]
Abstract
In this paper, I propose a new unsupervised change detection method for optical satellite imagery. The proposed technique consists of three stages. In the first stage, difference images are calculated using four different functions, two of which are used for the first time in this study. In the second stage, using Reconstruction Independent Component Analysis, this four-difference matrix is projected to one feature. In the last stage, clustering is performed: a Kmeans technique tuned by Artificial Bee Colony (ABC-Kmeans) has been developed and proposed, following a different strategy in the clustering phase. The effectiveness of the proposed approach was examined using two different datasets, Sardinia and Mexico. Quantitative evaluation was performed in two stages. In the first stage, the proposed method was compared with different unsupervised change detection algorithms using the False Alarm, Missed Alarm, Total Error, and Total Error Rate metrics, which are calculated using the ground truth image in the dataset. In the second experimental study, the proposed approach is compared in detail with the PCA-Kmeans approach, which is quite often preferred for similar studies, using the Mean Squared Error, Peak Signal to Noise Ratio, Structural Similarity Index, and Universal Image Quality Index metrics. According to quantitative and qualitative analysis, the proposed approach can produce quite successful results using optical remote sensing data.
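The first group of metrics named in this abstract can be computed from a binary change map and a ground truth map. The sketch below assumes the definitions commonly used in change detection (a false alarm is an unchanged pixel labelled as changed, a missed alarm is a changed pixel labelled as unchanged, total error is their sum, and the rate divides by the pixel count); the paper may define or scale them differently, and the function name and flattened 0/1 inputs are hypothetical.

```python
def change_detection_errors(truth, predicted):
    """Count errors between two flattened binary maps (1 = changed).

    Assumed definitions, not taken from the paper:
      false_alarm      - truth 0, predicted 1
      missed_alarm     - truth 1, predicted 0
      total_error      - false_alarm + missed_alarm
      total_error_rate - total_error / number of pixels
    """
    if len(truth) != len(predicted):
        raise ValueError("maps must have the same number of pixels")
    false_alarm = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
    missed_alarm = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
    total_error = false_alarm + missed_alarm
    return {
        "false_alarm": false_alarm,
        "missed_alarm": missed_alarm,
        "total_error": total_error,
        "total_error_rate": total_error / len(truth),
    }


# Hypothetical six-pixel example.
truth = [0, 0, 1, 1, 0, 1]
predicted = [0, 1, 1, 0, 0, 1]
errors = change_detection_errors(truth, predicted)
```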
Affiliation(s)
- Umit Haluk Atasever
- Department of Geomatics Engineering, Engineering Faculty, Erciyes University, 38039, Kayseri, Turkey.
| |
|
27
|
Detecting and Learning Unknown Fault States by Automatically Finding the Optimal Number of Clusters for Online Bearing Fault Diagnosis. APPLIED SCIENCES-BASEL 2019. [DOI: 10.3390/app9112326] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 11/16/2022]
Abstract
This paper proposes an online fault diagnosis system for bearings that detects emerging fault modes and then updates the diagnostic system knowledge (DSK) to incorporate information about the newly detected fault modes. New fault modes are detected using k-means clustering along with a new cluster evaluation method, i.e., the multivariate probability density function’s cluster distribution factor (MPDFCDF). In the proposed model, a heterogeneous pool of features is constructed from the signal. A hybrid feature selection model is adopted for selecting the optimal features for learning the model with existing fault modes. The proposed online fault diagnosis system detects new fault modes from unknown signals using k-means clustering with the help of the proposed MPDFCDF cluster evaluation method. The DSK is updated whenever new fault modes are detected, and the updated DSK is used to classify faults using the k-nearest neighbor (k-NN) classifier. The proposed model is evaluated using acoustic emission signals acquired from low-speed rolling element bearings with different fault modes and severities under different rotational speeds. Experimental results show that the MPDFCDF cluster evaluation method can detect the optimal number of fault clusters, and that the proposed online diagnosis model can detect newly emerged faults and update the DSK effectively, which improves the diagnosis performance in terms of the average classification performance.
|
28
|
Rahman MA, Islam MZ. Application of a density based clustering technique on biomedical datasets. Appl Soft Comput 2018. [DOI: 10.1016/j.asoc.2018.09.012] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 10/28/2022]
|
29
|
Teran Hidalgo SJ, Zhu T, Wu M, Ma S. Overlapping clustering of gene expression data using penalized weighted normalized cut. Genet Epidemiol 2018; 42:796-811. [PMID: 30302823 PMCID: PMC6239939 DOI: 10.1002/gepi.22164] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 06/12/2018] [Revised: 07/24/2018] [Accepted: 08/28/2018] [Indexed: 02/06/2023]
Abstract
Clustering has been widely conducted in the analysis of gene expression data. For complex diseases, it has played an important role in identifying unknown functions of genes, serving as the basis of other analyses, and more. A common limitation of most existing clustering approaches is the assumption that genes are separated into disjoint clusters. As genes often have multiple functions and thus can belong to more than one functional cluster, the disjoint clustering results can be unsatisfactory. In addition, due to the small sample sizes of genetic profiling studies and other factors, there may not be sufficient evidence to confirm the specific functions of some genes and cluster them definitively into disjoint clusters. In this study, we develop an effective overlapping clustering approach, which takes into account the multiplicity of gene functions and the lack of certainty in practical analysis. A penalized weighted normalized cut (PWNCut) criterion is proposed based on the NCut technique and an L2 norm constraint. It outperforms multiple competitors in simulation. The analysis of The Cancer Genome Atlas (TCGA) data on breast cancer and cervical cancer leads to biologically sensible findings which differ from those using the alternatives. To facilitate implementation, we develop the function pwncut in the R package NCutYX.
Affiliation(s)
| | - Tingyu Zhu
- Department of Statistics, Xiamen University, Xiamen, China
| | - Mengyun Wu
- Department of Biostatistics, Yale University, New Haven, Connecticut.,School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China
| | - Shuangge Ma
- Department of Biostatistics, Yale University, New Haven, Connecticut
| |
|
30
|
|
31
|
|
32
|
Genetic Algorithm with an Improved Initial Population Technique for Automatic Clustering of Low-Dimensional Data. INFORMATION 2018. [DOI: 10.3390/info9040101] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022] Open
|
33
|
An Automatic K-Means Clustering Algorithm of GPS Data Combining a Novel Niche Genetic Algorithm with Noise and Density. ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION 2017. [DOI: 10.3390/ijgi6120392] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 11/16/2022]
|
34
|
Automatic data clustering using continuous action-set learning automata and its application in segmentation of images. Appl Soft Comput 2017. [DOI: 10.1016/j.asoc.2016.12.007] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Indexed: 11/15/2022]
|
35
|
Beg A, Islam MZ, Estivill-Castro V. Genetic algorithm with healthy population and multiple streams sharing information for clustering. Knowl Based Syst 2016. [DOI: 10.1016/j.knosys.2016.09.030] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 11/25/2022]
|
36
|
Adnan MN, Islam MZ. Optimizing the number of trees in a decision forest to discover a subforest with high ensemble accuracy using a genetic algorithm. Knowl Based Syst 2016. [DOI: 10.1016/j.knosys.2016.07.016] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Indexed: 10/21/2022]
|
37
|
|
38
|
Erguzel TT, Ozekes S, Sayar GH, Tan O, Tarhan N. A hybrid artificial intelligence method to classify trichotillomania and obsessive compulsive disorder. Neurocomputing 2015. [DOI: 10.1016/j.neucom.2015.02.039] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/26/2022]
|
39
|
Zhang L, Lu W, Liu X, Pedrycz W, Zhong C, Wang L. A Global Clustering Approach Using Hybrid Optimization for Incomplete Data Based on Interval Reconstruction of Missing Value. INT J INTELL SYST 2015. [DOI: 10.1002/int.21752] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 11/07/2022]
Affiliation(s)
- Liyong Zhang
- School of Control Science and Engineering; Dalian University of Technology; Dalian People's Republic of China
| | - Wei Lu
- School of Control Science and Engineering; Dalian University of Technology; Dalian People's Republic of China
| | - Xiaodong Liu
- School of Control Science and Engineering; Dalian University of Technology; Dalian People's Republic of China
| | - Witold Pedrycz
- Department of Electrical and Computer Engineering; University of Alberta; Edmonton Canada
- Department of Electrical and Computer Engineering; Faculty of Engineering, King Abdulaziz University; Jeddah Saudi Arabia
- Systems Research Institute; Polish Academy of Sciences; Warsaw Poland
| | - Chongquan Zhong
- School of Control Science and Engineering; Dalian University of Technology; Dalian People's Republic of China
| | - Lu Wang
- School of Information; Liaoning University; Shenyang People's Republic of China
| |
|