101
|
Zhao H, Mi J, Liang M. A multi-granularity information fusion method based on logistic regression model and Dempster-Shafer evidence theory and its application. INT J MACH LEARN CYB 2022. [DOI: 10.1007/s13042-022-01584-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
|
102
|
Abstract
In recent years, machine learning, especially deep learning, has developed rapidly and has shown remarkable performance in many smart grid tasks. The representational power of machine learning algorithms has greatly improved, but as model complexity increases, their interpretability deteriorates. The smart grid is critical infrastructure, so machine learning models applied to it must be interpretable in order to increase user trust and improve system reliability. Unfortunately, the black-box nature of most machine learning models remains unresolved, and many decisions of intelligent systems still lack explanation. In this paper, we elaborate on the definition, motivations, properties, and classification of interpretability. In addition, we review the relevant literature addressing interpretability for smart grid applications. Finally, we discuss future research directions for interpretable machine learning in the smart grid.
|
103
|
Ren C, Sun L, Gao Y, Yu Y. Density peaks clustering based on local fair density and fuzzy k-nearest neighbors membership allocation strategy. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2022. [DOI: 10.3233/jifs-202449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
The density peaks clustering algorithm (DPC) has attracted wide attention since it was proposed in 2014. It does not require the number of clusters to be specified in advance and needs only one parameter. However, DPC still has some disadvantages: (1) Choosing a suitable method for calculating the local density requires repeated experiments because the scale of the dataset varies, which adds time cost. (2) An optimal cutoff distance threshold is difficult to find, since different parameter values not only affect the selection of cluster centers but also directly affect the quality of the clusters. (3) The allocation strategy has poor fault tolerance, especially on manifold datasets or datasets with uneven density distribution. To address these problems, this paper proposes a density peaks clustering algorithm based on local fair density and a fuzzy k-nearest neighbors membership allocation strategy (LF-DPC). First, to obtain a more balanced local density, the algorithm combines two classic local density calculation methods and computes the local fair density through an optimization function that minimizes the local density difference. Second, a robust two-stage allocation strategy for the remaining points is designed. In the first stage, k-nearest neighbors are used to quickly and accurately allocate points starting from the cluster centers. In the second stage, to further improve allocation accuracy, a fuzzy k-nearest neighbors membership method is designed to allocate the remaining points. Finally, the LF-DPC algorithm is evaluated on several synthetic and real-world datasets. The results show that the proposed algorithm has clear advantages over five other algorithms.
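For orientation, the two quantities that baseline DPC relies on can be sketched as follows. This is a minimal illustration of the original 2014 formulation with a cutoff-kernel density, not the LF-DPC method of this abstract; the function name `density_peaks`, the toy points, and the cutoff value are assumptions chosen for the example.

```python
import math

def density_peaks(points, d_c):
    """Return (rho, delta) per point, following the baseline DPC definitions."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Cutoff-kernel local density: number of other points closer than d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    delta = []
    for i in range(n):
        # Distance to the nearest point of strictly higher density.
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # Convention for a highest-density point: use its largest distance.
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta

# Two small groups, each with one clearly densest point (hypothetical data).
points = [(0.0, 0.0), (0.1, 0.0), (-0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (4.9, 5.0), (5.0, 5.1), (5.0, 4.9)]
rho, delta = density_peaks(points, d_c=0.12)

# Candidate centers are points where both rho and delta are large.
scores = [r * d for r, d in zip(rho, delta)]
centers = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)[:2]
```

The dependence of `rho` on the single cutoff `d_c` is the sensitivity that disadvantage (2) in the abstract refers to.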
Affiliation(s)
- Chunhua Ren
  - School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, China
  - Department of Artificial Intelligence and Big Data, Yibin University, Yibin, China
  - Sichuan Provincial Key Laboratory of Manufacturing Industry Chains Collaboration and Information Support Technology, Southwest Jiaotong University, Chengdu, China
- Linfu Sun
  - School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, China
  - Sichuan Provincial Key Laboratory of Manufacturing Industry Chains Collaboration and Information Support Technology, Southwest Jiaotong University, Chengdu, China
- Yunhui Gao
  - School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, China
  - Sichuan Provincial Key Laboratory of Manufacturing Industry Chains Collaboration and Information Support Technology, Southwest Jiaotong University, Chengdu, China
- Yang Yu
  - School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, China
  - Sichuan Provincial Key Laboratory of Manufacturing Industry Chains Collaboration and Information Support Technology, Southwest Jiaotong University, Chengdu, China
|
104
|
Li T, Rezaeipanah A, Tag El Din EM. An ensemble agglomerative hierarchical clustering algorithm based on clusters clustering technique and the novel similarity measurement. JOURNAL OF KING SAUD UNIVERSITY - COMPUTER AND INFORMATION SCIENCES 2022. [DOI: 10.1016/j.jksuci.2022.04.010] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/25/2022]
|
105
|
Chakraborty S, Das S. Detecting Meaningful Clusters From High-Dimensional Data: A Strongly Consistent Sparse Center-Based Clustering Approach. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2022; 44:2894-2908. [PMID: 33360985 DOI: 10.1109/tpami.2020.3047489] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 06/12/2023]
Abstract
In the context of high-dimensional clustering, feature weighting has gained considerable importance over the years as a way to capture the relative importance of different features in revealing the cluster structure of a dataset. However, the popular techniques in this area either fail to perform feature selection or do not preserve the simplicity of Lloyd's heuristic for solving the k-means problem and the like. In this paper, we propose a Lasso Weighted k-means (LW-k-means) algorithm as a simple yet efficient sparse clustering procedure for high-dimensional data where the number of features (p) can be much higher than the number of observations (n). The LW-k-means method imposes an l1 regularization term directly on the feature weights to induce feature selection in a sparse clustering framework. We develop a simple block-coordinate descent type algorithm, with time complexity resembling that of Lloyd's method, to optimize the proposed objective. In addition, we establish the strong consistency of the LW-k-means procedure. Such an analysis of the large-sample properties is, in general, not available for conventional sparse k-means algorithms. LW-k-means is tested on a number of synthetic and real-life datasets, and a detailed experimental analysis shows that the method is highly competitive against the baselines as well as the state-of-the-art procedures for center-based high-dimensional clustering, in terms of both clustering accuracy and computational time.
|
106
|
Yang L, Fan W, Bouguila N. Robust unsupervised image categorization based on variational autoencoder with disentangled latent representations. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
|
107
|
Zheng Y, Xie Y, Lee I, Dehghanian A, Serban N. Parallel Subgradient Algorithm with Block Dual Decomposition for Large-scale Optimization. EUROPEAN JOURNAL OF OPERATIONAL RESEARCH 2022; 299:60-74. [PMID: 35035056 PMCID: PMC8754397 DOI: 10.1016/j.ejor.2021.11.054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/18/2023]
Abstract
This paper studies computational approaches for solving large-scale optimization problems using a Lagrangian dual reformulation, solved by parallel sub-gradient methods. Since there are many possible reformulations for a given problem, an important question is: Which reformulation leads to the fastest solution time? One approach is to detect a block diagonal structure in the constraint matrix and reformulate the problem by dualizing the constraints outside of the blocks; this approach is defined herein as block dual decomposition. The main advantage of such a reformulation is that the Lagrangian relaxation has a block diagonal constraint matrix and is thus decomposable into smaller sub-problems that can be solved in parallel. We show that the block decomposition can critically affect the convergence rate of the sub-gradient method. We propose various decomposition methods that use domain knowledge, or apply algorithms using knowledge about the structure of the constraint matrix or the dependence among the decision variables, to reduce the computational effort of solving large-scale optimization problems. In particular, we introduce a block decomposition approach that reduces the number of dualized constraints by utilizing a community detection algorithm. We present empirical experiments on an extensive set of problem instances, including a real application. We illustrate that as the number of dualized constraints in the decomposition increases, the computational effort within each iteration of the sub-gradient method decreases while the number of iterations required for convergence increases. The key message is that it is crucial to employ prior knowledge about the structure of the problem when solving large-scale optimization problems using dual decomposition.
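The general mechanism of dualizing a coupling constraint and running a subgradient method on the multiplier can be sketched on a toy problem. The example below is an illustration under stated assumptions, not the authors' algorithm: the problem (a single coupling constraint over box-constrained variables), the function name `dual_subgradient`, and the 1/k step size are all chosen for the sketch.

```python
def dual_subgradient(costs, demand, iterations=50):
    """Maximise the Lagrangian dual of
         min sum_i c_i x_i   s.t.  sum_i x_i >= demand,  0 <= x_i <= 1
    by projected subgradient ascent on the single multiplier lam >= 0."""
    lam, best = 0.0, float("-inf")
    for k in range(1, iterations + 1):
        # With the coupling constraint dualized, each x_i is its own
        # sub-problem (these could run in parallel):
        # minimise (c_i - lam) * x_i over [0, 1].
        x = [1.0 if c - lam < 0 else 0.0 for c in costs]
        dual_value = sum((c - lam) * xi for c, xi in zip(costs, x)) + lam * demand
        best = max(best, dual_value)
        # A subgradient of the dual function at lam is the constraint violation.
        g = demand - sum(x)
        # Diminishing step 1/k, projected back onto lam >= 0.
        lam = max(0.0, lam + g / k)
    return best, lam

best_dual, lam = dual_subgradient([1.0, 2.0, 3.0], demand=2.0)
```

For this instance the primal optimum is 3 (take the two cheapest variables), and by weak duality `best_dual` can never exceed it. The trade-off described in the abstract shows up even here: dualizing more constraints makes each sub-problem cheaper but adds multipliers that the outer loop must converge on.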
Affiliation(s)
- Yuchen Zheng
  - H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, 755 Ferst Dr. NW Atlanta, GA 30332
- Yujia Xie
  - H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, 755 Ferst Dr. NW Atlanta, GA 30332
- Ilbin Lee
  - Alberta School of Business, University of Alberta, 2-29B Business Building, Edmonton, Alberta T6G 2R6, Canada
- Amin Dehghanian
  - H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, 755 Ferst Dr. NW Atlanta, GA 30332
- Nicoleta Serban
  - H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, 755 Ferst Dr. NW Atlanta, GA 30332
|
108
|
Uddin J, Ghazali R, H. Abawajy J, Shah H, Husaini NA, Zeb A. Rough set based information theoretic approach for clustering uncertain categorical data. PLoS One 2022; 17:e0265190. [PMID: 35559954 PMCID: PMC9106167 DOI: 10.1371/journal.pone.0265190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2021] [Accepted: 02/27/2022] [Indexed: 12/02/2022] Open
Abstract
Motivation: Many real applications, such as those in business and health, generate large categorical datasets with uncertainty. A fundamental task is to efficiently discover hidden and non-trivial patterns in such large uncertain categorical datasets. Since the exact value of an attribute is often unknown in uncertain categorical datasets, conventional clustering algorithms do not provide a suitable means of dealing with categorical data, uncertainty, and stability.
Problem statement: Decision making in the presence of vagueness and uncertainty in data can be handled using Rough Set Theory. Although recent categorical clustering techniques based on Rough Set Theory help, they suffer from low accuracy, high computational complexity, and poor generalizability, especially on datasets where they sometimes fail to select, or can hardly select, their best clustering attribute.
Objectives: The main objective of this research is to propose a new information-theoretic Rough Purity Approach (RPA). Another objective is to address the problems of traditional Rough Set Theory based categorical clustering techniques. The ultimate goal is to cluster uncertain categorical datasets efficiently in terms of performance, generalizability, and computational complexity.
Methods: The RPA takes into consideration the information-theoretic attribute purity of categorical-valued information systems. Extensive experiments are conducted to evaluate the efficiency of RPA using a real Supplier Base Management (SBM) dataset and six benchmark UCI datasets. The proposed RPA is also compared with several recent categorical data clustering techniques.
Results: The experimental results show that RPA outperforms the baseline algorithms. The significant percentage improvements with respect to time (66.70%), iterations (83.13%), purity (10.53%), entropy (14%), and accuracy (12.15%), as well as the Rough Accuracy of clusters, show that RPA is suitable for practical usage.
Conclusion: We conclude that, compared to other techniques, the attribute purity of categorical-valued information systems can better cluster the data. Hence, the RPA technique can be recommended for large-scale clustering in multiple domains, and its performance can be enhanced in further research.
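Purity is one of the evaluation measures named in the results. A minimal sketch of the commonly used definition of cluster purity is shown below; whether the paper uses exactly this definition is an assumption, and the function name `purity` and the toy labels are hypothetical.

```python
from collections import Counter

def purity(labels_true, labels_pred):
    """Fraction of objects that belong to the majority class of their cluster."""
    clusters = {}
    for truth, cluster in zip(labels_true, labels_pred):
        clusters.setdefault(cluster, []).append(truth)
    # For each cluster, count the objects carrying its most frequent true label.
    majority = sum(Counter(members).most_common(1)[0][1]
                   for members in clusters.values())
    return majority / len(labels_true)

labels_true = ["a", "a", "a", "b", "b", "c"]
labels_pred = [0, 0, 1, 1, 1, 1]
score = purity(labels_true, labels_pred)  # cluster 0 contributes 2, cluster 1 contributes 2
```

A purity of 1.0 means every cluster contains objects of a single class; the measure does not penalize having many small clusters, which is why it is usually reported alongside other metrics such as entropy and accuracy.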
Affiliation(s)
- Jamal Uddin
  - Qurtuba University of Science & IT, Peshawar, Pakistan
- Rozaida Ghazali
  - Universiti Tun Hussein Onn Malaysia, Batu Pahat, Johor, Malaysia
- Asim Zeb
  - Abbottabad University of Science & Technology, Abbottabad, Pakistan
|
109
|
Malandrino D, De Prisco R, Ianulardo M, Zaccagnino R. An adaptive meta-heuristic for music plagiarism detection based on text similarity and clustering. Data Min Knowl Discov 2022. [DOI: 10.1007/s10618-022-00835-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
Abstract
Plagiarism is a controversial and debated topic in different fields, especially in music, where the commercial market generates a huge amount of money. The lack of objective metrics for deciding whether a song is a plagiarism makes music plagiarism detection a very complex task: decisions often have to be based on subjective arguments. Automated music analysis methods that identify music similarities can help. In this work, we first propose two novel methods of this kind: a text similarity-based method and a clustering-based method. Then, we show how to combine them to get an improved (hybrid) method. The result is a novel adaptive meta-heuristic for music plagiarism detection. To assess the effectiveness of the proposed methods, considered both individually and in the combined meta-heuristic, we performed tests on a large dataset of ascertained plagiarism and non-plagiarism cases. Results show that the meta-heuristic outperforms existing methods. Finally, we deployed the meta-heuristic in a tool, accessible as a Web application, and assessed the effectiveness, usefulness, and overall user acceptance of the tool by means of a study involving 20 people divided into two groups, one of which had access to the tool. The study consisted of having people decide which pairs of songs, in a predefined set of pairs, should be considered plagiarism and which should not. The study shows that the group supported by our tool successfully identified all plagiarism cases, performing all tasks with no errors. The whole sample agreed on the usefulness of an automatic tool that provides a measure of similarity between two songs.
|
110
|
FCM Clustering Approach Optimization Using Parallel High-Speed Intel FPGA Technology. JOURNAL OF ELECTRICAL AND COMPUTER ENGINEERING 2022. [DOI: 10.1155/2022/8260283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Fuzzy C-Means (FCM) is a widely used clustering algorithm that performs well in various scientific applications. Implementing FCM involves a massive number of computations, and many parallelization techniques based on GPUs and multicore systems have been suggested. In this study, we present a method for optimizing the FCM algorithm for high-speed field-programmable gate array (FPGA) technology using a high-level C-like programming language, the Open Computing Language (OpenCL). The method was designed to enable the high-level compiler/synthesis tool to exploit a task-parallelism model and create an efficient design. Our experimental results (based on several datasets) show that the proposed method makes FCM execution more than 186 times faster than the conventional design running on a single-core CPU platform. Its processing throughput also reached 89 giga floating-point operations per second (GFLOPS).
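To show where the computational load comes from, one iteration of the standard FCM update rules can be sketched in plain Python. This is a sequential reference sketch of textbook FCM, not the OpenCL/FPGA design of the paper; the function name `fcm_step`, the fuzzifier value m = 2, and the toy data are assumptions.

```python
import math

def fcm_step(points, centers, m=2.0):
    """One FCM iteration: update memberships, then recompute the centers."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        # Distance from this point to every center (guarded against zero).
        d = [max(math.dist(p, c), 1e-12) for c in centers]
        # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)); each row sums to 1.
        row = [1.0 / sum((d[i] / d[k]) ** exponent for k in range(len(centers)))
               for i in range(len(centers))]
        memberships.append(row)
    dim = len(points[0])
    new_centers = []
    for i in range(len(centers)):
        # Each center is the mean of all points weighted by u^m.
        w = [u[i] ** m for u in memberships]
        total = sum(w)
        new_centers.append(tuple(
            sum(wj * p[a] for wj, p in zip(w, points)) / total
            for a in range(dim)))
    return memberships, new_centers

points = [(0.0, 0.0), (0.2, 0.1), (4.0, 4.0), (4.1, 3.9)]
centers = [(1.0, 1.0), (3.0, 3.0)]
for _ in range(20):
    memberships, centers = fcm_step(points, centers)
```

Every point-to-center distance and every membership row is independent of the others within an iteration, which is the kind of data parallelism that GPU, multicore, and FPGA implementations exploit.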
|
111
|
Tian N, Liu Y, Sun Z, Liu X. Clustering- and Transformer-Based Networks for the Style Analysis of Logo Images. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:2090712. [PMID: 35586108 PMCID: PMC9110149 DOI: 10.1155/2022/2090712] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2021] [Revised: 04/03/2022] [Accepted: 04/19/2022] [Indexed: 11/26/2022]
Abstract
In the design field, designers need to investigate and collect logo materials before designing logos, searching through large numbers of design materials on well-known logo websites to find logos with similar styles as reference images. However, this manual work is time-consuming and labor-intensive. To solve this problem, we propose a clustering method that uses K-Means clustering and a visual transformer model to group the styles in a logo database. Specifically, we use the visual transformer model as a feature extractor to convert logo images into feature vectors and perform K-Means clustering, use the clustering results as pseudo-labels to further train the feature extractor, and iterate this process to obtain reliable clustering results. We validate our approach by creating the logo image dataset JN Logo, a proposed database for image quality and style attributes containing 14922 logo design images. Our proposed deep transformer-based cluster (DTCluster) automatic style grouping method is applied to JN Logo; the DBI reaches 0.904 and the DI reaches 0.189, which are better than those of other K-Means clustering methods and other clustering algorithms. We perform a subjective analysis of five features of the clustering results to obtain a semantic description of the clusters. Finally, we provide six styles and five semantic descriptions for the logo database.
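The abstract reports a DBI value without expanding the acronym. Assuming DBI refers to the Davies-Bouldin index (lower is better), a minimal sketch of its usual definition is given below; the function name `davies_bouldin` and the toy points are hypothetical, and the code is not taken from the paper.

```python
import math

def davies_bouldin(points, labels):
    """Davies-Bouldin index: mean over clusters of the worst-case ratio
    (scatter_i + scatter_j) / distance(centroid_i, centroid_j)."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    centroids, scatter = {}, {}
    for label, members in clusters.items():
        dim = len(members[0])
        c = tuple(sum(p[a] for p in members) / len(members) for a in range(dim))
        centroids[label] = c
        # Scatter: average distance of the members to their centroid.
        scatter[label] = sum(math.dist(p, c) for p in members) / len(members)
    keys = list(clusters)
    worst = [max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
                 for j in keys if j != i)
             for i in keys]
    return sum(worst) / len(worst)

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = davies_bouldin(points, [0, 0, 1, 1])  # compact, well-separated clusters
bad = davies_bouldin(points, [0, 1, 0, 1])   # stretched, overlapping clusters
```

In the toy data the well-separated labeling scores 0.2 and the poor labeling scores 5.0, which illustrates why a smaller value indicates a better grouping.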
Affiliation(s)
- Nannan Tian
  - School of Design, Jiangnan University, Wuxi 214122, China
- Yuan Liu
  - School of Design, Jiangnan University, Wuxi 214122, China
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China
- Ziruo Sun
  - School of Software Engineering, Shandong University, Jinan 250101, China
- Xingbo Liu
  - School of Computer Science and Technology, Shandong Jianzhu University, Jinan 250101, China
|
112
|
Zhang ZL, Chen C, Qu SY, Ding Q, Xu Q. Unexpected Dynamic Binding May Rescue the Binding Affinity of Rivaroxaban in a Mutant of Coagulation Factor X. Front Mol Biosci 2022; 9:877170. [PMID: 35601826 PMCID: PMC9117642 DOI: 10.3389/fmolb.2022.877170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2022] [Accepted: 04/06/2022] [Indexed: 11/13/2022] Open
Abstract
A novel coagulation factor X (FX) Tyr319Cys mutation (Y99C in chymotrypsin numbering) was identified in a patient with severe bleeding. Unlike the previously reported Y99A mutant, this mutant can bind and cleave its specific chromogenic substrate at a normal level, suggesting an intact binding pocket. Here, using molecular dynamics simulations and MM-PBSA calculations on an FX-rivaroxaban (RIV) complex, we confirmed at the molecular level that RIV binds much more strongly in Y99C than in Y99A, a result that is in fact the average over multiple binding poses in the dynamics. Detailed structural analyses also indicated the moderate flexibility of the 99-loop and the importance of the flexible side chain of Trp215 in the different binding poses. This case again emphasizes that ligand binding may be not only a dynamic process but also a dynamic state, which is often neglected in drug design and screening based on static X-ray structures. In addition, the computational results lend some support to our hypothesis that the activated Tyr319Cys FX (Y99C FXa), with its impaired procoagulant function, can bind inhibitors of FXa and could be developed into a potential reversal agent for novel oral anticoagulants (NOAC).
Affiliation(s)
- Zhi-Li Zhang
  - State Key Laboratory of Microbial Metabolism & Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Changming Chen
  - Department of Laboratory Medicine, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Si-Ying Qu
  - State Key Laboratory of Microbial Metabolism & Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Qiulan Ding
  - Department of Laboratory Medicine, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
  - Collaborative Innovation Center of Hematology, Shanghai Jiao Tong University School of Medicine, Shanghai, China
  - *Correspondence: Qiulan Ding; Qin Xu
- Qin Xu
  - State Key Laboratory of Microbial Metabolism & Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
  - *Correspondence: Qiulan Ding; Qin Xu
|
113
|
Fang K, Chen Y, Ma S, Zhang Q. Biclustering analysis of functionals via penalized fusion. J MULTIVARIATE ANAL 2022; 189:104874. [PMID: 36817965 PMCID: PMC9937451 DOI: 10.1016/j.jmva.2021.104874] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 02/07/2023]
Abstract
In biomedical data analysis, clustering is commonly conducted. Biclustering analysis conducts clustering in both the sample and covariate dimensions and can describe data heterogeneity more comprehensively. Most existing biclustering analyses consider scalar measurements. In this study, motivated by time-course gene expression data and other examples, we take the "natural next step" and consider the biclustering analysis of functionals, in which, for each covariate of each sample, a function (to be exact, its values at discrete measurement points) is present. We develop a doubly penalized fusion approach, which includes a smoothness penalty for estimating functionals and, more importantly, a fusion penalty for clustering. Statistical properties are rigorously established, giving the proposed approach a solid theoretical grounding. We also develop an effective ADMM algorithm and accompanying R code. Numerical analysis, including simulations, comparisons, and the analysis of two time-course gene expression datasets, demonstrates the practical effectiveness of the proposed approach.
Affiliation(s)
- Kuangnan Fang
  - Department of Statistics and Data Science, School of Economics, Xiamen University, China
- Yuanxing Chen
  - Department of Statistics and Data Science, School of Economics, Xiamen University, China
- Shuangge Ma
  - Department of Biostatistics, Yale University, United States of America
- Qingzhao Zhang
  - MOE Key Laboratory of Econometrics, Department of Statistics and Data Science, School of Economics, Wang Yanan Institute for Studies in Economics, and Fujian Key Lab of Statistics, Xiamen University, China
  - Corresponding author (Q. Zhang)
|
114
|
Khan A, Saha G, Pal RK. Controlling the Effects of External Perturbations on a Gene Regulatory Network Using Proportional-Integral-Derivative Controller. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:1531-1544. [PMID: 33206608 DOI: 10.1109/tcbb.2020.3039038] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/11/2023]
Abstract
Gene regulatory networks are biologically robust, which makes living systems resilient against most external perturbations affecting them. However, this robustness has a limit, and disturbances beyond it can impart unwanted signalling on one or more master regulators in a network. Certain disturbances may affect the functioning of other constituent genes of the same network. In most cases, this phenomenon has some effect on the functioning of the living organism. In this investigation, we propose a methodology to mitigate the effects of external perturbations on a genetic network using a proportional-integral-derivative controller. The proposed controller is used to perturb one or more of the other, unaffected master regulators such that the most affected gene(s) of the network revert to their normal state. The only condition required for this type of manoeuvring is that the network has multiple master regulators. The proposed technique was tested on a 10-gene DREAM4 benchmark network and also on a larger 20-gene network, where only downregulation was considered due to data constraints. Simulation results indicate that the most vulnerable genes can be reverted to their normal expression levels in 10 out of the 16 simulations performed.
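The control law itself can be illustrated with a generic discrete-time PID loop. The sketch below is an assumption-laden toy, not the paper's gene network model: the class `PID`, the gains, the time step, and the first-order "expression level" plant are all hypothetical and serve only to show how the three terms combine to drive a perturbed quantity back to its normal level.

```python
class PID:
    """Textbook discrete-time PID controller."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt                   # accumulated error
        derivative = (error - self.prev_error) / self.dt   # rate of change of error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Hypothetical plant: a level that relaxes towards the control input,
# dx/dt = -x + u, integrated with forward Euler.
target = 1.0   # normal expression level (arbitrary units)
level = 0.2    # level after an external perturbation
dt = 0.1
pid = PID(kp=2.0, ki=1.0, kd=0.05, dt=dt)
for _ in range(300):
    u = pid.update(target, level)
    level += dt * (u - level)
```

The integral term is what removes the steady-state offset here: a purely proportional controller on this plant would settle below the target.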
|
115
|
Raj A, Bhattacharyya P, Gupta GR. Clusters of COVID-19 Indicators in India: Characterization, Correspondence and Change Analysis. SN COMPUTER SCIENCE 2022; 3:210. [PMID: 35400015 PMCID: PMC8981186 DOI: 10.1007/s42979-022-01083-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2021] [Accepted: 03/04/2022] [Indexed: 11/26/2022]
Abstract
We conduct a long-term epidemiological study of COVID-19 in India from Mar 2020 to May 2021 using a number of indicators, such as active cases, daily new cases, and deaths, at a micro level (district level, per capita) and a macro level (state level). Our automated shape-based cluster discovery of the per capita daily new cases (case rate) during the first wave in India (between Mar 2020 and Jan 2021) revealed four distinct shape patterns: sharp rise and decline, steady rise and decline, plateau, and multiple relatively high peaks. These clusters exhibit a strong geographical correlation. To determine the correspondence between clusters obtained from different indicators, we design a novel metric for determining edge weights in their intersection graph. This is used for comparative analysis and to develop informative hierarchical cartographic visualizations. We then perform dynamic cluster analysis for different time windows to answer some pertinent questions. Is the second wave similar to or different from the first wave? How has the relative ranking (on micro- and macro-level indicators) of the states varied over the last year? How much were medical resources stressed during the peak? We demonstrate that, using multiple indicators, we can assess the impact of the epidemic holistically in a particular geography. Our analysis techniques and the insights obtained can help local and state governments monitor and manage the COVID-19 situation and fine-tune the ongoing vaccination drive in India.
|
116
|
Ditton E, Swinbourne A, Myers T. Selecting a clustering algorithm: A semi-automated hyperparameter tuning framework for effective persona development. ARRAY 2022. [DOI: 10.1016/j.array.2022.100186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open
|
117
|
Liu Q, Jin Y, Heiderich M, Rodemann T, Yu G. An Adaptive Reference Vector-Guided Evolutionary Algorithm Using Growing Neural Gas for Many-Objective Optimization of Irregular Problems. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:2698-2711. [PMID: 33001813 DOI: 10.1109/tcyb.2020.3020630] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/11/2023]
Abstract
Most reference vector-based decomposition algorithms for solving multiobjective optimization problems may not be well suited for solving problems with irregular Pareto fronts (PFs) because the distribution of predefined reference vectors may not match well with the distribution of the Pareto-optimal solutions. Thus, the adaptation of the reference vectors is an intuitive way for decomposition-based algorithms to deal with irregular PFs. However, most existing methods frequently change the reference vectors based on the activeness of the reference vectors within specific generations, slowing down the convergence of the search process. To address this issue, we propose a new method to learn the distribution of the reference vectors using the growing neural gas (GNG) network to achieve automatic yet stable adaptation. To this end, an improved GNG is designed for learning the topology of the PFs with the solutions generated during a period of the search process as the training data. We use the individuals in the current population as well as those in previous generations to train the GNG to strike a balance between exploration and exploitation. Comparative studies conducted on popular benchmark problems and a real-world hybrid vehicle controller design problem with complex and irregular PFs show that the proposed method is very competitive.
|
118
|
Shraga R, Gal A, Schumacher D, Senderovich A, Weidlich M. Process discovery with context-aware process trees. INFORM SYST 2022. [DOI: 10.1016/j.is.2020.101533] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/05/2022]
|
119
|
Ariza-Colpas PP, Vicario E, Oviedo-Carrascal AI, Butt Aziz S, Piñeres-Melo MA, Quintero-Linero A, Patara F. Human Activity Recognition Data Analysis: History, Evolutions, and New Trends. SENSORS 2022; 22:s22093401. [PMID: 35591091 PMCID: PMC9103712 DOI: 10.3390/s22093401] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/31/2021] [Revised: 03/31/2022] [Accepted: 04/04/2022] [Indexed: 01/23/2023]
Abstract
The Ambient Assisted Living (AAL) research area focuses on generating innovative technology, products, and services that provide assistance, medical care, and rehabilitation to older adults, in order to extend the time in which these people can live independently, whether they suffer from neurodegenerative diseases or some disability. This important area is responsible for the development of activity recognition systems (ARS), a valuable tool for identifying the type of activity carried out by older adults so that they can be given assistance that allows them to carry out their daily activities normally. This article reviews the literature and the evolution of the different techniques for processing this type of data, covering supervised, unsupervised, ensemble, deep, reinforcement, and transfer learning as well as metaheuristic approaches applied to this sector of health science, and reports the metrics of recent experiments for researchers in this area of knowledge. The review identifies models based on reinforcement or transfer learning as a promising line of work for the processing and analysis of human activity recognition data.
Collapse
Affiliation(s)
- Paola Patricia Ariza-Colpas
  - Department of Computer Science and Electronics, Universidad de la Costa CUC, Barranquilla 080002, Colombia
  - Faculty of Engineering in Information and Communication Technologies, Universidad Pontificia Bolivariana, Medellín 050031, Colombia
- Enrico Vicario
  - Department of Information Engineering, University of Florence, 50139 Firenze, Italy
- Ana Isabel Oviedo-Carrascal
  - Faculty of Engineering in Information and Communication Technologies, Universidad Pontificia Bolivariana, Medellín 050031, Colombia
- Shariq Butt Aziz
  - Department of Computer Science and IT, University of Lahore, Lahore 44000, Pakistan
- Fulvio Patara
  - Department of Information Engineering, University of Florence, 50139 Firenze, Italy
|
120
|
On Information Granulation via Data Clustering for Granular Computing-Based Pattern Recognition: A Graph Embedding Case Study. ALGORITHMS 2022. [DOI: 10.3390/a15050148] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/04/2022]
Abstract
Granular Computing is a powerful information processing paradigm, particularly useful for the synthesis of pattern recognition systems in structured domains (e.g., graphs or sequences). According to this paradigm, granules of information play the pivotal role of describing the underlying (possibly complex) process, starting from the available data. From a pattern recognition viewpoint, granules of information can be exploited for the synthesis of semantically sound embedding spaces, where common supervised or unsupervised problems can be solved via standard machine learning algorithms. In this work, we compare different strategies for the automatic synthesis of information granules in the context of graph classification. These strategies differ mainly in the specific topology adopted for the subgraphs considered as candidate information granules and in whether the ground-truth class labels are used or neglected in the granulation process. Computational results on 10 different open-access datasets show that class-aware granulation tends to improve performance (regardless of the topology of the information granules), at the cost of a possibly higher number of information granules.
|
121
|
Suleman Basha M, Mouleeswaran SK, Rajendra Prasad K. Hybrid visual computing models to discover the clusters assessment of high dimensional big data. Soft comput 2022. [DOI: 10.1007/s00500-022-07092-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/18/2022]
|
122
|
Yuan S, Zhao H, Liu J, Song B. Self-organizing map based differential evolution with dynamic selection strategy for multimodal optimization problems. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2022; 19:5968-5997. [PMID: 35603387 DOI: 10.3934/mbe.2022279] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/15/2023]
Abstract
Many real-world problems can be classified as multimodal optimization problems (MMOPs), which require locating as many global optima as possible and refining the accuracy of the optima found as far as possible. When dealing with MMOPs, how to divide the population and obtain effective niches is key to balancing population diversity and convergence during evolution. In this paper, a self-organizing map (SOM) based differential evolution with dynamic selection strategy (SOMDE-DS) is proposed to improve the performance of differential evolution (DE) in solving MMOPs. Firstly, a SOM-based method is introduced as a niching technique to divide the population reasonably by using the similarity information among different individuals. Secondly, a variable neighborhood search (VNS) strategy is proposed to locate more possible optimal regions by expanding the search space. Thirdly, a dynamic selection (DS) strategy is designed to balance exploration and exploitation of the population by taking advantage of both local and global search strategies. The proposed SOMDE-DS is compared with several widely used multimodal optimization algorithms on the CEC'2013 benchmark. The experimental results show that SOMDE-DS is superior to or competitive with the compared algorithms.
Affiliation(s)
- Shihao Yuan
  - Xidian University, Guangzhou Institute of Technology, Guangzhou 510555, China
- Hong Zhao
  - Xidian University, Guangzhou Institute of Technology, Guangzhou 510555, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
- Jing Liu
  - Xidian University, Guangzhou Institute of Technology, Guangzhou 510555, China
- Binjie Song
  - South China University of Technology, the School of Computer Science and Engineering, Guangzhou 510006, China
|
123
|
Jiménez P, Roldán JC, Corchuelo R. On exploring data lakes by finding compact, isolated clusters. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2021.12.045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
|
124
|
Dynamic image clustering from projected coordinates of deep similarity learning. Neural Netw 2022; 152:1-16. [DOI: 10.1016/j.neunet.2022.03.030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2021] [Revised: 02/18/2022] [Accepted: 03/24/2022] [Indexed: 11/23/2022]
|
125
|
Trabelsi I, Hérault R, Baillet H, Thouvarecq R, Seifert L, Gasso G. Identifying patterns in trunk/head/elbow changes of riders and non-riders: A cluster analysis approach. Comput Biol Med 2022; 143:105193. [PMID: 35123140 DOI: 10.1016/j.compbiomed.2021.105193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2021] [Revised: 12/28/2021] [Accepted: 12/28/2021] [Indexed: 11/03/2022]
Abstract
Correct rider oscillation and position are the basis of good horseback riding performance. In this paper, we propose a framework for the automatic analysis of athletes' behaviour based on cluster analysis. Two groups of athletes (riders vs non-riders) were assigned to a horseback riding simulator exercise. The participants exercised at four different incremental horse oscillation frequencies. This paper studies postural coordination by computing the discrete relative phases of head-horse, elbow-horse and trunk-horse oscillations. Two clustering algorithms are then applied to automatically identify changes in rider and non-rider behaviour in terms of postural coordination. The results showed that postural coordination was influenced by the level of rider expertise. More diverse behaviour was observed for non-riders. In contrast, riders produced lower postural displacements and deployed more efficient postural control. The postural coordination of both groups was also influenced by the oscillation frequencies.
Affiliation(s)
- Imen Trabelsi
  - Normandie Univ., UNIROUEN, UNIHAVRE, INSA Rouen, LITIS, France
- Romain Hérault
  - Normandie Univ., UNIROUEN, UNIHAVRE, INSA Rouen, LITIS, France
- Héloise Baillet
  - CETAPS Laboratory, Faculty of Sports Sciences, University of Rouen, Normandie Universite, France
- Régis Thouvarecq
  - CETAPS Laboratory, Faculty of Sports Sciences, University of Rouen, Normandie Universite, France
- Ludovic Seifert
  - CETAPS Laboratory, Faculty of Sports Sciences, University of Rouen, Normandie Universite, France
- Gilles Gasso
  - Normandie Univ., UNIROUEN, UNIHAVRE, INSA Rouen, LITIS, France
|
126
|
On efficient model selection for sparse hard and fuzzy center-based clustering algorithms. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2021.12.070] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/15/2022]
|
127
|
Martín-del-Campo-Rodríguez C, Sidorov G, Batyrshin I. Unsupervised authorship attribution using feature selection and weighted cosine similarity. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2022. [DOI: 10.3233/jifs-219226] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
This paper presents a computational model for the unsupervised authorship attribution task based on a traditional machine learning scheme. An improvement over the state of the art is achieved by comparing different feature selection methods on the PAN17 author clustering dataset. To achieve this improvement, specific pre-processing and feature extraction methods are proposed, such as a method that separates tokens by type so that each is assigned to only one category. Similarly, special characters are treated as punctuation marks to improve the results obtained with typed character n-grams. The weighted cosine similarity measure is applied to improve the B3 F-score by reducing the vector values where attributes are exclusive. This measure is used to define distances between documents, which are later used by the clustering algorithm to perform authorship attribution.
Affiliation(s)
- Carolina Martín-del-Campo-Rodríguez
  - Instituto Politécnico Nacional, Centro de Investigación en Computación, Av. Juan de Dios Bátiz, s/n, Col. Nueva Industrial Vallejo, Mexico City, Mexico
- Grigori Sidorov
  - Instituto Politécnico Nacional, Centro de Investigación en Computación, Av. Juan de Dios Bátiz, s/n, Col. Nueva Industrial Vallejo, Mexico City, Mexico
- Ildar Batyrshin
  - Instituto Politécnico Nacional, Centro de Investigación en Computación, Av. Juan de Dios Bátiz, s/n, Col. Nueva Industrial Vallejo, Mexico City, Mexico
|
128
|
An analysis framework for clustering algorithm selection with applications to spectroscopy. PLoS One 2022; 17:e0266369. [PMID: 35358292 PMCID: PMC8970496 DOI: 10.1371/journal.pone.0266369] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/25/2021] [Accepted: 03/19/2022] [Indexed: 11/19/2022] [Open Access]
Abstract
Cluster analysis is a valuable unsupervised machine learning technique that is applied in a multitude of domains to identify similarities or clusters in unlabelled data. However, its performance is dependent on the characteristics of the data it is being applied to. There is no universally best clustering algorithm, and hence, there are numerous clustering algorithms available with different performance characteristics. This raises the problem of how to select an appropriate clustering algorithm for the given analytical purposes. We present and validate an analysis framework to address this problem. Unlike most current literature, which focuses on characterizing the clustering algorithm itself, we present a wider holistic approach, with a focus on the user’s needs, the data’s characteristics and the characteristics of the clusters it may contain. In our analysis framework, we utilize a softer qualitative approach to identify appropriate characteristics for consideration when matching clustering algorithms to the intended application. These are used to generate a small subset of suitable clustering algorithms whose performance is then evaluated utilizing quantitative cluster validity indices. To validate our analysis framework for selecting clustering algorithms, we applied it to four different types of datasets: three datasets of homemade explosives spectroscopy, eight datasets of publicly available spectroscopy data covering food and biomedical applications, a gene expression cancer dataset, and three classic machine learning datasets. Each data type has discernible differences in the composition of the data and the context within which they are used. Our analysis framework, when applied to each of these challenges, recommended differing subsets of clustering algorithms for final quantitative performance evaluation. For each application, the recommended clustering algorithms were confirmed to contain the top performing algorithms through quantitative performance indices.
|
129
|
Singh V, Verma NK. Gene Expression Data Analysis Using Feature Weighted Robust Fuzzy c-Means Clustering. IEEE Trans Nanobioscience 2022; PP:99-105. [PMID: 35259111 DOI: 10.1109/tnb.2022.3157396] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/07/2022]
Abstract
Clustering of gene expression data has proven very useful in various applications, namely identifying the natural structure inherent in gene expression, understanding gene functions, mining relevant information from noisy data, and understanding gene regulation. In all these applications, genes, i.e., features, play a crucial role in characterizing them into different groups. These features may be relevant, irrelevant, or redundant, but they make different contributions during the clustering process. This paper presents a novel approach that considers the effect of features during the clustering process. In the proposed method, the fuzzy c-means objective function is modified using a weighted Euclidean distance between the features with a monotonically decreasing function. The monotonically decreasing function helps control the features' contribution during the clustering process so as to partition the data into more relevant clusters. The proposed approach is validated, and its performance is reported using various clustering performance measures on several standard datasets. These clustering performance measures are also compared with those of multiple state-of-the-art methods.
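To make the idea of a weighted Euclidean distance inside fuzzy c-means concrete, the following is a minimal sketch of the standard fuzzy c-means membership update with per-feature weights. It is not the method of Singh and Verma: the weights are supplied by the caller (the abstract does not specify the monotonically decreasing function), and the function names `weighted_sq_distance` and `fcm_memberships` are hypothetical.

```python
def weighted_sq_distance(x, center, weights):
    """Squared Euclidean distance with one non-negative weight per feature."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, center))


def fcm_memberships(point, centers, weights, m=2.0):
    """Standard fuzzy c-means memberships of one point in every cluster.

    m > 1 is the fuzzifier. The weights are an assumption of this sketch;
    how the cited paper derives them is not stated in the abstract.
    """
    d2 = [weighted_sq_distance(point, c, weights) for c in centers]

    # A point lying exactly on one or more centers shares full membership
    # among them, which avoids a division by zero.
    zeros = [i for i, d in enumerate(d2) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(centers))]

    exponent = 1.0 / (m - 1.0)
    inverse = [(1.0 / d) ** exponent for d in d2]
    total = sum(inverse)
    return [v / total for v in inverse]


# Equal weights: the point is closer to the first center.
u_equal = fcm_memberships([0.0, 0.0], [[1.0, 0.0], [0.0, 2.0]], [1.0, 1.0])
# Down-weighting the second feature makes both centers equally distant.
u_down = fcm_memberships([0.0, 0.0], [[1.0, 0.0], [0.0, 2.0]], [1.0, 0.25])
```

Lowering a feature's weight shrinks its influence on the distance, which is the general mechanism by which feature weighting can reduce the effect of irrelevant or redundant features on the partition.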
|
130
|
Hussain SF, Butt IA, Hanif M, Anwar S. Clustering uncertain graphs using ant colony optimization (ACO). Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07063-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
|
131
|
Martín-Santamaría R, Sánchez-Oro J, Pérez-Peló S, Duarte A. Strategic oscillation for the balanced minimum sum-of-squares clustering problem. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2021.11.048] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/29/2022]
|
132
|
A new approach to adaptive threshold based method for QRS detection with fuzzy clustering. Biocybern Biomed Eng 2022. [DOI: 10.1016/j.bbe.2022.02.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/12/2022]
|
133
|
Tarnutzer A, Weber K. Pattern analysis of peripheral-vestibular deficits with machine learning using hierarchical clustering. J Neurol Sci 2022; 434:120159. [DOI: 10.1016/j.jns.2022.120159] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/29/2021] [Revised: 12/13/2021] [Accepted: 01/13/2022] [Indexed: 11/27/2022]
|
134
|
Gao T, Chen D, Tang Y, Du B, Ranjan R, Zomaya AY, Dustdar S. Adaptive density peaks clustering: Towards exploratory EEG analysis. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108123] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022]
|
135
|
Sengupta S, Das S. Selective Nearest Neighbors Clustering. Pattern Recognit Lett 2022. [DOI: 10.1016/j.patrec.2021.10.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
|
136
|
Du Y, Sun F. HiCBin: binning metagenomic contigs and recovering metagenome-assembled genomes using Hi-C contact maps. Genome Biol 2022; 23:63. [PMID: 35227283 PMCID: PMC8883645 DOI: 10.1186/s13059-022-02626-w] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 03/26/2021] [Accepted: 02/06/2022] [Indexed: 01/20/2023] [Open Access]
Abstract
Recovering high-quality metagenome-assembled genomes (MAGs) from complex microbial ecosystems remains challenging. Recently, high-throughput chromosome conformation capture (Hi-C) has been applied to simultaneously study multiple genomes in natural microbial communities. We develop HiCBin, a novel open-source pipeline, to resolve high-quality MAGs utilizing Hi-C contact maps. HiCBin employs the HiCzin normalization method and the Leiden clustering algorithm and, for the first time, incorporates spurious contact detection into a binning pipeline. HiCBin is validated on one synthetic and two real metagenomic samples and is shown to outperform the existing Hi-C-based binning methods. HiCBin is available at https://github.com/dyxstat/HiCBin.
Affiliation(s)
- Yuxuan Du
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, USA
- Fengzhu Sun
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, USA
|
137
|
Deterioration Mapping of RC Bridge Elements Based on Automated Analysis of GPR Images. REMOTE SENSING 2022. [DOI: 10.3390/rs14051131] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022]
Abstract
Ground-Penetrating Radar (GPR) is a popular non-destructive technique for evaluating RC bridge elements as it can identify major subsurface defects within a short span of time. The interpretation of GPR profiles based on existing amplitude-based approaches is not completely reliable when compared with the actual condition of the concrete established by destructive measures. An alternative image-based analysis considers GPR as an imaging tool wherein an experienced analyst marks attenuated areas and generates deterioration maps with greater accuracy. However, this approach is prone to human error and is highly subjective. The proposed model aims to improve it through automated detection of hyperbolas in GPR profiles and classification based on mathematical modeling. Firstly, GPR profiles are pre-processed, and hyperbolic reflections are detected in them by a classifier trained using the Viola–Jones algorithm. The false positives are eliminated, and missing regions are identified automatically across the top/bottom layer of reinforcement based on user-interactive regional comparison and statistical analysis. Subsequently, entropy, a textural factor, is evaluated to differentiate the detected regions in a way that closely matches the human visual system. These detected regions are finally clustered based on entropy values using the K-means algorithm, and a deterioration map is generated that is robust, reliable, and corresponds to the in situ state of the concrete. A case study of a parking lot demonstrated good correspondence of the deterioration maps generated by the developed model with both amplitude- and image-based analysis. These maps can help structural inspectors locally identify deteriorated zones within structural elements that require immediate attention for repair and rehabilitation.
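The final step described above, clustering regions by an entropy value with K-means, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: it assumes each detected region is available as a flat list of grey levels, uses Shannon entropy of the grey-level histogram as the textural measure, and uses a simple one-dimensional K-means with a deterministic start. The names `shannon_entropy` and `kmeans_1d` are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy (in bits) of the grey-level histogram of a region."""
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def kmeans_1d(values, k, iterations=100):
    """Plain K-means on scalar values; returns (labels, centers)."""
    ordered = sorted(values)
    # Deterministic initialisation: k evenly spaced order statistics.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]

    def assign(cs):
        return [min(range(k), key=lambda j: abs(v - cs[j])) for v in values]

    for _ in range(iterations):
        labels = assign(centers)
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Keep the old center if a cluster happens to be empty.
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return assign(centers), centers


# Toy regions: two nearly uniform patches and two highly textured ones.
regions = [
    [10, 10, 10, 10, 10, 10, 10, 11],
    [50, 50, 50, 50, 50, 50, 50, 50],
    [0, 32, 64, 96, 128, 160, 192, 224],
    [5, 40, 80, 120, 160, 200, 240, 240],
]
entropies = [shannon_entropy(r) for r in regions]
labels, centers = kmeans_1d(entropies, k=2)
```

In this toy example the two low-entropy patches fall into one cluster and the two textured patches into the other; mapping such cluster labels back to region positions would give a coarse map of the kind the abstract describes.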
|
138
|
Abstract
We introduce a new approach to clustering categorical data: Condorcet clustering with a fixed number of groups, denoted α-Condorcet. Like k-modes, this approach is essentially based on similarity and dissimilarity measures. The paper is divided into three parts. In the first part, we propose a new Condorcet criterion with a fixed number of groups (to assign cases to clusters). In the second part, we propose a heuristic algorithm to carry out the task. In the third part, we compare α-Condorcet clustering with k-modes clustering. The comparison uses a quality index, an accuracy measure, and a within-cluster sum-of-squares index. Our findings are illustrated using real datasets: the feline dataset and the US Census 1990 dataset.
|
139
|
Bootstrap–CURE: A Novel Clustering Approach for Sensor Data—An Application to 3D Printing Industry. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12042191] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/05/2023]
Abstract
The agenda of Industry 4.0 highlights smart manufacturing by making machines smart enough to make data-driven decisions. Large-scale 3D printers, being one of the important pillars in Industry 4.0, are equipped with smart sensors to continuously monitor print processes and make automated decisions. One of the biggest challenges in decision autonomy is to consume data quickly along the process and extract knowledge from the printer, suitable for improving the printing process. This paper presents the innovative unsupervised learning approach, bootstrap–CURE, to decode the sensor patterns and operation modes of 3D printers by analyzing multivariate sensor data. An automatic technique to detect the suitable number of clusters using the dendrogram is developed. The proposed methodology is scalable and significantly reduces computational cost as compared to classical CURE. A distinct combination of the 3D printer’s sensors is found, and its impact on the printing process is also discussed. A real application is presented to illustrate the performance and usefulness of the proposal. In addition, a new state of the art for sensor data analysis is presented.
|
140
|
|
141
|
Detecting Learning Patterns in Tertiary Education Using K-Means Clustering. INFORMATION 2022. [DOI: 10.3390/info13020094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2023] [Open Access]
Abstract
We are in an era in which various processes need to be online. However, data from digital learning platforms are still underutilised in higher education, even though they contain student learning patterns, awareness of which would contribute to educational development. Furthermore, knowledge of student progress would inform educators whether to adjust teaching conditions for critically performing students. Limited knowledge of performance patterns limits the development of adaptive teaching and learning mechanisms. In this paper, a model for exploiting data to study students' progress dynamically is proposed. Variables that determine students' current progress are defined and used to group students into different clusters. A model for dynamic clustering is proposed, and the related cluster migration is analysed to isolate poorer or higher performing students. K-means clustering is performed on real data from students at a South African tertiary institution. The proposed model for cluster migration analysis is applied and the corresponding learning patterns are revealed.
|
142
|
Weiskopf D. Uncertainty Visualization: Concepts, Methods, and Applications in Biological Data Visualization. FRONTIERS IN BIOINFORMATICS 2022; 2:793819. [PMID: 36304261 PMCID: PMC9580861 DOI: 10.3389/fbinf.2022.793819] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/12/2021] [Accepted: 01/14/2022] [Indexed: 11/23/2022] [Open Access]
Abstract
This paper provides an overview of uncertainty visualization in general, along with specific examples of applications in bioinformatics. Starting from a processing and interaction pipeline of visualization, components are discussed that are relevant for handling and visualizing uncertainty introduced with the original data and at later stages in the pipeline, which shows the importance of making the stages of the pipeline aware of uncertainty and allowing them to propagate uncertainty. We detail concepts and methods for visual mappings of uncertainty, distinguishing between explicit and implicit representations of distributions, different ways to show summary statistics, and combined or hybrid visualizations. The basic concepts are illustrated for several examples of graph visualization under uncertainty. Finally, this review paper discusses implications for the visualization of biological data and future research directions.
|
143
|
Xie X, Zhang X, Shen J, Du K. Poplar's Waterlogging Resistance Modeling and Evaluating: Exploring and Perfecting the Feasibility of Machine Learning Methods in Plant Science. FRONTIERS IN PLANT SCIENCE 2022; 13:821365. [PMID: 35222479 PMCID: PMC8874143 DOI: 10.3389/fpls.2022.821365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2021] [Accepted: 01/20/2022] [Indexed: 06/14/2023]
Abstract
Floods, as one of the most common disasters in the natural environment, have caused huge losses to human life and property. Predicting the flood resistance of poplar can effectively help researchers select seedlings scientifically and resist floods precisely. Using machine learning algorithms, models of poplar's waterlogging tolerance were established and evaluated. First, the evaluation indexes of poplar's waterlogging tolerance were analyzed and determined. Then, significance testing, correlation analysis, and three feature selection algorithms (hierarchical clustering, Lasso, and stepwise regression) were used to screen photosynthesis, chlorophyll fluorescence, and environmental parameters. Based on this, four machine learning methods, BP neural network regression (BPR), extreme learning machine regression (ELMR), support vector regression (SVR), and random forest regression (RFR), were used to predict the flood resistance of poplar. The results show that RFR and SVR have high precision. On the test set, the coefficients of determination (R²) are 0.8351 and 0.6864, the root mean square errors (RMSE) are 0.2016 and 0.2780, and the mean absolute errors (MAE) are 0.1782 and 0.2031, respectively. Therefore, RFR and SVR can be given priority for predicting poplar flood resistance.
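For readers unfamiliar with the three test-set metrics quoted above, the sketch below computes the coefficient of determination, the root mean square error, and the mean absolute error from their textbook definitions. The function name `regression_metrics` and the sample values are illustrative assumptions, not data from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE and MAE for paired observations and predictions."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observations are identical")

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


# Illustrative values only: predicting the mean everywhere gives R^2 = 0.
metrics = regression_metrics([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

A higher R² together with lower RMSE and MAE indicates a better fit, which is the sense in which the abstract ranks RFR ahead of SVR.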
Affiliation(s)
- Xuelin Xie
  - College of Sciences, Huazhong Agricultural University, Wuhan, China
- Jingfang Shen
  - College of Sciences, Huazhong Agricultural University, Wuhan, China
- Kebing Du
  - College of Horticulture and Forestry Sciences, Hubei Engineering Technology Research Center for Forestry Information, Huazhong Agricultural University, Wuhan, China
|
144
|
Joloudari JH, Saadatfar H, GhasemiGol M, Alizadehsani R, Sani ZA, Hasanzadeh F, Hassannataj E, Sharifrazi D, Mansor Z. FCM-DNN: diagnosing coronary artery disease by deep accuracy fuzzy C-means clustering model. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2022; 19:3609-3635. [PMID: 35341267 DOI: 10.3934/mbe.2022167] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/14/2023]
Abstract
Cardiovascular disease is one of the most challenging diseases in middle-aged and older people, which causes high mortality. Coronary artery disease (CAD) is known as a common cardiovascular disease. A standard clinical tool for diagnosing CAD is angiography. The main challenges are dangerous side effects and high angiography costs. Today, the development of artificial intelligence-based methods is a valuable achievement for diagnosing disease. Hence, in this paper, artificial intelligence methods such as neural network (NN), deep neural network (DNN), and fuzzy C-means clustering combined with deep neural network (FCM-DNN) are developed for diagnosing CAD on a cardiac magnetic resonance imaging (CMRI) dataset. The original dataset is used in two different approaches. First, the labeled dataset is applied to the NN and DNN to create the NN and DNN models. Second, the labels are removed, and the unlabeled dataset is clustered via the FCM method, and then, the clustered dataset is fed to the DNN to create the FCM-DNN model. By utilizing the second clustering and modeling, the training process is improved, and consequently, the accuracy is increased. As a result, the proposed FCM-DNN model achieves the best performance with a 99.91% accuracy specifying 10 clusters, i.e., 5 clusters for healthy subjects and 5 clusters for sick subjects, through the 10-fold cross-validation technique compared to the NN and DNN models reaching the accuracies of 92.18% and 99.63%, respectively. To the best of our knowledge, no study has been conducted for CAD diagnosis on the CMRI dataset using artificial intelligence methods. The results confirm that the proposed FCM-DNN model can be helpful for scientific and research centers.
Affiliation(s)
- Hamid Saadatfar
  - Department of Computer Engineering, Faculty of Engineering, University of Birjand, Birjand, Iran
- Mohammad GhasemiGol
  - Department of Computer Engineering, Faculty of Engineering, University of Birjand, Birjand, Iran
- Roohallah Alizadehsani
  - Institute for Intelligent Systems Research and Innovation, Deakin University, Geelong, VIC 3216, Australia
- Zahra Alizadeh Sani
  - Rajaie Cardiovascular Medical and Research Center, Iran University of Medical Sciences, Tehran, Iran
  - Omid hospital, Iran University of Medical Sciences, Tehran, Iran
- Edris Hassannataj
  - Department of Nursing, School of Nursing and Allied Medical Sciences, Maragheh Faculty of Medical Sciences, Maragheh, Iran
- Danial Sharifrazi
  - Department of Computer Engineering, School of Technical and Engineering, Shiraz Branch, Islamic Azad University, Shiraz, Iran
- Zulkefli Mansor
  - Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, UKM Bangi 43600, Malaysia
|
145
|
Alharbi M, Laramee RS, Cheesman T. TransVis: Integrated Distant and Close Reading of Othello Translations. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:1397-1414. [PMID: 32746287 DOI: 10.1109/tvcg.2020.3012778] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Studying variation among time-evolved translations is a valuable research area for cultural heritage. Understanding how and why translations vary reveals cultural, ideological, and even political influences on literature as well as author relations. In this article, we introduce a novel integrated visual application to support distant and close reading of a collection of Othello translations. We present a new interactive application that provides an alignment overview of all the translations and their correspondences in parallel with smooth zooming and panning capability to integrate distant and close reading within the same view. We provide a range of filtering and selection options to customize the alignment overview as well as focus on specific subsets. Selection and filtering are responsive to expert user preferences and update the analytical text metrics interactively. Also, we introduce a customized view for close reading which preserves the history of selections and the alignment overview state and enables backtracing and re-examining them. Finally, we present a new Term-Level Comparisons view (TLC) to compare and convey relative term weighting in the context of an alignment. Our visual design is guided by, used and evaluated by a domain expert specialist in German translations of Shakespeare.
|
146
|
Yang S, Zhang L, Xu C, Yu H, Fan J, Xu Z. Massive data clustering by multi-scale psychological observations. Natl Sci Rev 2022; 9:nwab183. [PMID: 35242339 PMCID: PMC8889001 DOI: 10.1093/nsr/nwab183] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2021] [Revised: 09/09/2021] [Accepted: 09/23/2021] [Indexed: 11/14/2022] Open
Abstract
Clustering is the discovery of latent group structure in data. It is a fundamental problem in artificial intelligence and a vital procedure in data-driven scientific research across all disciplines. Yet existing methods have various limitations, especially weak cognitive interpretability and poor computational scalability, when clustering the massive datasets that are increasingly available in all domains. Here, by simulating the multi-scale cognitive observation process of humans, we design a scalable algorithm to detect clusters hierarchically hidden in massive datasets. The observation scale changes following the Weber-Fechner law to capture the gradually emerging meaningful grouping structure. We validated our approach on real datasets with up to a billion records and 2000 dimensions, including taxi trajectories, single-cell gene expressions, face images, computer logs and audio. Our approach outperformed popular methods in usability, efficiency, effectiveness and robustness across different domains.
Affiliation(s)
- Shusen Yang
- National Engineering Laboratory of Big Data Analytics, Xi’an Jiaotong University, Xi’an 710049, China
- Industrial Artificial Intelligent Center, Pazhou Laboratory, Guangzhou 510335, China
- Liwen Zhang
- National Engineering Laboratory of Big Data Analytics, Xi’an Jiaotong University, Xi’an 710049, China
- Chen Xu
- Department of Mathematics and Statistics, University of Ottawa, Ottawa, ON K1N 6N5, Canada
- Hanqiao Yu
- National Engineering Laboratory of Big Data Analytics, Xi’an Jiaotong University, Xi’an 710049, China
- Jianqing Fan
- Center for Statistics and Machine Learning, Princeton University, Princeton, NJ 08544, USA
- Zongben Xu
- National Engineering Laboratory of Big Data Analytics, Xi’an Jiaotong University, Xi’an 710049, China
|
147
|
Landslide evolution state prediction and down-level control based on multi-task learning. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2021.107884] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
|
148
|
Duan Y, Liu C, Li S, Guo X, Yang C. Gradient-based elephant herding optimization for cluster analysis. Appl Intell 2022; 52:11606-11637. [PMID: 35106027 PMCID: PMC8795968 DOI: 10.1007/s10489-021-03020-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/15/2021] [Indexed: 11/17/2022]
Abstract
Clustering analysis is essential for obtaining valuable information from a predetermined dataset. However, traditional clustering methods tend to fall into local optima and depend heavily on the quality of the initial solution. To address these defects, a novel clustering method called gradient-based elephant herding optimization for cluster analysis (GBEHO) is proposed. A well-defined set of heuristics is introduced to select the initial centroids instead of selecting random initial points. Specifically, the elephant herding optimization algorithm (EHO) is combined with the gradient-based algorithm GBO for assigning initial cluster centers across the search space. Second, to overcome the imbalance between exploration and exploitation in the original EHO, the initialized population is improved by introducing Gaussian chaos mapping. In addition, two operators, i.e., random wandering and variation operators, are set to adjust the location update strategy of the agents. Nine synthetic and real-world datasets are adopted to evaluate the effectiveness of the proposed algorithm and the other metaheuristic algorithms. The results show that the proposed algorithm ranks first among the 10 algorithms. It is also extensively compared with state-of-the-art techniques using four evaluation criteria: accuracy rate, specificity, detection rate, and F-measure. The obtained results clearly indicate the excellent performance of GBEHO, and its stability is also more prominent.
|
149
|
Kaur A, Kumar Y. Neighborhood search based improved bat algorithm for data clustering. Appl Intell 2022. [DOI: 10.1007/s10489-021-02934-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
|
150
|
Lu K, Liao H. A survey of group decision making methods in Healthcare Industry 4.0: bibliometrics, applications, and directions. Appl Intell 2022; 52:13689-13713. [PMID: 35002080 PMCID: PMC8727077 DOI: 10.1007/s10489-021-02909-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Accepted: 10/05/2021] [Indexed: 12/07/2022]
Abstract
Healthcare Industry 4.0 refers to intelligent operation processes in the medical industry. With the development of information technology, large-scale group decision making (GDM), which allows a larger number of decision makers (DMs) from different places or sectors to participate in decision making, has been rapidly developed and applied in Healthcare Industry 4.0 to help to make decisions efficiently and smartly. To make full use of GDM methods to promote the developments of the medical industry, it is necessary to review the existing relevant achievements. Therefore, this paper conducts an overview to generate a comprehensive understanding of GDM in Healthcare Industry 4.0 and to identify future development directions. Bibliometric analyses are conducted in order to learn the development trends from published papers. The implementations of GDM methods in Healthcare Industry 4.0 are reviewed in accordance with the paradigm of the general GDM process, which includes information representation, dimension reduction, consensus reaching, and result elicitation. We also provide current research challenges and future directions regarding medical GDM. It is hoped that our study will be helpful for researchers in the field of GDM in Healthcare Industry 4.0.
Affiliation(s)
- Keyu Lu
- Business School, Sichuan University, Chengdu 610064, China
- Huchang Liao
- Business School, Sichuan University, Chengdu 610064, China
|