1
Marco N, Şentürk D, Jeste S, DiStefano CC, Dickinson A, Telesca D. Flexible Regularized Estimation in High-Dimensional Mixed Membership Models. Comput Stat Data Anal 2024; 194:107931. [PMID: 39324030 PMCID: PMC11423932 DOI: 10.1016/j.csda.2024.107931] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/27/2024]
Abstract
Mixed membership models are an extension of finite mixture models, where each observation can partially belong to more than one mixture component. A probabilistic framework for mixed membership models of high-dimensional continuous data is proposed with a focus on scalability and interpretability. The novel probabilistic representation of mixed membership is based on convex combinations of dependent multivariate Gaussian random vectors. In this setting, scalability is ensured through approximations of a tensor covariance structure through multivariate eigen-approximations with adaptive regularization imposed through shrinkage priors. Conditional weak posterior consistency is established on an unconstrained model, allowing for a simple posterior sampling scheme while keeping many of the desired theoretical properties of our model. The model is motivated by two biomedical case studies: a case study on functional brain imaging of children with autism spectrum disorder (ASD) and a case study on gene expression data from breast cancer tissue. These applications highlight how the typical assumption made in cluster analysis, that each observation comes from one homogeneous subgroup, may often be restrictive in several applications, leading to unnatural interpretations of data features.
Affiliation(s)
- Nicholas Marco
  - Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Damla Şentürk
  - Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Shafali Jeste
  - Division of Neurology and Neurological Institute, Children's Hospital Los Angeles, Los Angeles, CA 90027, USA
- Abigail Dickinson
  - Department of Psychiatry and Biobehavioral Sciences, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Donatello Telesca
  - Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA 90095, USA
2
Castanho EN, Aidos H, Madeira SC. Biclustering data analysis: a comprehensive survey. Brief Bioinform 2024; 25:bbae342. [PMID: 39007596 PMCID: PMC11247412 DOI: 10.1093/bib/bbae342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 05/16/2024] [Accepted: 07/01/2024] [Indexed: 07/16/2024] Open
Abstract
Biclustering, the simultaneous clustering of rows and columns of a data matrix, has proved its effectiveness in bioinformatics due to its capacity to produce local instead of global models, evolving from a key technique used in gene expression data analysis into one of the most used approaches for pattern discovery and identification of biological modules, used in both descriptive and predictive learning tasks. This survey presents a comprehensive overview of biclustering. It proposes an updated taxonomy for its fundamental components (bicluster, biclustering solution, biclustering algorithms, and evaluation measures) and applications. We unify scattered concepts in the literature with new definitions to accommodate the diversity of data types (such as tabular, network, and time series data) and the specificities of biological and biomedical data domains. We further propose a pipeline for biclustering data analysis and discuss practical aspects of incorporating biclustering in real-world applications. We highlight prominent application domains, particularly in bioinformatics, and identify typical biclusters to illustrate the analysis output. Moreover, we discuss important aspects to consider when choosing, applying, and evaluating a biclustering algorithm. We also relate biclustering with other data mining tasks (clustering, pattern mining, classification, triclustering, N-way clustering, and graph mining). Thus, it provides theoretical and practical guidance on biclustering data analysis, demonstrating its potential to uncover actionable insights from complex datasets.
Affiliation(s)
- Eduardo N Castanho
  - LASIGE, Faculdade de Ciências, Universidade de Lisboa, Campo Grande 16, P-1749-016 Lisbon, Portugal
- Helena Aidos
  - LASIGE, Faculdade de Ciências, Universidade de Lisboa, Campo Grande 16, P-1749-016 Lisbon, Portugal
- Sara C Madeira
  - LASIGE, Faculdade de Ciências, Universidade de Lisboa, Campo Grande 16, P-1749-016 Lisbon, Portugal
3
Hector I, Panjanathan R. Predictive maintenance in Industry 4.0: a survey of planning models and machine learning techniques. PeerJ Comput Sci 2024; 10:e2016. [PMID: 38855197 PMCID: PMC11157603 DOI: 10.7717/peerj-cs.2016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Accepted: 04/02/2024] [Indexed: 06/11/2024]
Abstract
Equipment downtime resulting from maintenance in various sectors around the globe has become a major concern. The effectiveness of conventional reactive maintenance methods in addressing interruptions and enhancing operational efficiency has become inadequate. Therefore, acknowledging the constraints associated with reactive maintenance and the growing need for proactive approaches to proactively detect possible breakdowns is necessary. The need for optimisation of asset management and reduction of costly downtime emerges from the demand for industries. The work highlights the use of Internet of Things (IoT)-enabled Predictive Maintenance (PdM) as a revolutionary strategy across many sectors. This article presents a picture of a future in which the use of IoT technology and sophisticated analytics will enable the prediction and proactive mitigation of probable equipment failures. This literature study has great importance as it thoroughly explores the complex steps and techniques necessary for the development and implementation of efficient PdM solutions. The study offers useful insights into the optimisation of maintenance methods and the enhancement of operational efficiency by analysing current information and approaches. The article outlines essential stages in the application of PdM, encompassing underlying design factors, data preparation, feature selection, and decision modelling. Additionally, the study discusses a range of ML models and methodologies for monitoring conditions. In order to enhance maintenance plans, it is necessary to prioritise ongoing study and improvement in the field of PdM. The potential for boosting PdM skills and guaranteeing the competitiveness of companies in the global economy is significant through the incorporation of IoT, Artificial Intelligence (AI), and advanced analytics.
Affiliation(s)
- Ida Hector
  - School of Computer Science and Engineering, Vellore Institute of Technology Chennai, Chennai, Tamil Nadu, India
- Rukmani Panjanathan
  - School of Computer Science and Engineering, Vellore Institute of Technology Chennai, Chennai, Tamil Nadu, India
4
Ngo H, Fang H, Rumbut J, Wang H. Federated Fuzzy Clustering for Decentralized Incomplete Longitudinal Behavioral Data. IEEE INTERNET OF THINGS JOURNAL 2024; 11:14657-14670. [PMID: 38605934 PMCID: PMC11006372 DOI: 10.1109/jiot.2023.3343719] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/13/2024]
Abstract
The use of medical data for machine learning, including unsupervised methods such as clustering, is often restricted by privacy regulations such as the Health Insurance Portability and Accountability Act (HIPAA). Medical data is sensitive and highly regulated and anonymization is often insufficient to protect a patient's identity. Traditional clustering algorithms are also unsuitable for longitudinal behavioral health trials, which often have missing data and observe individual behaviors over varying time periods. In this work, we develop a new decentralized federated multiple imputation-based fuzzy clustering algorithm for complex longitudinal behavioral trial data collected from multisite randomized controlled trials over different time periods. Federated learning (FL) preserves privacy by aggregating model parameters instead of data. Unlike previous FL methods, this proposed algorithm requires only two rounds of communication and handles clients with varying numbers of time points for incomplete longitudinal data. The model is evaluated on both empirical longitudinal dietary health data and simulated clusters with different numbers of clients, effect sizes, correlations, and sample sizes. The proposed algorithm converges rapidly and achieves desirable performance on multiple clustering metrics. This new method allows for targeted treatments for various patient groups while preserving their data privacy and enables the potential for broader applications in the Internet of Medical Things.
Affiliation(s)
- Hieu Ngo
  - College of Engineering, University of Massachusetts Dartmouth, North Dartmouth, MA, 02747
- Hua Fang
  - Department of Computer and Information Science, University of Massachusetts Dartmouth, North Dartmouth, MA, 02747 and the Department of Population and Quantitative Health Science, University of Massachusetts Chan Medical School, Worcester, MA 01655 USA
- Joshua Rumbut
  - College of Engineering, University of Massachusetts Dartmouth, North Dartmouth, MA, 02747 and the Department of Population and Quantitative Health Science, University of Massachusetts Chan Medical School, Worcester, MA 01655 USA
- Honggang Wang
  - Department of Graduate Computer Science and Engineering, Katz School of Science and Health, Yeshiva University, New York City, NY, 10033
5
Kiani Mavi R, Zarbakhshnia N, Kiani Mavi N, Kazemi S. Clustering sustainable suppliers in the plastics industry: A fuzzy equivalence relation approach. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2023; 345:118811. [PMID: 37659368 DOI: 10.1016/j.jenvman.2023.118811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2022] [Revised: 08/03/2023] [Accepted: 08/11/2023] [Indexed: 09/04/2023]
Abstract
Nowadays, pure economic supply chain management is not commonly contemplated among companies (especially buyers), as recently novel dimensions of supply chains, e.g., environmental, sustainability, and risk, play significant roles. In addition, since companies prefer buying their needs from a group of suppliers, the problem of supplier selection is not solely choosing or qualifying a supplier from among others. Buyers, hence, commonly assemble a portfolio of suppliers by looking at the multi-dimensional pre-determined selection criteria. Since sustainable supplier selection criteria are often assessed by linguistic terms, an appropriate clustering approach is required. This paper presents an innovative way to implement fuzzy equivalence relation to clustering sustainable suppliers through developing a comprehensive taxonomy of sustainable supplier selection criteria, including supply chain risk. Fifteen experts participated in this study to evaluate 20 suppliers and cluster them in the plastics industry. Findings reveal that the best partitioning occurs when the suppliers are divided into two clusters, with 4 (20%) and 16 (80%) suppliers, respectively. The four suppliers in cluster one are performing better in terms of the capability of supplier/delivery, service, risk, and sustainability criteria such as environment protection/management, and green innovation. These factors are critical in clustering and selecting sustainable suppliers. The originality of this study lies in developing an all-inclusive set of criteria for clustering sustainable suppliers and adding risk factors to the conventional supplier selection criteria. In addition to partitioning the suppliers and determining the best-performing ones, this study also highlights the most influential factors by analysing the suppliers in the best cluster.
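The abstract does not say how the fuzzy equivalence relation is built from the experts' linguistic ratings, so the following is only a minimal sketch of the textbook procedure that the term usually refers to: take a reflexive, symmetric similarity matrix, compute its max-min transitive closure, and read off crisp clusters with a λ-cut. The function names and the 4×4 similarity values are hypothetical illustrations, not data or code from the paper.

```python
import numpy as np

def maxmin_compose(R, S):
    """Max-min composition: (R o S)[i, j] = max_k min(R[i, k], S[k, j])."""
    return np.minimum(R[:, :, None], S[None, :, :]).max(axis=1)

def transitive_closure(R):
    """Square a reflexive, symmetric fuzzy relation until it stops changing.

    The fixed point is max-min transitive, i.e. a fuzzy equivalence relation.
    """
    R = np.asarray(R, dtype=float)
    while True:
        R2 = maxmin_compose(R, R)
        if np.array_equal(R2, R):
            return R
        R = R2

def lambda_cut_clusters(R_eq, lam):
    """Group items i, j together whenever R_eq[i, j] >= lam."""
    n = len(R_eq)
    labels = [-1] * n
    next_label = 0
    for i in range(n):
        if labels[i] == -1:
            for j in range(n):
                if R_eq[i, j] >= lam:
                    labels[j] = next_label
            next_label += 1
    return labels

# Hypothetical similarity matrix for four suppliers (illustrative values only).
R = np.array([
    [1.0, 0.8, 0.4, 0.2],
    [0.8, 1.0, 0.5, 0.3],
    [0.4, 0.5, 1.0, 0.9],
    [0.2, 0.3, 0.9, 1.0],
])
R_eq = transitive_closure(R)
print(lambda_cut_clusters(R_eq, 0.6))  # two clusters at this threshold
```

Raising λ splits the partition further and lowering it merges clusters, so choosing the "best" partition (two clusters in the study) amounts to choosing a threshold; the criterion the authors used for that choice is not given in the abstract.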
Affiliation(s)
- Reza Kiani Mavi
  - School of Business and Law, Edith Cowan University, Joondalup, WA, 6027, Australia
- Navid Zarbakhshnia
  - Department of Management, Monash Business School, Monash University, Caulfield, Victoria, Australia
- Neda Kiani Mavi
  - School of Business and Law, Edith Cowan University, Joondalup, WA, 6027, Australia
- Sajad Kazemi
  - Doctoral Student, Graduate School of Management, Saint Petersburg State University, Russia
6
Zwir I, Arnedo J, Mesa A, Del Val C, de Erausquin GA, Cloninger CR. Temperament & Character account for brain functional connectivity at rest: A diathesis-stress model of functional dysregulation in psychosis. Mol Psychiatry 2023; 28:2238-2253. [PMID: 37015979 PMCID: PMC10611583 DOI: 10.1038/s41380-023-02039-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2022] [Revised: 03/11/2023] [Accepted: 03/15/2023] [Indexed: 04/06/2023]
Abstract
The human brain's resting-state functional connectivity (rsFC) provides stable trait-like measures of differences in the perceptual, cognitive, emotional, and social functioning of individuals. The rsFC of the prefrontal cortex is hypothesized to mediate a person's rational self-government, as is also measured by personality, so we tested whether its connectivity networks account for vulnerability to psychosis and related personality configurations. Young adults were recruited as outpatients or controls from the same communities around psychiatric clinics. Healthy controls (n = 30) and clinically stable outpatients with bipolar disorder (n = 35) or schizophrenia (n = 27) were diagnosed by structured interviews, and then were assessed with standardized protocols of the Human Connectome Project. Data-driven clustering identified five groups of patients with distinct patterns of rsFC regardless of diagnosis. These groups were distinguished by rsFC networks that regulate specific biopsychosocial aspects of psychosis: sensory hypersensitivity, negative emotional balance, impaired attentional control, avolition, and social mistrust. The rsFC group differences were validated by independent measures of white matter microstructure, personality, and clinical features not used to identify the subjects. We confirmed that each connectivity group was organized by differential collaborative interactions among six prefrontal and eight other automatically-coactivated networks. The temperament and character traits of the members of these groups strongly accounted for the differences in rsFC between groups, indicating that configurations of rsFC are internal representations of personality organization. These representations involve weakly self-regulated emotional drives of fear, irrational desire, and mistrust, which predispose to psychopathology. However, stable outpatients with different diagnoses (bipolar or schizophrenic psychoses) were highly similar in rsFC and personality.
This supports a diathesis-stress model in which different complex adaptive systems regulate predisposition (which is similar in stable outpatients despite diagnosis) and stress-induced clinical dysfunction (which differs by diagnosis).
Affiliation(s)
- Igor Zwir
  - Washington University School of Medicine, Department of Psychiatry, St. Louis, MO, USA
  - University of Granada, Department of Computer Science, Granada, Spain
  - University of Texas, Rio Grande Valley School of Medicine, Institute of Neuroscience, Harlingen, TX, USA
- Javier Arnedo
  - Washington University School of Medicine, Department of Psychiatry, St. Louis, MO, USA
  - University of Granada, Department of Computer Science, Granada, Spain
- Alberto Mesa
  - University of Granada, Department of Computer Science, Granada, Spain
- Coral Del Val
  - University of Granada, Department of Computer Science, Granada, Spain
- Gabriel A de Erausquin
  - University of Texas, Long School of Medicine, Department of Neurology, San Antonio, TX, USA
  - Laboratory of Brain Development, Modulation and Repair, Glenn Biggs Institute of Alzheimer's & Neurodegenerative Disorders, San Antonio, TX, USA
- C Robert Cloninger
  - Washington University School of Medicine, Department of Psychiatry, St. Louis, MO, USA
7
Liu X, Shao W, Chen J, Lü Z, Glover F, Ding J. Multi-start local search algorithm based on a novel objective function for clustering analysis. APPL INTELL 2023. [DOI: 10.1007/s10489-023-04580-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2023]
8
Deep Learning Techniques for Quantification of Tumour Necrosis in Post-neoadjuvant Chemotherapy Osteosarcoma Resection Specimens for Effective Treatment Planning. ACTA INFORMATICA PRAGENSIA 2023. [DOI: 10.18267/j.aip.207] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/08/2023] Open
9
Globally automatic fuzzy clustering for probability density functions and its application for image data. APPL INTELL 2023. [DOI: 10.1007/s10489-023-04470-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2023]
10
Campagner A, Ciucci D, Denœux T. A General Framework for Evaluating and Comparing Soft Clusterings. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.11.114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/15/2022]
11
Wan Y, Ma A, Zhang L, Zhong Y. Multiobjective Sine Cosine Algorithm for Remote Sensing Image Spatial-Spectral Clustering. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:11172-11186. [PMID: 33872167 DOI: 10.1109/tcyb.2021.3064552] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/12/2023]
Abstract
Remote sensing image data clustering is a tough task, which involves classifying the image without any prior information. Remote sensing image clustering, in essence, belongs to a complex optimization problem, due to the high dimensionality and complexity of remote sensing imagery. Therefore, it can be easily affected by the initial values and trapped in locally optimal solutions. Meanwhile, remote sensing images contain complex and diverse spatial-spectral information, which makes them difficult to model with only a single objective function. Although evolutionary multiobjective optimization methods have been presented for the clustering task, the tradeoff between the global and local search abilities is not well adjusted in the evolutionary process. In this article, in order to address these problems, a multiobjective sine cosine algorithm for remote sensing image data spatial-spectral clustering (MOSCA_SSC) is proposed. In the proposed method, the clustering task is converted into a multiobjective optimization problem, and the Xie-Beni (XB) index and Jeffries-Matusita (Jm) distance combined with the spatial information term (SI_Jm measure) are utilized as the objective functions. In addition, for the first time, the sine cosine algorithm (SCA), which can effectively adjust the local and global search capabilities, is introduced into the framework of multiobjective clustering for continuous optimization. Furthermore, the destination solution in the SCA is automatically selected and updated from the current Pareto front through employing the knee-point-based selection approach. The benefits of the proposed method were demonstrated by clustering experiments with ten UCI datasets and four real remote sensing image datasets.
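One of the two objectives named in the abstract, the Xie-Beni (XB) index, has a standard closed form: membership-weighted within-cluster scatter divided by the number of points times the smallest squared distance between two cluster centers, with lower values indicating compact, well-separated clusters. The sketch below computes that standard index only. The function name `xie_beni` and the toy data are assumptions for illustration; the SI_Jm measure and the sine cosine optimizer from the paper are not reproduced here.

```python
import numpy as np

def xie_beni(X, centers, U, m=2.0):
    """Standard Xie-Beni validity index (lower is better).

    X: (n, d) data, centers: (c, d) cluster centers,
    U: (n, c) fuzzy memberships, m: fuzzifier.
    """
    # Squared distance from every point to every center, shape (n, c).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    compactness = ((U ** m) * d2).sum()
    # Smallest squared distance between two distinct centers.
    c2 = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    separation = c2[~np.eye(len(centers), dtype=bool)].min()
    return compactness / (len(X) * separation)

# Toy example: two tight, well-separated groups with crisp memberships.
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
centers = np.array([[0.0, 1.0], [10.0, 1.0]])
U = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
print(xie_beni(X, centers, U))
```

In a multiobjective setting such as the one described, an index like this would be minimized jointly with the second objective rather than on its own.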
12
Density-based IFCM along with its interval valued and probabilistic extensions, and a review of intuitionistic fuzzy clustering methods. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10236-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
13
Jiao L, Yang H, Liu ZG, Pan Q. Interpretable fuzzy clustering using unsupervised fuzzy decision trees. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.08.077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
14
Kiruba Nesamalar E, SatheeshKumar J, Amudha T. Efficient DNA-ligand interaction framework using fuzzy C-means clustering based glowworm swarm optimization (FCMGSO) method. J Biomol Struct Dyn 2022:1-13. [PMID: 35930294 DOI: 10.1080/07391102.2022.2105958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/04/2023]
Abstract
Assessment of DNA and ligand interaction is a great challenge to medical researchers and the drug industry, since the accurate mapping of DNA and ligand plays an important role in associating drugs with suitable diseases. The primary objective of this research work is to develop an efficient model for predicting the best DNA and ligand mapping. In this research work, 500 instances of DNA and drugs used for cancer and non-cancer diseases from the National Centre for Biotechnology Information (NCBI) were considered for analysis. Binding energy is one of the important measures to predict and finalize the best DNA and ligand interaction. Existing methods used for the docking process such as Simulated Annealing (SA), Lamarckian Genetic Algorithm (LGA), Genetic Clustering (GC), Fuzzy C-means clustering (FCM), and Genetic Clustering with Multi swarm Optimization (GCMSO) were applied for all 500 instances. These algorithms failed to produce better binding energy due to a lack of optimization in the existing approaches. Optimization methods play a major role in predicting accurate DNA ligand docking. Hence, this research proposes an efficient architecture using Fuzzy C-Means Clustering with Glowworm Swarm (FCMGSO) optimization method for accurate analysis of the DNA-ligand docking process. Results show that the proposed FCMGSO algorithm yields lower binding energy than the other existing methods in all instances of samples considered from the NCBI dataset. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- J SatheeshKumar
  - Department of Computer Applications, Bharathiar University, Coimbatore, India
- T Amudha
  - Department of Computer Applications, Bharathiar University, Coimbatore, India
15
Hybrid Fuzzy C-Means Clustering Algorithm Oriented to Big Data Realms. AXIOMS 2022. [DOI: 10.3390/axioms11080377] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/27/2023]
Abstract
A hybrid variant of the Fuzzy C-Means and K-Means algorithms is proposed to solve large datasets such as those presented in Big Data. The Fuzzy C-Means algorithm is sensitive to the initial values of the membership matrix. Therefore, a special configuration of the matrix can accelerate the convergence of the algorithm. In this sense, a new approach is proposed, which we call Hybrid OK-Means Fuzzy C-Means (HOFCM), and it optimizes the values of the membership matrix parameter. This approach consists of three steps: (a) generate a set of n solutions of an x dataset, applying a variant of the K-Means algorithm; (b) select the best solution as the basis for generating the optimized membership matrix; (c) resolve the x dataset with Fuzzy C-Means. The experimental results with four real datasets and one synthetic dataset show that HOFCM reduces the time by up to 93.94% compared to the average time of the standard Fuzzy C-Means. It is highlighted that the quality of the solution was reduced by 2.51% in the worst case.
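The three steps in the abstract (several K-Means runs, pick the best, seed Fuzzy C-Means from it) can be sketched as follows. This is an illustration under stated assumptions, not the authors' HOFCM: plain Lloyd's K-Means stands in for their K-Means variant, "best" is assumed to mean lowest within-cluster sum of squares, and the initial membership matrix is assumed to come from the standard FCM membership formula applied to the winning centers. All function names are hypothetical.

```python
import numpy as np

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's K-Means; returns (centers, inertia)."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Keep the old center if a cluster happens to become empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return centers, float((d.min(axis=1) ** 2).sum())

def memberships(X, centers, m=2.0, eps=1e-12):
    """Standard FCM membership update: u_ij proportional to d_ij^(-2/(m-1))."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + eps
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def fcm(X, U, m=2.0, tol=1e-6, max_iter=300):
    """Fuzzy C-Means started from a given membership matrix U."""
    for it in range(1, max_iter + 1):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        U_new = memberships(X, centers, m)
        if np.abs(U_new - U).max() < tol:
            return centers, U_new, it
        U = U_new
    return centers, U, max_iter

def kmeans_seeded_fcm(X, k, n_starts=5, m=2.0, seed=0):
    rng = np.random.default_rng(seed)
    # (a) several K-Means solutions, (b) keep the one with lowest inertia.
    best_centers, _ = min((kmeans(X, k, rng) for _ in range(n_starts)),
                          key=lambda r: r[1])
    # (c) build the initial membership matrix from it and run FCM.
    return fcm(X, memberships(X, best_centers, m), m)
```

A call such as `centers, U, n_iter = kmeans_seeded_fcm(X, k=3)` returns the final centers, the membership matrix, and the number of FCM iterations used, which can be compared against FCM started from a random membership matrix to see the effect of seeding. The timing and quality figures quoted in the abstract come from the authors' own implementation and datasets and should not be expected from this sketch.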
16
Developing an Enterprise Diagnostic Index System Based on Interval-Valued Hesitant Fuzzy Clustering. MATHEMATICS 2022. [DOI: 10.3390/math10142440] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/10/2022]
Abstract
Global economic integration drives the development of dynamic competition. In a dynamic competitive environment, the ever-changing customer demands and technology directly affect the leadership of the core competence of enterprises. Therefore, assessing the performance of enterprises in a timely manner is necessary to adjust business activities and completely adapt to new changes. Enterprise diagnosis is a scientific tool for judging the development status of enterprises, and building a scientific and rational index system is the key to enterprise diagnosis. Considering the large number of enterprise diagnostic indicators and the high similarity among indicators, this study proposes a selection method for enterprise diagnostic indicators based on interval-valued hesitant fuzzy clustering by comparing the existing indicator systems. First, enterprise organizations are considered as the starting point. Through the key analysis of relevant indicators of domestic and foreign enterprise diagnosis, enterprise diagnosis candidate indicators are constructed from three aspects, namely enterprise performance, employee health, and social benefit. In view of the ambiguity and inconsistency of expert judgment, this study proposes an interval-valued hesitant fuzzy set based on the characteristics of hesitant fuzzy sets and interval-valued evaluation. For improving the interval-valued hesitant fuzzy entropy function, an interval-valued hesitant fuzzy similarity measurement formula considering information features is designed to avoid the problem of data length and improve the degree of identification among indicators. Then, the similarity, equivalence, and truncation matrices are constructed, and the interval-valued hesitant fuzzy clustering method is used to eliminate redundant indicators with repeated information. The availability of the proposed method is illustrated via an example, and the key indicators in the enterprise diagnostic index system are found. 
Finally, the advantages of the proposed method are discussed using comparative analysis with existing methods. A rational and comprehensive enterprise diagnostic index system was constructed. The system can be used as a scientific basis for diagnosing the development of enterprises and providing an objective and effective reference.
17
Rashid J, Kim J, Hussain A, Naseem U, Juneja S. A novel multiple kernel fuzzy topic modeling technique for biomedical data. BMC Bioinformatics 2022; 23:275. [PMID: 35820793 PMCID: PMC9277941 DOI: 10.1186/s12859-022-04780-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/24/2022] [Accepted: 06/08/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Text mining in the biomedical field has received much attention and is regarded as an important research area, since a lot of biomedical data is in text format. Topic modeling is one of the popular methods among text mining techniques used to discover hidden semantic structures, so-called topics. However, discovering topics from biomedical data is a challenging task due to the sparsity, redundancy, and unstructured format. METHODS In this paper, we propose a novel multiple kernel fuzzy topic modeling (MKFTM) technique using fusion probabilistic inverse document frequency and a multiple kernel fuzzy c-means clustering algorithm for biomedical text mining. In detail, the proposed fusion probabilistic inverse document frequency method is used to estimate the weights of global terms while MKFTM generates frequencies of local and global terms with bag-of-words. In addition, principal component analysis is applied to eliminate higher-order negative effects for term weights. RESULTS Extensive experiments are conducted on six biomedical datasets. MKFTM achieved the highest classification accuracy of 99.04%, 99.62%, 99.69%, 99.61% in the Muchmore Springer dataset and 94.10%, 89.45%, 92.91%, 90.35% in the Ohsumed dataset. The CH index value of MKFTM is higher, which shows that its clustering performance is better than state-of-the-art topic models. CONCLUSION The results confirm that the proposed MKFTM approach handles the sparsity and redundancy problems in biomedical text documents very efficiently. MKFTM discovers semantically relevant topics with high accuracy for biomedical documents. It gives better results for classification and clustering in biomedical documents. MKFTM is a new approach to topic modeling, which has the flexibility to work with a variety of clustering methods.
Affiliation(s)
- Junaid Rashid
  - Department of Computer Science and Engineering, Kongju National University, Cheonan, 31080 Korea
- Jungeun Kim
  - Department of Software, Department of Computer Science and Engineering, Kongju National University, Cheonan, 31080 Korea
- Amir Hussain
  - Data Science and Cyber Analytics Research Group, Edinburgh Napier University, Edinburgh, EH11 4DY UK
- Usman Naseem
  - School of Computer Science, University of Sydney, Sydney, Australia
- Sapna Juneja
  - Department of Computer Science, KIET Group of Institutions, Delhi NCR, Ghaziabad, India
18
BIM for the Realization of Sustainable Digital Models in a University-Business Collaborative Learning Environment: Assessment of Use and Students’ Perception. BUILDINGS 2022. [DOI: 10.3390/buildings12070971] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/05/2023]
Abstract
This paper develops an assessment of an academic implementation of building information modeling (BIM) carried out in an expert project subject of a School of Industrial Engineering. The objectives were for the students to discover sustainable industrial during the design process and to understand and participate in a real process of the implementation of industrial projects through real collaboration between academic and business contexts. The outcomes of this academic initiative were evaluated using academic results as well as students’ perceptions. Academic results were analyzed using the FUZZY VIKOR method. An analysis of variance (ANOVA) was performed to determine whether the use of BIM, the proposed university-enterprise environment and the sustainability proposal rate of the students’ projects had statistically significant effects on the results. The evaluation of students’ perceptions was based on a Likert survey with five levels, and the results were interpreted using fuzzy k-means clustering and classification tree analysis. The results show that 77.8% of students consider that, for learning, it is more effective to carry out a project related to an existing company, with the realization of the project with BIM methodology being of great value. The sustainability aspects were applied more easily thanks to the proposed methodology, and they were positively valued by the company.
|
19
|
Azzouzi S, Hjouji A, EL-Mekkaoui J, EL Khalfi A. A novel efficient clustering algorithm based on possibilistic approach and kernel technique for image clustering problems. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03703-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
|
20
|
Azzouzi S, Hjouji A, EL-Mekkaoui J, EL Khalfi A. An improved image clustering algorithm based on Kernel method and Tchebychev orthogonal moments. EVOLUTIONARY INTELLIGENCE 2022. [DOI: 10.1007/s12065-022-00734-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
|
21
|
Improving Classification Performance of Fully Connected Layers by Fuzzy Clustering in Transformed Feature Space. Symmetry (Basel) 2022. [DOI: 10.3390/sym14040658] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/06/2023] Open
Abstract
Fully connected (FC) layers are used in almost all neural network architectures ranging from multilayer perceptrons to deep neural networks. FC layers allow any kind of symmetric/asymmetric interaction between features without making any assumption about the structure of the data. However, success of convolutional and recursive layers and findings of many studies have proven that the intrinsic structure of a dataset holds a great potential to improve the success of a classification problem. Leveraging clustering to explore and exploit this intrinsic structure in classification problems has been the subject of various studies. In this paper, we propose a new training pipeline for fully connected layers which enables them to make more accurate classification predictions. The proposed method aims to reflect the clustering patterns in the original feature space of the training dataset to the transformed feature space created by the FC layer. In this way, we intend to enhance the representation ability of the extracted features and accordingly increase the classification accuracy. The Fuzzy C-Means algorithm is employed in this study as the clustering tool. To evaluate the performance of the proposed method, 11 experiments were conducted on 9 benchmark UCI datasets. Empirical results show that the proposed method works well in practice and gives higher classification accuracies compared to a regular FC layer in most datasets.
|
22
|
Deep autoencoder-based fuzzy c-means for topic detection. ARRAY 2022. [DOI: 10.1016/j.array.2021.100124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open
|
23
|
Fuzzy clustering algorithms with distance metric learning and entropy regularization. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107922] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
|
26
|
Li Y, Chen C, Hu X, Qin J, Ma Y. Fuzzy Rule-Based Models: A Design with Prototype Relocation and Granular Generalization. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2020.12.093] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/22/2022]
|
27
|
Murfi H. A scalable eigenspace-based fuzzy c-means for topic detection. DATA TECHNOLOGIES AND APPLICATIONS 2021. [DOI: 10.1108/dta-11-2020-0262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Purpose: The aim of this research is to develop an eigenspace-based fuzzy c-means method for scalable topic detection. Design/methodology/approach: The eigenspace-based fuzzy c-means (EFCM) combines representation learning and clustering. The textual data are transformed into a lower-dimensional eigenspace using truncated singular value decomposition. Fuzzy c-means is performed on the eigenspace to identify the centroids of each cluster. The topics are obtained by transforming the centroids back into the nonnegative subspace of the original space. In this paper, we extend the EFCM method for scalability using two approaches, i.e. single-pass and online. We call the developed topic detection methods oEFCM and spEFCM. Findings: Our simulation shows that both the oEFCM and spEFCM methods provide faster running times than EFCM for data sets that do not fit in memory. However, there is a decrease in the average coherence score. For data sets that fit in memory as well as those that do not, the oEFCM method provides a better tradeoff between running time and coherence score than spEFCM. Originality/value: This research produces a scalable topic detection method. Besides this scalability, the developed method also provides a faster running time for data sets that fit in memory.
|
28
|
Fuzzy C-Means Clustering Algorithm with Multiple Fuzzification Coefficients. ALGORITHMS 2020. [DOI: 10.3390/a13070158] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 11/16/2022]
Abstract
Clustering is an unsupervised machine learning technique with many practical applications that has gathered extensive research interest. Aside from deterministic or probabilistic techniques, fuzzy C-means clustering (FCM) is also a common clustering technique. Since the advent of the FCM method, many improvements have been made to increase clustering efficiency. These improvements focus on adjusting the membership representation of elements in the clusters, or on fuzzifying and defuzzifying techniques, as well as the distance function between elements. This study proposes a novel fuzzy clustering algorithm using multiple different fuzzification coefficients depending on the characteristics of each data sample. The proposed fuzzy clustering method has similar calculation steps to FCM with some modifications. The formulas are derived to ensure convergence. The main contribution of this approach is the utilization of multiple fuzzification coefficients as opposed to only one coefficient in the original FCM algorithm. The new algorithm is then evaluated with experiments on several common datasets and the results show that the proposed algorithm is more efficient compared to the original FCM as well as other clustering methods.
|
29
|
De Luca G, Zuccolotto P. Regime dependent interconnectedness among fuzzy clusters of financial time series. ADV DATA ANAL CLASSI 2020. [DOI: 10.1007/s11634-020-00405-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
|
30
|
Li H. Examination on image segmentation method of ischemic optic neuropathy based on fuzzy clustering theory. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2020. [DOI: 10.3233/jifs-179585] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/15/2022]
Affiliation(s)
- Hui Li
- Ophthalmology, Affiliated Hospital of Jilin Medical University, Jilin, China
| |
|