1. Chweidan H, Rudyuk N, Tzur D, Goldstein C, Almoznino G. Statistical Methods and Machine Learning Algorithms for Investigating Metabolic Syndrome in Temporomandibular Disorders: A Nationwide Study. Bioengineering (Basel) 2024; 11:134. [PMID: 38391620 PMCID: PMC10886027 DOI: 10.3390/bioengineering11020134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2023] [Revised: 01/23/2024] [Accepted: 01/25/2024] [Indexed: 02/24/2024] [Open access]
Abstract
The objective of this study was to analyze the associations between temporomandibular disorders (TMDs) and metabolic syndrome (MetS) components, consequences, and related conditions. This research analyzed data from the Dental, Oral, Medical Epidemiological (DOME) records-based study, which integrated comprehensive socio-demographic, medical, and dental databases from a nationwide sample of dental attendees aged 18-50 years at military dental clinics for 1 year. Statistical and machine learning models were fitted with TMDs as the dependent variable. The independent variables included age, sex, smoking, each of the MetS components, and consequences and related conditions, including hypertension, hyperlipidemia, diabetes, impaired glucose tolerance (IGT), obesity, cardiac disease, obstructive sleep apnea (OSA), nonalcoholic fatty liver disease (NAFLD), transient ischemic attack (TIA), stroke, deep venous thrombosis (DVT), and anemia. The study included 132,529 subjects, of whom 1899 (1.43%) had been diagnosed with TMDs. The following parameters retained a statistically significant positive association with TMDs in the multivariable binary logistic regression analysis: female sex [OR = 2.65 (2.41-2.93)], anemia [OR = 1.69 (1.48-1.93)], and age [OR = 1.07 (1.06-1.08)]. Feature importance generated by the XGBoost machine learning algorithm, with TMDs as the target variable, ranked the features as follows: sex (first), age (second), anemia (third), hypertension (fourth), and smoking (fifth). Metabolic morbidity and anemia should be included in the systemic evaluation of TMD patients.
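The odds ratios above follow the usual logistic regression convention: OR = exp(beta), with a 95% interval of exp(beta ± 1.96 × SE). The following is a minimal sketch, assuming a Wald interval (the abstract does not say how the intervals were computed). It recovers an approximate standard error from the reported female-sex estimate and rechecks the reported prevalence. The function and variable names are illustrative and do not come from the study.

```python
import math

Z_95 = 1.96  # two-sided 95% normal quantile (Wald interval is an assumption)

def odds_ratio_ci(beta, se, z=Z_95):
    """Return (OR, lower, upper) for a logistic regression coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def se_from_ci(lower, upper, z=Z_95):
    """Approximate the coefficient's standard error from a reported OR interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

# Reported in the abstract: female sex, OR = 2.65 (2.41-2.93)
beta_female = math.log(2.65)
se_female = se_from_ci(2.41, 2.93)
or_female, lo_female, hi_female = odds_ratio_ci(beta_female, se_female)

# Reported in the abstract: 1899 TMD cases among 132,529 subjects
prevalence_pct = 100 * 1899 / 132529
```

The rebuilt interval (roughly 2.40 to 2.92) agrees with the reported one to within rounding, which is what a symmetric interval on the log-odds scale would produce.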
Affiliation(s)
- Harry Chweidan
  - Department of Prosthodontics, Oral and Maxillofacial Center, Israel Defense Forces, Medical Corps, Tel-Hashomer, Ramat Gan 02149, Israel
- Nikolay Rudyuk
  - Department of Prosthodontics, Oral and Maxillofacial Center, Israel Defense Forces, Medical Corps, Tel-Hashomer, Ramat Gan 02149, Israel
- Dorit Tzur
  - Medical Information Department, General Surgeon Headquarters, Israel Defense Forces, Medical Corps, Tel-Hashomer, Ramat Gan 02149, Israel
- Chen Goldstein
  - Big Biomedical Data Research Laboratory, Dean's Office, Hadassah Medical Center, Faculty of Dental Medicine, Hebrew University of Jerusalem, Jerusalem 91120, Israel
- Galit Almoznino
  - Big Biomedical Data Research Laboratory, Dean's Office, Hadassah Medical Center, Faculty of Dental Medicine, Hebrew University of Jerusalem, Jerusalem 91120, Israel
  - Department of Oral Medicine, Sedation & Maxillofacial Imaging, Hadassah Medical Center, Faculty of Dental Medicine, Hebrew University of Jerusalem, Jerusalem 91120, Israel
2. Goldstein A, Shahar Y, Weisman Raymond M, Peleg H, Ben-Chetrit E, Ben-Yehuda A, Shalom E, Goldstein C, Shiloh SS, Almoznino G. Multi-Dimensional Validation of the Integration of Syntactic and Semantic Distance Measures for Clustering Fibromyalgia Patients in the Rheumatic Monitor Big Data Study. Bioengineering (Basel) 2024; 11:97. [PMID: 38275577 PMCID: PMC10813477 DOI: 10.3390/bioengineering11010097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 12/28/2023] [Accepted: 01/11/2024] [Indexed: 01/27/2024] [Open access]
Abstract
The primary aim of this study was to develop a novel multi-dimensional methodology to discover and validate the optimal number of clusters. The secondary objective was to deploy it for the task of clustering fibromyalgia patients. We present a comprehensive methodology that includes the use of several different clustering algorithms, quality assessment using several syntactic distance measures (the Silhouette Index (SI), Calinski-Harabasz index (CHI), and Davies-Bouldin index (DBI)), stability assessment using the adjusted Rand index (ARI), and the validation of the internal semantic consistency of each clustering option via the performance of multiple clustering iterations after the repeated bagging of the data to select multiple partial data sets. Then, we perform a statistical analysis of the (clinical) semantics of the most stable clustering options using the full data set. Finally, the results are validated through a supervised machine learning (ML) model that classifies the patients back into the discovered clusters and is interpreted by calculating the Shapley additive explanations (SHAP) values of the model. Thus, we refer to our methodology as the clustering, distance measures and iterative statistical and semantic validation (CDI-SSV) methodology. We applied our method to the analysis of a comprehensive data set acquired from 1370 fibromyalgia patients. The results demonstrate that K-means was highly robust in the syntactic and internal semantic consistency analysis phases and was therefore followed by a semantic assessment to determine the optimal number of clusters (k), which suggested k = 3 as a more clinically meaningful solution, representing three distinct severity levels. The random forest model validated the results by classification into the discovered clusters with high accuracy (AUC: 0.994; accuracy: 0.946). SHAP analysis emphasized the clinical relevance of "functional problems" in distinguishing the most severe condition.
In conclusion, the CDI-SSV methodology offers significant potential for improving the classification of complex patients. Our findings suggest a classification system for different profiles of fibromyalgia patients, which has the potential to improve clinical care, by providing clinical markers for the evidence-based personalized diagnosis, management, and prognosis of fibromyalgia patients.
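The stability step described above compares clusterings with the adjusted Rand index (ARI). The following is a minimal pure-Python sketch of the standard ARI formula, assuming both labelings cover the same items in the same order (with bagged subsets, the comparison would first need to be restricted to items present in both runs). It is an illustration of the metric only, not the authors' pipeline.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0
    # Pairs of items that share a cluster in both labelings (contingency cells)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs that share a cluster within each labeling separately
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / total_pairs   # agreement expected by chance
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:                # degenerate: trivially identical partitions
        return 1.0
    return (sum_cells - expected) / (max_index - expected)

# Two hypothetical runs that find the same partition under different label names
run_1 = [0, 0, 1, 1, 2, 2]
run_2 = [2, 2, 0, 0, 1, 1]
stability = adjusted_rand_index(run_1, run_2)
```

Because the ARI is invariant to label permutation, two runs that recover the same grouping score 1 even when the cluster numbers differ, which is why it suits repeated-run stability checks.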
Affiliation(s)
- Ayelet Goldstein
  - Computer Science Department, Hadassah Academic College, Jerusalem 9101001, Israel
- Yuval Shahar
  - Medical Informatics Research Center, Department of Software and Information Systems Engineering, Ben Gurion University of the Negev, Beer Sheva 8410501, Israel
- Michal Weisman Raymond
  - Medical Informatics Research Center, Department of Software and Information Systems Engineering, Ben Gurion University of the Negev, Beer Sheva 8410501, Israel
- Hagit Peleg
  - Rheumatology Unit, Hadassah Medical Center, Jerusalem 9112102, Israel
- Eldad Ben-Chetrit
  - Rheumatology Unit, Hadassah Medical Center, Jerusalem 9112102, Israel
- Arie Ben-Yehuda
  - Division of Internal Medicine, Hadassah Medical Center, Jerusalem 9112102, Israel
- Erez Shalom
  - Medical Informatics Research Center, Department of Software and Information Systems Engineering, Ben Gurion University of the Negev, Beer Sheva 8410501, Israel
- Chen Goldstein
  - Faculty of Dental Medicine, Hebrew University of Jerusalem, Israel; Big Biomedical Data Research Laboratory, Dean’s Office, Hadassah Medical Center, Jerusalem 91120, Israel
- Shmuel Shay Shiloh
  - Faculty of Dental Medicine, Hebrew University of Jerusalem, Israel; Big Biomedical Data Research Laboratory, Dean’s Office, Hadassah Medical Center, Jerusalem 91120, Israel
- Galit Almoznino
  - Faculty of Dental Medicine, Hebrew University of Jerusalem, Israel; Big Biomedical Data Research Laboratory, Dean’s Office, Hadassah Medical Center, Jerusalem 91120, Israel
  - Department of Oral Medicine, Sedation & Maxillofacial Imaging, Hadassah Medical Center, Faculty of Dental Medicine, Hebrew University of Jerusalem, Jerusalem 91120, Israel
3. Safa M, Pandian A, Gururaj HL, Ravi V, Krichen M. Real time health care big data analytics model for improved QoS in cardiac disease prediction with IoT devices. Health and Technology 2023. [DOI: 10.1007/s12553-023-00747-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/07/2023]
4. DWDP-Stream: A Dynamic Weight and Density Peaks Clustering Algorithm for Data Stream. Int J Comput Int Sys 2022. [DOI: 10.1007/s44196-022-00157-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022] [Open access]
Abstract
Identifying clusters of arbitrary shapes and constantly processing the newly arrived data points are two critical challenges in the study of clustering. This paper proposes a dynamic weight and density peaks clustering algorithm to simultaneously solve these two key issues. An online–offline framework is used, creating and maintaining micro-clusters in the online phase, and treating the micro-clusters as pseudo-points to form the final cluster in the offline phase. In the online phase, when a new data point is merged into the corresponding micro-cluster, a dynamic weight method is proposed to update the weight of the micro-cluster according to the distance between the point and the center of the micro-cluster, so as to more accurately describe the information of the micro-cluster. In the offline phase, the density peak clustering algorithm is improved, natural neighbors are introduced to adaptively obtain the local density of the data point, and the allocation process is improved to reduce the probability of allocation errors. The algorithm is evaluated on different synthetic and real-world datasets using different quality metrics. The experimental results show that the proposed algorithm improves the clustering quality in both static and streaming environments.
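The online-phase idea, where a micro-cluster's weight grows by an amount that depends on how close the new point is to its center, can be sketched as follows. The abstract does not give the paper's update rule, so the exponential increment `exp(-d / radius)`, the `MicroCluster` class, and the `absorb` function are all hypothetical choices made only for illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class MicroCluster:
    center: list   # running weighted mean of absorbed points
    weight: float  # accumulated distance-dependent weight

def absorb(mc, point, radius):
    """Merge `point` into `mc`; closer points contribute more weight.

    The increment exp(-d / radius) is an assumed form, not the published rule.
    """
    d = math.dist(mc.center, point)
    gain = math.exp(-d / radius)   # in (0, 1]; equals 1 when the point sits on the center
    new_weight = mc.weight + gain
    # Move the center toward the point in proportion to the weight it contributes
    mc.center = [(c * mc.weight + p * gain) / new_weight
                 for c, p in zip(mc.center, point)]
    mc.weight = new_weight
    return gain

mc = MicroCluster(center=[0.0, 0.0], weight=1.0)
gain_near = absorb(mc, [0.0, 0.0], radius=1.0)   # point exactly at the center
gain_far = absorb(mc, [3.0, 4.0], radius=1.0)    # point at distance 5
```

Under this assumed rule, a point far from the center barely changes either the weight or the center, so outlying arrivals do not inflate a micro-cluster's importance.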
5. Rani AJM, Pravin A. Optimization Enabled Black Hole Entropic Fuzzy Clustering Approach for Medical Data. The Computer Journal 2022; 65:1795-1811. [DOI: 10.1093/comjnl/bxab021] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/05/2025]
Abstract
Medical data clustering is an important part of medical decision systems as it refines highly sensitive information from huge medical datasets. Medical data clustering includes processes such as determining random clusters, assigning data to specified clusters, and handling data clusters dynamically. Hence, the handling of medical data streams and clustering remains a challenging issue. This paper proposes a technique, namely Rider-based sunflower optimization (RSFO), for medical data clustering. Initially, the significant features are selected using the Tversky index with holoentropy that is established from the input data. The holoentropy is utilized to analyze the relationship between the attributes and features. Here, the clustering is done by a Black Hole Entropic Fuzzy Clustering (BHEFC) algorithm, where the optimal cluster centroids are selected by the proposed RSFO algorithm. The proposed RSFO is designed by incorporating the Rider optimization algorithm (ROA) and sunflower optimization (SFO). The effectiveness of the proposed BHEFC+RSFO algorithm is analyzed on the Dermatology Data Set, where the proposed method achieves a maximal accuracy of 94.480%, a Jaccard coefficient of 94.224%, and a Rand coefficient of 91.307%.
Affiliation(s)
- A Jaya Mabel Rani
  - Research Scholar, Department of Computer Science and Engineering, Sathyabama Institute of Science and Technology, Chennai, India
- A Pravin
  - Associate Professor, Department of Computer Science and Engineering, Sathyabama Institute of Science and Technology, Chennai, India
6. Santhana Marichamy V, Natarajan V. Efficient big data security analysis on HDFS based on combination of clustering and data perturbation algorithm using health care database. Journal of Intelligent & Fuzzy Systems 2022. [DOI: 10.3233/jifs-213024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
This manuscript proposes an efficient big data security analysis on HDFS based on the combination of an Improved Deep K-means Clustering (IDFKM) algorithm and a Modified 3D rotation data perturbation algorithm, using a health care database. To group similar data, the IDFKM algorithm is used to partition the medical data. After clustering, the Modified 3D rotation data perturbation technique is used to satisfy the privacy requirement of the client. It perturbs every sensitive data item in each cluster, and all the key parameter values used for clustering are stored in the database file sector. The proposed approach is implemented as a Java program, and its efficiency is assessed on a health care database. In the evaluation, which studies memory usage, the proposed method attains higher accuracy (34.765%, 23.44%, 52.74%, 18.74%), lower execution time (35.23%, 23.76%, 27.86%, 27.76%), and higher efficiency (26.85%, 38.97%, 28.97%, 35.65%) when compared with the following existing methods, respectively: Security Analysis of SDN Applications for Big Data with spoofing identity, Tampering with data, Repudiation threats, Information disclosure, Denial of service and Elevation of privileges (STRIDE); Big Data Analysis-based Secure Cluster Management using Ant Colony Optimization (ACA) Optimized Control Plane in Software-Defined Networks; System Architecture for Secure Authentication and Data Sharing in Cloud Enabled Big Data Environment using the Lempel-Ziv-Markov Algorithm (LZMA) and Density-based Clustering of Applications with Noise (DBSCAN); and the Big Data Based Security Analytics (BDSA) approach for Protecting Virtualized Infrastructures in Cloud Computing.
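Rotation-based perturbation hides the original attribute values while keeping pairwise Euclidean distances intact, so distance-based clustering still behaves the same on the masked data. The following is a minimal sketch of a plain rotation about the z-axis over three numeric attributes. It is not the paper's "Modified" variant, whose details the abstract does not describe, and the sample records and angle are made-up values.

```python
import math

def rotate_z(record, angle_rad):
    """Rotate a 3-attribute record about the z-axis by angle_rad."""
    x, y, z = record
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (c * x - s * y, s * x + c * y, z)

def perturb(records, angle_rad):
    """Apply the same rotation to every record so relative geometry is preserved."""
    return [rotate_z(r, angle_rad) for r in records]

# Made-up records with three numeric attributes each
original = [(120.0, 80.0, 36.6), (135.0, 90.0, 37.2)]
masked = perturb(original, math.radians(40))

dist_before = math.dist(original[0], original[1])
dist_after = math.dist(masked[0], masked[1])
```

A single-axis rotation leaves the third attribute unchanged, which is one reason practical schemes compose rotations about several axes or keep the angle secret per attribute group.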
Affiliation(s)
- V. Santhana Marichamy
  - Department of Computer Science and Engineering, SRM Valliammai Engineering College, SRM Nagar, Kattankulathur, Tamil Nadu, India
- V. Natarajan
  - Department of Instrumentation Engineering, Madras Institute of Technology, Anna University, Chennai, India
7. ROSE: robust online self-adjusting ensemble for continual learning on imbalanced drifting data streams. Mach Learn 2022. [DOI: 10.1007/s10994-022-06168-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
8. Jia D, Li F, Tu J. A Multi-Swarm ABC Algorithm for Parameters Optimization of SOFM Neural Network in Dynamic Environment. International Journal of Computational Intelligence and Applications 2021. [DOI: 10.1142/s1469026821500140] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
The self-organizing feature map (SOFM) neural network is a competitive unsupervised learning neural network with strong self-organizing and self-learning capabilities. It has been widely used in the fields of data classification and data clustering. A crucial step for the SOFM neural network is to set its weight parameters correctly, because the output accuracy and efficiency of the network depend heavily on these parameters. Most current methods for parameter setting are based on static data. However, in a dynamic environment, the statistical characteristics of the generated data change unpredictably over time. If the SOFM network cannot react to changes in the environment, its performance will degrade. To deal with this problem, a more powerful multi-swarm artificial bee colony algorithm (MABC) is proposed. In the algorithm, the classic ABC algorithm is improved with multi-swarm and exclusive operation strategies to make it suitable for tracking optimal parameter settings of the SOFM network, so that the SOFM network can be applied to a dynamic environment. Two real data streams, which are regarded as coming from dynamic environments, are used to evaluate the effectiveness of the algorithm. Results show that the proposed algorithm is superior to the classic SOFM algorithm in terms of clustering purity and effectiveness. It is a promising method for the classification of data streams from dynamic environments.
Affiliation(s)
- Dongli Jia
  - School of Information and Electronic Engineering, Hebei University of Engineering, Taiji Street 19#, HanDan, HeBei 056001, P. R. China
- Fan Li
  - School of Information and Electronic Engineering, Hebei University of Engineering, Taiji Street 19#, HanDan, HeBei 056001, P. R. China
- Jun Tu
  - School of Information and Electronic Engineering, Hebei University of Engineering, Taiji Street 19#, HanDan, HeBei 056001, P. R. China
9. RSMOTE: improving classification performance over imbalanced medical datasets. Health Inf Sci Syst 2020; 8:22. [PMID: 32549976 DOI: 10.1007/s13755-020-00112-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/27/2019] [Accepted: 05/20/2020] [Indexed: 10/24/2022] [Open access]
Abstract
Introduction: Medical diagnosis is a crucial step for patient treatment. However, diagnosis is prone to bias due to imbalanced datasets. To overcome the imbalanced dataset problem, the synthetic minority oversampling technique (SMOTE) was proposed, which generates new synthetic samples at the data level to balance the minority and majority classes. However, the synthetic samples are generated on a random basis, which causes a class mixture problem and thus degrades classification performance and biases diagnosis.

Purpose: In order to overcome the shortcomings of SMOTE, some modified methods were proposed that try to generate synthetic samples along the line segment between selected minority samples. Most of these methods adopt one of two policies for selecting the minority samples used to generate synthetic samples: borderline region sampling or safe region sampling. However, both suffer from the over-generalisation problem. We propose a modified SMOTE-based resampling method called RSMOTE to alleviate the medical imbalanced dataset problem. We provide an in-depth analysis and verify the performance of RSMOTE over imbalanced medical datasets.

Methods: In this paper, the proposed RSMOTE divides the minority sample domain into four regions (normal, semi-normal, semi-critical, and critical) based on the minority sample density analysis. RSMOTE discovers the minority sample region globally and applies the resampling near a specific group of samples.

Results: Our analysis and experiments verify that if synthetic samples are generated in the regions with high minority sample density, classification performance improves due to the low risk of class mixture. Unlike some safe region methods, RSMOTE decides the region of minority samples on a global basis, thus removing the over-generalisation problem. Classic and additional evaluation metrics are considered to measure the effectiveness of the modified method: Recall, FP Rate, Precision, F-Measure, ROC area, and Average Aggregated Metric. We carried out experiments over various imbalanced medical datasets.

Conclusion: Based on the minority sample density analysis, we propose the RSMOTE method, which divides the minority sample domain into four regions. The proposed RSMOTE includes four resampling methods, each of which carries out resampling on a specific region. According to the experimental results, resampling on the regions with high minority sample density obtained better results, while resampling on regions with lower minority sample density gave inferior results. Thus, we conclude that RSMOTE is a more flexible resampling method for imbalanced medical datasets that is capable of generating samples with various minority sample densities.
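The step that RSMOTE builds on is the basic SMOTE interpolation: a synthetic sample is placed at a random position on the line segment between a minority sample and one of its minority-class neighbors. The following is a minimal sketch of that base step only. RSMOTE's density-based region selection is not shown, and the sample values and the assumption that `minority_b` is a nearest neighbor of `minority_a` are illustrative.

```python
import random

def smote_sample(x, neighbor, rng=random):
    """Create one synthetic point on the line segment from x to neighbor."""
    u = rng.random()  # interpolation factor in [0, 1)
    return [xi + u * (ni - xi) for xi, ni in zip(x, neighbor)]

rng = random.Random(0)        # seeded so repeated runs give the same point
minority_a = [1.0, 2.0]
minority_b = [3.0, 6.0]       # assumed to be a nearest minority neighbor of minority_a
synthetic = smote_sample(minority_a, minority_b, rng)
```

Because the interpolation factor is drawn blindly, a synthetic point can land in a region dominated by the majority class when the two minority samples straddle it, which is the class mixture risk the abstract describes.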
10. Medical data visual synchronization and information interaction using Internet-based graphics rendering and message-oriented streaming. Informatics in Medicine Unlocked 2019. [DOI: 10.1016/j.imu.2019.100253] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 02/06/2023] [Open access]