1
Wu L, Yuan L, Zhao G, Lin H, Li SZ. Deep Clustering and Visualization for End-to-End High-Dimensional Data Analysis. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:8543-8554. [PMID: 35263258 DOI: 10.1109/tnnls.2022.3151498] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/14/2023]
Abstract
High-dimensional data analysis for exploration and discovery involves two fundamental tasks: deep clustering and data visualization. When these two related tasks are performed separately, as has usually been the case so far, they can disagree in terms of geometry preservation. Specifically, clustering often corrupts the geometric structure of the data, whereas visualization aims to preserve that geometry for better interpretation. Therefore, how to achieve deep clustering and data visualization in a unified end-to-end framework is an important but challenging problem. In this article, we propose a novel neural network-based method, called deep clustering and visualization (DCV), that accomplishes the two tasks end-to-end and resolves their disagreement. The DCV framework consists of two nonlinear dimensionality reduction (NLDR) transformations: 1) one from the input data space to a latent feature space for clustering and 2) the other from the latent feature space to the final 2-D space for visualization. Importantly, the first NLDR transformation is mainly optimized by a Clustering Loss, which permits arbitrary corruption of the geometric structure for better clustering, while the second NLDR transformation is optimized by a Geometry-Preserving Loss that recovers the corrupted geometry for better visualization. Extensive comparative results show that DCV outperforms other leading clustering-visualization algorithms in terms of both quantitative evaluation metrics and qualitative visualization.
2
Predictive Model for Diagnosis of Gestational Diabetes in the Kurdistan Region by a Combination of Clustering and Classification Algorithms: An Ensemble Approach. APPLIED COMPUTATIONAL INTELLIGENCE AND SOFT COMPUTING 2022. [DOI: 10.1155/2022/9749579] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Gestational diabetes is a type of high blood sugar that develops during pregnancy. It can occur at any stage of pregnancy and cause problems for both the mother and the baby, during and after birth. The risks can be reduced if the condition is detected early and managed, especially in areas where pregnant women have access only to periodic tests. Intelligent systems built with machine learning algorithms are reshaping many fields, including healthcare. This study proposes a combined prediction model to diagnose gestational diabetes. The dataset was obtained from laboratories in the Kurdistan region and contains information on pregnant women with and without diabetes. The proposed model uses KMeans clustering for data reduction, the elbow method to find the optimal value of k, and the Mahalanobis distance to find the cluster most closely related to each new sample; classification methods such as decision tree, random forest, SVM, KNN, logistic regression, and Naïve Bayes are used for prediction. The results showed that combining KMeans clustering, the elbow method, the Mahalanobis distance, and an ensemble technique significantly improves prediction accuracy.
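The abstract names a Mahalanobis-distance step for matching a new sample to its most related cluster. As an illustrative sketch only (not the authors' code), assuming NumPy and assuming each cluster is supplied as an array of its training points, the assignment could look like this. The function names are hypothetical, and the KMeans, elbow-method, and classifier stages are not shown.

```python
import numpy as np


def mahalanobis(x, mean, cov_inv):
    """Mahalanobis distance sqrt((x - mu)^T S^-1 (x - mu))."""
    d = np.asarray(x, dtype=float) - mean
    return float(np.sqrt(d @ cov_inv @ d))


def nearest_cluster(x, clusters):
    """Return the index of the cluster closest to x in Mahalanobis distance.

    clusters: list of (n_i, p) arrays holding the training points of each
    cluster (a hypothetical input format chosen for this sketch).
    """
    best_idx, best_dist = -1, np.inf
    for i, pts in enumerate(clusters):
        pts = np.asarray(pts, dtype=float)
        mean = pts.mean(axis=0)
        # atleast_2d handles the single-feature case, where np.cov returns a scalar;
        # pinv guards against singular covariance matrices in small clusters.
        cov = np.atleast_2d(np.cov(pts, rowvar=False))
        cov_inv = np.linalg.pinv(cov)
        dist = mahalanobis(x, mean, cov_inv)
        if dist < best_dist:
            best_idx, best_dist = i, dist
    return best_idx


if __name__ == "__main__":
    a = [[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.2]]
    b = [[10, 10], [11, 10], [10, 11], [11, 11], [10.4, 10.7]]
    print(nearest_cluster([0.4, 0.6], [a, b]))
```

Unlike Euclidean distance, this accounts for each cluster's spread and feature correlations, which is presumably why it was preferred for matching new samples.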
3
Broomandi P, Crape B, Jahanbakhshi A, Janatian N, Nikfal A, Tamjidi M, Kim JR, Middleton N, Karaca F. Assessment of the association between dust storms and COVID-19 infection rate in southwest Iran. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2022; 29:36392-36411. [PMID: 35060047 PMCID: PMC8776378 DOI: 10.1007/s11356-021-18195-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 09/21/2021] [Accepted: 12/14/2021] [Indexed: 05/21/2023]
Abstract
This study assesses a plausible correlation between a dust intrusion episode and the daily increase in COVID-19 cases. A surge in COVID-19 cases was observed a few days after a Middle East Dust (MED) event that peaked on 25 April 2020 in southwest Iran. To investigate potential causal factors for the spike in the number of cases, cross-correlations between daily combined aerosol optical depths (AODs) and confirmed cases were computed for Khuzestan, Iran. Additionally, atmospheric stability time series covering the periods before, during, and after the dust intrusion were assessed, and the cities were statistically clustered into four distinct groups. Groups 1 and 2 had peak lag times of 10 and 4-5 days, respectively. Since there were statistically significant associations between AOD levels and confirmed cases in both groups, dust incursion may have increased population susceptibility to COVID-19. Group 3 served as a control group, with neither a significant level of dust incursion during the episode nor any significant associations. Group 4 cities, which experienced high levels of dust incursion, showed no significant correlation with increases in confirmed case counts. A Random Forest analysis assessed the influence of wind speed and AOD, giving relative importances of 0.31 and 0.23, respectively, on the daily percentage increase in confirmed cases. This study may serve as a reference for better understanding and predicting factors that affect COVID-19 transmission and diffusion routes, with a focus on the role of MED intrusions.
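The peak lag times reported for Groups 1 and 2 come from cross-correlating AOD with confirmed cases. A minimal sketch of a lagged Pearson cross-correlation is given below, assuming two equal-length daily series and a hypothetical maximum lag. It is not the study's exact procedure, which may also involve detrending or significance testing, and the function names are illustrative.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (NaN if a series is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / math.sqrt(va * vb)


def lagged_correlations(driver, response, max_lag):
    """Correlate driver[t] with response[t + lag] for lag = 0..max_lag.

    A positive lag means the response (e.g. daily cases) follows the driver (e.g. AOD).
    """
    if len(driver) != len(response):
        raise ValueError("series must have equal length")
    if max_lag > len(driver) - 2:
        raise ValueError("max_lag leaves fewer than two overlapping points")
    return {
        lag: pearson(driver[: len(driver) - lag], response[lag:])
        for lag in range(max_lag + 1)
    }


def peak_lag(driver, response, max_lag):
    """Lag with the highest defined correlation (raises ValueError if none is defined)."""
    corrs = lagged_correlations(driver, response, max_lag)
    valid = {lag: r for lag, r in corrs.items() if not math.isnan(r)}
    return max(valid, key=valid.get)


if __name__ == "__main__":
    aod = [0, 0, 5, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    cases = [0, 0, 0, 0, 0, 0, 5, 1, 0, 0, 0, 0]  # same pulse, four days later
    print(peak_lag(aod, cases, 6))
```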
Affiliation(s)
- Parya Broomandi
- Department of Civil and Environmental Engineering, Nazarbayev University, Nur-Sultan, Kazakhstan, 010000
- Department of Chemical Engineering, Masjed-Soleiman Branch, Islamic Azad University, Masjed-Soleiman, Iran
- Byron Crape
- Department of Medicine, School of Medicine, Nazarbayev University, Nur-Sultan, Kazakhstan, 010000
- Ali Jahanbakhshi
- Environmental Centre, Lancaster University, Lancaster, LA1 4YQ, UK
- Nasime Janatian
- Chair of Hydrobiology and Fishery, Institute of Agricultural and Environmental Sciences, Estonian University of Life Sciences, Tartu, Estonia
- Department of Evolutionary Biology, Ecology and Environmental Sciences, University of Barcelona, Barcelona, Spain
- Mahsa Tamjidi
- Faculty of Natural Resources and Environment, Islamic Azad University, Science and Research Branch of Tehran, Tehran, Iran
- Jong R Kim
- Department of Civil and Environmental Engineering, Nazarbayev University, Nur-Sultan, Kazakhstan, 010000
- Nick Middleton
- St Anne's College, University of Oxford, Oxford, OX2 6HS, UK
- Ferhat Karaca
- Department of Civil and Environmental Engineering, Nazarbayev University, Nur-Sultan, Kazakhstan, 010000
- The Environment and Resource Efficiency Cluster (EREC), Nazarbayev University, Nur-Sultan, Kazakhstan, 010000
4
Application of online multitask learning based on least squares support vector regression in the financial market. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.108754] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/22/2022]
5
Malaysia PM10 Air Quality Time Series Clustering Based on Dynamic Time Warping. ATMOSPHERE 2022. [DOI: 10.3390/atmos13040503] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/16/2022]
Abstract
Air quality monitoring is important for environmental and pollution management. In this study, PM10 time series from air quality monitoring stations in Malaysia were clustered according to the similarity of their time series patterns. The resulting clusters were analyzed to gain meaningful information about air quality patterns in Malaysia and to characterize each cluster. The study used PM10 time series data from 5 July 2017 to 31 January 2019, obtained from the Malaysian Department of Environment, with Dynamic Time Warping as the dissimilarity measure. k-Means, Partitioning Around Medoids, agglomerative hierarchical clustering, and Fuzzy k-Means were the clustering algorithms used. The results show that the categories and activities of the monitoring station locations do not directly influence the pattern of PM10 values; instead, the clusters formed are mainly influenced by the region and geographical area of the locations.
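For reference, a textbook dynamic-time-warping distance is sketched below, together with a pairwise dissimilarity matrix of the kind that Partitioning Around Medoids or agglomerative clustering can consume. The absolute-difference cost, the absence of a warping window, and the function names are assumptions for illustration, not details taken from the study.

```python
def dtw_distance(a, b):
    """Classic O(n*m) dynamic time warping with |a_i - b_j| as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: insertion, deletion, or match
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def dtw_matrix(series):
    """Symmetric pairwise DTW dissimilarity matrix for a list of time series."""
    k = len(series)
    M = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            d = dtw_distance(series[i], series[j])
            M[i][j] = M[j][i] = d
    return M


if __name__ == "__main__":
    # Series of different lengths can be compared; [1, 2, 3] warps onto [1, 2, 2, 3].
    print(dtw_matrix([[1, 2, 3], [1, 2, 2, 3], [0, 0]]))
```

A Sakoe-Chiba band is a common refinement that restricts how far the alignment may drift from the diagonal, which also reduces the quadratic cost on long series.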