Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Total Articles

164
(from Reference Citation Analysis)

Article PDFs (49)

Cited by ≥ 1 (85)

Searched Name

K-means

Results Analysis

Number	Citation Analysis
26	Hao X, Han L, Zheng D, Jin X, Li C, Huang L, Huang Z. Assessing resource allocation based on workload: a data envelopment analysis study on clinical departments in a class A tertiary public hospital in China. BMC Health Serv Res 2023;23:808. [PMID: 37507799 PMCID: PMC10375627 DOI: 10.1186/s12913-023-09803-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2023] [Accepted: 07/10/2023] [Indexed: 07/30/2023]
Abstract: OBJECTIVE: Today, the development mode of public hospitals in China is turning from expansion to efficiency, and the management mode is turning from extensive to refined. This study aims to evaluate the efficiency of clinical departments in a Chinese class A tertiary public hospital (Hospital M) and to analyze the allocation of hospital resources among these departments, providing a reference for hospital management. METHODS: The hospitalization data of inpatients from 32 clinical departments of Hospital M in 2021 are extracted from the hospital information system (HIS), and a dataset containing 38,147 inpatients is obtained using stratified sampling. Considering the non-homogeneity of clinical departments, the 38,147 patients are clustered using the K-means algorithm based on workload-related data labels, including inpatient days, intensive care workload index, nursing workload index, and operation workload index, so that the medical resource consumption of inpatients from non-homogeneous clinical departments can be transformed into the homogeneous workload of medical staff. Taking the numbers of doctors, nurses, and beds as input indicators, and the numbers of inpatients assigned to certain clusters as output indicators, an input-oriented BCC model, named the workload-based DEA model, is built. Meanwhile, a control DEA model with the number of inpatients and medical revenue as output indicators is built, and the outputs of the two models are compared and analyzed. RESULTS: Clustering the 38,147 patients into 3 categories gives better interpretability. 14 departments reach DEA efficiency in the workload-based DEA model, 10 reach DEA efficiency in the control DEA model, and 8 reach DEA efficiency in both models. The workload-based DEA model gives a relatively rational judgment on the increase of income brought by scale expansion, and evaluates some special departments, such as the Critical Care Medicine Dept., Geriatrics Dept. and Rehabilitation Medicine Dept., more properly, which better adapts to the functional orientation of public hospitals in China. CONCLUSION: The design proposed in this study, which evaluates the efficiency of non-homogeneous clinical departments with workload as the output, is feasible and provides a new idea for quantifying professional medical human resources. It is of practical significance for public hospitals to optimize the layout of resources, to provide real-time guidance on manpower grouping strategies, and to estimate the expected output reasonably.
Key Words: Data envelopment analysis; Efficiency; K-means; Public hospitals; Resource allocation
Grants: 2021-SKJJ-C-040, the Military Program of National Social Science Foundation of China
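The clustering step described in entry 26 can be illustrated with a minimal K-means (Lloyd's algorithm) sketch. This is not the authors' code: the two features (inpatient days and a nursing workload index), the sample values, and the farthest-point initialisation are all assumptions made for illustration, and the abstract does not say how features were scaled or how centroids were initialised.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_farthest(points, k):
    """Deterministic seeding (an assumption, not from the paper):
    start at the first point, then repeatedly add the point that is
    farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = init_farthest(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable, so converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical patients: (inpatient days, nursing workload index).
points = [
    (2, 1.0), (3, 1.2), (2, 0.8),      # short stays, light workload
    (10, 3.0), (11, 3.5), (9, 2.8),    # medium stays
    (25, 9.0), (27, 8.5), (26, 9.5),   # long stays, heavy workload
]
labels, centroids = kmeans(points, k=3)
```

In a DEA setting like the one the abstract describes, the per-department counts of patients in each cluster would then serve as output indicators. On real data the features would normally be standardised first, since unscaled inpatient days would otherwise dominate the distance.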
27	Song S, Ren X, He J, Gao M, Wang J, Wang B. An Optimal Hierarchical Approach for Oral Cancer Diagnosis Using Rough Set Theory and an Amended Version of the Competitive Search Algorithm. Diagnostics (Basel) 2023;13:2454. [PMID: 37510198 PMCID: PMC10377835 DOI: 10.3390/diagnostics13142454] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Revised: 07/06/2023] [Accepted: 07/19/2023] [Indexed: 07/30/2023]
Abstract: Oral cancer is the uncontrolled growth of cells that causes destruction and damage to nearby tissues. This occurs when a sore or lump grows in the mouth that does not disappear. Cancers of the cheeks, lips, floor of the mouth, tongue, sinuses, hard and soft palate, and lungs (throat) are types of this cancer that will be deadly if not detected and cured in the beginning stages. The present study proposes a new pipeline procedure for providing an efficient diagnosis system for oral cancer images. In this procedure, after preprocessing and segmenting the area of interest of the input images, the useful characteristics are extracted. Then, a number of useful features are selected, and the others are removed to reduce the method's complexity. Finally, the selected features are passed to a support vector machine (SVM) to classify the images. The feature selection and classification steps are optimized by an amended version of the competitive search optimizer. The technique is finally implemented on the Oral Cancer (Lips and Tongue) images (OCI) dataset, and its performance is confirmed by comparison with several recent techniques, namely weight balancing, a support vector machine, a gray-level co-occurrence matrix (GLCM), the deep method, transfer learning, mobile microscopy, and quadratic discriminant analysis. The simulation results were validated by four indicators and indicated the suggested method's efficiency relative to the others in diagnosing oral cancer cases.
Key Words: K-means; amended competitive search algorithm; diagnosis; oral cancer; rough set theory; support vector machine
28	Tian J, Zeng Y, Ji L, Zhu H, Guo Z. Control Method of Cold and Hot Shock Test of Sensors in Medium. Sensors (Basel, Switzerland) 2023;23:6536. [PMID: 37514830 PMCID: PMC10385061 DOI: 10.3390/s23146536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Revised: 07/11/2023] [Accepted: 07/14/2023] [Indexed: 07/30/2023]
Abstract: In order to meet the latest industry requirements for sensor quality testing, the sample sensor needs to be placed in the medium for the cold and hot shock test. However, the existing environmental test chamber cannot effectively control the temperature of the sample in the medium. This paper designs a control method based on the support vector machine (SVM) classification algorithm and K-means clustering combined with neural network correction. When testing sensors in a medium, the clustering SVM classification algorithm is used to distribute the control voltage corresponding to temperature conditions. At the same time, the neural network is used to constantly correct the temperature to reduce overshoot during the temperature-holding phase. Eventually, overheating or overcooling of the basket space indirectly controls the rapid rise or decrease in the temperature of the sensor in the medium. The test results show that this method can effectively control the temperature of the sensor in the medium to reach the target temperature within 15 min and stabilize when the target temperature is between 145 °C and -40 °C. The steady-state error is less than 0.31 °C in the high-temperature area and less than 0.39 °C in the low-temperature area, which resolves the dilemma of the current cold and hot shock test.
Key Words: K-means; cold and hot shock test; environmental test chamber; neural network; sensor test; support vector machine
29	García-Marín NM, Marrero GA, Guerra-Neira A, Rivera-Deán A. Profiles of travelers to intermediate-high health risk areas following the reopening of borders in the COVID-19 crisis: A clustering approach. Travel Med Infect Dis 2023;54:102607. [PMID: 37353065 PMCID: PMC10284617 DOI: 10.1016/j.tmaid.2023.102607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Revised: 05/15/2023] [Accepted: 06/13/2023] [Indexed: 06/25/2023]
Abstract: BACKGROUND: The reactivation of international travel in 2021 has created a new scenario in which the profile of the traveler to medium-high health risk areas may well have changed. However, few studies have analyzed this new profile since the reopening of borders in that year. METHODS: We designed an ad hoc questionnaire that was administered face-to-face by our medical team during appointments with 330 travelers in the second half of 2021. Information was collected on the following topics: sociodemographic and socioeconomic status; type of travel and previous travel experience; health status and risk perception (of COVID-19 and tropical infectious diseases). Using all features simultaneously, an unsupervised machine learning approach (k-means) was implemented to characterize groups of travelers. Pairwise chi-squared tests were performed to identify key features that showed statistically significant differences between clusters. RESULTS: The travelers were clustered into seven groups. We associated the clusters with different intensities of perceived risk of acquiring COVID-19 and tropical infectious diseases on the trip. The perceived risk of both diseases was low in the group "middle or lower middle class young inexperienced male tourist" but high in the group "middle or lower middle-class young with children inexperienced business traveler". CONCLUSIONS: Broadening our knowledge of the profiles of travelers to intermediate-high health risk areas would help to tailor the health advice provided by practitioners to their characteristics and type of travel. In a changing health context, the k-means approach is a flexible statistical method that calculates travelers' profiles and can be easily adapted to process new information.
Key Words: COVID-19; Infectious diseases; K-means; Risk perception; Traveler profiles
MESH Headings: Child; Humans; Male; COVID-19/epidemiology; Communicable Diseases; Travel; Surveys and Questionnaires; Cluster Analysis
30	Chen Y, Sun Y, Bie C, Wang X, He X, Song X. Hierarchical K-means clustering method for accelerated Lorentzian estimation (KALE) in chemical exchange saturation transfer-magnetic resonance imaging quantification. Quant Imaging Med Surg 2023;13:4350-4364. [PMID: 37456289 PMCID: PMC10347364 DOI: 10.21037/qims-22-1379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Accepted: 04/14/2023] [Indexed: 07/18/2023]
Abstract: Background: Quantification of in vivo chemical exchange saturation transfer (CEST) magnetic resonance signals is challenging due to contamination from coexisting effects, including the direct water effect and asymmetric magnetization transfer. Fitting-based analysis allows the calculation of multiple types of signals from the line shape of Z-spectra. However, the conventional voxelwise method has several drawbacks, including its long computation time and its susceptibility to image noise and Z-spectra oscillations, and it is difficult to determine the initial fitting parameters. Methods: Herein, we propose a K-means clustering method for accelerated Lorentzian estimation (KALE) in CEST quantification. Briefly, voxels in CEST images are clustered into K groups according to their Z-spectra characteristics. A 'groupwise' fitting process is then performed with preset initial values, yielding a set of fitted spectra and fitted parameters for each group. With the updated initial values, each group is further clustered into subgroups, and groupwise fitting is performed again. This hierarchical K-means clustering and parameter updating process continues until the pixel number or intensity error meets the termination criteria. The groupwise KALE results (termed group-K) can then be used as the initial values for a further voxelwise fitting (termed voxel-K) to improve the quantification images. Results: Incorporated with Lorentzian difference (LD) quantification, KALE was first optimized and evaluated on 5 healthy human brain datasets at 3 Tesla. Compared with traditional voxel-by-voxel LD quantification, the computation times of group-K and voxel-K were significantly reduced by ~85% and ~70%, respectively (P<0.001). Furthermore, the group-K images exhibited better denoising performance than traditional LD and voxel-K. KALE was further validated on six ischemic rat brains acquired at 7 Tesla, with both LD_group-K and LD_voxel-K displaying almost identical contrast maps with traditional voxelwise maps. When incorporated with the five-pool Lorentzian fitting (LF), KALE exhibited an improved contrast-to-noise ratio (CNR) for amplitude maps of each pool [P=0.003, 0.015, 0.047, and 0.047 for amide, nuclear Overhauser effect (NOE), magnetization transfer (MT) and guanidine amine, respectively] and improved fitting goodness (P=0.033). Conclusions: KALE quantification provides comparable or even superior contrast maps to traditional voxelwise fitting, with significantly reduced computation time. The 'smart' and hierarchical voxel-clustering and parameter updating process of KALE may facilitate more preclinical and clinical CEST applications.
Key Words: Chemical exchange saturation transfer (CEST); K-means; Lorentzian fitting (LF); amide; nuclear Overhauser effects (NOEs)
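One way to picture the recursive "cluster, then subdivide" loop in entry 30 is a tree of two-way K-means splits that stops when a group is small or tight enough. The sketch below covers only that control flow and is not the KALE implementation: the Lorentzian fitting and parameter updating between levels are omitted, the K groups per level are simplified to 2, and `min_size`, `tol`, and the toy vectors are hypothetical stand-ins for the pixel-number and intensity-error criteria.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(vectors):
    """Component-wise mean of a non-empty list of tuples."""
    return tuple(sum(col) / len(vectors) for col in zip(*vectors))


def two_means(vectors, max_iter=50):
    """Lloyd's algorithm with k = 2, seeded with the first vector and
    the vector farthest from it (a deterministic choice for the demo)."""
    c = [vectors[0], max(vectors, key=lambda v: sq_dist(v, vectors[0]))]
    labels = None
    for _ in range(max_iter):
        new = [0 if sq_dist(v, c[0]) <= sq_dist(v, c[1]) else 1 for v in vectors]
        if new == labels:
            break
        labels = new
        for j in (0, 1):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                c[j] = mean(members)
    return labels


def hierarchical_split(vectors, indices=None, min_size=2, tol=0.5):
    """Return the leaf groups (lists of indices) of a recursive 2-means tree.

    A group becomes a leaf when it has at most `min_size` members or when
    every member lies within squared distance `tol` of the group mean.
    """
    if indices is None:
        indices = list(range(len(vectors)))
    group = [vectors[i] for i in indices]
    centre = mean(group)
    error = max(sq_dist(v, centre) for v in group)
    if len(indices) <= min_size or error <= tol:
        return [indices]
    labels = two_means(group)
    left = [i for i, lab in zip(indices, labels) if lab == 0]
    right = [i for i, lab in zip(indices, labels) if lab == 1]
    if not left or not right:  # degenerate split: stop rather than recurse
        return [indices]
    return (hierarchical_split(vectors, left, min_size, tol)
            + hierarchical_split(vectors, right, min_size, tol))


# Toy stand-ins for per-voxel spectra (two samples each).
spectra = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
           (5.0, 5.0), (5.1, 5.0),
           (9.0, 0.0), (9.1, 0.1)]
groups = hierarchical_split(spectra)
```

In a groupwise scheme of the kind the abstract describes, each leaf group would be fitted once, with the parent group's fitted parameters as initial values, instead of fitting every voxel independently.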
31	Alalayah KM, Senan EM, Atlam HF, Ahmed IA, Shatnawi HSA. Effective Early Detection of Epileptic Seizures through EEG Signals Using Classification Algorithms Based on t-Distributed Stochastic Neighbor Embedding and K-Means. Diagnostics (Basel) 2023;13:1957. [PMID: 37296809 DOI: 10.3390/diagnostics13111957] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2023] [Revised: 05/22/2023] [Accepted: 06/02/2023] [Indexed: 06/12/2023]
Abstract: Epilepsy is a neurological disorder in the activity of brain cells that leads to seizures. An electroencephalogram (EEG) can detect seizures as it contains physiological information of the neural activity of the brain. However, visual examination of EEG by experts is time consuming, and their diagnoses may even contradict each other. Thus, an automated computer-aided diagnosis for EEG diagnostics is necessary. Therefore, this paper proposes an effective approach for the early detection of epilepsy. The proposed approach involves the extraction of important features and classification. First, signal components are decomposed to extract the features via the discrete wavelet transform (DWT) method. Principal component analysis (PCA) and the t-distributed stochastic neighbor embedding (t-SNE) algorithm were applied to reduce the dimensions and focus on the most important features. Subsequently, K-means clustering + PCA and K-means clustering + t-SNE were used to divide the dataset into subgroups to reduce the dimensions and focus on the most important representative features of epilepsy. The features extracted from these steps were fed to extreme gradient boosting, K-nearest neighbors (K-NN), decision tree (DT), random forest (RF) and multilayer perceptron (MLP) classifiers. The experimental results demonstrated that the proposed approach provides superior results to those of existing studies. During the testing phase, the RF classifier with DWT and PCA achieved an accuracy of 97.96%, precision of 99.1%, recall of 94.41% and F1 score of 97.41%. Moreover, the RF classifier with DWT and t-SNE attained an accuracy of 98.09%, precision of 99.1%, recall of 93.9% and F1 score of 96.21%. In comparison, the MLP classifier with PCA + K-means reached an accuracy of 98.98%, precision of 99.16%, recall of 95.69% and F1 score of 97.4%.
Key Words: DWT; EEG; K-means; PCA; epileptic seizure; machine learning; t-SNE
Grants: This research has been funded by the Deanship of Scientific Research at Najran University, Kingdom of Saudi Arabia, through a grant code (NU/DRP/SERC/12/17).
32	Allmuttar AYO, Alkhafaji SKD. Using data mining techniques deep analysis and theoretical investigation of COVID-19 pandemic. Measurement. Sensors 2023;27:100747. [PMID: 36945699 PMCID: PMC10017173 DOI: 10.1016/j.measen.2023.100747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2022] [Revised: 01/21/2023] [Accepted: 03/09/2023] [Indexed: 03/17/2023]
Abstract: This study uses K-means clustering to analyze coronavirus disease (COVID-19). Data mining in medicine has generated novel approaches to examine diseases. Coronavirus is difficult to treat because of its intricate structure, shape, and texture. Owing to improvements in data mining, the K-means approach has been developed for evaluating COVID-19. The outbreak's evolution, including its peak and containment measures, is observed. A basic K-means model is used to simulate the prevalence of coronavirus in Iraq. Pandemic-prevention efforts may slow its spread. If inhibition grows to 50%, Iraq will have 500,000 patients by year's end. If precautions were halved, the number would top 1 million. If all measures are abandoned, the sickness will worsen; in that case, 55% of the population may be affected by the end of the month. This number will drop after September.
Key Words: Analysis; Clustering; Data mining; K-means; Reported cases
33	Niyozov S, Domaneschi M, Casas JR, Delgadillo RM. Temperature Effects Removal from Non-Stationary Bridge-Vehicle Interaction Signals for ML Damage Detection. Sensors (Basel, Switzerland) 2023;23:5187. [PMID: 37299918 DOI: 10.3390/s23115187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 05/10/2023] [Accepted: 05/25/2023] [Indexed: 06/12/2023]
Abstract: Bridges are vital components of transport infrastructures, and therefore, it is of utmost importance that they operate safely and reliably. This paper proposes and tests a methodology for detecting and localizing damage in bridges under both traffic and environmental variability considering non-stationary vehicle-bridge interaction. In detail, the current study presents an approach to temperature removal in the case of forced vibrations in the bridge using principal component analysis, with detection and localization of damage using an unsupervised machine learning algorithm. Due to the difficulty in obtaining real data on undamaged and later damaged bridges that are simultaneously influenced by traffic and temperature changes, the proposed method is validated using a numerical bridge benchmark. The vertical acceleration response is derived from a time-history analysis with a moving load under different ambient temperatures. The results show that machine learning algorithms applied to bridge damage detection appear to be a promising technique to efficiently solve the problem's complexity when both operational and environmental variability are included in the recorded data. However, the example application still shows some limitations, such as the use of a numerical bridge and not a real bridge due to the lack of vibration data under healthy and damaged conditions and with varying temperatures; the simple modeling of the vehicle as a moving load; and the crossing of only one vehicle on the bridge. These will be considered in future studies.
Key Words: K-means; PCA; bridge damage detection; non-stationary; operational-environmental variability
34	Sarris AL, Sidiropoulos E, Paraskevopoulos E, Bamidis P. Towards a Digital Twin in Human Brain: Brain Tumor Detection Using K-Means. Stud Health Technol Inform 2023;302:1052-1056. [PMID: 37203579 DOI: 10.3233/shti230345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/20/2023]
Abstract: Digital Twins come to revolutionize the ongoing procedures of the healthcare industry, with their ability to simulate and predict patients' diagnosis and treatment. In this paper, a K-means based brain tumor detection algorithm and its 3D modelling design, both derived from MRI scans, are presented towards the creation of the digital twin.
Key Words: Brain Tumor; Clustering; Digital Twin; K-means
35	Xu S, Wang X, Zhu R, Wang D. Spatio-temporal effects of regional resilience construction on carbon emissions: Evidence from 30 Chinese provinces. The Science of the Total Environment 2023;887:164109. [PMID: 37182764 DOI: 10.1016/j.scitotenv.2023.164109] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/07/2023] [Revised: 04/25/2023] [Accepted: 05/08/2023] [Indexed: 05/16/2023]
Abstract: In response to the threat of rapidly rising carbon emissions, a variety of measures are being implemented to achieve carbon reduction. Resilience construction offers a fresh approach to improving the regional anti-interference ability to cope with various risks, and it is worth considering its impact on carbon emissions. The objective of this study is to investigate the spatio-temporal impacts of resilience construction (RCI) on carbon intensity (CI) in 30 Chinese provinces from 2010 to 2019. The relation pattern between RCI and CI is thoroughly examined after developing a hybrid model by integrating gray correlation analysis (GRA) and coupled coordination degree (CCD). Using the GTWR model, the coefficients reveal the spatio-temporal pattern of the influence of each variable on CI. Furthermore, this study pioneeringly blends GTWR regression results with the K-Means approach to identify areas with homogeneity and heterogeneity of the pattern. Firstly, the findings indicate that there is a significant link between CI and all dimensions: economic resilience (R_E), social resilience (R_S), and ecological resilience (R_En). The relation between R_En and CI is the greatest, although it has been declining recently while relations of R_S, R_En, and CI have all been steadily rising. Secondly, according to the results of CCD, resilience construction and carbon reduction are progressively reaching orderly development but there are still some provinces at low levels of CCD. Thirdly, the study area is divided into four clusters, and the structure of spatial grouping tends to become stable. Moreover, we analyze each cluster's features and suggest appropriate policy measures. The findings aid in the scientific planning of the direction of resilience construction with the goal of collaborative management of carbon emissions.
Key Words: Carbon emissions; Coupling model; Geographically and temporally weighted regression model (GTWR); K-means; Regional resilience construction
36	Faqih M, Omar MB, Ibrahim R. Prediction of Dry-Low Emission Gas Turbine Operating Range from Emission Concentration Using Semi-Supervised Learning. Sensors (Basel, Switzerland) 2023;23:3863. [PMID: 37112203 PMCID: PMC10145957 DOI: 10.3390/s23083863] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2023] [Revised: 03/27/2023] [Accepted: 04/03/2023] [Indexed: 06/19/2023]
Abstract: Dry-Low Emission (DLE) technology significantly reduces the emissions from the gas turbine process by implementing the principle of lean pre-mixed combustion. The pre-mix ensures low nitrogen oxides (NO_x) and carbon monoxide (CO) production by operating at a particular range using a tight control strategy. However, sudden disturbances and improper load planning may lead to frequent tripping due to frequency deviation and combustion instability. Therefore, this paper proposes a semi-supervised technique to predict the suitable operating range as a tripping prevention strategy and a guide for efficient load planning. The prediction technique is developed by hybridizing the Extreme Gradient Boosting and K-Means algorithms using actual plant data. Based on the results, the proposed model can predict the combustion temperature, nitrogen oxides, and carbon monoxide concentration with R-squared values of 0.9999, 0.9309, and 0.7109, respectively, which outperforms other algorithms such as decision tree, linear regression, support vector machine, and multilayer perceptron. Further, the model can identify DLE gas turbine operation regions and determine the optimum range in which the turbine can safely operate while maintaining lower emission production. The range in which a typical DLE gas turbine can operate safely is found to be 744.68 °C to 829.64 °C. The proposed technique can be used as a preventive maintenance strategy in many applications involving tight operating range control in mitigating tripping issues. Furthermore, the findings significantly contribute to power generation fields for better control strategies to ensure the reliable operation of DLE gas turbines.
Key Words: Dry-Low Emission gas turbine; K-means; emission concentration; extreme gradient boosting; load management
37	Zhang L, Huang D, Chen X, Zhu L, Xie Z, Chen X, Cui G, Zhou Y, Huang G, Shi W. Discrimination between normal and necrotic small intestinal tissue using hyperspectral imaging and unsupervised classification. Journal of Biophotonics 2023:e202300020. [PMID: 36966458 DOI: 10.1002/jbio.202300020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2023] [Revised: 03/07/2023] [Accepted: 03/20/2023] [Indexed: 06/18/2023]
Abstract: Objective and automatic clinical discrimination of normal and necrotic sites of small intestinal tissue remains challenging. In this study, hyperspectral imaging (HSI) and unsupervised classification techniques were used to distinguish normal and necrotic sites of small intestinal tissues. Small intestinal tissue hyperspectral images of eight Japanese large-eared white rabbits were acquired using a visible near-infrared hyperspectral camera, and K-means and density peaks (DP) clustering algorithms were used to differentiate between normal and necrotic tissue. The three cases in this study showed that the average clustering purity of the DP clustering algorithm reached 92.07% when the two band combinations of 500-622 and 700-858 nm were selected. The results of this study suggest that HSI and DP clustering can assist physicians in distinguishing between normal and necrotic sites in the small intestine in vivo.
Key Words: K-means; density peaks; hyperspectral imaging; small intestine tissue; unsupervised classification
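Entry 37 reports an average "clustering purity" but the abstract does not define it. Assuming the standard definition (each cluster is credited with its most common true class, and the credited counts are divided by the total number of samples), the metric can be sketched as follows. The function name and the ten labelled pixels are hypothetical.

```python
from collections import Counter


def clustering_purity(cluster_labels, true_labels):
    """Fraction of samples that belong to the majority class of their cluster."""
    if len(cluster_labels) != len(true_labels):
        raise ValueError("label sequences must have the same length")
    if not cluster_labels:
        raise ValueError("at least one sample is required")
    # Count true classes separately inside each cluster.
    per_cluster = {}
    for cluster, truth in zip(cluster_labels, true_labels):
        per_cluster.setdefault(cluster, Counter())[truth] += 1
    # Credit each cluster with the size of its largest class.
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(true_labels)


# Hypothetical pixels: cluster ids from an unsupervised method,
# and reference tissue labels for the same pixels.
clusters = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
tissue = ["normal", "normal", "normal", "necrotic",
          "necrotic", "necrotic", "necrotic", "necrotic", "normal", "necrotic"]
purity = clustering_purity(clusters, tissue)  # (3 + 5) / 10
```

Purity rises trivially as the number of clusters grows (one cluster per sample scores 1.0), so it is most informative when the cluster count is fixed, as in a two-class normal versus necrotic comparison.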
38	Xi Y, Zhao T, Liu R, Song F, Deng J, Ai N. Assessing Sensory Attributes and Properties of Infant Formula Milk Powder Driving Consumers' Preference. Foods 2023;12:997. [PMID: 36900514 PMCID: PMC10000600 DOI: 10.3390/foods12050997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Revised: 02/18/2023] [Accepted: 02/24/2023] [Indexed: 03/03/2023]
Abstract: Infant formula milk powder (IFMP) is an excellent substitute for breast milk. It is known that the composition of maternal food during pregnancy and lactation and exposure level to food during infancy highly influence taste development in early infancy. However, little is known about the sensory aspects of infant formula. Herein, the sensory characteristics of 14 brands of infant formula segment 1 marketed in China were evaluated, and differences in preferences for IFMPs were determined. Descriptive sensory analysis was performed by well-trained panelists to determine the sensory characteristics of evaluated IFMPs. The brands S1 and S3 had significantly lower astringency and fishy flavor compared to the other brands. Moreover, it was found that S6, S7 and S12 had lower milk flavor scores but higher butter scores. Furthermore, internal preference mapping revealed that the attributes fatty flavor, aftertaste, saltiness, astringency, fishy flavor and sourness negatively contributed to consumer preference in all three clusters. Considering that the majority of consumers prefer milk powders rich in aroma, sweet and steamed flavors, these attributes could be considered for enhancement by the food industry.
Key Words: IFMP; K-means; internal preference mapping; sensory analysis
39	Tong Z, Kong Z, Jia X, Yu J, Sun T, Zhang Y. Spatial Heterogeneity and Regional Clustering of Factors Influencing Chinese Adolescents' Physical Fitness. International Journal of Environmental Research and Public Health 2023;20:3836. [PMID: 36900845 PMCID: PMC10001620 DOI: 10.3390/ijerph20053836] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2023] [Revised: 02/18/2023] [Accepted: 02/20/2023] [Indexed: 06/18/2023]
Abstract: There is often significant spatial heterogeneity in the factors influencing physical fitness in adolescents, yet little attention has been paid to this in established studies. Based on the 2018 Chinese National Student Physical Fitness Standard Test data, this study uses a multi-scale, geographically weighted regression (MGWR) model combined with a K-means clustering algorithm to construct a spatial regression model of the factors influencing adolescent physical fitness, and to investigate the degree of spatial variation in the physical fitness of Chinese adolescents from a socio-ecological perspective of health promotion. The following conclusions were drawn: the performance of the youth physical fitness regression model was significantly improved after taking spatial scale and heterogeneity into account. At the provincial scale, the non-farm output, average altitude, and precipitation of each region were strongly related to youth physical fitness, and each influencing factor generally showed a banded spatial heterogeneity pattern, which can be summarized into four types: N-S, E-W, NE-SW, and SE-NW. From the perspective of youth physical fitness, China can be divided into three regions of influence: the socio-economic-influenced region, mainly including the eastern region and some of the central provinces of China; the natural-environment-influenced region, which mainly includes the northwestern part of China and some provinces in the highland region; and the multi-factor joint-influenced region, which mainly includes the provinces in the central and northeastern regions of China. Finally, this study provides syndemic suggestions for physical fitness and health promotion for youths in each region.
Key Words: K-means; influencing factors; multi-scale geographically weighted regression (MGWR); physical fitness; regional clustering; social–ecological model; spatial heterogeneity
MESH Headings: Humans; Adolescent; Physical Fitness; China; Asian People; Spatial Regression
40	Martins A, Fonseca I, Farinha JT, Reis J, Cardoso AJM. Online Monitoring of Sensor Calibration Status to Support Condition-Based Maintenance. Sensors (Basel, Switzerland) 2023;23:2402. [PMID: 36904607 PMCID: PMC10007291 DOI: 10.3390/s23052402] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2023] [Revised: 02/07/2023] [Accepted: 02/20/2023] [Indexed: 06/18/2023]
Abstract: Condition-Based Maintenance (CBM), based on sensors, can only be reliable if the data used to extract information are also reliable. Industrial metrology plays a major role in ensuring the quality of the data collected by the sensors. To guarantee that the values collected by the sensors are reliable, it is necessary to have metrological traceability made by successive calibrations from higher standards to the sensors used in the factories. To ensure the reliability of the data, a calibration strategy must be put in place. Usually, sensors are only calibrated on a periodic basis; so, they often go for calibration without it being necessary or collect data inaccurately. In addition, the sensors are checked often, increasing the need for manpower, and sensor errors are frequently overlooked when the redundant sensor has a drift in the same direction. It is necessary to adopt a calibration strategy based on the sensor condition. Through online monitoring of sensor calibration status (OLM), it is possible to perform calibrations only when it is really necessary. To reach this end, this paper aims to provide a strategy to classify the health status of the production equipment and of the reading equipment that uses the same dataset. A measurement signal from four sensors was simulated, for which Artificial Intelligence and Machine Learning with unsupervised algorithms were used. This paper demonstrates how, through the same dataset, it is possible to obtain distinct information. Because of this, we have a very important feature creation process, followed by Principal Component Analysis (PCA), K-means clustering, and classification based on Hidden Markov Models (HMM). Through three hidden states of the HMM, which represent the health states of the production equipment, we will first detect, through correlations, the features of its status. After that, an HMM filter is used to eliminate those errors from the original signal. Next, the same methodology is applied to each sensor individually, using statistical features in the time domain, so that the failures of each sensor can be obtained through HMM.
Key Words: HMM; K-means; PCA; calibration; condition-based maintenance; features generation; online calibration status; sensors
41	Kurt Z, Işık Ş, Kaya Z, Anagün Y, Koca N, Çiçek S. Evaluation of EfficientNet models for COVID-19 detection using lung parenchyma. Neural Comput Appl 2023;35:12121-12132. [PMID: 36843903 PMCID: PMC9940669 DOI: 10.1007/s00521-023-08344-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2022] [Accepted: 01/25/2023] [Indexed: 02/23/2023]
Abstract: When the COVID-19 pandemic broke out in the beginning of 2020, it became crucial to enhance early diagnosis with efficient means to reduce dangers and future spread of the virus as soon as possible. Finding effective treatments and lowering mortality rates is now more important than ever. Scanning with a computed tomography (CT) scanner is a helpful method for detecting COVID-19 in this regard. The present paper, as such, is an attempt to contribute to this process by generating an open-source, CT-based image dataset. This dataset contains the CT scans of lung parenchyma regions of 180 COVID-19-positive and 86 COVID-19-negative patients taken at the Bursa Yuksek Ihtisas Training and Research Hospital. The experimental studies show that the modified EfficientNet-ap-nish method uses this dataset effectively for diagnostic purposes. Firstly, a smart segmentation mechanism based on the k-means algorithm is applied to this dataset as a preprocessing stage. Then, the performance of pretrained models is analyzed using different CNN architectures and with our Nish activation function. The statistical rates are obtained by the various EfficientNet models and the highest detection score is obtained with the EfficientNet-B4-ap-nish version, which provides a 97.93% accuracy rate and a 97.33% F1-score. The implications of the proposed method are immense both for present-day applications and future developments.
Key Words: COVID-19 detection; CT scan; Deep learning; EfficientNet; K-means; Lung parenchyma
42	Steyn Y, Lawlor T, Masuda Y, Tsuruta S, Legarra A, Lourenco D, Misztal I. Nonparallel genome changes within subpopulations over time contributed to genetic diversity within the US Holstein population. J Dairy Sci 2023;106:2551-2572. [PMID: 36797192 DOI: 10.3168/jds.2022-21914] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2022] [Accepted: 10/03/2022] [Indexed: 02/16/2023]
Abstract: Maintaining genetic variation in a population is important for long-term genetic gain. The existence of subpopulations within a breed helps maintain genetic variation and diversity. The 20,990 genotyped animals, representing the breeding animals in the year 2014, were identified as the sires of animals born after 2010 with at least 25 progenies, and females measured for type traits within the last 2 yr of data. K-means clustering with 5 clusters (C1, C2, C3, C4, and C5) was applied to the genomic relationship matrix based on 58,990 SNP markers to stratify the selected candidates into subpopulations. The generally higher inbreeding resulting from within-cluster mating than across-cluster mating suggests the successful stratification into genetically different groups. The largest cluster (C4) contained animals that were less related to each animal within and across clusters. The average fixation index was 0.03, indicating that the populations were differentiated, and allele differences across the subpopulations were not due to drift alone. Starting with the selected candidates within each cluster, a family unit was identified by tracing back through the pedigree, identifying the genotyped ancestors, and assigning them to a pseudogeneration. Each of the 5 families (F1, F2, F3, F4, and F5) was traced back for 10 generations, allowing for changes in frequency of individual SNPs over time to be observed, which we call allele frequency change. Alternative procedures were used to identify SNPs changing in a parallel or nonparallel way across families. Examples include markers that have changed the most in the whole population, markers that have changed differently across families, and genes previously identified as those that have changed in allele frequency. The genomic trajectory taken by each family involves selective sweeps, polygenic changes, hitchhiking, and epistasis. The replicate frequency spectrum was used to measure the similarity of change across families and showed that populations have changed differently. The proportion of markers that reversed direction in allele frequency change varied from 0.00 to 0.02 if the rate of change was greater than 0.02 per generation, or from 0.14 to 0.24 if the rate of change was greater than 0.005 per generation within each family. Cluster-specific SNP effects for stature were estimated using only females and applied to obtain indirect genomic predictions for males. Reranking occurs depending on SNP effects used. Additive genetic correlations between clusters show possible differences in populations. Further research is required to determine how this knowledge can be applied to maintain diversity and optimize selection decisions in the future.
Key Words: K-means clustering; epistasis; polygenic adaptation; selection sweeps
43	Meirmans PG. Analyzing Autopolyploid Genetic Data Using GenoDive. Methods Mol Biol 2023;2545:261-277. [PMID: 36720818 DOI: 10.1007/978-1-0716-2561-3_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/02/2023]
Abstract: Analyzing autopolyploid genetic data still presents numerous challenges due to, e.g., missing dosage information of genotypes and the presence of multiple ploidy levels within species or populations, but also because the choice of software is limited when compared to what is available for diploid data. However, over the last years, the number of software programs that can deal with polyploid data is slowly increasing. The software GENODIVE is one of the most widely used programs for the analysis of polyploid genetic data, presenting a wide array of different methods. In this chapter, I outline several frequently used types of population genetic analyses and explain how these apply to polyploid data, including possible pitfalls and biases. I then explain how GENODIVE approaches these analyses and whether and how it can overcome possible biases. Specifically, I focus on analyses of genetic diversity, Hardy-Weinberg equilibrium, quantifying population differentiation, clustering, and calculation of genetic distances. GENODIVE can be downloaded freely from http://www.patrickmeirmans.com/software.
Key Words: AMOVA; Genetic distances; Genetic diversity; K-means; Polyploidy; Population differentiation
MeSH Headings: Humans; Cluster Analysis; Diploidy; Genotype; Ploidies; Polyploidy
44	Uzcategui-Salazar M, Lillo J. Assessment of social vulnerability to groundwater pollution using K-means cluster analysis. Environmental Science and Pollution Research International 2023;30:14975-14992. [PMID: 36161573 DOI: 10.1007/s11356-022-22810-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2022] [Accepted: 08/26/2022] [Indexed: 06/16/2023]
Abstract: It is possible to assess the harm that society suffers as a consequence of groundwater contamination in aquifers. Indexing methodologies are commonly applied to assess the social vulnerability to polluted aquifers. However, they assign weighting and rating values to the different factors involved, which makes them very subjective. This research aims to assess the social vulnerability to groundwater pollution taking into account three factors: the uses of groundwater resources, the exposed population, and the socio-economic losses. In order to eliminate the subjectivity of current indexing methodologies, this work uses a K-means cluster analysis for the assessment of social vulnerability. With this method, a social vulnerability map can be produced with greater objectivity. The proposed methodology is applied to an aquifer located in central Spain, an area with significant agricultural development. Low population density and unproductive zones result in low social vulnerability in most of the area. However, high social vulnerability is observed in the southern sector due to agricultural development, which leads to higher socio-economic variables and demand for groundwater resources. Similarly, high social vulnerability is observed in the northeast, mainly influenced by the groundwater use and the exposed population. These results show that social vulnerability in most of the study area is not very significant for assessing the risk of groundwater contamination, because the damage to the social, environmental, or economic sector is low. However, in the south and northeast of the study area, pesticides and fertilizers should be used with caution, as they significantly increase the risk of groundwater contamination. The K-means clustering method proved to be an objective and reliable option for assessing social vulnerability to groundwater pollution in aquifers.
Key Words: Clustering analysis; Groundwater pollution; K-means; Social vulnerability
MeSH Headings: Social Vulnerability; Water Pollution/analysis; Environmental Monitoring/methods; Groundwater; Agriculture
45	Padannayil NM, Sharma DS, Nangia S, Patro KC, Gaikwad U, Burela N. IMPT of head and neck cancer: unsupervised machine learning treatment planning strategy for reducing radiation dermatitis. Radiat Oncol 2023;18:11. [PMID: 36639667 PMCID: PMC9840252 DOI: 10.1186/s13014-023-02201-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Accepted: 01/05/2023] [Indexed: 01/15/2023]
Abstract: Radiation dermatitis is a major concern in intensity modulated proton therapy (IMPT) for head and neck cancer (HNC) despite its demonstrated superiority over contemporary photon radiotherapy. In this study, dose surface histogram data extracted from forty-four patients of HNC treated with IMPT was used to predict the normal tissue complication probability (NTCP) of skin. Grades of NTCP-skin were clustered using the K-means clustering unsupervised machine learning (ML) algorithm. A new skin-sparing IMPT (IMPT-SS) planning strategy was developed with three major changes and prospectively implemented in twenty HNC patients. Across skin surfaces exposed from 10 (S10) to 70 (S70) GyRBE, the skin's NTCP demonstrated the strongest associations with S50 and S40 GyRBE (0.95 and 0.94). The increase in the NTCP of skin per unit GyRBE is 0.568 for skin exposed to 50 GyRBE as compared to 0.418 for 40 GyRBE. Three distinct clusters were formed, with 41% of patients in G1, 32% in G2, and 27% in G3. The average (± SD) generalised equivalent uniform dose for G1, G2, and G3 clusters was 26.54 ± 6.75, 38.73 ± 1.80, and 45.67 ± 2.20 GyRBE. The corresponding NTCP (%) were 4.97 ± 5.12, 48.12 ± 12.72 and 87.28 ± 7.73 respectively. In comparison to IMPT, new IMPT-SS plans significantly (P < 0.01) reduced SX GyRBE, gEUD, and associated NTCP-skin while maintaining identical dose volume indices for target and other organs at risk. The mean NTCP-skin value for IMPT-SS was 34% lower than that of IMPT. The dose to skin in patients treated prospectively for HNC was reduced by including gEUD for an acceptable radiation dermatitis determined from the local patient population using an unsupervised MLA in the spot map optimization of a new IMPT planning technique. However, the clinical finding of acute skin toxicity must also be related to the observed reduction in skin dose.
Key Words: AI; Head and neck cancer; IMPT; K-means; Machine learning; Proton therapy; Skin toxicity
46	Improving the Accuracy of Diabetes Diagnosis Applications through a Hybrid Feature Selection Algorithm. Neural Process Lett 2023;55:153-169. [PMID: 33814965 PMCID: PMC7997791 DOI: 10.1007/s11063-021-10491-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Accepted: 03/09/2021] [Indexed: 01/20/2023]
Abstract: Artificial intelligence is a future and valuable tool for early disease recognition and support in patient condition monitoring. It can increase the reliability of the cure and decision making by developing useful systems and algorithms. Healthcare workers, especially nurses and physicians, are overworked due to a massive and unexpected increase in the number of patients during the coronavirus pandemic. In such situations, artificial intelligence techniques could be used to diagnose a patient with life-threatening illnesses. In particular, diseases that increase the risk of hospitalization and death in coronavirus patients, such as high blood pressure, heart disease and diabetes, should be diagnosed at an early stage. This article focuses on diagnosing a diabetic patient through data mining techniques. If we are able to diagnose diabetes in the early stages of the disease, we can force patients to stay home and care for their health, so the risk of being infected with the coronavirus would be reduced. The proposed method has three steps: preprocessing, feature selection and classification. Several combinations of Harmony search algorithm, genetic algorithm, and particle swarm optimization algorithm are examined with K-means for feature selection. The combinations have not been examined before for diabetes diagnosis applications. K-nearest neighbor is used for classification of the diabetes dataset. Sensitivity, specificity, and accuracy have been measured to evaluate the results. The results achieved indicate that the proposed method with an accuracy of 91.65% outperformed the results of the earlier methods examined in this article.
Key Words: Artificial intelligence; Coronavirus disease pandemic; Diabetes diagnosis application; Genetic algorithm; Harmony search algorithm; K-means; Particle swarm optimization
47	Yao Y, Wu S, Liu C, Zhou C, Zhu J, Chen T, Huang C, Feng S, Zhang B, Wu S, Ma F, Liu L, Zhan X. Identification of spinal tuberculosis subphenotypes using routine clinical data: a study based on unsupervised machine learning. Ann Med 2023;55:2249004. [PMID: 37611242 PMCID: PMC10448834 DOI: 10.1080/07853890.2023.2249004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Revised: 08/10/2023] [Accepted: 08/11/2023] [Indexed: 08/25/2023]
Abstract: OBJECTIVE: The identification of spinal tuberculosis subphenotypes is an integral component of precision medicine. However, we lack proper study models to identify subphenotypes in patients with spinal tuberculosis. Here we identified possible subphenotypes of spinal tuberculosis and compared their clinical results. METHODS: A total of 422 patients with spinal tuberculosis who received surgical treatment were enrolled. Clustering analysis was performed using the K-means clustering algorithm and the routinely available clinical data collected from patients within 24 h after admission. Finally, the differences in clinical characteristics, surgical efficacy, and postoperative complications among the subphenotypes were compared. RESULTS: Two subphenotypes of spinal tuberculosis were identified. Laboratory examination results revealed that the levels of more than one inflammatory index in cluster 2 were higher than those in cluster 1. In terms of disease severity, cluster 2 showed a higher Oswestry Disability Index (ODI), a higher visual analogue scale (VAS) score, and a lower Japanese Orthopedic Association (JOA) score. In addition, in terms of postoperative outcomes, cluster 2 patients were more prone to complications, especially wound infections, and had a longer hospital stay. CONCLUSION: K-means clustering analysis based on conventional available clinical data can rapidly identify two subtypes of spinal tuberculosis with different clinical results. We believe this finding will help clinicians to rapidly and easily identify the subtypes of spinal tuberculosis at the bedside and become the cornerstone of individualized treatment strategies.
Key Words: K-means; Spinal tuberculosis; cluster analysis; heterogeneity; machine learning
MeSH Headings: Humans; Unsupervised Machine Learning; Tuberculosis, Spinal/diagnosis; Tuberculosis, Spinal/surgery; Algorithms; Cluster Analysis; Hospitalization
48	Ferreño D, Revuelta JM, Sainz-Aja JA, Wert-Carvajal C, Casado JA, Diego S, Carrascal IA, Silva J, Gutiérrez-Solana F. Shannon entropy as a reliable score to diagnose human fibroelastic degenerative mitral chords: A micro-ct ex-vivo study. Med Eng Phys 2022;110:103919. [PMID: 36564142 DOI: 10.1016/j.medengphy.2022.103919] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2022] [Revised: 09/12/2022] [Accepted: 11/03/2022] [Indexed: 11/09/2022]
Abstract: This paper is aimed at identifying by means of micro-CT the microstructural differences between normal and degenerative mitral marginal chordae tendineae. The control group is composed of 21 normal chords excised from 14 normal mitral valves from heart transplant recipients. The experimental group comprises 22 degenerative fibroelastic chords obtained at surgery from 11 pathological valves after mitral repair or replacement. In the control group the superficial endothelial cells and spongiosa layer remained intact, covering the wavy core collagen. In contrast, in the experimental group the collagen fibers were arranged as straightened thick bundles in a parallel configuration. 100 cross-sections were examined by micro-CT from each chord. Each image was randomized through the K-means machine learning algorithm and then, the global and local Shannon entropies were obtained. The optimum number of clusters, K, was estimated to maximize the differences between normal and degenerative chords in global and local Shannon entropy; the p-value after a nested ANOVA test was chosen as the parameter to be minimized. Optimum results were obtained with global Shannon entropy and 2≤K≤7, providing p < 0.01; for K=3, p = 2.86·10^-3. These findings open the door to novel perioperative diagnostic methods in order to avoid or reduce postoperative mitral valve regurgitation recurrences.
Key Words: Degenerative mitral valve disease; K-means; Machine learning; Micro computerized tomography; Mitral chordae tendineae; Shannon entropy
MeSH Headings: Humans; Chordae Tendineae/pathology; Collagen; Endothelial Cells; Mitral Valve/diagnostic imaging; Mitral Valve Insufficiency/diagnostic imaging; Mitral Valve Insufficiency/surgery; X-Ray Microtomography
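Illustration for entry 48: the abstract does not say how the entropy was computed from the K-means output, so the following is only a minimal sketch under stated assumptions. It quantizes a flat list of gray levels into K clusters with a one-dimensional Lloyd's K-means and takes the Shannon entropy (in bits) of the resulting label histogram as a "global" entropy. The function names `kmeans_1d` and `shannon_entropy`, the evenly spaced initial centers, and the sample pixel values are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def kmeans_1d(values, k, iterations=50):
    """Lloyd's algorithm on scalar values.

    Assumption: initial centers are spread evenly over the value range,
    which keeps the sketch deterministic; the paper's initialization is unknown.
    """
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def shannon_entropy(labels):
    """Shannon entropy in bits of the empirical label distribution."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


# Hypothetical gray levels standing in for one micro-CT cross-section.
pixels = [10, 12, 11, 200, 205, 198, 10, 201]
labels = kmeans_1d(pixels, k=2)
entropy_bits = shannon_entropy(labels)
```

With K clusters the entropy is bounded above by log2(K), so values from different K are not directly comparable without normalization.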
49	Nichols L, Taverner T, Crowe F, Richardson S, Yau C, Kiddle S, Kirk P, Barrett J, Nirantharakumar K, Griffin S, Edwards D, Marshall T. In simulated data and health records, latent class analysis was the optimum multimorbidity clustering algorithm. J Clin Epidemiol 2022;152:164-175. [PMID: 36228971 PMCID: PMC7613854 DOI: 10.1016/j.jclinepi.2022.10.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2022] [Revised: 09/16/2022] [Accepted: 10/05/2022] [Indexed: 11/07/2022]
Abstract: BACKGROUND AND OBJECTIVES: To investigate the reproducibility and validity of latent class analysis (LCA) and hierarchical cluster analysis (HCA), multiple correspondence analysis followed by k-means (MCA-kmeans) and k-means (kmeans) for multimorbidity clustering. METHODS: We first investigated clustering algorithms in simulated datasets with 26 diseases of varying prevalence in predetermined clusters, comparing the derived clusters to known clusters using the adjusted Rand Index (aRI). We then investigated the medical records of male patients, aged 65 to 84 years from 50 UK general practices, with 49 long-term health conditions. We compared within-cluster morbidity profiles using the Pearson correlation coefficient and assessed cluster stability using 400 bootstrap samples. RESULTS: In the simulated datasets, the closest agreement (largest aRI) to known clusters was with LCA and then MCA-kmeans algorithms. In the medical records dataset, all four algorithms identified one cluster of 20-25% of the dataset with about 82% of the same patients across all four algorithms. LCA and MCA-kmeans both found a second cluster of 7% of the dataset. Other clusters were found by only one algorithm. LCA and MCA-kmeans clustering gave the most similar partitioning (aRI 0.54). CONCLUSION: LCA achieved higher aRI than other clustering algorithms.
Key Words: Clustering methods; Electronic medical records; Hierarchical cluster analysis; K-means; Latent class analysis; Multimorbidity; Multiple correspondence analysis
MeSH Headings: Humans; Male; Latent Class Analysis; Multimorbidity; Reproducibility of Results; Algorithms; Cluster Analysis
Grants: MC_UU_00002/5 (Medical Research Council); MC_UU_00006/6 (Medical Research Council); MR/P021573/1 (Medical Research Council); MR/S027602/1 (Medical Research Council)
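Illustration for entry 49: the adjusted Rand Index (aRI) used there to compare derived and known clusters can be computed from the contingency table of two labelings. The sketch below follows the standard pair-counting formula; the function name `adjusted_rand_index` and the toy labelings are hypothetical, and the convention of returning 1.0 for degenerate inputs is an assumption of this sketch, not something stated in the abstract.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_a)
    if n < 2:
        return 1.0  # assumption: trivially identical partitions
    # Pairs of items placed together in both partitions, in A, and in B.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # assumption: e.g. both partitions put everything in one cluster
    return (together_both - expected) / (maximum - expected)


# Hypothetical labelings: the same split under different label names,
# and a split that cuts across the known clusters.
known = [0, 0, 1, 1]
same_split = adjusted_rand_index(known, [1, 1, 0, 0])
crossed_split = adjusted_rand_index(known, [0, 1, 0, 1])
```

The index equals 1 for identical partitions regardless of label names, is close to 0 for chance-level agreement, and can be negative when agreement is worse than chance.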
50	Madival SD, Mishra DC, Sharma A, Kumar S, Maji AK, Budhlakoti N, Sinha D, Rai A. A Deep Clustering-based Novel Approach for Binning of Metagenomics Data. Curr Genomics 2022;23:353-368. [PMID: 36778191 PMCID: PMC9878855 DOI: 10.2174/1389202923666220928150100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2022] [Revised: 08/30/2022] [Accepted: 09/02/2022] [Indexed: 11/22/2022]
Abstract: BACKGROUND: One major challenge in binning Metagenomics data is the limited availability of reference datasets, as only 1% of the total microbial population has yet been cultured. This has given rise to the efficacy of unsupervised methods for binning in the absence of any reference datasets. OBJECTIVE: To develop a deep clustering-based binning approach for Metagenomics data and to evaluate results with suitable measures. METHODS: In this study, a deep learning-based approach has been taken for binning the Metagenomics data. The results are validated on different datasets by considering features such as Tetra-nucleotide frequency (TNF), Hexa-nucleotide frequency (HNF) and GC-Content. A Convolutional Autoencoder is used for feature extraction, and the K-means clustering method is used for binning. RESULTS: In most cases, it has been found that evaluation parameters such as the Silhouette index and Rand index are more than 0.5 and 0.8, respectively, which indicates that the proposed approach is giving satisfactory results. The performance of the developed approach is compared with current methods and tools using benchmarked low complexity simulated and real metagenomic datasets. It is found to be better than unsupervised methods and on par with semi-supervised methods. CONCLUSION: An unsupervised advanced learning-based approach for binning has been proposed, and the developed method shows promising results for various datasets. This is a novel approach for solving the lack of reference data problem of binning in metagenomics.
Key Words: Binning; K-means; convolutional autoencoder; deep clustering; genomic features; metagenomics
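Illustration for entry 50: the Silhouette index reported there measures how well each point sits inside its assigned cluster compared with the nearest other cluster. The sketch below computes the mean silhouette with Euclidean distance; the function name `silhouette_score`, the toy two-dimensional points, and the choice of Euclidean distance are assumptions of this sketch, since the abstract does not give the distance measure or the feature space used for evaluation.

```python
from math import dist


def silhouette_score(points, labels):
    """Mean silhouette over all points, using Euclidean distance."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical points: two tight, well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])
bad = silhouette_score(points, [0, 1, 0, 1])
```

Values near 1 indicate compact, well-separated clusters, while values below 0 suggest points assigned to the wrong cluster, which is why a threshold such as 0.5 is sometimes read as a sign of reasonable structure.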