51
Chen TL, Fushing H, Chou EP. Learned Practical Guidelines for Evaluating Conditional Entropy and Mutual Information in Discovering Major Factors of Response-vs.-Covariate Dynamics. Entropy (Basel) 2022; 24:1382. [PMID: 37420402 DOI: 10.3390/e24101382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Revised: 09/22/2022] [Accepted: 09/26/2022] [Indexed: 07/09/2023]
Abstract
We reformulate and reframe a series of increasingly complex parametric statistical topics into a framework of response-vs.-covariate (Re-Co) dynamics that is described without any explicit functional structures. We then resolve the data analysis tasks of these topics by discovering the major factors underlying such Re-Co dynamics, making use only of the data's categorical nature. The major factor selection protocol at the heart of the Categorical Exploratory Data Analysis (CEDA) paradigm is illustrated and carried out by employing Shannon's conditional entropy (CE) and mutual information (I[Re;Co]) as the two key information-theoretical measurements. Through the process of evaluating these two entropy-based measurements and resolving statistical tasks, we acquire several computational guidelines for carrying out the major factor selection protocol in a do-and-learn fashion. Specifically, practical guidelines are established for evaluating CE and I[Re;Co] in accordance with the criterion called [C1:confirmable]. Following the [C1:confirmable] criterion, we make no attempt to acquire consistent estimates of these theoretical information measurements. All evaluations are carried out on a contingency table platform, upon which the practical guidelines also provide ways of lessening the effects of the curse of dimensionality. We explicitly carry out six examples of Re-Co dynamics, within each of which several widely extended scenarios are also explored and discussed.
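To make the two measurements in this abstract concrete, the following is a minimal sketch of the textbook plug-in evaluation of conditional entropy and mutual information on a contingency table. It is not the paper's [C1:confirmable] guideline; the function names and the 2x2 table are hypothetical, and the layout (rows = covariate categories, columns = response categories) is an assumption made for illustration.

```python
import math

def entropy(counts):
    """Shannon entropy, in bits, of a list of non-negative counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def conditional_entropy(table):
    """H[Re | Co]: row-weighted average of the entropy of each row.

    Assumed layout: each row is one covariate category, each column
    one response category.
    """
    n = sum(sum(row) for row in table)
    return sum((sum(row) / n) * entropy(row) for row in table if sum(row) > 0)

def mutual_information(table):
    """I[Re; Co] = H[Re] - H[Re | Co]."""
    response_marginal = [sum(col) for col in zip(*table)]
    return entropy(response_marginal) - conditional_entropy(table)

# Hypothetical counts: the covariate shifts the response distribution.
dependent_table = [[40, 10],
                   [10, 40]]
# Hypothetical counts: the response is independent of the covariate.
independent_table = [[25, 25],
                     [25, 25]]

print(conditional_entropy(dependent_table))
print(mutual_information(dependent_table))
print(mutual_information(independent_table))
```

A covariate that lowers the conditional entropy well below the marginal entropy of the response is a candidate major factor; with sparse tables the plug-in values become unreliable, which is the dimensionality concern the abstract mentions.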
Affiliation(s)
- Ting-Li Chen
- Institute of Statistical Science, Academia Sinica, Taipei 11529, Taiwan
- Hsieh Fushing
- Department of Statistics, University of California, Davis, CA 95616, USA
- Elizabeth P Chou
- Department of Statistics, National Chengchi University, Taipei 11605, Taiwan
52
Atia N, Benzaoui A, Jacques S, Hamiane M, Kourd KE, Bouakaz A, Ouahabi A. Particle Swarm Optimization and Two-Way Fixed-Effects Analysis of Variance for Efficient Brain Tumor Segmentation. Cancers (Basel) 2022; 14:4399. [PMID: 36139559 DOI: 10.3390/cancers14184399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Revised: 09/04/2022] [Accepted: 09/07/2022] [Indexed: 11/29/2022] Open
Abstract
Simple Summary: Segmentation of brain tumor images from magnetic resonance imaging (MRI) is a challenging topic in medical image analysis. The brain tumor can take many shapes, and MRI images vary considerably in intensity, making lesion detection difficult for radiologists. This paper proposes a three-step approach to solving this problem: (1) pre-processing, based on morphological operations, is applied to remove the skull bone from the image; (2) the particle swarm optimization (PSO) algorithm, with a two-way fixed-effects analysis of variance (ANOVA)-based fitness function, is used to find the optimal block containing the brain lesion; (3) the K-means clustering algorithm is adopted, to classify the detected block as tumor or non-tumor. An extensive experimental analysis, including visual and statistical evaluations, was conducted, using two MRI databases: a private database provided by the Kouba imaging center, Algiers (KICA), and the multimodal brain tumor segmentation challenge (BraTS) 2015 database. The results show that the proposed methodology achieved impressive performance, compared to several competing approaches.
Abstract: Segmentation of brain tumor images, to refine the detection and understanding of abnormal masses in the brain, is an important research topic in medical imaging. This paper proposes a new segmentation method, consisting of three main steps, to detect brain lesions using magnetic resonance imaging (MRI). In the first step, the parts of the image delineating the skull bone are removed, to exclude insignificant data. In the second step, which is the main contribution of this study, the particle swarm optimization (PSO) technique is applied, to detect the block that contains the brain lesions. The fitness function, used to determine the best block among all candidate blocks, is based on a two-way fixed-effects analysis of variance (ANOVA). In the last step of the algorithm, the K-means segmentation method is used in the lesion block, to classify it as a tumor or not. A thorough evaluation of the proposed algorithm was performed, using: (1) a private MRI database provided by the Kouba imaging center, Algiers (KICA); (2) the multimodal brain tumor segmentation challenge (BraTS) 2015 database. Estimates of the selected fitness function were first compared to those based on the sum-of-absolute-differences (SAD) dissimilarity criterion, to demonstrate the efficiency and robustness of the ANOVA. The performance of the optimized brain tumor segmentation algorithm was then compared to the results of several state-of-the-art techniques. The results obtained, by using the Dice coefficient, Jaccard distance, correlation coefficient, and root mean square error (RMSE) measurements, demonstrated the superiority of the proposed optimized segmentation algorithm over equivalent techniques.
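Three of the evaluation measures named in this abstract have simple standard definitions on binary masks. The sketch below illustrates them on flattened masks; the function names and the six-pixel example are hypothetical, and nothing here reproduces the paper's own evaluation code or data.

```python
import math

def dice_coefficient(pred, truth):
    """Dice = 2|A n B| / (|A| + |B|) for two binary masks of equal length."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    size = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    return 2 * intersection / size if size else 1.0

def jaccard_index(pred, truth):
    """Jaccard index = |A n B| / |A u B|."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    return intersection / union if union else 1.0

def jaccard_distance(pred, truth):
    """Jaccard distance = 1 - Jaccard index."""
    return 1.0 - jaccard_index(pred, truth)

def rmse(pred, truth):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred))

# Hypothetical flattened masks: 1 = tumor pixel, 0 = background.
predicted = [1, 1, 1, 0, 0, 0]
reference = [0, 1, 1, 1, 0, 0]

print(dice_coefficient(predicted, reference))
print(jaccard_distance(predicted, reference))
print(rmse(predicted, reference))
```

Dice and the Jaccard index are monotonically related (Dice = 2J / (1 + J)), so they rank segmentations identically; reporting both mainly helps comparison with papers that use only one of them.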
53
Li J, Huang J, Jiang T, Tu L, Cui L, Cui J, Ma X, Yao X, Shi Y, Wang S, Wang Y, Liu J, Li Y, Zhou C, Hu X, Xu J. A multi-step approach for tongue image classification in patients with diabetes. Comput Biol Med 2022; 149:105935. [PMID: 35986968 DOI: 10.1016/j.compbiomed.2022.105935] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/31/2021] [Revised: 06/30/2022] [Accepted: 07/14/2022] [Indexed: 11/03/2022]
Abstract
BACKGROUND: In China, diabetes is a common, high-incidence chronic disease and has become a severe public health problem. However, current diagnosis and treatment methods have difficulty controlling the progress of diabetes. Traditional Chinese Medicine (TCM) has become an option for the treatment of diabetes due to its low cost, good curative effect, and good accessibility. OBJECTIVE: To achieve a fine classification of the diabetic population based on tongue image data, provide a diagnostic basis for formulating individualized treatment plans for diabetes, ensure the accuracy and consistency of TCM diagnosis, and promote the objective and standardized development of TCM diagnosis. METHODS: We use the TFDA-1 tongue examination instrument to collect the tongue images of the subjects. The Tongue Diagnosis Analysis System (TDAS) is used to extract TDAS features from the tongue images, and a Vector Quantized Variational Autoencoder (VQ-VAE) extracts VQ-VAE features from them. K-means clusters the tongue images based on the VQ-VAE features, and the TDAS features are used to describe the differences between clusters. A Vision Transformer (ViT) combined with Gradient-weighted Class Activation Mapping (Grad-CAM) is used to verify the clustering results and calculate positioning diagnostic information. RESULTS: Based on VQ-VAE features, K-means divides the diabetic population into 4 clusters with clear boundaries. The silhouette, Calinski-Harabasz, and Davies-Bouldin scores are 0.391, 673.256, and 0.809, respectively. Cluster 1 had the highest Tongue Body L (TB-L) and Tongue Coating L (TC-L) and the lowest Tongue Coating Angular second moment (TC-ASM), with a pale red tongue and white coating. Cluster 2 had the highest TC-b, with a yellow tongue coating. Cluster 3 had the highest TB-a, with a red tongue. Cluster 4 had the lowest TB-L, TC-L, and TB-b and the highest Per-all, with a purple tongue and the largest tongue coating area. ViT verifies the clustering results of K-means: the highest Top-1 Classification Accuracy (CA) is 87.8%, and the average CA is 84.4%. CONCLUSIONS: The study organically combined unsupervised learning, self-supervised learning, and supervised learning and designed a complete diabetic tongue image classification method. This method does not rely on human intervention, makes decisions based entirely on tongue image data, and achieves state-of-the-art results. Our research will help TCM participate more deeply in the individualized treatment of diabetes and provide new ideas for promoting the standardization of TCM diagnosis.
Affiliation(s)
- Jun Li
- Basic Medical College, Shanghai University of Traditional Chinese Medicine, 1200 Cailun Road, Shanghai, 201203, China
- Jingbin Huang
- Basic Medical College, Shanghai University of Traditional Chinese Medicine, 1200 Cailun Road, Shanghai, 201203, China
- Tao Jiang
- Basic Medical College, Shanghai University of Traditional Chinese Medicine, 1200 Cailun Road, Shanghai, 201203, China
- Liping Tu
- Basic Medical College, Shanghai University of Traditional Chinese Medicine, 1200 Cailun Road, Shanghai, 201203, China
- Longtao Cui
- Basic Medical College, Shanghai University of Traditional Chinese Medicine, 1200 Cailun Road, Shanghai, 201203, China
- Ji Cui
- Basic Medical College, Shanghai University of Traditional Chinese Medicine, 1200 Cailun Road, Shanghai, 201203, China
- Xuxiang Ma
- Basic Medical College, Shanghai University of Traditional Chinese Medicine, 1200 Cailun Road, Shanghai, 201203, China
- Xinghua Yao
- Basic Medical College, Shanghai University of Traditional Chinese Medicine, 1200 Cailun Road, Shanghai, 201203, China
- Yulin Shi
- Basic Medical College, Shanghai University of Traditional Chinese Medicine, 1200 Cailun Road, Shanghai, 201203, China
- Sihan Wang
- Basic Medical College, Shanghai University of Traditional Chinese Medicine, 1200 Cailun Road, Shanghai, 201203, China
- Yu Wang
- Basic Medical College, Shanghai University of Traditional Chinese Medicine, 1200 Cailun Road, Shanghai, 201203, China
- Jiayi Liu
- Basic Medical College, Shanghai University of Traditional Chinese Medicine, 1200 Cailun Road, Shanghai, 201203, China
- Yongzhi Li
- China Astronaut Research and Training Center, Beijing, 100084, China
- Changle Zhou
- Department of Intelligent Science and Technology, Xiamen University, 422 Siming South Road, Xiamen, Fujian, 361005, China
- Xiaojuan Hu
- Shanghai Collaborative Innovation Center of Health Service in Traditional Chinese Medicine, Shanghai University of Traditional Chinese Medicine, 1200 Cailun Road, Shanghai, 201203, China
- Jiatuo Xu
- Basic Medical College, Shanghai University of Traditional Chinese Medicine, 1200 Cailun Road, Shanghai, 201203, China
54
Zhan X, Li Y, Liu Y, Cecchi NJ, Gevaert O, Zeineh MM, Grant GA, Camarillo DB. Piecewise Multivariate Linearity Between Kinematic Features and Cumulative Strain Damage Measure (CSDM) Across Different Types of Head Impacts. Ann Biomed Eng 2022; 50:1596-1607. [PMID: 35922726 DOI: 10.1007/s10439-022-03020-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2022] [Accepted: 07/12/2022] [Indexed: 11/28/2022]
Abstract
In a previous study, we found that the relationship between brain strain and kinematic features cannot be described by a generalized linear model across different types of head impacts. In this study, we investigate whether such a linear relationship exists when head impacts are partitioned using a data-driven approach. We applied the K-means clustering method to partition 3161 impacts from various sources including simulation, college football, mixed martial arts, and car crashes. We found piecewise multivariate linearity between the cumulative strain damage measure (CSDM; assessed at the threshold of 0.15) and head kinematic features. Compared with the linear regression models without partition and with partition according to the types of head impacts, the K-means-based data-driven partition showed significantly higher CSDM regression accuracy, which suggested the presence of piecewise multivariate linearity across types of head impacts. Additionally, we compared the piecewise linearity with the partitions based on individual features used in clustering. We found that the partition with maximum angular acceleration magnitude at 4706 rad/s² led to the highest piecewise linearity. This study may contribute to an improved method for the rapid prediction of CSDM in the future.
Affiliation(s)
- Xianghao Zhan
- Department of Bioengineering, Stanford University, Stanford, CA, 94305, USA
- Yiheng Li
- Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
- Yuzhe Liu
- Department of Bioengineering, Stanford University, Stanford, CA, 94305, USA
- Nicholas J Cecchi
- Department of Bioengineering, Stanford University, Stanford, CA, 94305, USA
- Olivier Gevaert
- Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA; Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, 94305, USA
- Michael M Zeineh
- Department of Radiology, Stanford University, Stanford, CA, 94305, USA
- Gerald A Grant
- Department of Neurosurgery, Stanford University, Stanford, CA, 94305, USA
- David B Camarillo
- Department of Bioengineering, Stanford University, Stanford, CA, 94305, USA
55
Pérez-Campuzano D, Rubio Andrada L, Morcillo Ortega P, López-Lázaro A. Visualizing the historical COVID-19 shock in the US airline industry: A Data Mining approach for dynamic market surveillance. J Air Transp Manag 2022; 101:102194. [PMID: 36568914 PMCID: PMC9759375 DOI: 10.1016/j.jairtraman.2022.102194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2021] [Revised: 02/21/2022] [Accepted: 02/21/2022] [Indexed: 06/17/2023]
Abstract
One of the purposes of Artificial Intelligence tools is to ease the analysis of large amounts of data. In order to support the strategic decision-making process of the airlines, this paper proposes a Data Mining approach (focused on visualization) with the objective of extracting market knowledge from any database of industry players or competitors. The method combines two clustering techniques (Self-Organizing Maps, SOMs, and K-means) via unsupervised learning with promising dynamic applications in different sectors. As a case study, 30-year data from 18 diverse US passenger airlines is used to showcase the capabilities of this tool including the identification and assessment of market trends, M&A events or the COVID-19 consequences.
Affiliation(s)
- Darío Pérez-Campuzano
- Universidad Autónoma de Madrid (UAM), Facultad de Ciencias Económicas y Empresariales, Calle Francisco Tomás y Valiente N5, 29049, Madrid, Spain
- LLM Aviation, Paseo de la Habana N26, 28036, Madrid, Spain
- Luis Rubio Andrada
- Universidad Autónoma de Madrid (UAM), Facultad de Ciencias Económicas y Empresariales, Calle Francisco Tomás y Valiente N5, 29049, Madrid, Spain
- Patricio Morcillo Ortega
- Universidad Autónoma de Madrid (UAM), Facultad de Ciencias Económicas y Empresariales, Calle Francisco Tomás y Valiente N5, 29049, Madrid, Spain
- Antonio López-Lázaro
- Universidad Politécnica de Madrid (UPM), Escuela Técnica Superior de Ingeniería Aeronáutica y del Espacio, Plaza del Cardenal Cisneros N3, 28040, Madrid, Spain
- Euroairlines, Paseo de la Habana N26, 28036, Madrid, Spain
56
Parvizi S, Eslamian S, Gheysari M, Gohari A, Kopai SS. Regional frequency analysis of drought severity and duration in Karkheh River Basin, Iran using univariate L-moments method. Environ Monit Assess 2022; 194:336. [PMID: 35389125 DOI: 10.1007/s10661-022-09977-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2021] [Accepted: 03/19/2022] [Indexed: 06/14/2023]
Abstract
Drought is a natural disaster that causes great damage to human life and natural ecosystems. It differs from other natural disasters mainly in its gradual effect over a relatively long period, the impossibility of accurately determining when a drought begins and ends, and the geographical extent of its effects. In addition, the lack of a universally accepted definition of drought adds to the complexity of this phenomenon. In the last decade, due to the increasing frequency of drought in Iran and the reduction of water resources, its consequences have become apparent and have caused problems for planners and managers. In this research, regional frequency analysis using L-moments methods was therefore performed to investigate the severity and duration of the Standardized Precipitation Index (SPI), Standardized Evapotranspiration Index (SEI), Standardized Runoff Index (SRI), and Standardized Soil Moisture Index (SSI) and to study meteorological, agricultural, and hydrological droughts in the Karkheh River Basin in Iran. Using the K-means clustering method, the basin was divided into four homogeneous areas. Uncoordinated stations in each cluster were removed. The best regional distribution function was selected for each homogeneous region, and the Pearson type III distribution was found to provide the best fit to the data set in the basin. Based on the Hosking and Wallis heterogeneity test, the Karkheh Basin, with H1 < 1, was identified as acceptably homogeneous in all clusters. The results showed that hydrological drought occurs with a very short time delay after meteorological drought in the Karkheh River Basin, and the two indicators show meteorological and hydrological drought conditions well. Agricultural drought occurs after meteorological and hydrological drought, respectively, and its severity and duration are less than those of the other indicators. Meteorological, hydrological, and agricultural droughts do not occur at the same time in all of the years. In general, the SPI drought index shows the most severe droughts compared with the other three indices. In this way, in the 5- to 20-year return period with a severity of 3SPI and in the 20- to 100-year return period with a severity of 7SPI, region IV, that is, the western and northwestern areas of the basin, has been affected by severe meteorological drought. By using the regional standardized quantities, it is possible to estimate the probability of drought in any part of the catchment that does not have sufficient data for hydrological studies.
Affiliation(s)
- Saeideh Parvizi
- Water Engineering Department, Faculty of Agriculture Engineering, Isfahan University of Technology, 8415683111, Isfahan, Iran
- Saeid Eslamian
- Water Engineering Department, Faculty of Agriculture Engineering, Isfahan University of Technology, 8415683111, Isfahan, Iran
- Mahdi Gheysari
- Water Engineering Department, Faculty of Agriculture Engineering, Isfahan University of Technology, 8415683111, Isfahan, Iran
- Alireza Gohari
- Water Engineering Department, Faculty of Agriculture Engineering, Isfahan University of Technology, 8415683111, Isfahan, Iran
- Saeid Soltani Kopai
- Department of Rangeland and Watershed, Faculty of Natural Resources, Isfahan University of Technology, 8415683111, Isfahan, Iran
57
Pathak S, Raj R, Singh K, Verma PK, Kumar B. Development of portable and robust cataract detection and grading system by analyzing multiple texture features for Tele-Ophthalmology. Multimed Tools Appl 2022; 81:23355-23371. [PMID: 35317470 PMCID: PMC8931454 DOI: 10.1007/s11042-022-12544-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/25/2020] [Revised: 02/18/2021] [Accepted: 01/31/2022] [Indexed: 06/14/2023]
Abstract
This paper presents a low-cost, robust, portable, and automated cataract detection system that can detect the presence of cataract from colored digital eye images and grade its severity. Ophthalmologists detect cataract through visual screening using ophthalmoscopes and slit lamps. Conventionally, a patient has to visit an ophthalmologist for eye screening, and treatment follows from there. Developing countries lack proper health infrastructure and face a severe scarcity of trained medical professionals and technicians. The situation is also unsatisfactory in the rural and remote areas of developed nations. To bridge this gap between patients and available resources, the current work focuses on the development of a portable, low-cost, robust cataract screening and grading system. Similar works use fundus and retinal images, which require costly imaging modules, and image-based detection algorithms that rely on far more complex neural network models. The current work instead benefits from advances in digital image processing techniques. A set of preprocessing steps is applied to the colored eye image, and texture information in the form of mean intensity, uniformity, standard deviation, and randomness is then calculated and mapped to the doctor's diagnostic opinion for cataract screening of over 200 patients. For different grades of cataract severity, the edge pixel count is calculated in accordance with the doctor's opinion, and these data are then used to calculate thresholds with a hybrid k-means algorithm, in order to decide on the presence of cataract and grade its severity. A low value of uniformity and high values of the other texture parameters confirm the presence of cataract, since clouding in the eye lens produces a coarse texture that lowers the uniformity function. A higher edge pixel count indicates the onset of cataract, as solidified regions in the lens are nonuniform. A lower value corresponds to a fully solidified region, or mature cataract. The proposed algorithm was initially developed in MATLAB and tested on over 300 patients in an eye camp. The system has shown more than 98% accuracy in the detection and grading of cataract. Later, a cloud-based system was developed with a 3D-printed image acquisition module to realize an automated, portable, and efficient cataract detection system for tele-ophthalmology. The proposed system uses a very simple and efficient technique that also maps the doctor's diagnostic opinion, giving very promising results that suggest its potential use in tele-ophthalmology applications to reduce the cost of delivering eye care services and extend their reach. The developed system is simple in design, easy to operate, and suitable for mass screening of cataracts. Because the device is non-invasive, non-mydriatic, and mountable, in-person screening is not required. Hence, social distancing norms are easy to follow, and the device is very useful in COVID-19-like situations.
Affiliation(s)
- Shashwat Pathak
- Department of Electronics and Communication Engineering, MIET, Meerut, 250005 India
- Rahul Raj
- Electro Curietech Private Limited, Incubation Centre IIT Patna, Patna, 801103 India
- Kartik Singh
- Computer Science Department, Deepcompute Software (India) Pvt. Ltd., Bengaluru, India
- Pawan Kumar Verma
- Department of Electronics & Communication, Dr. B.R Ambedkar NIT Jalandhar, Jalandhar, Punjab 144011 India
58
Dong Q, Cao M, Gu F, Gong W, Cai Q. Method for puncture trajectory planning in liver tumors thermal ablation based on NSGA-III. Technol Health Care 2022; 30:1243-1256. [PMID: 35342068 DOI: 10.3233/thc-213592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
BACKGROUND: Thermal ablation is a conventional treatment for liver tumors. In order to reduce the damage to normal tissue caused by thermal ablation, the physician needs to plan the puncture path before surgery. OBJECTIVE: In this paper, a puncture trajectory planning method for thermal ablation of liver tumors based on NSGA-III is proposed. This method takes the clinical hard constraints and soft constraints into account. METHOD: The feasible puncture region is solved by the hard constraints, and the Pareto front points are then obtained under the soft constraints. When determining the feasible puncture region, an adaptive morphological closing operation based on the K-means algorithm is adopted to process the spherical angle binary image of obstacles that might be encountered in the puncture process. RANSAC is performed to fit the tangent plane of the liver surface when calculating the angle between the puncture trajectory and the liver surface. In order to evaluate the puncture path obtained by this method, 6 tumors are selected as experimental subjects, and the Hausdorff distance and overlap rate between the Pareto front points and the manually recommended points are calculated. RESULTS: The average Hausdorff distance is 24.91 mm, and the mean overlap rate is 86.43%. CONCLUSION: The proposed method can provide puncture routes with high safety and clinical practicality.
59
Ferreira A, Bressan C, Hardy SV, Saghatelyan A. Deciphering heterogeneous populations of migrating cells based on the computational assessment of their dynamic properties. Stem Cell Reports 2022:S2213-6711(22)00100-X. [PMID: 35303437 DOI: 10.1016/j.stemcr.2022.02.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2021] [Revised: 02/17/2022] [Accepted: 02/18/2022] [Indexed: 11/23/2022] Open
Abstract
Neuronal migration is a highly dynamic process, and multiple cell movement metrics can be extracted from time-lapse imaging datasets. However, these parameters alone are often insufficient to evaluate the heterogeneity of neuroblast populations. We developed an analytical pipeline based on reducing the dimensions of the dataset by principal component analysis (PCA) and determining sub-populations using k-means, supported by the elbow criterion method and validated by a decision tree algorithm. We showed that neuroblasts derived from the same adult neural stem cell (NSC) lineage as well as across different lineages are heterogeneous and can be sub-divided into different clusters based on their dynamic properties. Interestingly, we also observed overlapping clusters for neuroblasts derived from different NSC lineages. We further showed that genetic perturbations or environmental stimuli affect the migratory properties of neuroblasts in a sub-cluster-specific manner. Our data thus provide a framework for assessing the heterogeneity of migrating neuroblasts.
- Pipeline to study the heterogeneity of migrating cells based on their dynamic properties
- Neuroblasts derived from the same neural stem cell (NSC) lineage are heterogeneous
- Neuroblasts derived from different NSC lineages have overlapping and distinct clusters
- These clusters are differently affected by genetic factors or environmental stimuli
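The k-means step with the elbow criterion mentioned in this abstract can be sketched as follows. This is a plain Lloyd's-algorithm illustration, not the authors' pipeline: the PCA and decision-tree stages are omitted, the farthest-first initialization is a choice made here to keep the run deterministic, and the nine 2D points are hypothetical stand-ins for reduced cell-movement features.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(points, k):
    """Deterministic initialization (an assumption of this sketch):
    start at the first point, then repeatedly add the point farthest
    from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, iterations=100):
    """Lloyd's algorithm; returns (centroids, within-cluster sum of squares)."""
    centroids = farthest_first(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members; keep the old
        # centroid if a cluster ends up empty.
        updated = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    inertia = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return centroids, inertia

# Hypothetical 2D features forming three well-separated groups.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]

inertias = {k: kmeans(points, k)[1] for k in range(1, 5)}
for k, value in inertias.items():
    print(k, round(value, 3))
```

The elbow is the value of k after which the within-cluster sum of squares stops dropping sharply; for these hypothetical points the drop from k = 2 to k = 3 is large and the drop from k = 3 to k = 4 is small, which points to three clusters.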
60
Clark S, Lomax N, Birkin M, Morris M. A foresight whole systems obesity classification for the English UK biobank cohort. BMC Public Health 2022; 22:349. [PMID: 35180877 PMCID: PMC8856870 DOI: 10.1186/s12889-022-12650-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/10/2021] [Accepted: 01/18/2022] [Indexed: 12/20/2022] Open
Abstract
Background The number of people living with obesity or who are overweight presents a global challenge, and the development of effective interventions is hampered by a lack of research which takes a joined up, whole system, approach that considers multiple elements of the complex obesity system together. We need to better understand the collective characteristics and behaviours of those who are overweight or have obesity and how these differ from those who maintain a healthy weight. Methods Using the UK Biobank cohort we develop an obesity classification system using k-means clustering. Variable selection from the UK Biobank cohort is informed by the Foresight obesity system map across key domains (Societal Influences, Individual Psychology, Individual Physiology, Individual Physical Activity, Physical Activity Environment). Results Our classification identifies eight groups of people, similar in respect to their exposure to known drivers of obesity: ‘Younger, urban hard-pressed’, ‘Comfortable, fit families’, ‘Healthy, active and retirees’, ‘Content, rural and retirees’, ‘Comfortable professionals’, ‘Stressed and not in work’, ‘Deprived with less healthy lifestyles’ and ‘Active manual workers’. Pen portraits are developed to describe the characteristics of these different groups. Multinomial logistic regression is used to demonstrate that the classification can effectively detect groups of individuals more likely to be living with overweight or obesity. The group identified as ‘Comfortable, fit families’ are observed to have a higher proportion of healthy weight, while three groups have increased relative risk of being overweight or having obesity: ‘Active manual workers’, ‘Stressed and not in work’ and ‘Deprived with less healthy lifestyles’. Conclusions This paper presents the first study of UK Biobank participants to adopt this obesity system approach to characterising participants. 
It provides an innovative new approach to better understand the complex drivers of obesity which has the potential to produce meaningful tools for policy makers to better target interventions across the whole system to reduce overweight and obesity. Supplementary Information The online version contains supplementary material available at 10.1186/s12889-022-12650-x.
Affiliation(s)
- Stephen Clark
- Consumer Data Research Centre and School of Geography, University of Leeds, LEEDS, LS2 9JT, UK
- Nik Lomax
- School of Geography and Consumer Data Research Centre, University of Leeds, LEEDS, LS2 9JT, UK
- Mark Birkin
- Consumer Data Research Centre and School of Geography, University of Leeds, LEEDS, LS2 9JT, UK
- Michelle Morris
- School of Medicine and Consumer Data Research Centre, University of Leeds, LEEDS, UK
61
Mansoldo FRP, Berrino E, Guglielmi P, Carradori S, Carta F, Secci D, Supuran CT, Vermelho AB. An innovative spectroscopic approach for qualitative and quantitative evaluation of Mb-CO from myoglobin carbonylation reaction through chemometrics methods. Spectrochim Acta A Mol Biomol Spectrosc 2022; 267:120602. [PMID: 34801390 DOI: 10.1016/j.saa.2021.120602] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/14/2021] [Revised: 09/13/2021] [Accepted: 11/07/2021] [Indexed: 06/13/2023]
Abstract
In this work, an innovative approach using K-means and multivariate curve resolution-purity based algorithm (MCR-Purity) for the evaluation and quantification of carboxymyoglobin (Mb-CO) formation from Deoxy-Myoglobin (Deoxy-Mb) was presented. Through a multilevel multifactor experimental design, samples with different concentrations of Mb-CO were created. The UV-Vis spectra of these samples were submitted to K-means analysis, finding 3 clusters. The mean spectra of the clusters were extracted and it was possible to detect 2 totally differentiable groups through peaks 423 and 434 nm, which are wavelengths related to the Mb-CO and Deoxy-Mb components, respectively. The spectral data were subjected to MCR-Purity analysis. The MCR-Purity result successfully described the analyzed reaction, explaining more than 99.9% of the variance (R2) with a LOF of 1.43%. Then, a predictive model of MbCO was created through the linear relationship between MCR-Purity contributions and known concentrations of MbCO. The performance parameters of the created predictive model were R2CV = 0.98, RMSECV = 0.58 and RPDcv = 7.8 for the training set, and R2P = 0.98, RMSEP = 0.7 and RPDp = 6.8 for the test set. Thus, the predictive model presented an excellent performance considering that the Mb-CO variation is comprised between 0 and 21 µM. Therefore, these results demonstrate that the application of the proposed strategy to the analysis of spectral data presenting overlapping bands is feasible and robust.
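The performance parameters quoted above (R2, RMSE, RPD) can be illustrated with a small sketch. It assumes the common definitions: RMSE as the root mean squared prediction error, R2 as one minus the ratio of residual to total sum of squares, and RPD as the sample standard deviation of the reference values divided by the RMSE. The abstract does not state which exact formulas the authors used, and the concentrations below are hypothetical.

```python
import math
import statistics

def rmse(observed, predicted):
    """Root mean squared error between reference and predicted values."""
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = statistics.fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

def rpd(observed, predicted):
    """Ratio of performance to deviation: SD of reference values / RMSE."""
    return statistics.stdev(observed) / rmse(observed, predicted)

# Hypothetical reference and predicted concentrations (µM); not the paper's data.
known = [0.0, 10.0, 20.0]
estimated = [1.0, 9.0, 21.0]

model_rmse = rmse(known, estimated)
model_r2 = r_squared(known, estimated)
model_rpd = rpd(known, estimated)
```

Under these definitions, a larger RPD means the prediction error is small relative to the spread of the reference values.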
Affiliation(s)
- Felipe R P Mansoldo
  - Federal University of Rio de Janeiro (UFRJ), Institute of Microbiology Paulo de Góes, BIOINOVAR - Biocatalysis, Bioproducts and Bioenergy, Rio de Janeiro, Brazil
- Emanuela Berrino
  - Università degli Studi di Firenze, NEUROFARBA Dept., Sezione di Scienze Farmaceutiche, Via Ugo Schiff 6, 50019 Sesto Fiorentino (Florence), Italy
  - Department of Drug Chemistry and Technologies, Sapienza University of Rome, P.le A. Moro 5, 00185 Rome, Italy
- Paolo Guglielmi
  - Department of Drug Chemistry and Technologies, Sapienza University of Rome, P.le A. Moro 5, 00185 Rome, Italy
- Simone Carradori
  - Department of Pharmacy, "G. d'Annunzio" University of Chieti-Pescara, via dei Vestini 31, 66100 Chieti, Italy
- Fabrizio Carta
  - Università degli Studi di Firenze, NEUROFARBA Dept., Sezione di Scienze Farmaceutiche, Via Ugo Schiff 6, 50019 Sesto Fiorentino (Florence), Italy
- Daniela Secci
  - Department of Drug Chemistry and Technologies, Sapienza University of Rome, P.le A. Moro 5, 00185 Rome, Italy
- Claudiu T Supuran
  - Università degli Studi di Firenze, NEUROFARBA Dept., Sezione di Scienze Farmaceutiche, Via Ugo Schiff 6, 50019 Sesto Fiorentino (Florence), Italy
- Alane B Vermelho
  - Federal University of Rio de Janeiro (UFRJ), Institute of Microbiology Paulo de Góes, BIOINOVAR - Biocatalysis, Bioproducts and Bioenergy, Rio de Janeiro, Brazil
62
Chen J, Fu Y, Hu J, He J. Hypoxia-related gene signature for predicting LUAD patients' prognosis and immune microenvironment. Cytokine 2022; 152:155820. [PMID: 35176657 DOI: 10.1016/j.cyto.2022.155820] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 11/11/2021] [Revised: 01/10/2022] [Accepted: 01/29/2022] [Indexed: 12/11/2022]
Abstract
Lung adenocarcinoma (LUAD) is a prevalent lung cancer histology with high morbidity and mortality. Moreover, assessment approaches for patients' prognoses are still not effective. Based on mRNA expression and clinical data from the Cancer Genome Atlas (TCGA)-LUAD data set, we utilized the hypoxia-related gene set in the MSigDB database to identify hypoxia-related differentially expressed genes (DEGs). On the basis of levels of hypoxia-related DEGs, K-means consensus clustering was introduced to divide LUAD patients into subgroups. After hypoxia-related DEGs were analyzed through univariate, Lasso and multivariate Cox regression analyses, 6 of them were selected to build a prognostic signature for LUAD patients. With the median risk score obtained from the hypoxia-related gene signature as the threshold, LUAD patients were divided into high- and low-risk groups. In addition, Kaplan-Meier curves, receiver operating characteristic (ROC) curves, and univariate and multivariate Cox regression analyses verified that the hypoxia-related gene signature was an important prognostic factor independent of clinical features. Gene set enrichment analysis (GSEA) showed that the high- and low-risk groups differed in activation of the pentose-phosphate pathway and the p53 signaling pathway. CIBERSORT was utilized to assess the infiltration level of each immune cell type in the two groups, indicating differences in infiltration abundance of plasma cells, activated memory CD4+ T cells and M1 macrophages between the high- and low-risk groups. We drew a nomogram for predicting one-, three- and five-year survival of LUAD patients based on risk scores from the hypoxia-related gene signature and six clinical factors. Calibration curves showed a high fit between survival predicted by the nomogram and actual survival.
In conclusion, hypoxia-related gene signature can be introduced for predicting LUAD patients' prognosis and assessment of the patients' immune microenvironment, guiding clinicians to make appropriate decisions during diagnosis and treatment of LUAD patients.
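The median-threshold split described in the abstract can be sketched in a few lines. The patient identifiers and risk scores below are hypothetical, and the tie rule (scores equal to the median count as low risk) is an assumption, since the abstract does not say how ties were handled or how the scores were computed.

```python
import statistics

# Hypothetical per-patient risk scores; not values from the study.
risk_scores = {"P1": 0.42, "P2": 1.35, "P3": 0.88,
               "P4": 2.10, "P5": 0.15, "P6": 1.02}

# The cohort median serves as the threshold between the two groups.
threshold = statistics.median(risk_scores.values())

# Assumed tie rule: strictly above the median is "high", otherwise "low".
groups = {pid: ("high" if score > threshold else "low")
          for pid, score in risk_scores.items()}

high_risk = sorted(pid for pid, grp in groups.items() if grp == "high")
low_risk = sorted(pid for pid, grp in groups.items() if grp == "low")
```

A median split gives two groups of nearly equal size by construction, which is convenient for Kaplan-Meier comparisons but does not by itself show that the threshold is clinically meaningful.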
63
Hu J, Chen J, Zhu P, Hao S, Wang M, Li H, Liu N. Difference and Cluster Analysis on the Carbon Dioxide Emissions in China During COVID-19 Lockdown via a Complex Network Model. Front Psychol 2022; 12:795142. [PMID: 35095680 PMCID: PMC8790068 DOI: 10.3389/fpsyg.2021.795142] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/14/2021] [Accepted: 12/16/2021] [Indexed: 12/23/2022] Open
Abstract
The continuous increase in carbon emissions is a serious challenge all over the world, and many countries are striving to solve this problem. Since 2020, widespread lockdowns imposed in China to prevent the spread of COVID-19 have severely restricted the movement of people and non-essential economic activities, which unexpectedly reduced carbon emissions. This paper aims to analyze the carbon emissions data of 30 provinces in 2020 and provide references for reducing emissions under epidemic lockdown measures. Based on the method of time series visualization, we transform the time series data into complex networks to find the hidden information in these data. We found that the lockdown would bring about a short-term decrease in carbon emissions, and most provinces have a short time point of impact, which is closely related to the level of economic development and industrial structure. The current results provide some insights into the evolution of carbon emissions under COVID-19 lockdown measures and into energy conservation and response to the energy crisis in the post-epidemic era.
Affiliation(s)
- Jun Hu
  - School of Economics and Management, Fuzhou University, Fuzhou, China
- Junhua Chen
  - School of Management Science and Engineering, Central University of Finance and Economics, Beijing, China
- Peican Zhu
  - School of Artificial Intelligence, Optics and Electronics, Northwestern Polytechnical University, Xi'an, China
- Shuya Hao
  - School of Management Science and Engineering, Central University of Finance and Economics, Beijing, China
- Maoze Wang
  - School of Management Science and Engineering, Central University of Finance and Economics, Beijing, China
- Huijia Li
  - School of Science, Beijing Post and Telecommunications University, Beijing, China
- Na Liu
  - School of Management Science and Engineering, Central University of Finance and Economics, Beijing, China
64
Sawalmeh A, Othman NS, Liu G, Khreishah A, Alenezi A, Alanazi A. Power-Efficient Wireless Coverage Using Minimum Number of UAVs. Sensors (Basel) 2021; 22:s22010223. [PMID: 35009766 PMCID: PMC8749821 DOI: 10.3390/s22010223] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/20/2021] [Revised: 12/17/2021] [Accepted: 12/23/2021] [Indexed: 11/16/2022]
Abstract
Unmanned aerial vehicles (UAVs) can be deployed as backup aerial base stations due to cellular outage either during or post natural disaster. In this paper, an approach involving multi-UAV three-dimensional (3D) deployment with power-efficient planning was proposed with the objective of minimizing the number of UAVs used to provide wireless coverage to all outdoor and indoor users that minimizes the required UAV transmit power and satisfies users’ required data rate. More specifically, the proposed algorithm iteratively invoked a clustering algorithm and an efficient UAV 3D placement algorithm, which aimed for maximum wireless coverage using the minimum number of UAVs while minimizing the required UAV transmit power. Two scenarios where users are uniformly and non-uniformly distributed were considered. The proposed algorithm that employed a Particle Swarm Optimization (PSO)-based clustering algorithm resulted in a lower number of UAVs needed to serve all users compared with that when a K-means clustering algorithm was employed. Furthermore, the proposed algorithm that iteratively invoked a PSO-based clustering algorithm and PSO-based efficient UAV 3D placement algorithms reduced the execution time by a factor of ≈1/17 and ≈1/79, respectively, compared to that when the Genetic Algorithm (GA)-based and Artificial Bees Colony (ABC)-based efficient UAV 3D placement algorithms were employed. For the uniform distribution scenario, it was observed that the proposed algorithm required six UAVs to ensure 100% user coverage, whilst the benchmarker algorithm that utilized Circle Packing Theory (CPT) required five UAVs but at the expense of 67% of coverage density.
Affiliation(s)
- Ahmad Sawalmeh
  - Computer Science Department, Northern Border University, Arar 91431, Saudi Arabia
  - Remote Sensing Unit, Northern Border University, Arar 91431, Saudi Arabia
- Noor Shamsiah Othman
  - Department of Electrical and Electronics Engineering, Universiti Tenaga Nasional, Kajang 43000, Selangor, Malaysia
- Guanxiong Liu
  - Department of Electrical and Computer Engineering, Newark College of Engineering, New Jersey Institute of Technology, University Heights, Newark, NJ 07102, USA
- Abdallah Khreishah
  - Department of Electrical and Computer Engineering, Newark College of Engineering, New Jersey Institute of Technology, University Heights, Newark, NJ 07102, USA
- Ali Alenezi
  - Remote Sensing Unit, Northern Border University, Arar 91431, Saudi Arabia
  - Electrical Engineering Department, Northern Border University, Arar 91431, Saudi Arabia
- Abdulaziz Alanazi
  - Electrical Engineering Department, Northern Border University, Arar 91431, Saudi Arabia
65
Duan T, Kuang Z, Wang J, Ma Z. GBDTLRL2D Predicts LncRNA-Disease Associations Using MetaGraph2Vec and K-Means Based on Heterogeneous Network. Front Cell Dev Biol 2021; 9:753027. [PMID: 34977011 PMCID: PMC8718797 DOI: 10.3389/fcell.2021.753027] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/04/2021] [Accepted: 11/22/2021] [Indexed: 12/16/2022] Open
Abstract
In recent years, the long noncoding RNA (lncRNA) has been shown to be involved in many disease processes. The prediction of the lncRNA-disease association is helpful to clarify the mechanism of disease occurrence and bring some new methods of disease prevention and treatment. The current methods for predicting the potential lncRNA-disease association seldom consider the heterogeneous networks with complex node paths, and these methods have the problem of unbalanced positive and negative samples. To solve this problem, a method based on the Gradient Boosting Decision Tree (GBDT) and logistic regression (LR) to predict the lncRNA-disease association (GBDTLRL2D) is proposed in this paper. MetaGraph2Vec is used for feature learning, and negative sample sets are selected by using K-means clustering. The innovation of the GBDTLRL2D is that the clustering algorithm is used to select a representative negative sample set, and the use of MetaGraph2Vec can better retain the semantic and structural features in heterogeneous networks. The average area under the receiver operating characteristic curve (AUC) values of GBDTLRL2D obtained on the three datasets are 0.98, 0.98, and 0.96 in 10-fold cross-validation.
Affiliation(s)
- Zhufang Kuang
  - School of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha, China
66
Alexander N, Alexander DC, Barkhof F, Denaxas S. Identifying and evaluating clinical subtypes of Alzheimer's disease in care electronic health records using unsupervised machine learning. BMC Med Inform Decis Mak 2021; 21:343. [PMID: 34879829 PMCID: PMC8653614 DOI: 10.1186/s12911-021-01693-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/22/2021] [Accepted: 11/15/2021] [Indexed: 02/02/2023] Open
Abstract
BACKGROUND Alzheimer's disease (AD) is a highly heterogeneous disease with diverse trajectories and outcomes observed in clinical populations. Understanding this heterogeneity can enable better treatment, prognosis and disease management. Studies to date have mainly used imaging or cognition data and have been limited in terms of data breadth and sample size. Here we examine the clinical heterogeneity of Alzheimer's disease patients using electronic health records (EHR) to identify and characterise disease subgroups using multiple clustering methods, identifying clusters which are clinically actionable. METHODS We identified AD patients in primary care EHR from the Clinical Practice Research Datalink (CPRD) using a previously validated rule-based phenotyping algorithm. We extracted and included a range of comorbidities, symptoms and demographic features as patient features. We evaluated four different clustering methods (k-means, kernel k-means, affinity propagation and latent class analysis) to cluster Alzheimer's disease patients. We compared clusters on clinically relevant outcomes and evaluated each method using measures of cluster structure, stability, efficiency of outcome prediction and replicability in external data sets. RESULTS We identified 7,913 AD patients, with a mean age of 82 and 66.2% female. We included 21 features in our analysis. We observed 5, 2, 5 and 6 clusters in k-means, kernel k-means, affinity propagation and latent class analysis respectively. K-means was found to produce the most consistent results based on four evaluative measures. We discovered a consistent cluster found in three of the four methods composed of predominantly female, younger disease onset (43% between ages 42-73) diagnosed with depression and anxiety, with a quicker rate of progression compared to the average across other clusters. 
CONCLUSION Each clustering approach produced substantially different clusters and K-Means performed the best out of the four methods based on the four evaluative criteria. However, the consistent appearance of one particular cluster across three of the four methods potentially suggests the presence of a distinct disease subtype that merits further exploration. Our study underlines the variability of the results obtained from different clustering approaches and the importance of systematically evaluating different approaches for identifying disease subtypes in complex EHR.
Affiliation(s)
- Nonie Alexander
  - Institute of Health Informatics, University College London, London, UK
  - Health Data Research UK, London, UK
- Daniel C Alexander
  - Centre for Medical Image Computing, Department of Computer Science, University College London, London, UK
- Frederik Barkhof
  - Centre for Medical Image Computing, Department of Computer Science, University College London, London, UK
  - UCL Institute of Neurology, University College London, London, UK
  - Department of Radiology and Nuclear Medicine, Amsterdam University Medical Centers, Amsterdam, The Netherlands
- Spiros Denaxas
  - Institute of Health Informatics, University College London, London, UK
  - Health Data Research UK, London, UK
  - Alan Turing Institute, London, UK
67
Roni RG, Tsipi H, Ofir BA, Nir S, Robert K. Disease evolution and risk-based disease trajectories in congestive heart failure patients. J Biomed Inform 2021; 125:103949. [PMID: 34875386 DOI: 10.1016/j.jbi.2021.103949] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2021] [Revised: 10/10/2021] [Accepted: 11/03/2021] [Indexed: 11/28/2022]
Abstract
Congestive Heart Failure (CHF) is among the most prevalent chronic diseases worldwide, and is commonly associated with comorbidities and complex health conditions. Consequently, CHF patients are typically hospitalized frequently, and are at a high risk of premature death. Early detection of an envisaged patient disease trajectory is crucial for precision medicine. However, despite the abundance of patient-level data, cardiologists currently struggle to identify disease trajectories and track the evolution patterns of the disease over time, especially in small groups of patients with specific disease subtypes. The present study proposed a five-step method that allows clustering CHF patients, detecting cluster similarity, and identifying disease trajectories, and promises to overcome the existing difficulties. This work is based on a rich dataset of patients' records spanning ten years of hospital visits. The dataset contains all the health information documented in the hospital during each visit, including diagnoses, lab results, clinical data, and demographics. It utilizes an innovative Cluster Evolution Analysis (CEA) method to analyze the complex CHF population where each subject is potentially associated with numerous variables. We have defined sub-groups for mortality risk levels, which we used to characterize patients' disease evolution by refined data clustering in three points in time over ten years, and generating patients' migration patterns across periods. The results elicited 18, 23, and 25 clusters respective to the first, second, and third visits, uncovering clinically interesting small sub-groups of patients. In the following post-processing stage, we identified meaningful patterns. The analysis yielded fine-grained patient clusters divided into several finite risk levels, including several small-sized groups of high-risk patients. Significantly, the analysis also yielded longitudinal patterns where patients' risk levels changed over time. 
Four types of disease trajectories were identified: decline, preserved state, improvement, and mixed-progress. This stage is a unique contribution of the work. The resulting fine partitioning and longitudinal insights promise to significantly assist cardiologists in tailoring personalized interventions to improve care quality. Cardiologists could utilize these results to glean previously undetected relationships between symptoms and disease evolution that would allow a more informed clinical decision-making and effective interventions.
Affiliation(s)
- Shlomo Nir
  - The Leviev Heart Center, Sheba Medical Center, Israel
68
Kamat PV, Sugandhi R, Kumar S. Deep learning-based anomaly-onset aware remaining useful life estimation of bearings. PeerJ Comput Sci 2021; 7:e795. [PMID: 34909464 PMCID: PMC8641573 DOI: 10.7717/peerj-cs.795] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/06/2021] [Accepted: 11/03/2021] [Indexed: 06/01/2023]
Abstract
Remaining Useful Life (RUL) estimation of rotating machinery based on their degradation data is vital for machine supervisors. Deep learning models are effective and popular methods for forecasting when rotating machinery such as bearings may malfunction and ultimately break down. During healthy functioning of the machinery, however, RUL is ill-defined. To address this issue, this study recommends using anomaly monitoring during both RUL estimator training and operation. Essential time-domain data is extracted from the raw bearing vibration data, and deep learning models are used to detect the onset of the anomaly. This further acts as a trigger for data-driven RUL estimation. The study employs an unsupervised clustering approach for anomaly trend analysis and a semi-supervised method for anomaly detection and RUL estimation. The novel combined deep learning-based anomaly-onset aware RUL estimation framework showed enhanced results on the benchmarked PRONOSTIA bearings dataset under non-varying operating conditions. The framework consisting of Autoencoder and Long Short Term Memory variants achieved an accuracy of over 90% in anomaly detection and RUL prediction. In the future, the framework can be deployed under varying operational situations using the transfer learning approach.
Affiliation(s)
- Pooja Vinayak Kamat
  - Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
  - Department of CSE and IT, MIT School of Engineering, MIT-ADT University, Pune, India
- Rekha Sugandhi
  - Department of CSE and IT, MIT School of Engineering, MIT-ADT University, Pune, India
- Satish Kumar
  - Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, India
  - Symbiosis Centre for Applied Artificial Intelligence, Symbiosis International (Deemed University), Pune, India
69
Oukil S, Kasmi R, Mokrani K, García-Zapirain B. Automatic segmentation and melanoma detection based on color and texture features in dermoscopic images. Skin Res Technol 2021; 28:203-211. [PMID: 34779062 PMCID: PMC9907597 DOI: 10.1111/srt.13111] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/23/2021] [Accepted: 09/25/2021] [Indexed: 02/06/2023]
Abstract
PURPOSE Melanoma is known as the most aggressive form of skin cancer and one of the fastest growing malignant tumors worldwide. Several computer-aided diagnosis systems for melanoma have been proposed; still, the algorithms encounter difficulties in the early stage of lesions. This paper aims to discriminate between melanoma and benign skin lesions in dermoscopic images. METHODS The proposed algorithm is based on the color and texture of skin lesions and introduces a novel feature extraction technique. The algorithm uses an automatic segmentation based on k-means, generating a fairly accurate mask for each lesion. The feature extraction consists of existing and novel color and texture attributes measuring how color and texture vary inside the lesion. To find the optimal results, all the attributes are extracted from lesions in five different color spaces (RGB, HSV, Lab, XYZ, and YCbCr) and used as the inputs for three classifiers (K nearest neighbors, support vector machine, and artificial neural network). RESULTS The PH2 set is used to assess the performance of the proposed algorithm. The results of our algorithm are compared to the results of published articles that used the same dataset, and the comparison shows that the proposed method outperforms the state of the art by attaining a sensitivity of 99.25%, specificity of 99.58%, and accuracy of 99.51%. CONCLUSION The final results show that color combined with texture provides powerful and relevant attributes for melanoma detection and shows improvement over the state of the art.
Affiliation(s)
- S Oukil
  - LTII Laboratory University of Bejaia-Algeria, Faculty of Technology, University of Bejaia, Bejaia, Algeria
- R Kasmi
  - LTII Laboratory University of Bejaia-Algeria, Faculty of Technology, University of Bejaia, Bejaia, Algeria
  - Electrical Engineering Department, University of Bouira, Bouira, Algeria
- K Mokrani
  - LTII Laboratory University of Bejaia-Algeria, Faculty of Technology, University of Bejaia, Bejaia, Algeria
70
da Costa JP, Garcia A. New confinement index and new perspective for comparing countries - COVID-19. Comput Methods Programs Biomed 2021; 210:106346. [PMID: 34464767 PMCID: PMC8418097 DOI: 10.1016/j.cmpb.2021.106346] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/03/2021] [Accepted: 08/03/2021] [Indexed: 06/13/2023]
Abstract
BACKGROUND AND OBJECTIVE In the difficult problem of comparing countries regarding their lockdown measures or deaths caused by COVID-19, there is still no agreement on the best strategy to follow. Thus, we propose a new way of comparing countries that avoids the main difficulties in the comparison by using three-dimensional trajectories for this type of data. METHODS We introduce a new index to analyze the level of confinement that each country was subject to over time, based on the Community Mobility Reports published by Google and derived using Principal Component Analysis. Subsequently, by using longitudinal clustering, we divide the European countries into similar groups according to the COVID-19 deaths and also to the confinement index. However, to make the most of the clustering methods, we resort to artificial longitudinal data to evaluate both the methods and the indices. RESULTS By using artificial data, we discover that the Calinski-Harabasz index outperformed other internal indices in indicating the real number of clusters. The tests also suggested that K-means with Euclidean distance was the best method among the ones studied. With the application to both the mobility and fatalities datasets, we found two groups in each one. CONCLUSIONS Our analysis enables us to discover that northern European countries had more mobility during the first confinement and that the deaths caused by COVID-19 started to drop around the 40th day after the first death.
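To make the role of an internal index concrete, the sketch below computes the Calinski-Harabasz score for a given partition: the between-cluster dispersion divided by (k - 1), over the within-cluster dispersion divided by (n - k). The toy points and both labelings are assumptions for illustration; the authors' longitudinal data and software are not shown in the abstract.

```python
def calinski_harabasz(points, labels):
    """Calinski-Harabasz index; needs 2 <= k < n and nonzero within-cluster spread."""
    n = len(points)
    clusters = sorted(set(labels))
    k = len(clusters)
    dim = len(points[0])
    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    between = 0.0
    within = 0.0
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        # Between-cluster dispersion: size-weighted squared distance to the grand mean.
        between += len(members) * sum(
            (centroid[d] - overall[d]) ** 2 for d in range(dim))
        # Within-cluster dispersion: squared distances of members to their centroid.
        within += sum(
            sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in members)
    return (between / (k - 1)) / (within / (n - k))

# Toy data with two obvious groups along the first coordinate (hypothetical).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_score = calinski_harabasz(points, [0, 0, 1, 1])  # matches the visible groups
bad_score = calinski_harabasz(points, [0, 1, 0, 1])   # cuts across them
```

A higher score indicates tighter, better separated clusters, so one common way to pick the number of clusters is to compute the score for several candidate values of k and keep the largest.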
Affiliation(s)
- André Garcia
  - University of Porto, Rua do Campo Alegre, 687, Porto 4169-007, Portugal
71
Jones PJ, Catt M, Davies MJ, Edwardson CL, Mirkes EM, Khunti K, Yates T, Rowlands AV. Feature selection for unsupervised machine learning of accelerometer data physical activity clusters - A systematic review. Gait Posture 2021; 90:120-8. [PMID: 34438293 DOI: 10.1016/j.gaitpost.2021.08.007] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 11/02/2020] [Revised: 03/03/2021] [Accepted: 08/08/2021] [Indexed: 02/02/2023]
Abstract
BACKGROUND Identifying clusters of physical activity (PA) from accelerometer data is important to identify levels of sedentary behaviour and physical activity associated with risks of serious health conditions and time spent engaging in healthy PA. Unsupervised machine learning models can capture PA in everyday free-living activity without the need for labelled data. However, there is scant research addressing the selection of features from accelerometer data. The aim of this systematic review is to summarise feature selection techniques applied in studies concerned with unsupervised machine learning of physical activity obtained from accelerometer-based devices, and to identify commonly used features identified through these techniques. Feature selection methods can reduce the complexity and computational burden of these models by removing less important features and assist in understanding the relative importance of feature sets and individual features in clustering. METHOD We conducted a systematic search of PubMed, Medline, Google Scholar, Scopus, arXiv and Web of Science databases to identify studies published before January 2021 which used feature selection methods to derive PA clusters using unsupervised machine learning models. RESULTS A total of 13 studies were eligible for inclusion within the review. The most popular feature selection techniques were Principal Component Analysis (PCA) and correlation-based methods, with k-means frequently used in clustering accelerometer data. Cluster quality evaluation methods were diverse, including both external (e.g. cluster purity) and internal evaluation measures (silhouette score most frequently). Only four of the 13 studies had more than 25 participants and only four studies included two or more datasets. CONCLUSION There is a need to assess multiple feature selection methods upon large cohort data consisting of multiple (3 or more) PA datasets. The cut-off criteria (e.g. number of components, pairwise correlation value, explained variance ratio for PCA) should be expressly stated along with any hyperparameters used in clustering.
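One way to make a correlation-based cut-off explicit, as the conclusion recommends, is to name it as a constant and apply it in a simple greedy filter. The sketch below is an assumption-laden illustration: the feature names, values, the 0.9 cut-off, and the helper names `pearson` and `drop_correlated` are all hypothetical and do not come from any of the reviewed studies.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_correlated(features, cutoff):
    """Greedy filter: keep a feature only if |r| with every kept feature is below cutoff."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) < cutoff for k in kept):
            kept.append(name)
    return kept

# Hypothetical accelerometer-derived features for five epochs.
features = {
    "mean_accel": [1.0, 2.0, 3.0, 4.0, 5.0],
    "sum_accel": [2.0, 4.0, 6.0, 8.0, 10.0],  # exact multiple of mean_accel
    "peak_freq": [3.0, 1.0, 4.0, 1.0, 5.0],
}

CORRELATION_CUTOFF = 0.9  # stated explicitly so the selection can be reproduced
kept = drop_correlated(features, CORRELATION_CUTOFF)
```

Because the filter is greedy, the result depends on the order in which features are visited, which is one more detail a study would need to report alongside the cut-off itself.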
72
Saeidifar M, Yazdi M, Zolghadrasli A. Performance Improvement in Brain Tumor Detection in MRI Images Using a Combination of Evolutionary Algorithms and Active Contour Method. J Digit Imaging 2021; 34:1209-1224. [PMID: 34561783 DOI: 10.1007/s10278-021-00514-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/07/2021] [Revised: 08/23/2021] [Accepted: 08/31/2021] [Indexed: 10/20/2022] Open
Abstract
The process of treating brain cancer depends on the experience and knowledge of the physician, which may be associated with visual errors or may vary from person to person. For this reason, it is important to utilize an automatic tumor detection algorithm to assist radiologists and physicians in brain tumor diagnosis. The aim of the present study is to automatically detect the location of the tumor in a brain MRI image with high accuracy. To this end, in the proposed algorithm, the skull is first separated from the brain using morphological operators. The image is then segmented by six evolutionary algorithms, i.e., Particle Swarm Optimization (PSO), Artificial Bee Colony (ABC), Genetic Algorithm (GA), Differential Evolution (DE), Harmony Search (HS), and Gray Wolf Optimization (GWO), as well as two other frequently used techniques in the literature, i.e., K-means and Otsu thresholding algorithms. Afterwards, the tumor area is isolated from the brain using the four features extracted from the main tumor. Evaluation of the segmented area revealed that PSO has the best performance compared with the other approaches. The segmented results of PSO are then used as the initial curve for the active contour to precisely specify the tumor boundaries. The proposed algorithm is applied to fifty images with two different types of tumors. Experimental results on T1-weighted brain MRI images show a better performance of the proposed algorithm compared to other evolutionary algorithms, K-means, and Otsu thresholding methods.
Affiliation(s)
- Mahtab Saeidifar
- School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran
- Mehran Yazdi
- School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran
|
73
|
Yang Z, Liu M, Wang B, Wang B. Classification of protein domains based on their three-dimensional shapes (CPD3DS). Synth Syst Biotechnol 2021; 6:224-230. [PMID: 34541344 PMCID: PMC8429105 DOI: 10.1016/j.synbio.2021.08.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/28/2021] [Revised: 08/23/2021] [Accepted: 08/30/2021] [Indexed: 11/13/2022]
Abstract
Protein design has become a powerful method to expand the number of natural proteins and design customized proteins according to demands. Domain-based protein design spares the need to create novel elements from scratch, which makes it a more efficient strategy than scratch-based protein design in designing multi-domain proteins, protein complexes and biomaterials. As the surface shape plays a central role in domain-domain and protein-protein interactions, a global map of the surface shapes of all domains should be very beneficial for domain-based protein design. Therefore, in this study, we characterized the surface shapes of protein domains, collected from CATH and SCOP databases, with their 3D-Zernike descriptors (3DZDs). Then similarities of domain shape features were identified, and all domains were classified accordingly. The preferences of the combinations of domains between different clusters were analyzed in natural proteins from the Protein Data Bank. A user-friendly website, termed CPD3DS, was also developed for storage, retrieval, analyses and visualization of our results. This work not only provides an overall view of protein domain shapes by showing their variety and similarities, but also opens up a new avenue to understand the properties of protein structural domains, and design principles of protein architectures.
Affiliation(s)
- Zhaochang Yang
- School of Life Science and Technology, University of Electronic Science and Technology of China, China
- Mingkang Liu
- School of Life Science and Technology, University of Electronic Science and Technology of China, China
- Bin Wang
- School of Information and Software Engineering, University of Electronic Science and Technology of China, China
- Beibei Wang
- School of Life Science and Technology, University of Electronic Science and Technology of China, China; Centre for Informational Biology, University of Electronic Science and Technology of China, 2006 Xiyuan Road, Chengdu, Sichuan, 611731, China
|
74
|
Wang J, Chen X, Zhao H, Li Y, Liu Z. Fault Feature Extraction for Reciprocating Compressors Based on Underdetermined Blind Source Separation. Entropy (Basel) 2021; 23:1217. [PMID: 34573842 DOI: 10.3390/e23091217] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/30/2021] [Revised: 09/11/2021] [Accepted: 09/13/2021] [Indexed: 11/16/2022]
Abstract
In practical engineering applications, the vibration signals collected by sensors often contain outliers, which seriously degrade the accuracy with which source signals are separated from the observed signals. Mixing matrix estimation is crucial to underdetermined blind source separation (UBSS) because it determines the accuracy of source signal recovery. Therefore, this paper proposes a two-stage clustering method that combines hierarchical clustering and K-means to improve the reliability of the estimated mixing matrix. The proposed method addresses two major problems of the K-means algorithm: the random selection of initial cluster centers and the sensitivity of the algorithm to outliers. First, the observed signals are clustered by hierarchical clustering to obtain the cluster centers. Second, the cosine distance is used to eliminate the outliers that deviate from the cluster centers. Then, the initial cluster centers are obtained by calculating the mean value of each remaining cluster. Finally, the mixing matrix is estimated with the improved K-means, and the sources are recovered using the least squares method. Simulations and reciprocating compressor fault experiments demonstrate the effectiveness of the proposed method.
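The outlier-removal and re-centering steps described in this abstract can be sketched as follows. This is a hedged illustration, not the paper's code: it assumes the hierarchical clustering stage has already produced a label per observation, and the names `refined_centers` and `max_cos_dist` (with its default of 0.2) are hypothetical.

```python
import math

def cosine_distance(u, v):
    """1 - cos(angle between u and v); both vectors are assumed non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def refined_centers(points, labels, max_cos_dist=0.2):
    """Drop points far (in cosine distance) from their rough cluster center,
    then return the mean of the remaining points per cluster.

    `labels` stands in for the output of a first-stage hierarchical clustering.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    centers = {}
    for lab, members in clusters.items():
        rough = mean_vector(members)
        kept = [p for p in members if cosine_distance(p, rough) <= max_cos_dist]
        # Fall back to all members if the threshold removed everything.
        centers[lab] = mean_vector(kept or members)
    return centers
```

The refined centers would then serve as the initial centers for a standard K-means run, which removes the dependence on random initialization mentioned in the abstract.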
|
75
|
Coulombe JC, Mullen ZK, Lynch ME, Stodieck LS, Ferguson VL. Application of machine learning classifiers for microcomputed tomography data assessment of mouse bone microarchitecture. MethodsX 2021; 8:101497. [PMID: 34754768 PMCID: PMC8563473 DOI: 10.1016/j.mex.2021.101497] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/08/2021] [Accepted: 08/22/2021] [Indexed: 01/24/2023]
Abstract
The current standard approach for analyzing cortical bone structure and trabecular bone microarchitecture from micro-computed tomography (microCT) is through classic parametric (e.g., ANOVA, Student's T-test) and nonparametric (e.g., Mann-Whitney U test) statistical tests and the reporting of p-values to indicate significance. However, on their own, these univariate assessments of significance fall prey to a number of weaknesses, including an increased chance of Type 1 error from multiple comparisons. Machine learning classification methods (e.g., unsupervised, k-means cluster analysis and supervised Support Vector Machine classification, SVM) simultaneously utilize an entire dataset comprised of many cortical structure or trabecular microarchitecture measures, thus minimizing bias and Type 1 error that are generated through multiple testing. Through simultaneous evaluation of an entire dataset, k-means and SVM thus provide a complementary approach to classic statistical analysis and enable a more robust assessment of microCT measures.
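The multiple-comparisons problem mentioned in this abstract can be quantified with a short calculation. The sketch below assumes the individual tests are independent, which is a simplification not stated in the abstract; the function name is illustrative.

```python
def familywise_error_rate(alpha, num_tests):
    """Probability of at least one false positive across `num_tests`
    independent tests, each run at significance level `alpha`."""
    return 1.0 - (1.0 - alpha) ** num_tests

# Ten independent tests at alpha = 0.05 give roughly a 40% chance
# of at least one Type 1 error.
rate = familywise_error_rate(0.05, 10)
```

This growth in false-positive risk with the number of univariate tests is the weakness that evaluating all measures jointly is meant to avoid.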
Affiliation(s)
- Jennifer C. Coulombe
- Department of Mechanical Engineering, UCB 427, University of Colorado, Boulder, CO 80309, United States of America
- BioFrontiers Institute, UCB 596, University of Colorado, Boulder, CO 80309, United States of America
- Zachary K. Mullen
- Laboratory for Interdisciplinary Statistical Analysis / Department of Computer Science, UCB 427, University of Colorado, Boulder, CO 80309, United States of America
- Maureen E. Lynch
- Department of Mechanical Engineering, UCB 427, University of Colorado, Boulder, CO 80309, United States of America
- BioFrontiers Institute, UCB 596, University of Colorado, Boulder, CO 80309, United States of America
- Louis S. Stodieck
- Aerospace Engineering Sciences / BioServe Space Technologies, UCB 429, University of Colorado, Boulder, CO 80309, United States of America
- Virginia L. Ferguson
- Department of Mechanical Engineering, UCB 427, University of Colorado, Boulder, CO 80309, United States of America
- BioFrontiers Institute, UCB 596, University of Colorado, Boulder, CO 80309, United States of America
- Aerospace Engineering Sciences / BioServe Space Technologies, UCB 429, University of Colorado, Boulder, CO 80309, United States of America
|
76
|
Shahvaran Z, Kazemi K, Fouladivanda M, Helfroush MS, Godefroy O, Aarabi A. Morphological active contour model for automatic brain tumor extraction from multimodal magnetic resonance images. J Neurosci Methods 2021; 362:109296. [PMID: 34302860 DOI: 10.1016/j.jneumeth.2021.109296] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/04/2019] [Revised: 07/18/2021] [Accepted: 07/19/2021] [Indexed: 11/22/2022]
Abstract
BACKGROUND Brain tumor extraction from magnetic resonance (MR) images is challenging due to variations in the location, shape, size and intensity of tumors. Manual delineation of brain tumors from MR images is time-consuming and prone to human errors. METHOD In this paper, we present a method for automatic tumor extraction from multimodal MR images. Brain tumors are first detected using k-means clustering. A morphological region-based active contour model is then used for tumor extraction using an initial contour defined based on the boundary of the detected brain tumor regions. The contour evolution for tumor extraction was performed using successive application of morphological operators. In our model, a Gaussian distribution was used to model local image intensities. The spatial correlation between neighboring voxels was also modeled using Markov random field. RESULTS The proposed method was evaluated on BraTS 2013 dataset including patients with high-grade and low-grade tumors. In comparison with other active contour based methods, the proposed method yielded better performance on tumor segmentation with mean Dice similarity coefficients of 0.9179 ( ± 0.025) and 0.8910 ( ± 0.042) obtained on high-grade and low-grade tumors, respectively. CONCLUSION The proposed method achieved higher accuracies for brain tumor extraction in comparison to other contour-based methods.
|
77
|
Li S, Song K, Wang S, Liu G, Wen Z, Shang Y, Lyu L, Chen F, Xu S, Tao H, Du Y, Fang C, Mu G. Quantification of chlorophyll-a in typical lakes across China using Sentinel-2 MSI imagery with machine learning algorithm. Sci Total Environ 2021; 778:146271. [PMID: 33721636 DOI: 10.1016/j.scitotenv.2021.146271] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 12/12/2020] [Revised: 02/07/2021] [Accepted: 02/28/2021] [Indexed: 06/12/2023]
Abstract
Lake eutrophication has attracted the attention of the government and the general public. Chlorophyll-a (Chl-a) is a key indicator of algal biomass and eutrophication, and many efforts have been devoted to establishing accurate algorithms for estimating Chl-a concentrations. In this study, a total of 273 samples were collected from 45 typical lakes across China during 2017-2019. We propose machine learning algorithms (a linear regression model (LR), a support vector machine model (SVM) and a Catboost model (CB)) that integrate a broad-scale dataset of lake biogeochemical characteristics with the Multispectral Imager (MSI) product to retrieve the Chl-a concentration. A K-means clustering approach was used to cluster the 273 normalized water-leaving reflectance spectra [Rrs(λ)], extracted from MSI imagery with the Case 2 Regional Coast Colour (C2RCC) processor, into three groups. The pH, electrical conductivity (EC), total suspended matter (TSM) and dissolved organic carbon (DOC) differed significantly among the three clustering groups (p < 0.05), indicating that water quality parameters have an integrated impact on Rrs(λ) spectra. Among the machine learning algorithms, SVM gave a better fit between measured and derived Chl-a (calibration: slope = 0.81, R² = 0.91; validation: slope = 1.21, R² = 0.88). In contrast, nine documented Chl-a algorithms gave poor results (1:1 linear fitting slope < 0.4 and R² < 0.70) with the same training and test datasets. This demonstrates that machine learning provides a robust model for quantifying Chl-a concentration. Further, for the three Rrs(λ) clustering groups obtained by k-means, the Chl-a SVM model gave the best retrieval performance for cluster 1 (slope = 0.71, R² = 0.78), followed by cluster 3 (slope = 0.77, R² = 0.64) and cluster 2 (slope = 0.67, R² = 0.50). These differences are related to the low TSM and high DOC levels of the cluster 1 and cluster 3 Rrs(λ) spectra, which reduce the influence of particles on the Rrs(λ) signal in the red bands. Our results highlight the quantification of lake Chl-a concentrations using MSI imagery and SVM, which enables large-scale monitoring and is more appropriate for medium and low Chl-a levels. The remote estimation of Chl-a based on artificial intelligence can provide an effective and robust way to monitor lake eutrophication on a macro scale and offers a better approach to elucidating the response of lake ecosystems to global change.
Affiliation(s)
- Sijia Li
- Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun 130102, PR China
- Kaishan Song
- Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun 130102, PR China.
- Shuai Wang
- Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun 130102, PR China
- Ge Liu
- Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun 130102, PR China
- Zhidan Wen
- Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun 130102, PR China
- Yingxin Shang
- Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun 130102, PR China
- Lili Lyu
- Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun 130102, PR China; University of Chinese Academy of Sciences, Beijing 100049, PR China
- Fangfang Chen
- Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun 130102, PR China; Key Laboratory of Vegetation Ecology, Ministry of Education, Institute of Grassland Science, Northeast Normal University, Changchun 130024, PR China
- Shiqi Xu
- Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun 130102, PR China; Key Laboratory of Vegetation Ecology, Ministry of Education, Institute of Grassland Science, Northeast Normal University, Changchun 130024, PR China
- Hui Tao
- Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun 130102, PR China; University of Chinese Academy of Sciences, Beijing 100049, PR China
- Yunxia Du
- Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun 130102, PR China; University of Chinese Academy of Sciences, Beijing 100049, PR China
- Chong Fang
- Faculty of Infrastructure Engineering, Dalian University of Technology, Dalian 116024, PR China
- Guangyi Mu
- Jilin Provincial Key Laboratory of Municipal Wastewater Treatment, Changchun Institute of Technology, Changchun 130012, PR China
|
78
|
Chan SW, Hu WH, Ouyang YC, Su HC, Lin CY, Chang YC, Hsu CC, Chen KW, Liu CC, Chien SH. Quantitative Measurement of Breast Tumors Using Intravoxel Incoherent Motion (IVIM) MR Images. J Pers Med 2021; 11:656. [PMID: 34357123 DOI: 10.3390/jpm11070656] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/11/2021] [Revised: 07/07/2021] [Accepted: 07/10/2021] [Indexed: 12/13/2022]
Abstract
Breast magnetic resonance imaging (MRI) is currently a widely used clinical examination tool. Recently, MR diffusion-related technologies, such as intravoxel incoherent motion diffusion weighted imaging (IVIM-DWI), have been extensively studied by breast cancer researchers and gradually adopted in clinical practice. In this study, we explored automatic tumor detection by IVIM-DWI. We considered the acquired IVIM-DWI data as a hyperspectral image cube and used a well-known hyperspectral subpixel target detection technique: constrained energy minimization (CEM). Two extended CEM methods—kernel CEM (K-CEM) and iterative CEM (I-CEM)—were employed to detect breast tumors. The K-means and fuzzy C-means clustering algorithms were also evaluated. The quantitative measurement results were compared to dynamic contrast-enhanced T1-MR imaging as ground truth. All four methods were successful in detecting tumors for all the patients studied. The clustering methods were found to be faster, but the CEM methods demonstrated better performance according to both the Dice and Jaccard metrics. These unsupervised tumor detection methods have the advantage of potentially eliminating operator variability. The quantitative results can be measured by using ADC, signal attenuation slope, D*, D, and PF parameters to classify tumors of mass, non-mass, cyst, and fibroadenoma types.
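The Dice and Jaccard metrics used here to compare detections against ground truth can be written in a few lines. This is a minimal sketch that represents each mask as a set of pixel indices; that representation and the function names are assumptions made for illustration.

```python
def dice_coefficient(predicted, truth):
    """Dice = 2 |A ∩ B| / (|A| + |B|) for two sets of pixel indices."""
    a, b = set(predicted), set(truth)
    if not a and not b:
        return 1.0  # two empty masks are treated as a perfect match
    return 2 * len(a & b) / (len(a) + len(b))

def jaccard_index(predicted, truth):
    """Jaccard = |A ∩ B| / |A ∪ B| for two sets of pixel indices."""
    a, b = set(predicted), set(truth)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

The two scores are monotonically related by J = D / (2 - D), so they rank methods identically on a single image, although averages over many images can differ.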
|
79
|
Langton S, Dixon A, Farrell G. Small area variation in crime effects of COVID-19 policies in England and Wales. J Crim Justice 2021; 75:101830. [PMID: 36536682 PMCID: PMC9753224 DOI: 10.1016/j.jcrimjus.2021.101830] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/20/2021] [Revised: 06/15/2021] [Accepted: 06/16/2021] [Indexed: 05/12/2023]
Abstract
PURPOSE The aim of this study is to examine small area variation in crime trajectories during the COVID-19 pandemic in England and Wales. While we know how police-recorded crime responded to lockdown policies at the 'macro' level, less is known about the extent to which these trends were experienced uniformly at localized spatial scales. METHODS Longitudinal k-means clustering is used to unpick local area variation in police notifiable offences across England and Wales. We describe the clusters identified in terms of their spatial patterning, opportunity structures and crime type profile. RESULTS We find that in most small areas, crime remained fairly stable throughout the pandemic. Instead, a small number of meso-level areas contributed a disproportionately large amount to the macro-level trend. These were typically city centers with plentiful pre-pandemic crime opportunities, dominated by theft and shoplifting offences. CONCLUSION Findings offer support for opportunity theories of crime and for a mobility theory of crime during the pandemic. We explore potential implications for policy, theory and further research.
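One common way to cluster trajectories is to treat each area's time series as a vector and run ordinary k-means (Lloyd's algorithm) on those vectors. The sketch below does exactly that; it is an assumption about the general technique, not a reproduction of the study's procedure, and the explicit `initial_centers` argument is an illustrative simplification that keeps the example deterministic.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans_trajectories(series, initial_centers, max_iter=100):
    """Lloyd's algorithm on time series of equal length.

    Returns (labels, centers), where each center is the mean trajectory
    of the series assigned to it.
    """
    centers = [list(c) for c in initial_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest center for every trajectory.
        new_labels = [
            min(range(len(centers)), key=lambda j: squared_distance(s, centers[j]))
            for s in series
        ]
        if new_labels == labels:
            break  # assignments stable, so the algorithm has converged
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for j in range(len(centers)):
            members = [s for s, lab in zip(series, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

With monthly crime counts per area as the input series, a cluster whose mean trajectory dips sharply would correspond to the small group of areas driving the aggregate trend, while a flat mean trajectory would correspond to the stable majority.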
|
80
|
Peng CY, Raihany U, Kuo SW, Chen YZ. Sound Detection Monitoring Tool in CNC Milling Sounds by K-Means Clustering Algorithm. Sensors (Basel) 2021; 21:4288. [PMID: 34201656 PMCID: PMC8296841 DOI: 10.3390/s21134288] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 05/10/2021] [Revised: 06/07/2021] [Accepted: 06/16/2021] [Indexed: 02/01/2023]
Abstract
Computer numerical control (CNC) machines are used in the manufacturing industry to produce components of a desired shape quickly. In the milling process carried out by CNC machines, vibrations sometimes occur that cause unwanted cracks or damage, which, if left unchecked, will lead to more severe damage. For this reason, this study describes how to monitor and analyze the sound produced by a CNC machine during the milling process. This study uses six sound sample videos from YouTube, covering two modes: (1) an operating mode with three different shapes on the XY, XZ, and XYZ axes, and (2) a mode based on material differences, namely wood, Styrofoam, and plastic. The sound generated from all samples of the CNC milling processes is detected using a sound detection program designed in LabVIEW with a simple microphone. The detected sound is analyzed using the fast Fourier transform (FFT) in spectral measurements, which produces the amplitude and frequency of the sound in real time in the form of a graph. All frequency results obtained from the sound detection monitoring tool are imported into the K-means clustering algorithm, which classifies the frequencies into resonant frequency and noise. Based on the experiments conducted, the sound detection program can detect sounds with a significant level of sensitivity.
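The spectral step in this pipeline amounts to finding which frequency carries the most energy in a recorded signal. The sketch below uses a naive O(n²) discrete Fourier transform from the standard library instead of an FFT; it yields the same spectrum, only more slowly. The function name and parameters are illustrative and unrelated to the LabVIEW program described in the abstract.

```python
import cmath
import math

def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the largest-magnitude DFT bin,
    ignoring the DC component (bin 0)."""
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n
```

Dominant frequencies extracted from successive recordings could then be passed to a clustering step such as K-means to separate resonant peaks from noise.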
Affiliation(s)
- Cheng-Yu Peng
- Department of Electronic Engineering, National Chin-Yi University of Technology, Taichung 41170, Taiwan
- Ully Raihany
- Department of Electronic Engineering, National Chin-Yi University of Technology, Taichung 41170, Taiwan
- Shu-Wei Kuo
- Department of Electrical Engineering, National Taipei University of Technology, Taipei 10608, Taiwan
- Yen-Zuo Chen
- Department of Electronic Engineering, National Chin-Yi University of Technology, Taichung 41170, Taiwan
|
81
|
Guo L, Yang J, Song N. Corrigendum: Spectral Clustering Algorithm for Cognitive Diagnostic Assessment. Front Psychol 2021; 12:706512. [PMID: 34220660 PMCID: PMC8248678 DOI: 10.3389/fpsyg.2021.706512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2021] [Accepted: 05/11/2021] [Indexed: 11/13/2022]
Abstract
[This corrects the article DOI: 10.3389/fpsyg.2020.00944.].
Affiliation(s)
- Lei Guo
- Faculty of Psychology, Southwest University, Chongqing, China; Southwest University Branch, Collaborative Innovation Center of Assessment Toward Basic Education Quality, Chongqing, China
- Jing Yang
- School of Mathematics and Statistics, Northeast Normal University, Changchun, China
- Naiqing Song
- Southwest University Branch, Collaborative Innovation Center of Assessment Toward Basic Education Quality, Chongqing, China; Basic Education Research Center, Southwest University, Chongqing, China; Urban and Rural Education Research Center, Southwest University, Chongqing, China
|
82
|
Vera JF, Macías R. On the Behaviour of K-Means Clustering of a Dissimilarity Matrix by Means of Full Multidimensional Scaling. Psychometrika 2021; 86:489-513. [PMID: 34008128 DOI: 10.1007/s11336-021-09757-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/20/2019] [Revised: 09/25/2020] [Accepted: 02/26/2021] [Indexed: 06/12/2023]
Abstract
In this article, we analyse the usefulness of multidimensional scaling in relation to performing K-means clustering on a dissimilarity matrix, when the dimensionality of the objects is unknown. In this situation, traditional algorithms cannot be used, and so K-means clustering procedures are being performed directly on the basis of the observed dissimilarity matrix. Furthermore, the application of criteria originally formulated for two-mode data sets to determine the number of clusters depends on their possible reformulation in a one-mode situation. The linear invariance property in K-means clustering for squared dissimilarities, together with the use of multidimensional scaling, is investigated to determine the cluster membership of the observations and to address the problem of selecting the number of clusters in K-means for a dissimilarity matrix. In particular, we analyse the performance of K-means clustering on the full dimensional scaling configuration and on the equivalently partitioned configuration related to a suitable translation of the squared dissimilarities. A Monte Carlo experiment is conducted in which the methodology examined is compared with the results obtained by procedures directly applicable to a dissimilarity matrix.
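K-means can be evaluated on a dissimilarity matrix without coordinates because, for Euclidean distances, the within-cluster sum of squares of a cluster C equals the sum of squared pairwise dissimilarities over all ordered pairs in C divided by 2|C|. The sketch below computes the criterion that way; it illustrates this standard identity and is not necessarily the exact formulation studied in the article. The function name is hypothetical.

```python
def wcss_from_dissimilarities(sq_dissim, labels):
    """Within-cluster sum of squares from squared dissimilarities only.

    `sq_dissim[i][j]` is the squared dissimilarity between objects i and j;
    `labels[i]` is the cluster of object i.
    """
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    total = 0.0
    for members in clusters.values():
        # Sum over all ordered pairs, so each unordered pair counts twice.
        pair_sum = sum(sq_dissim[i][j] for i in members for j in members)
        total += pair_sum / (2 * len(members))
    return total
```

For points 0, 2, 10 and 14 on a line, split into the clusters {0, 2} and {10, 14}, the centroid-based sum of squares is 2 + 8 = 10, and the function recovers the same value from the pairwise squared distances alone.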
Affiliation(s)
- Rodrigo Macías
- Centro De Investigación En Matemáticas, Unidad Monterrey, Mexico
|
83
|
Riana D, Rahayu S, Hasan M, Anton. Comparison of segmentation and identification of swietenia mahagoni wood defects with augmentation images. Heliyon 2021; 7:e07417. [PMID: 34307930 PMCID: PMC8258648 DOI: 10.1016/j.heliyon.2021.e07417] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/22/2020] [Revised: 01/04/2021] [Accepted: 06/23/2021] [Indexed: 11/23/2022]
Abstract
The largest income for Southeast Asian countries comes from the export of wood products. The potential for timber exports from Indonesia continues to increase each year. This potential needs to be sustained by maintaining quality so that trust and good cooperation with partner countries can continue. Wood quality is closely related to wood defects: the faster wood defects are detected, the faster the quality of the wood can be determined. The wood industry, which still relies on manual inspection, is also very susceptible to human eye fatigue. Technology is developing rapidly to support human productive activities, and image processing offers a way to detect wood defects. This study aims to identify swietenia mahagoni wood defects using the Euclidean distance method on six texture and shape features extracted with GLCM (Gray Level Co-occurrence Matrix), namely metric, eccentricity, contrast, correlation, energy, and homogeneity. The images were first segmented with the better of thresholding and k-means segmentation, and identification produced an average accuracy of 95.33% with an F1 score of 0.95. The dataset used is a primary dataset with a total of 54 images of 3 types of wood defects, namely growing skin defects on wood ends, rotten wood eye on the body, and healthy wood eye on the body. Cross-validation is also applied to test the reliability of the proposed model. With 3-fold cross-validation, the optimal average accuracy is 88.90%. Validation on another, similar dataset was also carried out by identifying potato leaf defects, resulting in an average accuracy of 92.86%, with the best 3-fold cross-validation achieving an average accuracy of 83.33%. Image augmentation is also carried out to generate additional images for testing the reliability of the proposed method: rotating the images by 45, 90, 120, and 180 degrees produces 84 augmented images, for a total of 138 images, on which the average accuracy is 80%.
Affiliation(s)
- Dwiza Riana
- Magister of Computer Science, Universitas Nusa Mandiri, Indonesia
- Sri Rahayu
- Magister of Computer Science, Universitas Nusa Mandiri, Indonesia
- Muhamad Hasan
- Magister of Computer Science, Universitas Nusa Mandiri, Indonesia
- Anton
- Magister of Computer Science, Universitas Nusa Mandiri, Indonesia
|
84
|
Sun Y, Lan Z, Xue SW, Zhao L, Xiao Y, Kuai C, Lin Q, Bao K. Brain state-dependent dynamic functional connectivity patterns in attention-deficit/hyperactivity disorder. J Psychiatr Res 2021; 138:569-575. [PMID: 33991995 DOI: 10.1016/j.jpsychires.2021.05.010] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 01/22/2021] [Revised: 04/01/2021] [Accepted: 05/01/2021] [Indexed: 11/28/2022]
Abstract
Patients with attention-deficit/hyperactivity disorder (ADHD) show aberrant static brain networks; however, whether ADHD patients can be identified from dynamic information in brain networks is not yet clear. Data were obtained from 32 boys with ADHD and 52 sex- and age-matched typically developing controls; a sliding-window method was used to assess dynamic functional connectivity (dFC), and two recurring dFC states (the hot and cool states) were then identified using a k-means clustering method. The results showed that ADHD patients had significant changes in the occurrence, transition times and dFC strength of the cingulo-opercular network (CON) and sensorimotor network (SMN) in the cool state. The severity of ADHD symptoms showed significant correlations with the regional amplitude of dFC fluctuations in the ventral medial prefrontal cortex (vmPFC), anterior medial prefrontal cortex (amPFC) and precuneus. These findings could provide insights into the state-dependent dynamic changes in large-scale brain connectivity and network configurations in ADHD.
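A sliding-window estimate of dynamic connectivity between two regional time series can be sketched as a sequence of windowed Pearson correlations. The code below is a simplified illustration under assumed names and parameters (`window`, `step`); the study's actual window length, tapering, and preprocessing are not given in the abstract and are not reproduced here.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences.
    Returns NaN if either sequence is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)

def sliding_window_correlation(x, y, window, step=1):
    """Correlation of x and y inside each window of `window` samples,
    advancing by `step` samples between windows."""
    return [
        pearson(x[start:start + window], y[start:start + window])
        for start in range(0, len(x) - window + 1, step)
    ]
```

Stacking such windowed correlations for every pair of regions gives one connectivity vector per window, and clustering those vectors with k-means is a common way to obtain recurring states like the two described above.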
Affiliation(s)
- Yunkai Sun
- Center for Cognition and Brain Disorders, The Affiliated Hospital of Hangzhou Normal University, Hangzhou, 311121, China; Institute of Psychological Science, Hangzhou Normal University, Hangzhou, 311121, China; Zhejiang Key Laboratory for Research in Assessment of Cognitive Impairments, Hangzhou, 311121, China; College of Education, Hangzhou Normal University, Hangzhou, 311121, China; Department of Psychiatry, Sir Run Run Shaw Hospital, School of Medicine, Zhejiang University, Hangzhou, 310016, China
- Zhihui Lan
- Center for Cognition and Brain Disorders, The Affiliated Hospital of Hangzhou Normal University, Hangzhou, 311121, China; Institute of Psychological Science, Hangzhou Normal University, Hangzhou, 311121, China; Zhejiang Key Laboratory for Research in Assessment of Cognitive Impairments, Hangzhou, 311121, China; College of Education, Hangzhou Normal University, Hangzhou, 311121, China
- Shao-Wei Xue
- Center for Cognition and Brain Disorders, The Affiliated Hospital of Hangzhou Normal University, Hangzhou, 311121, China; Institute of Psychological Science, Hangzhou Normal University, Hangzhou, 311121, China; Zhejiang Key Laboratory for Research in Assessment of Cognitive Impairments, Hangzhou, 311121, China; College of Education, Hangzhou Normal University, Hangzhou, 311121, China.
- Lei Zhao
- Center for Cognition and Brain Disorders, The Affiliated Hospital of Hangzhou Normal University, Hangzhou, 311121, China; Institute of Psychological Science, Hangzhou Normal University, Hangzhou, 311121, China; Zhejiang Key Laboratory for Research in Assessment of Cognitive Impairments, Hangzhou, 311121, China; College of Education, Hangzhou Normal University, Hangzhou, 311121, China
- Yang Xiao
- Center for Cognition and Brain Disorders, The Affiliated Hospital of Hangzhou Normal University, Hangzhou, 311121, China; Institute of Psychological Science, Hangzhou Normal University, Hangzhou, 311121, China; Zhejiang Key Laboratory for Research in Assessment of Cognitive Impairments, Hangzhou, 311121, China; College of Education, Hangzhou Normal University, Hangzhou, 311121, China
- Changxiao Kuai
- Center for Cognition and Brain Disorders, The Affiliated Hospital of Hangzhou Normal University, Hangzhou, 311121, China; Institute of Psychological Science, Hangzhou Normal University, Hangzhou, 311121, China; Zhejiang Key Laboratory for Research in Assessment of Cognitive Impairments, Hangzhou, 311121, China; College of Education, Hangzhou Normal University, Hangzhou, 311121, China
- Qiaoyuan Lin
- Center for Cognition and Brain Disorders, The Affiliated Hospital of Hangzhou Normal University, Hangzhou, 311121, China; Institute of Psychological Science, Hangzhou Normal University, Hangzhou, 311121, China; Zhejiang Key Laboratory for Research in Assessment of Cognitive Impairments, Hangzhou, 311121, China; College of Education, Hangzhou Normal University, Hangzhou, 311121, China
- Kangchen Bao
- College of Education, Hangzhou Normal University, Hangzhou, 311121, China
85
Fan J, Qin X, He R, Ma J, Wei Q. Gene expression profiles for an immunoscore model in bone and soft tissue sarcoma. Aging (Albany NY) 2021; 13:13708-13725. [PMID: 33946044 PMCID: PMC8202872 DOI: 10.18632/aging.202956] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/10/2020] [Accepted: 12/18/2020] [Indexed: 12/11/2022]
Abstract
Background: Immune infiltration is a prognostic marker for clinical outcomes in various solid tumors. However, reports that focus on bone and soft tissue sarcoma are rare. The study aimed to analyze and identify how immune components influence prognosis and to develop a novel prognostic system for sarcomas. Methods: We retrieved the gene expression data from 3 online databases (GEO, TCGA, and TARGET). The immune fraction was estimated using the CIBERSORT algorithm. After that, we re-clustered samples by K-means and constructed an immunoscore using the least absolute shrinkage and selection operator (LASSO) Cox regression model. Next, to confirm the prognostic value, nomograms were constructed. Results: 334 samples diagnosed with 8 tumor types (including osteosarcoma) were involved in our analysis. Patients were next re-clustered into three subgroups (OS, SAR1, and SAR2) through immune composition. Survival analysis showed a significant difference between the two soft tissue groups: patients with a higher proportion of CD8+ T cells, macrophages M1, and mast cells had favorable outcomes (p=0.0018). Immunoscore models were successfully established in OS and SAR2 groups consisting of 12 and 9 cell fractions, respectively. We found that the immunoscore was an independent factor for overall survival time. Patients with a higher immunoscore had a poor prognosis (p<0.0001). Patients with metastatic lesions scored higher than their counterparts with localized tumors (p<0.05). Conclusions: Immune fractions could be a useful tool for the classification and prognosis of bone and soft tissue sarcoma patients. This proposed immunoscore showed a promising impact on survival prediction.
Affiliation(s)
- Jingyuan Fan
- Department of Orthopedics, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Xinyi Qin
- School of Graduate, Guangxi Medical University, Nanning, Guangxi, China
- Rongquan He
- Department of Oncology, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Jie Ma
- Department of Oncology, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Qingjun Wei
- Department of Orthopedics, The First Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China
86
Chan AHE, Chaisiri K, Saralamba S, Morand S, Thaenkham U. Assessing the suitability of mitochondrial and nuclear DNA genetic markers for molecular systematics and species identification of helminths. Parasit Vectors 2021; 14:233. [PMID: 33933158 DOI: 10.1186/s13071-021-04737-y] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 02/13/2021] [Accepted: 04/21/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Genetic markers are employed widely in molecular studies, and their utility depends on the degree of sequence variation, which dictates the type of application for which they are suited. Consequently, the suitability of a genetic marker for any specific application is complicated by its properties and usage across studies. To provide a yardstick for future users, in this study we assess the suitability of genetic markers for molecular systematics and species identification in helminths and provide an estimate of the cut-off genetic distances per taxonomic level. METHODS We assessed four classes of genetic markers, namely nuclear ribosomal internal transcribed spacers, nuclear rRNA, mitochondrial rRNA and mitochondrial protein-coding genes, based on certain properties that are important for species identification and molecular systematics. For molecular identification, these properties are inter-species sequence variation; length of reference sequences; easy alignment of sequences; and easy to design universal primers. For molecular systematics, the properties are: average genetic distance from order/suborder to species level; the number of monophyletic clades at the order/suborder level; length of reference sequences; easy alignment of sequences; easy to design universal primers; and absence of nucleotide substitution saturation. Estimation of the cut-off genetic distances was performed using the 'K-means' clustering algorithm. RESULTS The nuclear rRNA genes exhibited the lowest sequence variation, whereas the mitochondrial genes exhibited relatively higher variation across the three groups of helminths. Also, the nuclear and mitochondrial rRNA genes were the best possible genetic markers for helminth molecular systematics, whereas the mitochondrial protein-coding and rRNA genes were suitable for molecular identification. We also revealed that a general gauge of genetic distances might not be adequate, using evidence from the wide range of genetic distances among nematodes. CONCLUSION This study assessed the suitability of DNA genetic markers for application in molecular systematics and molecular identification of helminths. We provide a novel way of analyzing genetic distances to generate suitable cut-off values for each taxonomic level using the 'K-means' clustering algorithm. The estimated cut-off genetic distance values, together with the summary of the utility and limitations of each class of genetic markers, are useful information that can benefit researchers conducting molecular studies on helminths.
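As a rough illustration of the cut-off estimation described in this abstract, the sketch below runs a one-dimensional K-means on a handful of made-up genetic distances and takes the midpoint between adjacent centroids as a cut-off. The data values, the function names `kmeans_1d` and `cutoffs`, the choice of k = 3, and the midpoint rule are all assumptions for illustration, not the authors' procedure.

```python
def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm on scalar values; returns the sorted centroids."""
    data = sorted(values)
    # Deterministic start: pick k points spread evenly over the sorted data.
    centroids = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in data:
            # Assign each value to its nearest centroid.
            nearest = min(range(k), key=lambda j: abs(x - centroids[j]))
            groups[nearest].append(x)
        # Recompute each centroid as the mean of its group (keep it if empty).
        updated = [sum(g) / len(g) if g else centroids[j]
                   for j, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def cutoffs(centroids):
    """Midpoints between adjacent centroids (an assumed cut-off rule)."""
    return [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]


# Hypothetical pairwise genetic distances, not taken from the study.
distances = [0.01, 0.02, 0.03, 0.10, 0.12, 0.14, 0.30, 0.32, 0.34]
centers = kmeans_1d(distances, k=3)
limits = cutoffs(centers)
print("centroids:", centers)
print("cut-offs:", limits)
```

With three clusters the two cut-offs separate three hypothetical taxonomic levels; in practice the number of clusters and the distance measure would come from the study design.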
87
Haweel R, Shalaby A, Mahmoud A, Seada N, Ghoniemy S, Ghazal M, Casanova MF, Barnes GN, El-Baz A. A robust DWT-CNN-based CAD system for early diagnosis of autism using task-based fMRI. Med Phys 2021; 48:2315-2326. [PMID: 33378589 DOI: 10.1002/mp.14692] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/21/2020] [Revised: 11/27/2020] [Accepted: 12/17/2020] [Indexed: 02/06/2023] Open
Abstract
PURPOSE Task-based fMRI (TfMRI) is a diagnostic imaging modality for observing the effects of a disease or other condition on the functional activity of the brain. Autism spectrum disorder (ASD) is a pervasive developmental disorder associated with impairments in social and linguistic abilities. Machine learning algorithms have been widely utilized for brain imaging aiming for objective ASD diagnostics. Recently, deep learning methods have been gaining more attention for fMRI classification. The goal of this paper is to develop a convolutional neural network (CNN)-based framework to help in global diagnosis of ASD using TfMRI data that are collected from a response to speech experiment. METHODS To achieve this goal, the proposed framework adopts a novel imaging marker integrating both spatial and temporal information that are related to the functional activity of the brain. The developed pipeline consists of three main components. In the first step, the collected TfMRI data are preprocessed and parcellated using the Harvard-Oxford probabilistic atlas included with the fMRIB Software Library (FSL). Second, a group analysis using FSL is performed between ASD and typically developing (TD) children to identify significantly activated brain areas in response to the speech task. In order to reduce brain spatial dimensionality, a K-means clustering technique is performed on such significant brain areas. Informative blood oxygen level-dependent (BOLD) signals are extracted from each cluster. A compression step for each extracted BOLD signal using discrete wavelet transform (DWT) has been proposed. The adopted wavelets are similar to the expected hemodynamic response, which enables DWT to compress the BOLD signal while highlighting its activation information. Finally, a deep learning 2D CNN network is used to classify the patients as ASD or TD based on extracted features from the previous step. RESULTS Preliminary results on 100 TfMRI datasets (50 ASD, 50 TD) achieve 80% correct global classification using tenfold cross validation (with sensitivity = 84%, specificity = 76%). CONCLUSION The experimental results show the high accuracy of the proposed framework and hold promise for the presented framework as a helpful adjunct to currently used ASD diagnostic tools.
Affiliation(s)
- Reem Haweel
- BioImaging Laboratory, Department of Bioengineering, University of Louisville, KY, 40208, USA
- Computer Systems Department, Faculty of Computer and Information Sciences, University of Ain Shams, Cairo, 11566, Egypt
- Ahmed Shalaby
- BioImaging Laboratory, Department of Bioengineering, University of Louisville, KY, 40208, USA
- Ali Mahmoud
- BioImaging Laboratory, Department of Bioengineering, University of Louisville, KY, 40208, USA
- Noha Seada
- Computer Systems Department, Faculty of Computer and Information Sciences, University of Ain Shams, Cairo, 11566, Egypt
- Said Ghoniemy
- Computer Systems Department, Faculty of Computer and Information Sciences, University of Ain Shams, Cairo, 11566, Egypt
- Mohammed Ghazal
- Department of Electrical and Computer Engineering, Abu Dhabi University, Abu Dhabi, 59911, UAE
- Manuel F Casanova
- Biomedical Sciences, University of South Carolina, Greenville, SC, 29607, USA
- Gregory N Barnes
- Department of Neurology, University of Louisville, Louisville, KY, 40208, USA
- Ayman El-Baz
- Department of Bioengineering, University of Louisville, KY, 40208, USA
88
Babu KR, Nagajaneyulu PV, Prasad KS. Brain Tumor Segmentation of T1w MRI Images Based on Clustering Using Dimensionality Reduction Random Projection Technique. Curr Med Imaging 2021; 17:331-341. [PMID: 32652918 DOI: 10.2174/1573405616666200712180521] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/22/2019] [Revised: 05/18/2020] [Accepted: 05/28/2020] [Indexed: 11/22/2022]
Abstract
BACKGROUND Early diagnosis of a brain tumor may increase life expectancy. Magnetic resonance imaging (MRI) accompanied by several segmentation algorithms is preferred as a reliable method for assessment. The availability of high-dimensional medical image data during diagnosis places a heavy computational burden, and a suitable pre-processing step is required for lower-dimensional representation. The storage requirement and complexity of image data are also a concern. To address this concern, the random projection technique (RPT) is widely used as a multivariate approach for data reduction. AIM This study mainly focuses on T1-weighted MRI image clustering for brain tumor segmentation with dimension reduction by using the conventional principal component analysis (PCA) and RPT. METHODS Two clustering algorithms, K-means and fuzzy c-means (FCM), were used for brain tumor detection. The primary study objective was to present a comparison of the two clustering methods between MRI images subjected to PCA and RPT. In addition to the original dimension of 512 × 512, three other image sizes, 256 × 256, 128 × 128, and 64 × 64, were used to determine the effect of the methods. RESULTS In terms of average reconstruction, Euclidean distance, and segmentation distance errors, the RPT produced better results than the PCA method for all the clustered images from clustering techniques. CONCLUSION According to the values of performance metrics, RPT supported fuzzy c-means in achieving the best clustering performance and provided significant results for each new size of the MRI images.
Affiliation(s)
- K Rajesh Babu
- Department of Electronics and Communication Engineering, Faculty of KL University, Guntur, India
- K Satya Prasad
- Vignan's Foundation for Science, Technology & Research, Guntur, India
89
Ashour AS, Eissa MM, Wahba MA, Elsawy RA, Elgnainy HF, Tolba MS, Mohamed WS. Ensemble-based bag of features for automated classification of normal and COVID-19 CXR images. Biomed Signal Process Control 2021; 68:102656. [PMID: 33897803 PMCID: PMC8057743 DOI: 10.1016/j.bspc.2021.102656] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/15/2021] [Revised: 03/30/2021] [Accepted: 04/16/2021] [Indexed: 11/10/2022]
Abstract
The medical and scientific communities are currently trying to treat infected patients and develop vaccines for preventing a future outbreak. In healthcare, machine learning has proven to be an efficient technology for helping to combat COVID-19. Hospitals are now overwhelmed with the increased number of COVID-19 cases, and given patients’ confidentiality and rights, it becomes hard to assemble quality medical image datasets in a timely manner. For COVID-19 diagnosis, several traditional computer-aided detection systems based on classification techniques were proposed. The bag-of-features (BoF) model has shown promising potential in this domain. Thus, this work developed an ensemble-based BoF classification system for COVID-19 detection. In this model, we proposed an ensemble at the classification step of the BoF. The proposed system was evaluated and compared to different classification systems for different numbers of visual words to evaluate their effect on the classification efficiency. The results proved the superiority of the proposed ensemble-based BoF for the classification of normal and COVID-19 chest X-ray (CXR) images compared to other classifiers.
Affiliation(s)
- Amira S Ashour
- Department of Electronics and Electrical Communications Engineering, Faculty of Engineering, Tanta University, Egypt
- Merihan M Eissa
- Department of Electronics and Electrical Communications Engineering, Faculty of Engineering, Tanta University, Egypt
- Maram A Wahba
- Department of Electronics and Electrical Communications Engineering, Faculty of Engineering, Tanta University, Egypt
- Radwa A Elsawy
- Department of Electronics and Electrical Communications Engineering, Faculty of Engineering, Tanta University, Egypt; Department of Electronics and Communication Engineering, Alexandria Higher Institute of Engineering & Technology, Egypt
- Waleed S Mohamed
- Department of Internal Medicine, Faculty of Medicine, Tanta University, Tanta, Egypt
90
Garg P, Joshi D. A region-specific clustering approach to investigate risk-factors in mortality rate during COVID-19: comprehensive statistical analysis from 208 countries. J Med Eng Technol 2021; 45:284-289. [PMID: 33750249 DOI: 10.1080/03091902.2021.1893398] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/11/2023]
Abstract
Since the outbreak of the novel coronavirus, COVID-19 has spread rapidly across the globe. However, the symptoms of the disease have varied widely; thus, there is an urgent need to stratify the high-risk categories of people who are more prone to be affected by this deadly virus, which would be beneficial for health care. Using open-access data and machine learning algorithms, this paper aims to cluster countries into groups with similar profiles with respect to country-level pre-COVID-19 pandemic parameters. The purpose of the data analysis is to measure the extent to which these major risk factors determine the mortality rate due to coronavirus disease 2019. An unsupervised machine learning model (k-means) was employed for two hundred and eight countries to define data-driven clusters based on thirteen country-level parameters. After performing the one-way ANOVA for comparing the clusters in terms of total cases, total deaths, total cases per population, total deaths per population, and death rate, the paradigm with four and seven clusters showed the best ability to stratify the countries according to total cases per population and death rate, with p-values of less than 0.05 and 0.001, respectively. However, the model could not stratify countries in total deaths/cases and total deaths per population.
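The cluster comparison step mentioned in this abstract relies on a one-way ANOVA. A minimal sketch of the F statistic follows, assuming each cluster is represented by a list of per-country values; the function name `one_way_anova_f` and the numbers are hypothetical and are not data from the study.

```python
def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA: between-group over within-group variance."""
    k = len(groups)                       # number of clusters
    n = sum(len(g) for g in groups)       # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of the cluster means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Within-group sum of squares: spread of the values around their own cluster mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical death rates (%) for three clusters of countries.
clusters = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat = one_way_anova_f(clusters)
print("F =", f_stat)
```

A large F suggests the clusters differ in the compared quantity; turning F into a p-value needs the F distribution with (k - 1, n - k) degrees of freedom, which is left out of this sketch.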
Affiliation(s)
- Poojita Garg
- Computer Science and Engineering, University Institute of Engineering and Technology, Panjab University, Chandigarh, India
- Deepak Joshi
- Centre for Biomedical Engineering, IIT Delhi, New Delhi, India; Department of Biomedical Engineering, AIIMS Delhi, New Delhi, India
91
Chaudhary L, Singh B. Community detection using unsupervised machine learning techniques on COVID-19 dataset. Soc Netw Anal Min 2021; 11:28. [PMID: 33717366 PMCID: PMC7943333 DOI: 10.1007/s13278-021-00734-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 08/31/2020] [Revised: 12/30/2020] [Accepted: 02/12/2021] [Indexed: 01/20/2023]
Abstract
COVID-19 has been considered to be the most destructive pandemic ever to have happened in the history of mankind. The worldwide research community has made a tenacious effort to carry out research on COVID-19 to analyse its impact on the economic, medical and sociological fields. They are trying to solve many crucial issues related to this disease and derive strategies to deal with this global pandemic. In this paper, we have analysed the trend, the countries affected regionally and the variation of cases at the country level on a COVID-19 dataset. We have used principal component analysis (PCA) on the COVID-19 dataset variables to reduce the dimensionality and find the most significant variables. Further, we have unveiled the hidden community structure of countries by applying the unsupervised clustering approach, K-means. We have compared the results with the K-means method. The communities achieved after applying the PCA are more precise. The resulting communities can be beneficial to researchers, scientists, sociologists, different policy makers and managers of the health sector.
Affiliation(s)
- Buddha Singh
- Jawaharlal Nehru University, New Delhi, 110067 India
92
Kazemipoor M, Valizadeh F, Jambarsang S. Three-dimensional pattern of inflammatory periapical lesion extension in the premolar's region: an application of K-means clustering. Curr Med Imaging 2021; 17:1151-1158. [PMID: 33632108 DOI: 10.2174/1573405617666210225090213] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/11/2020] [Revised: 01/01/2021] [Accepted: 01/06/2021] [Indexed: 11/22/2022]
Abstract
BACKGROUND Cone-beam computed tomography (CBCT) provides better diagnosis of endodontic lesions. INTRODUCTION The present study assessed the pattern of periapical lesion extension in premolar teeth using CBCT. METHOD In this descriptive study, 330 roots in the regions of maxillary and mandibular premolars were evaluated. Maximum periapical lesion extensions in the three orthogonal planes (axial, coronal and sagittal) were measured and recorded in millimeters. Measurements were compared based on gender, dental arch, tooth type and root. Statistical analysis was performed using repeated measure ANOVA, Bonferroni, Chi-square tests and clustering data analysis (K-means method). The significance level was set at 0.05. RESULT There were significant differences between the lesion expansions in the three dimensional planes (p-value<0.001). The highest average of lesion extension in the premolar regions of the examined population was reported in the vertical dimension (4.1±1.3), followed by the horizontal buccolingual dimension (3.4±1.1) and the horizontal mesiodistal dimension (3.1±1.0), respectively. According to independent variables, in the premolar region only tooth roots showed significant differences in the lesion extension (p-value=0.002). Clustering data analysis showed that the majority of the participants were categorized in a cluster with lower lesion extension. Based on clustering data analysis, the small lesions were significantly observed in the first premolar and buccal roots. CONCLUSION Since the periapical lesion extension in the buccolingual dimension, which could not be detected in the 2-D imaging techniques, was rather high in the region of premolar teeth, CBCT, as a 3-D imaging technique, is a suitable option for the precise evaluation of periapical lesion extension. Also, the majority of the lesions in this tooth area are small and located in the buccal roots.
Affiliation(s)
- Maryam Kazemipoor
- Department of Endodontics, School of Dentistry, Shahid Sadoughi University of Medical Sciences, Yazd, Iran
- Fatemeh Valizadeh
- Department of Endodontics, School of Dentistry, Shahid Sadoughi University of Medical Sciences, Yazd, Iran
- Sara Jambarsang
- Department of Biostatistics and Epidemiology, School of Public Health, Shahid Sadoughi University of Medical Sciences, Yazd, Iran
93
Li Y, Zeng X, Lin CW, Tseng GC. Simultaneous estimation of cluster number and feature sparsity in high-dimensional cluster analysis. Biometrics 2021; 78:574-585. [PMID: 33621349 DOI: 10.1111/biom.13449] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/13/2020] [Revised: 02/02/2021] [Accepted: 02/03/2021] [Indexed: 11/28/2022]
Abstract
Estimating the number of clusters (K) is a critical and often difficult task in cluster analysis. Many methods have been proposed to estimate K, including some top performers using resampling approach. When performing cluster analysis in high-dimensional data, simultaneous clustering and feature selection is needed for improved interpretation and performance. To our knowledge, little has been studied for simultaneous estimation of K and feature sparsity parameter in a high-dimensional exploratory cluster analysis. In this paper, we propose a resampling method to bridge this gap and evaluate its performance under the sparse K-means clustering framework. The proposed target function balances between sensitivity and specificity of clustering evaluation of pairwise subjects from clustering of full and subsampled data. Through extensive simulations, the method performs among the best over classical methods in estimating K in low-dimensional data. For high-dimensional simulation data, it also shows superior performance to simultaneously estimate K and feature sparsity parameter. Finally, we evaluated the methods in four microarray, two RNA-seq, one SNP, and two nonomics datasets. The proposed method achieves better clustering accuracy with fewer selected predictive genes in almost all real applications.
Affiliation(s)
- Yujia Li
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, Pennsylvania
- Xiangrui Zeng
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, Pennsylvania
- Chien-Wei Lin
- Division of Biostatistics, Medical College of Wisconsin, Wauwatosa, Wisconsin
- George C Tseng
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, Pennsylvania
94
Pothula KR, Geraets JA, Ferber II, Schröder GF. Clustering polymorphs of tau and IAPP fibrils with the CHEP algorithm. Prog Biophys Mol Biol 2021; 160:16-25. [PMID: 33556421 DOI: 10.1016/j.pbiomolbio.2020.11.007] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/07/2020] [Revised: 11/16/2020] [Accepted: 11/24/2020] [Indexed: 01/03/2023]
Abstract
Recent steps towards automation have improved the quality and efficiency of the entire cryo-electron microscopy workflow, from sample preparation to image processing. Most of the image processing steps are now quite automated, but there are still a few steps which need the specific intervention of researchers. One such step is the identification and separation of helical protein polymorphs at early stages of image processing. Here, we tested and evaluated our recent clustering approach on three datasets containing amyloid fibrils, demonstrating that the proposed unsupervised clustering method automatically and effectively identifies the polymorphs from cryo-EM images. As an automated polymorph separation method, it has the potential to complement automated helical picking, which typically cannot easily distinguish between polymorphs with subtle differences in morphology, and is therefore a useful tool for the image processing and structure determination of helical proteins.
Affiliation(s)
- Karunakar R Pothula
- Institute of Biological Information Processing (IBI-7: Structural Biochemistry) and JuStruct, Jülich Center for Structural Biology, Forschungszentrum Jülich, 52425, Jülich, Germany
- James A Geraets
- Institute of Biological Information Processing (IBI-7: Structural Biochemistry) and JuStruct, Jülich Center for Structural Biology, Forschungszentrum Jülich, 52425, Jülich, Germany
- Inda I Ferber
- Institute of Biological Information Processing (IBI-7: Structural Biochemistry) and JuStruct, Jülich Center for Structural Biology, Forschungszentrum Jülich, 52425, Jülich, Germany
- Gunnar F Schröder
- Institute of Biological Information Processing (IBI-7: Structural Biochemistry) and JuStruct, Jülich Center for Structural Biology, Forschungszentrum Jülich, 52425, Jülich, Germany; Physics Department, Heinrich-Heine-Universität Düsseldorf, 40225, Düsseldorf, Germany.
95
Qin F, Zuo T, Wang X. CCpos: WiFi Fingerprint Indoor Positioning System Based on CDAE-CNN. Sensors (Basel) 2021; 21:1114. [PMID: 33562754 PMCID: PMC7915958 DOI: 10.3390/s21041114] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 01/13/2021] [Revised: 01/28/2021] [Accepted: 02/01/2021] [Indexed: 11/16/2022]
Abstract
WiFi is widely used for indoor positioning because of its advantages such as long transmission distance and ease of use indoors. To improve the accuracy and robustness of indoor WiFi fingerprint localization technology, this paper proposes a positioning system, CCpos (CDAE-CNN Positioning), which is based on a convolutional denoising autoencoder (CDAE) and a convolutional neural network (CNN). In the offline stage, this system applies the K-means algorithm to extract the validation set from the full training set. In the online stage, the RSSI is first denoised and key features are extracted by the CDAE. Then the location estimation is output by the CNN. In this paper, the Alcala Tutorial 2017 dataset and UJIIndoorLoc are adopted to verify the performance of the CCpos system. The experimental results show that our system has excellent noise immunity and generalization performance. The mean positioning errors on the Alcala Tutorial 2017 dataset and the UJIIndoorLoc are 1.05 m and 12.4 m, respectively.
Affiliation(s)
- Feng Qin
- College of Information Science and Engineering, Wuhan University of Science and Technology, Wuhan 430081, China
- Tao Zuo
- College of Information Science and Engineering, Wuhan University of Science and Technology, Wuhan 430081, China
- Engineering Research Center for Metallurgical Automation and Detecting Technology of Ministry of Education, Wuhan University of Science and Technology, Wuhan 430081, China
- Xing Wang
- Engineering Research Center for Metallurgical Automation and Detecting Technology of Ministry of Education, Wuhan University of Science and Technology, Wuhan 430081, China
96
Henao-Rojas JC, Rosero-Alpala MG, Ortiz-Muñoz C, Velásquez-Arroyo CE, Leon-Rueda WA, Ramírez-Gil JG. Machine Learning Applications and Optimization of Clustering Methods Improve the Selection of Descriptors in Blackberry Germplasm Banks. Plants (Basel) 2021; 10:247. [PMID: 33525314 PMCID: PMC7911707 DOI: 10.3390/plants10020247] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/06/2020] [Revised: 11/27/2020] [Accepted: 11/30/2020] [Indexed: 11/16/2022]
Abstract
Machine learning (ML) and its multiple applications have comparative advantages for improving the interpretation of knowledge on different agricultural processes. However, there are challenges that impede proper usage, as can be seen in phenotypic characterizations of germplasm banks. The objective of this research was to test and optimize different analysis methods based on ML for the prioritization and selection of morphological descriptors of Rubus spp. A total of 55 descriptors were evaluated in 26 genotypes, and the weight of each one and its discriminating capacity were determined. ML methods such as random forest (RF), support vector machines in the linear and radial forms, and neural networks were optimized and compared. Subsequently, the results were validated with two discriminating methods and their variants: hierarchical agglomerative clustering and K-means. The results indicated that RF presented the highest accuracy (0.768) of the methods evaluated, selecting 11 descriptors based on the purity (Gini index), importance, number of connected trees, and significance (p value < 0.05). Additionally, the K-means method with optimized descriptors based on RF had greater discriminating power on Rubus spp. accessions according to the evaluated statistics. This study presents one application of ML for the optimization of specific morphological variables for plant germplasm bank characterization.
Affiliation(s)
- Juan Camilo Henao-Rojas
- Corporación Colombiana de Investigación Agropecuaria—AGROSAVIA, Centro de Investigación La Selva- Km 7, 250047 Ríonegro, Colombia; (J.C.H.-R.); (M.G.R.-A.); (C.O.-M.); (C.E.V.-A.)
- María Gladis Rosero-Alpala
- Corporación Colombiana de Investigación Agropecuaria—AGROSAVIA, Centro de Investigación La Selva- Km 7, 250047 Ríonegro, Colombia; (J.C.H.-R.); (M.G.R.-A.); (C.O.-M.); (C.E.V.-A.)
- Carolina Ortiz-Muñoz
- Corporación Colombiana de Investigación Agropecuaria—AGROSAVIA, Centro de Investigación La Selva- Km 7, 250047 Ríonegro, Colombia; (J.C.H.-R.); (M.G.R.-A.); (C.O.-M.); (C.E.V.-A.)
| | - Carlos Enrique Velásquez-Arroyo
- Corporación Colombiana de Investigación Agropecuaria—AGROSAVIA, Centro de Investigación La Selva- Km 7, 250047 Ríonegro, Colombia; (J.C.H.-R.); (M.G.R.-A.); (C.O.-M.); (C.E.V.-A.)
| | - William Alfonso Leon-Rueda
- Departamento de Agronomía, Facultad de Ciencias Agrarias, Universidad Nacional de Colombia, 111321 Sede Bogotá, Colombia;
| | - Joaquín Guillermo Ramírez-Gil
- Departamento de Agronomía, Facultad de Ciencias Agrarias, Universidad Nacional de Colombia, 111321 Sede Bogotá, Colombia;
| |
Collapse
|
97
|
Sharma S, Quinn D, Melenhorst JJ, Pruteanu-Malinici I. High-Dimensional Immune Monitoring for Chimeric Antigen Receptor T Cell Therapies. Curr Hematol Malig Rep 2021; 16:112-6. [PMID: 33449291 DOI: 10.1007/s11899-020-00602-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 12/29/2020] [Indexed: 10/22/2022]
Abstract
PURPOSE OF REVIEW: High-dimensional flow cytometry experiments have become a method of choice for high-throughput integration and characterization of cell populations. Here, we present a summary of state-of-the-art R-based pipelines used for differential analyses of cytometry data, largely based on chimeric antigen receptor (CAR) T cell therapies. These pipelines are built from publicly available R libraries, put together in a systematic and functional fashion, and are therefore free of cost. RECENT FINDINGS: In recent years, existing tools tailored to analyze complex high-dimensional data such as single-cell RNA sequencing (scRNAseq) have been successfully ported to cytometry studies due to the similar nature of flow cytometry and scRNAseq platforms. Existing environments like Cytobank (Kotecha et al., 2010), FlowJo (FlowJo™ Software) and FCS Express (https://denovosoftware.com) already offer a variety of these ported tools, but they either come at a premium or are fairly complicated for an inexperienced user to manage. To mitigate these limitations, experienced cytometrists and bioinformaticians usually incorporate these functions into an RShiny (https://shiny.rstudio.com) application that offers a user-friendly, intuitive environment for analyzing flow cytometry data. Computational tools and Shiny-based tools are well suited to the ever-growing dimensionality and complexity of flow cytometry data, offering a dynamic yet user-friendly exploratory space tailored to bridge the gap between the experimental lab world and the computational, machine learning space.
|
98
|
Li M, Wang Q, Shen Y, Zhu T. Customer relationship management analysis of outpatients in a Chinese infectious disease hospital using drug-proportion recency-frequency-monetary model. Int J Med Inform 2020; 147:104373. [PMID: 33418439 DOI: 10.1016/j.ijmedinf.2020.104373] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2020] [Revised: 12/23/2020] [Accepted: 12/27/2020] [Indexed: 11/19/2022]
Abstract
BACKGROUND: Identifying patient types with different economic values can be useful for hospital development. OBJECTIVE: This work uses the theory of customer relationship management (CRM) to analyze the outpatients of a hospital for infectious diseases in Shanghai, China. METHODS: A total of 2,271,020 data elements of outpatients in the research unit between August 2009 and December 2019 were extracted, analyzed and cleaned to obtain 171,107 valid data elements (1 element per person). The main diseases were viral hepatitis B (VHB) and acquired immunodeficiency syndrome (AIDS), and the average percentage of drug expenditure was 80.39 %. We expanded the classic RFM (R: recency, F: frequency, M: monetary) model in CRM to the dRFM (d: percentage of drug expenditure) model. We selected the best clustering algorithm from the K-means, Kohonen and two-step clustering methods to find the optimal model for distinguishing the types of patients with different economic values, and the best decision-making algorithm from the C5.0, classification and regression tree (CART), CHAID and QUEST algorithms to verify the model. RESULTS: Two rounds of K-means clustering analysis were performed on three models (RFM, RFM + dRFM and dRFM), and 97,855 data elements were retained. The RFM + dRFM model was the optimal model, clustering the patients into 3 types: potential patients (24.2 %) to be retained, with a high drug expenditure and the last visit more than 19.06 months ago; high-value patients (24.5 %) to be attracted, with the last visit about 6.66 months ago; and basal patients (51.3 %) to be kept, with the last visit about 3.7 months ago. The model was then verified using the C5.0 decision tree algorithm with an accuracy rate of 99.97 %. CONCLUSION: This objective CRM analysis of the patients in the hospital for infectious diseases using the dRFM model accurately identified different types of patients, providing an objective and effective basis for hospital management.
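To make the dRFM variables in the abstract above concrete, here is a minimal sketch that derives d, R, F and M per patient from visit records. It is not the authors' implementation: the `Visit` record layout, the choice to measure recency in days from a reference date, and the definition of d as cumulative drug cost divided by cumulative total cost are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Visit:
    patient_id: str    # hypothetical record layout
    day: date
    total_cost: float
    drug_cost: float

def drfm(visits, reference_day):
    """Return {patient_id: (d, recency_days, frequency, monetary)}."""
    acc = {}
    for v in visits:
        last, freq, total, drug = acc.get(v.patient_id, (v.day, 0, 0.0, 0.0))
        acc[v.patient_id] = (
            max(last, v.day),        # most recent visit seen so far
            freq + 1,                # F: number of visits
            total + v.total_cost,    # M: cumulative spending
            drug + v.drug_cost,      # numerator of d
        )
    features = {}
    for pid, (last, freq, total, drug) in acc.items():
        recency_days = (reference_day - last).days   # R (assumed unit: days)
        d = drug / total if total > 0 else 0.0       # d: drug share of spending
        features[pid] = (d, recency_days, freq, total)
    return features

visits = [
    Visit("p1", date(2019, 1, 10), 100.0, 80.0),
    Visit("p1", date(2019, 6, 1), 300.0, 240.0),
    Visit("p2", date(2018, 3, 5), 50.0, 10.0),
]
features = drfm(visits, reference_day=date(2019, 12, 31))
```

The resulting tuples would typically be standardized before being passed to a clustering method such as K-means, since the four variables are on very different scales.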
Affiliation(s)
- Min Li: Nanjing University of Aeronautics and Astronautics, College of Economics and Management, Nanjing, Jiangsu, 211106, China; Shanghai Public Health Clinical Center, Fudan University, Shanghai, 201508, China.
- Qunwei Wang: Nanjing University of Aeronautics and Astronautics, College of Economics and Management, Nanjing, Jiangsu, 211106, China.
- Yinzhong Shen: Shanghai Public Health Clinical Center, Fudan University, Shanghai, 201508, China.
- TongYu Zhu: Shanghai Public Health Clinical Center, Fudan University, Shanghai, 201508, China.
|
99
|
Abstract
Characterizing long‐term prescription data is challenging due to the time‐varying nature of drug use. Conventional approaches summarize time‐varying data into categorical variables based on simple measures, such as cumulative dose, while ignoring patterns of use. The resulting loss of information can lead to misclassification and biased estimates of the exposure‐outcome association. We introduce a classification method to characterize longitudinal prescription data with an unsupervised machine learning algorithm. We used administrative databases covering virtually all 1.3 million residents of Manitoba, explicitly designed features to describe the average dose, proportion of days covered (PDC), dose change, and dose variability, and clustered the resulting feature space using K‐means clustering. We applied this method to metformin use in diabetes patients. We identified 27,786 metformin users and showed that the feature distributions of their metformin use are stable across varying lengths of follow‐up and that these distributions have clear interpretations. We found six distinct metformin user groups: patients with intermittent use, decreasing dose, increasing dose, high dose, and two medium dose groups (one with stable dose and one with highly variable use). Patients in the varying and decreasing dose groups had a higher chance of progression of diabetes than other patients. The method presented in this paper allows for characterization of drug use into distinct and clinically relevant groups in a way that cannot be obtained from merely classifying use by quantiles of overall use.
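The four features named in the abstract above (average dose, PDC, dose change, dose variability) can be sketched as follows. The abstract does not give the exact definitions, so the ones used here are assumptions: doses are averaged over covered days only, dose change is the difference in mean dose between the second and first half of follow-up, and variability is the population standard deviation over covered days. The dispensation tuple layout is also hypothetical.

```python
from statistics import mean, pstdev

def daily_doses(dispensations, follow_up_days):
    """Expand (start_day, days_supply, daily_dose) tuples into a per-day series."""
    doses = [0.0] * follow_up_days
    for start, supply, dose in dispensations:
        for day in range(start, min(start + supply, follow_up_days)):
            doses[day] = dose  # a later dispensation overrides an overlapping one
    return doses

def prescription_features(dispensations, follow_up_days):
    """Compute assumed versions of the four features for one patient."""
    doses = daily_doses(dispensations, follow_up_days)
    covered = [x for x in doses if x > 0]
    if not covered:
        return {"pdc": 0.0, "avg_dose": 0.0,
                "dose_change": 0.0, "dose_variability": 0.0}
    half = follow_up_days // 2
    first = [x for x in doses[:half] if x > 0]
    second = [x for x in doses[half:] if x > 0]
    change = (mean(second) if second else 0.0) - (mean(first) if first else 0.0)
    return {
        "pdc": len(covered) / follow_up_days,  # proportion of days covered
        "avg_dose": mean(covered),
        "dose_change": change,
        "dose_variability": pstdev(covered),
    }

# Hypothetical patient: 30 days at 500 mg, a gap, then 30 days at 1000 mg.
f = prescription_features([(0, 30, 500.0), (50, 30, 1000.0)], follow_up_days=100)
```

Each patient's feature dictionary becomes one point in the feature space that a clustering algorithm such as K-means would then partition.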
Affiliation(s)
- Christiaan H Righolt: Vaccine and Drug Evaluation Centre, Department of Community Health Sciences, University of Manitoba, Winnipeg, MB, Canada
- Geng Zhang: Vaccine and Drug Evaluation Centre, Department of Community Health Sciences, University of Manitoba, Winnipeg, MB, Canada
- Salaheddin M Mahmud: Vaccine and Drug Evaluation Centre, Department of Community Health Sciences, University of Manitoba, Winnipeg, MB, Canada
|
100
|
Ulfenborg B, Karlsson A, Riveiro M, Andersson CX, Sartipy P, Synnergren J. Multi-assignment clustering: Machine learning from a biological perspective. J Biotechnol 2021; 326:1-10. [PMID: 33285150 DOI: 10.1016/j.jbiotec.2020.12.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2020] [Accepted: 12/03/2020] [Indexed: 11/21/2022]
Abstract
A common approach for analyzing large-scale molecular data is to cluster objects sharing similar characteristics. This assumes that genes with highly similar expression profiles are likely participating in a common molecular process. Biological systems are extremely complex and challenging to understand, with proteins having multiple functions that sometimes need to be activated or expressed in a time-dependent manner. Thus, the strategies applied for clustering these molecules into groups are of key importance for translating data into biologically interpretable findings. Here we implemented a multi-assignment clustering (MAsC) approach that allows molecules to be assigned to multiple clusters, rather than to a single one as in commonly used clustering techniques. When applied to high-throughput transcriptomics data, MAsC increased the power of the downstream pathway analysis and allowed identification of pathways with high biological relevance to the experimental setting and the biological systems studied. Multi-assignment clustering also reduced noise in the clustering partition by excluding genes with a low correlation to all of the resulting clusters. Together, these findings suggest that our methodology facilitates translation of large-scale molecular data into biological knowledge. The method is made available as an R package on GitLab (https://gitlab.com/wolftower/masc).
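The two behaviours described in the abstract above, assigning a gene to several clusters and excluding genes that correlate poorly with every cluster, can be illustrated with a simple correlation threshold. This is only a sketch of the general idea and not the MAsC package's algorithm: the threshold value, the use of Pearson correlation against fixed cluster centroids, and all names and data are assumptions.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def multi_assign(profiles, centroids, threshold=0.8):
    """Assign each gene to every cluster whose centroid it correlates with.

    Genes below the threshold for all centroids get an empty list,
    i.e. they are left out of the partition.
    """
    return {
        gene: [name for name, centroid in centroids.items()
               if pearson(profile, centroid) >= threshold]
        for gene, profile in profiles.items()
    }

# Hypothetical expression profiles over four time points.
centroids = {"up": [1, 2, 3, 4], "late": [1, 1, 2, 4]}
profiles = {
    "g1": [2, 4, 6, 8],   # correlates with both centroids
    "g2": [4, 3, 2, 1],   # anti-correlated with both, so excluded
    "g3": [1, 1, 1, 5],   # correlates with "late" only
}
result = multi_assign(profiles, centroids)
```

Here `g1` lands in both clusters, `g3` in one, and `g2` in none, which mirrors the multi-assignment and noise-exclusion behaviours in miniature.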
|