1
John M, Shaiba H. Identification of self-care problem in children using machine learning. Heliyon 2024; 10:e26977. [PMID: 38463780 PMCID: PMC10923687 DOI: 10.1016/j.heliyon.2024.e26977] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2023] [Revised: 02/14/2024] [Accepted: 02/22/2024] [Indexed: 03/12/2024]
Abstract
Identifying self-care problems in children is challenging for medical professionals because the task is complex and time-consuming. The worldwide shortage of occupational therapists makes it harder still. Machine learning methods have helped reduce the complexity of problems in diverse fields. This paper uses machine learning models to identify whether a child suffers from self-care problems, using the SCADI dataset. The dataset is high-dimensional and imbalanced. It was first reduced to a lower dimensionality. An imbalanced dataset is likely to degrade the performance of machine learning models; to address this, the SMOTE oversampling method was used to reduce the wide variation in the class distribution. The classifiers used were Naïve Bayes, J48, and random forest. The random forest classifier trained on SMOTE-balanced data achieved the best classification performance, with a balanced accuracy of 99%. The classification model outperformed the existing expert systems.
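As a rough illustration of the balanced accuracy metric reported in this abstract, the minimal Python sketch below computes it as the mean of per-class recall. The function name and the toy labels are assumptions made for illustration; they are not taken from the paper or from the SCADI dataset.

```python
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally regardless of size."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)  # number of true instances per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per class
    return sum(hits[c] / n for c, n in support.items()) / len(support)


# Hypothetical imbalanced toy data: nine negatives, one positive.
y_true = [0] * 9 + [1]
y_pred = [0] * 10  # a model that always predicts the majority class

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy, balanced_accuracy(y_true, y_pred))
```

On this toy data, the always-majority model reaches 0.9 plain accuracy but only 0.5 balanced accuracy, which is why the latter is the more informative figure for imbalanced data.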
Affiliation(s)
- Maya John, Artificial Intelligence and Data Analytics (AIDA) Lab, College of Computer and Information Sciences, Prince Sultan University, Riyadh, Saudi Arabia
- Hadil Shaiba, Department of Computer Science, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
2
Ali R, Hussain J, Lee SW. Multilayer perceptron-based self-care early prediction of children with disabilities. Digit Health 2023; 9:20552076231184054. [PMID: 37426585 PMCID: PMC10328031 DOI: 10.1177/20552076231184054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Accepted: 06/07/2023] [Indexed: 07/11/2023]
Abstract
Early identification of children with self-care impairments is one of the key challenges professional therapists face due to the complex and time-consuming detection process using relevant self-care activities. Due to the complex nature of the problem, machine-learning methods have been widely applied in this area. In this study, a feed-forward artificial neural network (ANN)-based self-care prediction methodology, called multilayer perceptron (MLP)-progressive, has been proposed. The proposed methodology integrates unsupervised instance-based resampling and randomizing preprocessing techniques to MLP for improved early detection of self-care disabilities in children. Preprocessing of the dataset affects the MLP performance; hence, randomization and resampling of the dataset improves the performance of the MLP model. To confirm the usefulness of MLP-progressive, three experiments were conducted, including validating MLP-progressive methodology over multi-class and binary-class datasets, impact analysis of the proposed preprocessing filters on the model performance, and comparing the MLP-progressive results with state-of-the-art studies. The evaluation metrics accuracy, precision, recall, F-measure, TP rate, FP rate, and ROC were used to measure performance of the proposed disability detection model. The proposed MLP-progressive model outperforms existing methods and attains a classification accuracy of 97.14% and 98.57% on multi-class and binary-class datasets, respectively. Additionally, when evaluated on the multi-class dataset, significant improvements in accuracies ranging from 90.00% to 97.14% were observed when compared to state-of-the-art methods.
Affiliation(s)
- Rahman Ali, Quaid-e-Azam College of Commerce, University of Peshawar, Khyber Pakhtunkhwa, Pakistan
- Jamil Hussain, Department of Data Science, Sejong University, Seoul, Korea
- Seung Won Lee, Sungkyunkwan University School of Medicine, Suwon, Korea
3
Abstract
Cervical cancer is one of the leading causes of premature mortality among women worldwide and more than 85% of these deaths are in developing countries. There are several risk factors associated with cervical cancer. In this paper, we developed a predictive model for predicting the outcome of patients with cervical cancer, given risk patterns from individual medical records and preliminary screening. This work presents a decision tree (DT) classification algorithm to analyze the risk factors of cervical cancer. Recursive feature elimination (RFE) and least absolute shrinkage and selection operator (LASSO) feature selection techniques were fully explored to determine the most important attributes for cervical cancer prediction. The dataset employed here contains missing values and is highly imbalanced. Therefore, a combination of under and oversampling techniques called SMOTETomek was employed. A comparative analysis of the proposed model has been performed to show the effectiveness of feature selection and class imbalance based on the classifier’s accuracy, sensitivity, and specificity. The DT with the selected features from RFE and SMOTETomek has better results with an accuracy of 98.72% and sensitivity of 100%. DT classifier is shown to have better performance in handling classification problems when the features are reduced, and the problem of high class imbalance is addressed.
4
Vo MT, Vo AH, Le T. A robust framework for shoulder implant X-ray image classification. Data Technologies and Applications 2021. [DOI: 10.1108/dta-08-2021-0210] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]
Abstract
Purpose: Medical images are increasingly common, so deep-learning-based analysis of these images to help diagnose diseases is becoming more and more essential. Recently, the shoulder implant X-ray image classification (SIXIC) dataset, which includes X-ray images of implanted shoulder prostheses produced by four manufacturers, was released. Detecting the implant's model helps in selecting the correct equipment and procedures for the upcoming surgery. Design/methodology/approach: This study proposes a robust model named X-Net to improve the predictability for shoulder implant X-ray image classification on the SIXIC dataset. The X-Net model integrates the Squeeze and Excitation (SE) block into the Residual Network (ResNet) module. The SE module weighs each feature map extracted from ResNet, which helps improve performance. Feature extraction in X-Net is performed by both the ResNet and SE modules. The final feature is obtained by combining the features extracted in the above steps, which captures more of the important characteristics of the X-ray images in the input dataset. Next, X-Net uses this fine-grained feature to classify the input images into four classes (Cofield, Depuy, Zimmer and Tornier) in the SIXIC dataset. Findings: Experiments are conducted to show the proposed approach's effectiveness compared with other state-of-the-art methods for SIXIC. The experimental results indicate that the approach outperforms the various experimental methods in terms of several performance metrics. In addition, the proposed approach sets a new state of the art on all performance metrics, namely accuracy, precision, recall, F1-score and area under the curve (AUC), for the experimental dataset. Originality/value: The proposed method, with its high predictive performance, can be used to assist in the treatment of injured shoulder joints.
5
Zdrodowska M, Dardzińska-Głȩbocka A. Classification and action rules in identification and self-care assessment problems. Technol Health Care 2021; 30:257-269. [PMID: 34806638 DOI: 10.3233/thc-219008] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/15/2022]
Abstract
BACKGROUND Disability, especially in children, is a very important and current problem. Lack of proper diagnosis and care increases the difficulty for children to adapt to disabilities. Disabled children have many problems with basic activities of daily living. Therefore, it is very important to support diagnosticians and physiotherapists in recognizing self-care problems in children. OBJECTIVE The aim of this paper is to extract classification and action rules, useful for those who work with children with disabilities. METHODS First, features and their impact on the accuracy of classification are determined. Then, two models are built: one with all features and one with selected ones. For these models the classification rules are extracted. Finally, action rules are mined and the next step in treatment process is predicted. RESULTS Seventeen features with the greatest impact on classifying a child into a particular group of self-care problems were identified. Based on the implemented algorithms, decision and action rules were obtained. CONCLUSIONS The obtained model, selected attributes and extracted classification and action rules can support the work of therapists and direct their work to those areas of disability where even a minimal reduction of features would be of great benefit to the children.
Affiliation(s)
- Małgorzata Zdrodowska, Institute of Biomedical Engineering, Faculty of Mechanical Engineering, Bialystok Technical University, Bialystok, Poland
- Agnieszka Dardzińska-Głȩbocka, Institute of Mechanical Engineering, Faculty of Mechanical Engineering, Bialystok Technical University, Bialystok, Poland
6
Abstract
The overlapping problem occurs when a region of the data space is shared in similar proportion by different classes. It degrades a classifier’s performance because the classes are difficult to separate correctly. An imbalanced dataset, in which one class has more instances than another, also impacts a classifier’s performance. In general, these two problems are treated separately. Prototype Selection (PS) approaches, on the other hand, select appropriate instances from a dataset by filtering out redundant and noisy data, which can cause misclassification. In this paper, we introduce Filtering-based Instance Selection (FIS), based on the Self-Organizing Maps Neural Network (SOM) and information entropy. The SOM is trained with a dataset, and the instances of the training set are then mapped to the nearest prototype (SOM neuron). An entropy analysis is conducted in each prototype region. Based on a threshold, we propose three decision methods: filtering the majority class (H-FIS (High Filter IS)), the minority class (L-FIS (Low Filter IS)), and both classes (B-FIS). Experiments on artificial and real datasets showed that the proposed methods, combined with 1NN, improved the accuracy, F-Score, and G-mean values compared with the 1NN classifier without the filter methods. The FIS approach is also compatible with the approaches mentioned in the relevant literature.
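The per-region entropy analysis mentioned in this abstract can be sketched roughly as follows. This minimal Python example computes the Shannon entropy of the class labels mapped to each prototype and flags regions whose entropy exceeds a threshold. The region names, labels, and the threshold of 0.5 are hypothetical, and the sketch does not reproduce the paper's SOM training or its H-FIS, L-FIS, and B-FIS decision rules.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of the class labels falling in one region."""
    total = len(labels)
    if total == 0:
        raise ValueError("region has no instances")
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())


def mixed_regions(regions, threshold):
    """Return the ids of regions whose label entropy exceeds the threshold."""
    return sorted(rid for rid, labels in regions.items() if label_entropy(labels) > threshold)


# Hypothetical mapping from prototype id to the labels of the instances assigned to it.
regions = {
    "p0": ["maj", "maj", "maj", "maj"],  # pure region, entropy 0
    "p1": ["maj", "maj", "min", "min"],  # evenly mixed region, entropy 1 bit
    "p2": ["maj", "maj", "maj", "min"],  # partly mixed region
}

flagged = mixed_regions(regions, threshold=0.5)  # threshold value is an assumption
print(flagged)
```

A pure region has zero entropy and is left alone, while overlapping regions score higher and become candidates for filtering.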
8
Abstract
Data imbalance is a thorny issue in machine learning. SMOTE is a famous oversampling method of imbalanced learning. However, it has some disadvantages such as sample overlapping, noise interference, and blindness of neighbor selection. In order to address these problems, we present a new oversampling method, OS-CCD, based on a new concept, the classification contribution degree. The classification contribution degree determines the number of synthetic samples generated by SMOTE for each positive sample. OS-CCD follows the spatial distribution characteristics of original samples on the class boundary, as well as avoids oversampling from noisy points. Experiments on twelve benchmark datasets demonstrate that OS-CCD outperforms six classical oversampling methods in terms of accuracy, F1-score, AUC, and ROC.
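For readers unfamiliar with the SMOTE step that OS-CCD builds on, here is a minimal Python sketch of the basic interpolation idea: a synthetic sample is placed at a random position on the segment between a minority point and one of its nearest minority neighbours. The function name, parameters, and toy points are assumptions for illustration. The sketch picks base points uniformly at random, whereas OS-CCD, per the abstract, sets the number of synthetic samples per positive sample from its classification contribution degree, which is not reproduced here.

```python
import math
import random


def smote_like_samples(minority, k, n_new, rng):
    """Generate n_new synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority points to interpolate")
    if k < 1:
        raise ValueError("k must be at least 1")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority points, nearest to the base point first.
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical minority-class points on the corners of a unit square.
minority = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
new_points = smote_like_samples(minority, k=2, n_new=5, rng=random.Random(0))
print(new_points)
```

Because every synthetic point lies on a segment between two existing minority points, it stays inside the region the minority class already occupies; the drawbacks listed in the abstract (overlap, noise, blind neighbour selection) arise when those segments cross into majority territory or start from noisy points.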
9
Putatunda S. Care2Vec: a hybrid autoencoder-based approach for the classification of self-care problems in physically disabled children. Neural Comput Appl 2020. [DOI: 10.1007/s00521-020-04943-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/12/2022]
10
A Self-Care Prediction Model for Children with Disability Based on Genetic Algorithm and Extreme Gradient Boosting. Mathematics 2020. [DOI: 10.3390/math8091590] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/12/2022]
Abstract
Detecting self-care problems is one of the important and challenging issues for occupational therapists, since it requires a complex and time-consuming process. Machine learning algorithms have recently been applied to overcome this issue. In this study, we propose a self-care prediction model called GA-XGBoost, which combines genetic algorithms (GAs) with extreme gradient boosting (XGBoost) for predicting self-care problems of children with disability. The choice of feature subset affects model performance; thus, we use a GA to search for the optimum feature subsets and improve the model’s performance. To validate the effectiveness of GA-XGBoost, we present six experiments: comparing GA-XGBoost with other machine learning models and previous study results, a statistical significance test, impact analysis of feature selection and comparison with other feature selection methods, and sensitivity analysis of GA parameters. During the experiments, we use accuracy, precision, recall, and F1-score to measure the performance of the prediction models. The results show that GA-XGBoost performs better than other prediction models and the previous study results. In addition, we design and develop a web-based self-care prediction tool to help therapists diagnose the self-care problems of children with disabilities. Therefore, appropriate treatment/therapy could be performed for each child to improve their therapeutic outcome.
11
Ijaz MF, Attique M, Son Y. Data-Driven Cervical Cancer Prediction Model with Outlier Detection and Over-Sampling Methods. Sensors 2020; 20:s20102809. [PMID: 32429090 PMCID: PMC7284557 DOI: 10.3390/s20102809] [Citation(s) in RCA: 76] [Impact Index Per Article: 19.0] [Received: 04/22/2020] [Revised: 05/11/2020] [Accepted: 05/13/2020] [Indexed: 12/29/2022]
Abstract
Globally, cervical cancer remains the foremost prevailing cancer in females. Hence, it is necessary to distinguish the importance of risk factors of cervical cancer to classify potential patients. The present work proposes a cervical cancer prediction model (CCPM) that offers early prediction of cervical cancer using risk factors as inputs. The CCPM first removes outliers using outlier detection methods such as density-based spatial clustering of applications with noise (DBSCAN) and isolation forest (iForest), and then increases the number of cases in the dataset in a balanced way, for example, through the synthetic minority over-sampling technique (SMOTE) and SMOTE with Tomek link (SMOTETomek). Finally, it employs random forest (RF) as a classifier. Thus, the CCPM covers four scenarios: (1) DBSCAN + SMOTETomek + RF, (2) DBSCAN + SMOTE + RF, (3) iForest + SMOTETomek + RF, and (4) iForest + SMOTE + RF. A dataset of 858 potential patients was used to validate the performance of the proposed method. We found that combinations of iForest with SMOTE and iForest with SMOTETomek performed better than those of DBSCAN with SMOTE and DBSCAN with SMOTETomek. We also observed that RF performed the best among several popular machine learning classifiers. Furthermore, the proposed CCPM showed better accuracy than previously proposed methods for forecasting cervical cancer. In addition, a mobile application that collects cervical cancer risk factor data and provides results from the CCPM was developed to enable instant and proper action at the initial stage of cervical cancer.
Affiliation(s)
- Muhammad Fazal Ijaz, Department of Industrial and Systems Engineering, Dongguk University-Seoul, Seoul 04620, Korea
- Youngdoo Son, Department of Industrial and Systems Engineering, Dongguk University-Seoul, Seoul 04620, Korea (correspondence; Tel.: +82-2-2260-3840)
12
Le T, Vo MT, Kieu T, Hwang E, Rho S, Baik SW. Multiple Electric Energy Consumption Forecasting Using a Cluster-Based Strategy for Transfer Learning in Smart Building. Sensors 2020; 20:s20092668. [PMID: 32392858 PMCID: PMC7362249 DOI: 10.3390/s20092668] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 03/11/2020] [Revised: 04/15/2020] [Accepted: 05/03/2020] [Indexed: 11/18/2022]
Abstract
Electric energy consumption forecasting is an interesting, challenging, and important issue in energy management and equipment efficiency improvement. Existing approaches are predictive models that can predict for a specific profile, i.e., a time series of a whole building or an individual household in a smart building. In practice, each smart building has many profiles, which makes modelling them time-consuming and expensive in system resources. Therefore, this study develops a robust framework for the Multiple Electric Energy Consumption forecasting (MEC) of a smart building using Transfer Learning and Long Short-Term Memory (TLL), the so-called MEC-TLL framework. In this framework, we first employ a k-means clustering algorithm to cluster the daily load demand of many profiles in the training set. In this phase, we also perform Silhouette analysis to specify the optimal number of clusters for the experimental datasets. Next, this study develops the MEC training algorithm, which uses a cluster-based strategy for transfer learning of the Long Short-Term Memory models to reduce the computational time. Finally, extensive experiments are conducted to compare the computational time and different performance metrics for multiple electric energy consumption forecasting on two smart buildings in South Korea. The experimental results indicate that the proposed approach keeps overheads low while achieving superior performance. Therefore, the proposed approach can be applied effectively for intelligent energy management in smart buildings.
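The Silhouette analysis used in this abstract to choose the number of clusters can be illustrated with a small Python sketch. It computes the mean silhouette coefficient for candidate cluster assignments and keeps the number of clusters with the highest score. The points and the candidate labelings are hypothetical stand-ins for the output of k-means on daily load profiles; the k-means step itself and the paper's data are not reproduced.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette coefficient of a clustering (higher means better separated)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical 1-D data with two obvious groups, and hand-written candidate labelings.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
candidates = {
    2: [0, 0, 1, 1],
    3: [0, 0, 1, 2],
}

best_k = max(candidates, key=lambda k: mean_silhouette(points, candidates[k]))
print(best_k)
```

On this toy data the two-cluster labeling scores higher than the three-cluster one, so it would be selected.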
Affiliation(s)
- Tuong Le, Informetrics Research Group, Ton Duc Thang University, Ho Chi Minh City 700000, Vietnam; Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City 700000, Vietnam
- Minh Thanh Vo, Institute of Research and Development, Duy Tan University, Da Nang 550000, Vietnam
- Tung Kieu, University of Science, Vietnam National University, Ho Chi Minh City 700000, Vietnam
- Eenjun Hwang, School of Electrical Engineering, Korea University, Seoul 02841, Korea
- Seungmin Rho, Department of Software, Sejong University, Seoul 05006, Korea
- Sung Wook Baik, Department of Software, Sejong University, Seoul 05006, Korea
13
Classification of Guillain–Barré Syndrome Subtypes Using Sampling Techniques with Binary Approach. Symmetry (Basel) 2020. [DOI: 10.3390/sym12030482] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Guillain–Barré Syndrome (GBS) is an unusual disorder in which the body’s immune system affects the peripheral nervous system. GBS has four main subtypes, whose treatments differ. Severe cases of GBS can be fatal. This work aimed to investigate whether balancing an original GBS dataset improves the predictive models created in a previous study. Balancing a dataset means pursuing symmetry in the number of instances of each class. The dataset includes 129 records of Mexican patients diagnosed with some subtype of GBS. We created 10 binary datasets from the original dataset. Then, we balanced these datasets using four different methods to undersample the majority class and one method to oversample the minority class. Finally, we used three classifiers with different approaches to create predictive models. The results show that balancing the original dataset improves the previous predictive models. The goal of the predictive models is to identify the GBS subtypes by applying Machine Learning algorithms. It is expected that specialists may use the model as a complementary diagnostic based on a reduced set of relevant features. Early identification of the subtype will allow the appropriate treatment for patient recovery to be started. This is a contribution to exploring the performance of balancing techniques with real data.
15
Improving Electric Energy Consumption Prediction Using CNN and Bi-LSTM. Applied Sciences-Basel 2019. [DOI: 10.3390/app9204237] [Citation(s) in RCA: 78] [Impact Index Per Article: 15.6] [Indexed: 02/07/2023]
Abstract
The electric energy consumption prediction (EECP) is an essential and complex task in intelligent power management system. EECP plays a significant role in drawing up a national energy development policy. Therefore, this study proposes an Electric Energy Consumption Prediction model utilizing the combination of Convolutional Neural Network (CNN) and Bi-directional Long Short-Term Memory (Bi-LSTM) that is named EECP-CBL model to predict electric energy consumption. In this framework, two CNNs in the first module extract the important information from several variables in the individual household electric power consumption (IHEPC) dataset. Then, Bi-LSTM module with two Bi-LSTM layers uses the above information as well as the trends of time series in two directions including the forward and backward states to make predictions. The obtained values in the Bi-LSTM module will be passed to the last module that consists of two fully connected layers for finally predicting the electric energy consumption in the future. The experiments were conducted to compare the prediction performances of the proposed model and the state-of-the-art models for the IHEPC dataset with several variants. The experimental results indicate that EECP-CBL framework outperforms the state-of-the-art approaches in terms of several performance metrics for electric energy consumption prediction on several variations of IHEPC dataset in real-time, short-term, medium-term and long-term timespans.
16
A New Approach for Construction of Geodemographic Segmentation Model and Prediction Analysis. Computational Intelligence and Neuroscience 2019; 2019:9252837. [PMID: 31236109 PMCID: PMC6545749 DOI: 10.1155/2019/9252837] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 09/11/2018] [Revised: 03/01/2019] [Accepted: 04/09/2019] [Indexed: 11/24/2022]
Abstract
Customer retention is invariably the top priority of all consumer businesses, and it is certainly one of the most critical challenges as well. Identifying and gaining insights into the most probable cause of churn can be five to ten times cheaper for the company than finding new customers. Therefore, this study introduces, assesses, and tests a full-fledged geodemographic segmentation model and derives insights from it. A bank dataset of 11,000 instances (10,000 for training and 1,000 for testing) with 14 attributes has been used, and the likelihood of a person staying with the bank or leaving the bank is computed with the help of logistic regression. Based on the proposed model, insights are drawn and recommendations are provided. Stepwise logistic regression methods, namely the backward elimination method, the forward selection method, and the bidirectional model, are constructed and contrasted to choose the best among them. Future forecasting of the models has been done using cumulative accuracy profile (CAP) curve analysis.