1
|
Alabdulqader EA, Alarfaj AA, Umer M, Eshmawi AA, Alsubai S, Kim TH, Ashraf I. Improving prediction of blood cancer using leukemia microarray gene data and Chi2 features with weighted convolutional neural network. Sci Rep 2024; 14:15625. [PMID: 38972881 PMCID: PMC11228030 DOI: 10.1038/s41598-024-65315-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2024] [Accepted: 06/19/2024] [Indexed: 07/09/2024] Open
Abstract
Blood cancer has emerged as a growing concern over the past decade, necessitating early diagnosis for timely and effective treatment. The present diagnostic method, which involves a battery of tests and medical experts, is costly and time-consuming. For this reason, it is crucial to establish an automated diagnostic system for accurate predictions. A particular field of focus in medical research is the use of machine learning and leukemia microarray gene data for blood cancer diagnosis. Even with a great deal of research, more improvements are needed to reach the appropriate levels of accuracy and efficacy. This work presents a supervised machine-learning algorithm for blood cancer prediction. This work makes use of the 22,283-gene leukemia microarray gene data. Chi-squared (Chi2) feature selection methods and the synthetic minority oversampling technique (SMOTE)-Tomek resampling is used to overcome issues with imbalanced and high-dimensional datasets. To balance the dataset for each target class, SMOTE-Tomek creates synthetic data, and Chi2 chooses the most important features to train the learning models from 22,283 genes. A novel weighted convolutional neural network (CNN) model is proposed for classification, utilizing the support of three separate CNN models. To determine the importance of the proposed approach, extensive experiments are carried out on the datasets, including a performance comparison with the most advanced techniques. Weighted CNN demonstrates superior performance over other models when coupled with SMOTE-Tomek and Chi2 techniques, achieving a remarkable 99.9% accuracy. Results from k-fold cross-validation further affirm the supremacy of the proposed model.
Collapse
Affiliation(s)
- Ebtisam Abdullah Alabdulqader
- Department of Information Technology, College of Computer and Information Sciences, King Saud University, P. O. Box 800, 11421, Riyadh, Saudi Arabia
| | - Aisha Ahmed Alarfaj
- Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, 11671, Riyadh, Saudi Arabia
| | - Muhammad Umer
- Department of Computer Science and Information Technology, The Islamia University of Bahawalpur, Bahawalpur, 63100, Pakistan
| | - Ala' Abdulmajid Eshmawi
- Department of Cybersecurity, College of Computer Science and Engineering, University of Jeddah, Jeddah, 23218, Saudi Arabia
| | - Shtwai Alsubai
- Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam bin Abdulaziz University, P.O. Box 151, 11942, Al-Kharj, Saudi Arabia
| | - Tai-Hoon Kim
- School of Electrical and Computer Engineering, Yeosu Campus, Chonnam National University, 50, Daehak-ro, Yeosu-si, 59626, Jeollanam-do, Republic of Korea.
| | - Imran Ashraf
- Department of Information and Communication Engineering, Yeungnam University, Gyeongsan, 38541, Korea.
| |
Collapse
|
2
|
Jamel L, Umer M, Saidani O, Alabduallah B, Alsubai S, Ishmanov F, Kim TH, Ashraf I. Improving prediction of maternal health risks using PCA features and TreeNet model. PeerJ Comput Sci 2024; 10:e1982. [PMID: 38660162 PMCID: PMC11042025 DOI: 10.7717/peerj-cs.1982] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2023] [Accepted: 03/15/2024] [Indexed: 04/26/2024]
Abstract
Maternal healthcare is a critical aspect of public health that focuses on the well-being of pregnant women before, during, and after childbirth. It encompasses a range of services aimed at ensuring the optimal health of both the mother and the developing fetus. During pregnancy and in the postpartum period, the mother's health is susceptible to several complications and risks, and timely detection of such risks can play a vital role in women's safety. This study proposes an approach to predict risks associated with maternal health. The first step of the approach involves utilizing principal component analysis (PCA) to extract significant features from the dataset. Following that, this study employs a stacked ensemble voting classifier which combines one machine learning and one deep learning model to achieve high performance. The performance of the proposed approach is compared to six machine learning algorithms and one deep learning algorithm. Two scenarios are considered for the experiments: one utilizing all features and the other using PCA features. By utilizing PCA-based features, the proposed model achieves an accuracy of 98.25%, precision of 99.17%, recall of 99.16%, and an F1 score of 99.16%. The effectiveness of the proposed model is further confirmed by comparing it to existing state of-the-art approaches.
Collapse
Affiliation(s)
- Leila Jamel
- Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
| | - Muhammad Umer
- Department of Computer Science & Information Technology, The Islamia University of Bahawalpur, Bahawalpur, Punjab, Pakistan
| | - Oumaima Saidani
- Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
| | - Bayan Alabduallah
- Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
| | - Shtwai Alsubai
- Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia
| | - Farruh Ishmanov
- Department of Electronics and Communication Engineering, Kwangwoon University, Seoul, Republic of South Korea
| | - Tai-hoon Kim
- School of Electrical and Computer Engineering, Yeosu Campus, Chonnam National University, Daehak-ro, Yeosu-si, Jeollanam-do, Republic of South Korea
| | - Imran Ashraf
- Department of Information and Communication Engineering, Yeungnam University, Gyeongsan, Republic of South Korea
| |
Collapse
|
3
|
Aljrees T. Improving prediction of cervical cancer using KNN imputer and multi-model ensemble learning. PLoS One 2024; 19:e0295632. [PMID: 38170713 PMCID: PMC10763959 DOI: 10.1371/journal.pone.0295632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2023] [Accepted: 11/23/2023] [Indexed: 01/05/2024] Open
Abstract
Cervical cancer is a leading cause of women's mortality, emphasizing the need for early diagnosis and effective treatment. In line with the imperative of early intervention, the automated identification of cervical cancer has emerged as a promising avenue, leveraging machine learning techniques to enhance both the speed and accuracy of diagnosis. However, an inherent challenge in the development of these automated systems is the presence of missing values in the datasets commonly used for cervical cancer detection. Missing data can significantly impact the performance of machine learning models, potentially leading to inaccurate or unreliable results. This study addresses a critical challenge in automated cervical cancer identification-handling missing data in datasets. The study present a novel approach that combines three machine learning models into a stacked ensemble voting classifier, complemented by the use of a KNN Imputer to manage missing values. The proposed model achieves remarkable results with an accuracy of 0.9941, precision of 0.98, recall of 0.96, and an F1 score of 0.97. This study examines three distinct scenarios: one involving the deletion of missing values, another utilizing KNN imputation, and a third employing PCA for imputing missing values. This research has significant implications for the medical field, offering medical experts a powerful tool for more accurate cervical cancer therapy and enhancing the overall effectiveness of testing procedures. By addressing missing data challenges and achieving high accuracy, this work represents a valuable contribution to cervical cancer detection, ultimately aiming to reduce the impact of this disease on women's health and healthcare systems.
Collapse
Affiliation(s)
- Turki Aljrees
- College of Computer Science and Engineering, University of Hafr Al-Batin, Hafar Al-Batin, Saudi Arabia
| |
Collapse
|
4
|
Karamti H, Alharthi R, Umer M, Shaiba H, Ishaq A, Abuzinadah N, Alsubai S, Ashraf I. Breast cancer detection employing stacked ensemble model with convolutional features. Cancer Biomark 2024; 40:155-170. [PMID: 38160347 DOI: 10.3233/cbm-230294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2024]
Abstract
Breast cancer is a major cause of female deaths, especially in underdeveloped countries. It can be treated if diagnosed early and chances of survival are high if treated appropriately and timely. For timely and accurate automated diagnosis, machine learning approaches tend to show better results than traditional methods, however, accuracy lacks the desired level. This study proposes the use of an ensemble model to provide accurate detection of breast cancer. The proposed model uses the random forest and support vector classifier along with automatic feature extraction using an optimized convolutional neural network (CNN). Extensive experiments are performed using the original, as well as, CNN-based features to analyze the performance of the deployed models. Experimental results involving the use of the Wisconsin dataset reveal that CNN-based features provide better results than the original features. It is observed that the proposed model achieves an accuracy of 99.99% for breast cancer detection. Performance comparison with existing state-of-the-art models is also carried out showing the superior performance of the proposed model.
Collapse
Affiliation(s)
- Hanen Karamti
- Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
| | - Raed Alharthi
- Department of Computer Science and Engineering, University of Hafr Al-Batin, Hafar, Saudi Arabia
| | - Muhammad Umer
- Department of Computer Science and Information Technology, The Islamia University of Bahawalpur, Bahawalpur, Pakistan
| | - Hadil Shaiba
- Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
| | - Abid Ishaq
- Department of Computer Science and Information Technology, The Islamia University of Bahawalpur, Bahawalpur, Pakistan
| | - Nihal Abuzinadah
- Faculty of Computer Science and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Shtwai Alsubai
- Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al-Kharj, Saudi Arabia
| | - Imran Ashraf
- Department of Information and Communication Engineering, Yeungnam University, Gyeongsan-si, Korea
| |
Collapse
|
5
|
Abuzinadah N, Umer M, Ishaq A, Al Hejaili A, Alsubai S, Eshmawi AA, Mohamed A, Ashraf I. Role of convolutional features and machine learning for predicting student academic performance from MOODLE data. PLoS One 2023; 18:e0293061. [PMID: 37939093 PMCID: PMC10631652 DOI: 10.1371/journal.pone.0293061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2022] [Accepted: 09/25/2023] [Indexed: 11/10/2023] Open
Abstract
Predicting student performance automatically is of utmost importance, due to the substantial volume of data within educational databases. Educational data mining (EDM) devises techniques to uncover insights from data originating in educational settings. Artificial intelligence (AI) can mine educational data to predict student performance and provide measures to help students avoid failing and learn better. Learning platforms complement traditional learning settings by analyzing student performance, which can help reduce the chance of student failure. Existing methods for student performance prediction in educational data mining faced challenges such as limited accuracy, imbalanced data, and difficulties in feature engineering. These issues hindered effective adaptability and generalization across diverse educational contexts. This study proposes a machine learning-based system with deep convoluted features for the prediction of students' academic performance. The proposed framework is employed to predict student academic performance using balanced as well as, imbalanced datasets using the synthetic minority oversampling technique (SMOTE). In addition, the performance is also evaluated using the original and deep convoluted features. Experimental results indicate that the use of deep convoluted features provides improved prediction accuracy compared to original features. Results obtained using the extra tree classifier with convoluted features show the highest classification accuracy of 99.9%. In comparison with the state-of-the-art approaches, the proposed approach achieved higher performance. This research introduces a powerful AI-driven system for student performance prediction, offering substantial advancements in accuracy compared to existing approaches.
Collapse
Affiliation(s)
- Nihal Abuzinadah
- Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Muhammad Umer
- Department of Computer Science & Information Technology, The Islamia University of Bahawalpur, Bahawalpur, Pakistan
| | - Abid Ishaq
- Department of Computer Science & Information Technology, The Islamia University of Bahawalpur, Bahawalpur, Pakistan
| | - Abdullah Al Hejaili
- Faculty of Computers & Information Technology, Computer Science Department, University of Tabuk, Tabuk, Saudi Arabia
| | - Shtwai Alsubai
- Department of Computer Science, College of Computer Engineering and Sciences in Al-Kharj, Prince Sattam bin Abdulaziz University, Al-Kharj, Saudi Arabia
| | - Ala’ Abdulmajid Eshmawi
- Department of Cybersecurity, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudia Arabia
| | | | - Imran Ashraf
- Information and Communication Engineering, Yeungnam University, Gyeongsan, Korea
| |
Collapse
|
6
|
Karamti H, Alharthi R, Anizi AA, Alhebshi RM, Eshmawi AA, Alsubai S, Umer M. Improving Prediction of Cervical Cancer Using KNN Imputed SMOTE Features and Multi-Model Ensemble Learning Approach. Cancers (Basel) 2023; 15:4412. [PMID: 37686692 PMCID: PMC10486648 DOI: 10.3390/cancers15174412] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2023] [Revised: 08/02/2023] [Accepted: 08/09/2023] [Indexed: 09/10/2023] Open
Abstract
Objective: Cervical cancer ranks among the top causes of death among females in developing countries. The most important procedures that should be followed to guarantee the minimizing of cervical cancer's aftereffects are early identification and treatment under the finest medical guidance. One of the best methods to find this sort of malignancy is by looking at a Pap smear image. For automated detection of cervical cancer, the available datasets often have missing values, which can significantly affect the performance of machine learning models. Methods: To address these challenges, this study proposes an automated system for predicting cervical cancer that efficiently handles missing values with SMOTE features to achieve high accuracy. The proposed system employs a stacked ensemble voting classifier model that combines three machine learning models, along with KNN Imputer and SMOTE up-sampled features for handling missing values. Results: The proposed model achieves 99.99% accuracy, 99.99% precision, 99.99% recall, and 99.99% F1 score when using KNN imputed SMOTE features. The study compares the performance of the proposed model with multiple other machine learning algorithms under four scenarios: with missing values removed, with KNN imputation, with SMOTE features, and with KNN imputed SMOTE features. The study validates the efficacy of the proposed model against existing state-of-the-art approaches. Conclusions: This study investigates the issue of missing values and class imbalance in the data collected for cervical cancer detection and might aid medical practitioners in timely detection and providing cervical cancer patients with better care.
Collapse
Affiliation(s)
- Hanen Karamti
- Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia;
| | - Raed Alharthi
- Department of Computer Science and Engineering, University of Hafr Al-Batin, Hafar Al-Batin 39524, Saudi Arabia;
| | - Amira Al Anizi
- Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia;
| | - Reemah M. Alhebshi
- Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia;
| | - Ala’ Abdulmajid Eshmawi
- Department of Cybersecurity, College of Computer Science and Engineering, University of Jeddah, Jeddah 23218, Saudi Arabia;
| | - Shtwai Alsubai
- Department of Computer Science, College of Computer Engineering and Sciences, Prince Sattam bin Abdulaziz University, P.O. Box 151, Al-Kharj 11942, Saudi Arabia;
| | - Muhammad Umer
- Department of Computer Science & Information Technology, The Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan
| |
Collapse
|
7
|
Chen X, Aljrees T, Umer M, Saidani O, Almuqren L, Mzoughi O, Ishaq A, Ashraf I. Cervical cancer detection using K nearest neighbor imputer and stacked ensemble learningmodel. Digit Health 2023; 9:20552076231203802. [PMID: 37799501 PMCID: PMC10548812 DOI: 10.1177/20552076231203802] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2023] [Accepted: 09/08/2023] [Indexed: 10/07/2023] Open
Abstract
Objective Cervical cancer stands as a leading cause of mortality among women in developing nations. To ensure the reduction of its adverse consequences, the primary protocols to be adhered to involve early detection and treatment under the guidance of expert medical professionals. An effective approach for identifying this form of malignancy involves the examination of Pap smear images. However, in the context of automating cervical cancer detection, many of the existing datasets frequently exhibit missing data points, a factor that can substantially impact the effectiveness of machine learning models. Methods In response to these hurdles, this research introduces an automated system designed to predict cervical cancer with a dual focus: adeptly managing missing data while attaining remarkable accuracy. The system's core is built upon a stacked ensemble voting classifier model, which amalgamates three distinct machine learning models, all harmoniously integrated with the KNN Imputer to address the issue of missing values. Results The model put forth attains an accuracy of 99.41%, precision of 97.63%, recall of 95.96%, and an F1 score of 96.76% when incorporating the KNN imputation method. The investigation conducts a comparative analysis, contrasting the performance of this model with seven alternative machine learning algorithms in two scenarios: one where missing values are eliminated, and another employing KNN imputation. This study offers validation of the effectiveness of the proposed model in comparison to current state-of-the-art methodologies. Conclusions This research delves into the challenge of handling missing data in the dataset utilized for cervical cancer detection. The findings have the potential to assist healthcare professionals in achieving early detection and enhancing the quality of care provided to individuals affected by cervical cancer.
Collapse
Affiliation(s)
- Xiaoyuan Chen
- Huzhou Key Laboratory of Green Energy Materials and Battery Cascade Utilization, School of Intelligent Manufacturing, Huzhou College, Huzhou, P.R. China
| | - Turki Aljrees
- Department College of Computer Science and Engineering, University of Hafr Al-Batin, Hafar Al-Batin, Saudi Arabia
| | - Muhammad Umer
- Department of Computer Science & Information Technology, The Islamia University of Bahawalpur, Bahawalpur, Pakistan
| | - Oumaima Saidani
- Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
| | - Latifah Almuqren
- Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
| | - Olfa Mzoughi
- Department of Computer Science, College of Sciences and Humanities-Aflaj, Prince Sattam bin Abdulaziz University, Aflaj, Saudi Arabia
| | - Abid Ishaq
- Department of Computer Science & Information Technology, The Islamia University of Bahawalpur, Bahawalpur, Pakistan
| | - Imran Ashraf
- Department of Information and Communication Engineering, Yeungnam University, Gyeongsan, South Korea
| |
Collapse
|