1
Oliullah K, Rasel MH, Islam MM, Islam MR, Wadud MAH, Whaiduzzaman M. A stacked ensemble machine learning approach for the prediction of diabetes. J Diabetes Metab Disord 2024; 23:603-617. [PMID: 38932863 PMCID: PMC11196524 DOI: 10.1007/s40200-023-01321-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Accepted: 09/22/2023] [Indexed: 06/28/2024]
Abstract
Objectives: Diabetes has become a leading cause of mortality in both developed and developing countries, impacting a growing number of individuals worldwide. As the prevalence of the disease continues to rise, researchers have worked towards developing accurate diabetes prediction models. The primary aim of this study is to use a diverse set of machine learning algorithms to detect the presence of diabetes, particularly in females, at an early stage. By leveraging these methods, this research seeks to provide physicians with valuable tools to identify the disease early, enabling timely interventions and improving patient outcomes. Methods: In this study, several state-of-the-art machine learning techniques, such as random forest classifiers with GridSearchCV, XGBoost, NGBoost, Bagging, LightGBM, and AdaBoost classifiers, were employed. These models were chosen as the base layer of our proposed stacked ensemble model because of their high accuracy. Before the data were fed into the models, the dataset was preprocessed to ensure optimal performance and obtain improved results. Results: The accuracy achieved in this study was 92.91%, which demonstrates its competitiveness with existing approaches. Moreover, the use of Shapley additive explanations (SHAP) facilitated the interpretation of the machine learning models. Conclusion: We anticipate that these findings will be beneficial to healthcare providers, stakeholders, students, and researchers involved in diabetes prediction research and development.
Affiliation(s)
- Khondokar Oliullah
  - Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Dhaka, Bangladesh
- Mahedi Hasan Rasel
  - Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Dhaka, Bangladesh
- Md. Manzurul Islam
  - Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Dhaka, Bangladesh
- Md. Reazul Islam
  - Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Dhaka, Bangladesh
- Md. Anwar Hussen Wadud
  - Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Dhaka, Bangladesh
- Md. Whaiduzzaman
  - School of Information Systems, Queensland University of Technology, Brisbane, Australia
2
Ma Y, He J, Tan D, Han X, Feng R, Xiong H, Peng X, Pu X, Zhang L, Li Y, Chen S. The clinical and imaging data fusion model for single-period cerebral CTA collateral circulation assessment. Journal of X-Ray Science and Technology 2024:XST240083. [PMID: 38820061 DOI: 10.3233/xst-240083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2024]
Abstract
Background: The Chinese population ranks among the highest globally in terms of stroke prevalence. In the clinical diagnostic process, radiologists use computed tomography angiography (CTA) images for diagnosis, enabling a precise assessment of collateral circulation in the brains of stroke patients. Recent studies frequently combine imaging and machine learning methods to develop computer-aided diagnostic algorithms. However, in studies concerning collateral circulation assessment, the extracted imaging features are primarily manually designed statistical features, which are significantly limited in their representational capacity. Accurately assessing collateral circulation using image features in brain CTA images remains challenging. Methods: To tackle this issue, and considering the scarcity of publicly accessible medical datasets, we combined clinical data with imaging data to establish a dataset named RadiomicsClinicCTA. Moreover, we devised two collateral circulation assessment models, data-level fusion and feature-level fusion, to exploit the synergistic potential of patients' clinical information and imaging data for a more accurate assessment of collateral circulation. To remove redundant features from the dataset, we employed Levene's test and the t-test for feature pre-screening. Subsequently, we performed feature dimensionality reduction using the LASSO and random forest algorithms and trained classification models with various machine learning algorithms on the data-level fusion dataset after feature engineering. Results: Experimental results on the RadiomicsClinicCTA dataset demonstrate that the optimized data-level fusion model achieves an accuracy and an AUC value exceeding 86%. Subsequently, we trained and assessed the performance of the feature-level fusion classification model. The results indicate that the feature-level fusion classification model outperforms the optimized data-level fusion model. Comparative experiments show that the fused dataset better differentiates between good and poor collateral circulation features than the pure radiomics dataset. Conclusions: Our study underscores the efficacy of integrating clinical and imaging data through fusion models, significantly enhancing the accuracy of collateral circulation assessment in stroke patients.
Affiliation(s)
- Yuqi Ma
  - College of Computer and Information Science, Southwest University, Chongqing, China
- Jingliu He
  - College of Computer and Information Science, Southwest University, Chongqing, China
- Duo Tan
  - The Second People's Hospital of Guizhou Province, Guizhou, China
- Xu Han
  - School of Electrical and Information Engineering, Tianjin University, Tianjin, China
- Ruiqi Feng
  - College of Computer and Information Science, Southwest University, Chongqing, China
- Hailing Xiong
  - College of Electronic and Information Engineering, Southwest University, Chongqing, China
- Xihua Peng
  - College of Computer and Information Science, Southwest University, Chongqing, China
- Xun Pu
  - College of Computer and Information Science, Southwest University, Chongqing, China
- Lin Zhang
  - College of Computer and Information Science, Southwest University, Chongqing, China
- Yongmei Li
  - Department of Radiology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Shanxiong Chen
  - College of Computer and Information Science, Southwest University, Chongqing, China
  - Big Data & Intelligence Engineering School, Chongqing College of International Business and Economics, Chongqing, China
3
Al-Zubayer MA, Alam K, Shanto HH, Maniruzzaman M, Majumder UK, Ahammed B. Machine learning models for prediction of double and triple burdens of non-communicable diseases in Bangladesh. J Biosoc Sci 2024; 56:426-444. [PMID: 38505939 DOI: 10.1017/s0021932024000063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/21/2024]
Abstract
Increasing prevalence of non-communicable diseases (NCDs) has become the leading cause of death and disability in Bangladesh. Therefore, this study aimed to measure the prevalence of and risk factors for double and triple burden of NCDs (DBNCDs and TBNCDs), considering diabetes, hypertension, and overweight and obesity as well as establish a machine learning approach for predicting DBNCDs and TBNCDs. A total of 12,151 respondents from the 2017 to 2018 Bangladesh Demographic and Health Survey were included in this analysis, where 10%, 27.4%, and 24.3% of respondents had diabetes, hypertension, and overweight and obesity, respectively. Chi-square test and multilevel logistic regression (LR) analysis were applied to select factors associated with DBNCDs and TBNCDs. Furthermore, six classifiers including decision tree (DT), LR, naïve Bayes (NB), k-nearest neighbour (KNN), random forest (RF), and extreme gradient boosting (XGBoost) with three cross-validation protocols (K2, K5, and K10) were adopted to predict the status of DBNCDs and TBNCDs. The classification accuracy (ACC) and area under the curve (AUC) were computed for each protocol and repeated 10 times to make them more robust, and then the average ACC and AUC were computed. The prevalence of DBNCDs and TBNCDs was 14.3% and 2.3%, respectively. The findings of this study revealed that DBNCDs and TBNCDs were significantly influenced by age, sex, marital status, wealth index, education and geographic region. Compared to other classifiers, the RF-based classifier provides the highest ACC and AUC for both DBNCDs (ACC = 81.06% and AUC = 0.93) and TBNCDs (ACC = 88.61% and AUC = 0.97) for the K10 protocol. A combination of considered two-step factor selections and RF-based classifier can better predict the burden of NCDs. The findings of this study suggested that decision-makers might adopt suitable decisions to control and prevent the burden of NCDs using RF classifiers.
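The evaluation protocol described in this abstract (K2, K5, and K10 cross-validation, each repeated 10 times and averaged) can be sketched in plain Python. This is a minimal illustration rather than the authors' code: the function names, the toy data, and the majority-label stand-in classifier are all assumptions, and only accuracy is computed (AUC is omitted).

```python
import random
from statistics import mean

def k_fold_indices(n, k, rng):
    """Shuffle 0..n-1 and split the indices into k folds of near-equal size."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def repeated_cv_accuracy(X, y, fit, predict, k, repeats=10, seed=0):
    """Average held-out accuracy over `repeats` independent k-fold runs."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        for fold in k_fold_indices(len(X), k, rng):
            held_out = set(fold)
            train = [i for i in range(len(X)) if i not in held_out]
            model = fit([X[i] for i in train], [y[i] for i in train])
            hits = sum(predict(model, X[i]) == y[i] for i in fold)
            scores.append(hits / len(fold))
    return mean(scores)

# Hypothetical stand-in classifier: always predicts the majority training label.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)

def predict_majority(model, x):
    return model

# Made-up data: 20 samples, 25% positive.
X = [[i] for i in range(20)]
y = [0] * 15 + [1] * 5
results = {k: repeated_cv_accuracy(X, y, fit_majority, predict_majority, k)
           for k in (2, 5, 10)}
```

Any real classifier can be substituted by passing its own `fit` and `predict` callables; repeating the shuffle reduces the dependence of the reported average on one particular partition.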
Affiliation(s)
- Khorshed Alam
  - School of Business, University of Southern Queensland, Toowoomba, QLD, Australia
  - Centre for Health Research, University of Southern Queensland, Toowoomba, QLD, Australia
- Md Maniruzzaman
  - Statistics Discipline, Khulna University, Khulna, Bangladesh
- Benojir Ahammed
  - Statistics Discipline, Khulna University, Khulna, Bangladesh
4
Jain A, Verma A, Verma AK, Bajaj V. Tunable Q-factor wavelet transform based identification of diabetic patients using ECG signals. Comput Methods Biomech Biomed Engin 2024:1-10. [PMID: 38635476 DOI: 10.1080/10255842.2024.2342512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Accepted: 04/08/2024] [Indexed: 04/20/2024]
Abstract
Diabetes is a chronic health condition characterized by increased levels of glucose (sugar) in the blood. It can have harmful effects on different parts of the body, such as the retina of the eyes, skin, nervous system, kidneys, and heart. Diabetes affects the structure of electrocardiogram (ECG) impulses by causing cardiovascular autonomic dysfunction. Multi-resolution analysis of the input ECG signal is used in this paper to develop a machine learning-based system for the automated detection of diabetic patients. In the first step, the input ECG signal is decomposed into sub-bands (SBs) using the tunable Q-factor wavelet transform (TQWT) technique. In the second step, four entropy-based characteristics are evaluated from each SB and selected using the Kruskal-Wallis (K-W) test. To develop an automatic diabetes detection system, the selected features are given as input, with 10-fold validation, to an SVM classifier using various kernel functions. The third sub-band of the TQWT with the coarse Gaussian kernel function of the SVM classifier yields a classification accuracy of 91.5%. On the same dataset, the comparative analysis demonstrates that the proposed method outperforms other existing methods.
Affiliation(s)
- Anuja Jain
  - Teerthanker Mahaveer University, Moradabad, UP, India
- Anurag Verma
  - Teerthanker Mahaveer University, Moradabad, UP, India
- Amit Kumar Verma
  - Mahatama Jyotiba Phule Rohilkhand University, Bareilly, UP, India
- Varun Bajaj
  - PDPM Indian Institute of Information Technology, Design & Manufacturing (IIITDM), Jabalpur, India
5
Waqas Khan Q, Iqbal K, Ahmad R, Rizwan A, Nawaz Khan A, Kim D. An intelligent diabetes classification and perception framework based on ensemble and deep learning method. PeerJ Comput Sci 2024; 10:e1914. [PMID: 38660179 PMCID: PMC11041940 DOI: 10.7717/peerj-cs.1914] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Accepted: 02/06/2024] [Indexed: 04/26/2024]
Abstract
Elevated blood sugar can harm individuals and their vital organs, potentially leading to blindness as well as kidney and heart diseases. Globally, diabetic patients face an average annual mortality rate of 38%. This study employs Chi-square, mutual information, and sequential feature selection (SFS) to choose features for training multiple classifiers. These classifiers include an artificial neural network (ANN), a random forest (RF), a gradient boosting (GB) algorithm, Tab-Net, and a support vector machine (SVM). The goal is to predict the onset of diabetes at an earlier age. The classifier, developed based on the selected features, aims to enable early diagnosis of diabetes. The PIMA and early-risk diabetes datasets are used to evaluate the developed system. The feature selection technique is applied to focus on the most important and relevant features for model training. The experimental findings show that the ANN achieved an accuracy of 99.35% on the PIMA dataset. The second experiment, conducted on the early diabetes risk dataset using selected features, revealed that RF achieved an accuracy of 99.36%. Based on our experimental results, it can be concluded that our suggested method significantly outperformed baseline machine learning algorithms already employed for diabetes prediction on both datasets.
Affiliation(s)
- Qazi Waqas Khan
  - Department of Computer Engineering, Jeju National University, Jeju-si, Jeju, South Korea
- Khalid Iqbal
  - Department of Computer Science, COMSATS University Islamabad, Attock Campus, Attock, Punjab, Pakistan
- Rashid Ahmad
  - Department of Computer Science, COMSATS University Islamabad, Attock Campus, Attock, Punjab, Pakistan
  - Bigdata Research Center, Jeju National University, Jeju-si, Jeju, South Korea
- Atif Rizwan
  - Department of Computer Engineering, Jeju National University, Jeju-si, Jeju, South Korea
- Anam Nawaz Khan
  - Department of Computer Engineering, Jeju National University, Jeju-si, Jeju, South Korea
- DoHyeun Kim
  - Department of Computer Engineering, Jeju National University, Jeju-si, Jeju, South Korea
6
García-Jaramillo M, Luque C, León-Vargas F. Machine Learning and Deep Learning Techniques Applied to Diabetes Research: A Bibliometric Analysis. J Diabetes Sci Technol 2024; 18:287-301. [PMID: 38047451 DOI: 10.1177/19322968231215350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
BACKGROUND: The use of machine learning and deep learning techniques in diabetes research has garnered attention in recent times. Nonetheless, few studies offer a thorough picture of the knowledge generation landscape in this field. To address this, a bibliometric analysis of scientific articles published from 2000 to 2022 was conducted to discover global research trends and networks and to highlight the most prominent countries, institutions, journals, articles, and key topics in this domain. METHODS: The Scopus database was used to identify and retrieve high-quality scientific documents. The results were classified into categories of detection (covering diagnosis, screening, identification, segmentation, among others), prediction (prognosis, forecasting, estimation), and management (treatment, control, monitoring, education, telemedicine integration). Biblioshiny and RStudio were used to analyze the data. RESULTS: A total of 1773 articles were collected and analyzed. The number of publications and citations has increased substantially since 2012, with a notable increase in the last 3 years. Of the 3 categories considered, detection was the most dominant, followed by prediction and management. Around 53.2% of the total journals started disseminating articles on this subject in 2020. China, India, and the United States were the most productive countries. Although no evidence of outstanding leadership by specific authors was found, the University of California emerged as the most influential institution for the development of scientific production. CONCLUSION: This is an evolving field that has experienced a rapid increase in productivity, with exponential growth in recent years. This trend is expected to continue in the coming years.
Affiliation(s)
- Carolina Luque
  - Faculty of Engineering, Universidad EAN, Bogotá, Colombia
- Fabian León-Vargas
  - Faculty of Mechanical, Electronic and Biomedical Engineering, Universidad Antonio Nariño, Bogotá, Colombia
7
Talari P, N B, Kaur G, Alshahrani H, Al Reshan MS, Sulaiman A, Shaikh A. Hybrid feature selection and classification technique for early prediction and severity of diabetes type 2. PLoS One 2024; 19:e0292100. [PMID: 38236900 PMCID: PMC10796060 DOI: 10.1371/journal.pone.0292100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Accepted: 09/12/2023] [Indexed: 01/22/2024] Open
Abstract
Diabetes prediction is an ongoing research topic in which medical specialists are attempting to forecast the condition with greater precision. Diabetes typically stays dormant, and if patients are diagnosed with another illness, such as damage to the kidney vessels, problems with the retina of the eye, or a heart problem, it can cause metabolic problems and various complications in the body. Various ensemble learning techniques, including voting, boosting, and bagging, were applied in this study. The Synthetic Minority Oversampling Technique (SMOTE), along with the K-fold cross-validation approach, was used to achieve class balancing and validate the findings. The Pima Indian Diabetes (PID) dataset, obtained from the UCI Machine Learning (UCI ML) repository, was chosen for this study. A feature engineering technique was used to calculate the influence of lifestyle factors. A two-phase classification model was developed to predict insulin resistance using the Sequential Minimal Optimisation (SMO) and SMOTE approaches together. The SMOTE technique is used to preprocess data in the model's first phase, while SMO classes are used in the second phase. Bagging decision trees outperformed all other classification techniques in terms of misclassification error rate, accuracy, specificity, precision, recall, F1 measure, and ROC curve. The model created using the combined SMOTE and SMO strategy achieved 99.07% accuracy with 0.1 ms of runtime. The proposed system aims to enhance the classifier's performance in detecting the illness early.
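The class-balancing step named in this abstract, SMOTE, creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below illustrates that idea in plain Python; the function name `smote`, its parameters, and the toy points are assumptions for illustration and do not reproduce the authors' pipeline.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base point.
        others = sorted((p for p in minority if p is not base),
                        key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic

# Made-up two-feature minority samples, for illustration only.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(minority, n_new=6, k=2)
```

When oversampling is combined with K-fold cross-validation, it is usually applied to the training folds only, so that synthetic points derived from held-out samples do not leak into the evaluation.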
Affiliation(s)
- Praveen Talari
  - Department of Computer Science and Engineering, Vignana Bharathi Institute of Technology, Hyderabad, India
- Bharathiraja N
  - Chitkara University Institute of Engineering and Technology, Chitkara University Punjab, Rajpura, India
- Gaganpreet Kaur
  - Chitkara University Institute of Engineering and Technology, Chitkara University Punjab, Rajpura, India
- Hani Alshahrani
  - Department of Computer Science, College of Computer Science and Information Systems, Najran University, Najran, Saudi Arabia
- Mana Saleh Al Reshan
  - Department of Information Systems, College of Computer Science and Information Systems, Najran University, Najran, Saudi Arabia
  - Scientific and Engineering Research Centre, Najran University, Najran, Saudi Arabia
- Adel Sulaiman
  - Department of Computer Science, College of Computer Science and Information Systems, Najran University, Najran, Saudi Arabia
- Asadullah Shaikh
  - Department of Information Systems, College of Computer Science and Information Systems, Najran University, Najran, Saudi Arabia
8
Jiang L, Yang Z, Wang D, Gong H, Li J, Wang J, Wang L. Diabetes prediction model for unbalanced community follow-up data set based on optimal feature selection and scorecard. Digit Health 2024; 10:20552076241236370. [PMID: 38449681 PMCID: PMC10915850 DOI: 10.1177/20552076241236370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/14/2024] [Indexed: 03/08/2024] Open
Abstract
Objectives: Diabetes is a metabolic disease, and early detection is crucial to ensuring a healthy life for people with prediabetes. Community care plays an important role in public health, but the association between community follow-up of key life characteristics and diabetes risk remains unclear. Using optimal feature selection and a risk scorecard, follow-up data of diabetes patients are modeled to assess diabetes risk. Methods: We conducted a study on the diabetes risk assessment model and risk scorecard using follow-up data from diabetes patients in Haizhu District, Guangzhou, from 2016 to 2023. The raw data underwent preprocessing and imbalance handling. Subsequently, features relevant to diabetes were selected and optimized to determine the optimal subset of features associated with community follow-up and diabetes risk. We established the diabetes risk assessment model. Furthermore, for a comprehensible and interpretable risk expression, the Weight of Evidence transformation method was applied to features. The transformed features were discretized using the quantile binning method to design the risk scorecard, mapping the model's output to five risk levels. Results: In constructing the diabetes risk assessment model, the Random Forest classifier achieved the highest accuracy. The risk scorecard obtained an accuracy of 85.16%, precision of 87.30%, recall of 80.26%, and an F1 score of 83.27% on the unbalanced research dataset. The performance loss compared to the diabetes risk assessment model was minimal, suggesting that the binning method used for constructing the diabetes risk scorecard is reasonable, with very low feature information loss. Conclusion: The methods provided in this article demonstrate effectiveness and reliability in the assessment of diabetes risk. The assessment model and scorecard can be used directly by community doctors for large-scale risk identification and early warning and can also be used for individual self-examination to reduce risk factor levels.
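The two scorecard ingredients named in this abstract, quantile binning and the Weight of Evidence (WoE) transformation, can be illustrated with a small plain-Python sketch. It is not the authors' implementation: the sign convention (ln of non-event share over event share), the additive smoothing constant, the order of the two steps, and the toy glucose data are all assumptions made for illustration.

```python
import math

def quantile_bins(values, n_bins):
    """Assign each value to one of n_bins equal-frequency bins by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins

def weight_of_evidence(bins, labels, smoothing=0.5):
    """WoE per bin: ln(share of non-events / share of events).
    `smoothing` is an additive constant that avoids log(0) for empty cells."""
    n_bins = max(bins) + 1
    events = [smoothing] * n_bins
    non_events = [smoothing] * n_bins
    for b, label in zip(bins, labels):
        if label == 1:
            events[b] += 1
        else:
            non_events[b] += 1
    total_e, total_ne = sum(events), sum(non_events)
    return [math.log((non_events[b] / total_ne) / (events[b] / total_e))
            for b in range(n_bins)]

# Made-up fasting glucose readings and diabetes labels, for illustration only.
glucose = [4.8, 5.1, 5.3, 5.6, 6.0, 6.4, 6.9, 7.5, 8.2, 9.0]
diabetic = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
bins = quantile_bins(glucose, n_bins=5)
woe = weight_of_evidence(bins, diabetic)
```

Under this convention, bins dominated by non-diabetic cases get positive WoE and bins dominated by diabetic cases get negative WoE, which is what makes the per-bin values readable as risk contributions on a scorecard.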
Affiliation(s)
- Liangjun Jiang
  - College of Information and Communication Engineering, State Key Lab of Marine Resource Utilisation in South China Sea, Hainan University, Haikou, China
- Zerui Yang
  - Electronics & Information School, Yangtze University, Jingzhou, China
- Donghai Wang
  - Shenzhen Center for Disease Control and Prevention, Shenzhen, China
- Haimei Gong
  - College of Information and Communication Engineering, State Key Lab of Marine Resource Utilisation in South China Sea, Hainan University, Haikou, China
- Juan Li
  - Haizhu District Community Health Development Guidance Center, Guangzhou, China
- Jing Wang
  - Shenzhen E-link Wisdom Co., Ltd, Shenzhen, China
- Lei Wang
  - College of Information and Communication Engineering, State Key Lab of Marine Resource Utilisation in South China Sea, Hainan University, Haikou, China
9
Patro KK, Allam JP, Sanapala U, Marpu CK, Samee NA, Alabdulhafith M, Plawiak P. An effective correlation-based data modeling framework for automatic diabetes prediction using machine and deep learning techniques. BMC Bioinformatics 2023; 24:372. [PMID: 37784049 PMCID: PMC10544445 DOI: 10.1186/s12859-023-05488-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/22/2023] [Accepted: 09/19/2023] [Indexed: 10/04/2023] Open
Abstract
The rising risk of diabetes, particularly in emerging countries, highlights the importance of early detection. Manual prediction can be a challenging task, leading to the need for automatic approaches. The major challenge with biomedical datasets is data scarcity. Biomedical data is often difficult to obtain in large quantities, which can limit the ability to train deep learning models effectively. Biomedical data can be noisy and inconsistent, which can make it difficult to train accurate models. To overcome the above-mentioned challenges, this work presents a new framework for data modeling that is based on correlation measures between features and can be used to process data effectively for predicting diabetes. The standard, publicly available Pima Indians Medical Diabetes (PIMA) dataset is utilized to verify the effectiveness of the proposed techniques. Experiments using the PIMA dataset showed that the proposed data modeling method improved the accuracy of machine learning models by an average of 9%, with deep convolutional neural network models achieving an accuracy of 96.13%. Overall, this study demonstrates the effectiveness of the proposed strategy in the early and reliable prediction of diabetes.
Affiliation(s)
- Kiran Kumar Patro
  - Department of ECE, Aditya Institute of Technology and Management, Tekkali, AP, 532201, India
- Jaya Prakash Allam
  - School of Computer Science and Engineering, VIT Vellore, Katpadi, Vellore, Tamil Nadu, 632014, India
- Chaitanya Kumar Marpu
  - Department of ECE, Aditya Institute of Technology and Management, Tekkali, AP, 532201, India
- Nagwan Abdel Samee
  - Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh, 11671, Saudi Arabia
- Maali Alabdulhafith
  - Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh, 11671, Saudi Arabia
- Pawel Plawiak
  - Department of Computer Science, Faculty of Computer Science and Telecommunications, Cracow University of Technology, Warszawska 24, 31-155, Krakow, Poland
  - Institute of Theoretical and Applied Informatics, Polish Academy of Sciences, Bałtycka 5, 44-100, Gliwice, Poland
10
Liu XZ, Duan M, Huang HD, Zhang Y, Xiang TY, Niu WC, Zhou B, Wang HL, Zhang TT. Predicting diabetic kidney disease for type 2 diabetes mellitus by machine learning in the real world: a multicenter retrospective study. Front Endocrinol (Lausanne) 2023; 14:1184190. [PMID: 37469989 PMCID: PMC10352831 DOI: 10.3389/fendo.2023.1184190] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2023] [Accepted: 06/09/2023] [Indexed: 07/21/2023] Open
Abstract
Objective: Diabetic kidney disease (DKD) has been reported as a main microvascular complication of diabetes mellitus. Although renal biopsy is capable of distinguishing DKD from non-diabetic kidney disease (NDKD), no gold standard has been validated to assess the development of DKD. This study aimed to build an auxiliary diagnosis model for type 2 diabetic kidney disease (T2DKD) based on machine learning algorithms. Methods: Clinical data on 3624 individuals with type 2 diabetes (T2DM) were gathered from January 1, 2019 to December 31, 2019 using a multi-center retrospective database. The data were randomly split into a training set and a validation set at a ratio of 8:2. To identify critical clinical variables, the least absolute shrinkage and selection operator (LASSO) was employed. Fifteen machine learning models were built to support the diagnosis of T2DKD, and the optimal model was selected according to the area under the receiver operating characteristic curve (AUC) and accuracy. The model was improved using Bayesian optimization methods. The Shapley Additive explanations (SHAP) approach was used to illustrate prediction findings. Results: DKD was diagnosed in 1856 (51.2%) of the 3624 individuals in the final cohort. As revealed by the SHAP findings, the Categorical Boosting (CatBoost) model achieved the optimal performance in the prediction of the risk of T2DKD, with an AUC of 0.86 based on the top 38 characteristics. Guided by the SHAP findings, a simplified CatBoost model based on the top 12 characteristics was built, with an AUC of 0.84. The features of the simplified model were systolic blood pressure (SBP), creatinine (CREA), length of stay (LOS), thrombin time (TT), age, prothrombin time (PT), platelet large cell ratio (P-LCR), albumin (ALB), glucose (GLU), fibrinogen (FIB-C), red blood cell distribution width-standard deviation (RDW-SD), and hemoglobin A1C (HbA1C). Conclusion: A machine learning-based model for the prediction of the risk of developing T2DKD was built, and its effectiveness was verified. The CatBoost model can contribute to the diagnosis of T2DKD. Clinicians could gain more insights into the outcomes if the ML model is made interpretable.
Affiliation(s)
- Xiao zhu Liu
  - Department of Cardiology, the Second Affiliated Hospital of Chongqing Medical University, Chongqing, China
  - Medical Data Science Academy, Chongqing Medical University, Chongqing, China
- Minjie Duan
  - Medical Data Science Academy, Chongqing Medical University, Chongqing, China
  - College of Medical Informatics, Chongqing Medical University, Chongqing, China
- Hao dong Huang
  - Medical Data Science Academy, Chongqing Medical University, Chongqing, China
  - College of Medical Informatics, Chongqing Medical University, Chongqing, China
- Yang Zhang
  - Medical Data Science Academy, Chongqing Medical University, Chongqing, China
  - College of Medical Informatics, Chongqing Medical University, Chongqing, China
- Tian yu Xiang
  - Information Center, The University-Town Hospital of Chongqing Medical University, Chongqing, China
- Wu ceng Niu
  - Department of Nuclear Medicine, Handan First Hospital, Hebei, China
- Bei Zhou
  - Department of Cardiology, the Second Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Hao lin Wang
  - College of Medical Informatics, Chongqing Medical University, Chongqing, China
- Ting ting Zhang
  - Department of Endocrinology, Fifth Medical Center of Chinese People's Liberation Army (PLA) Hospital, Beijing, China
11
Liu J, Qu J, Xu L, Qiao C, Shao G, Liu X, He H, Zhang J. Prediction of liver cancer prognosis based on immune cell marker genes. Front Immunol 2023; 14:1147797. [PMID: 37180166 PMCID: PMC10174299 DOI: 10.3389/fimmu.2023.1147797] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/19/2023] [Accepted: 02/24/2023] [Indexed: 05/15/2023] Open
Abstract
Introduction: Monitoring the response after treatment of liver cancer and adjusting the treatment strategy in a timely manner are crucial to improving the survival rate of liver cancer. At present, the clinical monitoring of liver cancer after treatment is mainly based on serum markers and imaging. Morphological evaluation has limitations, such as the inability to measure small tumors and the poor repeatability of measurement, and is therefore not applicable to cancer evaluation after immunotherapy or targeted treatment. The determination of serum markers is greatly affected by the environment and cannot accurately evaluate the prognosis. With the development of single-cell sequencing technology, a large number of immune cell-specific genes have been identified. Immune cells and the microenvironment play an important role in the process of prognosis. We speculate that the expression changes of immune cell-specific genes can indicate the process of prognosis. Method: Therefore, this paper first screened out the immune cell-specific genes related to liver cancer, and then built a deep learning model based on the expression of these genes to predict metastasis and the survival time of liver cancer patients. We verified and compared the model on a data set of 372 patients with liver cancer. Result: The experiments found that our model is significantly superior to other methods, and can accurately identify whether liver cancer patients have metastasis and predict the survival time of liver cancer patients according to the expression of immune cell-specific genes. Discussion: We found that these immune cell-specific genes participate in multiple cancer-related pathways. We fully explored the function of these genes, which would support the development of immunotherapy for liver cancer.
Affiliation(s)
- Jianfei Liu
  - Department of Interventional Therapy, The First Affiliated Hospital of Dalian Medical University, Dalian, Liaoning, China
- Junjie Qu
  - Interventional Medicine Center, Affiliated Zhongshan Hospital of Dalian University, Dalian, Liaoning, China
- Lingling Xu
  - Department of Medical Oncology, The Second Affiliated Hospital of Dalian Medical University, Dalian, China
- Chen Qiao
  - Department of Interventional Therapy, The First Affiliated Hospital of Dalian Medical University, Dalian, Liaoning, China
- Guiwen Shao
  - Department of Interventional Therapy, The First Affiliated Hospital of Dalian Medical University, Dalian, Liaoning, China
- Xin Liu
  - Department of Medical Oncology, The Second Affiliated Hospital of Dalian Medical University, Dalian, China
- Hui He
  - Department of Laparoscopic Surgery, The First Affiliated Hospital of Dalian Medical University, Dalian, Liaoning, China
- Jian Zhang
  - Department of Interventional Therapy, The First Affiliated Hospital of Dalian Medical University, Dalian, Liaoning, China
|
12
|
De Falco I, Della Cioppa A, Koutny T, Ubl M, Krcma M, Scafuri U, Tarantino E. A Federated Learning-Inspired Evolutionary Algorithm: Application to Glucose Prediction. SENSORS (BASEL, SWITZERLAND) 2023; 23:2957. [PMID: 36991668 PMCID: PMC10059991 DOI: 10.3390/s23062957] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Revised: 02/17/2023] [Accepted: 03/04/2023] [Indexed: 06/19/2023]
Abstract
In this paper, we propose an innovative Federated Learning-inspired evolutionary framework. Its main novelty is that this is the first time that an Evolutionary Algorithm is employed on its own to directly perform Federated Learning activity. A further novelty is that, unlike the other Federated Learning frameworks in the literature, ours can efficiently deal at the same time with two relevant issues in Machine Learning, i.e., data privacy and interpretability of the solutions. Our framework consists of a master/slave approach in which each slave contains local data, protecting sensitive private data, and exploits an evolutionary algorithm to generate prediction models. The master shares among the slaves the locally learned models that emerge on each slave. Sharing these local models results in global models. Because data privacy and interpretability are very significant in the medical domain, the algorithm is tested on forecasting future glucose values for diabetic patients by exploiting a Grammatical Evolution algorithm. The effectiveness of this knowledge-sharing process is assessed experimentally by comparing the proposed framework with another in which no exchange of local models occurs. The results show that the proposed approach performs better and demonstrate the validity of its sharing process for the emergence of local models for personal diabetes management, usable as efficient global models. When further subjects not involved in the learning process are considered, the models discovered by our framework show higher generalization capability than those achieved without knowledge sharing: the improvement provided by knowledge sharing is about 3.03% for precision, 1.56% for recall, 3.17% for F1, and 1.56% for accuracy. Moreover, statistical analysis reveals the statistical superiority of model exchange over the case in which no exchange takes place.
Affiliation(s)
- Ivanoe De Falco
  - ICAR-National Research Council of Italy, Via P. Castellino, 80131 Naples, Italy
- Antonio Della Cioppa
  - ICAR-National Research Council of Italy, Via P. Castellino, 80131 Naples, Italy
  - Natural Computation Lab, DIEM, University of Salerno, Via Giovanni Paolo II 132, 84084 Fisciano, Italy
- Tomas Koutny
  - Department of Computer Science and Engineering, New Technologies for Information Society, University of West Bohemia, Technicka 18, 330 01 Pilsen, Czech Republic
- Martin Ubl
  - Department of Computer Science and Engineering, University of West Bohemia, Technicka 18, 330 01 Pilsen, Czech Republic
- Michal Krcma
  - Diabetology Center, First Department of Internal Medicine, University Hospital Pilsen, Alej Svobody 923/80, 323 00 Pilsen, Czech Republic
- Umberto Scafuri
  - ICAR-National Research Council of Italy, Via P. Castellino, 80131 Naples, Italy
- Ernesto Tarantino
  - ICAR-National Research Council of Italy, Via P. Castellino, 80131 Naples, Italy
|
13
|
Mahoto NA, Shaikh A, Sulaiman A, Reshan MSA, Rajab A, Rajab K. A machine learning based data modeling for medical diagnosis. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104481] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]
|
14
|
An Ensemble of Light Gradient Boosting Machine and Adaptive Boosting for Prediction of Type-2 Diabetes. INT J COMPUT INT SYS 2023. [DOI: 10.1007/s44196-023-00184-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/16/2023] Open
Abstract
Machine learning helps construct predictive models in clinical data analysis, predicting stock prices, picture recognition, financial modelling, disease prediction, and diagnostics. This paper proposes machine learning ensemble algorithms to forecast diabetes. The ensemble combines k-NN, Naive Bayes (Gaussian), Random Forest (RF), Adaboost, and a recently designed Light Gradient Boosting Machine. The proposed ensembles inherit detection ability of LightGBM to boost accuracy. Under fivefold cross-validation, the proposed ensemble models perform better than other recent models. The k-NN, Adaboost, and LightGBM jointly achieve 90.76% detection accuracy. The receiver operating curve analysis shows that k-NN, RF, and LightGBM successfully solve class imbalance issue of the underlying dataset.
|
15
|
Machine Learning Modeling of Disease Treatment Default: A Comparative Analysis of Classification Models. ADVANCES IN PUBLIC HEALTH 2023. [DOI: 10.1155/2023/4168770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2023] Open
Abstract
Generally, treatment default of diseases by patients is regarded as the biggest threat to favourable disease treatment outcomes. It is seen as the reason for the resurgence of infectious diseases including tuberculosis in some developing countries. Sadly, its occurrence in chronic disease management is associated with high morbidity and mortality rates. Many reasons have been adduced for this phenomenon. Exploration of treatment default using biographic and behavioral metrics collected from patients and healthcare providers remains a challenge. The focus on contextual nonbiomedical measurements using a supervised machine learning modeling technique is aimed at creating an understanding of the reasons why treatment default occurs, including identifying important contextual parameters that contribute to treatment default. The prediction accuracy scores of four supervised machine learning algorithms, namely, gradient boosting, logistic regression, random forest, and support vector machine, were 0.87, 0.90, 0.81, and 0.77, respectively. Additionally, the positive predictive values of the four models ranged from 98.72% to 98.87%, and the negative predictive values of gradient boosting, logistic regression, random forest, and support vector machine were 50%, 75%, 22.22%, and 50%, respectively. Logistic regression had the highest negative predictive value (75%), the smallest error margin (25%), and the highest accuracy score (0.90), while the random forest had the lowest negative predictive value (22.22%) and the highest error margin (77.78%).
By performing a chi-square test of variable independence, this study suggests that age, presence of comorbidities, concern about long queuing/waiting times at treatment facilities, availability of qualified clinicians, and the patient's nutritional state (whether on a controlled diet or not) are likely to affect adherence to disease treatment and could result in an increased risk of default.
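The predictive values quoted in this abstract follow directly from confusion-matrix counts. The sketch below is illustrative only: the abstract does not report the confusion matrices, so the counts are hypothetical and were chosen so that the negative predictive value lands on the 22.22% reported for the random forest. It also assumes that "error margin" means 1 minus the negative predictive value, which is consistent with the 25% and 77.78% figures.

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) as fractions from confusion-matrix counts."""
    ppv = tp / (tp + fp)  # share of predicted positives that are truly positive
    npv = tn / (tn + fn)  # share of predicted negatives that are truly negative
    return ppv, npv


# Hypothetical counts, NOT taken from the paper.
ppv, npv = predictive_values(tp=77, fp=1, tn=2, fn=7)

# Assumed reading of "error margin": the complement of NPV.
npv_error = 1 - npv

print(f"PPV={ppv:.2%}  NPV={npv:.2%}  NPV error margin={npv_error:.2%}")
```

With so few true negatives in the denominator, a single extra false negative moves the NPV by several points, which is one reason the NPV spread across models is much wider than the PPV spread.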
|
16
|
Zeng Y, Liu D, Wang Y. Identification of phosphorylation site using S-padding strategy based convolutional neural network. Health Inf Sci Syst 2022; 10:29. [PMID: 36124094 PMCID: PMC9481819 DOI: 10.1007/s13755-022-00196-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/08/2022] [Accepted: 08/25/2022] [Indexed: 10/14/2022] Open
Abstract
Purpose: Abnormal phosphorylation has been shown to be associated with a variety of human diseases, and the identification of phosphorylation sites is one of the research hotspots in healthcare. Deep learning studies of phosphorylation site prediction often introduce a variety of information, and the use of complex models limits the scenarios in which the models can be applied. Methods: An enhanced deep learning method with an S-padding strategy based on a convolutional neural network is proposed in this paper. The S-padding strategy forms a three-dimensional matrix with extension information from the original amino acid sequences, and a corresponding 2D-CNN model is designed to abstract the comprehensive features of the phosphorylation site area in protein sequences. Results: Fivefold cross-validation experiments show that the proposed method achieves an accuracy of 89.68% on serine/threonine sites and 88.16% on tyrosine sites on the human dataset. Furthermore, phosphorylation site prediction on different organisms obtains accuracy, sensitivity, and specificity of over 0.85, indicating potential capability on the phosphorylation site prediction task. Comparison with existing models shows that the proposed method obtains better accuracy and AUC, and that it can further improve performance with sufficient training data. Conclusion: This method enables proteome-wide predictions via models trained on a large amount of phosphorylation data, further exploiting the potential of protein phosphorylation site identification, and helping to provide insights into phosphorylation mechanisms.
Affiliation(s)
- Yanjiao Zeng
  - School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, 510006 Guangdong China
- Dongning Liu
  - School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, 510006 Guangdong China
- Yang Wang
  - School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, 510006 Guangdong China
|
17
|
Islam MM, Rahman MJ, Menhazul Abedin M, Ahammed B, Ali M, Ahmed NF, Maniruzzaman M. Identification of the risk factors of type 2 diabetes and its prediction using machine learning techniques. Health Syst (Basingstoke) 2022; 12:243-254. [PMID: 37234468 PMCID: PMC10208154 DOI: 10.1080/20476965.2022.2141141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2020] [Accepted: 10/20/2022] [Indexed: 11/07/2022] Open
Abstract
This study identified the risk factors for type 2 diabetes (T2D) and proposed a machine learning (ML) technique for predicting T2D. The risk factors for T2D were identified by multiple logistic regression (MLR) at a significance level of p<0.05. Then, five ML-based techniques, including logistic regression, naïve Bayes, J48, multilayer perceptron, and random forest (RF), were employed to predict T2D. This study utilized two publicly available datasets, derived from the National Health and Nutrition Examination Survey, 2009-2010 and 2011-2012. The 2009-2010 dataset included 4922 respondents, 387 of them T2D patients, whereas the 2011-2012 dataset included 4936 respondents, 373 of them T2D patients. This study identified six risk factors (age, education, marital status, SBP, smoking, and BMI) for 2009-2010 and nine risk factors (age, race, marital status, SBP, DBP, direct cholesterol, physical activity, smoking, and BMI) for 2011-2012. The RF-based classifier obtained 95.9% accuracy, 95.7% sensitivity, 95.3% F-measure, and 0.946 area under the curve.
Affiliation(s)
- Md. Merajul Islam
  - Department of Statistics, University of Rajshahi, Rajshahi, Bangladesh
  - Department of Statistics, Jatiya Kabi Kazi Nazrul Islam University, Mymensingh, Bangladesh
- Benojir Ahammed
  - Statistics Discipline, Khulna University, Khulna, Bangladesh
- Mohammad Ali
  - Statistics Discipline, Khulna University, Khulna, Bangladesh
- N.A.M Faisal Ahmed
  - Institute of Education and Research, University of Rajshahi, Rajshahi, Bangladesh
|
18
|
Afrash MR, Rahimi F, Kazemi H, Shanbezadeh M, Amraei M, Asadi F. Development of an intelligent clinical decision support system for the early prediction of diabetic nephropathy. INFORMATICS IN MEDICINE UNLOCKED 2022. [DOI: 10.1016/j.imu.2022.101135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022] Open
|
19
|
Prevalence and Early Prediction of Diabetes Using Machine Learning in North Kashmir: A Case Study of District Bandipora. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:2789760. [PMID: 36238678 PMCID: PMC9553420 DOI: 10.1155/2022/2789760] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 08/29/2022] [Revised: 09/16/2022] [Accepted: 09/23/2022] [Indexed: 11/17/2022]
Abstract
Diabetes is one of the biggest health problems, affecting millions of people across the world. Uncontrolled diabetes can increase the risk of heart attack, cancer, kidney damage, blindness, and other illnesses. Researchers are motivated to create a Machine Learning methodology that can predict future diabetes. Exploiting Machine Learning Algorithms (MLA) is essential if healthcare professionals are to identify diseases more effectively. In order to improve the medical diagnosis of diabetes, this research explores and contrasts various MLA that can identify diabetes risk early. The research includes analysis of real data: a clinical dataset gathered from a doctor in the Indian district of Bandipora between April 2021 and February 2022. MLA are currently important in the healthcare sector due to their predictive abilities. Researchers are using MLA to improve disease prediction and reduce cost. In this paper, the author developed a methodology using Machine Learning Algorithms for diabetes risk prediction in North Kashmir. Six MLA were used in the experimental study: Random Forest (RF), Multi-Layer Perceptron (MLP), Support Vector Machine (SVM), Gradient Boost (GB), Decision Tree (DT), and Logistic Regression (LR). On the balanced data set, RF was the most accurate classifier, with an accuracy of 98%; the others scored GB 97%, DT 96%, SVM 92%, MLP 90.99%, and LR 69%. Lastly, this study enables us to effectively identify the prevalence and prediction of diabetes.
|
20
|
Dutta A, Hasan MK, Ahmad M, Awal MA, Islam MA, Masud M, Meshref H. Early Prediction of Diabetes Using an Ensemble of Machine Learning Models. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2022; 19:ijerph191912378. [PMID: 36231678 PMCID: PMC9566114 DOI: 10.3390/ijerph191912378] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/28/2022] [Revised: 09/20/2022] [Accepted: 09/24/2022] [Indexed: 05/15/2023]
Abstract
Diabetes is one of the most rapidly spreading diseases in the world, resulting in an array of significant complications, including cardiovascular disease, kidney failure, diabetic retinopathy, and neuropathy, among others, which contribute to an increase in morbidity and mortality rate. If diabetes is diagnosed at an early stage, its severity and underlying risk factors can be significantly reduced. However, there is a shortage of labeled data and the occurrence of outliers or data missingness in clinical datasets that are reliable and effective for diabetes prediction, making it a challenging endeavor. Therefore, we introduce a newly labeled diabetes dataset from a South Asian nation (Bangladesh). In addition, we suggest an automated classification pipeline that includes a weighted ensemble of machine learning (ML) classifiers: Naive Bayes (NB), Random Forest (RF), Decision Tree (DT), XGBoost (XGB), and LightGBM (LGB). Grid search hyperparameter optimization is employed to tune the critical hyperparameters of these ML models. Furthermore, missing value imputation, feature selection, and K-fold cross-validation are included in the framework design. A statistical analysis of variance (ANOVA) test reveals that the performance of diabetes prediction significantly improves when the proposed weighted ensemble (DT + RF + XGB + LGB) is executed with the introduced preprocessing, with the highest accuracy of 0.735 and an area under the ROC curve (AUC) of 0.832. In conjunction with the suggested ensemble model, our statistical imputation and RF-based feature selection techniques produced the best results for early diabetes prediction. Moreover, the presented new dataset will contribute to developing and implementing robust ML models for diabetes prediction utilizing population-level data.
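A weighted ensemble of this kind can be pictured as a weighted average of the base models' predicted probabilities. The sketch below is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not say how the weights are derived or combined, so the soft-voting rule, the per-model probabilities, the weights, and the 0.5 threshold are all hypothetical.

```python
def weighted_soft_vote(probabilities, weights):
    """Weighted average of per-model positive-class probabilities."""
    if len(probabilities) != len(weights):
        raise ValueError("one weight is required per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(p * w for p, w in zip(probabilities, weights)) / total


# Hypothetical outputs for one patient from the four ensemble members
# named in the abstract (DT + RF + XGB + LGB); values are illustrative.
model_probs = {"DT": 0.40, "RF": 0.70, "XGB": 0.80, "LGB": 0.60}
# Hypothetical weights, e.g. proportional to each model's validation score.
model_weights = {"DT": 1.0, "RF": 2.0, "XGB": 3.0, "LGB": 2.0}

names = list(model_probs)
score = weighted_soft_vote(
    [model_probs[n] for n in names],
    [model_weights[n] for n in names],
)
label = int(score >= 0.5)  # assumed decision threshold
print(f"ensemble probability={score:.3f}, predicted class={label}")
```

Giving every model the same weight reduces this to plain soft voting, so the weights are where a stronger base model gets more say in the final prediction.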
Affiliation(s)
- Aishwariya Dutta
  - Department of Biomedical Engineering (BME), Khulna University of Engineering & Technology (KUET), Khulna 9203, Bangladesh
  - Department of Biomedical Engineering (BME), Military Institute of Science and Technology (MIST), Mirpur Cantonment, Dhaka 1216, Bangladesh
- Md. Kamrul Hasan
  - Department of Electrical and Electronic Engineering (EEE), Khulna University of Engineering & Technology (KUET), Khulna 9203, Bangladesh
- Mohiuddin Ahmad
  - Department of Electrical and Electronic Engineering (EEE), Khulna University of Engineering & Technology (KUET), Khulna 9203, Bangladesh
- Md. Abdul Awal (corresponding author)
  - School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, QLD 4072, Australia
  - Electronics and Communication Engineering (ECE) Discipline, Khulna University (KU), Khulna 9208, Bangladesh
- Mehedi Masud
  - Department of Computer Science, College of Computers and Information Technology, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
- Hossam Meshref
  - Department of Computer Science, College of Computers and Information Technology, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
|
21
|
Investigation of Diabetes Care in Elder Individuals Using Artificial Intelligence. J FOOD QUALITY 2022. [DOI: 10.1155/2022/8760032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022] Open
Abstract
The term blockchain is mainly regarded as the distributed transaction which is mainly comprised of different blocks, and each set tends to represent the data that are being associated with the previous blocks. The blockchain is mainly managed through peer-to-peer networks which comparatively involves in adhering to the protocol of authenticating various blocks to form the blockchain. The usage of blockchain technology has been increasingly used in different fields, and healthcare services are now using blockchain for better patient delivery, detecting disease, and other aspects. The scope of the proposed study is that this study has exploited the function of a blockchain-enabled big data network to support medical professionals in giving better treatment modalities and delivering better patient care. The application of a new generation of smart block chains such as Ethereum and NEM is now offering better services and features in creating blockchain-based healthcare data management and hence support healthcare centers, medical practitioners, nurses, radiologists, and patients for better healthcare management. The application of blockchain technology in big data networks supports adding more value as it results in enhanced data quality, accessibility, and support in creating better security and safety of data and information, which is highly essential in the medical industry. Blockchain technology enables big data technologies enabled in supporting medical practitioners in addressing various healthcare ailments; one of the major diseases impacting many people around the world is diabetes. Patients with such ailments tend to generate more data and information related to the disease and health-related aspects. Hence, this information requires being maintained and analyzed, so that superior healthcare services can be provided. 
This study investigates how blockchain technology, through a big data network, can offer better care for elderly individuals affected by diabetes. The researchers used a questionnaire to collect data from nearly 169 respondents, and these data were then analyzed using the SPSS data package. The analysis used percentage analysis, correlation analysis, and the chi-square test. The results and discussion show in detail the major aspects of blockchain technology in supporting healthcare professionals for better diabetes care management for elderly individuals.
|
22
|
Development of a Convolutional Neural Network Model to Predict Coronary Artery Disease Based on Single-Lead and Twelve-Lead ECG Signals. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12157711] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2022]
Abstract
Coronary artery disease (CAD) is one of the most common causes of heart ailments; many patients with CAD do not exhibit initial symptoms. An electrocardiogram (ECG) is a diagnostic tool widely used to capture the abnormal activity of the heart and help with diagnoses. Assessing ECG signals may be challenging and time-consuming. Identifying abnormal ECG morphologies, especially in low amplitude curves, may be prone to error. Hence, a system that can automatically detect and assess the ECG and treadmill test ECG (TMT-ECG) signals will be helpful to the medical industry in detecting CAD. In the present work, we developed an intelligent system that can predict CAD, based on ECG and TMT signals more accurately than any other system developed thus far. The distinct convolutional neural network (CNN) architecture deals with single-lead and multi-lead (12-lead) ECG and TMT-ECG data effectively. While most artificial intelligence-based systems rely on the universal dataset, the current work used clinical lab data collected from a renowned hospital in the neighborhood. ECG and TMT-ECG graphs of normal and CAD patients were collected in the form of scanned reports. One-dimensional ECG data with all possible features were extracted from the scanned report with the help of a modified image processing method. This feature extraction procedure was integrated with the optimized architecture of the CNN model leading to a novel prediction system for CAD. The automated computer-assisted system helps in the detection and medication of CAD with a high prediction accuracy of 99%.
|
23
|
Dritsas E, Trigka M. Data-Driven Machine-Learning Methods for Diabetes Risk Prediction. SENSORS 2022; 22:s22145304. [PMID: 35890983 PMCID: PMC9318204 DOI: 10.3390/s22145304] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/26/2022] [Revised: 07/10/2022] [Accepted: 07/13/2022] [Indexed: 01/11/2023]
Abstract
Diabetes mellitus is a chronic condition characterized by a disturbance in the metabolism of carbohydrates, fats and proteins. The most characteristic disorder in all forms of diabetes is hyperglycemia, i.e., elevated blood sugar levels. The modern way of life has significantly increased the incidence of diabetes. Therefore, early diagnosis of the disease is a necessity. Machine Learning (ML) has gained great popularity among healthcare providers and physicians due to its high potential in developing efficient tools for risk prediction, prognosis, treatment and the management of various conditions. In this study, a supervised learning methodology is described that aims to create risk prediction tools with high efficiency for type 2 diabetes occurrence. A features analysis is conducted to evaluate their importance and explore their association with diabetes. These features are the most common symptoms that often develop slowly with diabetes, and they are utilized to train and test several ML models. Various ML models are evaluated in terms of the Precision, Recall, F-Measure, Accuracy and AUC metrics and compared under 10-fold cross-validation and data splitting. Both validation methods highlighted Random Forest and K-NN as the best performing models in comparison to the other models.
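The 10-fold cross-validation mentioned in this abstract partitions the samples into ten folds and uses each fold once as the test set while training on the other nine. The sketch below shows only that index bookkeeping, as a minimal illustration; the function name and the sample count are hypothetical, and the study's own tooling, shuffling, and any stratification are not described in the abstract.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Hypothetical sample count, chosen only to show uneven fold sizes.
folds = list(k_fold_indices(25, k=10))
for fold_number, (train_idx, test_idx) in enumerate(folds, start=1):
    print(f"fold {fold_number}: train={len(train_idx)} test={len(test_idx)}")
```

In practice the indices are usually shuffled first, and for imbalanced outcomes the folds are often stratified so that each one keeps roughly the same class ratio.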
|
24
|
An Ensemble Approach to Predict Early-Stage Diabetes Risk Using Machine Learning: An Empirical Study. SENSORS 2022; 22:s22145247. [PMID: 35890927 PMCID: PMC9324493 DOI: 10.3390/s22145247] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 05/19/2022] [Revised: 07/04/2022] [Accepted: 07/06/2022] [Indexed: 01/27/2023]
Abstract
Diabetes is a long-lasting disease triggered by expanded sugar levels in human blood and can affect various organs if left untreated. It contributes to heart disease, kidney issues, damaged nerves, damaged blood vessels, and blindness. Timely disease prediction can save precious lives and enable healthcare advisors to take care of the conditions. Most diabetic patients know little about the risk factors they face before diagnosis. Nowadays, hospitals deploy basic information systems, which generate vast amounts of data that cannot be converted into proper/useful information and cannot be used to support decision making for clinical purposes. There are different automated techniques available for the earlier prediction of disease. Ensemble learning is a data analysis technique that combines multiple techniques into a single optimal predictive system to evaluate bias and variation, and to improve predictions. Diabetes data, which included 17 variables, were gathered from the UCI repository of various datasets. The predictive models used in this study include AdaBoost, Bagging, and Random Forest, to compare the precision, recall, classification accuracy, and F1-score. Finally, the Random Forest Ensemble Method had the best accuracy (97%), whereas the AdaBoost and Bagging algorithms had lower accuracy, precision, recall, and F1-scores.
|
25
|
A Comparative Analysis of Blockchain in Enhancing the Drug Traceability in Edible Foods Using Multiple Regression Analysis. J FOOD QUALITY 2022. [DOI: 10.1155/2022/1689913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022] Open
Abstract
The need for access to safer food items is growing, and with it the need for a better supply chain management system in the food industry. The increased complexity of the existing systems tends to introduce more issues for the stakeholders, and the cost of product traceability is quite high. Hence, the industry is looking for effective solutions for drug traceability. The application of Blockchain technology enables stakeholders in the food and beverage (F&B) sector to track the movement of goods and gather the required details, so that contaminated products can be identified and recalled without much delay and at lower recall cost to protect the lives of individuals. Tampered food items are increasing and are impacting the supply chain process, the brand name of the companies, and claim assurance. They create an adverse impact on the health of individuals and cause higher economic loss to the health-care industry. The existing studies tend to emphasize the need for enhanced, effective, end-to-end tracking systems in the industry. The emergence of Blockchain technology enables centralized tracking of information, supporting enhanced data privacy, increased transparency, and the eradication of tampered food products from the supply chain system. These approaches leverage smart contracts and decentralize the storage of information in a secure manner for enhanced product traceability in the F&B industry. The implementation of smart contracts generates better data governance, which tends to meet the needs and requirements of the stakeholders, and applies effective measures of food traceability. The primary objective of the study is to perform an analysis of Blockchain in enhancing drug traceability in the food sector.
The researcher uses quantitative analysis for the study, as it helps in understanding the critical determinants influencing drug traceability in food. The survey method is used to gather the information, and past reviews are also used to gain a better understanding of the subject area.
|
26
|
Liu Q, Zhang M, He Y, Zhang L, Zou J, Yan Y, Guo Y. Predicting the Risk of Incident Type 2 Diabetes Mellitus in Chinese Elderly Using Machine Learning Techniques. J Pers Med 2022; 12:jpm12060905. [PMID: 35743691 PMCID: PMC9224915 DOI: 10.3390/jpm12060905] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/09/2022] [Revised: 05/21/2022] [Accepted: 05/27/2022] [Indexed: 02/04/2023] Open
Abstract
Early identification of individuals at high risk of diabetes is crucial for implementing early intervention strategies. However, algorithms specific to elderly Chinese adults are lacking. The aim of this study is to build effective prediction models based on machine learning (ML) for the risk of type 2 diabetes mellitus (T2DM) in the Chinese elderly. A retrospective cohort study was conducted using the health screening data of adults older than 65 years in Wuhan, China from 2018 to 2020. After strict data filtration, 127,031 records from the eligible participants were utilized. Overall, 8298 participants were diagnosed with incident T2DM during the 2-year follow-up (2019–2020). The dataset was randomly split into a training set (n = 101,625) and a test set (n = 25,406). We developed prediction models based on four ML algorithms: logistic regression (LR), decision tree (DT), random forest (RF), and extreme gradient boosting (XGBoost). Using LASSO regression, 21 prediction features were selected. Random under-sampling (RUS) was applied to address the class imbalance, and Shapley Additive Explanations (SHAP) was used to calculate and visualize feature importance. Model performance was evaluated by the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and accuracy. The XGBoost model achieved the best performance (AUC = 0.7805, sensitivity = 0.6452, specificity = 0.7577, accuracy = 0.7503). Fasting plasma glucose (FPG), education, exercise, gender, and waist circumference (WC) were the top five predictors. This study showed that the XGBoost model can be applied to screen individuals at high risk of T2DM in the early phase, which has strong potential for intelligent prevention and control of diabetes. The key features could also be useful for developing targeted diabetes prevention interventions.
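Random under-sampling, as named in this abstract, balances the classes by discarding randomly chosen majority-class records until each class is as large as the smallest one. The sketch below is a minimal standard-library illustration of that idea, not the authors' implementation; the function name, the seed, and the toy data are hypothetical. The toy class ratio of 7 in 100 only loosely mirrors the reported incidence (8298 of 127,031 records, about 6.5%).

```python
import random


def random_under_sample(rows, labels, seed=0):
    """Randomly drop rows from larger classes until all classes match the smallest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        kept = rng.sample(members, n_min)  # sampling without replacement
        out_rows.extend(kept)
        out_labels.extend([label] * n_min)
    return out_rows, out_labels


# Toy data (hypothetical): 100 record ids, 7 of them positive.
rows = list(range(100))
labels = [1 if i < 7 else 0 for i in rows]

balanced_rows, balanced_labels = random_under_sample(rows, labels)
print(len(balanced_rows), balanced_labels.count(1), balanced_labels.count(0))
```

Under-sampling is normally applied to the training split only, so that the test set keeps the real-world class ratio and the reported metrics stay representative.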
Affiliation(s)
- Qing Liu
  - Department of Epidemiology, School of Public Health, Wuhan University, Wuhan 430071, China
- Miao Zhang
  - Department of Epidemiology, School of Public Health, Wuhan University, Wuhan 430071, China
- Yifeng He
  - School of Geodesy and Geomatics, Wuhan University, Wuhan 430079, China
- Lei Zhang
  - School of Mathematics and Statistics, Wuhan University, Wuhan 430070, China
- Jingui Zou
  - School of Geodesy and Geomatics, Wuhan University, Wuhan 430079, China
- Yaqiong Yan
  - Wuhan Center for Disease Control and Prevention, Wuhan 430015, China
- Yan Guo
  - Wuhan Center for Disease Control and Prevention, Wuhan 430015, China
|
27
|
A Comprehensive Review of Various Diabetic Prediction Models: A Literature Survey. JOURNAL OF HEALTHCARE ENGINEERING 2022; 2022:8100697. [PMID: 35449835 PMCID: PMC9018179 DOI: 10.1155/2022/8100697] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/19/2022] [Revised: 02/24/2022] [Accepted: 03/02/2022] [Indexed: 12/19/2022]
Abstract
Diabetes is a chronic disease characterized by a high amount of glucose in the blood and can cause many complications in the body, such as internal organ failure, retinopathy, and neuropathy. According to predictions made by the WHO, the number of people with diabetes may reach approximately 642 million by 2040, which means one in ten may suffer from diabetes due to an unhealthy lifestyle and lack of exercise. Many authors have researched diabetes prediction through machine learning algorithms extensively. The idea that motivated us to present a review of various diabetic prediction models is to address the diabetic prediction problem by identifying, critically evaluating, and integrating the findings of all relevant, high-quality individual studies. In this paper, we have analysed the work done by various authors on diabetes prediction methods. Our analysis of diabetic prediction models aimed to identify the methods used, so as to select the best-quality studies and synthesize their findings. Analysis of diabetes data is quite challenging because most of the data in the medical field are nonlinear, nonnormal, correlation structured, and complex in nature. Machine learning-based algorithms have been rolled out in the field of healthcare and medical imaging. Diabetes mellitus prediction at an early stage requires an approach different from others. Machine learning-based risk stratification can be used to categorize patients into diabetics and controls. We strongly recommend our study because it comprises articles from various sources that will help other researchers working on diabetic prediction models.
|
28
|
A Novel Approach for Feature Selection and Classification of Diabetes Mellitus: Machine Learning Methods. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:3820360. [PMID: 35463255 PMCID: PMC9033325 DOI: 10.1155/2022/3820360] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/16/2022] [Revised: 03/12/2022] [Accepted: 03/19/2022] [Indexed: 01/12/2023]
Abstract
Diabetes prediction is an active research area in which experts from the medical field are trying to address the problem with greater accuracy. Surveys conducted by the WHO have shown a remarkable increase in the number of diabetic patients. Diabetes generally remains dormant and aggravates other conditions when patients are diagnosed with another disease, such as damage to the kidney vessels, problems in the retina of the eye, or cardiac problems; if unidentified, it can create metabolic disorders and many complications in the body. The main objective of our study is to compare different classifiers and feature selection methods for predicting diabetes with greater accuracy. In this paper, we studied multilayer perceptron, decision tree, K-nearest neighbour, and random forest classifiers, and a few feature selection techniques were applied to the classifiers to detect diabetes at an early stage. Raw data were subjected to preprocessing, which removed outliers and imputed missing values by the mean, followed by hyperparameter optimization. Experiments were conducted on the PIMA Indians diabetes dataset using Weka 3.9, and the accuracy achieved was 77.60% for multilayer perceptron, 76.07% for decision trees, 78.58% for K-nearest neighbour, and 79.8% for random forest, the best of the four classifiers.
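The preprocessing named in the abstract above (outlier removal and mean imputation) can be illustrated with a short sketch. This is not the authors' Weka workflow: the helper names `impute_mean` and `iqr_mask` are hypothetical, missing values are assumed to be encoded as `None`, and the 1.5 × IQR rule is an assumed outlier criterion, since the abstract does not say which rule was used.

```python
from statistics import mean, quantiles


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def iqr_mask(values, k=1.5):
    """Return True for values inside [Q1 - k*IQR, Q3 + k*IQR], False for outliers.

    Assumption: the common 1.5 * IQR rule; the source does not name a rule.
    """
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [low <= v <= high for v in values]


# Illustrative, made-up column with one missing reading and one extreme value.
column = [10, 11, 12, None, 14, 15, 16, 200]
filled = impute_mean(column)
keep = iqr_mask([v for v in column if v is not None])
```

One design choice worth noting: filtering outliers before computing the fill value keeps an extreme reading such as 200 from inflating the imputed mean.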
|
29
|
Predicting Children with ADHD Using Behavioral Activity: A Machine Learning Analysis. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12052737] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022]
Abstract
Attention deficit hyperactivity disorder (ADHD) is one of childhood’s most frequent neurobehavioral disorders. The purpose of this study is to: (i) extract the most prominent risk factors for children with ADHD; and (ii) propose a machine learning (ML)-based approach to classify children as either having ADHD or healthy. We extracted the data of 45,779 children aged 3–17 years from the 2018–2019 National Survey of Children’s Health (NSCH, 2018–2019). A total of 5218 children (11.4%) had ADHD, and the rest were healthy. Since the classes were highly imbalanced, we adopted a combination of oversampling and undersampling approaches to balance them. We adopted logistic regression (LR) to extract the significant factors for children with ADHD based on p-values (<0.05). Eight ML-based classifiers, namely random forest (RF), Naïve Bayes (NB), decision tree (DT), XGBoost, k-nearest neighbor (KNN), multilayer perceptron (MLP), support vector machine (SVM), and 1-dimensional convolution neural network (1D CNN), were adopted for the prediction of children with ADHD. The average age of the children with ADHD was 12.4 ± 3.4 years. Our findings showed that the RF-based classifier provided the highest classification accuracy of 85.5%, sensitivity of 84.4%, specificity of 86.4%, and an AUC of 0.94. This study illustrated that an LR with RF-based system could provide excellent accuracy for classifying and predicting children with ADHD. This system will be helpful for early detection and diagnosis of ADHD.
|
30
|
Nadakinamani RG, Reyana A, Kautish S, Vibith AS, Gupta Y, Abdelwahab SF, Mohamed AW. Clinical Data Analysis for Prediction of Cardiovascular Disease Using Machine Learning Techniques. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:2973324. [PMID: 35069715 PMCID: PMC8767405 DOI: 10.1155/2022/2973324] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 10/15/2021] [Revised: 12/03/2021] [Accepted: 12/15/2021] [Indexed: 02/08/2023]
Abstract
Cardiovascular disease is difficult to detect due to several risk factors, including high blood pressure, cholesterol, and an abnormal pulse rate. Accurate decision-making and optimal treatment are required to address cardiac risk. As machine learning technology advances, the healthcare industry's clinical practice is likely to change. As a result, researchers and clinicians must recognize the importance of machine learning techniques. The main objective of this research is to recommend a machine learning-based cardiovascular disease prediction system that is highly accurate. In contrast, modern machine learning algorithms such as REP Tree, M5P Tree, Random Tree, Linear Regression, Naive Bayes, J48, and JRIP are used to classify popular cardiovascular datasets. The proposed CDPS's performance was evaluated using a variety of metrics to identify the best suitable machine learning model. When it came to predicting cardiovascular disease patients, the Random Tree model performed admirably, with the highest accuracy of 100%, the lowest MAE of 0.0011, the lowest RMSE of 0.0231, and the fastest prediction time of 0.01 seconds.
Affiliation(s)
- A. Reyana
  - Department of Computer Science and Engineering, Hindusthan College of Engineering and Technology, Coimbatore, Tamil Nadu, India
- Sandeep Kautish
  - Department of Computer Science and Engineering, LBEF Campus, Kathmandu, Nepal, India
- A. S. Vibith
  - Department of Computer Science and Engineering, RMK College of Engineering and Technology, Tiruvallur, Tamil Nadu, India
- Yogita Gupta
  - Department of Biotechnology, Thapar Institute of Engineering & Technology, Patiala, India
- Sayed F. Abdelwahab
  - Department of Pharmaceutics and Industrial Pharmacy, College of Pharmacy, Taif University, PO Box 11099, Taif 21944, Saudi Arabia
- Ali Wagdy Mohamed
  - Operations Research Department, Faculty of Graduate Studies for Statistical Research, Cairo University, Giza 12613, Egypt
  - Department of Mathematics and Actuarial Science, School of Science and Engineering, The American University in Cairo, New Cairo, Egypt
|
31
|
A Novel Diabetes Healthcare Disease Prediction Framework Using Machine Learning Techniques. JOURNAL OF HEALTHCARE ENGINEERING 2022; 2022:1684017. [PMID: 35070225 PMCID: PMC8767376 DOI: 10.1155/2022/1684017] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 10/09/2021] [Accepted: 12/17/2021] [Indexed: 11/18/2022]
Abstract
Diabetes is a chronic disease that continues to be a significant global concern since it affects the entire population’s health. It is a metabolic disorder that leads to high blood sugar levels and many other problems such as stroke, kidney failure, and heart and nerve problems. Several researchers have attempted to construct an accurate diabetes prediction model over the years. However, this subject still faces significant open research issues due to a lack of appropriate data sets and prediction approaches, which pushes researchers to use big data analytics and machine learning (ML)-based methods. Applying four different machine learning methods, the research tries to overcome these problems and investigate healthcare predictive analytics. The study’s primary goal was to see how big data analytics and machine learning-based techniques may be used in diabetes. The examination of the results shows that the suggested ML-based framework may achieve a score of 86. Health experts and other stakeholders are working to develop categorization models that will aid in the prediction of diabetes and the formulation of preventative initiatives. The authors review the literature on machine learning models, critically examine them, and propose and evaluate an intelligent machine learning-based architecture for diabetes prediction. In this study, the authors use the framework to develop and assess decision tree (DT)-based random forest (RF) and support vector machine (SVM) learning models for diabetes prediction, which are the most widely used techniques in the literature at the time of writing. This study proposes a unique intelligent diabetes mellitus prediction framework (IDMPF) developed using machine learning. The framework was developed after conducting a rigorous review of existing prediction models in the literature and examining their applicability to diabetes. Using the framework, the authors describe the training procedures, model assessment strategies, and issues associated with diabetes prediction, as well as the solutions they provide. The findings of this study may be utilized by health professionals, stakeholders, students, and researchers who are involved in diabetes prediction research and development. The proposed work gives 83% accuracy with the minimum error rate.
|
32
|
Zemmal N, Benzebouchi NE, Azizi N, Schwab D, Belhaouari SB. Unbalanced Learning for Diabetes Diagnosis Based on Enhanced Resampling and Stacking Classifier. INTERNATIONAL JOURNAL OF INTELLIGENT INFORMATION TECHNOLOGIES 2022. [DOI: 10.4018/ijiit.309583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Diabetes is characterized by an abnormally enhanced concentration of glucose in the blood serum. It has a damaging impact on several noble body systems. Today, the concept of unbalanced learning has developed considerably in the domain of medical diagnosis, which greatly reduces the generation of erroneous classification results. The paper takes a hybrid approach to imbalanced learning by proposing an enhanced multimodal meta-learning method called IRESAMPLE+St to distinguish between normal and diabetic patients. This approach relies on the Stacking paradigm by utilizing the complementarity that may exist between classifiers. In the same focus of this study, a modified RESAMPLE-based technique referred to as IRESAMPLE+ and the SMOTE method are integrated as a preliminary resampling step to overcome and resolve the problem of unbalanced data. The suggested IRESAMPLE+St provides a computerized diabetes diagnostic system with impressive results, comparing it to the principal related studies, reflecting the design and engineering successes achieved.
Affiliation(s)
- Nawel Zemmal
  - Mathematics and Computer Science Department, Mohamed Cherif Messaadia University, Souk-Ahras, Algeria & Labged Laboratory, Badji Mokhtar Annaba University, Annaba, Algeria
- Nacer Eddine Benzebouchi
  - Labged Laboratory, Computer Science Department, Badji Mokhtar Annaba University, Annaba, Algeria
- Nabiha Azizi
  - Labged Laboratory, Computer Science Department, Badji Mokhtar Annaba University, Annaba, Algeria
|
33
|
Samet S, Laouar MR, Bendib I, Eom S. Analysis and Prediction of Diabetes Disease Using Machine Learning Methods. INTERNATIONAL JOURNAL OF DECISION SUPPORT SYSTEM TECHNOLOGY 2022. [DOI: 10.4018/ijdsst.303943] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/15/2023]
Abstract
To increase healthcare quality, early illness prediction helps patients prevent potentially life-threatening health issues before it is too late. Artificial intelligence is a rapidly evolving area, and its applications to diabetes, a worldwide epidemic, have the potential to revolutionize the way diabetes is diagnosed and managed. A total of six supervised machine learning algorithms based on patient data were used and compared to predict the diagnosis of diabetes mellitus. For experiments, the Pima Indians Diabetes Database was used, and their missing values were carefully handled by different techniques. For random train-test splits, the Random Forest classification algorithm achieved an accuracy rate of 92 percent. This model outperforms other state-of-the-art approaches due to the application of a combination of techniques for dealing with missing values (the mixture of imputing missing values techniques) that is proposed. With this approach, the models of this manuscript achieved better accuracy than prior work done with the Pima diabetes data.
Affiliation(s)
- Sarra Samet
  - Laboratory of Mathematics, Informatics, and Systems (LAMIS), University of Larbi Tebessi, Algeria
- Mohamed Ridda Laouar
  - Laboratory of Mathematics, Informatics, and Systems (LAMIS), University of Larbi Tebessi, Algeria
- Issam Bendib
  - Laboratory of Mathematics, Informatics, and Systems (LAMIS), University of Larbi Tebessi, Algeria
- Sean Eom
  - Department of Management, Southeast Missouri State University, USA
|
34
|
Fregoso-Aparicio L, Noguez J, Montesinos L, García-García JA. Machine learning and deep learning predictive models for type 2 diabetes: a systematic review. Diabetol Metab Syndr 2021; 13:148. [PMID: 34930452 PMCID: PMC8686642 DOI: 10.1186/s13098-021-00767-9] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 07/06/2021] [Accepted: 12/07/2021] [Indexed: 12/12/2022]
Abstract
Diabetes Mellitus is a severe, chronic disease that occurs when blood glucose levels rise above certain limits. In recent years, machine and deep learning techniques have been used to predict diabetes and its complications. However, researchers and developers still face two main challenges when building type 2 diabetes predictive models. First, there is considerable heterogeneity in previous studies regarding the techniques used, making it challenging to identify the optimal one. Second, there is a lack of transparency about the features used in the models, which reduces their interpretability. This systematic review aimed to address these challenges. The review primarily followed the PRISMA methodology, enriched with the one proposed by Keele and Durham Universities. Ninety studies were included, and the type of model, complementary techniques, dataset, and performance parameters reported were extracted. Eighteen different types of models were compared, with tree-based algorithms showing top performances. Deep Neural Networks proved suboptimal, despite their ability to deal with big and dirty data. Data balancing and feature selection techniques proved helpful in increasing the models' efficiency. Models trained on tidy datasets achieved almost perfect performance.
Affiliation(s)
- Luis Fregoso-Aparicio
  - School of Engineering and Sciences, Tecnologico de Monterrey, Av Lago de Guadalupe KM 3.5, Margarita Maza de Juarez, 52926 Cd Lopez Mateos, Mexico
- Julieta Noguez
  - School of Engineering and Sciences, Tecnologico de Monterrey, Ave. Eugenio Garza Sada 2501, 64849 Monterrey, Nuevo Leon Mexico
- Luis Montesinos
  - School of Engineering and Sciences, Tecnologico de Monterrey, Ave. Eugenio Garza Sada 2501, 64849 Monterrey, Nuevo Leon Mexico
- José A. García-García
  - Hospital General de Mexico Dr. Eduardo Liceaga, Dr. Balmis 148, Doctores, Cuauhtemoc, 06720 Mexico City, Mexico
|
35
|
Rahman M, Pientong C, Zafar S, Ekalaksananan T, Paul RE, Haque U, Rocklöv J, Overgaard HJ. Mapping the spatial distribution of the dengue vector Aedes aegypti and predicting its abundance in northeastern Thailand using machine-learning approach. One Health 2021; 13:100358. [PMID: 34934797 PMCID: PMC8661047 DOI: 10.1016/j.onehlt.2021.100358] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/10/2021] [Revised: 12/02/2021] [Accepted: 12/02/2021] [Indexed: 10/19/2022]
Abstract
BACKGROUND Mapping the spatial distribution of the dengue vector Aedes (Ae.) aegypti and accurately predicting its abundance are crucial for designing effective vector control strategies and early warning tools for dengue epidemic prevention. Socio-ecological and landscape factors influence Ae. aegypti abundance. Therefore, we aimed to map the spatial distribution of female adult Ae. aegypti and predict its abundance in northeastern Thailand based on socioeconomic, climate change, and dengue knowledge, attitude and practices (KAP) and/or landscape factors using a machine learning (ML)-based system. METHOD A total of 1066 female adult Ae. aegypti were collected from four villages in northeastern Thailand during January-December 2019. Information on household socioeconomics, KAP regarding climate change and dengue, and satellite-based landscape data were also acquired. Geographic information systems (GIS) were used to map the household-based spatial distribution of female adult Ae. aegypti abundance (high/low). Five popular supervised learning models, logistic regression (LR), support vector machine (SVM), k-nearest neighbor (kNN), artificial neural network (ANN), and random forest (RF), were used to predict female adult Ae. aegypti abundance (high/low). The predictive accuracy of each modeling technique was calculated and evaluated. Important variables for predicting female adult Ae. aegypti abundance were also identified using the best-fitted model. RESULTS Urban areas had higher abundance of female adult Ae. aegypti compared to rural areas. Overall, study respondents in both urban and rural areas had inadequate KAP regarding climate change and dengue. The average landscape factors per household in urban areas were rice crop (47.4%), natural tree cover (17.8%), built-up area (13.2%), permanent wetlands (21.2%), and rubber plantation (0%), and the corresponding figures for rural areas were 12.1, 2.0, 38.7, 40.1 and 0.1% respectively.
Among all assessed models, RF showed the best prediction performance (socioeconomics: area under curve, AUC = 0.93, classification accuracy, CA = 0.86, F1 score = 0.85; KAP: AUC = 0.95, CA = 0.92, F1 = 0.90; landscape: AUC = 0.96, CA = 0.89, F1 = 0.87) for female adult Ae. aegypti abundance. The combined influences of all factors further improved the predictive accuracy in RF model (socioeconomics + KAP + landscape: AUC = 0.99, CA = 0.96 and F1 = 0.95). Dengue prevention practices were shown to be the most important predictor in the RF model for female adult Ae. aegypti abundance in northeastern Thailand. CONCLUSION The RF model is more suitable for the prediction of Ae. aegypti abundance in northeastern Thailand. Our study exemplifies that the application of GIS and machine learning systems has significant potential for understanding the spatial distribution of dengue vectors and predicting its abundance. The study findings might help optimize vector control strategies, future mosquito suppression, prediction and control strategies of epidemic arboviral diseases (dengue, chikungunya, and Zika). Such strategies can be incorporated into One Health approaches applying transdisciplinary approaches considering human-vector and agro-environmental interrelationships.
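The results above are reported as AUC, classification accuracy, and F1 score for a binary high/low abundance label. As a reminder of what the first and last of these measure, here is a minimal stdlib sketch (hypothetical helper names `auc` and `f1_score`, not the authors' tooling). It computes AUC as the fraction of positive/negative pairs in which the positive case receives the higher score, counting ties as half, and F1 as the harmonic mean of precision and recall.

```python
def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Equals the probability that a randomly chosen positive is scored
    above a randomly chosen negative (ties count as 0.5).
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN) for binary labels (1 = high abundance)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

The pairwise form is quadratic in the number of samples, which is fine for a household-level dataset of this size; rank-based formulas give the same value faster on large data.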
Key Words
- ANN, Artificial neural network
- AUC, Area under curve
- Aedes aegypti
- CA, Classification accuracy
- DENV, Dengue virus
- Dengue
- Early warning
- GIS, Geographic information systems
- HCI, Household crowding index
- KAP, Knowledge, attitude, and practice
- LR, logistic regression
- ML, Machine learning
- PCI, Premise condition index
- Prediction
- RF, Random forest
- SES, Socioeconomic status
- SVM, Support vector machine
- Supervised learning
- kNN, k-nearest neighbor
Affiliation(s)
- M.S. Rahman
  - Department of Microbiology, Faculty of Medicine, Khon Kaen University, Khon Kaen, Thailand
  - Department of Statistics, Begum Rokeya University, Rangpur, Rangpur-5404, Bangladesh
- Chamsai Pientong
  - Department of Microbiology, Faculty of Medicine, Khon Kaen University, Khon Kaen, Thailand
  - HPV & EBV and Carcinogenesis Research Group, Khon Kaen University, Khon Kaen, Thailand
- Sumaira Zafar
  - Environmental Engineering and Management Program, Asian Institute of Technology, Pathumthani, Thailand
- Tipaya Ekalaksananan
  - Department of Microbiology, Faculty of Medicine, Khon Kaen University, Khon Kaen, Thailand
  - HPV & EBV and Carcinogenesis Research Group, Khon Kaen University, Khon Kaen, Thailand
- Richard E. Paul
  - Unité de la Génétique Fonctionnelle des Maladies Infectieuses, Institut Pasteur, CNRS UMR 2000, 75015 Paris, France
- Ubydul Haque
  - Department of Biostatistics and Epidemiology, University of North Texas Health Science Center, Fort Worth, TX 76177, USA
- Joacim Rocklöv
  - Department of Public Health and Clinical Medicine, Umeå University, 90187 Umeå, Sweden
- Hans J. Overgaard
  - Department of Microbiology, Faculty of Medicine, Khon Kaen University, Khon Kaen, Thailand
  - Faculty of Science and Technology, Norwegian University of Life Sciences, P.O. Box 5003, Ås, Norway
|
36
|
Gautier T, Ziegler LB, Gerber MS, Campos-Náñez E, Patek SD. Artificial intelligence and diabetes technology: A review. Metabolism 2021; 124:154872. [PMID: 34480920 DOI: 10.1016/j.metabol.2021.154872] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 05/18/2021] [Revised: 07/27/2021] [Accepted: 08/28/2021] [Indexed: 12/15/2022]
Abstract
Artificial intelligence (AI) is widely discussed in the popular literature and is portrayed as impacting many aspects of human life, both in and out of the workplace. The potential for revolutionizing healthcare is significant because of the availability of increasingly powerful computational platforms and methods, along with increasingly informative sources of patient data, both in and out of clinical settings. This review aims to provide a realistic assessment of the potential for AI in understanding and managing diabetes, accounting for the state of the art in the methodology and medical devices that collect data, process data, and act accordingly. Acknowledging that many conflicting definitions of AI have been put forth, this article attempts to characterize the main elements of the field as they relate to diabetes, identifying the main perspectives and methods that can (i) affect basic understanding of the disease, (ii) affect understanding of risk factors (genetic, clinical, and behavioral) of diabetes development, (iii) improve diagnosis, (iv) improve understanding of the arc of disease (progression and personal/societal impact), and finally (v) improve treatment.
Affiliation(s)
- Thibault Gautier
  - Dexcom/TypeZero, 946 Grady Avenue, Suite 203, Charlottesville, VA 22903, United States of America
- Leah B Ziegler
  - Dexcom/TypeZero, 946 Grady Avenue, Suite 203, Charlottesville, VA 22903, United States of America
- Matthew S Gerber
  - Dexcom/TypeZero, 946 Grady Avenue, Suite 203, Charlottesville, VA 22903, United States of America
- Enrique Campos-Náñez
  - Dexcom/TypeZero, 946 Grady Avenue, Suite 203, Charlottesville, VA 22903, United States of America
- Stephen D Patek
  - Dexcom/TypeZero, 946 Grady Avenue, Suite 203, Charlottesville, VA 22903, United States of America
|
37
|
Nadeem MW, Goh HG, Ponnusamy V, Andonovic I, Khan MA, Hussain M. A Fusion-Based Machine Learning Approach for the Prediction of the Onset of Diabetes. Healthcare (Basel) 2021; 9:1393. [PMID: 34683073 PMCID: PMC8535299 DOI: 10.3390/healthcare9101393] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/06/2021] [Revised: 10/08/2021] [Accepted: 10/09/2021] [Indexed: 12/03/2022]
Abstract
A growing portfolio of research has been reported on the use of machine learning-based architectures and models in the domain of healthcare. The development of data-driven applications and services for the diagnosis and classification of key illness conditions is challenging owing to issues of low volume, low-quality contextual data for the training, and validation of algorithms, which, in turn, compromises the accuracy of the resultant models. Here, a fusion machine learning approach is presented reporting an improvement in the accuracy of the identification of diabetes and the prediction of the onset of critical events for patients with diabetes (PwD). Globally, the cost of treating diabetes, a prevalent chronic illness condition characterized by high levels of sugar in the bloodstream over long periods, is placing severe demands on health providers and the proposed solution has the potential to support an increase in the rates of survival of PwD through informing on the optimum treatment on an individual patient basis. At the core of the proposed architecture is a fusion of machine learning classifiers (Support Vector Machine and Artificial Neural Network). Results indicate a classification accuracy of 94.67%, exceeding the performance of reported machine learning models for diabetes by ~1.8% over the best reported to date.
Affiliation(s)
- Muhammad Waqas Nadeem
  - Faculty of Information and Communication Technology (FICT), Universiti Tunku Abdul Rahman (UTAR), Kampar 31900, Perak, Malaysia
- Hock Guan Goh
  - Faculty of Information and Communication Technology (FICT), Universiti Tunku Abdul Rahman (UTAR), Kampar 31900, Perak, Malaysia
- Vasaki Ponnusamy
  - Faculty of Information and Communication Technology (FICT), Universiti Tunku Abdul Rahman (UTAR), Kampar 31900, Perak, Malaysia
- Ivan Andonovic
  - Department of Electronic & Electrical Engineering, University of Strathclyde, Royal College Building, 204 George St., Glasgow G1 1XW, UK
- Muhammad Adnan Khan
  - Pattern Recognition and Machine Learning Lab, Department of Software, Gachon University, Seongnam 13557, Korea
- Muzammil Hussain
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore 54000, Pakistan
|
38
|
Machine Learning Based Diabetes Classification and Prediction for Healthcare Applications. JOURNAL OF HEALTHCARE ENGINEERING 2021; 2021:9930985. [PMID: 34631003 PMCID: PMC8500744 DOI: 10.1155/2021/9930985] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 03/26/2021] [Revised: 05/17/2021] [Accepted: 08/16/2021] [Indexed: 11/17/2022]
Abstract
The remarkable advancements in biotechnology and public healthcare infrastructures have led to a momentous production of critical and sensitive healthcare data. By applying intelligent data analysis techniques, many interesting patterns are identified for the early and onset detection and prevention of several fatal diseases. Diabetes mellitus is an extremely life-threatening disease because it contributes to other lethal diseases, i.e., heart, kidney, and nerve damage. In this paper, a machine learning based approach has been proposed for the classification, early-stage identification, and prediction of diabetes. Furthermore, it also presents an IoT-based hypothetical diabetes monitoring system for a healthy and affected person to monitor his blood glucose (BG) level. For diabetes classification, three different classifiers have been employed, i.e., random forest (RF), multilayer perceptron (MLP), and logistic regression (LR). For predictive analysis, we have employed long short-term memory (LSTM), moving averages (MA), and linear regression (LR). For experimental evaluation, a benchmark PIMA Indian Diabetes dataset is used. During the analysis, it is observed that MLP outperforms other classifiers with 86.08% of accuracy and LSTM improves the significant prediction with 87.26% accuracy of diabetes. Moreover, a comparative analysis of the proposed approach is also performed with existing state-of-the-art techniques, demonstrating the adaptability of the proposed approach in many public healthcare applications.
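Of the three predictive methods named in the abstract above, the moving average is the simplest to illustrate. The sketch below is an assumption-laden example rather than the paper's method: `moving_average_forecast` is a hypothetical name, the readings are made-up blood glucose values in mg/dL, and the forecast is simply the mean of the last `window` readings.

```python
def moving_average_forecast(readings, window=3):
    """Predict the next reading as the mean of the most recent `window` readings."""
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(readings) < window:
        raise ValueError("need at least `window` readings")
    recent = readings[-window:]
    return sum(recent) / window


# Illustrative, made-up blood glucose readings (mg/dL).
bg_history = [100, 110, 120, 130]
next_bg = moving_average_forecast(bg_history, window=3)
```

A larger window smooths noise but reacts more slowly to a rising or falling trend, which is one reason sequence models such as LSTM are often compared against this baseline.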
|
39
|
Development and validation of a new diabetes index for the risk classification of present and new-onset diabetes: multicohort study. Sci Rep 2021; 11:15748. [PMID: 34344964 PMCID: PMC8333254 DOI: 10.1038/s41598-021-95341-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/28/2021] [Accepted: 07/26/2021] [Indexed: 02/07/2023]
Abstract
In this study, we aimed to propose a novel diabetes index for the risk classification based on machine learning techniques with a high accuracy for diabetes mellitus. Upon analyzing their demographic and biochemical data, we classified the 2013-16 Korea National Health and Nutrition Examination Survey (KNHANES), the 2017-18 KNHANES, and the Korean Genome and Epidemiology Study (KoGES), as the derivation, internal validation, and external validation sets, respectively. We constructed a new diabetes index using logistic regression (LR) and calculated the probability of diabetes in the validation sets. We used the area under the receiver operating characteristic curve (AUROC) and Cox regression analysis to measure the performance of the internal and external validation sets, respectively. We constructed a gender-specific diabetes prediction model, having a resultant AUROC of 0.93 and 0.94 for men and women, respectively. Based on this probability, we classified participants into five groups and analyzed cumulative incidence from the KoGES dataset. Group 5 demonstrated significantly worse outcomes than those in other groups. Our novel model for predicting diabetes, based on two large-scale population-based cohort studies, showed high sensitivity and selectivity. Therefore, our diabetes index can be used to classify individuals at high risk of diabetes.
40
Graph Convolutional Network Enabled Two-Stream Learning Architecture for Diabetes Classification based on Flash Glucose Monitoring Data. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2021.102896] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]
41
Gupta D, Choudhury A, Gupta U, Singh P, Prasad M. Computational approach to clinical diagnosis of diabetes disease: a comparative study. Multimedia Tools and Applications 2021; 80:30091-30116. [DOI: 10.1007/s11042-020-10242-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/11/2019] [Revised: 10/14/2020] [Accepted: 12/09/2020] [Indexed: 08/30/2023]
42
Multiclass classification of metabolic conditions using fasting plasma levels of glucose and insulin. Health and Technology 2021. [DOI: 10.1007/s12553-021-00550-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
43
Rahman SMJ, Ahmed NAMF, Abedin MM, Ahammed B, Ali M, Rahman MJ, Maniruzzaman M. Investigate the risk factors of stunting, wasting, and underweight among under-five Bangladeshi children and its prediction based on machine learning approach. PLoS One 2021; 16:e0253172. [PMID: 34138925 PMCID: PMC8211236 DOI: 10.1371/journal.pone.0253172] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 01/15/2021] [Accepted: 05/28/2021] [Indexed: 11/23/2022]
Abstract
Aims Malnutrition is a major health issue among Bangladeshi under-five (U5) children. Children are malnourished if the calories and proteins they take in through their diet are not sufficient for their growth and maintenance. The goal of the research was to use machine learning (ML) algorithms to detect the risk factors of malnutrition (stunting, wasting, and underweight) and to predict malnutrition. Methods This work utilized malnutrition data derived from the Bangladesh Demographic and Health Survey conducted in 2014. The selected dataset consisted of 7079 children with 13 factors. The potential risk factors of malnutrition were identified by logistic regression (LR). Moreover, 3 ML classifiers (support vector machine (SVM), random forest (RF), and LR) were implemented for predicting malnutrition, and the performance of these ML algorithms was assessed on the basis of accuracy. Results The average prevalence of stunting, wasting, and underweight was 35.4%, 15.4%, and 32.8%, respectively. LR identified five risk factors for stunting and underweight, as well as four factors for wasting. Results illustrated that RF can accurately classify stunted, wasted, and underweight children, obtaining the highest accuracy of 88.3% for stunted, 87.7% for wasted, and 85.7% for underweight. Conclusion This research focused on the identification and prediction of major risk factors for stunting, wasting, and underweight using ML algorithms, which will aid policymakers in reducing malnutrition among Bangladesh’s U5 children.
Affiliation(s)
- Benojir Ahammed
  - Statistics Discipline, Khulna University, Khulna, Bangladesh
- Mohammad Ali
  - Statistics Discipline, Khulna University, Khulna, Bangladesh
- Md. Maniruzzaman
  - Statistics Discipline, Khulna University, Khulna, Bangladesh
44
Islam MM, Rahman MJ, Chandra Roy D, Tawabunnahar M, Jahan R, Ahmed NAMF, Maniruzzaman M. Machine learning algorithm for characterizing risks of hypertension, at an early stage in Bangladesh. Diabetes Metab Syndr 2021; 15:877-884. [PMID: 33892404 DOI: 10.1016/j.dsx.2021.03.035] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 02/18/2021] [Revised: 03/24/2021] [Accepted: 03/31/2021] [Indexed: 12/30/2022]
Abstract
BACKGROUND AND AIMS Hypertension has become a major public health issue as the prevalence and risk of premature death and disability among adults due to hypertension have increased globally. The main objective is to characterize the risk factors of hypertension among adults in Bangladesh using machine learning (ML) algorithms. MATERIALS AND METHODS The hypertension data were derived from the Bangladesh Demographic and Health Survey, 2017-18, which included 6965 people aged 35 and above. The two most promising risk factor identification methods, namely least absolute shrinkage and selection operator (LASSO) and support vector machine recursive feature elimination (SVMRFE), are implemented to detect the critical risk factors of hypertension. Additionally, four well-known ML algorithms, namely artificial neural network, decision tree, random forest, and gradient boosting (GB), have been used to predict hypertension. Performance scores of these algorithms were evaluated by accuracy, precision, recall, F-measure, and area under the curve (AUC). RESULTS The results show that age, BMI, wealth index, working status, and marital status for LASSO, and age, BMI, marital status, diabetes, and region for SVMRFE, appear to be the top five significant risk factors for hypertension. Our findings reveal that the SVMRFE-GB combination gives the maximum accuracy (66.98%), recall (97.92%), F-measure (78.99%), and AUC (0.669) compared to the others. CONCLUSION The GB-based algorithm is confirmed as the best performer for early-stage prediction of hypertension in Bangladesh. Therefore, this study strongly suggests that policymakers use the SVMRFE-GB combination to make sound decisions for controlling hypertension, saving time and reducing cost for Bangladeshi adults.
Affiliation(s)
- Md Merajul Islam
  - Department of Statistics, University of Rajshahi, Rajshahi 6205, Bangladesh.
- Md Jahanur Rahman
  - Department of Statistics, University of Rajshahi, Rajshahi 6205, Bangladesh.
- Dulal Chandra Roy
  - Department of Statistics, University of Rajshahi, Rajshahi 6205, Bangladesh.
- Most Tawabunnahar
  - Department of Statistics, Jatiya Kabi Kazi Nazrul Islam University, Mymensingh 2220, Bangladesh.
- Rubaiyat Jahan
  - Institution of Education and Research, University of Rajshahi, Rajshahi 6205, Bangladesh.
- N A M Faisal Ahmed
  - Institution of Education and Research, University of Rajshahi, Rajshahi 6205, Bangladesh.
- Md Maniruzzaman
  - Statistics Discipline, Khulna University, Khulna 9208, Bangladesh.
45
Ray A, Chaudhuri AK. Smart healthcare disease diagnosis and patient management: Innovation, improvement and skill development. Machine Learning with Applications 2021. [DOI: 10.1016/j.mlwa.2020.100011] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 01/13/2023]
46
Smoker's characteristics, general health and their perception of smoking in the social environment: a study of smokers in Rajshahi City, Bangladesh. Journal of Public Health-Heidelberg 2021; 30:1501-1512. [PMID: 33425660 PMCID: PMC7786872 DOI: 10.1007/s10389-020-01413-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/29/2020] [Accepted: 11/17/2020] [Indexed: 01/28/2023]
Abstract
Background Smoking is a harmful habit in the social environment and one of the main causes of premature death in Bangladesh. Rajshahi is one of the cleanest, most peaceful cities in Bangladesh, but its inhabitants often feel uncomfortable about smokers who smoke in public places and on transport. Smoking frequency is much higher among males than females, and a large number of smokers are building or road construction laborers and people offering services such as transportation, vending from vans, etc. Because the city is known as the City of Education, the practice of smoking in this area is especially destructive for the mental and physical health of students compared with other professionals. Methods The study analyzes smokers’ characteristics, general health, and their perception of smoking in public places. Cross-sectional data were collected randomly from 160 smokers through a face-to-face questionnaire survey. The determinants of complexities with regard to social environment and human health were studied using frequency distribution, chi-square test, and binary and multinomial logistic regression analysis using IBM SPSS version 24. Results Frequency distributions reveal that 93.8% of smokers believe that smoking creates public health hazards, 51.3% think it causes breathing complexities for non-smokers, 48.8% feel smoking causes air pollution, 68.8% think smoking causes gastric problems, 24.4% had headache problems due to smoking and cigarette fumes, 86.3% learnt smoking from friends, 48.8% smoke due to their addiction and 25.6% due to depression, and 80.6% usually smoke after having a meal. The chi-square test reveals that, at the 1% level of significance, class of smokers was significantly associated with frequency of heartbeat rate, starting smoking at a specific age level was significantly associated with suffering from diseases, category of smoking articles was significantly associated with suffering from disease, class of smokers was significantly associated with causes for smoking, and starting smoking at a specific age level was significantly associated with the profession of the smokers. A significant odds ratio was found (OR = 6.363, 95% CI 1.918–21.104, p < 0.01) for the profession group of students/labour at the 1% level; their odds of suffering from diseases such as gastric problems and fever/headache/others were 6.363 times those for the profession group of service/other smokers. Conclusion Smoking in public places should be restricted because non-smokers cannot breathe freely and it is not healthy for them to inhale smoke indirectly, which has many adverse effects on public health. The study also reveals that the majority of the smokers have gastric problems, abnormal heartbeat rates, frequent headaches, depression and addiction problems, etc., and that they believe that smoking poses a significant hazard to human health and the social environment. Therefore, necessary interventions should be taken immediately by policy-makers to prevent smoking in public places. Supplementary Information The online version contains supplementary material available at 10.1007/s10389-020-01413-w.
47
Fourati M, Smaoui S, Hlima HB, Elhadef K, Braïek OB, Ennouri K, Mtibaa AC, Mellouli L. Bioactive Compounds and Pharmacological Potential of Pomegranate (Punica granatum) Seeds - A Review. Plant Foods for Human Nutrition (Dordrecht, Netherlands) 2020; 75:477-486. [PMID: 33040298 DOI: 10.1007/s11130-020-00863-7] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Accepted: 10/01/2020] [Indexed: 06/11/2023]
Abstract
The use of complementary medicine has recently increased in an attempt to find effective alternative therapies that reduce the adverse effects of drugs. Pomegranate (Punica granatum L.) by-products, such as seeds, are a rich source of phytochemicals with high antioxidant activity, and thus possess health benefits. For the identification and quantification of the chemical compounds of pomegranate seeds, particular attention has been drawn to the latest developments in HPLC coupled with electrospray ionization (ESI) MS/MS detection. In fact, a wide range of phytochemicals including phenolic acids, anthocyanins, flavonoids, hydrolysable tannins and other polyphenols were characterized. Furthermore, an exhaustive review of the scientific literature on pomegranate seeds in biomedicine and pharmacotherapy was carried out. Indeed, both in vitro and in vivo studies have demonstrated that pomegranate seeds possess antioxidant, anti-cardiovascular disease, anti-osteoporosis, antidiabetic, anti-inflammatory and anticancer activities. The present review describes a recent tendency in research focusing on the chemical and biomedical features of pomegranate seeds to value them as natural additives or active compounds for first-order diseases.
Affiliation(s)
- Mariam Fourati
  - Laboratory of Microbial, Enzymatic Biotechnology and Biomolecules (LBMEB), Center of Biotechnology of Sfax, University of Sfax-Tunisia, Road of Sidi Mansour Km 6, P. O. Box 1177, 3018, Sfax, Tunisia
- Slim Smaoui
  - Laboratory of Microbial, Enzymatic Biotechnology and Biomolecules (LBMEB), Center of Biotechnology of Sfax, University of Sfax-Tunisia, Road of Sidi Mansour Km 6, P. O. Box 1177, 3018, Sfax, Tunisia
- Hajer Ben Hlima
  - Laboratoire de Génie Enzymatique et de Microbiologie, Equipe de Biotechnologie des Algues, Ecole Nationale d'Ingénieurs de Sfax, Université de Sfax, 3038, Sfax, Tunisia
- Khaoula Elhadef
  - Laboratory of Microbial, Enzymatic Biotechnology and Biomolecules (LBMEB), Center of Biotechnology of Sfax, University of Sfax-Tunisia, Road of Sidi Mansour Km 6, P. O. Box 1177, 3018, Sfax, Tunisia
- Olfa Ben Braïek
  - Laboratory of Transmissible Diseases and Biologically Active Substances (LR99ES27), Faculty of Pharmacy, University of Monastir, Monastir, Tunisia
- Karim Ennouri
  - Laboratory of Amelioration and Protection of Olive Genetic Resources, Olive Tree Institute, Sfax University, Sfax, Tunisia
- Ahlem Chakchouk Mtibaa
  - Laboratory of Microbial, Enzymatic Biotechnology and Biomolecules (LBMEB), Center of Biotechnology of Sfax, University of Sfax-Tunisia, Road of Sidi Mansour Km 6, P. O. Box 1177, 3018, Sfax, Tunisia
- Lotfi Mellouli
  - Laboratory of Microbial, Enzymatic Biotechnology and Biomolecules (LBMEB), Center of Biotechnology of Sfax, University of Sfax-Tunisia, Road of Sidi Mansour Km 6, P. O. Box 1177, 3018, Sfax, Tunisia
48
Islam MM, Rahman MJ, Chandra Roy D, Maniruzzaman M. Automated detection and classification of diabetes disease based on Bangladesh demographic and health survey data, 2011 using machine learning approach. Diabetes Metab Syndr 2020; 14:217-219. [PMID: 32193086 DOI: 10.1016/j.dsx.2020.03.004] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 01/12/2020] [Revised: 03/08/2020] [Accepted: 03/09/2020] [Indexed: 10/24/2022]
Abstract
BACKGROUND AND AIMS Diabetes has been recognized as a continuing health challenge for the twenty-first century, in both developed and developing countries, including Bangladesh. The main objective of this study is to use machine learning (ML) based classifiers for automated detection and classification of diabetes. METHODS The diabetes dataset was taken from the Bangladesh Demographic and Health Survey, 2011, and comprises 1569 respondents, of whom 127 have diabetes. Two statistical tests, the independent t-test for continuous variables and the chi-square test for categorical variables, are used to determine the risk factors of diabetes. Six ML-based classifiers, namely support vector machine, random forest, linear discriminant analysis, logistic regression, k-nearest neighbors, and bagged classification and regression tree (Bagged CART), have been adopted to predict and classify diabetes. RESULTS Our findings show that 11 of the 15 factors are significantly associated with diabetes. Bagged CART provides the highest accuracy and area under the curve, 94.3% and 0.600, respectively. CONCLUSIONS Bagged CART is expected to be a very supportive computational resource for the classification of diabetes, and it would be very helpful to doctors in making decisions to control diabetes in Bangladesh.
Affiliation(s)
- Md Merajul Islam
  - Department of Statistics, University of Rajshahi, Rajshahi, 6205, Bangladesh.
- Md Jahanur Rahman
  - Department of Statistics, University of Rajshahi, Rajshahi, 6205, Bangladesh.
- Dulal Chandra Roy
  - Department of Statistics, University of Rajshahi, Rajshahi, 6205, Bangladesh.
- Md Maniruzzaman
  - Department of Statistics, University of Rajshahi, Rajshahi, 6205, Bangladesh; Statistics Discipline, Khulna University, Khulna, 9208, Bangladesh.
49
Xue M, Su Y, Li C, Wang S, Yao H. Identification of Potential Type II Diabetes in a Large-Scale Chinese Population Using a Systematic Machine Learning Framework. J Diabetes Res 2020; 2020:6873891. [PMID: 33029536 PMCID: PMC7532405 DOI: 10.1155/2020/6873891] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/12/2020] [Revised: 08/01/2020] [Accepted: 09/02/2020] [Indexed: 12/19/2022]
Abstract
BACKGROUND An estimated 425 million people globally have diabetes, accounting for 12% of the world's health expenditures, and the number continues to grow, placing a huge burden on the healthcare system, especially in remote, underserved areas. METHODS A total of 584,168 adult subjects who participated in the national physical examination were enrolled in this study. The risk factors for type II diabetes mellitus (T2DM) were identified by p values and odds ratios, using logistic regression (LR) based on variables from physical measurements and a questionnaire. Combined with the risk factors selected by LR, we used a decision tree, a random forest, AdaBoost with a decision tree (AdaBoost), and an extreme gradient boosting decision tree (XGBoost) to identify individuals with T2DM, compared the performance of the four machine learning classifiers, and used the best-performing classifier to output variable importance scores for T2DM. RESULTS The results indicated that XGBoost had the best performance (accuracy = 0.906, precision = 0.910, recall = 0.902, F-1 = 0.906, and AUC = 0.968). The variable importance scores in XGBoost showed that BMI was the most significant feature, followed by age, waist circumference, systolic pressure, ethnicity, smoking amount, fatty liver, hypertension, physical activity, drinking status, dietary ratio (meat to vegetables), drink amount, smoking status, and diet habit (oil loving). CONCLUSIONS We proposed a classifier based on LR-XGBoost that uses fourteen easily obtained, noninvasive patient variables as predictors to identify potential cases of T2DM. The classifier can accurately screen the risk of diabetes in the early phase, and the variable importance scores give a clue for preventing the occurrence of diabetes.
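As a quick consistency check on the metrics reported in this abstract, the F-1 score can be recomputed from the stated precision and recall. The following is a minimal sketch, assuming the standard harmonic-mean definition of F-1 (the abstract does not say which definition the authors used); the function name `f1_score` is illustrative and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (standard F-1)."""
    if precision + recall == 0:
        # Avoid division by zero when both inputs are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported for XGBoost in the abstract above.
reported_precision = 0.910
reported_recall = 0.902

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 3))  # prints 0.906
```

Under that assumption, the recomputed value agrees with the reported F-1 of 0.906 to three decimal places.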
Affiliation(s)
- Mingyue Xue
  - Hospital of Traditional Chinese Medicine Affiliated to the Fourth Clinical Medical College of Xinjiang Medical University, Urumqi, China
  - College of Public Health, Xinjiang Medical University, Urumqi, China
- Yinxia Su
  - College of Public Health, Xinjiang Medical University, Urumqi, China
- Chen Li
  - The First Affiliated Hospital of Xinjiang Medical University, Urumqi, China
- Shuxia Wang
  - Center of Health Management, The First Affiliated Hospital, Xinjiang Medical University, Urumqi, China
- Hua Yao
  - Center of Health Management, The First Affiliated Hospital, Xinjiang Medical University, Urumqi, China