1
Islam MM, Rahman MJ, Rabby MS, Alam MJ, Pollob SMAI, Ahmed NAMF, Tawabunnahar M, Roy DC, Shin J, Maniruzzaman M. Predicting the risk of diabetic retinopathy using explainable machine learning algorithms. Diabetes Metab Syndr 2023; 17:102919. [PMID: 38091881 DOI: 10.1016/j.dsx.2023.102919] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/19/2023] [Revised: 11/24/2023] [Accepted: 11/26/2023] [Indexed: 12/31/2023]
Abstract
BACKGROUND AND OBJECTIVE Diabetic retinopathy (DR) is a global health concern among diabetic patients. The objective of this study was to propose an explainable machine learning (ML)-based system for predicting the risk of DR. MATERIALS AND METHODS This study utilized publicly available cross-sectional data from a Chinese cohort of 6374 respondents. We employed Boruta and least absolute shrinkage and selection operator (LASSO) based feature selection methods to identify the common predictors of DR. Using the identified predictors, we trained and optimized four widely applicable models (artificial neural network, support vector machine, random forest, and extreme gradient boosting (XGBoost)) to predict patients with DR. Moreover, SHapley Additive exPlanations (SHAP) was adopted to show the contribution of each predictor of DR in the prediction. RESULTS Combining the Boruta and LASSO methods revealed that community, TCTG, HDLC, BUN, FPG, HbA1c, weight, and duration were the most important predictors of DR. The XGBoost-based model outperformed the other models, with an accuracy of 90.01%, precision of 91.80%, recall of 97.91%, F1 score of 94.86%, and AUC of 0.850. Moreover, the SHAP method showed that HbA1c, community, FPG, TCTG, duration, and UA1b were the influencing predictors of DR. CONCLUSION The proposed integrated system will be helpful as a tool for selecting significant predictors, which can predict patients who are at high risk of DR at an early stage in China.
Affiliation(s)
- Md Merajul Islam
- Department of Statistics, University of Rajshahi, Rajshahi-6205, Bangladesh; Department of Statistics, Jatiya Kabi Kazi Nazrul Islam University, Mymensingh-2224, Bangladesh.
- Md Jahanur Rahman
- Department of Statistics, University of Rajshahi, Rajshahi-6205, Bangladesh.
- Md Symun Rabby
- Department of Statistics, Jatiya Kabi Kazi Nazrul Islam University, Mymensingh-2224, Bangladesh.
- Md Jahangir Alam
- Department of Statistics, University of Rajshahi, Rajshahi-6205, Bangladesh.
- N A M Faisal Ahmed
- Institute of Education and Research, University of Rajshahi, Rajshahi-6205, Bangladesh.
- Most Tawabunnahar
- Department of Statistics, Jatiya Kabi Kazi Nazrul Islam University, Mymensingh-2224, Bangladesh.
- Dulal Chandra Roy
- Department of Statistics, University of Rajshahi, Rajshahi-6205, Bangladesh.
- Junpil Shin
- School of Computer Science and Engineering, The University of Aizu, Aizuwakamatsu, 965-8580, Fukushima, Japan.
- Md Maniruzzaman
- Statistics Discipline, Khulna University, Khulna-9208, Bangladesh.
2
Thotad PN, Bharamagoudar GR, Anami BS. Diabetes disease detection and classification on Indian demographic and health survey data using machine learning methods. Diabetes Metab Syndr 2023; 17:102690. [PMID: 36527769 DOI: 10.1016/j.dsx.2022.102690] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2021] [Revised: 11/30/2022] [Accepted: 12/02/2022] [Indexed: 12/07/2022]
Abstract
BACKGROUND & AIM Diabetes mellitus has become one of the outbreaks causing major health issues in developing countries like India. The need for leveraging technology is felt in diabetes management. The main objective of this work is to deploy machine learning methods for the detection and classification of diabetes having clinical relevance. METHODS The Indian Demographic and Health Survey 2016 dataset is considered, and the risk factors are determined for continuous and categorical data. Kernel entropy component analysis is used for the dimensionality reduction of the feature set. Predictive exploration-based machine learning methods like logistic regression, Gaussian naive Bayes, linear discriminant analysis, support vector classifier, k-nearest neighbor, decision tree, extreme gradient boosting, kernel entropy component analysis, and random forest are deployed in the work. The deployed methodology has three phases: feature extraction, classification, and prediction. RESULTS Random Forest gave the maximum classification accuracy of 99.84% and 96.75% for imbalanced and kernel entropy component analysis-induced balanced datasets (using synthetic minority oversampling technique) respectively. The maximum precision of 99.64% is obtained using a support vector classifier on the balanced dataset. The area under the curve is 99%, which is observed from kernel entropy component analysis induced random forest on the balanced dataset. All other models performed moderately when applied to the kernel entropy component analysis trained dataset. CONCLUSIONS The Random Forest model performed better in comparison with other models. The overall performance of the machine learning models can be improved by training the diabetes dataset using kernel entropy component analysis.
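Illustrative note (not from the paper): the abstract mentions balancing the dataset with the synthetic minority oversampling technique. The core idea, interpolating between a minority sample and one of its nearest minority neighbours, can be sketched as below. This is a simplified sketch under stated assumptions: the function name `smote_like_oversample`, the parameter `k`, and the toy points are hypothetical, and the paper's actual implementation and settings are not given in the abstract.

```python
import random

def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (simplified sketch)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical toy minority class with two features per sample.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_like_oversample(minority, n_new=6)
```

Because every synthetic point lies on a segment between two real minority samples, it stays inside the range spanned by the minority class. Oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the evaluation data.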
Affiliation(s)
- Puneeth N Thotad
- Department of Master of Computer Applications, KLE Institute of Technology, Hubballi, 580 027, India.
- Geeta R Bharamagoudar
- Department of Computer Science, KLE Institute of Technology, Hubballi, 580 027, India
- Basavaraj S Anami
- School of Computer Science & Engineering, KLE Technological University, Hubballi, 580 031, India
3
Simaiya S, Kaur R, Sandhu JK, Alsafyani M, Alroobaea R, Alsekait DM, Margala M, Chakrabarti P. A novel multistage ensemble approach for prediction and classification of diabetes. Front Physiol 2022; 13:1085240. [PMID: 36601350 PMCID: PMC9807241 DOI: 10.3389/fphys.2022.1085240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Accepted: 11/22/2022] [Indexed: 12/23/2022] Open
Abstract
Diabetes mellitus is a metabolic syndrome affecting millions of people worldwide. Every year, the rate of occurrence rises drastically. Diabetes-related problems across several vital organs of the body can be fatal if left untreated. Diabetes must be detected early to receive proper treatment, preventing the condition from escalating to severe problems. Tremendous advancements in health sciences and biotechnology have generated massive volumes of Electronic Health Records and clinical information. The exponential increase of electronically gathered information has resulted in more complicated, accurate prediction models that can be updated continuously using machine learning techniques. This research mainly emphasizes discovering the best ensemble model for predicting diabetes. A new multistage ensemble model is proposed for diabetes prediction. The accuracy of this model is evaluated on the Pima Indian Diabetes dataset. The accuracy of the proposed ensemble model is compared with existing machine learning models, and the experimental results demonstrate the performance of the proposed model in terms of higher precision, F-measure, recall, and area under the curve.
Affiliation(s)
- Sarita Simaiya
- Department of Computer Science and Engineering, Institute of Engineering and Technology, Chandigarh University, Mohali, Punjab, India; School of Computing and Informatics, University of Louisiana, Lafayette, LA, United States. Correspondence: Sarita Simaiya; Martin Margala
- Rajwinder Kaur
- Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura, Punjab, India
- Jasminder Kaur Sandhu
- Department of Computer Science and Engineering, Institute of Engineering and Technology, Chandigarh University, Mohali, Punjab, India
- Majed Alsafyani
- Department Science, College of Computers and Information Technology, Taif University, Taif, Saudi Arabia
- Roobaea Alroobaea
- Department of Computer Science, College of Computers and Information Technology, Taif University, Taif, Saudi Arabia
- Deema Mohammed Alsekait
- Department of Computer Science and Information Technology, Applied College, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
- Martin Margala
- School of Computing and Informatics, University of Louisiana, Lafayette, LA, United States. Correspondence: Sarita Simaiya; Martin Margala
4
Islam MM, Rahman MJ, Menhazul Abedin M, Ahammed B, Ali M, Ahmed NF, Maniruzzaman M. Identification of the risk factors of type 2 diabetes and its prediction using machine learning techniques. Health Syst (Basingstoke) 2022; 12:243-254. [PMID: 37234468 PMCID: PMC10208154 DOI: 10.1080/20476965.2022.2141141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2020] [Accepted: 10/20/2022] [Indexed: 11/07/2022] Open
Abstract
This study identified the risk factors for type 2 diabetes (T2D) and proposed a machine learning (ML) technique for predicting T2D. The risk factors for T2D were identified by multiple logistic regression (MLR) using p-value (p<0.05). Then, five ML-based techniques, including logistic regression, naïve Bayes, J48, multilayer perceptron, and random forest (RF) were employed to predict T2D. This study utilized two publicly available datasets, derived from the National Health and Nutrition Examination Survey, 2009-2010 and 2011-2012. A total of 4922 respondents with 387 T2D patients were included in the 2009-2010 dataset, whereas 4936 respondents with 373 T2D patients were included in the 2011-2012 dataset. This study identified six risk factors (age, education, marital status, SBP, smoking, and BMI) for 2009-2010 and nine risk factors (age, race, marital status, SBP, DBP, direct cholesterol, physical activity, smoking, and BMI) for 2011-2012. The RF-based classifier obtained 95.9% accuracy, 95.7% sensitivity, 95.3% F-measure, and 0.946 area under the curve.
Affiliation(s)
- Md. Merajul Islam
- Department of Statistics, University of Rajshahi, Rajshahi, Bangladesh
- Department of Statistics, Jatiya Kabi Kazi Nazrul Islam University, Mymensingh, Bangladesh
- Benojir Ahammed
- Statistics Discipline, Khulna University, Khulna, Bangladesh
- Mohammad Ali
- Statistics Discipline, Khulna University, Khulna, Bangladesh
- N.A.M Faisal Ahmed
- Institute of Education and Research, University of Rajshahi, Rajshahi, Bangladesh
5
Dutta A, Hasan MK, Ahmad M, Awal MA, Islam MA, Masud M, Meshref H. Early Prediction of Diabetes Using an Ensemble of Machine Learning Models. International Journal of Environmental Research and Public Health 2022; 19:ijerph191912378. [PMID: 36231678 PMCID: PMC9566114 DOI: 10.3390/ijerph191912378] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/28/2022] [Revised: 09/20/2022] [Accepted: 09/24/2022] [Indexed: 05/15/2023]
Abstract
Diabetes is one of the most rapidly spreading diseases in the world, resulting in an array of significant complications, including cardiovascular disease, kidney failure, diabetic retinopathy, and neuropathy, among others, which contribute to an increase in morbidity and mortality rate. If diabetes is diagnosed at an early stage, its severity and underlying risk factors can be significantly reduced. However, reliable and effective clinical datasets for diabetes prediction are scarce: labeled data are limited, and outliers and missing values are common, which makes prediction a challenging endeavor. Therefore, we introduce a newly labeled diabetes dataset from a South Asian nation (Bangladesh). In addition, we suggest an automated classification pipeline that includes a weighted ensemble of machine learning (ML) classifiers: Naive Bayes (NB), Random Forest (RF), Decision Tree (DT), XGBoost (XGB), and LightGBM (LGB). Grid search hyperparameter optimization is employed to tune the critical hyperparameters of these ML models. Furthermore, missing value imputation, feature selection, and K-fold cross-validation are included in the framework design. A statistical analysis of variance (ANOVA) test reveals that the performance of diabetes prediction significantly improves when the proposed weighted ensemble (DT + RF + XGB + LGB) is executed with the introduced preprocessing, with the highest accuracy of 0.735 and an area under the ROC curve (AUC) of 0.832. In conjunction with the suggested ensemble model, our statistical imputation and RF-based feature selection techniques produced the best results for early diabetes prediction. Moreover, the presented new dataset will contribute to developing and implementing robust ML models for diabetes prediction utilizing population-level data.
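Illustrative note (not from the paper): a weighted ensemble such as DT + RF + XGB + LGB is commonly realized as weighted soft voting, where each model's predicted probability is averaged with a per-model weight. The sketch below shows only that averaging step. The function name `weighted_soft_vote`, the probabilities, and the weights are hypothetical; the abstract does not state how the weights were chosen.

```python
def weighted_soft_vote(prob_by_model, weights):
    """Weighted average of per-model positive-class probabilities."""
    total = sum(weights.values())
    n_samples = len(next(iter(prob_by_model.values())))
    return [
        sum(weights[m] * prob_by_model[m][i] for m in prob_by_model) / total
        for i in range(n_samples)
    ]

# Hypothetical predicted probabilities of diabetes for three patients.
probs = {
    "DT":  [0.9, 0.2, 0.6],
    "RF":  [0.8, 0.3, 0.4],
    "XGB": [0.7, 0.1, 0.5],
    "LGB": [0.6, 0.4, 0.3],
}
# Hypothetical weights; in practice they might come from validation scores.
weights = {"DT": 1.0, "RF": 2.0, "XGB": 3.0, "LGB": 2.0}

ensemble = weighted_soft_vote(probs, weights)
labels = [int(p >= 0.5) for p in ensemble]  # 0.5 threshold is an assumption
```

With these toy numbers the third patient is labelled negative even though DT alone would predict positive, which shows how the more heavily weighted models dominate the vote.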
Affiliation(s)
- Aishwariya Dutta
- Department of Biomedical Engineering (BME), Khulna University of Engineering & Technology (KUET), Khulna 9203, Bangladesh
- Department of Biomedical Engineering (BME), Military Institute of Science and Technology (MIST), Mirpur Cantonment, Dhaka 1216, Bangladesh
- Md. Kamrul Hasan
- Department of Electrical and Electronic Engineering (EEE), Khulna University of Engineering & Technology (KUET), Khulna 9203, Bangladesh
- Mohiuddin Ahmad
- Department of Electrical and Electronic Engineering (EEE), Khulna University of Engineering & Technology (KUET), Khulna 9203, Bangladesh
- Md. Abdul Awal
- School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, QLD 4072, Australia
- Electronics and Communication Engineering (ECE) Discipline, Khulna University (KU), Khulna 9208, Bangladesh
- Corresponding author
- Mehedi Masud
- Department of Computer Science, College of Computers and Information Technology, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
- Hossam Meshref
- Department of Computer Science, College of Computers and Information Technology, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
6
Narwane SV, Sawarkar SD. Is handling unbalanced datasets for machine learning uplifts system performance?: A case of diabetic prediction. Diabetes Metab Syndr 2022; 16:102609. [PMID: 36099677 DOI: 10.1016/j.dsx.2022.102609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2022] [Revised: 08/21/2022] [Accepted: 08/23/2022] [Indexed: 11/30/2022]
Abstract
BACKGROUND AND AIMS Healthcare is a sensitive sector, and addressing the class imbalance in the healthcare domain is a time-consuming task for machine learning-based systems due to the vast amount of data. This study looks into the impact of socioeconomic disparities on the healthcare data of diabetic patients to make accurate disease predictions. METHODS This study proposed a systematic approach of Closest Distance Ranking and Principal Component Analysis to deal with the unbalanced dataset. A typical machine learning technique was used to analyze the proposed approach. The data set of pregnant diabetic women is analysed for accurate detection. RESULTS The results of the case are analysed using sensitivity, which demonstrates that the minority class's lack of information makes it impossible to forecast the results. On the other hand, the unbalanced dataset was treated using the proposed technique and evaluated with the machine learning algorithm which significantly increased the performance of the system. CONCLUSION The performance of the machine learning-based system was significantly enhanced by the unbalanced dataset which was processed with the proposed technique and evaluated with the machine learning algorithm. For the first time, an unbalanced dataset was treated with a combination of Closest Distance Ranking and Principal Component Analysis.
Affiliation(s)
- Swati V Narwane
- Department of Computer Engineering, Datta Meghe College of Engineering, Navi Mumbai, Pin Code: 400 708, India.
- Sudhir D Sawarkar
- Department of Computer Engineering, Datta Meghe College of Engineering, Navi Mumbai, Pin Code: 400 708, India.
7
Deep Learning Paradigm for Cardiovascular Disease/Stroke Risk Stratification in Parkinson’s Disease Affected by COVID-19: A Narrative Review. Diagnostics (Basel) 2022; 12:diagnostics12071543. [PMID: 35885449 PMCID: PMC9324237 DOI: 10.3390/diagnostics12071543] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/14/2022] [Revised: 06/14/2022] [Accepted: 06/16/2022] [Indexed: 11/16/2022] Open
Abstract
Background and Motivation: Parkinson’s disease (PD) is one of the most serious, non-curable, and expensive-to-treat diseases. Recently, machine learning (ML) has been shown to be able to predict cardiovascular/stroke risk in PD patients. The presence of COVID-19 causes the ML systems to become severely non-linear and poses challenges in cardiovascular/stroke risk stratification. Further, due to comorbidity, sample size constraints, and poor scientific and clinical validation techniques, there have been no well-explained ML paradigms. Deep neural networks are powerful learning machines that generalize non-linear conditions. This study presents a novel investigation of deep learning (DL) solutions for CVD/stroke risk prediction in PD patients affected by the COVID-19 framework. Method: The PRISMA search strategy was used for the selection of 292 studies closely associated with the effect of PD on CVD risk in the COVID-19 framework. We study the hypothesis that PD in the presence of COVID-19 can cause more harm to the heart and brain than in non-COVID-19 conditions. COVID-19 lung damage severity can be used as a covariate during DL training model designs. We, therefore, propose a DL model for (i) the estimation of COVID-19 lesions in computed tomography (CT) scans and (ii) the combination of the covariates of PD, COVID-19 lesions, office and laboratory arterial atherosclerotic image-based biomarkers, and medicine usage for the PD patients for the design of DL point-based models for CVD/stroke risk stratification. Results: We validated the feasibility of CVD/stroke risk stratification in PD patients in the presence of a COVID-19 environment. DL architectures like long short-term memory (LSTM) and recurrent neural network (RNN) were studied for CVD/stroke risk stratification, showing powerful designs. Lastly, we examined the artificial intelligence bias and provided recommendations for early detection of CVD/stroke in PD patients in the presence of COVID-19.
Conclusion: The DL is a very powerful tool for predicting CVD/stroke risk in PD patients affected by COVID-19.
8
Islam Pollob SMA, Abedin MM, Islam MT, Islam MM, Maniruzzaman M. Predicting risks of low birth weight in Bangladesh with machine learning. PLoS One 2022; 17:e0267190. [PMID: 35617201 PMCID: PMC9135259 DOI: 10.1371/journal.pone.0267190] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 04/30/2021] [Accepted: 04/04/2022] [Indexed: 11/26/2022] Open
Abstract
Background and objective Low birth weight is one of the primary causes of child mortality and of several diseases in later life in developing countries, especially in Southern Asia. The main objective of this study is to determine the risk factors of low birth weight and predict low birth weight babies based on machine learning algorithms. Materials and methods Low birth weight data has been taken from the Bangladesh Demographic and Health Survey, 2017–18, which had 2351 respondents. The risk factors associated with low birth weight were investigated using binary logistic regression. Two machine learning-based classifiers (logistic regression and decision tree) were adopted to characterize and predict low birth weight. The model performances were evaluated by accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and area under the curve. Results The average percentage of low birth weight in Bangladesh was 16.2%. The respondent’s region, education, wealth index, height, twin child, and alive child were statistically significant risk factors for low birth weight babies. The logistic regression-based classifier achieved 87.6% accuracy and 0.59 area under the curve for holdout (90:10) cross-validation, whereas the decision tree achieved 85.4% accuracy and 0.55 area under the curve. Conclusions The logistic regression-based classifier provided the most accurate classification of low birth weight babies. This study’s findings indicate the necessity for an efficient, cost-effective, and integrated complementary approach to reduce and correctly predict low birth weight babies in Bangladesh.
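Illustrative note (not from the paper): accuracy and area under the curve can diverge on imbalanced data. With a 16.2% prevalence of low birth weight, a rule that labels every birth as normal would already be about 83.8% accurate while ranking cases no better than chance (AUC 0.5). The sketch below checks this arithmetic on hypothetical labels; the sample of 1000 births is made up for illustration, and the prevalence in the paper's held-out 10% may differ from 16.2%.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive is scored above a
    random negative, counting ties as one half (pairwise definition)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical sample: 1000 births, 162 with low birth weight (16.2%).
y_true = [1] * 162 + [0] * 838
always_normal = [0] * 1000      # predicts "normal weight" for every birth
constant_scores = [0.0] * 1000  # the same risk score for every birth

baseline_acc = accuracy(y_true, always_normal)
baseline_auc = roc_auc(y_true, constant_scores)
```

Against this baseline, the reported accuracies of 87.6% and 85.4% are a few points above the majority-class rate, and the reported AUCs of 0.59 and 0.55 are close to the chance level of 0.5.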
Affiliation(s)
- Md. Merajul Islam
- Department of Statistics, Jatiya Kabi Kazi Nazrul Islam University, Mymensingh, Bangladesh
- Md. Maniruzzaman
- Statistics Discipline, Khulna University, Khulna, Bangladesh
- Corresponding author
9
Swislocki AL. Glucose Trajectory: More than Changing Glucose Tolerance with Age? Metab Syndr Relat Disord 2022; 20:313-320. [DOI: 10.1089/met.2021.0093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
Affiliation(s)
- Arthur L.M. Swislocki
- Medical Service, VA Northern California Health Care System (612/111), Martinez, California, USA
- Division of Endocrinology and Metabolism, Department of Internal Medicine, UC Davis School of Medicine, Sacramento, California, USA
10
Rahman SMJ, Ahmed NAMF, Abedin MM, Ahammed B, Ali M, Rahman MJ, Maniruzzaman M. Investigate the risk factors of stunting, wasting, and underweight among under-five Bangladeshi children and its prediction based on machine learning approach. PLoS One 2021; 16:e0253172. [PMID: 34138925 PMCID: PMC8211236 DOI: 10.1371/journal.pone.0253172] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 01/15/2021] [Accepted: 05/28/2021] [Indexed: 11/23/2022] Open
Abstract
Aims Malnutrition is a major health issue among Bangladeshi under-five (U5) children. Children are malnourished if the calories and proteins they take through their diet are not sufficient for their growth and maintenance. The goal of the research was to use machine learning (ML) algorithms to detect the risk factors of malnutrition (stunted, wasted, and underweight) as well as their prediction. Methods This work utilized malnutrition data derived from the Bangladesh Demographic and Health Survey conducted in 2014. The selected dataset consisted of 7079 children with 13 factors. The potential risks of malnutrition have been identified by logistic regression (LR). Moreover, 3 ML classifiers (support vector machine (SVM), random forest (RF), and LR) have been implemented for predicting malnutrition, and the performance of these ML algorithms was assessed on the basis of accuracy. Results The average prevalence of stunted, wasted, and underweight was 35.4%, 15.4%, and 32.8%, respectively. LR identified five risk factors for stunting and underweight, as well as four factors for wasting. Results illustrated that RF accurately classified stunted, wasted, and underweight children and obtained the highest accuracy of 88.3% for stunted, 87.7% for wasted, and 85.7% for underweight. Conclusion This research focused on the identification and prediction of major risk factors for stunting, wasting, and underweight using ML algorithms, which will aid policymakers in reducing malnutrition among Bangladesh’s U5 children.
Affiliation(s)
- Benojir Ahammed
- Statistics Discipline, Khulna University, Khulna, Bangladesh
- Mohammad Ali
- Statistics Discipline, Khulna University, Khulna, Bangladesh
- Md. Maniruzzaman
- Statistics Discipline, Khulna University, Khulna, Bangladesh
- Corresponding author
11
Islam MM, Rahman MJ, Chandra Roy D, Tawabunnahar M, Jahan R, Ahmed NAMF, Maniruzzaman M. Machine learning algorithm for characterizing risks of hypertension, at an early stage in Bangladesh. Diabetes Metab Syndr 2021; 15:877-884. [PMID: 33892404 DOI: 10.1016/j.dsx.2021.03.035] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 02/18/2021] [Revised: 03/24/2021] [Accepted: 03/31/2021] [Indexed: 12/30/2022]
Abstract
BACKGROUND AND AIMS Hypertension has become a major public health issue as the prevalence and risk of premature death and disability among adults due to hypertension has increased globally. The main objective is to characterize the risk factors of hypertension among adults in Bangladesh using machine learning (ML) algorithms. MATERIALS AND METHODS The hypertension data was derived from the Bangladesh Demographic and Health Survey, 2017-18, which included 6965 people aged 35 and above. Two of the most promising risk factor identification methods, namely least absolute shrinkage and selection operator (LASSO) and support vector machine recursive feature elimination (SVMRFE), are implemented to detect the critical risk factors of hypertension. Additionally, four well-known ML algorithms, namely artificial neural network, decision tree, random forest, and gradient boosting (GB), have been used to predict hypertension. Performance scores of these algorithms were evaluated by accuracy, precision, recall, F-measure, and area under the curve (AUC). RESULTS The results clarify that age, BMI, wealth index, working status, and marital status for LASSO and age, BMI, marital status, diabetes, and region for SVMRFE appear to be the top five significant risk factors for hypertension. Our findings reveal that the combination of SVMRFE-GB gives the maximum accuracy (66.98%), recall (97.92%), F-measure (78.99%), and AUC (0.669) compared to others. CONCLUSION The GB-based algorithm is confirmed as the best performer for prediction of hypertension at an early stage in Bangladesh. Therefore, this study strongly suggests that policymakers use the SVMRFE-GB-based combination to make proper judgments for controlling hypertension, saving time and reducing cost for Bangladeshi adults.
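Illustrative note (not from the paper): the abstract reports recall and F-measure for SVMRFE-GB but not precision. If the F-measure is the usual F1 score, F1 = 2PR / (P + R), then precision follows as P = F1 * R / (2R - F1). The sketch below applies that identity to the reported figures. It assumes F1 and recall were computed on the same predictions; if the metrics were averaged over folds, the identity holds only approximately, so the result is an estimate and not a value stated by the authors.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def precision_from_f1(f1, recall):
    """Invert F1 = 2PR / (P + R) for P, given F1 and recall."""
    return f1 * recall / (2 * recall - f1)

# Reported for SVMRFE-GB: recall 97.92%, F-measure 78.99%.
implied_precision = precision_from_f1(0.7899, 0.9792)

# Round-trip check: plugging the implied precision back in recovers F1.
recovered_f1 = f1_score(implied_precision, 0.9792)
```

Under these assumptions the implied precision is roughly 0.66, which sits close to the reported accuracy of 66.98% and is consistent with a model that predicts the positive class for nearly every respondent.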
Affiliation(s)
- Md Merajul Islam
- Department of Statistics, University of Rajshahi, Rajshahi 6205, Bangladesh.
- Md Jahanur Rahman
- Department of Statistics, University of Rajshahi, Rajshahi 6205, Bangladesh.
- Dulal Chandra Roy
- Department of Statistics, University of Rajshahi, Rajshahi 6205, Bangladesh.
- Most Tawabunnahar
- Department of Statistics, Jatiya Kabi Kazi Nazrul Islam University, Mymensingh 2220, Bangladesh.
- Rubaiyat Jahan
- Institution of Education and Research, University of Rajshahi, Rajshahi 6205, Bangladesh.
- N A M Faisal Ahmed
- Institution of Education and Research, University of Rajshahi, Rajshahi 6205, Bangladesh.
- Md Maniruzzaman
- Statistics Discipline, Khulna University, Khulna 9208, Bangladesh.
12
Basu S, Johnson KT, Berkowitz SA. Use of Machine Learning Approaches in Clinical Epidemiological Research of Diabetes. Curr Diab Rep 2020; 20:80. [PMID: 33270183 DOI: 10.1007/s11892-020-01353-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Accepted: 10/26/2020] [Indexed: 12/12/2022]
Abstract
PURPOSE OF REVIEW Machine learning approaches, which seek to predict outcomes or classify patient features by recognizing patterns in large datasets, are increasingly applied to clinical epidemiology research on diabetes. Given its novelty and emergence in fields outside of biomedical research, machine learning terminology, techniques, and research findings may be unfamiliar to diabetes researchers. Our aim was to present the use of machine learning approaches in an approachable way, drawing from clinical epidemiological research in diabetes published from 1 Jan 2017 to 1 June 2020. RECENT FINDINGS Machine learning approaches using tree-based learners, which produce decision trees to help guide clinical interventions, frequently have higher sensitivity and specificity than traditional regression models for risk prediction. Machine learning approaches using neural networking and "deep learning" can be applied to medical image data, particularly for the identification and staging of diabetic retinopathy and skin ulcers. Among the machine learning approaches reviewed, researchers identified new strategies to develop standard datasets for rigorous comparisons across older and newer approaches, methods to illustrate how a machine learner was treating underlying data, and approaches to improve the transparency of the machine learning process. Machine learning approaches have the potential to improve risk stratification and outcome prediction for clinical epidemiology applications. Achieving this potential would be facilitated by use of universal open-source datasets for fair comparisons. More work remains in the application of strategies to communicate how the machine learners are generating their predictions.
Affiliation(s)
- Sanjay Basu
- Center for Primary Care, Harvard Medical School, Boston, MA, USA.
- Research and Population Health, Collective Health, San Francisco, CA, USA.
- School of Public Health, Imperial College London, London, SW7, UK.
- Karl T Johnson
- General Medicine and Clinical Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Seth A Berkowitz
- General Medicine and Clinical Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
13
Automated Maintenance Data Classification Using Recurrent Neural Network: Enhancement by Spotted Hyena-Based Whale Optimization. Mathematics 2020. [DOI: 10.3390/math8112008] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/16/2022]
Abstract
Data classification has been considered extensively in different fields, such as machine learning, artificial intelligence, pattern recognition, and data mining, and the expansion of classification has yielded immense achievements. The automatic classification of maintenance data has been investigated over the past few decades owing to its usefulness in construction and facility management. To utilize automated data classification in the maintenance field, a data classification model is implemented in this study based on the analysis of different mechanical maintenance data. The developed model involves four main steps: (a) data acquisition, (b) feature extraction, (c) feature selection, and (d) classification. During data acquisition, four types of datasets are collected from the benchmark Google datasets. The attributes of each dataset are further processed for classification. Principal component analysis and first-order and second-order statistical features are computed during the feature extraction process. To reduce the dimensions of the features for error-free classification, feature selection was performed. The hybridization of two algorithms, the Whale Optimization Algorithm (WOA) and Spotted Hyena Optimization (SHO), yields a new algorithm, the Spotted Hyena-based Whale Optimization Algorithm (SH-WOA), which is adopted for performing feature selection. The selected features are fed to a deep learning algorithm, a Recurrent Neural Network (RNN). To enhance the efficiency of conventional RNNs, the number of hidden neurons in an RNN is optimized using the developed SH-WOA. Finally, the efficacy of the proposed model is verified utilizing the entire dataset. Experimental results show that the developed model can effectively solve uncertain data classification, which minimizes the execution time and enhances efficiency.