1
Islam MM, Rahman MJ, Rabby MS, Alam MJ, Pollob SMAI, Ahmed NAMF, Tawabunnahar M, Roy DC, Shin J, Maniruzzaman M. Predicting the risk of diabetic retinopathy using explainable machine learning algorithms. Diabetes Metab Syndr 2023; 17:102919. [PMID: 38091881 DOI: 10.1016/j.dsx.2023.102919] [Received: 06/19/2023] [Revised: 11/24/2023] [Accepted: 11/26/2023] [Indexed: 12/31/2023]
Abstract
BACKGROUND AND OBJECTIVE: Diabetic retinopathy (DR) is a global health concern among diabetic patients. The objective of this study was to propose an explainable machine learning (ML)-based system for predicting the risk of DR.
MATERIALS AND METHODS: This study utilized publicly available cross-sectional data from a Chinese cohort of 6374 respondents. We employed Boruta and least absolute shrinkage and selection operator (LASSO)-based feature selection methods to identify the common predictors of DR. Using the identified predictors, we trained and optimized four widely applicable models (artificial neural network, support vector machine, random forest, and extreme gradient boosting (XGBoost)) to predict patients with DR. Moreover, Shapley additive explanations (SHAP) were adopted to show the contribution of each predictor of DR to the prediction.
RESULTS: Combining the Boruta and LASSO methods revealed that community, TCTG, HDLC, BUN, FPG, HbA1c, weight, and duration were the most important predictors of DR. The XGBoost-based model outperformed the other models, with an accuracy of 90.01%, precision of 91.80%, recall of 97.91%, F1 score of 94.86%, and AUC of 0.850. Moreover, the SHAP method showed that HbA1c, community, FPG, TCTG, duration, and UA1b were the most influential predictors of DR.
CONCLUSION: The proposed integrated system will be helpful as a tool for selecting significant predictors, which can identify patients in China who are at high risk of DR at an early stage.
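The abstract relies on SHAP to attribute a prediction to individual predictors. The idea behind it can be illustrated with a minimal sketch that computes exact Shapley values for a toy model. Everything here is an assumption made for illustration: the `toy_risk` function, its coefficients, and the patient and baseline values are hypothetical and are not taken from the study, and a single baseline point stands in for the background data a SHAP library would normally use.

```python
from itertools import permutations

# Hypothetical toy risk score. The coefficients are illustrative only
# and do not come from the study or from any fitted model.
def toy_risk(x):
    return 0.5 * x["HbA1c"] + 0.25 * x["FPG"] + 0.125 * x["duration"] * x["HbA1c"]

def shapley_values(model, x, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over every ordering in which features switch from baseline to x."""
    features = list(x)
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        current = dict(baseline)      # start with all features at baseline
        previous = model(current)
        for f in order:
            current[f] = x[f]         # reveal this feature's actual value
            updated = model(current)
            totals[f] += updated - previous
            previous = updated
    return {f: totals[f] / len(orderings) for f in features}

# Hypothetical patient and reference point (assumed values).
patient = {"HbA1c": 8.0, "FPG": 10.0, "duration": 4.0}
baseline = {"HbA1c": 6.0, "FPG": 6.0, "duration": 0.0}

phi = shapley_values(toy_risk, patient, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.3f}")
```

The contributions sum to the difference between the patient's score and the baseline score, which is the additivity property that makes such attributions readable as a per-predictor breakdown. Exhaustive enumeration is only feasible for a handful of features; SHAP implementations use approximations or model-specific algorithms instead.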
Affiliation(s)
- Md Merajul Islam
  - Department of Statistics, University of Rajshahi, Rajshahi-6205, Bangladesh; Department of Statistics, Jatiya Kabi Kazi Nazrul Islam University, Mymensingh-2224, Bangladesh.
- Md Jahanur Rahman
  - Department of Statistics, University of Rajshahi, Rajshahi-6205, Bangladesh.
- Md Symun Rabby
  - Department of Statistics, Jatiya Kabi Kazi Nazrul Islam University, Mymensingh-2224, Bangladesh.
- Md Jahangir Alam
  - Department of Statistics, University of Rajshahi, Rajshahi-6205, Bangladesh.
- N A M Faisal Ahmed
  - Institute of Education and Research, University of Rajshahi, Rajshahi-6205, Bangladesh.
- Most Tawabunnahar
  - Department of Statistics, Jatiya Kabi Kazi Nazrul Islam University, Mymensingh-2224, Bangladesh.
- Dulal Chandra Roy
  - Department of Statistics, University of Rajshahi, Rajshahi-6205, Bangladesh.
- Junpil Shin
  - School of Computer Science and Engineering, The University of Aizu, Aizuwakamatsu, 965-8580, Fukushima, Japan.
- Md Maniruzzaman
  - Statistics Discipline, Khulna University, Khulna-9208, Bangladesh.
2
Shi S, Gao L, Zhang J, Zhang B, Xiao J, Xu W, Tian Y, Ni L, Wu X. The automatic detection of diabetic kidney disease from retinal vascular parameters combined with clinical variables using artificial intelligence in type-2 diabetes patients. BMC Med Inform Decis Mak 2023; 23:241. [PMID: 37904184 PMCID: PMC10617171 DOI: 10.1186/s12911-023-02343-9] [Received: 06/14/2023] [Accepted: 10/16/2023] [Indexed: 11/01/2023]
Abstract
BACKGROUND: Diabetic kidney disease (DKD) has become the leading cause of end-stage kidney disease. Early and accurate detection of DKD is beneficial for patients. Current detection depends on the measurement of albuminuria or the estimated glomerular filtration rate, which is invasive and not optimal; therefore, new detection tools are urgently needed. Meanwhile, a close relationship between diabetic retinopathy and DKD has been reported; thus, we aimed to develop a novel detection algorithm for DKD using artificial intelligence technology based on retinal vascular parameters combined with several easily available clinical parameters in patients with type-2 diabetes.
METHODS: A total of 515 consecutive patients with type-2 diabetes mellitus from Xiangyang Central Hospital were included. Patients were stratified by DKD diagnosis and split randomly into either the training set (70%, N = 360) or the testing set (30%, N = 155) (random seed = 1). Data from the training set were used to develop the machine learning algorithm (MLA), while those from the testing set were used to validate the MLA. Model performance was evaluated.
RESULTS: The MLA using the random forest classifier performed best among the classifiers compared. When validated, the accuracy, sensitivity, specificity, F1 score, and AUC for the optimal model were 84.5% (95% CI 83.3-85.7), 84.5% (82.3-86.7), 84.5% (82.7-86.3), 0.845 (0.831-0.859), and 0.914 (0.903-0.925), respectively.
CONCLUSIONS: A new machine learning algorithm for DKD diagnosis based on fundus images and 8 easily available clinical parameters was developed, which indicated that retinal vascular changes can assist in DKD screening and detection.
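The validation metrics quoted in this abstract (accuracy, sensitivity, specificity, F1 score) all derive from the four cells of a binary confusion matrix. As a minimal sketch of those definitions, the function below computes them from raw counts. The function name and the counts are hypothetical and chosen only for illustration; they are not the study's confusion matrix, and the abstract does not state which F1 averaging the authors used.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall on the positive class
    specificity = tn / (tn + fp)          # recall on the negative class
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical counts (assumed, not from the paper).
m = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Accuracy is a prevalence-weighted average of sensitivity and specificity, so whenever those two are equal, accuracy takes the same value regardless of class balance.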
Affiliation(s)
- Shaomin Shi
  - Department of Nephrology, Zhongnan Hospital of Wuhan University, 169 Donghu Road, Wuhan, 430071, Hubei, China.
  - Xiangyang Central Hospital, Affiliated Hospital of Hubei University of Arts and Science, Xiangyang, Hubei, 441000, China.
- Ling Gao
  - Xiangyang Central Hospital, Affiliated Hospital of Hubei University of Arts and Science, Xiangyang, Hubei, 441000, China.
- Juan Zhang
  - Department of Nephrology, Zhongnan Hospital of Wuhan University, 169 Donghu Road, Wuhan, 430071, Hubei, China.
- Baifang Zhang
  - Department of Biochemistry, Wuhan University TaiKang Medical School (School of Basic Medical Sciences), Wuhan, 430071, Hubei, China.
- Jing Xiao
  - Xiangyang Central Hospital, Affiliated Hospital of Hubei University of Arts and Science, Xiangyang, Hubei, 441000, China.
- Wan Xu
  - Xiangyang Central Hospital, Affiliated Hospital of Hubei University of Arts and Science, Xiangyang, Hubei, 441000, China.
- Yuan Tian
  - Xiangyang Central Hospital, Affiliated Hospital of Hubei University of Arts and Science, Xiangyang, Hubei, 441000, China.
- Lihua Ni
  - Department of Nephrology, Zhongnan Hospital of Wuhan University, 169 Donghu Road, Wuhan, 430071, Hubei, China.
- Xiaoyan Wu
  - Department of Nephrology, Zhongnan Hospital of Wuhan University, 169 Donghu Road, Wuhan, 430071, Hubei, China.
  - Department of General Practice, Zhongnan Hospital of Wuhan University, 169 Donghu Road, Wuhan, 430071, Hubei, China.
3
Narwane SV, Sawarkar SD. Is handling unbalanced datasets for machine learning uplifts system performance?: A case of diabetic prediction. Diabetes Metab Syndr 2022; 16:102609. [PMID: 36099677 DOI: 10.1016/j.dsx.2022.102609] [Received: 02/05/2022] [Revised: 08/21/2022] [Accepted: 08/23/2022] [Indexed: 11/30/2022]
Abstract
BACKGROUND AND AIMS: Healthcare is a sensitive sector, and addressing class imbalance in the healthcare domain is a time-consuming task for machine learning-based systems due to the vast amount of data. This study looks into the impact of socioeconomic disparities on the healthcare data of diabetic patients to make accurate disease predictions.
METHODS: This study proposed a systematic approach combining Closest Distance Ranking and Principal Component Analysis to deal with the unbalanced dataset. A typical machine learning technique was used to evaluate the proposed approach. A dataset of pregnant diabetic women was analysed for accurate detection.
RESULTS: The results of the case were analysed using sensitivity, which demonstrates that the minority class's lack of information makes it impossible to forecast the results. In contrast, treating the unbalanced dataset with the proposed technique and evaluating it with the machine learning algorithm significantly increased the performance of the system.
CONCLUSION: The performance of the machine learning-based system was significantly enhanced when the unbalanced dataset was processed with the proposed technique and evaluated with the machine learning algorithm. For the first time, an unbalanced dataset was treated with a combination of Closest Distance Ranking and Principal Component Analysis.
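The abstract evaluates results by sensitivity rather than accuracy. A minimal sketch of why that choice matters on unbalanced data follows: a classifier that always predicts the majority class scores high accuracy while detecting no minority cases. The labels and the 10% minority share are hypothetical, and this sketch does not implement the Closest Distance Ranking and Principal Component Analysis approach described by the authors.

```python
# Hypothetical unbalanced labels: 10 positives (minority), 90 negatives.
labels = [1] * 10 + [0] * 90

# Degenerate baseline that always predicts the majority class.
predictions = [0] * len(labels)

correct = sum(p == y for p, y in zip(predictions, labels))
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))

accuracy = correct / len(labels)
sensitivity = true_positives / sum(labels)   # share of positives detected

print(f"accuracy:    {accuracy:.2f}")
print(f"sensitivity: {sensitivity:.2f}")
```

Here accuracy looks strong only because negatives dominate the data, while sensitivity exposes that no positive case is found.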
Affiliation(s)
- Swati V Narwane
  - Department of Computer Engineering, Datta Meghe College of Engineering, Navi Mumbai, Pin Code: 400 708, India.
- Sudhir D Sawarkar
  - Department of Computer Engineering, Datta Meghe College of Engineering, Navi Mumbai, Pin Code: 400 708, India.