Xing Z, Chen H, Alman AC. Discriminating insulin resistance in middle-aged nondiabetic women using machine learning approaches. AIMS Public Health 2024; 11:667-687. [PMID: 39027391 PMCID: PMC11252584 DOI: 10.3934/publichealth.2024034]
[Received: 02/20/2024] [Revised: 03/28/2024] [Accepted: 04/08/2024] [Indexed: 07/20/2024]
Abstract
Objective
We employed machine learning algorithms to discriminate insulin resistance (IR) in middle-aged nondiabetic women.
Methods
Data came from the National Health and Nutrition Examination Survey (2007-2018). The study subjects were 2084 nondiabetic women aged 45-64 years. The analysis included 48 predictors. We randomly divided the data into training (n = 1667) and testing (n = 417) datasets. Four machine learning techniques were employed to discriminate IR: extreme gradient boosting (XGBoost), random forest (RF), gradient boosting machine (GBM), and decision tree (DT). The area under the receiver operating characteristic (ROC) curve (AUC), accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1 score were compared to select the optimal technique.
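The performance metrics listed above can all be derived from a confusion matrix plus predicted scores. The following is a minimal Python sketch of those definitions, not the authors' code: the names `ConfusionCounts`, `classification_metrics`, and `pairwise_auc` are hypothetical, and the example counts and scores are toy values, not results from the study.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class ConfusionCounts:
    """Counts from a binary confusion matrix (IR = positive class)."""
    tp: int  # IR cases predicted as IR
    fp: int  # non-IR cases predicted as IR
    tn: int  # non-IR cases predicted as non-IR
    fn: int  # IR cases predicted as non-IR


def classification_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Standard definitions of the threshold-based metrics named in Methods."""
    sensitivity = c.tp / (c.tp + c.fn)   # true positive rate (recall)
    specificity = c.tn / (c.tn + c.fp)   # true negative rate
    ppv = c.tp / (c.tp + c.fp)           # positive predictive value (precision)
    npv = c.tn / (c.tn + c.fn)           # negative predictive value
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


def pairwise_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half. This O(n^2) form is for illustration only.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy values (hypothetical, not from the study).
toy_counts = ConfusionCounts(tp=40, fp=10, tn=45, fn=5)
toy_metrics = classification_metrics(toy_counts)
toy_auc = pairwise_auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])
```

Sensitivity, specificity, PPV, NPV, accuracy, and F1 depend on the probability cutoff used to label a case as IR, whereas AUC summarizes ranking quality across all cutoffs, which is presumably why the abstract reports AUC as the headline comparison.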
Results
Using all 48 predictors, the XGBoost algorithm achieved an AUC of 0.93 in the training dataset and 0.86 in the testing dataset for discriminating IR, ahead of the RF, GBM, and DT models. When the models were rebuilt with only the top five predictors, XGBoost remained the optimal prediction model, with an AUC of 0.90 (training dataset) and 0.86 (testing dataset). The SHapley Additive exPlanations (SHAP) values revealed the associations between the five predictors and IR, namely BMI (strongly positive impact on IR), fasting glucose (strongly positive), HDL-C (medium negative), triglycerides (medium positive), and glycohemoglobin (medium positive). The threshold values for identifying IR were 29 kg/m² for BMI, 100 mg/dL for fasting glucose, 54.5 mg/dL for HDL-C, 89 mg/dL for triglycerides, and 5.6% for glycohemoglobin.
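To make the reported thresholds concrete, the sketch below flags which of the five predictors fall on the IR side of their cutoff for one set of measurements. It is an illustration only, not the authors' XGBoost model. Several things here are assumptions: the direction of each comparison is inferred from the SHAP signs reported above (HDL-C is flagged when low, the others when high), the comparisons are inclusive, and the names `THRESHOLDS` and `flag_predictors` and the sample values are hypothetical.

```python
from typing import Dict, Tuple

# Cutoffs from the abstract. The direction ("above" or "below") is an
# assumption inferred from the sign of each predictor's SHAP association.
THRESHOLDS: Dict[str, Tuple[float, str]] = {
    "bmi": (29.0, "above"),               # kg/m^2
    "fasting_glucose": (100.0, "above"),  # mg/dL
    "hdl_c": (54.5, "below"),             # mg/dL, negative association with IR
    "triglycerides": (89.0, "above"),     # mg/dL
    "glycohemoglobin": (5.6, "above"),    # percent
}


def flag_predictors(values: Dict[str, float]) -> Dict[str, bool]:
    """Return, per predictor, whether the value lies on the IR side of its cutoff.

    Inclusive comparisons (>= and <=) are an assumption; the abstract does
    not state how values exactly at a threshold are treated.
    """
    flags = {}
    for name, (cutoff, direction) in THRESHOLDS.items():
        value = values[name]
        if direction == "above":
            flags[name] = value >= cutoff
        else:
            flags[name] = value <= cutoff
    return flags


# Hypothetical measurements for one person (not study data).
sample = {
    "bmi": 31.0,
    "fasting_glucose": 95.0,
    "hdl_c": 50.0,
    "triglycerides": 120.0,
    "glycohemoglobin": 5.4,
}
sample_flags = flag_predictors(sample)
n_flagged = sum(sample_flags.values())
```

A per-predictor flag like this ignores interactions between predictors, which a tree-based model such as XGBoost can capture, so it should be read as a reading aid for the thresholds rather than a substitute for the fitted model.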
Conclusion
The XGBoost algorithm demonstrated superior performance metrics for discriminating IR in middle-aged nondiabetic women, with BMI, fasting glucose, HDL-C, glycohemoglobin, and triglycerides as the top five predictors.