Chen C, Zhang W, Yan G, Tang C. Identifying metabolic dysfunction-associated steatotic liver disease in patients with hypertension and pre-hypertension: An interpretable machine learning approach.
Digit Health 2024;10:20552076241233135. [PMID: 38389508 PMCID: PMC10883118 DOI: 10.1177/20552076241233135]
[Received: 09/23/2023] [Accepted: 01/30/2024] [Indexed: 02/24/2024]
Abstract
Objective
Metabolic dysfunction-associated steatotic liver disease (MASLD) is one of the most prevalent liver diseases and is associated with pre-hypertension and hypertension. Our research aims to develop interpretable machine learning (ML) models to accurately identify MASLD in hypertensive and pre-hypertensive populations.
Methods
The dataset comprised 4722 hypertensive and pre-hypertensive participants from the NAGALA study. Six ML models were compared: decision tree, K-nearest neighbor, gradient boosting, naive Bayes, support vector machine, and random forest (RF). The optimal model was selected according to performance under K-fold cross-validation (k = 5), measured by the area under the receiver operating characteristic curve (AUC), average precision (AP), accuracy, sensitivity, specificity, and F1 score. Shapley additive explanation (SHAP) values were used for both global and local interpretation of the model results.
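The comparison described in the Methods can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' code: the data are synthetic (generated with `make_classification`, not the NAGALA cohort), all hyperparameters are library defaults, and the feature scaling step for the K-nearest neighbor and support vector machine models is an assumption that the abstract does not mention.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import KFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (NOT the NAGALA cohort): roughly 30% positives.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    weights=[0.7, 0.3], random_state=0,
)

# The six model families named in the abstract, with default settings.
# Scaling before KNN and SVM is an assumption made for this sketch.
models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "naive_bayes": GaussianNB(),
    "svm": make_pipeline(StandardScaler(), SVC(random_state=0)),
    "random_forest": RandomForestClassifier(random_state=0),
}

# The six metrics listed in the abstract. Specificity is recall of the
# negative class, so it is built from recall_score with pos_label=0.
scoring = {
    "auc": "roc_auc",
    "ap": "average_precision",
    "accuracy": "accuracy",
    "sensitivity": "recall",
    "specificity": make_scorer(recall_score, pos_label=0),
    "f1": "f1",
}

# Five-fold cross-validation (k = 5); shuffling is an assumption.
cv = KFold(n_splits=5, shuffle=True, random_state=0)


def compare_models(models, X, y):
    """Return the mean of each metric across the folds for every model."""
    results = {}
    for name, model in models.items():
        scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
        results[name] = {m: scores[f"test_{m}"].mean() for m in scoring}
    return results


results = compare_models(models, X, y)
best = max(results, key=lambda name: results[name]["auc"])
for name, metrics in results.items():
    print(name, {m: round(v, 3) for m, v in metrics.items()})
print("highest mean AUC:", best)
```

Ranking by mean AUC is one possible selection rule; the abstract reports all six metrics without stating how they were weighted against each other.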
Results
The prevalence of MASLD in hypertensive and pre-hypertensive patients was 44.3% (362 cases) and 28.3% (1107 cases), respectively. The RF model outperformed the other five models with an AUC of 0.889, AP of 0.800, accuracy of 0.819, sensitivity of 0.816, specificity of 0.821, and F1 score of 0.729. According to the SHAP analysis, the five most important features were alanine aminotransferase, body mass index, waist circumference, high-density lipoprotein cholesterol, and total cholesterol. Further analysis of feature selection in the RF model showed that incorporating all features led to optimal model performance.
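For readers less familiar with the threshold-based metrics reported above, the sketch below shows how accuracy, sensitivity, specificity, and F1 follow from a confusion matrix. The counts are hypothetical and chosen only for easy arithmetic; they are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute threshold-based metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall of the positive class
    specificity = tn / (tn + fp)          # recall of the negative class
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts: 100 positives and 200 negatives.
metrics = classification_metrics(tp=80, fp=40, tn=160, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, sensitivity, specificity, and accuracy all equal 0.8, while F1 is about 0.727 because precision is only 80/120. This shows how F1 can sit below both sensitivity and specificity when the positive class is the minority.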
Conclusions
ML algorithms, especially the RF algorithm, improve the accuracy of MASLD identification. The global and local interpretation of the RF model results makes it possible to understand intuitively how individual features affect the likelihood of MASLD in patients with hypertension and pre-hypertension.
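The idea behind a local Shapley-based interpretation can be illustrated with an exact calculation on a toy model. Everything in this sketch is an assumption made for illustration: `toy_score` is a made-up scoring function with invented coefficients (not the RF model from the study), the input and baseline values are hypothetical, and absent features are replaced by a single baseline value, whereas SHAP libraries typically average over a background dataset.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, relative to a baseline input.

    A feature that is 'absent' from a coalition takes its baseline value.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of any coalition of this size that excludes feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


def toy_score(v):
    """Hypothetical score with an interaction term; coefficients are invented."""
    alt, bmi, waist = v
    return 0.01 * alt + 0.03 * bmi + 0.005 * waist + 0.0005 * alt * bmi


x = [60.0, 30.0, 100.0]        # hypothetical patient: ALT, BMI, waist
baseline = [20.0, 22.0, 80.0]  # hypothetical reference values
phi = shapley_values(toy_score, x, baseline)
for name, value in zip(["ALT", "BMI", "waist"], phi):
    print(f"{name}: {value:+.4f}")
```

The per-feature values sum to the difference between the score at `x` and the score at the baseline, which is the additivity property that makes local explanations readable. A global importance ranking can then be obtained by averaging the absolute values over many individuals.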