Chang TH, Chen YD, Lu HHS, Wu JL, Mak K, Yu CS. Specific patterns and potential risk factors to predict 3-year risk of death among non-cancer patients with advanced chronic kidney disease by machine learning. Medicine (Baltimore) 2024; 103:e37112. [PMID: 38363886; PMCID: PMC10869094; DOI: 10.1097/md.0000000000037112] [Received: 10/17/2023] [Accepted: 01/09/2024] [Indexed: 02/18/2024]
Abstract
Chronic kidney disease (CKD) is a major public health concern. However, few machine learning studies have focused on non-cancer patients with advanced CKD, and results from machine learning studies of cancer patients with CKD may not apply directly to non-cancer patients. We aimed to conduct a comprehensive investigation of risk factors for 3-year risk of death among non-cancer patients with advanced CKD and an estimated glomerular filtration rate (eGFR) < 60.0 mL/min/1.73 m², using several machine learning algorithms. In this retrospective cohort study, we collected data on inpatients and emergency care patients at 2 hospitals in northern Taiwan from 2009 to 2019, including International Classification of Diseases (ICD) codes at admission and laboratory data from the hospitals' electronic medical records (EMRs); 6565 patients were enrolled. Several machine learning algorithms were used to assess the potential impact of each factor on mortality and survival and the degree of that influence. After data cleaning, 26 risk factors and approximately 3887 patients with advanced CKD from Shuang Ho Hospital formed the training set, and 2299 patients from Taipei Medical University Hospital formed the validation set. Albumin, prothrombin time-international normalized ratio (PT-INR), and age were the 3 most influential risk factors for mortality prediction. Random forest achieved the highest accuracy, above 0.80. The multilayer perceptron (MLP) and AdaBoost outperformed the other methods on sensitivity and F1-score. The support vector machine (SVM) with a linear kernel had the highest specificity (0.9983), but its sensitivity and F1-score were poor. Logistic regression performed best by area under the receiver operating characteristic curve (AUC = 0.8527). Applying machine learning algorithms to the EMRs of Taiwanese patients with advanced CKD could give physicians a good approximation of a patient's 3-year risk of death.
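The abstract compares models on accuracy, sensitivity, specificity, F1-score, and AUC. The following is a minimal Python sketch (standard library only) of how these metrics are conventionally defined for a binary outcome, assuming label 1 means "died within 3 years" and label 0 means "survived". It is not the authors' code; the function names (`confusion_counts`, `threshold_metrics`, `roc_auc`), the 0.5 cut-off, and the toy labels and scores are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn); label 1 = died within 3 years (assumed encoding)."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:  # t == 1 and p == 0: a missed death
            fn += 1
    return tp, fp, tn, fn


def threshold_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, sensitivity, specificity, and F1 for hard 0/1 predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # share of deaths flagged
    specificity = tn / (tn + fp) if tn + fp else 0.0  # share of survivors cleared
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(score of a random death > score of a random survivor), ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy cohort: 4 deaths (1) and 6 survivors (0) with model risk scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.35, 0.3, 0.2, 0.1, 0.05, 0.3, 0.15, 0.25]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # strict 0.5 cut-off (assumed)

metrics = threshold_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, round(auc, 4))
```

In this toy data the scores rank deaths above survivors almost perfectly (AUC of about 0.98), yet the strict cut-off flags only 1 of 4 deaths, giving specificity 1.0 with sensitivity 0.25 and F1 0.4. This is a generic illustration of how threshold-dependent metrics can diverge from a threshold-free one like AUC, the same kind of trade-off the abstract reports for the linear-kernel SVM; the toy numbers say nothing about the study's data. The pairwise AUC loop is O(n·m) and meant only for clarity; library routines such as scikit-learn's `roc_auc_score` are the usual choice in practice.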
Affiliation(s)
- Tzu-Hao Chang
  - Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
  - Clinical Big Data Research Center, Taipei Medical University Hospital, Taipei, Taiwan
- Yu-Da Chen
  - Department of Family Medicine, Taipei Medical University Hospital, Taipei, Taiwan
  - School of Health Care Administration, College of Management, Taipei Medical University, Taipei, Taiwan
  - Department of Family Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
- Henry Horng-Shing Lu
  - Institute of Statistics, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
  - Institute of Data Science and Engineering, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
- Jenny L. Wu
  - Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
- Cheng-Sheng Yu
  - Graduate Institute of Data Science, College of Management, Taipei Medical University, Taipei, Taiwan
  - Clinical Data Center, Office of Data Science, Taipei Medical University, Taipei, Taiwan
  - Fintech RD Center, Nan Shan Life Insurance Co., Ltd