1
Li Y, Du C, Ge S, Zhang R, Shao Y, Chen K, Li Z, Ma F. Hematoma expansion prediction based on SMOTE and XGBoost algorithm. BMC Med Inform Decis Mak 2024; 24:172. [PMID: 38898499 PMCID: PMC11186182 DOI: 10.1186/s12911-024-02561-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2023] [Accepted: 05/30/2024] [Indexed: 06/21/2024] [Open Access]
Abstract
Hematoma expansion (HE) is a high-risk symptom with a high rate of occurrence in patients who have undergone spontaneous intracerebral hemorrhage (ICH) after a major accident or illness. Correctly predicting the occurrence of HE in advance is critical to help doctors determine the next step of medical treatment. Most existing studies focus only on the occurrence of HE within 6 h after the occurrence of ICH, while in reality a considerable number of patients have HE after the first 6 h but within 24 h. In this study, based on the recommendation of medical doctors, we focus on predicting the occurrence of HE within 24 h, as well as the occurrence of HE every 6 h within 24 h. Based on demographics and information extracted from computed tomography (CT) images, we used the XGBoost method to predict the occurrence of HE within 24 h. To address the highly imbalanced data set, which is a frequent case in medical data analysis, we used the SMOTE algorithm for data augmentation. To evaluate our method, we used a data set consisting of 582 patient records and compared the results of the proposed method with those of several other machine learning methods. Our experiments show that XGBoost achieved the best prediction performance on the balanced dataset processed by the SMOTE algorithm, with an accuracy of 0.82 and an F1-score of 0.82. Moreover, our proposed method predicts the occurrence of HE within 6, 12, 18 and 24 h with accuracies of 0.89, 0.82, 0.87 and 0.94, respectively, indicating that HE occurrence within 24 h can be predicted accurately by the proposed method.
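The abstract above relies on SMOTE to balance the classes before training. As a rough illustration of the idea only (not the authors' implementation, and without any library), the following minimal sketch creates synthetic minority points by interpolating between a minority sample and one of its nearest minority neighbours. The function name `smote`, the toy coordinates, and the choice of `k` are all assumptions made for this example.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Hypothetical toy data: 6 majority points and 3 minority points.
majority = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2), (0.0, 0.4), (0.4, 0.1)]
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3)]

# Oversample the minority class until both classes have the same size.
balanced_minority = minority + smote(minority, len(majority) - len(minority), k=2)
print(len(majority), len(balanced_minority))
```

In practice, oversampling of this kind is usually applied to the training split only, so that synthetic points do not leak into the evaluation data.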
Affiliation(s)
- Yan Li
  - Department of Mathematics and Physics, Xi'an Jiaotong-Liverpool University, Suzhou, China
- Chaonan Du
  - Department of Neurosurgery, Affiliated Jinling Hospital, Medical School of Nanjing University, Nanjing, China
- Sikai Ge
  - Department of Mathematics and Physics, Xi'an Jiaotong-Liverpool University, Suzhou, China
- Ruonan Zhang
  - Department of Mathematics and Physics, Xi'an Jiaotong-Liverpool University, Suzhou, China
- Yiming Shao
  - Department of Mathematics and Physics, Xi'an Jiaotong-Liverpool University, Suzhou, China
- Keyu Chen
  - Department of Mathematics and Physics, Xi'an Jiaotong-Liverpool University, Suzhou, China
- Zhepeng Li
  - Department of Mathematics and Physics, Xi'an Jiaotong-Liverpool University, Suzhou, China
- Fei Ma
  - Department of Mathematics and Physics, Xi'an Jiaotong-Liverpool University, Suzhou, China
2
Yu M, Yuan Z, Li R, Shi B, Wan D, Dong X. Interpretable machine learning model to predict surgical difficulty in laparoscopic resection for rectal cancer. Front Oncol 2024; 14:1337219. [PMID: 38380369 PMCID: PMC10878416 DOI: 10.3389/fonc.2024.1337219] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Accepted: 01/15/2024] [Indexed: 02/22/2024] [Open Access]
Abstract
Background Laparoscopic total mesorectal excision (LaTME) is a standard surgical method for rectal cancer, and the LaTME operation is a challenging procedure. This study is intended to use machine learning to develop and validate prediction models for the surgical difficulty of LaTME in patients with rectal cancer and to compare these models' performance. Methods We retrospectively collected the preoperative clinical and MRI pelvimetry parameters of rectal cancer patients who underwent laparoscopic total mesorectal resection from 2017 to 2022. The difficulty of LaTME was defined according to the scoring criteria reported by Escal. Patients were randomly divided into a training group (80%) and a test group (20%). We selected independent influencing features using the least absolute shrinkage and selection operator (LASSO) and multivariate logistic regression. The synthetic minority oversampling technique (SMOTE) was adopted to alleviate the class imbalance problem. Six machine learning models were developed: light gradient boosting machine (LGBM), categorical boosting (CatBoost), extreme gradient boosting (XGBoost), logistic regression (LR), random forest (RF), and multilayer perceptron (MLP). The area under the receiver operating characteristic curve (AUROC), accuracy, sensitivity, specificity and F1 score were used to evaluate the performance of the models. Shapley Additive Explanations (SHAP) analysis provided an interpretation of the best machine learning model. Decision curve analysis (DCA) was further used to evaluate the clinical utility of the model. Results A total of 626 patients were included. LASSO regression analysis shows that tumor height, prognostic nutrition index (PNI), pelvic inlet, pelvic outlet, sacrococcygeal distance, mesorectal fat area and angle 5 (the angle between the apex of the sacral angle and the lower edge of the pubic bone) are the predictor variables of the machine learning model.
In addition, the correlation heatmap shows that there is no significant correlation between these seven variables. When predicting the difficulty of LaTME surgery, the XGBoost model performed best among the six machine learning models (AUROC=0.855). Based on the decision curve analysis (DCA) results, the XGBoost model is also superior, and feature importance analysis shows that tumor height is the most important variable among the seven factors. Conclusions This study developed an XGBoost model to predict the difficulty of LaTME surgery. This model can help clinicians quickly and accurately predict the difficulty of surgery and adopt individualized surgical methods.
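The evaluation metrics named in this abstract (accuracy, sensitivity, specificity, F1 score) all derive from the four cells of a binary confusion matrix. The sketch below shows those definitions on made-up labels; the function name `classification_metrics` and the toy vectors are assumptions for illustration and have no connection to the study data.

```python
def classification_metrics(y_true, y_pred):
    """Compute common binary metrics from true and predicted labels (0/1).

    Assumes both classes are present and at least one positive is predicted,
    so that no denominator is zero.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)   # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }

# Hypothetical labels: 1 = difficult surgery, 0 = not difficult.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```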
Affiliation(s)
- Daiwei Wan
  - Department of General Surgery, The First Affiliated Hospital of Soochow University, Suzhou, China
- Xiaoqiang Dong
  - Department of General Surgery, The First Affiliated Hospital of Soochow University, Suzhou, China
3
Ge J, Ji Y, Wang F, Zhou X, Wei J, Qi C. Correlation Between Cystatin C and the Severity of Cardiac Dysfunction in Patients with Systolic Heart Failure. Risk Manag Healthc Policy 2023; 16:2419-2426. [PMID: 38024499 PMCID: PMC10655600 DOI: 10.2147/rmhp.s437678] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Accepted: 11/02/2023] [Indexed: 12/01/2023] [Open Access]
Abstract
Introduction To investigate the relationship between cystatin C and cardiac dysfunction severity in patients with systolic heart failure. Methods We recruited 100 hospitalized patients with systolic heart failure and 100 age- and gender-matched controls. The clinical information of each patient was collected. Blood pressure, heart rate, height, and weight were measured, as were serum concentrations of cholesterol, renal function indices, cystatin C, and B-type natriuretic peptide (BNP). Transthoracic echocardiography was performed on each patient. Results Cystatin C and other indices of renal function, such as urea nitrogen, creatinine, and uric acid, were significantly elevated in the serum of patients with heart failure and those with more severe cardiac dysfunction. The stepwise regression analyses showed that cystatin C was positively associated with BNP (β = 0.18, P = 0.04, 95% CI: 21.1 ~ 1420.4) and left atrial diameter (LAD) (β = 0.19, P = 0.04, 95% CI: 0.03 ~ 9.21) and was negatively associated with ejection fraction (β = -0.22, P = 0.023, 95% CI: -12.4 ~ -0.93), while creatinine was only positively correlated with BNP (β = 0.23, P = 0.03, 95% CI: 1.11 ~ 20.7). The Receiver Operating Characteristic (ROC) curves demonstrated significantly more severe cardiac dysfunction (NYHA III/IV) in patients with cystatin C ≥ 0.895 mg/L (sensitivity 83.0%, specificity 80.9%, AUC = 0.893) and creatinine ≥ 91.5 μmol/L (sensitivity 71.7%, specificity 70.2%, AUC = 0.764). Conclusion Cystatin C was significantly correlated with cardiac structure and function in patients with systolic heart failure, and it was more valuable than creatinine for evaluating the severity of heart failure.
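The abstract reports ROC-derived cutoffs with their sensitivity and specificity but does not say how the cutoffs were chosen. One common convention, assumed here purely for illustration, is to pick the threshold that maximises Youden's J (sensitivity + specificity - 1). The marker values and labels below are invented and are not the study's measurements.

```python
def youden_cutoff(values, labels):
    """Return (cutoff, sensitivity, specificity) maximising Youden's J,
    calling a patient positive when value >= cutoff."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, l in zip(values, labels) if v >= cutoff and l == 1)
        tn = sum(1 for v, l in zip(values, labels) if v < cutoff and l == 0)
        sens = tp / n_pos
        spec = tn / n_neg
        j = sens + spec - 1
        # Keep the first cutoff that reaches the highest J.
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1:]

# Hypothetical biomarker values; label 1 = severe dysfunction, 0 = not severe.
values = [0.6, 0.7, 0.8, 0.85, 0.9, 0.95, 1.0, 1.1, 1.3]
labels = [0, 0, 0, 0, 1, 0, 1, 1, 1]
cutoff, sens, spec = youden_cutoff(values, labels)
print(cutoff, sens, spec)
```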
Affiliation(s)
- Jiyong Ge
  - Department of Cardiology, The Affiliated Changzhou Second People’s Hospital of Nanjing Medical University, Changzhou, Jiangsu, 213003, People’s Republic of China
- Yuan Ji
  - Department of Cardiology, The Affiliated Changzhou Second People’s Hospital of Nanjing Medical University, Changzhou, Jiangsu, 213003, People’s Republic of China
- Fangfang Wang
  - Department of Cardiology, The Affiliated Changzhou Second People’s Hospital of Nanjing Medical University, Changzhou, Jiangsu, 213003, People’s Republic of China
- Xuejun Zhou
  - Department of Cardiology, The Affiliated Changzhou Second People’s Hospital of Nanjing Medical University, Changzhou, Jiangsu, 213003, People’s Republic of China
- Jiazhan Wei
  - Department of Cardiology, The Affiliated Changzhou Second People’s Hospital of Nanjing Medical University, Changzhou, Jiangsu, 213003, People’s Republic of China
- Chunjian Qi
  - Oncology Institute, The Affiliated Changzhou Second People’s Hospital of Nanjing Medical University, Changzhou, Jiangsu, 213003, People’s Republic of China
4
Zhang Y, Wang H, Yin C, Shu T, Yu J, Jian J, Jian C, Duan M, Kadier K, Xu Q, Wang X, Xiang T, Liu X. Development of a prediction model for the risk of 30-day unplanned readmission in older patients with heart failure: A multicenter retrospective study. Nutr Metab Cardiovasc Dis 2023; 33:1878-1887. [PMID: 37500347 DOI: 10.1016/j.numecd.2023.05.034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Revised: 05/21/2023] [Accepted: 05/31/2023] [Indexed: 07/29/2023]
Abstract
BACKGROUND AND AIM Heart failure (HF) imposes significant global health costs due to its high incidence, readmission, and mortality rates. Accurate assessment of readmission risk and precise interventions have become important measures to improve health for patients with HF. Therefore, this study aimed to develop a machine learning (ML) model to predict 30-day unplanned readmissions in older patients with HF. METHODS AND RESULTS This study collected data on hospitalized older patients with HF from the medical data platform of Chongqing Medical University from January 1, 2012, to December 31, 2021. A total of 5 candidate algorithms were selected from 15 ML algorithms based on their performance, which was evaluated by the area under the receiver operating characteristic curve (AUC) and accuracy. Then, the 5 candidate algorithms were hyperparameter tuned by 5-fold cross-validation grid search, and performance was evaluated by AUC, accuracy, sensitivity, specificity, and recall. Finally, an optimal ML model was constructed, and the predictive results were explained using the SHapley Additive exPlanations (SHAP) framework. A total of 14,843 older patients with HF were consecutively enrolled. The CatBoost model was selected as the best prediction model, with an AUC of 0.732, accuracy of 0.712, sensitivity of 0.619, and specificity of 0.722. NT-proBNP, length of stay (LOS), triglycerides, blood phosphorus, blood potassium, and lactate dehydrogenase had the greatest effect on 30-day unplanned readmission in older patients with HF, according to SHAP results. CONCLUSIONS The study developed a CatBoost model to predict the risk of unplanned 30-day special-cause readmission in older patients with HF, which performed better than the traditional logistic regression model.
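The tuning step described above, a grid search scored by 5-fold cross-validation, can be sketched without any library. In the example below a one-dimensional k-nearest-neighbour rule stands in for the real candidate algorithms, and the data, the grid, and all function names are assumptions made for illustration; only the overall loop (for each grid value, train on 4 folds, score on the held-out fold, average) reflects the technique.

```python
def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def knn_predict(train_x, train_y, x, n_neighbors):
    """Majority vote among the n_neighbors closest training points (1-D)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:n_neighbors]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > n_neighbors else 0

def grid_search_cv(xs, ys, grid, k=5):
    """Return the grid value with the best mean held-out accuracy."""
    folds = k_fold_indices(len(xs), k)
    best_param, best_score = None, -1.0
    for n_neighbors in grid:
        fold_scores = []
        for fold in folds:
            held_out = set(fold)
            train = [i for i in range(len(xs)) if i not in held_out]
            train_x = [xs[i] for i in train]
            train_y = [ys[i] for i in train]
            correct = sum(knn_predict(train_x, train_y, xs[i], n_neighbors) == ys[i]
                          for i in fold)
            fold_scores.append(correct / len(fold))
        mean_score = sum(fold_scores) / len(fold_scores)
        if mean_score > best_score:
            best_param, best_score = n_neighbors, mean_score
    return best_param, best_score

# Hypothetical 1-D feature; the last two labels are deliberately noisy.
xs = [0.10, 0.90, 0.20, 0.80, 0.30, 0.70, 0.40, 0.60, 0.15, 0.85,
      0.25, 0.75, 0.35, 0.65, 0.45, 0.55, 0.05, 0.95, 0.48, 0.52]
ys = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0]
grid = [1, 3, 5]
best_k, best_score = grid_search_cv(xs, ys, grid, k=5)
print(best_k, best_score)
```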
Affiliation(s)
- Yang Zhang
  - College of Medical Informatics, Chongqing Medical University, Chongqing, China
  - Medical Data Science Academy, Chongqing Medical University, Chongqing, China
- Haolin Wang
  - College of Medical Informatics, Chongqing Medical University, Chongqing, China
- Chengliang Yin
  - Faculty of Medicine, Macau University of Science and Technology, 999078, Macau, China
- Tingting Shu
  - Army Medical University (Third Military Medical University), Chongqing, China
- Jie Yu
  - Department of Medical Imaging, The Affiliated Taian City Central Hospital of Qingdao University, Taian 271000, China
- Jie Jian
  - College of Medical Informatics, Chongqing Medical University, Chongqing, China
  - Medical Data Science Academy, Chongqing Medical University, Chongqing, China
- Chang Jian
  - College of Medical Informatics, Chongqing Medical University, Chongqing, China
  - Medical Data Science Academy, Chongqing Medical University, Chongqing, China
- Minjie Duan
  - College of Medical Informatics, Chongqing Medical University, Chongqing, China
  - Medical Data Science Academy, Chongqing Medical University, Chongqing, China
- Kaisaierjiang Kadier
  - Department of Cardiology, First Affiliated Hospital of Xinjiang Medical University, Ürümqi, China
- Qian Xu
  - Collection Development Department of Library, Chongqing Medical University, Chongqing, China
- Xueer Wang
  - College of Oncology, Guangxi Medical University, Nanning 530022, China
- Tianyu Xiang
  - Information Center, The University-Town Hospital of Chongqing Medical University, Chongqing, China
- Xiaozhu Liu
  - College of Medical Informatics, Chongqing Medical University, Chongqing, China
  - Medical Data Science Academy, Chongqing Medical University, Chongqing, China
5
Kim R, Suresh K, Rosenberg MA, Tan MS, Malone DC, Allen LA, Kao DP, Anderson HD, Tiwari P, Trinkley KE. A machine learning evaluation of patient characteristics associated with prescribing of guideline-directed medical therapy for heart failure. Front Cardiovasc Med 2023; 10:1169574. [PMID: 37416920 PMCID: PMC10321403 DOI: 10.3389/fcvm.2023.1169574] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/19/2023] [Accepted: 06/01/2023] [Indexed: 07/08/2023] [Open Access]
Abstract
Introduction/background Patients with heart failure and reduced ejection fraction (HFrEF) are consistently underprescribed guideline-directed medications. Although many barriers to prescribing are known, identification of these barriers has relied on traditional a priori hypotheses or qualitative methods. Machine learning can overcome many limitations of traditional methods to capture complex relationships in data and lead to a more comprehensive understanding of the underpinnings driving underprescribing. Here, we used machine learning methods and routinely available electronic health record data to identify predictors of prescribing. Methods We evaluated the predictive performance of machine learning algorithms to predict prescription of four types of medications for adults with HFrEF: angiotensin converting enzyme inhibitor/angiotensin receptor blocker (ACE/ARB), angiotensin receptor-neprilysin inhibitor (ARNI), evidence-based beta blocker (BB), or mineralocorticoid receptor antagonist (MRA). The models with the best predictive performance were used to identify the top 20 characteristics associated with prescribing each medication type. Shapley values were used to provide insight into the importance and direction of the predictor relationships with medication prescribing. Results For 3,832 patients meeting the inclusion criteria, 70% were prescribed an ACE/ARB, 8% an ARNI, 75% a BB, and 40% an MRA. The best-predicting model for each medication type was a random forest (area under the curve: 0.788-0.821; Brier score: 0.063-0.185). Across all medications, top predictors of prescribing included prescription of other evidence-based medications and younger age. Unique to prescribing an ARNI, the top predictors included lack of diagnoses of chronic kidney disease, chronic obstructive pulmonary disease, or hypotension, as well as being in a relationship, nontobacco use, and alcohol use. 
Discussion/conclusions We identified multiple predictors of prescribing for HFrEF medications that are being used to strategically design interventions to address barriers to prescribing and to inform further investigations. The machine learning approach used in this study to identify predictors of suboptimal prescribing can also be used by other health systems to identify and address locally relevant gaps and solutions to prescribing.
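Alongside the area under the curve, this abstract reports Brier scores, which measure how close predicted probabilities are to the observed 0/1 outcomes (lower is better). A minimal sketch of the definition follows; the labels and probabilities are invented for illustration.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

# Hypothetical outcomes (1 = medication prescribed) and predicted probabilities.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.1]
score = brier_score(y_true, y_prob)

# An uninformative model that always predicts 0.5 scores 0.25.
uninformative = brier_score(y_true, [0.5] * len(y_true))
print(score, uninformative)
```

Because the score depends on outcome prevalence, Brier scores for a rarely prescribed medication and a commonly prescribed one are not directly comparable without that context.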
Affiliation(s)
- Rachel Kim
  - School of Medicine, University of Colorado Medical Campus, Aurora, CO, United States
- Krithika Suresh
  - Department of Biostatistics and Informatics, Colorado School of Public Health, Aurora, CO, United States
- Michael A. Rosenberg
  - School of Medicine, University of Colorado Medical Campus, Aurora, CO, United States
- Malinda S. Tan
  - Department of Pharmacotherapy, University of Utah, Salt Lake City, UT, United States
- Daniel C. Malone
  - Department of Pharmacotherapy, University of Utah, Salt Lake City, UT, United States
- Larry A. Allen
  - School of Medicine, University of Colorado Medical Campus, Aurora, CO, United States
  - Adult and Child Consortium for Outcomes Research and Delivery Science (ACCORDS), University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- David P. Kao
  - School of Medicine, University of Colorado Medical Campus, Aurora, CO, United States
  - Department of Clinical Informatics, UCHealth, Aurora, CO, United States
- Heather D. Anderson
  - Department of Clinical Pharmacy, University of Colorado Anschutz Medical Campus Skaggs School of Pharmacy and Pharmaceutical Sciences, Aurora, CO, United States
- Premanand Tiwari
  - School of Medicine, University of Colorado Medical Campus, Aurora, CO, United States
- Katy E. Trinkley
  - School of Medicine, University of Colorado Medical Campus, Aurora, CO, United States
  - Department of Clinical Informatics, UCHealth, Aurora, CO, United States
  - Department of Clinical Pharmacy, University of Colorado Anschutz Medical Campus Skaggs School of Pharmacy and Pharmaceutical Sciences, Aurora, CO, United States
6
Eysenbach G, Chao HJ, Chiang YC, Chen HY. Explainable Machine Learning Techniques To Predict Amiodarone-Induced Thyroid Dysfunction Risk: Multicenter, Retrospective Study With External Validation. J Med Internet Res 2023; 25:e43734. [PMID: 36749620 PMCID: PMC9944157 DOI: 10.2196/43734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2022] [Revised: 12/25/2022] [Accepted: 01/16/2023] [Indexed: 02/08/2023] [Open Access]
Abstract
BACKGROUND Machine learning offers new solutions for predicting life-threatening, unpredictable amiodarone-induced thyroid dysfunction. Traditional regression approaches for adverse-effect prediction without time-series consideration of features have yielded suboptimal predictions. Machine learning algorithms with multiple data sets at different time points may generate better performance in predicting adverse effects. OBJECTIVE We aimed to develop and validate machine learning models for forecasting individualized amiodarone-induced thyroid dysfunction risk and to optimize a machine learning-based risk stratification scheme with a resampling method and readjustment of the clinically derived decision thresholds. METHODS This study developed machine learning models using multicenter, delinked electronic health records. It included patients receiving amiodarone from January 2013 to December 2017. The training set was composed of data from Taipei Medical University Hospital and Wan Fang Hospital, while data from Taipei Medical University Shuang Ho Hospital were used as the external test set. The study collected stationary features at baseline and dynamic features at the first, second, third, sixth, ninth, 12th, 15th, 18th, and 21st months after amiodarone initiation. We used 16 machine learning models, including extreme gradient boosting, adaptive boosting, k-nearest neighbor, and logistic regression models, along with an original resampling method and 3 other resampling methods, including oversampling with the borderline-synthesized minority oversampling technique, undersampling-edited nearest neighbor, and over- and undersampling hybrid methods. The model performance was compared based on accuracy, precision, recall, F1-score, geometric mean, area under the receiver operating characteristic curve (AUROC), and area under the precision-recall curve (AUPRC). Feature importance was determined by the best model.
The decision threshold was readjusted to identify the best cutoff value, and a Kaplan-Meier survival analysis was performed. RESULTS The training set contained 4075 patients from Taipei Medical University Hospital and Wan Fang Hospital, of whom 583 (14.3%) developed amiodarone-induced thyroid dysfunction, while the external test set included 2422 patients from Taipei Medical University Shuang Ho Hospital, of whom 275 (11.4%) developed amiodarone-induced thyroid dysfunction. The extreme gradient boosting oversampling machine learning model demonstrated the best predictive outcomes among all 16 models. The accuracy, precision, recall, F1-score, G-mean, AUPRC, and AUROC were 0.923, 0.632, 0.756, 0.688, 0.845, 0.751, and 0.934, respectively. After readjusting the cutoff, the best value was 0.627, and the F1-score reached 0.699. The best threshold was able to classify 286 of 2422 patients (11.8%) as high-risk subjects, among which 275 were true-positive patients in the testing set. A shorter treatment duration; higher levels of thyroid-stimulating hormone and high-density lipoprotein cholesterol; and lower levels of free thyroxin, alkaline phosphatase, and low-density lipoprotein were the most important features. CONCLUSIONS Machine learning models combined with resampling methods can predict amiodarone-induced thyroid dysfunction and serve as a support tool for individualized risk prediction and clinical decision support.
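Two points in this abstract lend themselves to a short worked example: the F1-score is the harmonic mean of precision and recall, so the reported precision (0.632) and recall (0.756) can be checked against the reported F1 (0.688), and the decision threshold can be readjusted by sweeping candidate cutoffs and keeping the one with the highest F1. The sketch below does both; the toy probabilities and candidate cutoffs are assumptions, and the sweep is a generic illustration rather than the authors' procedure.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def best_f1_threshold(y_true, y_prob, candidates):
    """Return (threshold, f1) for the candidate cutoff with the highest F1."""
    best_t, best_f1 = None, -1.0
    for t in candidates:
        tp = sum(1 for y, p in zip(y_true, y_prob) if p >= t and y == 1)
        fp = sum(1 for y, p in zip(y_true, y_prob) if p >= t and y == 0)
        fn = sum(1 for y, p in zip(y_true, y_prob) if p < t and y == 1)
        if tp == 0:
            continue  # F1 is undefined or zero without true positives
        f1 = 2 * tp / (2 * tp + fp + fn)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# Consistency check on the values reported in the abstract.
reported_f1 = round(f1_from(0.632, 0.756), 3)

# Hypothetical predicted risks and outcomes for six patients.
y_true = [0, 0, 1, 0, 1, 1]
y_prob = [0.1, 0.4, 0.45, 0.6, 0.7, 0.9]
best_t, best_f1 = best_f1_threshold(y_true, y_prob, candidates=[0.3, 0.5, 0.65])
print(reported_f1, best_t, best_f1)
```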
Affiliation(s)
- Horng-Jiun Chao
  - Department of Clinical Pharmacy, School of Pharmacy, Taipei Medical University, Taipei, Taiwan
- Yi-Chun Chiang
  - Department of Clinical Pharmacy, School of Pharmacy, Taipei Medical University, Taipei, Taiwan
  - Department of Pharmacy, Wan Fang Hospital, Taipei Medical University, Taipei, Taiwan
- Hsiang-Yin Chen
  - Department of Clinical Pharmacy, School of Pharmacy, Taipei Medical University, Taipei, Taiwan
  - Department of Pharmacy, Wan Fang Hospital, Taipei Medical University, Taipei, Taiwan
7
Song W, Liu Y, Qiu L, Qing J, Li A, Zhao Y, Li Y, Li R, Zhou X. Machine learning-based warning model for chronic kidney disease in individuals over 40 years old in underprivileged areas, Shanxi Province. Front Med (Lausanne) 2023; 9:930541. [PMID: 36698845 PMCID: PMC9868668 DOI: 10.3389/fmed.2022.930541] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2022] [Accepted: 12/19/2022] [Indexed: 01/11/2023] [Open Access]
Abstract
Introduction Chronic kidney disease (CKD) is a progressive disease with a high incidence but imperceptible early symptoms. Because China's rural areas have inadequate medical check-ups and only single-disease screening programmes, CKD there can easily progress to end-stage renal failure. This study aimed to construct an early warning model for CKD tailored to impoverished areas by employing machine learning (ML) algorithms with easily accessible parameters from ten rural areas in Shanxi Province, thereby promoting a forward shift of treatment time and improving patients' quality of life. Methods From April to November 2019, CKD opportunistic screening was carried out in 10 rural areas in Shanxi Province. First, general information, physical examination data, and blood and urine specimens were collected from 13,550 subjects. Afterward, feature selection of explanatory variables was performed using LASSO regression, and the target datasets, i.e., albuminuria-to-creatinine ratio (ACR) and α1-microglobulin-to-creatinine ratio (MCR), were balanced using the SMOTE (synthetic minority over-sampling technique) algorithm. Next, Bagging, Random Forest (RF) and eXtreme Gradient Boosting (XGBoost) were employed for classification of ACR outcomes and MCR outcomes, respectively. Results 12,330 rural residents were included in this study, with 20 explanatory variables. There were 1,587 (12.8%) cases with increased ACR and 1,456 (11.8%) cases with increased MCR. After conducting LASSO, 14 and 15 explanatory variables remained in these two datasets, respectively. Bagging, RF, and XGBoost performed well in classification, with the AUC reaching 0.74, 0.87, 0.87, 0.89 for ACR outcomes and 0.75, 0.88, 0.89, 0.90 for MCR outcomes. The five variables contributing most to the classification of ACR outcomes and MCR outcomes were SBP, TG, TC, Hcy and DBP, and age, TG, SBP, Hcy and FPG, respectively. Overall, the machine learning algorithms could serve as a warning model for CKD.
Conclusion ML algorithms in conjunction with rural accessible indexes boast good performance in classification, which allows for an early warning model for CKD. This model could help achieve large-scale population screening for CKD in poverty-stricken areas and should be promoted to improve the quality of life and reduce the mortality rate.
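For readers unfamiliar with why LASSO reduces 20 explanatory variables to 14 or 15, the standard LASSO estimate is sketched below. It is written for squared-error loss as a generic textbook form; the abstract does not state which loss or penalty strength the authors used, so the symbols here (n samples, responses y_i, feature vectors x_i, coefficients β, penalty λ) are assumptions for illustration.

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\;
  \frac{1}{2n} \sum_{i=1}^{n} \bigl( y_i - x_i^{\top} \beta \bigr)^{2}
  \;+\; \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert ,
  \qquad \lambda \ge 0 .
```

The absolute-value penalty drives some coefficients exactly to zero as λ grows, and the variables whose coefficients remain nonzero are the ones retained for the downstream classifiers.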
Affiliation(s)
- Wenzhu Song
  - School of Public Health, Shanxi Medical University, Taiyuan, Shanxi, China
- Yanfeng Liu
  - Department of Nephrology, Shanxi Provincial People’s Hospital (Fifth Hospital) of Shanxi Medical University, Taiyuan, China
- Lixia Qiu
  - School of Public Health, Shanxi Medical University, Taiyuan, Shanxi, China
- Jianbo Qing
  - Department of Nephrology, Shanxi Provincial People’s Hospital (Fifth Hospital) of Shanxi Medical University, Taiyuan, China
- Aizhong Li
  - Shanxi Provincial Key Laboratory of Kidney Disease, Taiyuan, China
- Yan Zhao
  - Department of Nephrology, Shanxi Provincial People’s Hospital (Fifth Hospital) of Shanxi Medical University, Taiyuan, China
- Yafeng Li
  - Department of Nephrology, Shanxi Provincial People’s Hospital (Fifth Hospital) of Shanxi Medical University, Taiyuan, China
  - Shanxi Provincial Key Laboratory of Kidney Disease, Taiyuan, China
  - Core Laboratory, Shanxi Provincial People’s Hospital (Fifth Hospital) of Shanxi Medical University, Taiyuan, China
  - Academy of Microbial Ecology, Shanxi Medical University, Taiyuan, China
- Rongshan Li
  - Department of Nephrology, Shanxi Provincial People’s Hospital (Fifth Hospital) of Shanxi Medical University, Taiyuan, China
  - Shanxi Provincial Key Laboratory of Kidney Disease, Taiyuan, China
  - *Correspondence: Rongshan Li
- Xiaoshuang Zhou
  - Department of Nephrology, Shanxi Provincial People’s Hospital (Fifth Hospital) of Shanxi Medical University, Taiyuan, China
8
Adeoye J, Zheng LW, Thomson P, Choi SW, Su YX. Explainable ensemble learning model improves identification of candidates for oral cancer screening. Oral Oncol 2023; 136:106278. [PMID: 36525782 DOI: 10.1016/j.oraloncology.2022.106278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2022] [Revised: 11/26/2022] [Accepted: 12/06/2022] [Indexed: 12/15/2022]
Abstract
OBJECTIVES Artificial intelligence could enhance the use of disparate risk factors (crude method) for better stratification of patients to be screened for oral cancer. This study aims to construct a meta-classifier that considers diverse risk factors to identify patients at risk of oral cancer and other suspicious oral diseases for targeted screening. MATERIALS AND METHODS A retrospective dataset from a community oral cancer screening program was used to construct and train the novel voting meta-classifier. Comprehensive risk factor information from this dataset was used as input features for eleven supervised learning algorithms, which served as base learners and provided predicted probabilities that are weighted and aggregated by the meta-classifier. The training dataset was augmented using SMOTE-ENN. Additionally, Shapley additive explanations (SHAP) values were generated to implement the explainability of the model and display the important risk factors. RESULTS Our meta-classifier had an internal validation recall, specificity, and AUROC of 0.83, 0.86, and 0.85 for identifying the risk of oral cancer and 0.92, 0.60, and 0.76 for identifying suspicious oral mucosal disease, respectively. Upon external validation, the meta-classifier had a significantly higher AUROC than the crude/current method used for identifying the risk of oral cancer (0.78 vs 0.46; p = 0.001). Also, the meta-classifier had better recall than the crude method for predicting the risk of suspicious oral mucosal diseases (0.78 vs 0.47). CONCLUSION Overall, these findings showcase that our approach optimizes the use of risk factors in identifying patients for oral screening, which suggests potential clinical application.
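The voting meta-classifier described above weights and aggregates the predicted probabilities of its base learners. A minimal sketch of that aggregation step (weighted soft voting) is shown below. The three base learners, their probabilities, the weights, and the 0.5 decision threshold are all hypothetical; the abstract does not give the actual weights or threshold.

```python
def weighted_soft_vote(probabilities, weights, threshold=0.5):
    """Combine per-learner positive-class probabilities by weighted average.

    probabilities: one list per base learner, each with one value per patient.
    Returns the combined probabilities and the thresholded 0/1 labels.
    """
    total = sum(weights)
    n_patients = len(probabilities[0])
    combined = [
        sum(w * probs[i] for w, probs in zip(weights, probabilities)) / total
        for i in range(n_patients)
    ]
    labels = [1 if p >= threshold else 0 for p in combined]
    return combined, labels

# Hypothetical outputs of three base learners for three patients.
base_probs = [
    [0.9, 0.2, 0.6],  # learner A
    [0.7, 0.4, 0.3],  # learner B
    [0.8, 0.1, 0.4],  # learner C
]
weights = [2.0, 1.0, 1.0]  # learner A is trusted twice as much
combined, labels = weighted_soft_vote(base_probs, weights)
print(combined, labels)
```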
Affiliation(s)
- John Adeoye
  - Division of Oral and Maxillofacial Surgery, Faculty of Dentistry, University of Hong Kong, Hong Kong, China
- Li-Wu Zheng
  - Division of Oral and Maxillofacial Surgery, Faculty of Dentistry, University of Hong Kong, Hong Kong, China
- Peter Thomson
  - College of Medicine and Dentistry, James Cook University, Cairns, Queensland, Australia
- Siu-Wai Choi
  - Division of Oral and Maxillofacial Surgery, Faculty of Dentistry, University of Hong Kong, Hong Kong, China
- Yu-Xiong Su
  - Division of Oral and Maxillofacial Surgery, Faculty of Dentistry, University of Hong Kong, Hong Kong, China
9
Zhang X, Gavaldà R, Baixeries J. Interpretable prediction of mortality in liver transplant recipients based on machine learning. Comput Biol Med 2022; 151:106188. [PMID: 36306583 DOI: 10.1016/j.compbiomed.2022.106188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2022] [Revised: 09/24/2022] [Accepted: 10/08/2022] [Indexed: 12/27/2022]
Abstract
BACKGROUND Accurate prediction of mortality after liver transplantation is an important but challenging task. It bears on optimizing organ allocation and estimating the risk of possible dysfunction. Existing risk scoring models, such as the Balance of Risk (BAR) score and the Survival Outcomes Following Liver Transplantation (SOFT) score, do not predict mortality after liver transplantation with sufficient accuracy. In this study, we evaluate the performance of machine learning models and establish an explainable machine learning model for predicting mortality in liver transplant recipients. METHOD The optimal feature set for mortality prediction was selected by a wrapper method based on binary particle swarm optimization (BPSO). With the selected feature set, seven machine learning models were applied to predict mortality over different time windows. The best-performing model, identified through a comprehensive comparison and evaluation, was then used to predict mortality. An interpretable approach based on machine learning and SHapley Additive exPlanations (SHAP) was used to explain the model's decisions explicitly and to make new discoveries. RESULTS With regard to predictive power, our results demonstrated that the feature set selected by BPSO outperformed both the feature sets of the existing risk score models (BAR score, SOFT score) and the feature set processed by principal component analysis (PCA). The best-performing model, extreme gradient boosting (XGBoost), improved the area under the curve (AUC) for mortality prediction by 6.7%, 11.6%, and 17.4% at 3 months, 3 years, and 10 years, respectively, compared to the SOFT score. The main predictors of mortality and their impact were discussed for different age groups and different follow-up periods. CONCLUSIONS Our analysis demonstrates that XGBoost can be an ideal method to assess the mortality risk in liver transplantation. In combination with the SHAP approach, the proposed framework provides a more intuitive and comprehensive interpretation of the predictive model, thereby allowing the clinician to better understand the decision-making process of the model and the impact of factors associated with mortality risk in liver transplantation.
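The SHAP approach named in this abstract builds on Shapley values, which split the difference between a model's prediction for one patient and its prediction for a reference input across the individual features. The following is a minimal sketch of that idea, not the SHAP library and not the authors' pipeline: it computes exact Shapley values by enumerating every feature coalition and replacing absent features with a baseline value. The function `toy_risk`, its coefficients, and the input vectors are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one input x, relative to a baseline input.

    Features outside a coalition are set to their baseline value.
    The cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition):
        # Model output when only the features in `coalition` take x's values.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        rest = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(rest, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(z):
    # Hypothetical risk score with one interaction term (illustration only).
    return 2.0 * z[0] + 3.0 * z[1] + z[0] * z[2]


phi = shapley_values(toy_risk, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
print(phi)
```

The contributions sum to the difference between the prediction for `x` and the prediction for the baseline, and the interaction term is shared equally between the two features involved in it.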
Affiliation(s)
- Xiao Zhang
- Department of Computer Science, Universitat Politècnica de Catalunya, Barcelona, 08034, Spain
- Jaume Baixeries
- Department of Computer Science, Universitat Politècnica de Catalunya, Barcelona, 08034, Spain
10
Masukawa K, Aoyama M, Yokota S, Nakamura J, Ishida R, Nakayama M, Miyashita M. Machine learning models to detect social distress, spiritual pain, and severe physical psychological symptoms in terminally ill patients with cancer from unstructured text data in electronic medical records. Palliat Med 2022; 36:1207-1216. [PMID: 35773973 DOI: 10.1177/02692163221105595] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 01/21/2023]
Abstract
BACKGROUND Few studies have developed automatic systems for identifying social distress, spiritual pain, and severe physical and psychological symptoms from text data in electronic medical records. AIM To develop models to detect social distress, spiritual pain, and severe physical and psychological symptoms in terminally ill patients with cancer from unstructured text data contained in electronic medical records. DESIGN This retrospective study analyzed 1,554,736 narrative clinical records written during the month before patients died. Supervised machine learning models were trained to detect comprehensive symptoms, and the performance of the models was tested using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). SETTING/PARTICIPANTS A total of 808 patients were included in the study, using records obtained from a university hospital in Japan between January 1, 2018 and December 31, 2019. As training data, we used medical records labeled for detecting social distress (n = 10,000) and spiritual pain (n = 10,000), and records that could be combined with the Support Team Assessment Schedule (based on date) for detecting severe physical/psychological symptoms (n = 5409). RESULTS Machine learning models for detecting social distress had AUROC and AUPRC values of 0.98 and 0.61, respectively; values for spiritual pain were 0.90 and 0.58, respectively. The machine learning models accurately identified severe symptoms (pain, dyspnea, nausea, insomnia, and anxiety) with a high level of discrimination (AUROC > 0.8). CONCLUSION The machine learning models could detect social distress, spiritual pain, and severe symptoms in terminally ill patients with cancer from text data contained in electronic medical records.
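The two metrics reported in this abstract can be computed from labels and scores alone. The sketch below is a pure-Python illustration with made-up data, not the authors' evaluation code. AUROC is computed as the fraction of positive-negative pairs that the scores rank correctly, with ties counted as half. AUPRC is approximated by average precision, a common step-wise estimate that assumes no tied scores. The function names and the toy labels and scores are assumptions.

```python
def auroc(labels, scores):
    """Probability that a random positive scores higher than a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive example."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_pos = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_pos += 1
            total += true_pos / rank  # precision at this recall step
    return total / n_pos


labels = [0, 0, 1, 1]            # toy data, illustration only
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))     # 0.75
print(average_precision(labels, scores))
```

Because precision depends on how many negatives outrank the positives, a classifier on data with rare positive labels can show a high AUROC alongside a much lower AUPRC, which is one reason to report both.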
Affiliation(s)
- Kento Masukawa
- Department of Palliative Nursing, Health Sciences, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan
- Maho Aoyama
- Department of Palliative Nursing, Health Sciences, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan
- Shinichiroh Yokota
- Faculty of Medicine, The University of Tokyo, Hongo, Tokyo, Japan; Department of Healthcare Information Management, The University of Tokyo Hospital, Hongo, Tokyo, Japan
- Jyunya Nakamura
- Department of Palliative Nursing, Health Sciences, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan
- Ryoka Ishida
- Department of Palliative Nursing, Health Sciences, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan
- Masaharu Nakayama
- Department of Medical Informatics, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan
- Mitsunori Miyashita
- Department of Palliative Nursing, Health Sciences, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan
11
Song W, Zhou X, Duan Q, Wang Q, Li Y, Li A, Zhou W, Sun L, Qiu L, Li R, Li Y. Using random forest algorithm for glomerular and tubular injury diagnosis. Front Med (Lausanne) 2022; 9:911737. [PMID: 35966858 PMCID: PMC9366016 DOI: 10.3389/fmed.2022.911737] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/03/2022] [Accepted: 07/04/2022] [Indexed: 11/16/2022] Open
Abstract
Objectives Chronic kidney disease (CKD) is a common chronic condition with high incidence and insidious onset. Glomerular injury (GI) and tubular injury (TI) represent early manifestations of CKD and could indicate the risk of its development. In this study, we aimed to classify GI and TI using three machine learning algorithms to promote their early diagnosis and slow the progression of CKD. Methods Demographic information, physical examination data, and blood and morning urine samples were first collected from 13,550 subjects in 10 counties in Shanxi province for classification of GI and TI. LASSO regression was then employed for feature selection of explanatory variables, and the SMOTE (synthetic minority over-sampling technique) algorithm was used to balance the target datasets, i.e., GI and TI. Afterward, Random Forest (RF), Naive Bayes (NB), and logistic regression (LR) models were constructed to classify GI and TI, respectively. Results A total of 12,330 participants were enrolled in this study, with 20 explanatory variables. The numbers of patients with GI and TI were 1,587 (12.8%) and 1,456 (11.8%), respectively. After feature selection by LASSO, 14 and 15 explanatory variables remained in these two datasets. After SMOTE, the numbers of patients and normal subjects were 6,165 and 6,165 for GI, and 6,165 and 6,164 for TI, respectively. RF outperformed NB and LR in terms of accuracy (78.14%, 80.49%), sensitivity (82.00%, 84.60%), specificity (74.29%, 76.09%), and AUC (0.868, 0.885) for both GI and TI; the four variables contributing most to the classification were SBP, DBP, sex, and age for GI, and age, SBP, FPG, and GHb for TI. Conclusion RF performs well in classifying GI and TI, which allows for early auxiliary diagnosis of GI and TI, may help slow the progression of CKD, and shows promise for clinical practice.
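The SMOTE step mentioned in this abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function name `smote`, the neighbour count `k`, the seed, and the toy points are assumptions made for illustration. Each synthetic sample is placed at a random position on the line segment between a minority-class point and one of its k nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic new minority-class samples by interpolation.

    minority: list of feature vectors (lists of floats) from the rare class.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Indices of the other minority points, nearest to x first.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment from x to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Toy minority class: the four corners of the unit square.
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority_points, n_synthetic=6, k=2)
print(len(new_points))  # 6
```

Oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the data used for evaluation.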
Affiliation(s)
- Wenzhu Song
- School of Public Health, Shanxi Medical University, Taiyuan, China
- Xiaoshuang Zhou
- Department of Nephrology, Shanxi Provincial People's Hospital (Fifth Hospital) of Shanxi Medical University, Taiyuan, China
- Qi Duan
- Shanxi Provincial Key Laboratory of Kidney Disease, Taiyuan, China
- Qian Wang
- Shanxi Provincial Key Laboratory of Kidney Disease, Taiyuan, China
- Yaheng Li
- Shanxi Provincial Key Laboratory of Kidney Disease, Taiyuan, China
- Aizhong Li
- Shanxi Provincial Key Laboratory of Kidney Disease, Taiyuan, China
- Wenjing Zhou
- School of Medical Sciences, Shanxi University of Chinese Medicine, Jinzhong, China
- Lin Sun
- College of Traditional Chinese Medicine and Food Engineering, Shanxi University of Chinese Medicine, Jinzhong, China
- Lixia Qiu
- School of Public Health, Shanxi Medical University, Taiyuan, China
- Rongshan Li
- Department of Nephrology, Shanxi Provincial People's Hospital (Fifth Hospital) of Shanxi Medical University, Taiyuan, China; Shanxi Provincial Key Laboratory of Kidney Disease, Taiyuan, China
- Yafeng Li
- Department of Nephrology, Shanxi Provincial People's Hospital (Fifth Hospital) of Shanxi Medical University, Taiyuan, China; Shanxi Provincial Key Laboratory of Kidney Disease, Taiyuan, China; Core Laboratory, Shanxi Provincial People's Hospital (Fifth Hospital) of Shanxi Medical University, Taiyuan, China; Academy of Microbial Ecology, Shanxi Medical University, Taiyuan, China
12
Wang K, Tian J, Zheng C, Yang H, Ren J, Liu Y, Han Q, Zhang Y. Interpretable prediction of 3-year all-cause mortality in patients with heart failure caused by coronary heart disease based on machine learning and SHAP. Comput Biol Med 2021; 137:104813. [PMID: 34481185 DOI: 10.1016/j.compbiomed.2021.104813] [Citation(s) in RCA: 51] [Impact Index Per Article: 17.0] [Received: 06/17/2021] [Revised: 08/25/2021] [Accepted: 08/25/2021] [Indexed: 01/14/2023]
Abstract
BACKGROUND This study sought to evaluate the performance of machine learning (ML) models and establish an explainable ML model with good prediction of 3-year all-cause mortality in patients with heart failure (HF) caused by coronary heart disease (CHD). METHODS We established six ML models using follow-up data to predict 3-year all-cause mortality. Through comprehensive evaluation, the best-performing model was used to predict and stratify patients. The log-rank test was used to assess the difference between Kaplan-Meier curves. The association between ML risk and 3-year all-cause mortality was also assessed using multivariable Cox regression. Finally, an explainable approach based on ML and the SHapley Additive exPlanations (SHAP) method was deployed to calculate 3-year all-cause mortality risk and to generate individual explanations of the model's decisions. RESULTS The best-performing extreme gradient boosting (XGBoost) model was selected to predict and stratify patients. Subjects with a higher ML score had a higher hazard of events (hazard ratio [HR]: 10.351; P < 0.001), and this relationship persisted in a multivariable analysis (adjusted HR: 5.343; P < 0.001). Age, N-terminal pro-B-type natriuretic peptide, occupation, New York Heart Association classification, and nitrate drug use were important factors for both genders. CONCLUSIONS The ML-based risk stratification tool was able to accurately assess and stratify the risk of 3-year all-cause mortality in patients with HF caused by CHD. ML combined with SHAP could provide an explicit explanation of individualized risk prediction and give physicians an intuitive understanding of the influence of key features in the model.
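The Kaplan-Meier curves compared in this abstract are built from follow-up times and event indicators. As a minimal sketch of the estimator, and not of the authors' analysis, the pure-Python function below multiplies the survival probability by (1 - deaths / number at risk) at each distinct event time. Subjects censored at a given time are counted as still at risk at that time. The function name and the toy follow-up data are assumptions for illustration.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each subject.
    events: 1 if the event (death) was observed at that time, 0 if censored.
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects who share this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving  # events and censored subjects both leave the risk set
    return curve


# Toy follow-up data in years (illustration only).
curve = kaplan_meier(times=[1, 2, 2, 3, 4], events=[1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 3))
```

Computing one such curve per risk stratum gives the curves that a log-rank test would then compare.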
Affiliation(s)
- Ke Wang
- Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, People's Republic of China; Department of Epidemiology and Biostatistics, Xuzhou Medical University, Xuzhou, People's Republic of China; Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Shanxi Medical University, Taiyuan, People's Republic of China
- Jing Tian
- Department of Cardiology, The First Affiliated Hospital of Shanxi Medical University, Taiyuan, People's Republic of China
- Chu Zheng
- Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, People's Republic of China; Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Shanxi Medical University, Taiyuan, People's Republic of China
- Hong Yang
- Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, People's Republic of China; Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Shanxi Medical University, Taiyuan, People's Republic of China
- Jia Ren
- Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, People's Republic of China
- Yanling Liu
- Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, People's Republic of China; Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Shanxi Medical University, Taiyuan, People's Republic of China
- Qinghua Han
- Department of Cardiology, The First Affiliated Hospital of Shanxi Medical University, Taiyuan, People's Republic of China
- Yanbo Zhang
- Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, People's Republic of China; Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Shanxi Medical University, Taiyuan, People's Republic of China