1
Razavi-Termeh SV, Sadeghi-Niaraki A, Yao XA, Naqvi RA, Choi SM. Assessment of noise pollution-prone areas using an explainable geospatial artificial intelligence approach. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2024; 370:122361. [PMID: 39255573 DOI: 10.1016/j.jenvman.2024.122361] [Received: 05/31/2024] [Revised: 08/12/2024] [Accepted: 08/30/2024] [Indexed: 09/12/2024]
Abstract
This research aims to use the power of geospatial artificial intelligence (GeoAI), employing the categorical boosting (CatBoost) machine learning model in conjunction with two metaheuristic algorithms, the firefly algorithm (CatBoost-FA) and the fruit fly optimization algorithm (CatBoost-FOA), to spatially assess and map noise-pollution-prone areas in Tehran city, Iran. To spatially model areas susceptible to noise pollution, we established a comprehensive spatial database encompassing data for the annual average Leq (equivalent continuous sound level) from 2019 to 2022. This database was enriched with critical spatial criteria influencing noise pollution, including urban land use, traffic volume, population density, and normalized difference vegetation index (NDVI). Our study evaluated the predictive accuracy of these models using key performance metrics, including root mean square error (RMSE), mean absolute error (MAE), and receiver operating characteristic (ROC) indices. The results demonstrated the superior performance of the CatBoost-FA algorithm, with RMSE and MAE values of 0.159 and 0.114 for the training data and 0.437 and 0.371 for the test data, outperforming both the CatBoost-FOA and CatBoost models. ROC analysis further confirmed the efficacy of the models, with CatBoost-FA achieving an accuracy of 0.897, CatBoost-FOA an accuracy of 0.871, and CatBoost an accuracy of 0.846, highlighting their robust modeling capabilities. Additionally, we employed an explainable artificial intelligence (XAI) approach, utilizing the SHAP (Shapley Additive Explanations) method to interpret the underlying mechanisms of our models. The SHAP results revealed the significant influence of various factors on noise-pollution-prone areas, with airport, commercial, and administrative zones emerging as pivotal contributors.
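For readers unfamiliar with the two error metrics reported in this abstract, the following is a minimal sketch of how RMSE and MAE are defined. The function names and the sample values are illustrative assumptions; they are not the authors' code or data.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error: square root of the mean squared residual."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error: mean of the absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical observed and predicted values (not taken from the paper).
observed = [1.0, 0.0, 1.0, 0.0]
predicted = [0.5, 0.5, 1.0, 0.0]

print(rmse(observed, predicted), mae(observed, predicted))
```

Because squaring weights large residuals more heavily, RMSE is never smaller than MAE on the same data, which is consistent with the paired values quoted in the abstract.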
Affiliation(s)
- Seyed Vahid Razavi-Termeh
  - Dept. of Computer Science & Engineering and Convergence Engineering for Intelligent Drone, XR Research Center, Sejong University, Seoul, Republic of Korea.
- Abolghasem Sadeghi-Niaraki
  - Dept. of Computer Science & Engineering and Convergence Engineering for Intelligent Drone, XR Research Center, Sejong University, Seoul, Republic of Korea.
- X Angela Yao
  - Department of Geography, University of Georgia, Athens, GA, 30602, USA.
- Rizwan Ali Naqvi
  - Department of Intelligent Mechatronics Engineering, Sejong University, Seoul, Republic of Korea.
- Soo-Mi Choi
  - Dept. of Computer Science & Engineering and Convergence Engineering for Intelligent Drone, XR Research Center, Sejong University, Seoul, Republic of Korea.
2
Li R, Wang X, Luo L, Yuan Y. Identifying the most crucial factors associated with depression based on interpretable machine learning: a case study from CHARLS. Front Psychol 2024; 15:1392240. [PMID: 39118849 PMCID: PMC11306142 DOI: 10.3389/fpsyg.2024.1392240] [Received: 02/27/2024] [Accepted: 07/08/2024] [Indexed: 08/10/2024] Open
Abstract
Background Depression is one of the most common mental illnesses among middle-aged and older adults in China. It is of great importance to find the crucial factors that lead to depression and to effectively control and reduce the risk of depression. Currently, there are limited methods available to accurately predict the risk of depression and identify the crucial factors that influence it. Methods We collected data from 25,586 samples from the harmonized China Health and Retirement Longitudinal Study (CHARLS), and the latest records from 2018 were included in the current cross-sectional analysis. Ninety-three input variables in the survey were considered as potential influential features. Five machine learning (ML) models were utilized: CatBoost, eXtreme Gradient Boosting (XGBoost), Gradient Boosting Decision Tree (GBDT), Random Forest (RF), and Light Gradient Boosting Machine (LightGBM). The models were compared to the traditional multivariable Linear Regression (LR) model. Simultaneously, SHapley Additive exPlanations (SHAP) were used to identify key influencing factors at the global level and explain individual heterogeneity through instance-level analysis. To explore how different factors are non-linearly associated with the risk of depression, we employed the Accumulated Local Effects (ALE) approach to analyze the identified critical variables while controlling other covariates. Results CatBoost outperformed other machine learning models in terms of the MAE, MSE, MedAE, and R² metrics. The top three crucial factors identified by SHAP were r4satlife, r4slfmem, and r4shlta, representing life satisfaction, self-reported memory, and health status levels, respectively. Conclusion This study demonstrates that the CatBoost model is an appropriate choice for predicting depression among middle-aged and older adults in Harmonized CHARLS.
The SHAP and ALE interpretable methods have identified crucial factors and the nonlinear relationship with depression, which require the attention of domain experts.
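The SHAP method mentioned in this abstract builds on Shapley values, which credit each feature with its average marginal contribution over all orders in which features can be added. The sketch below computes exact Shapley values for a toy scoring function. The feature names and the scoring function are hypothetical assumptions chosen for illustration; this is not the authors' model, and the SHAP library itself uses faster approximations for real models.

```python
from itertools import permutations

def shapley_values(features, value):
    """Exact Shapley values: average each feature's marginal contribution
    over every ordering in which features join the coalition."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        coalition = frozenset()
        for f in order:
            with_f = coalition | {f}
            totals[f] += value(with_f) - value(coalition)
            coalition = with_f
    return {f: totals[f] / len(orderings) for f in features}

def toy_value(coalition):
    """Hypothetical model output when only the features in `coalition` are known."""
    score = 0.0
    if "life_satisfaction" in coalition:
        score += 3.0
    if "memory" in coalition:
        score += 1.0
    if "life_satisfaction" in coalition and "health" in coalition:
        score += 2.0  # interaction term, split equally between the two features
    return score

phi = shapley_values(["life_satisfaction", "memory", "health"], toy_value)
print(phi)
```

The attributions sum to the difference between the full-coalition output and the empty-coalition output, which is the additivity property that makes these values usable for ranking features.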
Affiliation(s)
- Rulin Li
  - School of Management, North Sichuan Medical College, Nanchong, China
- Xueyan Wang
  - Information Centre, Affiliated Hospital of North Sichuan Medical College, Nanchong, China
- Lanjun Luo
  - School of Management, North Sichuan Medical College, Nanchong, China
- Youwei Yuan
  - School of Management, Huazhong University of Science and Technology, Wuhan, China
3
Li Q, Lv H, Chen Y, Shen J, Shi J, Zhou C. Development and validation of a machine learning predictive model for perioperative myocardial injury in cardiac surgery with cardiopulmonary bypass. J Cardiothorac Surg 2024; 19:384. [PMID: 38926872 PMCID: PMC11201784 DOI: 10.1186/s13019-024-02856-y] [Received: 01/02/2024] [Accepted: 06/14/2024] [Indexed: 06/28/2024] Open
Abstract
BACKGROUND Perioperative myocardial injury (PMI) with different cut-off values has been shown to be associated with different prognostic effects after cardiac surgery. Machine learning (ML) methods have been widely used in perioperative risk prediction during cardiac surgery. However, the utilization of ML in PMI has not been studied yet. Therefore, we sought to develop and validate the performance of ML models for PMI with different cut-off values in cardiac surgery with cardiopulmonary bypass (CPB). METHODS This was a secondary analysis of a multicenter clinical trial (OPTIMAL), and the requirement for written informed consent was waived due to the retrospective design. Patients aged 18-70 undergoing elective cardiac surgery with CPB from December 2018 to April 2021 were enrolled in China. The models were developed using the data from Fuwai Hospital and externally validated by the other three cardiac centres. Traditional logistic regression (LR) and eleven ML models were constructed. The primary outcome was PMI, defined as the postoperative maximum cardiac troponin I exceeding different multiples of the upper reference limit (URL; 40x, 70x, 100x, 130x). We measured model performance by examining the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), and the Brier score for calibration. RESULTS A total of 2983 eligible patients were included in model development (n = 2420) and external validation (n = 563). The CatboostClassifier and RandomForestClassifier emerged as potential alternatives to the LR model for predicting PMI. The AUROC demonstrated an increase with each of the four cutoffs, peaking at 100x URL in the testing dataset and at 70x URL in the external validation dataset. However, it's worth noting that the AUPRC decreased with each cutoff increment. Additionally, the Brier loss score decreased as the cutoffs increased, reaching its lowest point at 0.16 with a 130x URL cutoff.
Moreover, extended CPB time, aortic duration, elevated preoperative N-terminal brain sodium peptide, reduced preoperative neutrophil count, higher body mass index, and increased high-sensitivity C-reactive protein levels were identified as risk factors for PMI across all four cutoff values. CONCLUSIONS The CatboostClassifier and RandomForestClassifier algorithms could be alternatives to LR in the prediction of PMI. Furthermore, higher preoperative N-terminal brain sodium peptide and lower high-sensitivity C-reactive protein were strong risk factors for PMI; the underlying mechanisms require further investigation.
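The outcome definition and the calibration metric in this abstract can be sketched as follows. This is a minimal illustration only: the URL value, the troponin values, and the predicted probabilities are hypothetical assumptions, and the helper names are not from the paper.

```python
def label_pmi(peak_ctni, url, multiple):
    """Return 1 if peak postoperative cTnI exceeds `multiple` x URL, else 0."""
    return 1 if peak_ctni > multiple * url else 0

def brier_score(labels, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)

URL = 0.02                     # hypothetical assay upper reference limit, ng/mL
peaks = [0.5, 1.5, 2.2, 3.0]   # hypothetical peak cTnI values, ng/mL

# One label vector per cut-off multiple named in the abstract.
labels_by_cutoff = {m: [label_pmi(x, URL, m) for x in peaks] for m in (40, 70, 100, 130)}

# Hypothetical predicted probabilities scored against the 100x URL labels.
score = brier_score(labels_by_cutoff[100], [0.1, 0.4, 0.6, 0.9])
print(labels_by_cutoff, score)
```

In this toy data the number of positive cases shrinks as the cut-off multiple rises, which is the kind of shift in class balance that any comparison of metrics across cut-offs has to account for.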
Affiliation(s)
- Qian Li
  - Department of Anesthesiology, State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Hong Lv
  - Department of Anesthesiology, State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Yuye Chen
  - Department of Anesthesiology, State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Jingjia Shen
  - Department of Anesthesiology, State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Jia Shi
  - Department of Anesthesiology, State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Chenghui Zhou
  - Department of Anesthesiology, State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China.
  - Center for Anesthesiology, Beijing Anzhen Hospital, Capital Medical University, No. 2 Anzhen Rd., Chaoyang District, Beijing, 10029, China.
4
Li Q, Lv H, Chen Y, Shen J, Shi J, Zhou C. Hybrid feature selection in a machine learning predictive model for perioperative myocardial injury in noncoronary cardiac surgery with cardiopulmonary bypass. Perfusion 2024:2676591241253459. [PMID: 38733257 DOI: 10.1177/02676591241253459] [Indexed: 05/13/2024]
Abstract
BACKGROUND Perioperative myocardial injury (PMI) is associated with increased morbidity and mortality after noncoronary cardiac surgery. However, few studies have developed a predictive model for PMI. Therefore, we used hybrid feature selection (FS) methods to establish a predictive model for PMI in noncoronary cardiac surgery with cardiopulmonary bypass (CPB). METHODS This was a single-center retrospective study conducted at the Fuwai Hospital in China. Patients aged 18-70 years who underwent elective noncoronary surgery with CPB at our institution from December 2018 to April 2021 were enrolled. The primary outcome was PMI, defined as postoperative cardiac troponin I (cTnI) levels exceeding 220 times the upper reference limit (URL). Statistical analyses were conducted in Python (Python Software Foundation, version 3.9.7, with the Jupyter Notebook 1.1.0 integrated development environment) and SPSS software version 26.0 (IBM Corp., Armonk, New York, USA). RESULTS A total of 1130 patients were eventually eligible for this study. The incidence of PMI was 20.3% (229/1130) in the overall patients, 20.6% (163/791) in the training dataset, and 19.5% (66/339) in the testing dataset. The logistic regression model achieved its best AUC of 0.6893 (95% CI: 0.6371-0.7382) with the traditional selection method, the random forest model achieved its best AUC of 0.6937 (95% CI: 0.6416-0.7423) with the union of the Wrapper and Embedded methods, the CatBoost model achieved its best AUC of 0.6828 (95% CI: 0.6304-0.7320) with the union of the Embedded and forward logistic regression techniques, and the Naïve Bayes model achieved its best AUC of 0.7254 (95% CI: 0.6746-0.7723) with the forward logistic regression method. Moreover, the decision tree, KNeighborsClassifier, and support vector machine models had the worst AUCs under all selection forms.
Furthermore, the SHapley Additive exPlanations plot showed that prolonged CPB, aortic clamp time, and low preoperative platelet count were strongly related to PMI risk. CONCLUSIONS In total, four categories of feature selection methods were utilized, comprising five individual selection techniques and 15 combined methods. Notably, the combination of logistic regression and embedded methods demonstrated outstanding performance in predicting PMI risk. We also concluded that machine learning models, including random forest, CatBoost, and Naive Bayes, were suitable candidates for establishing a PMI predictive model. Nevertheless, additional investigation and validation are imperative for substantiating these findings.
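The incidence figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in the abstract; the helper name and the dictionary layout are illustrative assumptions.

```python
def incidence(events, total):
    """Incidence as a percentage rounded to one decimal place."""
    return round(100 * events / total, 1)

# (events, total) pairs as reported in the abstract.
cohorts = {"overall": (229, 1130), "training": (163, 791), "testing": (66, 339)}
rates = {name: incidence(e, n) for name, (e, n) in cohorts.items()}

# The training and testing subsets should add up to the overall cohort.
events_match = cohorts["training"][0] + cohorts["testing"][0] == cohorts["overall"][0]
totals_match = cohorts["training"][1] + cohorts["testing"][1] == cohorts["overall"][1]
print(rates, events_match, totals_match)
```

The reported split corresponds to roughly 70% training and 30% testing data (791 and 339 of 1130 patients).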
Affiliation(s)
- Qian Li
  - Department of Anesthesiology, State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Hong Lv
  - Department of Anesthesiology, State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Yuye Chen
  - Department of Anesthesiology, State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Jingjia Shen
  - Department of Anesthesiology, State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Jia Shi
  - Department of Anesthesiology, State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Chenghui Zhou
  - Department of Anesthesiology, State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
  - Center for Anesthesiology, Beijing Anzhen Hospital, Capital Medical University, Beijing, China
5
Lin W, Shi S, Lan H, Wang N, Huang H, Wen J, Chen G. Identification of influence factors in overweight population through an interpretable risk model based on machine learning: a large retrospective cohort. Endocrine 2024; 83:604-614. [PMID: 37776483 DOI: 10.1007/s12020-023-03536-y] [Received: 04/06/2023] [Accepted: 09/12/2023] [Indexed: 10/02/2023]
Abstract
BACKGROUND The identification of associated overweight risk factors is crucial to future health risk predictions and behavioral interventions. Several consensus problems remain in machine learning, such as cross-validation, and the resulting model may suffer from overfitting or poor interpretability. METHODS This study employed nine commonly used machine learning methods to construct overweight risk models. The target population was the general community, and a total of 10,905 Chinese subjects from Ningde City in Fujian province, southeast China, participated. The best model was selected through appropriate verification and validation and was suitably explained. RESULTS The overweight risk models employing machine learning exhibited good performance. It was concluded that CatBoost, which is used in the construction of clinical risk models, may surpass previous machine learning methods. The visual display of the Shapley additive explanation values for the model variables accurately represented the influence of each variable in the model. CONCLUSIONS The construction of an overweight risk model using machine learning may currently be the best approach. Moreover, CatBoost may be the best machine learning method. Furthermore, combining Shapley additive explanation and machine learning methods can be effective in identifying disease risk factors for prevention and control.
Affiliation(s)
- Wei Lin
  - Department of Endocrinology, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital, Fuzhou, 350001, PR China.
- Songchang Shi
  - Department of Critical Care Medicine, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital South Branch, Fujian Provincial Hospital Jinshan Branch, Fujian Provincial Hospital, Fuzhou, 350001, PR China
- Huiyu Lan
  - Department of Endocrinology, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital, Fuzhou, 350001, PR China
- Nengying Wang
  - Department of Endocrinology, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital, Fuzhou, 350001, PR China
- Huibin Huang
  - Department of Endocrinology, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital, Fuzhou, 350001, PR China
- Junping Wen
  - Department of Endocrinology, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital, Fuzhou, 350001, PR China
- Gang Chen
  - Department of Endocrinology, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital, Fuzhou, 350001, PR China.
6
Yang J, Wan J, Feng L, Hou S, Yv K, Xu L, Chen K. Machine learning algorithms for the prediction of adverse prognosis in patients undergoing peritoneal dialysis. BMC Med Inform Decis Mak 2024; 24:8. [PMID: 38166909 PMCID: PMC10763100 DOI: 10.1186/s12911-023-02412-z] [Received: 09/07/2023] [Accepted: 12/19/2023] [Indexed: 01/05/2024] Open
Abstract
BACKGROUND An appropriate prediction model for adverse prognosis before peritoneal dialysis (PD) is lacking. Thus, we retrospectively analysed patients who underwent PD to construct a predictive model for adverse prognoses using machine learning (ML). METHODS A retrospective analysis was conducted on 873 patients who underwent PD from August 2007 to December 2020. A total of 824 patients who met the inclusion criteria were included in the analysis. Five commonly used ML algorithms were used for the initial model training. By using the area under the curve (AUC) and accuracy (ACC), we ranked the indicators with the highest impact and displayed them using the values of Shapley additive explanation (SHAP) version 0.41.0. The top 20 indicators were selected to build a compact model that is conducive to clinical application. All model-building steps were implemented in Python 3.8.3. RESULTS At the end of follow-up, 353 patients withdrew from PD (converted to haemodialysis or died), and 471 patients continued receiving PD. In the complete model, the categorical boosting classifier (CatBoost) model exhibited the strongest performance (AUC = 0.80, 95% confidence interval [CI] = 0.76-0.83; ACC: 0.78, 95% CI = 0.72-0.83) and was selected for subsequent analysis. We reconstructed a compression model by extracting 20 key features ranked by the SHAP values, and the CatBoost model still showed the strongest performance (AUC = 0.79, ACC = 0.74). CONCLUSIONS The CatBoost model, which was built using the intelligent analysis technology of ML, demonstrated the best predictive performance. Therefore, our developed prediction model has potential value in patient screening before PD and hierarchical management after PD.
Affiliation(s)
- Jie Yang
  - Department of Nephrology, Daping Hospital, Army Medical University, Chongqing, 400042, China
- Jingfang Wan
  - Department of Nephrology, Daping Hospital, Army Medical University, Chongqing, 400042, China
- Lei Feng
  - Department of Nephrology, Daping Hospital, Army Medical University, Chongqing, 400042, China
  - Teaching Office, Medical Research Department, Army Special Medical Center, Chongqing, China
- Shihui Hou
  - Department of Nephrology, Daping Hospital, Army Medical University, Chongqing, 400042, China
- Kaizhen Yv
  - Department of Nephrology, Daping Hospital, Army Medical University, Chongqing, 400042, China
- Liang Xu
  - Department of Medical Engineering, The Second Affiliated Hospital of the Army Medical University, Chongqing, 400037, China.
- Kehong Chen
  - Department of Nephrology, Daping Hospital, Army Medical University, Chongqing, 400042, China.
  - State Key Laboratory of Trauma, Burns and Combined Injury, Wound Trauma Medical Center, Army Medical University, Chongqing, China.
7
Zhang K, Xu X, You H. Social causation, social selection, and economic selection in the health outcomes of Chinese older adults and their gender disparities. SSM Popul Health 2023; 24:101508. [PMID: 37720820 PMCID: PMC10500472 DOI: 10.1016/j.ssmph.2023.101508] [Received: 08/02/2023] [Revised: 08/26/2023] [Accepted: 09/02/2023] [Indexed: 09/19/2023] Open
Abstract
Background The economic selection hypothesis, which argues that the initial economic situation determines both subsequent health and economic conditions, has been drawn into the debate on causation-selection issues. This study aims to construct a path model with self-rated health and depression score of older adults as health outcomes to measure and compare the social causation forces of wealth accumulation, social selection forces of adulthood health, and economic selection forces of childhood economics, and to examine their gender disparities. Methods Data was obtained from a sample of 19613 older adults aged 45 years or above from the 2014 life history survey and the 2015 routine follow-up survey of the China Health and Retirement Longitudinal Study. Structural equation modeling analysis was conducted employing the full information maximum likelihood estimation method. Results The presence of social causation, social selection, and economic selection were all statistically supported. In self-rated health, social selection forces held the dominant position, while social causation forces were comparable to economic selection forces. In depression score, social selection still exhibited stronger forces than economic selection, but social causation had forces close to social selection and greater than economic selection. The forces of the three hypotheses in self-rated health did not significantly change with gender, but social causation exerted mightier forces than economic selection within the male group, unlike the female group. The forces of economic selection in depression score were greater in females than males and no significant differences were observed among the forces of the three hypotheses in the female group. Conclusions Social causation, social selection, and economic selection operate simultaneously on the self-rated health and depression score of older adults. 
However, the force magnitudes of the three hypotheses and/or their rankings differ by health outcomes and gender.
Affiliation(s)
- Kangkang Zhang
  - School of Health Policy & Management, Nanjing Medical University, Nanjing, China
- Xinpeng Xu
  - School of Public Health, Nanjing Medical University, Nanjing, China
  - Institute of Healthy Jiangsu Development, Nanjing Medical University, Nanjing, China
- Hua You
  - School of Health Policy & Management, Nanjing Medical University, Nanjing, China
  - School of Public Health, Nanjing Medical University, Nanjing, China
  - Institute of Healthy Jiangsu Development, Nanjing Medical University, Nanjing, China
8
Zhang Y, Wang H, Yin C, Shu T, Yu J, Jian J, Jian C, Duan M, Kadier K, Xu Q, Wang X, Xiang T, Liu X. Development of a prediction model for the risk of 30-day unplanned readmission in older patients with heart failure: A multicenter retrospective study. Nutr Metab Cardiovasc Dis 2023; 33:1878-1887. [PMID: 37500347 DOI: 10.1016/j.numecd.2023.05.034] [Received: 03/27/2023] [Revised: 05/21/2023] [Accepted: 05/31/2023] [Indexed: 07/29/2023]
Abstract
BACKGROUND AND AIM Heart failure (HF) imposes significant global health costs due to its high incidence, readmission, and mortality rates. Accurate assessment of readmission risk and precise interventions have become important measures to improve health for patients with HF. Therefore, this study aimed to develop a machine learning (ML) model to predict 30-day unplanned readmissions in older patients with HF. METHODS AND RESULTS This study collected data on hospitalized older patients with HF from the medical data platform of Chongqing Medical University from January 1, 2012, to December 31, 2021. A total of 5 candidate algorithms with excellent performance were selected from 15 ML algorithms, with performance evaluated by area under the receiver operating characteristic curve (AUC) and accuracy. Then, the 5 candidate algorithms were tuned by grid search with 5-fold cross-validation, and performance was evaluated by AUC, accuracy, sensitivity, specificity, and recall. Finally, an optimal ML model was constructed, and the predictive results were explained using the SHapley Additive exPlanations (SHAP) framework. A total of 14,843 older patients with HF were consecutively enrolled. The CatBoost model was selected as the best prediction model, with an AUC of 0.732, accuracy of 0.712, sensitivity of 0.619, and specificity of 0.722. NT-proBNP, length of stay (LOS), triglycerides, blood phosphorus, blood potassium, and lactate dehydrogenase had the greatest effect on 30-day unplanned readmission in older patients with HF, according to SHAP results. CONCLUSIONS The study developed a CatBoost model to predict the risk of unplanned 30-day special-cause readmission in older patients with HF, which performed better than the traditional logistic regression model.
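The threshold-based metrics listed in this abstract all derive from the four cells of a confusion matrix. The sketch below shows the standard definitions; the counts are hypothetical assumptions and do not come from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity (recall), and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # also called recall
        "specificity": tn / (tn + fp),
    }

# Hypothetical counts: 100 readmitted patients, 300 not readmitted.
m = classification_metrics(tp=60, fp=30, tn=270, fn=40)
print(m)
```

Sensitivity and recall are two names for the same quantity, so a report that lists both is giving one number twice.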
Affiliation(s)
- Yang Zhang
  - College of Medical Informatics, Chongqing Medical University, Chongqing, China; Medical Data Science Academy, Chongqing Medical University, Chongqing, China
- Haolin Wang
  - College of Medical Informatics, Chongqing Medical University, Chongqing, China
- Chengliang Yin
  - Faculty of Medicine, Macau University of Science and Technology, 999078, Macau, China
- Tingting Shu
  - Army Medical University (Third Military Medical University), Chongqing, China
- Jie Yu
  - Department of Medical Imaging, The Affiliated Taian City Central Hospital of Qingdao University, Taian 271000, China
- Jie Jian
  - College of Medical Informatics, Chongqing Medical University, Chongqing, China; Medical Data Science Academy, Chongqing Medical University, Chongqing, China
- Chang Jian
  - College of Medical Informatics, Chongqing Medical University, Chongqing, China; Medical Data Science Academy, Chongqing Medical University, Chongqing, China
- Minjie Duan
  - College of Medical Informatics, Chongqing Medical University, Chongqing, China; Medical Data Science Academy, Chongqing Medical University, Chongqing, China
- Kaisaierjiang Kadier
  - Department of Cardiology, First Affiliated Hospital of Xinjiang Medical University, Ürümqi, China
- Qian Xu
  - Collection Development Department of Library, Chongqing Medical University, Chongqing, China
- Xueer Wang
  - College of Oncology, Guangxi Medical University, Nanning 530022, China
- Tianyu Xiang
  - Information Center, The University-Town Hospital of Chongqing Medical University, Chongqing, China.
- Xiaozhu Liu
  - College of Medical Informatics, Chongqing Medical University, Chongqing, China; Medical Data Science Academy, Chongqing Medical University, Chongqing, China.
9
Qu Z, Wang Y, Guo D, He G, Sui C, Duan Y, Zhang X, Lan L, Meng H, Wang Y, Liu X. Identifying depression in the United States veterans using deep learning algorithms, NHANES 2005-2018. BMC Psychiatry 2023; 23:620. [PMID: 37612646 PMCID: PMC10463693 DOI: 10.1186/s12888-023-05109-9] [Received: 12/25/2022] [Accepted: 08/13/2023] [Indexed: 08/25/2023] Open
Abstract
BACKGROUND Depression is a common mental health problem among veterans, with high mortality. Despite numerous investigations, the prediction and identification of risk factors for depression are still severely limited. This study used a deep learning algorithm to identify depression in veterans and the factors associated with its clinical manifestations. METHODS Our data originated from the National Health and Nutrition Examination Survey (2005-2018). A dataset of 2,546 veterans was analyzed using deep learning and five traditional machine learning algorithms with 10-fold cross-validation. Model performance was assessed by examining the area under the receiver operating characteristic curve (AUC), accuracy, recall, specificity, precision, and F1 score. RESULTS Deep learning had the highest AUC (0.891, 95% CI 0.869-0.914) and specificity (0.906) in identifying depression in veterans. Further study on depression among veterans of different ages showed that the AUC values for deep learning were 0.929 (95% CI 0.904-0.955) in the middle-aged group and 0.924 (95% CI 0.900-0.948) in the older age group. In addition to general health conditions, sleep difficulties, memory impairment, work incapacity, income, BMI, and chronic diseases, factors such as vitamins E and C and palmitic acid were also identified as important influencing factors. CONCLUSIONS Compared with traditional machine learning methods, deep learning algorithms achieved optimal performance, making them well suited to identifying depression and its risk factors among veterans.
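The AUC reported in this abstract can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it by direct pair counting on hypothetical labels and scores. It is an illustration of the definition under those assumptions, not the authors' evaluation code, and libraries such as scikit-learn use faster equivalent methods.

```python
def auc(labels, scores):
    """AUC as the share of positive-negative pairs in which the positive
    case scores higher, counting ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical depression labels (1 = depressed) and model scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.4]

result = auc(labels, scores)
print(result)
```

Here 7.5 of the 9 positive-negative pairs are ranked correctly once the tie is counted as half, so the AUC is about 0.83.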
Affiliation(s)
- Zihan Qu
  - Department of Epidemiology and Statistics, School of Public Health, Jilin University, Changchun, 130021, China
- Yashan Wang
  - Department of Epidemiology and Statistics, School of Public Health, Jilin University, Changchun, 130021, China
- Dingjie Guo
  - Department of Epidemiology and Statistics, School of Public Health, Jilin University, Changchun, 130021, China
- Guangliang He
  - Department of Epidemiology and Statistics, School of Public Health, Jilin University, Changchun, 130021, China
- Chuanying Sui
  - Department of Epidemiology and Statistics, School of Public Health, Jilin University, Changchun, 130021, China
- Yuqing Duan
  - Department of Epidemiology and Statistics, School of Public Health, Jilin University, Changchun, 130021, China
- Xin Zhang
  - Department of Epidemiology and Statistics, School of Public Health, Jilin University, Changchun, 130021, China
- Linwei Lan
  - Department of Epidemiology and Statistics, School of Public Health, Jilin University, Changchun, 130021, China
- Hengyu Meng
  - Department of Epidemiology and Statistics, School of Public Health, Jilin University, Changchun, 130021, China
- Yajing Wang
  - School of Computer Science, McGill University, Montreal, H3A 0G4, Canada
- Xin Liu
  - Department of Epidemiology and Statistics, School of Public Health, Jilin University, Changchun, 130021, China.
10
Nhu NT, Kang JH, Yeh TS, Wu CC, Tsai CY, Piravej K, Lam C. Prediction of posttraumatic functional recovery in middle-aged and older patients through dynamic ensemble selection modeling. Front Public Health 2023; 11:1164820. [PMID: 37408743 PMCID: PMC10319009 DOI: 10.3389/fpubh.2023.1164820] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Accepted: 05/17/2023] [Indexed: 07/07/2023]
Abstract
Introduction: Age-specific risk factors may delay posttraumatic functional recovery; complex interactions exist between these factors. In this study, we investigated the ability of machine learning models to predict posttraumatic (6 months) functional recovery in middle-aged and older patients on the basis of their preexisting health conditions. Methods: Data obtained from injured patients aged ≥45 years were divided into training-validation (n = 368) and test (n = 159) data sets. The input features were the sociodemographic characteristics and baseline health conditions of the patients. The output feature was functional status 6 months after injury; this was assessed using the Barthel Index (BI). On the basis of their BI scores, the patients were categorized into functionally independent (BI >60) and functionally dependent (BI ≤60) groups. The permutation feature importance method was used for feature selection. Six algorithms were validated through cross-validation with hyperparameter optimization. The algorithms exhibiting satisfactory performance were subjected to bagging to construct stacking, voting, and dynamic ensemble selection models. The best model was evaluated on the test data set. Partial dependence (PD) and individual conditional expectation (ICE) plots were created. Results: In total, nineteen of twenty-seven features were selected. Logistic regression, linear discriminant analysis, and Gaussian Naive Bayes algorithms exhibited satisfactory performance and were, therefore, used to construct ensemble models. The k-Nearest Oracle Elimination model outperformed the other models when evaluated on the training-validation data set (sensitivity: 0.732, 95% CI: 0.702-0.761; specificity: 0.813, 95% CI: 0.805-0.822); it exhibited comparable performance on the test data set (sensitivity: 0.779, 95% CI: 0.559-0.950; specificity: 0.859, 95% CI: 0.799-0.912). The PD and ICE plots showed patterns consistent with practical tendencies.
Conclusion: Preexisting health conditions can predict long-term functional outcomes in injured middle-aged and older patients, thus predicting prognosis and facilitating clinical decision-making.
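The outcome definition in this abstract is a simple threshold on the Barthel Index. A minimal Python sketch of that dichotomization follows; the function name and the group labels are illustrative assumptions, and only the cut-off (BI >60 versus BI ≤60) comes from the abstract.

```python
def functional_group(bi_score: float) -> str:
    """Map a Barthel Index score to the two outcome groups named in the abstract.

    Only the cut-off at 60 is taken from the abstract; the 0-100 range check
    reflects the usual Barthel Index scale and is an assumption here.
    """
    if not 0 <= bi_score <= 100:
        raise ValueError("Barthel Index scores are expected to lie in [0, 100]")
    return "independent" if bi_score > 60 else "dependent"


# Hypothetical scores, for illustration only.
groups = [functional_group(score) for score in (35, 60, 61, 95)]
print(groups)
```

Note that a score of exactly 60 falls in the dependent group, because the abstract defines independence as strictly greater than 60.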
Affiliation(s)
- Nguyen Thanh Nhu
  - International Ph.D. Program in Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
  - Faculty of Medicine, Can Tho University of Medicine and Pharmacy, Can Tho, Vietnam
- Jiunn-Horng Kang
  - International Ph.D. Program in Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
  - Department of Physical Medicine and Rehabilitation, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
  - Department of Physical Medicine and Rehabilitation, Taipei Medical University Hospital, Taipei, Taiwan
  - Graduate Institute of Nanomedicine and Medical Engineering, College of Biomedical Engineering, Taipei Medical University, Taipei, Taiwan
  - Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
- Tian-Shin Yeh
  - Department of Physical Medicine and Rehabilitation, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
  - Department of Physical Medicine and Rehabilitation, Wan Fang Hospital, Taipei Medical University, Taipei, Taiwan
  - Department of Epidemiology and Nutrition, Harvard T. H. Chan School of Public Health, Harvard University, Boston, MA, United States
  - Nuffield Department of Population Health, University of Oxford, Oxford, United Kingdom
- Chia-Chieh Wu
  - Emergency Department, Wan Fang Hospital, Taipei Medical University, Taipei, Taiwan
  - Department of Emergency, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
- Cheng-Yu Tsai
  - Centre for Transport Studies, Department of Civil and Environmental Engineering, Imperial College London, London, United Kingdom
- Krisna Piravej
  - Department of Rehabilitation Medicine, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
  - Department of Chula Neuroscience Center, King Chulalongkorn Memorial Hospital, Bangkok, Thailand
- Carlos Lam
  - Emergency Department, Wan Fang Hospital, Taipei Medical University, Taipei, Taiwan
  - Department of Emergency, School of Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
|
11
|
Tsai HJ, Yang WC, Tsai SJ, Lin CH, Yang AC. Right-side frontal-central cortical hyperactivation before the treatment predicts outcomes of antidepressant and electroconvulsive therapy responsivity in major depressive disorder. J Psychiatr Res 2023; 161:377-385. [PMID: 37012197 DOI: 10.1016/j.jpsychires.2023.03.023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2023] [Revised: 03/08/2023] [Accepted: 03/13/2023] [Indexed: 04/05/2023]
Abstract
Major depressive disorder places a great burden on healthcare resources worldwide. Antidepressants are the first-line treatment for major depressive disorder, but if patients don't respond adequately, brain stimulation therapy may be needed as second-line treatment. Digital phenotyping in patients with major depressive disorder will aid in the timely prediction of treatment effectiveness. This study explored electroencephalographic (EEG) signatures that diversify depression treatment responsivity, including antidepressant administration or brain stimulation therapy. Resting-state, pre-treatment EEG sequences from depressive patients who received fluoxetine treatment (n = 55; 26 remitters and 29 poor responders) or electroconvulsive therapy (ECT, n = 58; 36 remitters and 22 nonremitters) were recorded on 19 channels. Twenty-nine EEG segments were obtained from each patient per recording electrode. Power spectral analysis was conducted for feature extraction and showed the highest predictive accuracy for fluoxetine or ECT outcomes. Both occurred with beta-band oscillations within right-side frontal-central (F1-score = 0.9437) or prefrontal areas of the brain (F1-score = 0.9416), respectively. Significantly higher beta-band power was observed among patients who lacked adequate treatment response than the remitters, specifically at 19.2 Hz or 24.5 Hz for fluoxetine administration or ECT outcome, respectively. Our findings indicated that pre-treatment, right-side cortical hyperactivation is associated with poor outcomes of antidepressant-based or ECT-based treatment in major depression. Whether depression treatment response rates can be improved by reducing the high-frequency EEG power in corresponding areas of the brain to provide a protective effect against depression recurrence warrants further study.
Affiliation(s)
- Hsin-Jung Tsai
  - Institute of Brain Science, National Yang Ming Chiao Tung University, Taipei, Taiwan; Digital Medicine and Smart Healthcare Research Center, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Wei-Cheng Yang
  - Institute of Brain Science, National Yang Ming Chiao Tung University, Taipei, Taiwan; Department of Psychiatry, Tainan Hospital, Ministry of Health and Welfare, Tainan, Taiwan
- Shih-Jen Tsai
  - Institute of Brain Science, National Yang Ming Chiao Tung University, Taipei, Taiwan; Department of Psychiatry, Taipei Veterans General Hospital, Taipei, Taiwan
- Ching-Hua Lin
  - Kaohsiung Municipal Kai-Syuan Psychiatric Hospital, Kaohsiung, Taiwan
- Albert C Yang
  - Institute of Brain Science, National Yang Ming Chiao Tung University, Taipei, Taiwan; Digital Medicine and Smart Healthcare Research Center, National Yang Ming Chiao Tung University, Taipei, Taiwan; Department of Medical Research, Taipei Veterans General Hospital, Taipei, Taiwan
|
12
|
Chang W, Wang X, Yang J, Qin T. An Improved CatBoost-Based Classification Model for Ecological Suitability of Blueberries. SENSORS (BASEL, SWITZERLAND) 2023; 23:1811. [PMID: 36850409 PMCID: PMC9961688 DOI: 10.3390/s23041811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Revised: 01/30/2023] [Accepted: 01/31/2023] [Indexed: 06/18/2023]
Abstract
Selecting the best planting area for blueberries is an essential issue in agriculture. To improve the effectiveness of blueberry cultivation, this paper proposes, for the first time, a machine learning-based classification model for blueberry ecological suitability and validates it using multi-source environmental feature data. The sparrow search algorithm (SSA) was adopted to optimize the CatBoost model and classify the ecological suitability of blueberries based on the selected data features. First, the Borderline-SMOTE algorithm was used to balance the number of positive and negative samples. The Variance Inflation Factor and information gain methods were applied to screen the factors affecting the growth of blueberries. Subsequently, the processed data were fed into CatBoost for training, and the CatBoost parameters were optimized using SSA to obtain the optimal model. Finally, the SSA-CatBoost model was used to classify the ecological suitability of blueberries and output the suitability types. In a case study of a blueberry plantation in Majiang County, Guizhou Province, China, the AUC of the SSA-CatBoost-based blueberry ecological suitability model was 0.921, which is 2.68% higher than that of CatBoost (AUC = 0.897) and significantly higher than those of Logistic Regression (AUC = 0.855), Support Vector Machine (AUC = 0.864), and Random Forest (AUC = 0.875). Furthermore, the ecological suitability of blueberries in Majiang County was mapped according to the classification results of the different models. Compared against the actual blueberry cultivation situation in Majiang County, the classification results of the proposed SSA-CatBoost model match it best, which gives the model high reference value for the selection of blueberry cultivation sites.
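The 2.68% figure in this abstract is a relative gain in AUC, not an absolute difference. A minimal Python check, using only the two AUC values stated in the abstract (the function and variable names are illustrative assumptions):

```python
def relative_gain_percent(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, expressed in percent."""
    return (new - baseline) / baseline * 100.0


ssa_catboost_auc = 0.921  # AUC reported for SSA-CatBoost
catboost_auc = 0.897      # AUC reported for plain CatBoost

gain = relative_gain_percent(ssa_catboost_auc, catboost_auc)
print(f"{gain:.2f}%")
```

The absolute difference between the two AUC values is 0.024; dividing by the baseline of 0.897 is what yields the quoted percentage.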
Affiliation(s)
- Wenfeng Chang
  - Department of Electrical Engineering, Guizhou University, Guiyang 550025, China
- Xiao Wang
  - Department of Electrical Engineering, Guizhou University, Guiyang 550025, China
- Jing Yang
  - Department of Electrical Engineering, Guizhou University, Guiyang 550025, China
- Tao Qin
  - Department of Electrical Engineering, Guizhou University, Guiyang 550025, China
|
13
|
Development and Validation of a Machine Learning Predictive Model for Cardiac Surgery-Associated Acute Kidney Injury. J Clin Med 2023; 12:jcm12031166. [PMID: 36769813 PMCID: PMC9917969 DOI: 10.3390/jcm12031166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2022] [Revised: 01/16/2023] [Accepted: 01/27/2023] [Indexed: 02/05/2023]
Abstract
OBJECTIVE: We aimed to develop and validate a predictive machine learning (ML) model for cardiac surgery-associated acute kidney injury (CSA-AKI) based on a multicenter randomized controlled trial (RCT) and the Medical Information Mart for Intensive Care-IV (MIMIC-IV) dataset. METHODS: This was a subanalysis of a completed RCT approved by the Ethics Committee of Fuwai Hospital in Beijing, China (NCT03782350). Data from Fuwai Hospital were randomly assigned, with 80% for the training dataset and 20% for the testing dataset. The data from three other centers were used as the external validation dataset. Furthermore, the MIMIC-IV dataset was also used to validate the performance of the predictive model. The area under the receiver operating characteristic curve (ROC-AUC), the area under the precision-recall curve (PR-AUC), and the Brier score for calibration were applied to evaluate the performance of traditional logistic regression (LR) and eleven ML algorithms. Additionally, the Shapley Additive Explanations (SHAP) interpreter was used to explain the potential risk factors for CSA-AKI. RESULTS: A total of 6495 eligible patients undergoing cardiopulmonary bypass (CPB) were included: 2416 from Fuwai Hospital (Beijing) for model development, and 562 from three other cardiac centers in China and 3517 from the MIMIC-IV dataset for external validation. The CatBoostClassifier algorithm outperformed the other models, with excellent discrimination and calibration on both the development and MIMIC-IV datasets. In addition, the CatBoostClassifier achieved ROC-AUCs of 0.85, 0.67, and 0.77 and Brier scores of 0.14, 0.19, and 0.16 in the testing, external, and MIMIC-IV datasets, respectively. Moreover, the most important risk factor, N-terminal pro-brain natriuretic peptide (NT-proBNP), was confirmed by the LASSO method in the feature selection process. Notably, the SHAP explainer identified that the preoperative blood urea nitrogen level, prothrombin time, serum creatinine level, total bilirubin level, and age were positively correlated with CSA-AKI, whereas the preoperative platelet level, systolic and diastolic blood pressure, albumin level, and body weight were negatively associated with CSA-AKI. CONCLUSIONS: The CatBoostClassifier algorithm outperformed other ML models in the discrimination and calibration of CSA-AKI prediction in cardiac surgery with CPB, based on a multicenter RCT and the MIMIC-IV dataset. Moreover, the preoperative NT-proBNP level was confirmed to be strongly related to CSA-AKI.
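The Brier scores reported above summarize calibration as the mean squared difference between predicted probabilities and observed binary outcomes, so lower is better. A minimal Python sketch of that standard definition follows; the labels and probabilities are hypothetical toy values, not data from the study.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical outcomes (1 = event occurred) and predicted risks.
labels = [0, 1, 1, 0]
probs = [0.1, 0.9, 0.6, 0.3]

score = brier_score(labels, probs)
print(round(score, 4))
```

A model that always predicts 0.5 scores 0.25 on any binary outcome, which gives a rough reference point for reading the reported values of 0.14 to 0.19.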
|
14
|
Wei Q, Xu X, Xu X, Cheng Q. Early identification of autism spectrum disorder by multi-instrument fusion: A clinically applicable machine learning approach. Psychiatry Res 2023; 320:115050. [PMID: 36645989 DOI: 10.1016/j.psychres.2023.115050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2022] [Revised: 12/30/2022] [Accepted: 01/05/2023] [Indexed: 01/12/2023]
Abstract
Autism spectrum disorder (ASD), developmental language disorder (DLD), and global developmental delay (GDD) are common neurodevelopmental disorders in early childhood; however, the differential diagnosis of these disorders is difficult because of overlapping symptoms. Drawing on a cohort of 2004 children with ASD, DLD, or GDD, this study developed machine learning classifiers using decision trees, support vector machines, eXtreme gradient boosting (XGB), logistic regression, and neural networks by combining several easily accessible behavioral and developmental assessment instruments. The best-performing XGB model was further simplified into a two-stage decision model (TS-DM) to achieve better interpretability. Model performance was tested and compared with that of 12 pediatricians on an external dataset of 60 children. The accuracies of the resident pediatricians, senior pediatricians, TS-DM, and XGB were 53.3%, 66.7%, 75.0%, and 78.3%, respectively. Machine learning has the potential to identify these three neurodevelopmental disorders by integrating information from multiple instruments and thereby may increase our understanding of the roles of different behavioral and developmental characteristics in the different diagnoses.
Affiliation(s)
- Qiuhong Wei
  - Department of Children's Healthcare, National Clinical Research Center for Child Health and Disorders, Ministry of Education Key Laboratory of Child Development and Disorders, China International Science and Technology Cooperation Base of Child Development and Critical Disorders, Chongqing Key Laboratory of Childhood Nutrition and Health, Children's Hospital of Chongqing Medical University, No. 136 Zhongshan 2nd Rd, Yuzhong District, Chongqing, China
- Xueli Xu
  - Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences, Guangzhou, China
- Ximing Xu
  - Big Data Center for Children's Medical Care, Children's Hospital of Chongqing Medical University, No. 136 Zhongshan 2nd Rd, Yuzhong District, Chongqing 400014, China
- Qian Cheng
  - Department of Children's Healthcare, National Clinical Research Center for Child Health and Disorders, Ministry of Education Key Laboratory of Child Development and Disorders, China International Science and Technology Cooperation Base of Child Development and Critical Disorders, Chongqing Key Laboratory of Childhood Nutrition and Health, Children's Hospital of Chongqing Medical University, No. 136 Zhongshan 2nd Rd, Yuzhong District, Chongqing, China
|
15
|
Ustebay S, Sarmis A, Kaya GK, Sujan M. A comparison of machine learning algorithms in predicting COVID-19 prognostics. Intern Emerg Med 2023; 18:229-239. [PMID: 36116079 PMCID: PMC9483274 DOI: 10.1007/s11739-022-03101-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 07/11/2022] [Accepted: 09/05/2022] [Indexed: 02/01/2023]
Abstract
ML algorithms are used to develop prognostic and diagnostic models and thus to support clinical decision-making. This study uses eight supervised ML algorithms to predict the need for intensive care, intubation, and mortality risk for COVID-19 patients. The study uses two datasets: (1) patient demographics and clinical data (n = 11,712), and (2) patient demographics, clinical data, and blood test results (n = 602) for developing the prediction models, understanding the most significant features, and comparing the performances of eight different ML algorithms. Experimental findings showed that all prognostic prediction models reported an AUROC value of over 0.92, with the extra trees and CatBoost classifiers often performing best (AUROC over 0.94). The findings revealed that C-reactive protein, the ratio of lymphocytes, lactic acid, and serum calcium have a substantial impact on COVID-19 prognostic predictions. This study provides evidence of the value of tree-based supervised ML algorithms for predicting prognosis in health care.
Affiliation(s)
- Serpil Ustebay
  - Department of Computer Engineering, Istanbul Medeniyet University, Istanbul, Turkey
- Abdurrahman Sarmis
  - Department of Microbiology Laboratory, Goztepe Prof. Dr. Suleyman Yalcin City Hospital, Istanbul, Turkey
- Gulsum Kubra Kaya
  - Department of Industrial Engineering, Istanbul Medeniyet University, Istanbul, Turkey
  - School of Aerospace, Transport and Manufacturing, Cranfield University, Bedford, MK430AL, UK
|
16
|
Qin Y, Wu J, Xiao W, Wang K, Huang A, Liu B, Yu J, Li C, Yu F, Ren Z. Machine Learning Models for Data-Driven Prediction of Diabetes by Lifestyle Type. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2022; 19:ijerph192215027. [PMID: 36429751 PMCID: PMC9690067 DOI: 10.3390/ijerph192215027] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/06/2022] [Revised: 11/04/2022] [Accepted: 11/10/2022] [Indexed: 06/01/2023]
Abstract
The prevalence of diabetes has been increasing in recent years, and previous research has found that machine-learning models are good diabetes prediction tools. The purpose of this study was to compare the efficacy of five different machine-learning models for diabetes prediction using lifestyle data from the National Health and Nutrition Examination Survey (NHANES) database. The 1999-2020 NHANES database yielded data on 17,833 individuals, covering demographic characteristics and lifestyle-related variables. To screen training data for the machine-learning models, the Akaike Information Criterion (AIC) forward propagation algorithm was utilized. For predicting diabetes, five machine-learning models (CatBoost, XGBoost, Random Forest (RF), Logistic Regression (LR), and Support Vector Machine (SVM)) were developed. Model performance was evaluated using accuracy, sensitivity, specificity, precision, F1 score, and the receiver operating characteristic (ROC) curve. Across the five machine-learning models, the dietary intake levels of energy, carbohydrate, and fat contributed the most to the prediction of diabetes. In terms of model performance, CatBoost ranked higher than RF, LR, XGBoost, and SVM. The best-performing model among the five was CatBoost, which achieved an accuracy of 82.1% and an AUC of 0.83. Machine-learning models based on NHANES data can assist medical institutions in identifying diabetes patients.
Affiliation(s)
- Yifan Qin
  - College of Physical Education, Shenzhen University, Shenzhen 518000, China
- Jinlong Wu
  - College of Physical Education, Southwest University, Chongqing 400715, China
- Wen Xiao
  - College of Physical Education, Shenzhen University, Shenzhen 518000, China
- Kun Wang
  - Physical Education College, Yanching Institute of Technology, Langfang 065201, China
- Anbing Huang
  - College of Physical Education, Shenzhen University, Shenzhen 518000, China
- Bowen Liu
  - College of Physical Education, Shenzhen University, Shenzhen 518000, China
- Jingxuan Yu
  - College of Physical Education, Shenzhen University, Shenzhen 518000, China
- Chuhao Li
  - College of Physical Education, Shenzhen University, Shenzhen 518000, China
- Fengyu Yu
  - College of Physical Education, Shenzhen University, Shenzhen 518000, China
- Zhanbing Ren
  - College of Physical Education, Shenzhen University, Shenzhen 518000, China
|
17
|
Shi S, Pan X, Zhang L, Wang X, Zhuang Y, Lin X, Shi S, Zheng J, Lin W. An application based on bioinformatics and machine learning for risk prediction of sepsis at first clinical presentation using transcriptomic data. Front Genet 2022; 13:979529. [PMID: 36159979 PMCID: PMC9490444 DOI: 10.3389/fgene.2022.979529] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/27/2022] [Accepted: 08/10/2022] [Indexed: 12/02/2022]
Abstract
Background: Linking genotypic changes to phenotypic traits with machine learning methods poses various challenges. In this study, we developed a workflow based on bioinformatics and machine learning methods that uses transcriptomic data obtained at the first clinical presentation to predict the risk of sepsis. By combining bioinformatics with machine learning methods, we have attempted to overcome current challenges in predicting disease risk using transcriptomic data. Methods: High-throughput sequencing transcriptomic data processing and gene annotation were performed using R software. Machine learning models were constructed and their performance was evaluated in Python. The models were visualized and interpreted using the SHapley Additive exPlanations (SHAP) method. Results: Based on the preset parameters and using recursive feature elimination implemented via machine learning, the top 10 optimal genes were screened for the establishment of the machine learning models. In a comparison of model performance, CatBoost was selected as the optimal model. We explored the significance of each gene in the model and the interactions between genes through SHAP analysis. Conclusion: The combination of CatBoost and SHAP may serve as the best-performing machine learning approach for predicting sepsis risk from transcriptomic data. The workflow outlined may provide a new approach and direction in exploring the mechanisms associated with genes and sepsis risk.
Affiliation(s)
- Songchang Shi
  - Department of Critical Care Medicine, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital South Branch, Fujian Provincial Jinshan Hospital, Fujian Provincial Hospital, Fuzhou, China
- Xiaobin Pan
  - Department of Critical Care Medicine, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital South Branch, Fujian Provincial Jinshan Hospital, Fujian Provincial Hospital, Fuzhou, China
- Lihui Zhang
  - Department of Critical Care Medicine, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital South Branch, Fujian Provincial Jinshan Hospital, Fujian Provincial Hospital, Fuzhou, China
- Xincai Wang
  - Department of Critical Care Medicine, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital South Branch, Fujian Provincial Jinshan Hospital, Fujian Provincial Hospital, Fuzhou, China
- Yingfeng Zhuang
  - Department of Critical Care Medicine, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital South Branch, Fujian Provincial Jinshan Hospital, Fujian Provincial Hospital, Fuzhou, China
- Xingsheng Lin
  - Department of Critical Care Medicine, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital South Branch, Fujian Provincial Jinshan Hospital, Fujian Provincial Hospital, Fuzhou, China
- Songjing Shi
  - Department of Critical Care Medicine, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital, Fuzhou, China
- Jianzhang Zheng
  - Department of Orthopedics, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital, Fuzhou, China
- Wei Lin
  - Department of Endocrinology, Shengli Clinical Medical College of Fujian Medical University, Fujian Provincial Hospital, Fuzhou, China
|
18
|
Duan M, Shu T, Zhao B, Xiang T, Wang J, Huang H, Zhang Y, Xiao P, Zhou B, Xie Z, Liu X. Explainable machine learning models for predicting 30-day readmission in pediatric pulmonary hypertension: A multicenter, retrospective study. Front Cardiovasc Med 2022; 9:919224. [PMID: 35958416 PMCID: PMC9360407 DOI: 10.3389/fcvm.2022.919224] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/13/2022] [Accepted: 06/23/2022] [Indexed: 11/13/2022]
Abstract
Background: Short-term readmission for pediatric pulmonary hypertension (PH) is associated with a substantial social and personal burden. However, tools to predict individualized readmission risk are lacking. This study aimed to develop machine learning models to predict 30-day unplanned readmission in children with PH. Methods: This study collected data on pediatric inpatients with PH from the Chongqing Medical University Medical Data Platform from January 2012 to January 2019. Key clinical variables were selected by the least absolute shrinkage and selection operator (LASSO). Prediction models were selected from 15 machine learning algorithms on the basis of performance, which was evaluated by the area under the receiver operating characteristic curve (AUC). The output of the predictive model was interpreted by SHapley Additive exPlanations (SHAP). Results: A total of 5,913 pediatric patients with PH were included in the final cohort. The CatBoost model was selected as the predictive model, with the greatest AUC of 0.81 (95% CI: 0.77–0.86), accuracy of 0.74 (95% CI: 0.72–0.76), sensitivity of 0.78 (95% CI: 0.69–0.87), and specificity of 0.74 (95% CI: 0.72–0.76). Age, length of stay (LOS), congenital heart surgery, and nonmedical order discharge showed the greatest impact on 30-day readmission in pediatric PH, according to the SHAP results. Conclusions: This study developed a CatBoost model to predict the risk of unplanned 30-day readmission in pediatric patients with PH, which performed better than traditional logistic regression. We found that age, LOS, congenital heart surgery, and nonmedical order discharge were important factors for 30-day readmission in pediatric PH.
Affiliation(s)
- Minjie Duan
  - College of Medical Informatics, Chongqing Medical University, Chongqing, China
  - Medical Data Science Academy, Chongqing Medical University, Chongqing, China
- Tingting Shu
  - Department of Cardiology, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Binyi Zhao
  - Department of Cardiology, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Tianyu Xiang
  - Information Center, The University-Town Hospital of Chongqing Medical University, Chongqing, China
- Jinkui Wang
  - Department of Urology, Children's Hospital of Chongqing Medical University, Chongqing, China
- Haodong Huang
  - Medical Data Science Academy, Chongqing Medical University, Chongqing, China
  - Personnel Department, Chongqing Health Center for Women and Children, Chongqing, China
- Yang Zhang
  - College of Medical Informatics, Chongqing Medical University, Chongqing, China
  - Medical Data Science Academy, Chongqing Medical University, Chongqing, China
- Peilin Xiao
  - Department of Cardiology, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Bei Zhou
  - Department of Cardiology, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Zulong Xie
  - Department of Cardiology, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, China
  - Correspondence: Zulong Xie
- Xiaozhu Liu
  - Department of Cardiology, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, China
  - Correspondence: Xiaozhu Liu
|
19
|
Gao W, Zhou L, Liu S, Guan Y, Gao H, Hui B. Machine learning prediction of lignin content in poplar with Raman spectroscopy. BIORESOURCE TECHNOLOGY 2022; 348:126812. [PMID: 35131461 DOI: 10.1016/j.biortech.2022.126812] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 01/03/2022] [Revised: 01/29/2022] [Accepted: 01/31/2022] [Indexed: 06/14/2023]
Abstract
Based on features extracted from Raman spectra, regularization algorithms, SVR, DT, RF, LightGBM, CatBoost, and XGBoost were used to develop prediction models for lignin content in poplar. First, Raman features extracted from the processed FT-Raman spectra were used as model inputs, and the measured lignin contents were used as outputs. Second, grid search combined with cross-validation was used to tune the model hyperparameters. Finally, the predictive models were built with the aforementioned algorithms. The results indicated that the regularization algorithms, SVR, and DT achieved test R² values >0.80, meaning their predicted values still deviated from the measured ones. RF, LightGBM, CatBoost, and XGBoost performed better, with test R² values >0.91, suggesting that their predicted values were close to the measured ones. Therefore, fast and accurate methods for predicting lignin content were obtained, which will be useful for screening suitable lignocellulosic resources with the expected lignin content.
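The tuning step described in this abstract (grid search combined with cross-validation, then scoring on held-out data with R²) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the data are synthetic stand-ins for Raman features and lignin content, the parameter grid and the 70/30 split are assumptions, and scikit-learn's random forest is used as one example of the listed model families.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for (Raman features, measured lignin content); not the study's data.
X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Hypothetical hyperparameter grid; the abstract does not report the values searched.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

# 5-fold cross-validated grid search, scored by R2 on each validation fold.
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=5,
    scoring="r2",
)
search.fit(X_train, y_train)

# GridSearchCV refits the best configuration on the full training set by default,
# so predict() uses that refitted model.
test_r2 = r2_score(y_test, search.predict(X_test))
print("best params:", search.best_params_)
print("test R2:", round(test_r2, 3))
```

Keeping the test set out of the search means the reported test R² is not influenced by the hyperparameter choice, which is the usual reason for pairing grid search with cross-validation on the training portion only.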
Affiliation(s)
- Wenli Gao
- School of Forestry and Landscape Architecture, Anhui Agricultural University, Hefei, Anhui 230036, PR China; Key Lab of State Forest and Grassland Administration on Wood Quality Improvement & High Efficient Utilization, Hefei, Anhui 230036, PR China
- Liang Zhou
- School of Forestry and Landscape Architecture, Anhui Agricultural University, Hefei, Anhui 230036, PR China; Key Lab of State Forest and Grassland Administration on Wood Quality Improvement & High Efficient Utilization, Hefei, Anhui 230036, PR China
- Shengquan Liu
- School of Forestry and Landscape Architecture, Anhui Agricultural University, Hefei, Anhui 230036, PR China; Key Lab of State Forest and Grassland Administration on Wood Quality Improvement & High Efficient Utilization, Hefei, Anhui 230036, PR China
- Ying Guan
- School of Forestry and Landscape Architecture, Anhui Agricultural University, Hefei, Anhui 230036, PR China; Key Lab of State Forest and Grassland Administration on Wood Quality Improvement & High Efficient Utilization, Hefei, Anhui 230036, PR China
- Hui Gao
- School of Forestry and Landscape Architecture, Anhui Agricultural University, Hefei, Anhui 230036, PR China; Key Lab of State Forest and Grassland Administration on Wood Quality Improvement & High Efficient Utilization, Hefei, Anhui 230036, PR China
- Bin Hui
- State Key Laboratory of Bio-Fibers and Eco-Textiles, Institute of Marine Biobased Materials, School of Materials Science and Engineering, Qingdao University, Qingdao 266071, PR China
|
20
|
Chen S, Liu LP, Wang YJ, Zhou XH, Dong H, Chen ZW, Wu J, Gui R, Zhao QY. Advancing Prediction of Risk of Intraoperative Massive Blood Transfusion in Liver Transplantation With Machine Learning Models. A Multicenter Retrospective Study. Front Neuroinform 2022; 16:893452. [PMID: 35645754 PMCID: PMC9140217 DOI: 10.3389/fninf.2022.893452] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/10/2022] [Accepted: 04/25/2022] [Indexed: 11/13/2022]
Abstract
Background Liver transplantation surgery is often accompanied by massive blood loss and massive transfusion (MT), and MT can cause many serious complications associated with high mortality. Therefore, there is an urgent need for a model that can predict the demand for MT to reduce the waste of blood resources and improve patient prognosis.
Objective To develop a model for predicting intraoperative massive blood transfusion in liver transplantation surgery based on machine learning algorithms.
Methods A total of 1,239 patients who underwent liver transplantation surgery in three large grade III-A general hospitals in China from March 2014 to November 2021 were included and analyzed. A total of 1,193 cases were randomly divided into a training set (70%) and a test set (30%), and 46 cases were prospectively collected as a validation set. The outcome of this study was intraoperative massive blood transfusion. A total of 27 candidate risk factors were collected, and recursive feature elimination (RFE) was used to select key features based on the Categorical Boosting (CatBoost) model. Ten machine learning models were built, among which the three best-performing models and the traditional logistic regression (LR) method were prospectively verified in the validation set. The Area Under the Receiver Operating Characteristic Curve (AUROC) was used to evaluate model performance. Shapley additive explanation values were applied to explain the complex ensemble learning models.
Results Fifteen key variables were screened out: age, weight, hemoglobin, platelets, white blood cell count, activated partial thromboplastin time, prothrombin time, thrombin time, direct bilirubin, aspartate aminotransferase, total protein, albumin, globulin, creatinine, and urea. Among all algorithms, the CatBoost model had the best predictive performance (AUROC: 0.810). In the prospective validation cohort, LR performed far worse than the other algorithms.
Conclusion A prediction model for massive blood transfusion in liver transplantation surgery was successfully established based on the CatBoost algorithm, and a degree of generalization was verified in the validation set. The model may be superior to the traditional LR model and other algorithms, and it can more accurately predict the risk of massive blood transfusion and guide clinical decision-making.
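The workflow in this abstract (a random 70/30 split, recursive feature elimination from 27 candidates down to 15 features, and AUROC on the held-out set) can be sketched roughly as below. This is an illustration under stated assumptions, not the study's code: the data are synthetic, and scikit-learn's `GradientBoostingClassifier` is swapped in for CatBoost so the example needs only one library. The sample size and model settings are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in with 27 candidate features; not the study's patient data.
X, y = make_classification(
    n_samples=600, n_features=27, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Recursive feature elimination: refit the model repeatedly, dropping the
# least important feature each round until 15 remain.
# GradientBoostingClassifier is a stand-in for CatBoost (assumption).
selector = RFE(
    GradientBoostingClassifier(n_estimators=50, random_state=0),
    n_features_to_select=15,
    step=1,
)
selector.fit(X_train, y_train)

# Train a final model on the selected features only and score it by AUROC.
model = GradientBoostingClassifier(n_estimators=50, random_state=0)
model.fit(selector.transform(X_train), y_train)
proba = model.predict_proba(selector.transform(X_test))[:, 1]
auroc = roc_auc_score(y_test, proba)

print("selected feature count:", int(selector.support_.sum()))
print("test AUROC:", round(auroc, 3))
```

Fitting the selector on the training portion only keeps the test AUROC free of information leaked through feature selection.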
Affiliation(s)
- Sai Chen
- Department of Blood Transfusion, The Third Xiangya Hospital of Central South University, Changsha, China
- Le-Ping Liu
- Department of Blood Transfusion, The Third Xiangya Hospital of Central South University, Changsha, China
- Yong-Jun Wang
- Department of Blood Transfusion, The Second Xiangya Hospital of Central South University, Changsha, China
- Xiong-Hui Zhou
- Department of Blood Transfusion, The Third Xiangya Hospital of Central South University, Changsha, China
- Hang Dong
- Department of Blood Transfusion, The Third Xiangya Hospital of Central South University, Changsha, China
- Zi-Wei Chen
- Department of Laboratory Medicine, The Third Xiangya Hospital of Central South University, Changsha, China
- Jiang Wu
- Department of Blood Transfusion, Renji Hospital Affiliated to Shanghai Jiao Tong University, Shanghai, China
- Rong Gui
- Department of Blood Transfusion, The Third Xiangya Hospital of Central South University, Changsha, China
- Qin-Yu Zhao
- College of Engineering and Computer Science, Australian National University, Canberra, ACT, Australia
|