1
Ogunpola A, Saeed F, Basurra S, Albarrak AM, Qasem SN. Machine Learning-Based Predictive Models for Detection of Cardiovascular Diseases. Diagnostics (Basel) 2024; 14:144. [PMID: 38248021 PMCID: PMC10813849 DOI: 10.3390/diagnostics14020144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Revised: 12/21/2023] [Accepted: 12/25/2023] [Indexed: 01/23/2024]
Abstract
Cardiovascular diseases present a significant global health challenge that emphasizes the critical need for developing accurate and more effective detection methods. Several studies have contributed valuable insights in this field, but it is still necessary to advance the predictive models and address the gaps in the existing detection approaches. For instance, some of the previous studies have not considered the challenge of imbalanced datasets, which can lead to biased predictions, especially when the datasets include minority classes. This study's primary focus is the early detection of heart diseases, particularly myocardial infarction, using machine learning techniques. It tackles the challenge of imbalanced datasets by conducting a comprehensive literature review to identify effective strategies. Seven machine learning and deep learning classifiers, including K-Nearest Neighbors, Support Vector Machine, Logistic Regression, Convolutional Neural Network, Gradient Boost, XGBoost, and Random Forest, were deployed to enhance the accuracy of heart disease predictions. The research explores different classifiers and their performance, providing valuable insights for developing robust prediction models for myocardial infarction. The study's outcomes emphasize the effectiveness of meticulously fine-tuning an XGBoost model for cardiovascular diseases. This optimization yields remarkable results: 98.50% accuracy, 99.14% precision, 98.29% recall, and a 98.71% F1 score. Such optimization significantly enhances the model's diagnostic accuracy for heart disease.
Affiliation(s)
- Adedayo Ogunpola
  - DAAI Research Group, College of Computing and Digital Technology, Birmingham City University, Birmingham B4 7XG, UK; (A.O.); (S.B.)
- Faisal Saeed
  - DAAI Research Group, College of Computing and Digital Technology, Birmingham City University, Birmingham B4 7XG, UK; (A.O.); (S.B.)
- Shadi Basurra
  - DAAI Research Group, College of Computing and Digital Technology, Birmingham City University, Birmingham B4 7XG, UK; (A.O.); (S.B.)
- Abdullah M. Albarrak
  - Computer Science Department, College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh 11432, Saudi Arabia; (A.M.A.); (S.N.Q.)
- Sultan Noman Qasem
  - Computer Science Department, College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh 11432, Saudi Arabia; (A.M.A.); (S.N.Q.)
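The accuracy, precision, recall, and F1 figures quoted in the abstract of entry 1 all derive from confusion-matrix counts. The following is a minimal sketch of the standard definitions, not the authors' code; the function names `classification_metrics` and `f1_from` are illustrative assumptions. It also shows that the reported F1 score is consistent with the reported precision and recall.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives. Undefined ratios are returned as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Consistency check using the precision and recall quoted in the abstract.
print(round(f1_from(0.9914, 0.9829), 4))  # 0.9871
```

On an imbalanced dataset, accuracy alone can look high while recall on the minority class is poor, which is why the abstract reports all four values.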
2
Lim J, Jeon HG, Seo Y, Kim M, Moon JU, Cho SH. Survival Prediction Model for Patients with Hepatocellular Carcinoma and Extrahepatic Metastasis Based on XGBoost Algorithm. J Hepatocell Carcinoma 2023; 10:2251-2263. [PMID: 38107542 PMCID: PMC10725646 DOI: 10.2147/jhc.s429903] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Accepted: 11/03/2023] [Indexed: 12/19/2023]
Abstract
Purpose: Accurate estimation of survival is of utmost importance in patients with hepatocellular carcinoma (HCC) and extrahepatic metastasis. This study aimed to develop a survival prediction model using real-world data. Patients and Methods: A total of 993 patients with treatment-naïve HCC and extrahepatic metastasis were included from 13 Korean hospitals between 2013 and 2018. Patients were randomly divided into a training set (70.0%) and a test set (30.0%). The eXtreme Gradient Boosting (XGBoost) algorithm was applied to predict survival at 3, 6, and 12 months. Results: The mean age of the patients was 60.8 ± 12.3 years, and 85.4% were male. During the study period, 96.1% died, and median survival duration was 4.0 months. In multivariate analysis, Child-Pugh class, number and size of tumors, presence of vascular or bile duct invasion, lung or bone metastasis, serum AFP, and primary anti-HCC treatment were associated with survival. We constructed a model for survival prediction based on the relevant variables, which is available online (https://metastatic-hcc.onrender.com/form). Our model demonstrated high performance, with areas under the receiver operating characteristic curves of 0.778, 0.794, and 0.784 at 3, 6, and 12 months, respectively. Feature importance analysis indicated that the primary anti-HCC treatment had the highest importance. Conclusion: We developed a model to predict the survival of patients with HCC and extrahepatic metastasis, which demonstrated good discriminative ability. Our model would be helpful for personalized treatment and for improving the prognosis.
Affiliation(s)
- Jihye Lim
  - Division of Gastroenterology and Hepatology, Department of Internal Medicine, Yeouido St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
- Hyeon-Gi Jeon
  - Department of Core Platform Team, SOCAR Incorporated, Seoul, Republic of Korea
- Yeonjoo Seo
  - Division of Gastroenterology and Hepatology, Department of Internal Medicine, Yeouido St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
- Moonjin Kim
  - Department of Internal Medicine, Yeouido St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, Seoul, Korea
- Ja Un Moon
  - Department of Pediatrics, Yeouido St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
- Se Hyun Cho
  - Division of Gastroenterology and Hepatology, Department of Internal Medicine, Yeouido St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
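The areas under the ROC curve reported in entry 2 (0.778, 0.794, and 0.784) measure discrimination: the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. A minimal sketch of that pairwise definition follows; `roc_auc` is an illustrative name, and the labels and scores in the usage line are made-up toy values, not data from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 outcomes; scores: predicted risk per case.
    Counts the fraction of (positive, negative) pairs in which the
    positive case scores higher; ties count as half a win.
    This O(n_pos * n_neg) form is meant for clarity, not large datasets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so values near 0.78 indicate moderate to good discrimination.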
3
Lee C, Jo B, Woo H, Im Y, Park RW, Park C. Chronic Disease Prediction Using the Common Data Model: Development Study. JMIR AI 2022; 1:e41030. [PMID: 38875545 PMCID: PMC11041444 DOI: 10.2196/41030] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/13/2022] [Revised: 11/21/2022] [Accepted: 11/26/2022] [Indexed: 06/16/2024]
Abstract
BACKGROUND: Chronic disease management is a major health issue worldwide. With the paradigm shift to preventive medicine, disease prediction modeling using machine learning is gaining importance for precise and accurate medical judgement. OBJECTIVE: This study aimed to develop high-performance prediction models for 4 chronic diseases using the common data model (CDM) and machine learning and to confirm the possibility for the extension of the proposed models. METHODS: In this study, 4 major chronic diseases (diabetes, hypertension, hyperlipidemia, and cardiovascular disease) were selected, and a model for predicting their occurrence within 10 years was developed. For model development, the Atlas analysis tool was used to define the chronic disease to be predicted, and data were extracted from the CDM according to the defined conditions. A model for predicting each disease was built with 4 algorithms verified in previous studies, and the performance was compared after applying a grid search. RESULTS: For the prediction of each disease, we applied 4 algorithms (logistic regression, gradient boosting, random forest, and extreme gradient boosting), and all models showed greater than 80% accuracy. After optimization, extreme gradient boosting presented the highest predictive performance for the 4 diseases (diabetes, hypertension, hyperlipidemia, and cardiovascular disease), with accuracy of 80% or greater and area under the curve values from 0.84 to 0.93. CONCLUSIONS: This study demonstrates the possibility for the preemptive management of chronic diseases by predicting the occurrence of chronic diseases using the CDM and machine learning. These models, developed with the real-world data-based CDM and National Health Insurance Corporation examination data that individuals can easily obtain, can demonstrate the risk of developing major chronic diseases within 10 years by identifying health risk factors.
Affiliation(s)
- Brian Jo
  - Evidnet, Seongnam, Republic of Korea
- Yoori Im
  - Evidnet, Seongnam, Republic of Korea
- Rae Woong Park
  - Department of Biomedical Informatics, Ajou University Hospital, Suwon, Republic of Korea
- ChulHyoung Park
  - Department of Biomedical Informatics, Ajou University Hospital, Suwon, Republic of Korea
4
Explainable machine learning framework for predicting long-term cardiovascular disease risk among adolescents. Sci Rep 2022; 12:21905. [PMID: 36536006 PMCID: PMC9763353 DOI: 10.1038/s41598-022-25933-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/27/2022] [Accepted: 12/07/2022] [Indexed: 12/23/2022]
Abstract
Although cardiovascular disease (CVD) is the leading cause of death worldwide, over 80% of it is preventable through early intervention and lifestyle changes. Most cases of CVD are detected in adulthood, but the risk factors leading to CVD begin at a younger age. This research is the first to develop an explainable machine learning (ML)-based framework for long-term CVD risk prediction (low vs. high) among adolescents. This study uses longitudinal data from a nationally representative sample of individuals who participated in the Add Health study. A total of 14,083 participants who completed relevant survey questionnaires and health tests from adolescence to young adulthood were chosen. Four ML classifiers [decision tree (DT), random forest (RF), extreme gradient boosting (XGBoost), and deep neural networks (DNN)] and 36 adolescent predictors are used to predict adulthood CVD risk. While all ML models demonstrated good prediction capability, XGBoost achieved the best performance (AUC-ROC: 84.5% and AUC-PR: 96.9% on testing data). Besides, critical predictors of long-term CVD risk and its impact on risk prediction are obtained using an explainable technique for interpreting ML predictions. The results suggest that ML can be employed to detect adulthood CVD very early in life, and such an approach may facilitate primordial prevention and personalized intervention.
5
Economics of Artificial Intelligence in Healthcare: Diagnosis vs. Treatment. Healthcare (Basel) 2022; 10:healthcare10122493. [PMID: 36554017 PMCID: PMC9777836 DOI: 10.3390/healthcare10122493] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 10/25/2022] [Revised: 12/03/2022] [Accepted: 12/07/2022] [Indexed: 12/14/2022]
Abstract
Motivation: The price of medical treatment continues to rise due to (i) an increasing population; (ii) an aging human growth; (iii) disease prevalence; (iv) a rise in the frequency of patients that utilize health care services; and (v) increase in the price. Objective: Artificial Intelligence (AI) is already well-known for its superiority in various healthcare applications, including the segmentation of lesions in images, speech recognition, smartphone personal assistants, navigation, ride-sharing apps, and many more. Our study is based on two hypotheses: (i) AI offers more economic solutions compared to conventional methods; (ii) AI treatment offers stronger economics compared to AI diagnosis. This novel study aims to evaluate AI technology in the context of healthcare costs, namely in the areas of diagnosis and treatment, and then compare it to the traditional or non-AI-based approaches. Methodology: PRISMA was used to select the best 200 studies for AI in healthcare with a primary focus on cost reduction, especially towards diagnosis and treatment. We defined the diagnosis and treatment architectures, investigated their characteristics, and categorized the roles that AI plays in the diagnostic and therapeutic paradigms. We experimented with various combinations of different assumptions by integrating AI and then comparing it against conventional costs. Lastly, we dwell on three powerful future concepts of AI, namely, pruning, bias, explainability, and regulatory approvals of AI systems. Conclusions: The model shows tremendous cost savings using AI tools in diagnosis and treatment. The economics of AI can be improved by incorporating pruning, reduction in AI bias, explainability, and regulatory approvals.
6
Li Y, Salimi-Khorshidi G, Rao S, Canoy D, Hassaine A, Lukasiewicz T, Rahimi K, Mamouei M. Validation of risk prediction models applied to longitudinal electronic health record data for the prediction of major cardiovascular events in the presence of data shifts. EUROPEAN HEART JOURNAL. DIGITAL HEALTH 2022; 3:535-547. [PMID: 36710898 PMCID: PMC9779795 DOI: 10.1093/ehjdh/ztac061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Revised: 09/22/2022] [Indexed: 12/24/2022]
Abstract
Aims: Deep learning has dominated predictive modelling across different fields, but in medicine it has been met with mixed reception. In clinical practice, simple, statistical models and risk scores continue to inform cardiovascular disease risk predictions. This is due in part to the knowledge gap about how deep learning models perform in practice when they are subject to dynamic data shifts, a key criterion that common internal validation procedures do not address. We evaluated the performance of a novel deep learning model, BEHRT, under data shifts and compared it with several ML-based and established risk models. Methods and results: Using linked electronic health records of 1.1 million patients across England aged at least 35 years between 1985 and 2015, we replicated three established statistical models for predicting 5-year risk of incident heart failure, stroke, and coronary heart disease. The results were compared with a widely accepted machine learning model (random forests), and a novel deep learning model (BEHRT). In addition to internal validation, we investigated how data shifts affect model discrimination and calibration. To this end, we tested the models on cohorts from (i) distinct geographical regions; (ii) different periods. Using internal validation, the deep learning models substantially outperformed the best statistical models by 6%, 8%, and 11% in heart failure, stroke, and coronary heart disease, respectively, in terms of the area under the receiver operating characteristic curve. Conclusion: The performance of all models declined as a result of data shifts; despite this, the deep learning models maintained the best performance in all risk prediction tasks. Updating the model with the latest information can improve discrimination but if the prior distribution changes, the model may remain miscalibrated.
Affiliation(s)
- Yikuan Li
  - Deep Medicine, Oxford Martin School, University of Oxford, Hayes House, 75 George Street, Oxford OX1 2BQ, UK
  - Nuffield Department of Women’s and Reproductive Health, Medical Science Division, University of Oxford, Oxford, UK
- Gholamreza Salimi-Khorshidi
  - Deep Medicine, Oxford Martin School, University of Oxford, Hayes House, 75 George Street, Oxford OX1 2BQ, UK
  - Nuffield Department of Women’s and Reproductive Health, Medical Science Division, University of Oxford, Oxford, UK
- Shishir Rao
  - Deep Medicine, Oxford Martin School, University of Oxford, Hayes House, 75 George Street, Oxford OX1 2BQ, UK
  - Nuffield Department of Women’s and Reproductive Health, Medical Science Division, University of Oxford, Oxford, UK
- Dexter Canoy
  - Deep Medicine, Oxford Martin School, University of Oxford, Hayes House, 75 George Street, Oxford OX1 2BQ, UK
  - Nuffield Department of Women’s and Reproductive Health, Medical Science Division, University of Oxford, Oxford, UK
  - NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Abdelaali Hassaine
  - Deep Medicine, Oxford Martin School, University of Oxford, Hayes House, 75 George Street, Oxford OX1 2BQ, UK
  - Nuffield Department of Women’s and Reproductive Health, Medical Science Division, University of Oxford, Oxford, UK
- Kazem Rahimi
  - Deep Medicine, Oxford Martin School, University of Oxford, Hayes House, 75 George Street, Oxford OX1 2BQ, UK
  - Nuffield Department of Women’s and Reproductive Health, Medical Science Division, University of Oxford, Oxford, UK
  - NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Mohammad Mamouei
  - Deep Medicine, Oxford Martin School, University of Oxford, Hayes House, 75 George Street, Oxford OX1 2BQ, UK
  - Nuffield Department of Women’s and Reproductive Health, Medical Science Division, University of Oxford, Oxford, UK
7
Kanda E, Suzuki A, Makino M, Tsubota H, Kanemata S, Shirakawa K, Yajima T. Machine learning models for prediction of HF and CKD development in early-stage type 2 diabetes patients. Sci Rep 2022; 12:20012. [PMID: 36411366 PMCID: PMC9678863 DOI: 10.1038/s41598-022-24562-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 09/14/2022] [Accepted: 11/17/2022] [Indexed: 11/23/2022]
Abstract
Chronic kidney disease (CKD) and heart failure (HF) are the first and most frequent comorbidities associated with mortality risks in early-stage type 2 diabetes mellitus (T2DM). However, efficient screening and risk assessment strategies for identifying T2DM patients at high risk of developing CKD and/or HF (CKD/HF) remain to be established. This study aimed to generate a novel machine learning (ML) model to predict the risk of developing CKD/HF in early-stage T2DM patients. The models were derived from a retrospective cohort of 217,054 T2DM patients without a history of cardiovascular and renal diseases extracted from a Japanese claims database. Among the algorithms used for ML, extreme gradient boosting exhibited the best performance for CKD/HF diagnosis and hospitalization after internal validation and was further validated using another dataset including 16,822 patients. In the external validation, the 5-year prediction areas under the receiver operating characteristic curves for CKD/HF diagnosis and hospitalization were 0.718 and 0.837, respectively. In Kaplan-Meier curve analysis, patients predicted to be at high risk showed a significant increase in CKD/HF diagnosis and hospitalization compared with those at low risk. Thus, the developed model predicted the risk of developing CKD/HF in T2DM patients with reasonable probability in the external validation cohort. A clinical approach that uses ML models to identify T2DM patients at high risk of developing CKD/HF may contribute to improved prognosis by promoting early diagnosis and intervention.
Affiliation(s)
- Eiichiro Kanda
  - Medical Science, Kawasaki Medical University, Okayama, Japan (GRID grid.415086.e; ISNI 0000 0001 1014 2000)
- Atsushi Suzuki
  - Department of Endocrinology, Diabetes and Metabolism, Fujita Health University, Toyoake, Aichi, Japan (GRID grid.256115.4; ISNI 0000 0004 1761 798X)
- Masaki Makino
  - Department of Endocrinology, Diabetes and Metabolism, Fujita Health University, Toyoake, Aichi, Japan (GRID grid.256115.4; ISNI 0000 0004 1761 798X)
- Hiroo Tsubota
  - AstraZeneca K.K., Osaka, Japan (GRID grid.476017.3; ISNI 0000 0004 0376 5631)
- Satomi Kanemata
  - Ono Pharmaceutical Co., Ltd., Osaka, Japan (GRID grid.459873.4; ISNI 0000 0004 0376 2510)
8
Tsarapatsani K, Sakellarios AI, Pezoulas VC, Tsakanikas VD, Kleber ME, Marz W, Michalis LK, Fotiadis DI. Machine Learning Models for Cardiovascular Disease Events Prediction. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2022; 2022:1066-1069. [PMID: 36085658 DOI: 10.1109/embc48229.2022.9871121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Cardiovascular diseases (CVDs) are among the most serious disorders leading to high mortality rates worldwide. CVDs can be diagnosed and prevented early by identifying risk biomarkers using statistical and machine learning (ML) models. In this work, we utilize clinical CVD risk factors and biochemical data using machine learning models such as Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), Naïve Bayes (NB), Extreme Gradient Boosting (XGB) and Adaptive Boosting (AdaBoost) to predict death caused by CVD within ten years of follow-up. We used the cohort of the Ludwigshafen Risk and Cardiovascular Health (LURIC) study, and 2943 patients were included in the analysis (484 annotated as dead due to CVD). We calculated the Accuracy (ACC), Precision, Recall, F1-Score, Specificity (SPE) and area under the receiver operating characteristic curve (AUC) of each model. The findings of the comparative analysis show that Logistic Regression was the most reliable algorithm, with an accuracy of 72.20%. These results will be used in the TIMELY study to estimate the risk score and mortality of CVD in patients with 10-year risk.
9
Qin Q, Yang X, Zhang R, Liu M, Ma Y. An Application of Deep Belief Networks in Early Warning for Cerebrovascular Disease Risk. J ORGAN END USER COM 2022. [DOI: 10.4018/joeuc.287574] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/09/2022]
Abstract
To reduce the incidence of cerebrovascular disease and mortality, identifying the risks of cerebrovascular disease in advance and taking certain preventive measures are significant. This article aims to investigate the risk factors of cerebrovascular disease (CVD) in primary prevention and to build an early warning model based on existing technology. The authors use the information entropy algorithm of rough set theory to establish an index system suitable for the early warning model. Then, using the restricted Boltzmann machine (RBM) and the back-propagation algorithm, the deep belief network is established by building and stacking RBMs, and back propagation is used to fine-tune the parameters of the network at the top layer. Compared with the LM-BP early-warning model, the deep belief network model is more effective than the traditional artificial neural network, which can help to identify the risk of cerebrovascular disease in advance and promote primary prevention.
Affiliation(s)
- Xing Yang
  - China Unicom Research Institute, China
- Manlu Liu
  - Rochester Institute of Technology, USA
- Yuhan Ma
  - Beijing Jiaotong University, China
10
Suri JS, Bhagawati M, Paul S, Protogeron A, Sfikakis PP, Kitas GD, Khanna NN, Ruzsa Z, Sharma AM, Saxena S, Faa G, Paraskevas KI, Laird JR, Johri AM, Saba L, Kalra M. Understanding the bias in machine learning systems for cardiovascular disease risk assessment: The first of its kind review. Comput Biol Med 2022; 142:105204. [DOI: 10.1016/j.compbiomed.2021.105204] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 11/28/2021] [Revised: 12/29/2021] [Accepted: 12/29/2021] [Indexed: 02/09/2023]
11
Machine learning-based diagnosis and risk factor analysis of cardiocerebrovascular disease based on KNHANES. Sci Rep 2022; 12:2250. [PMID: 35145205 PMCID: PMC8831514 DOI: 10.1038/s41598-022-06333-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/23/2021] [Accepted: 01/25/2022] [Indexed: 12/31/2022]
Abstract
The prevalence of cardiocerebrovascular disease (CVD) is continuously increasing, and it is the leading cause of human death. Since it is difficult for physicians to screen thousands of people, high-accuracy and interpretable methods need to be presented. We developed four machine learning-based CVD classifiers (i.e., multi-layer perceptron, support vector machine, random forest, and light gradient boosting) based on the Korea National Health and Nutrition Examination Survey (KNHANES). We resampled and rebalanced KNHANES data using complex sampling weights such that the rebalanced dataset mimics a uniformly sampled dataset from the overall population. For clear risk factor analysis, we removed multicollinearity and CVD-irrelevant variables using VIF-based filtering and the Boruta algorithm. We applied the synthetic minority oversampling technique and random undersampling before ML training. We demonstrated that the proposed classifiers achieved excellent performance with AUCs over 0.853. Using Shapley value-based risk factor analysis, we identified that the most significant risk factors of CVD were age, sex, and the prevalence of hypertension. Additionally, we identified that age, hypertension, and BMI were positively correlated with CVD prevalence, while sex (female), alcohol consumption, and monthly income were negatively correlated. The results showed that the feature selection and the class balancing technique effectively improve the interpretability of models.
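Entry 11 balances classes with the synthetic minority oversampling technique (SMOTE) followed by random undersampling. The sketch below illustrates the core idea only and is not the authors' pipeline: each synthetic point is placed at a random position on the segment between a minority sample and one of its nearest minority neighbours. The function names, the default `k`, and the toy points are all assumptions made for illustration.

```python
import math
import random


def smote_like(minority, n_new, k=3, rng=None):
    """Create n_new synthetic minority points by neighbour interpolation.

    minority: list of equal-length numeric tuples (at least two points).
    For each new point, pick a base sample, pick one of its k nearest
    minority neighbours, and interpolate between the two at a random gap.
    """
    rng = rng or random.Random(0)
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority samples, nearest first.
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(others)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def random_undersample(majority, n_keep, rng=None):
    """Keep a random subset of n_keep majority samples, without replacement."""
    rng = rng or random.Random(0)
    return rng.sample(majority, n_keep)


# Toy usage: grow a 4-point minority class and shrink a 20-point majority class.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
majority = [(float(i), float(i)) for i in range(20)]
balanced_minority = minority + smote_like(minority, 6, k=2, rng=random.Random(42))
balanced_majority = random_undersample(majority, 10, rng=random.Random(42))
print(len(balanced_minority), len(balanced_majority))  # 10 10
```

Resampling of this kind is normally applied to the training split only, so that the test set keeps the original class distribution.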
12
Park D, Jeong E, Kim H, Pyun HW, Kim H, Choi YJ, Kim Y, Jin S, Hong D, Lee DW, Lee SY, Kim MC. Machine Learning-Based Three-Month Outcome Prediction in Acute Ischemic Stroke: A Single Cerebrovascular-Specialty Hospital Study in South Korea. Diagnostics (Basel) 2021; 11:diagnostics11101909. [PMID: 34679606 PMCID: PMC8534707 DOI: 10.3390/diagnostics11101909] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 09/10/2021] [Revised: 10/07/2021] [Accepted: 10/13/2021] [Indexed: 01/02/2023]
Abstract
Background: Functional outcomes after acute ischemic stroke are of great concern to patients and their families, as well as physicians and surgeons who make the clinical decisions. We developed machine learning (ML)-based functional outcome prediction models in acute ischemic stroke. Methods: This retrospective study used a prospective cohort database. A total of 1066 patients with acute ischemic stroke between January 2019 and March 2021 were included. Variables such as demographic factors, stroke-related factors, laboratory findings, and comorbidities were utilized at the time of admission. Five ML algorithms were applied to predict a favorable functional outcome (modified Rankin Scale 0 or 1) at 3 months after stroke onset. Results: Regularized logistic regression showed the best performance with an area under the receiver operating characteristic curve (AUC) of 0.86. Support vector machines represented the second-highest AUC of 0.85 with the highest F1-score of 0.86, and finally, all ML models applied achieved an AUC > 0.8. The National Institute of Health Stroke Scale at admission and age were consistently the top two important variables for generalized logistic regression, random forest, and extreme gradient boosting models. Conclusions: ML-based functional outcome prediction models for acute ischemic stroke were validated and proven to be readily applicable and useful.
Affiliation(s)
- Dougho Park
  - Department of Rehabilitation Medicine, Pohang Stroke and Spine Hospital, Pohang 37659, Korea
- Eunhwan Jeong
  - Department of Neurology, Pohang Stroke and Spine Hospital, Pohang 37659, Korea; (E.J.); (H.K.)
- Haejong Kim
  - Department of Neurology, Pohang Stroke and Spine Hospital, Pohang 37659, Korea; (E.J.); (H.K.)
- Hae Wook Pyun
  - Department of Radiology, Pohang Stroke and Spine Hospital, Pohang 37659, Korea
- Haemin Kim
  - Department of Neurosurgery, Pohang Stroke and Spine Hospital, Pohang 37659, Korea; (H.K.); (Y.-J.C.); (Y.K.); (S.J.); (D.H.); (D.W.L.)
- Yeon-Ju Choi
  - Department of Neurosurgery, Pohang Stroke and Spine Hospital, Pohang 37659, Korea; (H.K.); (Y.-J.C.); (Y.K.); (S.J.); (D.H.); (D.W.L.)
- Youngsoo Kim
  - Department of Neurosurgery, Pohang Stroke and Spine Hospital, Pohang 37659, Korea; (H.K.); (Y.-J.C.); (Y.K.); (S.J.); (D.H.); (D.W.L.)
- Suntak Jin
  - Department of Neurosurgery, Pohang Stroke and Spine Hospital, Pohang 37659, Korea; (H.K.); (Y.-J.C.); (Y.K.); (S.J.); (D.H.); (D.W.L.)
- Daeyoung Hong
  - Department of Neurosurgery, Pohang Stroke and Spine Hospital, Pohang 37659, Korea; (H.K.); (Y.-J.C.); (Y.K.); (S.J.); (D.H.); (D.W.L.)
- Dong Woo Lee
  - Department of Neurosurgery, Pohang Stroke and Spine Hospital, Pohang 37659, Korea; (H.K.); (Y.-J.C.); (Y.K.); (S.J.); (D.H.); (D.W.L.)
- Su Yun Lee
  - Department of Neurology, Pohang Stroke and Spine Hospital, Pohang 37659, Korea; (E.J.); (H.K.)
  - Correspondence: (S.Y.L.); (M.-C.K.)
- Mun-Chul Kim
  - Department of Neurosurgery, Pohang Stroke and Spine Hospital, Pohang 37659, Korea; (H.K.); (Y.-J.C.); (Y.K.); (S.J.); (D.H.); (D.W.L.)
  - Correspondence: (S.Y.L.); (M.-C.K.)
13
Improved Heart Disease Prediction Using Particle Swarm Optimization Based Stacked Sparse Autoencoder. ELECTRONICS 2021. [DOI: 10.3390/electronics10192347] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 12/14/2022]
Abstract
Heart disease is the leading cause of death globally. The most common type of heart disease is coronary heart disease, which occurs when there is a build-up of plaque inside the arteries that supply blood to the heart, making blood circulation difficult. The prediction of heart disease is a challenge in clinical machine learning. Early detection of people at risk of the disease is vital in preventing its progression. This paper proposes a deep learning approach to achieve improved prediction of heart disease. An enhanced stacked sparse autoencoder network (SSAE) is developed to achieve efficient feature learning. The network consists of multiple sparse autoencoders and a softmax classifier. Additionally, in deep learning models, the algorithm’s parameters need to be optimized appropriately to obtain efficient performance. Hence, we propose a particle swarm optimization (PSO) based technique to tune the parameters of the stacked sparse autoencoder. The optimization by the PSO improves the feature learning and classification performance of the SSAE. Meanwhile, the multilayer architecture of autoencoders usually leads to internal covariate shift, a problem that affects the generalization ability of the network; hence, batch normalization is introduced to prevent this problem. The experimental results show that the proposed method effectively predicts heart disease by obtaining a classification accuracy of 0.973 and 0.961 on the Framingham and Cleveland heart disease datasets, respectively, thereby outperforming other machine learning methods and similar studies.
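Entry 13 uses particle swarm optimization (PSO) to tune the parameters of a stacked sparse autoencoder. The sketch below shows a generic, textbook-style PSO loop rather than the paper's method: the inertia and acceleration values (`w`, `c1`, `c2`), the swarm size, and the toy objective are assumptions, and in practice the objective would be something like a validation loss evaluated for a candidate parameter setting.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective(position) over box bounds with a basic PSO.

    bounds: list of (low, high) pairs, one per dimension.
    Returns (best_position, best_value) found by the swarm.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocity.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia plus pulls toward personal best and global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Toy objective standing in for a validation loss; minimum 0 at the origin."""
    return sum(v * v for v in x)


best_pos, best_val = pso_minimize(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
print(best_pos, best_val)
```

Because each objective evaluation would correspond to training and validating a network, the number of particles and iterations is the main cost lever in a real tuning run.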
14
Prognostic Validity of Statistical Prediction Methods Used for Talent Identification in Youth Tennis Players Based on Motor Abilities. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app11157051] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/20/2022]
Abstract
(1) Background: The search for talented young athletes is an important element of top-class sport. While performance profiles and suitable test tasks for talent identification have already been extensively investigated, there are few studies on statistical prediction methods for talent identification. Therefore, this long-term study examined the prognostic validity of four talent prediction methods. (2) Methods: Tennis players (N = 174; n♀ = 62 and n♂ = 112) at the age of eight years (U9) were examined using five physical fitness tests and four motor competence tests. Based on the test results, four predictions regarding the individual future performance were made for each participant using a linear recommendation score, a logistic regression, a discriminant analysis, and a neural network. These forecasts were then compared with the athletes’ achieved performance success at least four years later (U13‒U18). (3) Results: All four prediction methods showed a medium-to-high prognostic validity with respect to their forecasts. Their values of relative improvement over chance ranged from 0.447 (logistic regression) to 0.654 (tennis recommendation score). (4) Conclusions: However, the best results are only obtained by combining the non-linear method (neural network) with one of the linear methods. Nevertheless, 18.75% of later high-performance tennis players could not be predicted using any of the methods.