1
Lee YS, Han S, Lee YE, Cho J, Choi YK, Yoon SY, Oh DK, Lee SY, Park MH, Lim CM, Moon JY. Development and validation of an interpretable model for predicting sepsis mortality across care settings. Sci Rep 2024; 14:13637. [PMID: 38871785 DOI: 10.1038/s41598-024-64463-0] [Received: 01/18/2024] [Accepted: 06/10/2024] [Indexed: 06/15/2024] Open
Abstract
Numerous prognostic models exist for evaluating mortality risk, but current scoring models might not fully meet the needs of sepsis patients. This study developed and validated a new model for sepsis patients that is suitable for any care setting and accurately forecasts 28-day mortality. The derivation dataset was gathered from 20 hospitals between September 2019 and December 2021, and the validation dataset was collected from 15 hospitals between January 2022 and December 2022. The derivation dataset comprised 7436 patients, and the validation dataset comprised 2284 patients. The point system model emerged as the optimal model among the tested predictive models for predicting sepsis mortality. For community-acquired sepsis, the model's performance was satisfactory (derivation dataset AUC: 0.779, 95% CI 0.765-0.792; validation dataset AUC: 0.787, 95% CI 0.765-0.810). Similarly, for hospital-acquired sepsis, it performed well (derivation dataset AUC: 0.768, 95% CI 0.748-0.788; validation dataset AUC: 0.729, 95% CI 0.687-0.770). The calculator, accessible at https://avonlea76.shinyapps.io/shiny_app_up/ , is user-friendly and compatible. The new predictive model of sepsis mortality is user-friendly and satisfactorily forecasts 28-day mortality. Its versatility lies in its applicability to all patients, encompassing both community-acquired and hospital-acquired sepsis.
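The AUC values quoted in this abstract can be read as the probability that a randomly chosen non-survivor receives a higher risk score than a randomly chosen survivor. The following is a minimal sketch of that pairwise definition, not the authors' code; the function name `auc` and the toy labels and scores are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive case outscores a random negative one.

    Ties count as half a win (the Mann-Whitney formulation of the ROC AUC).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = died within 28 days, 0 = survived.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
# Three of the four positive/negative pairs are ordered correctly.
toy_auc = auc(toy_labels, toy_scores)
```

The quadratic pairwise loop is fine for a sketch; for cohorts of several thousand patients a rank-based computation would normally be preferred.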
Affiliation(s)
- Young Seok Lee
  - Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Internal Medicine, Korea University Guro Hospital, Seoul, Republic of Korea
- Seungbong Han
  - Department of Biostatistics, Korea University College of Medicine, Seoul, Republic of Korea
- Ye Eun Lee
  - Department of Biostatistics, Korea University College of Medicine, Seoul, Republic of Korea
- Jaehwa Cho
  - Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul, Republic of Korea
- Young Kyun Choi
  - Division of Infectious Disease and Critical Care Medicine, Department of Internal Medicine, Chungnam National University College of Medicine, Chungnam National University Sejong Hospital, Sejong, Republic of Korea
- Sun-Young Yoon
  - Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Internal Medicine, Chungnam National University College of Medicine, Chungnam National University Sejong Hospital, Sejong, Republic of Korea
- Dong Kyu Oh
  - Department of Pulmonary and Critical Care Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Su Yeon Lee
  - Department of Pulmonary and Critical Care Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Mi Hyeon Park
  - Department of Pulmonary and Critical Care Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Chae-Man Lim
  - Department of Pulmonary and Critical Care Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Jae Young Moon
  - Division of Pulmonary, Allergy, and Critical Care Medicine, Department of Internal Medicine, Chungnam National University College of Medicine, Chungnam National University Sejong Hospital, Sejong, Republic of Korea

2
Çubukçu HC, Topcu Dİ, Yenice S. Machine learning-based clinical decision support using laboratory data. Clin Chem Lab Med 2024; 62:793-823. [PMID: 38015744 DOI: 10.1515/cclm-2023-1037] [Received: 09/15/2023] [Accepted: 11/17/2023] [Indexed: 11/30/2023]
Abstract
Artificial intelligence (AI) and machine learning (ML) are becoming vital in laboratory medicine and the broader context of healthcare. In this review article, we summarized the development of ML models and how they contribute to clinical laboratory workflow and improve patient outcomes. The process of ML model development involves data collection, data cleansing, feature engineering, model development, and optimization. These models, once finalized, are subjected to thorough performance assessments and validations. Recently, due to the complexity inherent in model development, automated ML tools were also introduced to streamline the process, enabling non-experts to create models. Clinical Decision Support Systems (CDSS) use ML techniques on large datasets to aid healthcare professionals in test result interpretation. They are revolutionizing laboratory medicine, enabling labs to work more efficiently with less human supervision across pre-analytical, analytical, and post-analytical phases. Despite contributions of the ML tools at all analytical phases, their integration presents challenges like potential model uncertainties, black-box algorithms, and deskilling of professionals. Additionally, acquiring diverse datasets is hard, and models' complexity can limit clinical use. In conclusion, ML-based CDSS in healthcare can greatly enhance clinical decision-making. However, successful adoption demands collaboration among professionals and stakeholders, utilizing hybrid intelligence, external validation, and performance assessments.
Affiliation(s)
- Hikmet Can Çubukçu
  - General Directorate of Health Services, Rare Diseases Department, Turkish Ministry of Health, Ankara, Türkiye
  - Hacettepe University Institute of Informatics, Ankara, Türkiye
- Deniz İlhan Topcu
  - Health Sciences University İzmir Tepecik Education and Research Hospital, Medical Biochemistry, İzmir, Türkiye
- Sedef Yenice
  - Florence Nightingale Hospital, Istanbul, Türkiye

3
Yu Y, Wang L, Hou W, Xue Y, Liu X, Li Y. Identification and validation of aging-related genes in heart failure based on multiple machine learning algorithms. Front Immunol 2024; 15:1367235. [PMID: 38686376 PMCID: PMC11056574 DOI: 10.3389/fimmu.2024.1367235] [Received: 01/08/2024] [Accepted: 04/03/2024] [Indexed: 05/02/2024] Open
Abstract
Background In the face of continued growth in the elderly population, the need to understand and combat age-related cardiac decline becomes even more urgent, requiring us to uncover new pathological and cardioprotective pathways. Methods We obtained the aging-related genes of heart failure through WGCNA and the CellAge database. We elucidated the biological functions and signaling pathways involved in heart failure and aging through GO and KEGG enrichment analysis. We used three machine learning algorithms (LASSO, RF, and SVM-RFE) to further screen the aging-related genes of heart failure, and fitted and verified them with a variety of machine learning algorithms. We searched for drugs to treat age-related heart failure through the DSigDB database. Finally, we used CIBERSORT to complete immune infiltration analysis of aging samples. Results We obtained 57 up-regulated and 195 down-regulated aging-related genes in heart failure through WGCNA and the CellAge database. GO and KEGG enrichment analysis showed that aging-related genes are mainly involved in mechanisms such as cellular senescence and the cell cycle. We further screened aging-related genes through machine learning and obtained 14 key genes. We verified the results on the test set and 2 external validation sets using 15 machine learning algorithm models and 207 combinations, and the highest accuracy was 0.911. Through screening of the DSigDB database, we believe that rimonabant and lovastatin have the potential to delay aging and protect the heart. The results of immune infiltration analysis showed that there were significant differences between Macrophages M2 and T cells CD8 in aging myocardium. Conclusion We identified aging signature genes and potential therapeutic drugs for heart failure through bioinformatics and multiple machine learning algorithms, providing new ideas for studying the mechanism and treatment of age-related cardiac decline.
Affiliation(s)
- Yiding Yu
  - Shandong University of Traditional Chinese Medicine, Jinan, China
- Lin Wang
  - Shandong University of Traditional Chinese Medicine, Jinan, China
- Wangjun Hou
  - Shandong University of Traditional Chinese Medicine, Jinan, China
- Yitao Xue
  - Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Jinan, China
- Xiujuan Liu
  - Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Jinan, China
- Yan Li
  - Affiliated Hospital of Shandong University of Traditional Chinese Medicine, Jinan, China

4
Cai Y, Cai YQ, Tang LY, Wang YH, Gong M, Jing TC, Li HJ, Li-Ling J, Hu W, Yin Z, Gong DX, Zhang GW. Artificial intelligence in the risk prediction models of cardiovascular disease and development of an independent validation screening tool: a systematic review. BMC Med 2024; 22:56. [PMID: 38317226 PMCID: PMC10845808 DOI: 10.1186/s12916-024-03273-7] [Received: 07/16/2023] [Accepted: 01/23/2024] [Indexed: 02/07/2024] Open
Abstract
BACKGROUND A comprehensive overview of artificial intelligence (AI) for cardiovascular disease (CVD) prediction and a screening tool of AI models (AI-Ms) for independent external validation are lacking. This systematic review aims to identify, describe, and appraise AI-Ms of CVD prediction in the general and special populations and develop a new independent validation score (IVS) for AI-Ms replicability evaluation. METHODS PubMed, Web of Science, Embase, and IEEE library were searched up to July 2021. Data extraction and analysis were performed for the populations, distribution, predictors, algorithms, etc. The risk of bias was evaluated with the prediction risk of bias assessment tool (PROBAST). Subsequently, we designed IVS for model replicability evaluation with five steps in five items, including transparency of algorithms, performance of models, feasibility of reproduction, risk of reproduction, and clinical implication, respectively. The review is registered in PROSPERO (No. CRD42021271789). RESULTS In 20,887 screened references, 79 articles (82.5% in 2017-2021) were included, which contained 114 datasets (67 in Europe and North America, but 0 in Africa). We identified 486 AI-Ms, of which the majority were in development (n = 380), but none of them had undergone independent external validation. A total of 66 idiographic algorithms were found; however, 36.4% were used only once and only 39.4% over three times. A large number of different predictors (range 5-52,000, median 21) and large-span sample size (range 80-3,660,000, median 4466) were observed. All models were at high risk of bias according to PROBAST, primarily due to the incorrect use of statistical methods. IVS analysis confirmed only 10 models as "recommended"; however, 281 and 187 were "not recommended" and "warning," respectively. 
CONCLUSION AI has led the digital revolution in the field of CVD prediction, but the field is still at an early stage of development owing to defects in research design, reporting, and evaluation systems. The IVS we developed may contribute to independent external validation and the development of this field.
Affiliation(s)
- Yue Cai
  - China Medical University, Shenyang, 110122, China
- Yu-Qing Cai
  - China Medical University, Shenyang, 110122, China
- Li-Ying Tang
  - China Medical University, Shenyang, 110122, China
- Yi-Han Wang
  - China Medical University, Shenyang, 110122, China
- Mengchun Gong
  - Digital Health China Co. Ltd, Beijing, 100089, China
- Tian-Ci Jing
  - Smart Hospital Management Department, the First Hospital of China Medical University, Shenyang, 110001, China
- Hui-Jun Li
  - Shenyang Medical & Film Science and Technology Co. Ltd., Shenyang, 110001, China
  - Enduring Medicine Smart Innovation Research Institute, Shenyang, 110001, China
- Jesse Li-Ling
  - Institute of Genetic Medicine, School of Life Science, State Key Laboratory of Biotherapy, Sichuan University, Chengdu, 610065, China
- Wei Hu
  - Bayi Orthopedic Hospital, Chengdu, 610017, China
- Zhihua Yin
  - Department of Epidemiology, School of Public Health, China Medical University, Shenyang, 110122, China
- Da-Xin Gong
  - Smart Hospital Management Department, the First Hospital of China Medical University, Shenyang, 110001, China
  - The Internet Hospital Branch of the Chinese Research Hospital Association, Beijing, 100006, China
- Guang-Wei Zhang
  - Smart Hospital Management Department, the First Hospital of China Medical University, Shenyang, 110001, China
  - The Internet Hospital Branch of the Chinese Research Hospital Association, Beijing, 100006, China

5
Lim J, Li J, Feng X, Feng L, Xia Y, Xiao X, Wang Y, Xu Z. Machine learning classification of polycystic ovary syndrome based on radial pulse wave analysis. BMC Complement Med Ther 2023; 23:409. [PMID: 37957660 PMCID: PMC10644435 DOI: 10.1186/s12906-023-04249-5] [Received: 03/21/2023] [Accepted: 11/07/2023] [Indexed: 11/15/2023] Open
Abstract
BACKGROUND Patients with polycystic ovary syndrome (PCOS) experience endocrine disorders that may be accompanied by changes in vascular function. This study aimed to classify and predict PCOS by radial pulse wave parameters using machine learning (ML) methods and to provide evidence for objectifying pulse diagnosis in traditional Chinese medicine (TCM). METHODS This case-control study included 459 subjects divided into a PCOS group and a healthy (non-PCOS) group. The pulse wave parameters were measured and compared between the two groups. Seven supervised ML classification models were applied, including K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Decision Trees, Random Forest, Logistic Regression, Voting, and Long Short Term Memory networks (LSTM). Parameters that were significantly different were selected as input features, and the models were trained with stratified k-fold cross-validation. RESULTS There were 316 subjects in the PCOS group and 143 subjects in the healthy group. Compared to the healthy group, the pulse wave parameters h3/h1 and w/t from both left and right sides were increased while h4, t4, t, As, h4/h1 from both sides and right t1 were decreased in the PCOS group (P < 0.01). Among the ML models evaluated, the Voting and LSTM models, both with ensemble learning capabilities, demonstrated competitive performance. These models achieved the highest results across all evaluation metrics. Specifically, they both attained a testing accuracy of 72.174% and an F1 score of 0.818; their respective AUC values were 0.715 for Voting and 0.722 for LSTM. CONCLUSION Radial pulse wave signals could identify most PCOS patients accurately (with a good F1 score) and are valuable for early detection and monitoring of PCOS with acceptable overall accuracy. This technique can stimulate the development of individualized PCOS risk assessment using mobile detection technology and gives physicians an intuitive understanding of the objective pulse diagnosis of TCM.
TRIAL REGISTRATION Not applicable.
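Stratified k-fold cross-validation, as mentioned in the methods, keeps the ratio of PCOS to healthy subjects roughly constant in every fold. The sketch below shows one simple way to build such folds with the standard library only; it is an illustration rather than the study's pipeline. The function name `stratified_folds`, the seed, and the choice of k = 5 are assumptions, since the abstract does not state the number of folds. Only the group sizes (316 and 143) come from the abstract.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def stratified_folds(labels: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds with similar class proportions."""
    by_class: Dict[int, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out round-robin so every fold gets its share.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Group sizes from the abstract: 316 PCOS (label 1) and 143 healthy (label 0).
labels = [1] * 316 + [0] * 143
folds = stratified_folds(labels, k=5)  # k = 5 is an assumption
```

Each fold then serves once as the held-out set while the remaining folds are used for training, so every subject is tested exactly once.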
Affiliation(s)
- Jiekee Lim
  - School of Traditional Chinese Medicine, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, P. R. China
- Jieyun Li
  - School of Traditional Chinese Medicine, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, P. R. China
- Xiao Feng
  - The First Affiliated Hospital, Guangzhou University of Traditional Chinese Medicine, Guangzhou, 510405, P. R. China
- Lu Feng
  - School of Traditional Chinese Medicine, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, P. R. China
- Yumo Xia
  - School of Traditional Chinese Medicine, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, P. R. China
- Xinang Xiao
  - School of Traditional Chinese Medicine, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, P. R. China
- Yiqin Wang
  - School of Traditional Chinese Medicine, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, P. R. China
  - Shanghai Key Laboratory of Health Identification and Assessment, Shanghai, 201203, P. R. China
- Zhaoxia Xu
  - School of Traditional Chinese Medicine, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, P. R. China
  - Shanghai Key Laboratory of Health Identification and Assessment, Shanghai, 201203, P. R. China

6
Li S, Zhu P, Cai G, Li J, Huang T, Tang W. Application of machine learning models in predicting insomnia severity: an integrative approach with constitution of traditional Chinese medicine. Front Med (Lausanne) 2023; 10:1292761. [PMID: 37928471 PMCID: PMC10625410 DOI: 10.3389/fmed.2023.1292761] [Received: 09/12/2023] [Accepted: 10/06/2023] [Indexed: 11/07/2023] Open
Abstract
Objective This study sought to explore the utility of machine learning models in predicting insomnia severity based on Traditional Chinese Medicine (TCM) constitution classifications, with the aim of discussing the potential applications of such models in the treatment and prevention of insomnia. Methods We analyzed a dataset of 165 insomnia patients from the Shanghai Minhang District Integrated Traditional Chinese and Western Medicine Hospital. TCM constitution was assessed using a standardized Constitution in Chinese Medicine (CCM) scale. Sleep quality, or insomnia severity, was evaluated using the Spiegel Sleep Questionnaire (SSQ). Machine learning models, including Random Forest Classifier (RFC), Support Vector Classifier (SVC), and K-Nearest Neighbors (KNN), were utilized. These models were optimized using the Grid Search algorithm and were trained and tested on stratified patient data, with the TCM constitution classifications serving as primary predictors. Results The RFC outperformed the others, achieving a weighted average accuracy, precision, recall, and F1-score of 0.91, 0.94, 0.92, and 0.92, respectively. It also effectively classified the severity of insomnia with high area under the receiver operating characteristic curve (AUC-ROC) values. Feature importance analysis demonstrated the Damp-heat constitution as the most influential predictor, followed by Yang-deficiency, Qi-depression, Qi-deficiency, and Blood-stasis constitutions. Conclusion The results demonstrate the utility of machine learning, specifically RFC, coupled with TCM constitution classifications in predicting insomnia severity. Notably, constitution classifications such as Damp-heat and Yang-deficiency emerged as crucial determinants, emphasizing their potential in guiding targeted insomnia treatments. This approach enables the development of more personalized and efficient interventions, thereby enhancing patient outcomes.
Affiliation(s)
- Shenguang Li
  - Shanghai Minhang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai, China
- Po Zhu
  - Shanghai Minhang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai, China
- Guoying Cai
  - Shanghai Minhang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai, China
- Jing Li
  - Shanghai Minhang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai, China
- Tao Huang
  - Yueyang Hospital of Integrated Traditional Chinese and Western Medicine Affiliated to Shanghai University of Traditional Chinese Medicine, Shanghai, China
- Wenchao Tang
  - School of Acupuncture-Moxibustion and Tuina, Shanghai University of Traditional Chinese Medicine, Shanghai, China

7
Gutiérrez-Esparza G, Pulido T, Martínez-García M, Ramírez-delReal T, Groves-Miralrio LE, Márquez-Murillo MF, Amezcua-Guerra LM, Vargas-Alarcón G, Hernández-Lemus E. A machine learning approach to personalized predictors of dyslipidemia: a cohort study. Front Public Health 2023; 11:1213926. [PMID: 37799151 PMCID: PMC10548235 DOI: 10.3389/fpubh.2023.1213926] [Received: 04/28/2023] [Accepted: 08/23/2023] [Indexed: 10/07/2023] Open
Abstract
Introduction Mexico ranks second in the global prevalence of obesity in the adult population, which increases the probability of developing dyslipidemia. Dyslipidemia is closely related to cardiovascular diseases, which are the leading cause of death in the country. Therefore, developing tools that facilitate the prediction of dyslipidemias is essential for prevention and early treatment. Methods In this study, we utilized a dataset from a Mexico City cohort consisting of 2,621 participants, men and women aged between 20 and 50 years, with and without some type of dyslipidemia. Our primary objective was to identify potential factors associated with different types of dyslipidemia in both men and women. Machine learning algorithms were employed to achieve this goal. To facilitate feature selection, we applied the Variable Importance Measures (VIM) of Random Forest (RF), XGBoost, and Gradient Boosting Machine (GBM). Additionally, to address class imbalance, we employed the Synthetic Minority Over-sampling Technique (SMOTE) for dataset resampling. The dataset encompassed anthropometric measurements, biochemical tests, dietary intake, family health history, and other health parameters, including smoking habits, alcohol consumption, quality of sleep, and physical activity. Results Our results revealed that the VIM algorithm of RF yielded the best subset of attributes, closely followed by GBM, achieving a balanced accuracy of up to 80%. The selection of the best subset of attributes was based on the comparative performance of classifiers, evaluated through balanced accuracy, sensitivity, and specificity metrics. Discussion The top five features contributing to an increased risk of various types of dyslipidemia were identified through the machine learning technique. These features include body mass index, elevated uric acid levels, age, sleep disorders, and anxiety.
The findings of this study shed light on significant factors that play a role in dyslipidemia development, aiding in the early identification, prevention, and treatment of this condition.
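Balanced accuracy, the headline metric in this abstract, is the mean of sensitivity and specificity, which makes it less flattering than plain accuracy when one class dominates. The sketch below illustrates the difference. The function names and the confusion-matrix counts are hypothetical and are not taken from the study.

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def plain_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Share of all cases classified correctly, regardless of class."""
    return (tp + tn) / (tp + fn + tn + fp)


# Hypothetical imbalanced example: 10 positives, 90 negatives.
# The classifier finds only half of the positives but never raises a false alarm.
counts = dict(tp=5, fn=5, tn=90, fp=0)
acc = plain_accuracy(**counts)      # dominated by the large negative class
bal = balanced_accuracy(**counts)   # penalizes the missed positives
```

On these counts plain accuracy looks strong while balanced accuracy exposes the weak sensitivity, which is one reason the metric is often paired with resampling methods such as SMOTE.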
Affiliation(s)
- Guadalupe Gutiérrez-Esparza
  - Researcher for Mexico CONAHCYT, National Council of Humanities, Sciences and Technologies, Mexico City, Mexico
  - Clinical Research, National Institute of Cardiology “Ignacio Chávez”, Mexico City, Mexico
- Tomas Pulido
  - Clinical Research, National Institute of Cardiology “Ignacio Chávez”, Mexico City, Mexico
- Mireya Martínez-García
  - Department of Immunology, National Institute of Cardiology “Ignacio Chávez”, Mexico City, Mexico
- Tania Ramírez-delReal
  - Researcher for Mexico CONAHCYT, National Council of Humanities, Sciences and Technologies, Mexico City, Mexico
  - Center for Research in Geospatial Information Sciences, Aguascalientes, Mexico
- Manlio F. Márquez-Murillo
  - Department of Electrocardiology, National Institute of Cardiology “Ignacio Chávez”, Mexico City, Mexico
- Luis M. Amezcua-Guerra
  - Department of Immunology, National Institute of Cardiology “Ignacio Chávez”, Mexico City, Mexico
- Gilberto Vargas-Alarcón
  - Department of Molecular Biology and Endocrinology, National Institute of Cardiology “Ignacio Chávez”, Mexico City, Mexico
- Enrique Hernández-Lemus
  - Computational Genomics Division, National Institute of Genomic Medicine, Mexico City, Mexico
  - Center for Complexity Sciences, Universidad Nacional Autónoma de México, Mexico City, Mexico

8
Guo S, Ge JX, Liu SN, Zhou JY, Li C, Chen HJ, Chen L, Shen YQ, Zhou QL. Development of a convenient and effective hypertension risk prediction model and exploration of the relationship between Serum Ferritin and Hypertension Risk: a study based on NHANES 2017-March 2020. Front Cardiovasc Med 2023; 10:1224795. [PMID: 37736023 PMCID: PMC10510409 DOI: 10.3389/fcvm.2023.1224795] [Received: 05/18/2023] [Accepted: 07/28/2023] [Indexed: 09/23/2023] Open
Abstract
Background Hypertension is a major public health problem, and the cardiovascular diseases it causes are the leading cause of death worldwide. In this study, we constructed a convenient and high-performance hypertension risk prediction model to assist in clinical diagnosis and explore other important influencing factors. Methods We included 8,073 people from NHANES (2017-March 2020), using their 120 features to form the original dataset. After data pre-processing, we removed several redundant features through LASSO regression and correlation analysis. Thirteen commonly used machine learning methods were used to construct prediction models, and the better-performing methods were then coupled with recursive feature elimination to determine the optimal feature subset. After data balancing through SMOTE, we integrated these better-performing learners to construct a stacking-based fusion model for predicting hypertension risk. In addition, to explore the relationship between serum ferritin and the risk of hypertension, we performed a univariate analysis, divided serum ferritin levels into four groups (Q1 to Q4) by quartiles with the lowest group (Q1) as the reference, and performed multiple logistic regression analysis and trend analysis. Results The optimal feature subset comprised age, BMI, waist, SBP, DBP, Cre, UACR, serum ferritin, HbA1C, and a doctor's recommendation to reduce salt intake. Compared to other machine learning models, the constructed fusion model showed better predictive performance, with precision, accuracy, recall, F1 value, and AUC of 0.871, 0.873, 0.871, 0.869, and 0.966, respectively. For the analysis of the relationship between serum ferritin and hypertension, after controlling for all covariates, the ORs (95% CI) for Q2 to Q4, compared to Q1, were 1.396 (1.176-1.658), 1.499 (1.254-1.791), and 1.645 (1.360-1.989), respectively, with P < 0.01 and P for trend <0.001.
Conclusion The hypertension risk prediction model developed in this study is efficient in predicting hypertension with only 10 low-cost and easily accessible features, which is cost-effective in assisting clinical diagnosis. We also found a trend correlation between serum ferritin levels and the risk of hypertension.
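The quartile comparison in this abstract reports odds ratios with 95% confidence intervals relative to the lowest serum ferritin group. As a rough illustration of how such an interval is formed, the sketch below computes an unadjusted odds ratio and a Wald interval from a 2x2 table. This is a simplification and an assumption on our part: the study's estimates come from multiple logistic regression adjusted for covariates, which a 2x2 table cannot reproduce. The function name `odds_ratio_ci` and the counts are hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a = cases in the exposed group (e.g. a higher quartile)
    b = non-cases in the exposed group
    c = cases in the reference group (e.g. Q1)
    d = non-cases in the reference group
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from NHANES.
example_or, example_lower, example_upper = odds_ratio_ci(a=20, b=80, c=10, d=90)
```

An interval that excludes 1, as all three reported intervals do, indicates an association at the corresponding significance level; the interval is symmetric around the odds ratio on the log scale, not on the raw scale.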
Affiliation(s)
- Shuang Guo
  - Information Center, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, China
- Jiu-Xin Ge
  - Department of Cardiology, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, China
- Shan-Na Liu
  - Information Center, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, China
- Jia-Yu Zhou
  - Xinjiang Second Medical College, Karamay, China
- Chang Li
  - Information Center, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, China
- Han-Jie Chen
  - Information Center, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, China
- Li Chen
  - Information Center, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, China
- Yu-Qiang Shen
  - Information Center, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, China
- Qing-Li Zhou
  - Information Center, The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, China

9
Miranda E, Adiarto S, Bhatti FM, Zakiyyah AY, Aryuni M, Bernando C. Understanding Arteriosclerotic Heart Disease Patients Using Electronic Health Records: A Machine Learning and Shapley Additive exPlanations Approach. Healthc Inform Res 2023; 29:228-238. [PMID: 37591678 PMCID: PMC10440196 DOI: 10.4258/hir.2023.29.3.228] [Received: 02/06/2023] [Revised: 07/21/2023] [Accepted: 07/21/2023] [Indexed: 08/19/2023] Open
Abstract
OBJECTIVES The number of deaths from cardiovascular disease is projected to reach 23.3 million by 2030. As a contribution to preventing this phenomenon, this paper proposed a machine learning (ML) model to predict patients with arteriosclerotic heart disease (AHD). We also interpreted the prediction model results based on the ML approach and deployed model-agnostic ML methods to identify informative features and their interpretations. METHODS We used a hematology Electronic Health Record (EHR) with information on erythrocytes, hematocrit, hemoglobin, mean corpuscular hemoglobin, mean corpuscular hemoglobin concentration, leukocytes, thrombocytes, age, and sex. To detect and predict AHD, we explored random forest (RF), XGBoost, and AdaBoost models. We examined the prediction model results based on the confusion matrix and accuracy measures. We used the Shapley Additive exPlanations (SHAP) framework to interpret the ML model and quantify the contribution of features to predictions. RESULTS Our study included data from 6,837 patients, with 4,702 records from patients diagnosed with AHD and 2,135 records from patients without an AHD diagnosis. AdaBoost outperformed RF and XGBoost, achieving an accuracy of 0.78, precision of 0.82, F1-score of 0.85, and recall of 0.88. According to the SHAP summary bar plot method, hemoglobin was the most important attribute for detecting and predicting AHD patients. The SHAP local interpretability bar plot revealed that hemoglobin and mean corpuscular hemoglobin concentration had positive impacts on AHD prediction based on a single observation. CONCLUSIONS ML models based on real clinical data can be used to predict AHD.
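The reported metrics can be cross-checked, because the F1-score is the harmonic mean of precision and recall. The short sketch below performs that check using only the precision (0.82) and recall (0.88) quoted in the abstract. The function name `f1_score` is ours and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for AdaBoost in the abstract.
reported_f1 = f1_score(precision=0.82, recall=0.88)
```

Rounded to two decimals the result agrees with the F1-score of 0.85 stated in the abstract, so the three figures are mutually consistent.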
Collapse
Affiliation(s)
- Eka Miranda
- Department of Information Systems, School of Information Systems, Bina Nusantara University, Jakarta, Indonesia
| | - Suko Adiarto
- Department of Cardiology and Vascular Medicine, Faculty of Medicine, Universitas Indonesia/National Cardiovascular Center Harapan Kita, Jakarta, Indonesia
| | - Faqir M. Bhatti
- Riphah Institute of Computing and Applied Sciences, Riphah International University, Raiwind, Lahore, Pakistan
| | - Alfi Yusrotis Zakiyyah
- Mathematics Department, School of Computer Science, Bina Nusantara University, Jakarta, Indonesia
| | - Mediana Aryuni
- Department of Information Systems, School of Information Systems, Bina Nusantara University, Jakarta, Indonesia
| | - Charles Bernando
- Department of Information Systems, School of Information Systems, Bina Nusantara University, Jakarta, Indonesia
| |
|
10
|
Zhang X, Gavaldà R, Baixeries J. Interpretable prediction of mortality in liver transplant recipients based on machine learning. Comput Biol Med 2022; 151:106188. [PMID: 36306583 DOI: 10.1016/j.compbiomed.2022.106188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2022] [Revised: 09/24/2022] [Accepted: 10/08/2022] [Indexed: 12/27/2022]
Abstract
BACKGROUND Accurate prediction of the mortality of post-liver transplantation is an important but challenging task. It relates to optimizing organ allocation and estimating the risk of possible dysfunction. Existing risk scoring models, such as the Balance of Risk (BAR) score and the Survival Outcomes Following Liver Transplantation (SOFT) score, do not predict the mortality of post-liver transplantation with sufficient accuracy. In this study, we evaluate the performance of machine learning models and establish an explainable machine learning model for predicting mortality in liver transplant recipients. METHOD The optimal feature set for the prediction of the mortality was selected by a wrapper method based on binary particle swarm optimization (BPSO). With the selected optimal feature set, seven machine learning models were applied to predict mortality over different time windows. The best-performing model was used to predict mortality through a comprehensive comparison and evaluation. An interpretable approach based on machine learning and SHapley Additive exPlanations (SHAP) is used to explicitly explain the model's decision and make new discoveries. RESULTS With regard to predictive power, our results demonstrated that the feature set selected by BPSO outperformed both the feature set in the existing risk score model (BAR score, SOFT score) and the feature set processed by principal component analysis (PCA). The best-performing model, extreme gradient boosting (XGBoost), was found to improve the Area Under a Curve (AUC) values for mortality prediction by 6.7%, 11.6%, and 17.4% at 3 months, 3 years, and 10 years, respectively, compared to the SOFT score. The main predictors of mortality and their impact were discussed for different age groups and different follow-up periods. CONCLUSIONS Our analysis demonstrates that XGBoost can be an ideal method to assess the mortality risk in liver transplantation. 
In combination with the SHAP approach, the proposed framework provides a more intuitive and comprehensive interpretation of the predictive model, thereby allowing the clinician to better understand the decision-making process of the model and the impact of factors associated with mortality risk in liver transplantation.
Affiliation(s)
- Xiao Zhang
- Department of Computer Science, Universitat Politècnica de Catalunya, Barcelona, 08034, Spain.
| | | | - Jaume Baixeries
- Department of Computer Science, Universitat Politècnica de Catalunya, Barcelona, 08034, Spain
| |
|
11
|
Ghorashi SM, Fazeli A, Hedayat B, Mokhtari H, Jalali A, Ahmadi P, Chalian H, Bragazzi NL, Shirani S, Omidi N. Comparison of conventional scoring systems to machine learning models for the prediction of major adverse cardiovascular events in patients undergoing coronary computed tomography angiography. Front Cardiovasc Med 2022; 9:994483. [PMID: 36386332 PMCID: PMC9643500 DOI: 10.3389/fcvm.2022.994483] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/14/2022] [Accepted: 10/05/2022] [Indexed: 08/04/2023] Open
Abstract
BACKGROUND The study aims to compare the prognostic performance of conventional scoring systems to a machine learning (ML) model on coronary computed tomography angiography (CCTA) to discriminate between the patients with and without major adverse cardiovascular events (MACEs) and to find the most important contributing factor of MACE. MATERIALS AND METHODS From November to December 2019, 500 of 1586 CCTA scans were included and analyzed, then six conventional scores were calculated for each participant, and seven ML models were designed. Our study endpoints were all-cause mortality, non-fatal myocardial infarction, late coronary revascularization, and hospitalization for unstable angina or heart failure. Score performance was assessed by area under the curve (AUC) analysis. RESULTS Of 500 patients (mean age: 60 ± 10; 53.8% male subjects) referred for CCTA, 416 patients met the inclusion criteria; 46 patients with early (<90 days) cardiac evaluation (because it could not be clarified whether the assessment was prompted by deterioration of the symptoms or by the CCTA result) and 38 patients with missed follow-up were not enrolled in the final analysis. Forty-six patients (11.0%) developed MACE within 20.5 ± 7.9 months of follow-up. Compared to conventional scores, ML models showed better performance, with one exception: eXtreme Gradient Boosting had lower performance than conventional scoring systems (AUC: 0.824, 95% confidence interval (CI): 0.701-0.947). Among the ML models, random forest, ensemble with generalized linear, and ensemble with naive Bayes were shown to have higher prognostic performance (AUC: 0.92, 95% CI: 0.85-0.99, AUC: 0.90, 95% CI: 0.81-0.98, and AUC: 0.89, 95% CI: 0.82-0.97, respectively). Coronary artery calcium score (CACS) had the highest correlation with MACE. CONCLUSION Compared to the conventional scoring system, ML models using CCTA scans show improved prognostic prediction for MACE. 
Anatomical features were more important than clinical characteristics.
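The AUC values used to compare scores and models here have a simple rank interpretation: the probability that a randomly chosen patient with the event receives a higher score than a randomly chosen patient without it. The sketch below illustrates that definition with hypothetical risk scores; it is not the study's data or code, and the function name `auc_from_scores` is an assumption introduced here.

```python
from itertools import product


def auc_from_scores(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical predicted risks, NOT values from the study.
mace_scores = [0.9, 0.8, 0.4]      # patients who developed MACE
no_mace_scores = [0.7, 0.3, 0.2]   # patients who did not

auc = auc_from_scores(mace_scores, no_mace_scores)
print(f"AUC = {auc:.3f}")
```

This pairwise form is quadratic in the number of patients, which is acceptable for a cohort of a few hundred; larger datasets usually rely on a sort-based implementation.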
Affiliation(s)
| | - Amir Fazeli
- Tehran Heart Center, Tehran University of Medical Science, Tehran, Iran
| | - Behnam Hedayat
- Tehran Heart Center, Tehran University of Medical Science, Tehran, Iran
| | - Hamid Mokhtari
- Biomedical Engineering and Physics Department, School of Medicine, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | - Arash Jalali
- Tehran Heart Center, Tehran University of Medical Science, Tehran, Iran
| | - Pooria Ahmadi
- Tehran Heart Center, Tehran University of Medical Science, Tehran, Iran
| | - Hamid Chalian
- Division of Cardiothoracic Imaging, Department of Radiology, University of Washington, Seattle, WA, United States
| | - Nicola Luigi Bragazzi
- Laboratory for Industrial and Applied Mathematics (LIAM), Department of Mathematics and Statistics, York University, Toronto, ON, Canada
| | - Shapour Shirani
- Department of Cardiovascular Imaging, Tehran Heart Center, Tehran University of Medical Sciences, Tehran, Iran
| | - Negar Omidi
- Department of Cardiovascular Imaging, Tehran Heart Center, Tehran University of Medical Sciences, Tehran, Iran
| |
|
12
|
Wang K, Li Y, Pan J, He H, Zhao Z, Guo Y, Zhang X. Noninvasive diagnosis of AIH/PBC overlap syndrome based on prediction models. Open Med (Wars) 2022; 17:1550-1558. [PMID: 36245703 PMCID: PMC9520330 DOI: 10.1515/med-2022-0526] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/14/2022] [Revised: 06/21/2022] [Accepted: 06/24/2022] [Indexed: 11/15/2022] Open
Abstract
Autoimmune liver diseases (AILDs) are life-threatening chronic liver diseases, mainly including autoimmune hepatitis (AIH), primary biliary cholangitis (PBC), and AIH–PBC overlap syndrome (OS), which are difficult to distinguish clinically at early stages. This study aimed to establish a model for the noninvasive diagnosis of AIH/PBC OS. A total of 201 AILD patients who underwent liver biopsy between January 2011 and December 2020 were included in this retrospective study. Serological factors significantly associated with OS were determined by univariate analysis. Two multivariate models based on these factors were constructed to predict the diagnosis of AIH/PBC OS using logistic regression and random forest analysis. The results showed that immunoglobulins G and M had significant importance in both models. In the logistic regression model, anti-Sp100, anti-Ro-52, anti-SSA, or antinuclear antibody positivity were risk factors for OS. In the random forest model, activated partial thromboplastin time and α-fetoprotein level were important. To distinguish PBC and OS, the sensitivity and specificity of the logistic regression model were 0.889 and 0.727, respectively, and those of the random forest model were 0.944 and 0.818, respectively. In conclusion, we established two noninvasive predictive models for the diagnosis of AIH/PBC OS, and they showed better performance than the Paris criteria for the definition of AIH/PBC OS.
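Sensitivity and specificity, as reported for the two models, are ratios taken from a confusion matrix. The sketch below shows the calculation. The abstract does not give the underlying counts, so the counts used here are hypothetical, chosen only because they reproduce the reported ratios; the function name is likewise an assumption.

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    Here a "positive" is an OS case and a "negative" is a PBC case.
    """
    sensitivity = tp / (tp + fn)  # share of OS cases correctly flagged
    specificity = tn / (tn + fp)  # share of PBC cases correctly cleared
    return sensitivity, specificity


# HYPOTHETICAL counts (18 OS and 11 PBC cases); not stated in the abstract.
lr_sens, lr_spec = sensitivity_specificity(tp=16, fn=2, tn=8, fp=3)
rf_sens, rf_spec = sensitivity_specificity(tp=17, fn=1, tn=9, fp=2)

print(f"logistic regression: {lr_sens:.3f} / {lr_spec:.3f}")
print(f"random forest:       {rf_sens:.3f} / {rf_spec:.3f}")
```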
Affiliation(s)
- Kailing Wang
- Department of Gastroenterology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Yong Li
- Department of Gastroenterology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Jianfeng Pan
- Department of Gastroenterology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Huifang He
- Department of Gastroenterology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Ziyi Zhao
- Department of Gastroenterology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Yiming Guo
- Department of Gastroenterology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
| | - Xiaomei Zhang
- Department of Gastroenterology, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China
- National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan 410007, China
| |
|
13
|
Dong W, Zhang P, Xu QL, Ren ZD, Wang J. A Study on a Neural Network Risk Simulation Model Construction for Avian Influenza A (H7N9) Outbreaks in Humans in China during 2013-2017. Int J Environ Res Public Health 2022; 19:10877. [PMID: 36078588 PMCID: PMC9518328 DOI: 10.3390/ijerph191710877] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Revised: 08/23/2022] [Accepted: 08/25/2022] [Indexed: 06/15/2023]
Abstract
The main purposes of this study were to explore the spatial distribution characteristics of H7N9 human infections during 2013-2017, and to construct a neural network risk simulation model of H7N9 outbreaks in China and evaluate its effects. First, ArcGIS 10.6 was used for spatial autocorrelation analysis, and cluster patterns of H7N9 outbreaks were analyzed in China during 2013-2017 to detect outbreak hotspots. During the study period, the incidence of H7N9 outbreaks in China was high in the eastern and southeastern coastal areas of China, with a tendency to spread to the central region. Moran's I values of global spatial autocorrelation of H7N9 outbreaks in China from 2013 to 2017 were 0.080128, 0.073792, 0.138015, 0.139221 and 0.050739, respectively (p < 0.05), indicating a statistically significant positive correlation of the epidemic. Then, SPSS 20.0 was used to analyze the correlation between H7N9 outbreaks in China and population, livestock production, the distance between the case and rivers, poultry farming, poultry market, vegetation index, etc. Statistically significant influencing factors screened out by correlation analysis were population of the city, average vegetation of the city, and the distance between the case and rivers (p < 0.05), which were included in the neural network risk simulation model of H7N9 outbreaks in China. The simulation accuracies of the neural network risk simulation model of H7N9 outbreaks in China from 2013 to 2017 were 85.71%, 91.25%, 91.54%, 90.49% and 92.74%, and the AUCs were 0.903, 0.976, 0.967, 0.963 and 0.970, respectively, showing a good simulation effect of H7N9 epidemics in China. The innovation of this study lies in the epidemiological study of H7N9 outbreaks by using a variety of technical means, and the construction of a neural network risk simulation model of H7N9 outbreaks in China. This study could provide valuable references for the prevention and control of H7N9 outbreaks in China.
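The global Moran's I statistic cited above measures whether neighbouring areas have similar values: positive values indicate clustering, values near zero indicate spatial randomness. The sketch below computes it from first principles on a toy example. The study used ArcGIS; this pure-Python version, the four-region layout, and the case counts are all assumptions made for illustration.

```python
def morans_i(values, weights):
    """Global Moran's I for `values` under a spatial weight matrix `weights`.

    weights[i][j] > 0 means regions i and j are neighbours; the diagonal is 0.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    variance_sum = sum(d * d for d in dev)
    return (n / s0) * (cross / variance_sum)


# HYPOTHETICAL example: four regions in a row, each adjacent to the next.
case_counts = [1, 2, 3, 4]
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

i_stat = morans_i(case_counts, adjacency)
print(f"Moran's I = {i_stat:.4f}")  # positive: similar values sit side by side
```

A significance test, such as the p-values quoted in the abstract, additionally requires the expected value and variance of I under the null hypothesis or a permutation procedure, which this sketch omits.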
Affiliation(s)
- Wen Dong
- Faculty of Geography, Yunnan Normal University, Kunming 650500, China
- GIS Technology Engineering Research Centre for West-China Resources and Environment of Educational Ministry, Yunnan Normal University, Kunming 650500, China
| | - Peng Zhang
- College of Intelligent Information Engineering, Chongqing Aerospace Polytechnic College, Chongqing 400021, China
| | - Quan-Li Xu
- Faculty of Geography, Yunnan Normal University, Kunming 650500, China
- GIS Technology Engineering Research Centre for West-China Resources and Environment of Educational Ministry, Yunnan Normal University, Kunming 650500, China
| | - Zhong-Da Ren
- GIS Technology Engineering Research Centre for West-China Resources and Environment of Educational Ministry, Yunnan Normal University, Kunming 650500, China
- State Key Laboratory of Estuarine and Coastal Research, East China Normal University, Shanghai 200241, China
| | - Jie Wang
- Chongqing City Management College, Chongqing 401331, China
| |
|
14
|
Machine learning approach to predict subtypes of primary aldosteronism is helpful to estimate indication of adrenal vein sampling. High Blood Press Cardiovasc Prev 2022; 29:375-383. [PMID: 35576101 DOI: 10.1007/s40292-022-00523-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/17/2022] [Accepted: 05/06/2022] [Indexed: 11/09/2022] Open
Abstract
INTRODUCTION Primary aldosteronism (PA) is a common disease. Especially in unilateral PA (UPA), the risk of cardiovascular disease is high and proper localization is important. Adrenal vein sampling (AVS) is commonly used to localize PA, but its availability is limited. Therefore, it is important to predict unilateral or bilateral PA and to choose the appropriate cases for AVS or watchful observation. AIM The purpose of this study is to develop a machine learning model that predicts bilateral or unilateral PA in order to select cases for AVS or watchful observation. METHODS We retrospectively analyzed 154 patients who were diagnosed with PA and underwent AVS at our hospital between January 2010 and June 2021. Based on machine learning, we determined predictors of PA subtype diagnosis from the results of blood and loading tests. RESULTS The accuracy of the machine learning model was 88%, and the top predictors of UPA were plasma aldosterone concentration after the saline infusion test, aldosterone-to-renin ratio after the captopril challenge test, serum potassium, and aldosterone-to-renin ratio. By using these factors, the accuracy, sensitivity, specificity and the area under the curve (AUC) were 91%, 70%, 99% and 0.91, respectively. Furthermore, we examined the surgical outcomes of UPA and found that the group diagnosed as unilateral by the predictors showed improvement in clinical findings, while the group diagnosed as bilateral by the predictors showed no improvement. CONCLUSION Our predictive model based on machine learning can support the choice between adrenal vein sampling and watchful observation.
|
15
|
Suri JS, Bhagawati M, Paul S, Protogeron A, Sfikakis PP, Kitas GD, Khanna NN, Ruzsa Z, Sharma AM, Saxena S, Faa G, Paraskevas KI, Laird JR, Johri AM, Saba L, Kalra M. Understanding the bias in machine learning systems for cardiovascular disease risk assessment: The first of its kind review. Comput Biol Med 2022; 142:105204. [DOI: 10.1016/j.compbiomed.2021.105204] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 11/28/2021] [Revised: 12/29/2021] [Accepted: 12/29/2021] [Indexed: 02/09/2023]
|
16
|
Identifying Coronary Artery Lesions by Feature Analysis of Radial Pulse Wave: A Case-Control Study. Biomed Res Int 2022; 2021:5047501. [PMID: 35005017 PMCID: PMC8739924 DOI: 10.1155/2021/5047501] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/08/2021] [Accepted: 12/10/2021] [Indexed: 11/17/2022]
Abstract
Background Cardiovascular diseases have always been the most common cause of morbidity and mortality worldwide. Health monitoring of high-risk and suspected patients is essential. Currently, invasive coronary angiography is still the most direct and accurate method of determining the severity of coronary artery lesions, but it may not be the optimal clinical choice for suspected patients who have clinical symptoms of coronary heart disease (CHD) such as chest pain but no coronary artery lesion. Modern medical research indicates that radial pulse waves contain substantial pathophysiologic information about the cardiovascular and circulation systems; therefore, analysis of these waves could be a noninvasive technique for assessing cardiovascular disease. Objective The objective of this study was to analyze the radial pulse wave to construct models for assessing the extent of coronary artery lesions based on pulse features and to investigate the latent value of noninvasive detection technology based on pulse wave in the evaluation of cardiovascular disease, so as to promote the development of wearable devices and mobile medicine. Method This study included 529 patients suspected of CHD who had undergone coronary angiography. Patients were sorted into a control group with no lesions, a 1 or 2 lesion group, and a multiple (3 or more) lesion group as determined by coronary angiography. The linear time-domain features and the nonlinear multiscale entropy features of their radial pulse wave signals were compared, and these features were used to construct models for identifying the range of coronary artery lesions using the k-nearest neighbor (KNN), decision tree (DT), and random forest (RF) machine learning algorithms. The average precision of these algorithms was then compared. 
Results (1) Compared with the control group, the group with 1 or 2 lesions had increases in their radial pulse wave time-domain features H2/H1, H3/H1, and W2 (P < 0.05), whereas the group with multiple lesions had decreases in MSE1, MSE2, MSE3, MSE4, and MSE5 (P < 0.05). (2) Compared with the 1 or 2 lesion group, the multiple lesion group had increases in T1/T (P < 0.05) and decreases in T and W1 (P < 0.05). (3) The RF model for identifying numbers of coronary artery lesions had a higher average precision than the models built with KNN or DT. Furthermore, average precision of the model was highest (80.98%) if both time-domain features and multiscale entropy features of radial pulse signals were used to construct the model. Conclusion Pulse wave signals can identify the range of coronary artery lesions with acceptable accuracy; this result is promising and valuable for assessing the severity of coronary artery lesions. The technique could be used to develop mobile medical treatments or remote home monitoring systems for patients suspected of, or at high risk of, coronary atherosclerotic heart disease.
|
17
|
Zhu M, Wang B, Wang T, Chen Y, He D. Risk Assessment of Pulmonary Metastasis for Cervical Cancer Patients by Ensemble Learning Models: A Large Population Based Real-World Study. Int J Gen Med 2021; 14:8713-8723. [PMID: 34853529 PMCID: PMC8628546 DOI: 10.2147/ijgm.s338389] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/08/2021] [Accepted: 11/10/2021] [Indexed: 12/17/2022] Open
Abstract
Objective Pulmonary metastasis (PM) is an independent risk factor affecting the prognosis of cervical cancer patients, but a prediction model for it is still lacking. This study aimed to develop machine learning-based predictive models for PM. Methods A total of 22,766 patients diagnosed with or without PM from the Surveillance, Epidemiology, and End Results (SEER) database were enrolled in this study. The cohort was randomly split into a training set (70%) and a validation set (30%). In addition, 884 Chinese patients from two tertiary medical centers were included as an external validation set. Duplicated and useless candidate variables were excluded, and sixteen variables were included for the machine learning algorithm. We developed five predictive models, including the generalized linear model (GLM), random forest model (RFM), naive Bayesian model (NBM), artificial neural networks model (ANNM), and decision tree model (DTM). The predictive performance of these models was evaluated by the receiver operating characteristic (ROC) curve and calibration curve. The Cox proportional hazard model (CPHM) and competing risk model (CRM) were also included for survival outcome prediction. Results Of the patients included in the analysis, 2456 (4.38%) patients were diagnosed with PM. Age, organ-site metastasis (liver, bone, brain), distant lymph metastasis, tumor size, and pathology were the important predictors of PM. The RFM with 9 variables introduced was identified as the best predictive model for PM (AUC = 0.972, 95% CI: 0.958-0.986). The C-index for the CPHM and CRM was 0.626 (95% CI: 0.604-0.648) and 0.611 (95% CI: 0.586-0.636), respectively. Conclusion The prediction algorithm derived by machine-learning-based methods shows a robust ability to predict PM. This result suggests that machine learning techniques have the potential to improve the development and validation of predictive modeling in cervical cancer patients with PM.
Affiliation(s)
- Menglin Zhu
- Department of Anesthesiology, Hubei Minzu University Affiliated Enshi Clinical Medical School, The Central Hospital of Enshi Tujia and Miao Autonomous Prefecture, Enshi, Hubei, 445000, People's Republic of China
| | - Bo Wang
- National Clinical Research Center for Obstetrical and Gynecological Diseases; Key Laboratory of Cancer Invasion and Metastasis, Ministry of Education; Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, People's Republic of China
| | - Tiejun Wang
- Department of Oncology, Hubei Cancer Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, People's Republic of China
| | - Yilin Chen
- Department of Anesthesiology, Hubei Minzu University Affiliated Enshi Clinical Medical School, The Central Hospital of Enshi Tujia and Miao Autonomous Prefecture, Enshi, Hubei, 445000, People's Republic of China; Department of Pulmonary and Critical Care Medicine, Hubei Minzu University Affiliated Enshi Clinical Medical School, The Central Hospital of Enshi Tujia and Miao Autonomous Prefecture, Enshi, Hubei, 445000, People's Republic of China
| | - Du He
- Department of Anesthesiology, Hubei Minzu University Affiliated Enshi Clinical Medical School, The Central Hospital of Enshi Tujia and Miao Autonomous Prefecture, Enshi, Hubei, 445000, People's Republic of China; Department of Oncology, Hubei Minzu University Affiliated Enshi Clinical Medical School, The Central Hospital of Enshi Tujia and Miao Autonomous Prefecture, Enshi, Hubei, 445000, People's Republic of China
| |
|
18
|
Li XF, Huang YZ, Tang JY, Li RC, Wang XQ. Development of a random forest model for hypotension prediction after anesthesia induction for cardiac surgery. World J Clin Cases 2021; 9:8729-8739. [PMID: 34734051 PMCID: PMC8546817 DOI: 10.12998/wjcc.v9.i29.8729] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/06/2021] [Revised: 07/07/2021] [Accepted: 07/22/2021] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND Hypotension after the induction of anesthesia is known to be associated with various adverse events. The involvement of a series of factors makes the prediction of hypotension during anesthesia quite challenging.
AIM To explore the ability and effectiveness of a random forest (RF) model in the prediction of post-induction hypotension (PIH) in patients undergoing cardiac surgery.
METHODS Patient information was obtained from the electronic health records of the Second Affiliated Hospital of Hainan Medical University. The study included patients, ≥ 18 years of age, who underwent cardiac surgery from December 2007 to January 2018. An RF algorithm, which is a supervised machine learning technique, was employed to predict PIH. Model performance was assessed by the area under the curve (AUC) of the receiver operating characteristic. Mean decrease in the Gini index was used to rank various features based on their importance.
RESULTS Of the 3030 patients included in the study, 1578 (52.1%) experienced hypotension after the induction of anesthesia. The RF model performed effectively, with an AUC of 0.843 (0.808-0.877) and identified mean blood pressure as the most important predictor of PIH after anesthesia. Age and body mass index also had a significant impact.
CONCLUSION The generated RF model had high discrimination ability for the identification of individuals at high risk for a hypotensive event during cardiac surgery. The study results highlighted that machine learning tools confer unique advantages for the prediction of adverse post-anesthesia events.
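The feature ranking above relies on the mean decrease in the Gini index. Its building block is the impurity reduction achieved by a single split, which a random forest then accumulates per feature across nodes and trees. The sketch below shows only that building block on hypothetical labels; it is not the study's code, and the split on mean blood pressure is invented for illustration.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a set of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Reduction in Gini impurity when `parent` is split into `left` and `right`."""
    n = len(parent)
    weighted_children = (
        len(left) / n * gini_impurity(left)
        + len(right) / n * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# HYPOTHETICAL node: 1 = post-induction hypotension, 0 = none.
parent_labels = [1, 1, 1, 0, 0, 0]
low_mbp_branch = [1, 1, 1, 0]   # patients below an invented MBP cut-off
high_mbp_branch = [0, 0]        # patients at or above it

decrease = impurity_decrease(parent_labels, low_mbp_branch, high_mbp_branch)
print(f"Gini decrease for this split = {decrease:.3f}")
```

Features that repeatedly produce large decreases receive high importance, which is consistent with mean blood pressure being ranked first in this abstract.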
Affiliation(s)
- Xuan-Fa Li
- Department of Anesthesiology, The Second Affiliated Hospital of Hainan Medical University, Haikou 570311, Hainan Province, China
| | - Yong-Zhen Huang
- Department of Anesthesiology, Hainan Hospital of Traditional Chinese Medicine, Haikou 570203, Hainan Province, China
| | - Jing-Ying Tang
- Department of Anesthesiology, Hainan Provincial People’s Hospital, Haikou 570000, Hainan Province, China
| | - Rui-Chen Li
- Department of Anesthesiology, The Second Affiliated Hospital of Hainan Medical University, Haikou 570311, Hainan Province, China
| | - Xiao-Qi Wang
- Department of Anesthesiology, The Second Affiliated Hospital of Hainan Medical University, Haikou 570311, Hainan Province, China
| |
|
19
|
Liu J, Wang X, Lin J, Li S, Deng G, Wei J. Classifiers for Predicting Coronary Artery Disease Based on Gene Expression Profiles in Peripheral Blood Mononuclear Cells. Int J Gen Med 2021; 14:5651-5663. [PMID: 34552349 PMCID: PMC8450378 DOI: 10.2147/ijgm.s329005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/18/2021] [Accepted: 08/26/2021] [Indexed: 12/17/2022] Open
Abstract
Objective Coronary artery disease (CAD) is a serious global health concern. Current diagnostic methods for CAD involve risk to the patient and are costly, so better diagnostic tools are needed. We defined four classifiers based on gene expression profiles in peripheral blood mononuclear cells and determined their potential for CAD detection. Methods We downloaded a CAD-related data set (GSE113079) from the Gene Expression Omnibus (GEO) database. We identified differentially expressed genes (DEGs) in peripheral blood mononuclear cells between CAD samples and healthy controls. DEGs were analyzed for functional enrichment. To create a robust CAD classifier, DEGs were identified by feature selection using the principal component analysis. Then, least absolute shrinkage and selection operator (LASSO) logistic regression, random forest, and support vector machine (SVM) models were created. Gene set variation analysis (GSVA) score and gene set enrichment analysis (GSEA) were also conducted. The performance of the models was evaluated in terms of the area under receiver operating characteristic curves (AUC). Results In the training set, we found 135 up-regulated genes and 104 down-regulated genes in CAD patients compared with controls. The DEGs were involved in some pathways associated with CAD, such as pathways involving calcium and interleukin-17 signaling. Twenty genes were identified as optimal features and used to generate the logistic classifier based on LASSO. The AUC for the classifier was 1.00 in the training set and 0.997 in the test set. Using the 20 DEGs, SVM and random forest classifiers were also generated and showed high diagnostic efficacy, with respective AUCs of 0.997 and 1.00 against the training set. A GSVA score was also established using the top 20 significant DEGs, which showed an AUC of 0.971 in the training set and 0.989 in the test set. Furthermore, GSEA showed autophagy and the proteasome to be major pathways involving the DEGs. 
Conclusion We identified a set of genes specific for CAD whose expression can be measured non-invasively. Using these genes, we defined four diagnostic classifiers using multiple methods.
Affiliation(s)
- Jie Liu
- Department of Cardiology, The Fifth Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, 530022, People's Republic of China; Department of Cardiology, The First People's Hospital of Nanning, Nanning, Guangxi, 530022, People's Republic of China
| | - Xiaodong Wang
- Department of Cardiology, The Fifth Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, 530022, People's Republic of China; Department of Cardiology, The First People's Hospital of Nanning, Nanning, Guangxi, 530022, People's Republic of China
| | - Junhua Lin
- Department of Cardiology, The Fifth Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, 530022, People's Republic of China
| | - Shaohua Li
- Department of Cardiology, The Fifth Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, 530022, People's Republic of China
| | - Guoxiong Deng
- Department of Cardiology, The Fifth Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, 530022, People's Republic of China; Department of Cardiology, The First People's Hospital of Nanning, Nanning, Guangxi, 530022, People's Republic of China
| | - Jinru Wei
- Department of Cardiology, The Fifth Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, 530022, People's Republic of China; Department of Cardiology, The First People's Hospital of Nanning, Nanning, Guangxi, 530022, People's Republic of China
| |
|
20
|
Lizar JC, Yaly CC, Colello Bruno A, Viani GA, Pavoni JF. Patient-specific IMRT QA verification using machine learning and gamma radiomics. Phys Med 2021; 82:100-108. [PMID: 33607523 DOI: 10.1016/j.ejmp.2021.01.071] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 05/19/2020] [Revised: 11/25/2020] [Accepted: 01/14/2021] [Indexed: 01/06/2023] Open
Abstract
The gamma function is the standard methodology for comparing dose distributions. It is calculated in dedicated software, and its results are not verified. Thus, we developed an automatic tool for patient-specific QA results verification through high-accuracy machine learning (ML) models based on radiomics features extracted from gamma images. We used 158 patient-specific QA tests and extracted 105 radiomics features from each gamma image. Three random forest models were developed (ML I, ML II, and ML III). ML I and ML II verified the gamma image approval using criteria of 2%/2mm/15% threshold and 3%/3mm/15% threshold, respectively. ML III verified whether the protocol recommended by the gamma analysis software was followed, detecting whether the TPS grid modification step was done. The models were based on the most important features selected using the mean decreased impurity, and their performances were evaluated. ML I included 25 features. Its accuracy was 0.85 using the test set and 0.84 using dataset B. ML II included 10 features, and its accuracy with the test set was 0.98; the same value was achieved using the never-seen data (dataset B). The First-order 10th percentile feature was identified as a feature strongly related to the approved classification. ML III selected 23 features with an accuracy of 0.99 for the test set and 0.98 for dataset B. An automatic workflow example for gamma analysis QA results verification could be proposed by combining the models to detect grid inconsistencies in the software evaluation, followed by the test approval classification.
Affiliation(s)
- Jéssica Caroline Lizar
  - Department of Physics, Faculty of Philosophy, Sciences and Letters at Ribeirão Preto, University of São Paulo, Av. Bandeirantes 3900, 14040-901, Monte Alegre, Ribeirão Preto, São Paulo, Brazil
- Carolina Cariolatto Yaly
  - Radiotherapy Department, Ribeirão Preto Medical School Hospital and Clinics, University of São Paulo, Av. Bandeirantes 3900, 14040-900, Monte Alegre, Ribeirão Preto, São Paulo, Brazil
- Alexandre Colello Bruno
  - Radiotherapy Department, Ribeirão Preto Medical School Hospital and Clinics, University of São Paulo, Av. Bandeirantes 3900, 14040-900, Monte Alegre, Ribeirão Preto, São Paulo, Brazil
- Gustavo Arruda Viani
  - Radiotherapy Department, Ribeirão Preto Medical School Hospital and Clinics, University of São Paulo, Av. Bandeirantes 3900, 14040-900, Monte Alegre, Ribeirão Preto, São Paulo, Brazil
- Juliana Fernandes Pavoni
  - Department of Physics, Faculty of Philosophy, Sciences and Letters at Ribeirão Preto, University of São Paulo, Av. Bandeirantes 3900, 14040-901, Monte Alegre, Ribeirão Preto, São Paulo, Brazil; Radiotherapy Department, Ribeirão Preto Medical School Hospital and Clinics, University of São Paulo, Av. Bandeirantes 3900, 14040-900, Monte Alegre, Ribeirão Preto, São Paulo, Brazil
|
21
|
Su X, Xu Y, Tan Z, Wang X, Yang P, Su Y, Jiang Y, Qin S, Shang L. Prediction for cardiovascular diseases based on laboratory data: An analysis of random forest model. J Clin Lab Anal 2020; 34:e23421. [PMID: 32725839 PMCID: PMC7521325 DOI: 10.1002/jcla.23421] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 12/16/2019] [Revised: 01/19/2020] [Accepted: 02/11/2020] [Indexed: 12/11/2022] Open
Abstract
Background: To establish a prediction model for cardiovascular diseases (CVD) in the general population based on random forests. Methods: A retrospective study involving 498 subjects was conducted at Xi'an Medical University between 2011 and 2018. The random forest algorithm was used to screen out the variables that most strongly affected CVD prediction and to establish a prediction model. The important variables were then included in a multifactorial logistic regression analysis. The area under the curve (AUC) was compared between the logistic regression model and the random forest model. Results: The random forest model identified age, body mass index (BMI), fasting blood glucose (FBG), diastolic blood pressure (DBP), triglyceride (TG), systolic blood pressure (SBP), total cholesterol (TC), waist circumference, and high‐density lipoprotein‐cholesterol (HDL‐C) as the more important variables for CVD prediction; its AUC was 0.802. Multifactorial logistic regression analysis indicated that the risk factors for CVD were age [odds ratio (OR): 1.14, 95% confidence interval (CI): 1.10‐1.17, P < .001], BMI (OR: 1.13, 95% CI: 1.06‐1.20, P < .001), TG (OR: 1.11, 95% CI: 1.02‐1.22, P = .023), and DBP (OR: 1.04, 95% CI: 1.02‐1.06, P = .001); its AUC was 0.843. The established logistic regression prediction model was Logit P = Log[P/(1 − P)] = −11.47 + 0.13 × age + 0.12 × BMI + 0.11 × TG + 0.04 × DBP; P = 1/[1 + exp(−Logit P)]. Subjects were considered prone to develop CVD when P > .51. Conclusions: A prediction model for CVD in the general population was developed based on random forests, providing a simple tool for the early prediction of CVD.
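The logistic regression equation quoted in the abstract can be evaluated directly. The sketch below uses the intercept, coefficients, and the P > .51 cutoff exactly as printed there; the function names are hypothetical, and the input units (age in years, BMI in kg/m², TG in mmol/L, DBP in mmHg) are assumptions, since the abstract does not state them. The coefficients are rounded in the abstract, so results are approximate, and this is an illustration of the arithmetic rather than a clinical tool.

```python
import math

# Intercept and coefficients as printed in the abstract (rounded there).
INTERCEPT = -11.47
COEFFICIENTS = {"age": 0.13, "bmi": 0.12, "tg": 0.11, "dbp": 0.04}
CUTOFF = 0.51  # probability above which the abstract flags a subject


def cvd_logit(age, bmi, tg, dbp):
    """Linear predictor: Logit P = -11.47 + 0.13*age + 0.12*BMI + 0.11*TG + 0.04*DBP."""
    values = {"age": age, "bmi": bmi, "tg": tg, "dbp": dbp}
    return INTERCEPT + sum(COEFFICIENTS[k] * values[k] for k in COEFFICIENTS)


def cvd_probability(age, bmi, tg, dbp):
    """Inverse logit: P = 1 / (1 + exp(-Logit P))."""
    return 1.0 / (1.0 + math.exp(-cvd_logit(age, bmi, tg, dbp)))


def is_prone_to_cvd(age, bmi, tg, dbp):
    """Apply the P > .51 cutoff reported in the abstract."""
    return cvd_probability(age, bmi, tg, dbp) > CUTOFF


# Hypothetical subject, with units assumed as described above.
p = cvd_probability(age=60, bmi=25.0, tg=1.5, dbp=80)
```

Each coefficient is the natural logarithm of the corresponding odds ratio; for example, exp(0.13) is approximately 1.14, matching the reported OR for age.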
Affiliation(s)
- Xi Su
  - Department of Health Statistics, Fourth Military Medical University, Xi'an, China; School of Health Management, Xi'an Medical University, Xi'an, China
- Yongyong Xu
  - Department of Health Statistics, Fourth Military Medical University, Xi'an, China
- Zhijun Tan
  - Department of Health Statistics, Fourth Military Medical University, Xi'an, China
- Xia Wang
  - Department of Health Statistics, Fourth Military Medical University, Xi'an, China
- Peng Yang
  - Department of Health Statistics, Fourth Military Medical University, Xi'an, China
- Yani Su
  - Data Center, Shaanxi Provincial People's Hospital, Xi'an, China
- Yangyang Jiang
  - School of Health Management, Xi'an Medical University, Xi'an, China
- Sijia Qin
  - School of Stomatology, Xi'an Medical University, Xi'an, China
- Lei Shang
  - Department of Health Statistics, Fourth Military Medical University, Xi'an, China
|