1. Giannoula A, De Paepe AE, Sanz F, Furlong LI, Camara E. Identifying time patterns in Huntington's disease trajectories using dynamic time warping-based clustering on multi-modal data. Sci Rep 2025; 15:3081. [PMID: 39856140; PMCID: PMC11759715; DOI: 10.1038/s41598-025-86686-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2024] [Accepted: 01/13/2025] [Indexed: 01/27/2025] Open
Abstract
One of the principal goals of Precision Medicine is to stratify patients by accounting for individual variability. However, extracting meaningful information from Real-World Data, such as Electronic Health Records, still remains challenging due to methodological and computational issues. A Dynamic Time Warping-based unsupervised-clustering methodology is presented in this paper for the clustering of patient trajectories of multi-modal health data on the basis of shared temporal characteristics. Building on an earlier methodology, a new dimension of time-varying clinical and imaging features is incorporated, through an adapted cost-minimization algorithm for clustering on different, possibly overlapping, feature subsets. The model disease chosen is Huntington's disease (HD), characterized by progressive neurodegeneration. From a wide range of examined user-defined parameters, four case examples are highlighted to demonstrate the identified temporal patterns in multi-modal HD trajectories and to study how these differ due to the combined effects of feature weights and granularity threshold. For each identified cluster, polynomial fits that describe the time behavior of the assessed features are provided for an informative comparison, together with their averaged values. The proposed data-mining methodology permits the stratification of distinct time patterns of multi-modal health data in individuals that share a diagnosis, by employing user-customized criteria beyond the current clinical practice. Overall, this work bears implications for better analysis of individual variability in disease progression, opening doors to personalized preventative, diagnostic and therapeutic strategies.
Affiliation(s)
- Alexia Giannoula
  - Research Group on Integrative Biomedical Informatics (GRIB), Department of Medicine and Life Sciences (MELIS), Universitat Pompeu Fabra, Hospital del Mar Research Institute, Barcelona, Spain.
- Audrey E De Paepe
  - Research Group on Integrative Biomedical Informatics (GRIB), Department of Medicine and Life Sciences (MELIS), Universitat Pompeu Fabra, Hospital del Mar Research Institute, Barcelona, Spain
  - Cognition and Brain Plasticity Unit, Institut d'Investigació Biomèdica de Bellvitge (IDIBELL), Barcelona, Spain
- Ferran Sanz
  - Research Group on Integrative Biomedical Informatics (GRIB), Department of Medicine and Life Sciences (MELIS), Universitat Pompeu Fabra, Hospital del Mar Research Institute, Barcelona, Spain
- Estela Camara
  - Cognition and Brain Plasticity Unit, Institut d'Investigació Biomèdica de Bellvitge (IDIBELL), Barcelona, Spain
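The Giannoula et al. abstract above clusters patient trajectories by dynamic time warping (DTW). As a rough illustration only, the textbook DTW distance between two univariate sequences can be sketched as follows. The function name `dtw_distance` and the absolute-difference local cost are assumptions made here; the paper's adapted multi-feature cost-minimization algorithm is not reproduced.

```python
def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two 1-D sequences.

    Illustrative sketch only: uses |x - y| as the local cost and no
    windowing or feature weights (both are assumptions, not the paper's setup).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: insertion, deletion, or match
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


# Two trajectories with the same shape but different sampling align at zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```

A pairwise matrix of such distances could then be passed to any distance-based clustering routine; which routine and which thresholds to use are choices the abstract leaves to user-defined parameters.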
2. Yu CS, Wu JL, Shih CM, Chiu KL, Chen YD, Chang TH. Exploring Mortality and Prognostic Factors of Heart Failure with In-Hospital and Emergency Patients by Electronic Medical Records: A Machine Learning Approach. Risk Manag Healthc Policy 2025; 18:77-93. [PMID: 39807211; PMCID: PMC11727332; DOI: 10.2147/rmhp.s488159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2024] [Accepted: 12/30/2024] [Indexed: 01/16/2025] Open
Abstract
Purpose As HF progresses into advanced HF, patients experience a poor quality of life, distressing symptoms, intensive care use, social distress, and eventual hospital death. We aimed to investigate the relationship between mortality and potential prognostic factors among in-patient and emergency patients with HF. Patients and Methods A case series study: Data are collected from in-hospital and emergency care patients from 2014 to 2021, including their international classification of disease at admission, and laboratory data such as blood count, liver and renal functions, lipid profile, and other biochemistry from the hospital's electronic medical records. After a series of data pre-processing in the electronic medical record system, several machine learning models were used to evaluate predictions of HF mortality. The outcomes of those potential risk factors were visualized by different statistical analyses. Results In total, 3871 HF patients were enrolled. Logistic regression showed that intensive care unit (ICU) history within 1 week (OR: 9.765, 95% CI: 6.65, 14.34; p-value < 0.001) and prothrombin time (OR: 1.193, 95% CI: 1.098, 1.296; p-value < 0.001) were associated with mortality. Similar results were obtained when we analyzed the data using Cox regression instead of logistic regression. Random forest, support vector machine (SVM), Adaboost, and logistic regression had better overall performances with areas under the receiver operating characteristic curve (AUROCs) of >0.87. Naïve Bayes was the best in terms of both specificity and precision. With ensemble learning, age, ICU history within 1 week, and respiratory rate (BF) were the top three compelling risk factors affecting mortality due to HF. To improve the explainability of the AI models, Shapley Additive Explanations methods were also conducted.
Conclusion Exploring HF mortality and its patterns related to clinical risk factors by machine learning models can help physicians make appropriate decisions when monitoring HF patients' health quality in the hospital.
Affiliation(s)
- Cheng-Sheng Yu
  - Graduate Institute of Data Science, College of Management, Taipei Medical University, New Taipei City, 235603, Taiwan
  - Clinical Data Center, Office of Data Science, Taipei Medical University, New Taipei City, 235603, Taiwan
  - Fintech Innovation Center, Nan Shan Life Insurance Co., Ltd., Taipei, 11049, Taiwan
  - Beyond Lab, Nan Shan Life Insurance Co., Ltd., Taipei, 11049, Taiwan
- Jenny L Wu
  - Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, New Taipei City, 235603, Taiwan
- Chun-Ming Shih
  - Department of Internal Medicine, School of Medicine, College of Medicine, Taipei Medical University, Taipei, 11031, Taiwan
  - Cardiovascular Research Center, Taipei Medical University Hospital, Taipei, 11031, Taiwan
  - Taipei Heart Institute, Taipei Medical University, Taipei, 11031, Taiwan
- Kuan-Lin Chiu
  - Department of Family Medicine, Taipei Medical University Hospital, Taipei, 11031, Taiwan
- Yu-Da Chen
  - Department of Family Medicine, Taipei Medical University Hospital, Taipei, 11031, Taiwan
  - School of Medicine, College of Medicine, Taipei Medical University, Taipei, 11031, Taiwan
- Tzu-Hao Chang
  - Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, New Taipei City, 235603, Taiwan
  - Clinical Big Data Research Center, Taipei Medical University Hospital, Taipei, 11031, Taiwan
3. Kaushal P, Singh S, Vijayvergiya R. A Kernel Attention-based Transformer Model for Survival Prediction of Heart Disease Patients. J Cardiovasc Transl Res 2024; 17:1295-1306. [PMID: 39103715; DOI: 10.1007/s12265-024-10537-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2024] [Accepted: 06/14/2024] [Indexed: 08/07/2024]
Abstract
Survival analysis is employed to scrutinize time-to-event data, with emphasis on comprehending the duration until the occurrence of a specific event. In this article, we introduce two novel survival prediction models: CosAttnSurv and CosAttnSurv + DyACT. CosAttnSurv model leverages transformer-based architecture and a softmax-free kernel attention mechanism for survival prediction. Our second model, CosAttnSurv + DyACT, enhances CosAttnSurv with Dynamic Adaptive Computation Time (DyACT) control, optimizing computation efficiency. The proposed models are validated using two public clinical datasets related to heart disease patients. When compared to other state-of-the-art models, our models demonstrated an enhanced discriminative and calibration performance. Furthermore, in comparison to other transformer architecture-based models, our proposed models demonstrate comparable performance while exhibiting significant reduction in both time and memory requirements. Overall, our models offer significant advancements in the field of survival analysis and emphasize the importance of computationally effective time-based predictions, with promising implications for medical decision-making and patient care.
Affiliation(s)
- Palak Kaushal
  - Department of Computer Science and Engineering, Punjab Engineering College (Deemed to be University), Sector-12, Chandigarh, 160012, Chandigarh, India.
- Shailendra Singh
  - Department of Computer Science and Engineering, Punjab Engineering College (Deemed to be University), Sector-12, Chandigarh, 160012, Chandigarh, India
- Rajesh Vijayvergiya
  - Advanced Cardiac Centre, Post Graduate Institute of Medical Education and Research (PGIMER), Sector 12, Chandigarh, 160012, Chandigarh, India
4. Zhang H, Mu R. Refining heart disease prediction accuracy using hybrid machine learning techniques with novel metaheuristic algorithms. Int J Cardiol 2024; 416:132506. [PMID: 39218253; DOI: 10.1016/j.ijcard.2024.132506] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2024] [Revised: 08/06/2024] [Accepted: 08/29/2024] [Indexed: 09/04/2024]
Abstract
Early diagnosis of heart disease is crucial, as it's one of the leading causes of death globally. Machine learning algorithms can be a powerful tool in achieving this goal. Therefore, this article aims to increase the accuracy of predicting heart disease using machine learning algorithms. Five classification models are explored: eXtreme Gradient Boosting (XGBC), Random Forest Classifier (RFC), Decision Tree Classifier (DTC), K-Nearest Neighbors Classifier (KNNC), and Logistic Regression Classifier (LRC). Additionally, four optimizers are evaluated: Slime mold Optimization Algorithm, Forest Optimization Algorithm, Pathfinder algorithm, and Giant Armadillo Optimization. To ensure robust model selection, a feature selection technique utilizing k-fold cross-validation is employed. This method identifies the most relevant features from the data, potentially improving model performance. The top three performing models are then coupled with the optimization algorithms to potentially enhance their generalizability and accuracy in predicting heart failure. In the final stage, the shortlisted models (XGBC, RFC, and DTC) were assessed using performance metrics like accuracy, precision, recall, F1-score, and Matthews Correlation Coefficient (MCC). This rigorous evaluation identified the XGGA hybrid model as the top performer, demonstrating its effectiveness in predicting heart failure. XGGA achieved impressive metrics, with an accuracy, precision, recall, and F1-score of 0.972 in the training phase, underscoring its robustness. Notably, the model's predictions deviated by less than 5.5 % for patients classified as alive and by less than 1.2 % for those classified as deceased compared to the actual outcomes, reflecting minimal error and high predictive reliability. In contrast, the DTC base model was the least effective, with an accuracy of 0.840 and a precision of 0.847. 
Overall, the optimization using the GAO algorithm significantly enhanced the performance of the models, highlighting the benefits of this approach.
Affiliation(s)
- Haifeng Zhang
  - The first people's Hospital of Baiyin, Baiyin, Gansu 730900, China
- Rui Mu
  - The second people's Hospital of Baiyin, Baiyin, Gansu 730900, China.
5. Behera TK, Sathia S, Panigrahi S, Naik PK. Revolutionizing cardiovascular disease classification through machine learning and statistical methods. J Biopharm Stat 2024:1-23. [PMID: 39582240; DOI: 10.1080/10543406.2024.2429524] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Accepted: 11/09/2024] [Indexed: 11/26/2024]
Abstract
BACKGROUND Cardiovascular diseases (CVDs) include abnormal conditions of the heart, diseased blood vessels, structural problems of the heart, and blood clots. Traditionally, CVD has been diagnosed by clinical experts, physicians, and medical specialists, which is expensive, time-consuming, and requires expert intervention. On the other hand, cost-effective digital diagnosis of CVD is now possible because of the emergence of machine learning (ML) and statistical techniques. METHOD In this research, extensive studies were carried out to classify CVD via 19 promising ML models. To evaluate the performance and rank the ML models for CVD classification, two benchmark CVD datasets are considered from well-known sources, such as Kaggle and the UCI repository. The results are analysed considering individual datasets and their combination to assess the efficiency and reliability of ML models on the basis of various performance measures, such as precision, kappa, accuracy, recall, and the F1 score. Since some of the ML models are stochastic, we repeated the simulation 50 times for each dataset using each model and applied nonparametric statistical tests to draw decisive conclusions. RESULTS The nonparametric Friedman - Nemenyi hypothesis test suggests that the Extra Tree Classifier provides statistically superior accuracy and precision compared with all other models. However, the Extreme Gradient Boost (XGBoost) classifier provides statistically superior recall, kappa, and F1 scores compared with those of all the other models. Additionally, the XGBRF classifier achieves a statistically second-best rank in terms of the recall measures.
Affiliation(s)
- Tapan Kumar Behera
  - Centre of Excellence in Natural Products and Therapeutics, Department of Biotechnology and Bioinformatics, Sambalpur University, Jyoti Vihar, Burla, Sambalpur, Odisha, India
- Siddhartha Sathia
  - Department of Cardiothoracic Surgery (CTVS), All India Institute of Medical Sciences, Sijua, Patrapada, Bhubaneswar, Odisha, India
- Sibarama Panigrahi
  - Department of Computer Science & Engineering (CSE), National Institute of Technology, Rourkela, Odisha, India
- Pradeep Kumar Naik
  - Centre of Excellence in Natural Products and Therapeutics, Department of Biotechnology and Bioinformatics, Sambalpur University, Jyoti Vihar, Burla, Sambalpur, Odisha, India
6. Guo L, Liu L, Li T, Cai L, Hu L, Zhou Y. Association between Serum Albumin-to-Creatinine Ratio and Readmission in Elderly Heart Failure Patients: A Retrospective Cohort Study. Gerontology 2024; 71:28-38. [PMID: 39557035; DOI: 10.1159/000542616] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2024] [Accepted: 11/08/2024] [Indexed: 11/20/2024] Open
Abstract
INTRODUCTION This study aimed to investigate the relationship between the serum albumin-to-serum creatinine ratio (sACR) and readmission in elderly heart failure patients. METHODS We conducted a retrospective cohort study using data from the PhysioNet Restricted Health Data database. The exposure variable was sACR and the outcome variable readmission. Multivariate logistic regression and subgroup analyses were performed to assess the independent association between sACR and readmission. Smooth curve fits were applied to examine the nonlinear relationship. We employed multiple imputation and E-value sensitivity analyses to assess the robustness of our results. RESULTS Our study included 1,725 participants, of whom 40.6% were male, 59.2% were aged 60-79 years, and 40.8% were aged 80 years and older. After adjusting for potential confounders, we found that for each unit increase in sACR, the 28-day readmission rate decreased by 48% (odds ratio [OR] = 0.52, 95% CI: 0.29-0.95, p = 0.003). The 28-day readmission rate was significantly higher in the low sACR group (sACR <0.32) than in the high sACR group (sACR >0.51) (OR = 0.47, 95% CI: 0.3-0.76, p = 0.002). Similar results were observed for 3-month and 9-month readmission. Subgroup analysis showed no significant interactions. A nonlinear relationship was observed between the sACR and readmission. Sensitivity analyses have confirmed the robustness of our results. CONCLUSION There is a negative association between sACR and readmission in Chinese heart failure patients. Our study may offer novel insights into the management of heart failure readmissions.
Affiliation(s)
- Leilei Guo
  - Department of Cardiology, Jiang You People's Hospital, Mianyang, China
- Li Liu
  - Department of Cardiology, Jiang You People's Hospital, Mianyang, China
- Tianwen Li
  - Department of Cardiology, Jiang You People's Hospital, Mianyang, China
- Lina Cai
  - Department of Cardiology, Jiang You People's Hospital, Mianyang, China
- Li Hu
  - Department of Cardiology, Jiang You People's Hospital, Mianyang, China
- Yueshan Zhou
  - Department of Cardiology, Jiang You People's Hospital, Mianyang, China
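The Guo et al. abstract above reports an odds ratio of 0.52 as a 48% decrease. The arithmetic behind that figure, and the reason a change in odds is not the same as a change in probability, can be sketched as below. The helper names and the 20% baseline readmission probability are hypothetical values chosen for illustration; they do not come from the study.

```python
def percent_change_in_odds(odds_ratio):
    """Percent change in the odds implied by an odds ratio (negative = decrease)."""
    return (odds_ratio - 1.0) * 100.0


def apply_odds_ratio(baseline_prob, odds_ratio):
    """Probability obtained after multiplying the baseline odds by odds_ratio."""
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# OR = 0.52 corresponds to 48% lower odds per unit increase in sACR.
print(round(percent_change_in_odds(0.52), 1))

# With a hypothetical 20% baseline probability, the probability falls to
# about 11.5%, a relative drop of roughly 42% rather than 48%.
print(round(apply_odds_ratio(0.20, 0.52), 4))
```

The gap between the two figures narrows as the baseline probability gets smaller, which is why odds ratios approximate relative risks only for rare outcomes.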
7. Delgado R, Fernández-Peláez F, Pallarés N, Diaz-Brito V, Izquierdo E, Oriol I, Simonetti A, Tebé C, Videla S, Carratalà J. Predictive risk models for COVID-19 patients using the multi-thresholding meta-algorithm. Sci Rep 2024; 14:28453. [PMID: 39557887; PMCID: PMC11574063; DOI: 10.1038/s41598-024-77386-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2024] [Accepted: 10/22/2024] [Indexed: 11/20/2024] Open
Abstract
This study aims to develop a Machine Learning model to assess the risks faced by COVID-19 patients in a hospital setting, focusing specifically on predicting the complications leading to Intensive Care Unit (ICU) admission or mortality, which are minority classes compared to the majority class of discharged patients. We operate within a multiclass framework comprising three distinct classes, and address the challenge of dataset imbalance, a common source of model bias. To effectively manage this, we introduce the Multi-Thresholding meta-algorithm (MTh), an innovative output-level methodology that extends traditional thresholding from binary to multiclass classification. This methodology dynamically adjusts class probabilities using misclassification costs, making it highly effective in imbalanced datasets. Our approach is further enhanced by integrating the simplicity, transparency, and effectiveness of Bayesian networks to create a robust predictive model. Using patient admission data, the model accurately identifies key risk and protective factors for COVID-19 outcomes. Our findings indicate that certain patient characteristics, such as high Charlson Index and pre-existing conditions, significantly influence the risk of ICU admission and mortality. Moreover, we introduce an explanatory model that elucidates the interrelationships among these factors, demonstrating the influence of therapeutic limits on the overall risk assessment of COVID-19 patients. Overall, our research provides a significant contribution to the field of Machine Learning by offering a novel solution for multiclass classification in the context of imbalanced datasets. This model not only enhances predictive accuracy but also supports critical decision-making processes in healthcare, potentially improving patient outcomes and optimizing clinical resource allocation.
Affiliation(s)
- Rosario Delgado
  - Department of Mathematics, Universitat Autònoma de Barcelona, Barcelona, Spain.
- Natàlia Pallarés
  - Biostatistics Unit of the Bellvitge Biomedical Research Institute (IDIBELL), Barcelona, Spain
  - Department of Basic Clinical Practice, School of Medicine and Health Sciences, University of Barcelona, Barcelona, Spain
- Vicens Diaz-Brito
  - Department of Infectious Diseases, Parc Sanitari S. Joan de Deu, Sant Boi de Llobregat, Barcelona, Spain
- Isabel Oriol
  - Department of Clinical Sciences, School of Medicine and Health Sciences, University of Barcelona, Barcelona, Spain
  - Bellvitge Biomedical Research Institute, Barcelona, Spain
  - Unitat Malalties Infeccioses, Servei Medicina Interna, Consorci Sanitari Integral, Barcelona, Spain
- Antonella Simonetti
  - Àrea de Recerca, Consorci Sanitari Alt Penedès Garraf, Barcelona, Spain
  - CIBERINFEC, Instituto de Salud Carlos III, Sevilla, Spain
  - Infectious Disease Unit, Hospital de la Santa Creu i Sant Pau, Barcelona, Spain
- Cristian Tebé
  - Biostatistics Unit of the Bellvitge Biomedical Research Institute (IDIBELL), Barcelona, Spain
  - Department of Clinical Sciences, School of Medicine and Health Sciences, University of Barcelona, Barcelona, Spain
- Sebastià Videla
  - Department of Clinical Pharmacology, Bellvitge University Hospital, Barcelona, Spain
  - Department of Pathology and Exp. Therapeutics, School Medicine and Health Sci., University of Barcelona, Barcelona, Spain
- Jordi Carratalà
  - Department of Clinical Sciences, School of Medicine and Health Sciences, University of Barcelona, Barcelona, Spain
  - Bellvitge Biomedical Research Institute, Barcelona, Spain
  - Department of Infectious Diseases, Bellvitge University Hospital, Barcelona, Spain
  - CIBERINFEC, Instituto de Salud Carlos III, Sevilla, Spain
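The Delgado et al. abstract above describes adjusting class decisions with misclassification costs in a three-class, imbalanced setting. The sketch below shows only the generic minimum-expected-cost decision rule that such output-level methods build on; it is not the authors' Multi-Thresholding meta-algorithm, and the function name, probabilities, and cost values are all hypothetical.

```python
def min_expected_cost_class(probs, cost):
    """Return the class index with the lowest expected misclassification cost.

    probs[i]   : predicted probability that the true class is i
    cost[i][j] : cost of predicting class j when the true class is i
    Generic decision-theoretic rule; not the MTh algorithm itself.
    """
    k = len(probs)
    expected = [sum(probs[i] * cost[i][j] for i in range(k)) for j in range(k)]
    return min(range(k), key=expected.__getitem__)


# Hypothetical classes: 0 = discharged, 1 = ICU admission, 2 = death.
probs = [0.60, 0.25, 0.15]

# Uniform 0/1 costs reproduce the usual "pick the most probable class" rule.
uniform = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(min_expected_cost_class(probs, uniform))   # 0

# Penalising a missed minority-class case five times more shifts the decision.
weighted = [[0, 1, 1], [5, 0, 1], [5, 1, 0]]
print(min_expected_cost_class(probs, weighted))  # 1
```

Raising the cost of overlooking a minority class moves the decision away from the majority class without retraining the underlying classifier, which is the general appeal of output-level approaches to imbalance.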
8. Visco V, Robustelli A, Loria F, Rispoli A, Palmieri F, Bramanti A, Carrizzo A, Vecchione C, Palmieri F, Ciccarelli M, D'Angelo G. An explainable model for predicting Worsening Heart Failure based on genetic programming. Comput Biol Med 2024; 182:109110. [PMID: 39243517; DOI: 10.1016/j.compbiomed.2024.109110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2024] [Revised: 09/02/2024] [Accepted: 09/02/2024] [Indexed: 09/09/2024]
Abstract
Heart Failure (HF) poses a challenge for our health systems, and early detection of Worsening HF (WHF), defined as a deterioration in symptoms and clinical and instrumental signs of HF, is vital to improving prognosis. Predicting WHF in a phase that is currently undiagnosable by physicians would enable prompt treatment of such events in patients at a higher risk of WHF. Although the role of Artificial Intelligence in cardiovascular diseases is becoming part of clinical practice, especially for diagnostic and prognostic purposes, its usage is often considered not completely reliable because these models cannot provide a valid explanation of their output. Physicians are often reluctant to make decisions based on unjustified results and see these models as black boxes. This study aims to develop a novel diagnostic model capable of predicting WHF while also providing an easy interpretation of the outcomes. We propose a threshold-based binary classifier built on a mathematical model derived from the Genetic Programming approach. This model clearly indicates that WHF is closely linked to creatinine, sPAP, and CAD, even though the relationship between these variables and WHF is complex. However, the proposed mathematical model allows for a 3D graphical representation, which medical staff can use to better understand the clinical situation of patients. Experiments conducted using retrospectively collected data from 519 patients treated at the HF Clinic of the University Hospital of Salerno have demonstrated the effectiveness of our model, surpassing the most commonly used machine learning algorithms. Indeed, the proposed GP-based classifier achieved a 96% average score for all considered evaluation metrics and fully supported the controls of medical staff.
Our solution has the potential to impact clinical practice for HF by identifying patients at high risk of WHF and facilitating more rapid diagnosis, targeted treatment, and a reduction in hospitalizations.
Affiliation(s)
- Valeria Visco
  - Department of Medicine, Surgery and Dentistry, University of Salerno, Via S. Allende, Baronissi (SA), 84081, Italy
- Antonio Robustelli
  - Department of Computer Science, University of Salerno, Via Giovanni Paolo II, 132, Fisciano (SA), 84084, Italy
- Francesco Loria
  - Department of Medicine, Surgery and Dentistry, University of Salerno, Via S. Allende, Baronissi (SA), 84081, Italy
- Antonella Rispoli
  - University Hospital San Giovanni di Dio e Ruggi d'Aragona, Largo Città Ippocrate, Salerno, 84131, Italy
- Francesca Palmieri
  - Department of Medicine, Surgery and Dentistry, University of Salerno, Via S. Allende, Baronissi (SA), 84081, Italy
- Alessia Bramanti
  - Department of Medicine, Surgery and Dentistry, University of Salerno, Via S. Allende, Baronissi (SA), 84081, Italy; University Hospital San Giovanni di Dio e Ruggi d'Aragona, Largo Città Ippocrate, Salerno, 84131, Italy
- Albino Carrizzo
  - Department of Medicine, Surgery and Dentistry, University of Salerno, Via S. Allende, Baronissi (SA), 84081, Italy; Vascular Physiopathology Unit, IRCCS Neuromed Mediterranean Neurological Institute, Via Atinense, 18, Pozzilli (IS), 86077, Italy
- Carmine Vecchione
  - Department of Medicine, Surgery and Dentistry, University of Salerno, Via S. Allende, Baronissi (SA), 84081, Italy; University Hospital San Giovanni di Dio e Ruggi d'Aragona, Largo Città Ippocrate, Salerno, 84131, Italy; Vascular Physiopathology Unit, IRCCS Neuromed Mediterranean Neurological Institute, Via Atinense, 18, Pozzilli (IS), 86077, Italy
- Francesco Palmieri
  - Department of Computer Science, University of Salerno, Via Giovanni Paolo II, 132, Fisciano (SA), 84084, Italy
- Michele Ciccarelli
  - Department of Medicine, Surgery and Dentistry, University of Salerno, Via S. Allende, Baronissi (SA), 84081, Italy; University Hospital San Giovanni di Dio e Ruggi d'Aragona, Largo Città Ippocrate, Salerno, 84131, Italy
- Gianni D'Angelo
  - Department of Computer Science, University of Salerno, Via Giovanni Paolo II, 132, Fisciano (SA), 84084, Italy.
9. Hidayaturrohman QA, Hanada E. Predictive Analytics in Heart Failure Risk, Readmission, and Mortality Prediction: A Review. Cureus 2024; 16:e73876. [PMID: 39697926; PMCID: PMC11652958; DOI: 10.7759/cureus.73876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/15/2024] [Indexed: 12/20/2024] Open
Abstract
Heart failure is a leading cause of death among people worldwide. The cost of treatment can be prohibitive, and early prediction of heart failure would reduce treatment costs to patients and hospitals. Improved readmission prediction would also greatly help hospitals, allowing them to manage their treatment programs and budgets better. This literature review aims to summarize recent studies of predictive analytics models that have been constructed to predict heart failure risk, readmission, and mortality. Random forest, logistic regression, neural networks, and XGBoost were among the most common modeling techniques applied. Most selected studies leveraged structured electronic health record data, including demographics, clinical values, lifestyle, and comorbidities, with some incorporating unstructured clinical notes. Preprocessing through imputation and feature selection was frequently employed in building the predictive analytics models. The reviewed studies demonstrated promise for predictive analytics in improving early heart failure diagnosis, readmission risk stratification, and mortality prediction. This review study highlights rising research activities and the potential of predictive analytics, especially the implementation of machine learning, in advancing heart failure outcomes. Further rigorous, comprehensive syntheses and head-to-head benchmarking of predictive models are needed to derive robust evidence for clinical adoption.
Affiliation(s)
- Qisthi A Hidayaturrohman
  - Graduate School of Science and Engineering, Saga University, Saga, JPN
  - Department of Electrical Engineering, Universitas Pembangunan Nasional Veteran Jakarta, Jakarta, IDN
- Eisuke Hanada
  - Faculty of Science and Engineering, Saga University, Saga, JPN
10. Koh HYK, Lam UTF, Ban KHK, Chen ES. Machine learning optimized DriverDetect software for high precision prediction of deleterious mutations in human cancers. Sci Rep 2024; 14:22618. [PMID: 39349509; PMCID: PMC11442673; DOI: 10.1038/s41598-024-71422-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2024] [Accepted: 08/28/2024] [Indexed: 10/02/2024] Open
Abstract
The detection of cancer-driving mutations is important for understanding cancer pathology and therapeutics development. Prediction tools have been created to streamline the computation process. However, most tools available have heterogeneous sensitivity or specificity. We built a machine learning-derived algorithm, DriverDetect, that combines the outputs of seven pre-existing tools to improve the prediction of candidate driver cancer mutations. The algorithm was trained with cancer gene-specific mutation datasets of cancer patients to identify cancer drivers. DriverDetect performed better than the individual tools or their combinations in the validation test. It has the potential to incorporate future novel prediction algorithms and can be retrained with new datasets, offering an expanded application to pan-cancer analysis for cross-cancer study.
Affiliation(s)
- Herrick Yu Kan Koh
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Ulysses Tsz Fung Lam
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Kenneth Hon-Kim Ban
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
  - National University Health System (NUHS), Singapore, Singapore.
  - NUS Center for Cancer Research, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
- Ee Sin Chen
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
  - National University Health System (NUHS), Singapore, Singapore.
  - NUS Center for Cancer Research, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
  - Integrative Sciences and Engineering Programme, National University of Singapore, Singapore, Singapore.
11. Hinrichs N, Meyer A, Koehler K, Kaas T, Hiddemann M, Spethmann S, Balzer F, Eickhoff C, Falk V, Hindricks G, Dagres N, Koehler F. Artificial intelligence based real-time prediction of imminent heart failure hospitalisation in patients undergoing non-invasive telemedicine. Front Cardiovasc Med 2024; 11:1457995. [PMID: 39371396; PMCID: PMC11449733; DOI: 10.3389/fcvm.2024.1457995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2024] [Accepted: 09/09/2024] [Indexed: 10/08/2024] Open
Abstract
Background Remote patient management may improve prognosis in heart failure. Daily review of transmitted data for early recognition of patients at risk requires substantial resources that represent a major barrier to wide implementation. An automated analysis of incoming data for detection of risk for imminent events would allow focusing on patients requiring prompt medical intervention. Methods We analysed data of the Telemedical Interventional Management in Heart Failure II (TIM-HF2) randomized trial that were collected during quarterly in-patient visits and daily transmissions from non-invasive monitoring devices. By application of machine learning, we developed and internally validated a risk score for heart failure hospitalisation within seven days following data transmission as an estimate of short-term patient risk for adverse heart failure events. Score performance was assessed by the area under the receiver-operating characteristic curve (ROC AUC) and compared with a conventional algorithm, a heuristic rule set originally applied in the randomized trial. Results The machine learning model significantly outperformed the conventional algorithm (ROC AUC 0.855 vs. 0.727, p < 0.001). On average, the machine learning risk score increased continuously in the three weeks preceding heart failure hospitalisations, indicating potential for early detection of risk. In a simulated one-year scenario, daily review of only the one third of patients with the highest machine learning risk score would have led to detection of 95% of HF hospitalisations occurring within the following seven days. Conclusions A machine learning model allowed automated analysis of incoming remote monitoring data and reliable identification of patients at risk of heart failure hospitalisation requiring immediate medical intervention. This approach may significantly reduce the need for manual data review.
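The two quantities reported in this abstract, the area under the ROC curve and the share of events caught when only the highest-risk third of patients is reviewed, can be illustrated with a minimal sketch. The function names `roc_auc` and `capture_rate` and all scores and outcomes below are invented for illustration; they are not the study's code or data.

```python
def roc_auc(scores, labels):
    """Probability that a random positive case outranks a random negative one (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def capture_rate(scores, labels, review_fraction):
    """Share of all events found when only the top-scoring fraction of patients is reviewed."""
    n_review = max(1, round(len(scores) * review_fraction))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    events_total = sum(labels)
    if events_total == 0:
        raise ValueError("no events to capture")
    return sum(y for _, y in ranked[:n_review]) / events_total


# Invented scores and outcomes for nine patients (1 = hospitalised within seven days).
scores = [0.90, 0.80, 0.70, 0.40, 0.30, 0.20, 0.10, 0.60, 0.05]
labels = [1, 1, 0, 0, 0, 0, 0, 1, 0]

auc = roc_auc(scores, labels)
top_third = capture_rate(scores, labels, 1 / 3)
print(f"ROC AUC = {auc:.3f}, events captured in top third = {top_third:.2f}")
```

The pairwise definition of the AUC is quadratic in the number of cases, which is fine for a toy example; a rank-based formula would be the usual choice for larger data.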
Affiliation(s)
- Nils Hinrichs
  - Department of Cardiothoracic and Vascular Surgery, Deutsches Herzzentrum der Charité, Berlin, Germany
  - Institute of Medical Informatics, Charité – Universitätsmedizin Berlin, Berlin, Germany
- Alexander Meyer
  - Department of Cardiothoracic and Vascular Surgery, Deutsches Herzzentrum der Charité, Berlin, Germany
  - Institute of Medical Informatics, Charité – Universitätsmedizin Berlin, Berlin, Germany
  - Berlin Institute of Health, Charité – Universitätsmedizin Berlin, Berlin, Germany
  - Berlin Institute for the Foundations of Learning and Data (BIFOLD), Technical University of Berlin, Berlin, Germany
  - German Centre for Cardiovascular Research (DZHK), Partner Site Berlin, Berlin, Germany
- Kerstin Koehler
  - Centre for Cardiovascular Telemedicine, Deutsches Herzzentrum der Charité, Berlin, Germany
- Thomas Kaas
  - Centre for Cardiovascular Telemedicine, Deutsches Herzzentrum der Charité, Berlin, Germany
- Meike Hiddemann
  - Centre for Cardiovascular Telemedicine, Deutsches Herzzentrum der Charité, Berlin, Germany
- Sebastian Spethmann
  - Department of Cardiology, Angiology, and Intensive Care Medicine, Deutsches Herzzentrum der Charité, Berlin, Germany
- Felix Balzer
  - Institute of Medical Informatics, Charité – Universitätsmedizin Berlin, Berlin, Germany
- Carsten Eickhoff
  - Institute for Bioinformatics and Medical Informatics, Eberhard-Karls-Universität Tübingen, Tübingen, Germany
- Volkmar Falk
  - Department of Cardiothoracic and Vascular Surgery, Deutsches Herzzentrum der Charité, Berlin, Germany
  - Berlin Institute of Health, Charité – Universitätsmedizin Berlin, Berlin, Germany
  - German Centre for Cardiovascular Research (DZHK), Partner Site Berlin, Berlin, Germany
  - Department of Health Sciences and Technology, Translational Cardiovascular Technologies, Eidgenössische Technische Hochschule Zürich, Zürich, Switzerland
- Gerhard Hindricks
  - German Centre for Cardiovascular Research (DZHK), Partner Site Berlin, Berlin, Germany
  - Department of Cardiology, Angiology, and Intensive Care Medicine, Deutsches Herzzentrum der Charité, Berlin, Germany
- Nikolaos Dagres
  - Department of Cardiology, Angiology, and Intensive Care Medicine, Deutsches Herzzentrum der Charité, Berlin, Germany
- Friedrich Koehler
  - Centre for Cardiovascular Telemedicine, Deutsches Herzzentrum der Charité, Berlin, Germany
12
|
Kirdeev A, Burkin K, Vorobev A, Zbirovskaya E, Lifshits G, Nikolaev K, Zelenskaya E, Donnikov M, Kovalenko L, Urvantseva I, Poptsova M. Machine learning models for predicting risks of MACEs for myocardial infarction patients with different VEGFR2 genotypes. Front Med (Lausanne) 2024; 11:1452239. [PMID: 39301488 PMCID: PMC11410707 DOI: 10.3389/fmed.2024.1452239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2024] [Accepted: 08/19/2024] [Indexed: 09/22/2024] Open
Abstract
Background The development of prognostic models for the identification of high-risk myocardial infarction (MI) patients is a crucial step toward personalized medicine. Genetic factors are known to be associated with an increased risk of cardiovascular diseases; however, little is known about whether they can be used to predict major adverse cardiac events (MACEs) for MI patients. This study aimed to build a machine learning (ML) model to predict MACEs in MI patients based on clinical, imaging, laboratory, and genetic features and to assess the influence of genetics on the prognostic power of the model. Methods We analyzed the data from 218 MI patients admitted to the emergency department at the Surgut District Center for Diagnostics and Cardiovascular Surgery, Russia. Upon admission, standard clinical measurements and imaging data were collected for each patient. Additionally, patients were genotyped for VEGFR-2 variation rs2305948 (C/C, C/T, T/T genotypes with T being the minor risk allele). The study included a 9-year follow-up period during which major ischemic events were recorded. We trained and evaluated various ML models, including Gradient Boosting, Random Forest, Logistic Regression, and AutoML. For feature importance analysis, we applied the sequential feature selection (SFS) and SHapley Additive exPlanations (SHAP) methods. Results The CatBoost algorithm, with features selected using the SFS method, showed the best performance on the test cohort, achieving a ROC AUC of 0.813. Feature importance analysis identified the dose of statins as the most important factor, with the VEGFR-2 genotype among the top 5. The other important features are coronary artery lesions (coronary artery stenoses ≥70%), left ventricular (LV) parameters such as lateral LV wall and LV mass, diabetes, type of revascularization (CABG or PCI), and age.
We also showed that contributions are additive and that high risk can be determined by cumulative negative effects from different prognostic factors. Conclusion Our ML-based approach demonstrated that the VEGFR-2 genotype is associated with an increased risk of MACEs in MI patients. However, the risk can be significantly reduced by high-dose statins and positive factors such as the absence of coronary artery lesions, absence of diabetes, and younger age.
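The statement that contributions are additive reflects a general property of Shapley values: the per-feature attributions sum to the difference between the prediction for a patient and the prediction for a reference input. A minimal sketch follows, computing exact Shapley values against a single baseline by enumerating feature subsets. The model `toy_risk`, its coefficients, and the feature encoding are hypothetical and unrelated to the fitted CatBoost model; the SHAP library itself uses a background dataset and faster estimators rather than this brute-force form.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x; absent features take their baseline value."""
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_risk(features):
    """Hypothetical risk score with one interaction term; coefficients are invented."""
    statin_dose, risk_allele, stenosis = features
    return (0.30 - 0.10 * statin_dose + 0.08 * risk_allele
            + 0.15 * stenosis + 0.05 * risk_allele * stenosis)


patient = [0.0, 1.0, 1.0]    # no statin, risk allele present, stenosis present
reference = [1.0, 0.0, 0.0]  # statin given, no risk allele, no stenosis

phi = shapley_values(toy_risk, patient, reference)
print(phi, sum(phi), toy_risk(patient) - toy_risk(reference))
```

The interaction term is split evenly between the two features involved, which is how several unfavourable factors can accumulate into a high overall risk.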
Affiliation(s)
- Alexander Kirdeev
  - Faculty of Computer Science, AI and Digital Science Institute, International Laboratory of Bioinformatics, Higher School of Economics University, Moscow, Russia
- Konstantin Burkin
  - Faculty of Computer Science, AI and Digital Science Institute, International Laboratory of Bioinformatics, Higher School of Economics University, Moscow, Russia
- Anton Vorobev
  - Department of Cardiology, Surgut State University, Surgut, Russia
- Elena Zbirovskaya
  - Faculty of Computer Science, AI and Digital Science Institute, International Laboratory of Bioinformatics, Higher School of Economics University, Moscow, Russia
- Galina Lifshits
  - Institute of Chemical Biology and Fundamental Medicine, Novosibirsk, Russia
- Konstantin Nikolaev
  - Federal Research Center Institute of Cytology and Genetics, Novosibirsk, Russia
- Elena Zelenskaya
  - Department of Cardiology, Surgut State University, Surgut, Russia
- Maxim Donnikov
  - Department of Cardiology, Surgut State University, Surgut, Russia
- Lyudmila Kovalenko
  - Department of General Pathology and Pathophysiology, Surgut State University, Surgut, Russia
- Irina Urvantseva
  - Department of Cardiology, Surgut State University, Surgut, Russia
  - Ugra Center for Diagnostics and Cardiovascular Surgery, Surgut, Russia
- Maria Poptsova
  - Faculty of Computer Science, AI and Digital Science Institute, International Laboratory of Bioinformatics, Higher School of Economics University, Moscow, Russia
13
|
Straw I, Rees G, Nachev P. Sex-Based Performance Disparities in Machine Learning Algorithms for Cardiac Disease Prediction: Exploratory Study. J Med Internet Res 2024; 26:e46936. [PMID: 39186324 PMCID: PMC11384168 DOI: 10.2196/46936] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Revised: 10/13/2023] [Accepted: 05/04/2024] [Indexed: 08/27/2024] Open
Abstract
BACKGROUND The presence of bias in artificial intelligence has garnered increased attention, with inequities in algorithmic performance being exposed across the fields of criminal justice, education, and welfare services. In health care, the inequitable performance of algorithms across demographic groups may widen health inequalities. OBJECTIVE Here, we identify and characterize bias in cardiology algorithms, looking specifically at algorithms used in the management of heart failure. METHODS Stage 1 involved a literature search of PubMed and Web of Science for key terms relating to cardiac machine learning (ML) algorithms. Papers that built ML models to predict cardiac disease were evaluated for their focus on demographic bias in model performance, and open-source data sets were retained for our investigation. Two open-source data sets were identified: (1) the University of California Irvine Heart Failure data set and (2) the University of California Irvine Coronary Artery Disease data set. We reproduced existing algorithms that have been reported for these data sets, tested them for sex biases in algorithm performance, and assessed a range of remediation techniques for their efficacy in reducing inequities. Particular attention was paid to the false negative rate (FNR), due to the clinical significance of underdiagnosis and missed opportunities for treatment. RESULTS In stage 1, our literature search returned 127 papers, with 60 meeting the criteria for a full review and only 3 papers highlighting sex differences in algorithm performance. In the papers that reported sex, there was a consistent underrepresentation of female patients in the data sets. No papers investigated racial or ethnic differences. In stage 2, we reproduced algorithms reported in the literature, achieving mean accuracies of 84.24% (SD 3.51%) for data set 1 and 85.72% (SD 1.75%) for data set 2 (random forest models). 
For data set 1, the FNR was significantly higher for female patients in 13 out of 16 experiments, meeting the threshold of statistical significance (-17.81% to -3.37%; P<.05). A smaller disparity in the false positive rate was significant for male patients in 13 out of 16 experiments (-0.48% to +9.77%; P<.05). We observed an overprediction of disease for male patients (higher false positive rate) and an underprediction of disease for female patients (higher FNR). Sex differences in feature importance suggest that feature selection needs to be demographically tailored. CONCLUSIONS Our research exposes a significant gap in cardiac ML research, highlighting that the underperformance of algorithms for female patients has been overlooked in the published literature. Our study quantifies sex disparities in algorithmic performance and explores several sources of bias. We found an underrepresentation of female patients in the data sets used to train algorithms, identified sex biases in model error rates, and demonstrated that a series of remediation techniques were unable to address the inequities present.
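The disparity measures discussed in this abstract, the false negative rate (FNR) and false positive rate per sex, can be computed directly from predictions grouped by a demographic label. The sketch below is illustrative only: the function name `error_rates_by_group` and the labels and predictions are invented, and it does not reproduce the study's models or data sets.

```python
from collections import defaultdict


def error_rates_by_group(y_true, y_pred, groups):
    """Return {group: {"fnr": ..., "fpr": ...}} from binary labels and predictions."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        # First letter: prediction correct (t) or not (f); second: predicted class.
        key = ("t" if truth == pred else "f") + ("p" if pred == 1 else "n")
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        rates[group] = {
            "fnr": c["fn"] / positives if positives else float("nan"),
            "fpr": c["fp"] / negatives if negatives else float("nan"),
        }
    return rates


# Invented example: eight female (F) and eight male (M) patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0] * 2
y_pred = [1, 0, 0, 1, 0, 0, 0, 0] + [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["F"] * 8 + ["M"] * 8

rates = error_rates_by_group(y_true, y_pred, groups)
for group, r in rates.items():
    print(group, r)
```

In this toy data the classifier misses more true cases among female patients and raises more false alarms among male patients, the same direction of disparity the abstract describes.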
Affiliation(s)
- Isabel Straw
  - University College London, London, United Kingdom
- Geraint Rees
  - University College London, London, United Kingdom
14
|
Talaat FM, Elnaggar AR, Shaban WM, Shehata M, Elhosseini M. CardioRiskNet: A Hybrid AI-Based Model for Explainable Risk Prediction and Prognosis in Cardiovascular Disease. Bioengineering (Basel) 2024; 11:822. [PMID: 39199780 PMCID: PMC11351968 DOI: 10.3390/bioengineering11080822] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2024] [Accepted: 08/08/2024] [Indexed: 09/01/2024] Open
Abstract
The global prevalence of cardiovascular diseases (CVDs) as a leading cause of death highlights the imperative need for refined risk assessment and prognostication methods. The traditional approaches, including the Framingham Risk Score, blood tests, imaging techniques, and clinical assessments, although widely utilized, are hindered by limitations such as a lack of precision, the reliance on static risk variables, and the inability to adapt to new patient data, thereby necessitating the exploration of alternative strategies. In response, this study introduces CardioRiskNet, a hybrid AI-based model designed to transcend these limitations. The proposed CardioRiskNet consists of seven parts: data preprocessing, feature selection and encoding, eXplainable AI (XAI) integration, active learning, attention mechanisms, risk prediction and prognosis, evaluation and validation, and deployment and integration. At first, the patient data are preprocessed by cleaning the data, handling the missing values, applying a normalization process, and extracting the features. Next, the most informative features are selected and the categorical variables are converted into a numerical form. Distinctively, CardioRiskNet employs active learning to iteratively select informative samples, enhancing its learning efficacy, while its attention mechanism dynamically focuses on the relevant features for precise risk prediction. Additionally, the integration of XAI facilitates interpretability and transparency in the decision-making processes. According to the experimental results, CardioRiskNet demonstrates superior performance in terms of accuracy, sensitivity, specificity, and F1-Score, with values of 98.7%, 98.7%, 99%, and 98.7%, respectively. These findings show that CardioRiskNet can accurately assess and prognosticate the CVD risk, demonstrating the power of active learning and AI to surpass the conventional methods. 
Thus, CardioRiskNet's novel approach and high performance advance the management of CVDs and provide healthcare professionals a powerful tool for patient care.
Affiliation(s)
- Fatma M. Talaat
  - Faculty of Artificial Intelligence, Kafrelsheikh University, Kafrelsheikh 33516, Egypt
  - Faculty of Computer Science & Engineering, New Mansoura University, Gamasa 35712, Egypt
- Warda M. Shaban
  - Communications and Electronics Engineering Department, Nile Higher Institute for Engineering and Technology, Mansoura 35511, Egypt
- Mohamed Shehata
  - Department of Bioengineering, Speed School of Engineering, University of Louisville, Louisville, KY 40292, USA
  - Computers and Control Systems Engineering Department, Faculty of Engineering, Mansoura University, Mansoura 35516, Egypt
- Mostafa Elhosseini
  - Computers and Control Systems Engineering Department, Faculty of Engineering, Mansoura University, Mansoura 35516, Egypt
15
|
Anuforo A, Evbayekha E, Agwuegbo C, Okafor TL, Antia A, Adabale O, Ugoala OS, Okorare O, Phagoora J, Alagbo HO, Shamaki GR, Disreal Bob-Manuel T. Superficial Venous Disease-An Updated Review. Ann Vasc Surg 2024; 105:106-124. [PMID: 38583765 DOI: 10.1016/j.avsg.2024.01.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 01/04/2024] [Accepted: 01/07/2024] [Indexed: 04/09/2024]
Abstract
BACKGROUND This review article provides an updated review of a relatively common pathology with various manifestations. Superficial venous diseases (SVDs) are a broad spectrum of venous vascular disease that predominantly affects the body's lower extremities. The most serious manifestations of this disease include varicose veins, chronic venous insufficiency, stasis dermatitis, venous ulcers, superficial venous thrombosis, reticular veins, and spider telangiectasias. METHODS The anatomy, pathophysiology, and risk factors of SVD were discussed during this review. The risk factors for developing SVD were related to race, age, sex, lifestyle, and certain genetic conditions as well as comorbid deep vein thrombosis. Various classification systems were listed, focusing on the most common one, the revised Clinical-Etiology-Anatomy-Pathophysiology classification. The clinical features including history and physical examination findings elicited in SVD were outlined. RESULTS Imaging modalities utilized in SVD were highlighted. Duplex ultrasound is the first line in evaluating SVD but magnetic resonance imaging and computed tomography venography, plethysmography, and conventional venography are feasible options in the event of an ambiguous venous duplex ultrasound study. Treatment options highlighted in this review ranged from conservative treatment with compression stockings, which could be primary or adjunctive to pharmacologic topical and systemic agents such as azelaic acid, diuretics, plant extracts, medical foods, nonsteroidal anti-inflammatory drugs, anticoagulants and skin substitutes for different stages of SVD. Interventional treatment modalities include thermal ablative techniques like radiofrequency ablation, endovenous laser ablation, endovenous steam ablation, and endovenous microwave ablation as well as nonthermal strategies such as the Varithena (polidocanol microfoam) sclerotherapy, VenaSeal (cyanoacrylate) ablation, and endovenous mechanochemical ablation.
Surgical treatments are also available and include debridement, vein ligation, stripping, and skin grafting. CONCLUSIONS SVDs are prevalent and have varied manifestations predominantly in the lower extremities. Several studies highlight the growing clinical and financial burden of these diseases. This review provides an update on the pathophysiology, classification, clinical features, and imaging findings as well as the conservative, pharmacological, and interventional treatment options indicated for different SVD pathologies. It aims to expedite the timely deployment of therapies geared toward reducing the significant morbidity associated with SVD especially varicose veins, venous ulcers, and venous insufficiency, to improve the quality of life of these patients and prevent complications.
Affiliation(s)
- Anderson Anuforo
  - Internal Medicine, SUNY Upstate Medical University, Syracuse, NY
- Charles Agwuegbo
  - Internal Medicine Resident, Temecula Valley Hospital, Temecula, CA
- Toochukwu Lilian Okafor
  - Internal Medicine Resident, Quinnipiac University, Frank H Netter MD School of Medicine/St Vincent's Medical Center, North Haven, CT
- Akanimo Antia
  - Internal Medicine Resident, Lincoln Medical and Mental Health Center, Bronx, NY
- Onyinye Sylvia Ugoala
  - Internal Medicine Resident, Texas Tech University Health Sciences Center, Amarillo, TX
- Ovie Okorare
  - Internal Medicine Resident, Nuvance Health Vassar Brothers Medical Center, Poughkeepsie, NY
- Jaskomal Phagoora
  - Internal Medicine Resident, Touro College of Osteopathic Medicine, Harlem, NY
- Habib Olatunji Alagbo
  - Internal Medicine Resident, V. N. Karazin Kharkiv National University, School of Medicine, Kharkiv, Ukraine
16
|
Charizanos G, Demirhan H, İçen D. Binary classification with fuzzy logistic regression under class imbalance and complete separation in clinical studies. BMC Med Res Methodol 2024; 24:145. [PMID: 38970036 PMCID: PMC11225249 DOI: 10.1186/s12874-024-02270-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Accepted: 06/27/2024] [Indexed: 07/07/2024] Open
Abstract
BACKGROUND In binary classification for clinical studies, an imbalanced distribution of cases to classes and an extreme association level between the binary dependent variable and a subset of independent variables can create significant classification problems. These crucial issues, namely class imbalance and complete separation, lead to classification inaccuracy and biased results in clinical studies. METHOD To deal with class imbalance and complete separation problems, we propose using a fuzzy logistic regression framework for binary classification. Fuzzy logistic regression incorporates combinations of triangular fuzzy numbers for the coefficients, inputs, and outputs and produces crisp classification results. The fuzzy logistic regression framework shows strong classification performance due to fuzzy logic's better handling of imbalance and separation issues. Hence, classification accuracy is improved, mitigating the risk of misclassified conditions and biased insights for clinical study patients. RESULTS The performance of the fuzzy logistic regression model is assessed on twelve binary classification problems with clinical datasets. The model has consistently high sensitivity, specificity, F1, precision, and Matthews correlation coefficient scores across all clinical datasets. There is no evidence of impact from the imbalance or separation that exists in the datasets. Furthermore, we compare the fuzzy logistic regression classification performance against two versions of classical logistic regression and six different benchmark sources in the literature. These six sources provide a total of ten different proposed methodologies, and the comparison occurs by calculating the same set of classification performance scores for each method. Either imbalance or separation impacts seven out of ten methodologies. The remaining three produce better classification performance in their respective clinical studies.
However, these are all outperformed by the fuzzy logistic regression framework. CONCLUSION Fuzzy logistic regression showcases strong performance against imbalance and separation, providing accurate predictions and, hence, informative insights for classifying patients in clinical studies.
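One reason the Matthews correlation coefficient is reported alongside sensitivity and specificity is that accuracy alone can look good on imbalanced data even when the minority class is never detected. The sketch below shows this with invented confusion-matrix counts; the function names are illustrative and the numbers are not taken from the study.

```python
from math import sqrt


def matthews_corrcoef(tp, fp, fn, tn):
    """Matthews correlation coefficient; returns 0.0 when a marginal total is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def accuracy(tp, fp, fn, tn):
    """Fraction of all cases classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


# Invented imbalanced cohort: 5 positive and 95 negative cases.
always_negative = dict(tp=0, fp=0, fn=5, tn=95)  # never predicts the minority class
useful_model = dict(tp=4, fp=5, fn=1, tn=90)     # finds 4 of 5 positives, 5 false alarms

for name, cm in [("always negative", always_negative), ("useful model", useful_model)]:
    print(f"{name}: accuracy={accuracy(**cm):.2f}, MCC={matthews_corrcoef(**cm):.2f}")
```

The trivial classifier scores higher on accuracy (0.95 against 0.94) yet has an MCC of zero, while the classifier that actually finds positive cases has a clearly positive MCC.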
Affiliation(s)
- Georgios Charizanos
  - Mathematical Sciences, School of Science, RMIT University, La Trobe St, Melbourne, 3000, Victoria, Australia
- Haydar Demirhan
  - Mathematical Sciences, School of Science, RMIT University, La Trobe St, Melbourne, 3000, Victoria, Australia
- Duygu İçen
  - Department of Statistics, Hacettepe University, Çankaya, Ankara, 06800, Ankara, Türkiye
17
|
Sang H, Lee H, Lee M, Park J, Kim S, Woo HG, Rahmati M, Koyanagi A, Smith L, Lee S, Hwang YC, Park TS, Lim H, Yon DK, Rhee SY. Prediction model for cardiovascular disease in patients with diabetes using machine learning derived and validated in two independent Korean cohorts. Sci Rep 2024; 14:14966. [PMID: 38942775 PMCID: PMC11213851 DOI: 10.1038/s41598-024-63798-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2023] [Accepted: 06/03/2024] [Indexed: 06/30/2024] Open
Abstract
This study aimed to develop and validate a machine learning (ML) model tailored to the Korean population with type 2 diabetes mellitus (T2DM) to provide a superior method for predicting the development of cardiovascular disease (CVD), a major chronic complication in these patients. We used data from two cohorts, namely the discovery (one hospital; n = 12,809) and validation (two hospitals; n = 2019) cohorts, recruited between 2008 and 2022. The outcome of interest was the presence or absence of CVD at 3 years. We selected various ML-based models with hyperparameter tuning in the discovery cohort and performed area under the receiver operating characteristic curve (AUROC) analysis in the validation cohort. CVD was observed in 1238 (10.2%) patients in the discovery cohort. The random forest (RF) model exhibited the best overall performance among the models, with an AUROC of 0.830 (95% confidence interval [CI] 0.818-0.842) in the discovery dataset and 0.722 (95% CI 0.660-0.783) in the validation dataset. Creatinine and glycated hemoglobin levels were the most influential factors in the RF model. This study introduces a pioneering ML-based model for predicting CVD in Korean patients with T2DM, outperforming existing prediction tools and providing a groundbreaking approach for early personalized preventive medicine.
Affiliation(s)
- Hyunji Sang
  - Department of Endocrinology and Metabolism, Kyung Hee University Medical Center, Kyung Hee University College of Medicine, 23 Kyungheedae-ro, Dongdaemun-gu, Seoul, 02447, South Korea
  - Center for Digital Health, Medical Science Research Institute, Kyung Hee University Medical Center, Kyung Hee University College of Medicine, Seoul, South Korea
- Hojae Lee
  - Center for Digital Health, Medical Science Research Institute, Kyung Hee University Medical Center, Kyung Hee University College of Medicine, Seoul, South Korea
  - Department of Regulatory Science, Kyung Hee University, Seoul, South Korea
- Myeongcheol Lee
  - Center for Digital Health, Medical Science Research Institute, Kyung Hee University Medical Center, Kyung Hee University College of Medicine, Seoul, South Korea
  - Department of Regulatory Science, Kyung Hee University, Seoul, South Korea
- Jaeyu Park
  - Center for Digital Health, Medical Science Research Institute, Kyung Hee University Medical Center, Kyung Hee University College of Medicine, Seoul, South Korea
  - Department of Regulatory Science, Kyung Hee University, Seoul, South Korea
- Sunyoung Kim
  - Department of Family Medicine, Kyung Hee University Medical Center, Kyung Hee University College of Medicine, Seoul, South Korea
- Ho Geol Woo
  - Department of Neurology, Kyung Hee University Medical Center, Kyung Hee University College of Medicine, Seoul, South Korea
- Masoud Rahmati
  - Research Centre on Health Services and Quality of Life, Aix Marseille University, Marseille, France
  - Department of Physical Education and Sport Sciences, Faculty of Literature and Human Sciences, Lorestan University, Khoramabad, Iran
  - Department of Physical Education and Sport Sciences, Faculty of Literature and Humanities, Vali-E-Asr University of Rafsanjan, Rafsanjan, Iran
- Ai Koyanagi
  - Research and Development Unit, Parc Sanitari Sant Joan de Deu, Barcelona, Spain
- Lee Smith
  - Centre for Health, Performance and Wellbeing, Anglia Ruskin University, Cambridge, UK
- Sihoon Lee
  - Department of Internal Medicine, Gachon University College of Medicine, Incheon, South Korea
- You-Cheol Hwang
  - Division of Endocrinology and Metabolism, Department of Internal Medicine, Kyung Hee University Hospital at Gangdong and Kyung Hee University School of Medicine, Seoul, South Korea
- Tae Sun Park
  - Division of Endocrinology and Metabolism, Department of Internal Medicine, Research Institute of Clinical Medicine of Jeonbuk National University and Jeonbuk National University Hospital, Jeonju, South Korea
- Hyunjung Lim
  - Department of Medical Nutrition, Graduate School of East-West Medical Science, Kyung Hee University, Yongin, South Korea
- Dong Keon Yon
  - Center for Digital Health, Medical Science Research Institute, Kyung Hee University Medical Center, Kyung Hee University College of Medicine, Seoul, South Korea
  - Department of Pediatrics, Kyung Hee University College of Medicine, 23 Kyungheedae-ro, Dongdaemun-gu, Seoul, 02447, South Korea
  - Department of Regulatory Science, Kyung Hee University, Seoul, South Korea
- Sang Youl Rhee
  - Department of Endocrinology and Metabolism, Kyung Hee University Medical Center, Kyung Hee University College of Medicine, 23 Kyungheedae-ro, Dongdaemun-gu, Seoul, 02447, South Korea
  - Center for Digital Health, Medical Science Research Institute, Kyung Hee University Medical Center, Kyung Hee University College of Medicine, Seoul, South Korea
  - Department of Regulatory Science, Kyung Hee University, Seoul, South Korea
18
|
Wang Y, Wang J, Gao F, Song J. Unveiling value patterns via deep reinforcement learning in heterogeneous data analytics. Patterns (New York, N.Y.) 2024; 5:100965. [PMID: 38800362 PMCID: PMC11117055 DOI: 10.1016/j.patter.2024.100965] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 02/03/2024] [Accepted: 03/06/2024] [Indexed: 05/29/2024]
Abstract
Artificial intelligence has substantially improved the efficiency of data utilization across various sectors. However, the insufficient filtering of low-quality data poses challenges to uncertainty management, threatening system stability. In this study, we introduce a data-valuation approach employing deep reinforcement learning to elucidate the value patterns in data-driven tasks. By strategically optimizing with iterative sampling and feedback, our method is effective in diverse scenarios and consistently outperforms the classic methods in both accuracy and efficiency. In China's wind-power prediction, excluding 25% of the overall dataset deemed low-value led to a 10.5% improvement in accuracy. Utilizing just 42.8% of the dataset, the model discerned 80% of linear patterns, showcasing the data's intrinsic and transferable value. A nationwide analysis identified a data-value-sensitive geographic belt across 10 provinces, leading to robust policy recommendations informed by variances in power outputs and data values, as well as geographic climate factors.
Affiliation(s)
- Yanzhi Wang
  - Department of Industrial Engineering and Management, College of Engineering, Peking University, Beijing 100871, China
- Jianxiao Wang
  - National Engineering Laboratory for Big Data Analysis and Applications, Peking University, Beijing 100871, China
  - PKU-Changsha Institute for Computing and Digital Economy, Changsha 410000, China
- Feng Gao
  - Department of Industrial Engineering and Management, College of Engineering, Peking University, Beijing 100871, China
- Jie Song
  - Department of Industrial Engineering and Management, College of Engineering, Peking University, Beijing 100871, China
  - National Engineering Laboratory for Big Data Analysis and Applications, Peking University, Beijing 100871, China
  - PKU-Changsha Institute for Computing and Digital Economy, Changsha 410000, China
19
|
Gangwar N, Balraj K, Rathore AS. Explainable AI for CHO cell culture media optimization and prediction of critical quality attribute. Appl Microbiol Biotechnol 2024; 108:308. [PMID: 38656382 PMCID: PMC11043154 DOI: 10.1007/s00253-024-13147-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2023] [Revised: 03/28/2024] [Accepted: 04/11/2024] [Indexed: 04/26/2024]
Abstract
Cell culture media play a critical role in cell growth and propagation by providing a substrate; media components can also modulate the critical quality attributes (CQAs). However, the inherent complexity of the cell culture media makes unraveling the impact of the various media components on cell growth and CQAs non-trivial. In this study, we demonstrate an end-to-end machine learning framework for media component selection and prediction of CQAs. The preliminary dataset for feature selection was generated by performing CHO-GS (-/-) cell culture in media formulations with varying metal ion concentrations. Acidic and basic charge variant composition of the innovator product (24.97 ± 0.54% acidic and 11.41 ± 1.44% basic) was chosen as the target variable to evaluate the media formulations. Pearson's correlation coefficient and random forest-based techniques were used for feature ranking and feature selection for the prediction of acidic and basic charge variants. Furthermore, a global interpretation analysis using SHapley Additive exPlanations was utilized to select optimal features by evaluating the contributions of each feature in the extracted vectors. Finally, the medium combinations were predicted by employing fifteen different regression models and utilizing a grid search and random search cross-validation for hyperparameter optimization. Experimental results demonstrate that Fe and Zn significantly impact the charge variant profile. This study aims to offer insights that are pertinent to both innovators seeking to establish a complete pipeline for media development and optimization and biosimilar-based manufacturers who strive to demonstrate the analytical and functional biosimilarity of their products to the innovator. KEY POINTS: • Developed a framework for optimizing media components and prediction of CQA. • SHAP enhances global interpretability, aiding informed decision-making. • Fifteen regression models were employed to predict medium combinations.
Affiliation(s)
- Neelesh Gangwar
  - School of Interdisciplinary Research, Indian Institute of Technology, Delhi, New Delhi, 110016, India
- Keerthiveena Balraj
  - Yardi School of Artificial Intelligence, Indian Institute of Technology, Delhi, New Delhi, 110016, India
- Anurag S Rathore
  - Yardi School of Artificial Intelligence, Indian Institute of Technology, Delhi, New Delhi, 110016, India
  - Department of Chemical Engineering, Indian Institute of Technology, Delhi, New Delhi, 110016, India
|
20
|
Yilmaz R, Yagin FH, Colak C, Toprak K, Abdel Samee N, Mahmoud NF, Alshahrani AA. Analysis of hematological indicators via explainable artificial intelligence in the diagnosis of acute heart failure: a retrospective study. Front Med (Lausanne) 2024; 11:1285067. [PMID: 38633310 PMCID: PMC11023638 DOI: 10.3389/fmed.2024.1285067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Accepted: 03/14/2024] [Indexed: 04/19/2024] Open
Abstract
Introduction Acute heart failure (AHF) is a serious medical problem that necessitates hospitalization and often results in death. Patients hospitalized in the emergency department (ED) should therefore receive an immediate diagnosis and treatment. Unfortunately, there is not yet a fast and accurate laboratory test for identifying AHF. The purpose of this research is to apply the principles of explainable artificial intelligence (XAI) to the analysis of hematological indicators for the diagnosis of AHF. Methods In this retrospective analysis, 425 patients with AHF and 430 healthy individuals were assessed. Patients' demographic and hematological information was analyzed to diagnose AHF. Important risk variables for AHF diagnosis were identified using the Least Absolute Shrinkage and Selection Operator (LASSO) feature selection. To test the efficacy of the suggested prediction model, Extreme Gradient Boosting (XGBoost), a 10-fold cross-validation procedure was implemented. The area under the receiver operating characteristic curve (AUC), F1 score, Brier score, Positive Predictive Value (PPV), and Negative Predictive Value (NPV) were all computed to evaluate the model's efficacy. Permutation-based analysis and SHAP were used to assess the importance and influence of the model's incorporated risk factors. Results White blood cell (WBC), monocytes, neutrophils, neutrophil-lymphocyte ratio (NLR), red cell distribution width-standard deviation (RDW-SD), RDW-coefficient of variation (RDW-CV), and platelet distribution width (PDW) values were significantly higher in AHF patients than in the healthy group (p < 0.05). On the other hand, erythrocyte, hemoglobin, basophil, lymphocyte, mean platelet volume (MPV), platelet, hematocrit, mean erythrocyte hemoglobin (MCH), and procalcitonin (PCT) values were found to be significantly lower in AHF patients compared to healthy controls (p < 0.05). When XGBoost was used in conjunction with LASSO to diagnose AHF, the resulting model had an AUC of 87.9%, an F1 score of 87.4%, and a Brier score of 0.036. PDW, age, RDW-SD, and PLT were identified as the most crucial risk factors in differentiating AHF. Conclusion The results of this study showed that XAI combined with ML could successfully diagnose AHF. SHAP descriptions show that advanced age, low platelet count, high RDW-SD, and PDW are the primary hematological parameters for the diagnosis of AHF.
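The evaluation metrics named in this abstract (F1 score, PPV, NPV, Brier score) can be illustrated with a minimal sketch. The function names and the toy inputs below are hypothetical and are not taken from the study; the code only shows the standard definitions of these metrics for a binary classifier.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def f1_ppv_npv(y_true, y_pred):
    """Return (F1, PPV, NPV); a metric with an empty denominator is reported as 0.0."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    npv = tn / (tn + fn) if (tn + fn) else 0.0
    return f1, ppv, npv


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical toy data, not results from the paper.
labels = [1, 1, 0, 0]
predictions = [1, 0, 0, 0]
probabilities = [0.9, 0.4, 0.2, 0.1]
f1, ppv, npv = f1_ppv_npv(labels, predictions)
brier = brier_score(labels, probabilities)
```

A lower Brier score indicates better-calibrated probabilities, whereas higher is better for F1, PPV, and NPV.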
Affiliation(s)
- Rustem Yilmaz
  - Department of Cardiology, Samsun Training and Research Hospital, Samsun University Faculty of Medicine, Samsun, Türkiye
- Fatma Hilal Yagin
  - Department of Biostatistics and Medical Informatics, Inonu University Faculty of Medicine, Malatya, Türkiye
- Cemil Colak
  - Department of Biostatistics and Medical Informatics, Inonu University Faculty of Medicine, Malatya, Türkiye
- Kenan Toprak
  - Department of Cardiology, Faculty of Medicine, Harran University, Sanlıurfa, Türkiye
- Nagwan Abdel Samee
  - Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
- Noha F. Mahmoud
  - Department of Rehabilitation Sciences, Health and Rehabilitation Sciences College, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
- Amnah Ali Alshahrani
  - Department of Computer Science, Applied College, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
|
21
|
Yang F, Xu Z, Wang H, Sun L, Zhai M, Zhang J. A hybrid feature selection algorithm combining information gain and grouping particle swarm optimization for cancer diagnosis. PLoS One 2024; 19:e0290332. [PMID: 38466662 PMCID: PMC10927139 DOI: 10.1371/journal.pone.0290332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2023] [Accepted: 08/04/2023] [Indexed: 03/13/2024] Open
Abstract
BACKGROUND Cancer diagnosis based on machine learning has become a popular application direction. Support vector machine (SVM), as a classical machine learning algorithm, has been widely used in cancer diagnosis because of its advantages in high-dimensional and small sample data. However, due to the high-dimensional feature space and high feature redundancy of gene expression data, SVM classifies such data poorly. METHODS Based on this, this paper proposes a hybrid feature selection algorithm combining information gain and grouping particle swarm optimization (IG-GPSO). The algorithm first calculates the information gain values of the features and ranks them in descending order according to the value. Then, ranked features are grouped according to the information index, so that the features in the group are close, and the features outside the group are sparse. Finally, grouped features are searched using grouping PSO and evaluated according to in-group and out-group. RESULTS Experimental results show that the average accuracy (ACC) of the SVM on the feature subset selected by the IG-GPSO is 98.50%, which is significantly better than the traditional feature selection algorithms. Compared with KNN, the classification effect of the feature subset selected by the IG-GPSO is still optimal. In addition, the results of multiple comparison tests show that the feature selection effect of the IG-GPSO is significantly better than that of traditional feature selection algorithms. CONCLUSION The feature subset selected by IG-GPSO not only has the best classification effect, but also has the least feature scale (FS). More importantly, the IG-GPSO significantly improves the ACC of SVM in cancer diagnosis.
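The first step described in this abstract, ranking features by information gain in descending order, can be sketched as follows. This is only an illustration under stated assumptions: feature values are assumed to be already discretized (continuous gene expression values would need binning first), the function names and toy data are hypothetical, and the grouping and particle swarm stages of IG-GPSO are not shown.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditioned on the feature value."""
    n = len(labels)
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return (feature name, information gain) pairs sorted by gain, descending."""
    gains = {name: information_gain(values, labels) for name, values in columns.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: gene_a separates the classes perfectly, gene_b not at all.
labels = [1, 1, 0, 0]
columns = {"gene_b": [0, 1, 0, 1], "gene_a": [1, 1, 0, 0]}
ranking = rank_features(columns, labels)
```

In this toy case `gene_a` receives a gain of 1 bit and `gene_b` a gain of 0, so `gene_a` is ranked first.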
Affiliation(s)
- Fangyuan Yang
  - Department of Gynecologic Oncology, The First Affiliated Hospital of Henan Polytechnic University, Jiaozuo, Henan, China
- Zhaozhao Xu
  - School of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, Henan, China
- Hong Wang
  - Department of Gynecologic Oncology, The First Affiliated Hospital of Henan Polytechnic University, Jiaozuo, Henan, China
- Lisha Sun
  - Department of Gynecologic Oncology, The First Affiliated Hospital of Henan Polytechnic University, Jiaozuo, Henan, China
- Mengjiao Zhai
  - Department of Gynecologic Oncology, The First Affiliated Hospital of Henan Polytechnic University, Jiaozuo, Henan, China
- Juan Zhang
  - Department of Gynecologic Oncology, The First Affiliated Hospital of Henan Polytechnic University, Jiaozuo, Henan, China
|
22
|
Cerono G, Chicco D. Ensemble machine learning reveals key features for diabetes duration from electronic health records. PeerJ Comput Sci 2024; 10:e1896. [PMID: 38435625 PMCID: PMC10909161 DOI: 10.7717/peerj-cs.1896] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2023] [Accepted: 01/30/2024] [Indexed: 03/05/2024]
Abstract
Diabetes is a metabolic disorder that affects more than 420 million people worldwide, and it is caused by the presence of a high level of sugar in the blood for a long period. Diabetes can have serious long-term health consequences, such as cardiovascular diseases, strokes, chronic kidney diseases, foot ulcers, retinopathy, and others. Even if common, this disease is not easy to spot, because it often comes with no symptoms. Especially for type 2 diabetes, which occurs mainly in adults, knowing how long the diabetes has been present for a patient can have a strong impact on the treatment they can receive. This information, although pivotal, might be absent: for some patients, in fact, the year when they received the diabetes diagnosis might be well-known, but the year of the disease onset might be unknown. In this context, machine learning applied to electronic health records can be an effective tool to predict the past duration of diabetes for a patient. In this study, we applied a regression analysis based on several computational intelligence methods to a dataset of electronic health records of 73 patients with type 1 diabetes with 20 variables and another dataset of records of 400 patients with type 2 diabetes with 49 variables. Among the algorithms applied, Random Forests was able to outperform the other ones and to efficiently predict diabetes duration for both cohorts, with the regression performances measured through the coefficient of determination R². Afterwards, we applied the same method for feature ranking, and we detected the most relevant factors of the clinical records correlated with past diabetes duration: age, insulin intake, and body-mass index. Our findings can have a profound impact on clinical practice: when the information about the duration of a patient's diabetes is missing, medical doctors can use our tool and focus on age, insulin intake, and body-mass index to infer this important aspect.
Regarding limitations, unfortunately we were unable to find an additional dataset of EHRs of patients with diabetes having the same variables as the two analyzed here, so we could not verify our findings on a validation cohort.
Affiliation(s)
- Gabriel Cerono
  - Department of Neurology, University of California San Francisco, San Francisco, CA, USA
- Davide Chicco
  - Institute of Health Policy Management and Evaluation, University of Toronto, Toronto, Canada
  - Dipartimento di Informatica Sistemistica e Comunicazione, Università di Milano-Bicocca, Milan, Italy
|
23
|
Hidayat T, Ahmad A, Ngo HC. Non-redundant implicational base of formal context with constraints using SAT. PeerJ Comput Sci 2024; 10:e1806. [PMID: 38435549 PMCID: PMC10909189 DOI: 10.7717/peerj-cs.1806] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Accepted: 12/18/2023] [Indexed: 03/05/2024]
Abstract
An implicational base is knowledge extracted from a formal context. The implicational base of a formal context consists of attribute implications which are sound, complete, and non-redundant with respect to the formal context. Non-redundant means that no attribute implication in the implicational base can be inferred from the others. However, sometimes some attribute implications in the implicational base can be inferred from the others together with prior knowledge. Regarding knowledge discovery, such attribute implications should not be considered new knowledge and should be removed from the implicational base. In other words, such attribute implications are redundant based on prior knowledge. One sort of prior knowledge is a set of constraints that restricts some attributes in data. In a formal context, constraints restrict some attributes of objects in the formal context. This article proposes a method to generate a non-redundant implicational base of a formal context with constraints that restrict the formal context. In this case, a non-redundant implicational base means that the implicational base does not contain any attribute implication that can be inferred from the others together with information from the constraints. This article also proposes a formulation to check for redundant attribute implications and encodes the problem as a satisfiability (SAT) problem so that it can be solved by a SAT solver, software that solves SAT problems. After implementation, an experiment shows that the proposed method is able to detect redundant attribute implications and generates a non-redundant implicational base of a formal context with constraints.
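The notion of redundancy described in this abstract can be illustrated with a small sketch. Note the swap: the article encodes the check as a SAT problem, whereas the code below uses plain forward chaining (attribute-set closure), which decides the same entailment question for implications between attribute sets. It also assumes, as a simplification, that the constraints can themselves be written as implications. All names and the toy data are hypothetical.

```python
def closure(attributes, implications):
    """Smallest attribute set containing `attributes` and closed under the implications."""
    closed = set(attributes)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in implications:
            # Fire an implication when its premise holds and it adds something new.
            if premise <= closed and not conclusion <= closed:
                closed |= conclusion
                changed = True
    return closed


def is_redundant(implication, others, constraints=()):
    """True if `implication` follows from `others` together with the constraints."""
    premise, conclusion = implication
    return conclusion <= closure(premise, list(others) + list(constraints))


def non_redundant_base(implications, constraints=()):
    """Greedily drop implications that follow from the rest plus the constraints."""
    base = list(implications)
    for implication in list(base):
        rest = [other for other in base if other is not implication]
        if is_redundant(implication, rest, constraints):
            base = rest
    return base


# Hypothetical toy implications over attributes a, b, c: a->b, b->c, a->c.
a_b = (frozenset("a"), frozenset("b"))
b_c = (frozenset("b"), frozenset("c"))
a_c = (frozenset("a"), frozenset("c"))
base = non_redundant_base([a_b, b_c, a_c])
```

Here `a -> c` is dropped because it follows from `a -> b` and `b -> c`. The greedy removal depends on the order of the input, so different orderings may keep different, equally valid, non-redundant subsets.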
Affiliation(s)
- Taufiq Hidayat
  - Faculty of Information and Communication Technology, Universiti Teknikal Malaysia Melaka, Melaka, Malaysia
  - Informatics Department, Universitas Islam Indonesia, Yogyakarta, Indonesia
- Asmala Ahmad
  - Faculty of Information and Communication Technology, Universiti Teknikal Malaysia Melaka, Melaka, Malaysia
- Hea Choon Ngo
  - Faculty of Information and Communication Technology, Universiti Teknikal Malaysia Melaka, Melaka, Malaysia
|
24
|
Li L, Chen X, Hu S. Application of an end-to-end model with self-attention mechanism in cardiac disease prediction. Front Physiol 2024; 14:1308774. [PMID: 38283283 PMCID: PMC10811162 DOI: 10.3389/fphys.2023.1308774] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2023] [Accepted: 12/22/2023] [Indexed: 01/30/2024] Open
Abstract
Introduction: Heart disease is a prevalent global health challenge, necessitating early detection for improved patient outcomes. This study aims to develop an innovative heart disease prediction method using end-to-end deep learning, integrating self-attention mechanisms and generative adversarial networks to enhance predictive accuracy and efficiency in healthcare. Methods: We constructed an end-to-end model capable of processing diverse cardiac health data, including electrocardiograms, clinical data, and medical images. Self-attention mechanisms were incorporated to capture data correlations and dependencies, improving the extraction of latent features. Additionally, generative adversarial networks were employed to synthesize supplementary cardiac health data, augmenting the training dataset. Experiments were conducted using publicly available heart disease datasets for training, validation, and testing. Multiple evaluation metrics, including accuracy, recall, and F1-score, were employed to assess model performance. Results: Our model consistently outperformed traditional methods, achieving accuracy rates exceeding 95% on multiple datasets. Notably, the recall metric demonstrated the model's effectiveness in identifying heart disease patients, with rates exceeding 90%. The comprehensive F1-score also indicated exceptional performance, achieving optimal results. Discussion: This research highlights the potential of end-to-end deep learning with self-attention mechanisms in heart disease prediction. The model's consistent success across diverse datasets offers new possibilities for early diagnosis and intervention, ultimately enhancing patients' quality of life and health. These findings hold significant clinical application prospects and promise substantial advancements in the healthcare field.
Affiliation(s)
- Li Li
  - Medical and Health College, Xuchang Vocational Technical College, Xuchang, China
- Xi Chen
  - Public Education Department, Xuchang Vocational Technical College, Xuchang, China
- Sanjun Hu
  - Xuchang Vocational and Technical College, School of Information Engineering, Xuchang, China
|
25
|
R. S, B.R. N, Radhakrishnan R, P. S. Computational intelligence for early detection of infertility in women. Engineering Applications of Artificial Intelligence 2024; 127:107400. [DOI: 10.1016/j.engappai.2023.107400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/04/2025]
|
26
|
Sutradhar A, Al Rafi M, Shamrat FMJM, Ghosh P, Das S, Islam MA, Ahmed K, Zhou X, Azad AKM, Alyami SA, Moni MA. BOO-ST and CBCEC: two novel hybrid machine learning methods aim to reduce the mortality of heart failure patients. Sci Rep 2023; 13:22874. [PMID: 38129433 PMCID: PMC10739972 DOI: 10.1038/s41598-023-48486-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Accepted: 11/27/2023] [Indexed: 12/23/2023] Open
Abstract
Heart failure (HF) is a leading cause of mortality worldwide. Machine learning (ML) approaches have shown potential as an early detection tool for improving patient outcomes. Enhancing the effectiveness and clinical applicability of the ML model necessitates training an efficient classifier with a diverse set of high-quality datasets. Hence, we propose two novel hybrid ML methods, (a) BOO-ST, consisting of Boosting, SMOTE, and Tomek links, and (b) CBCEC, combining the best-performing conventional classifier with ensemble classifiers, to serve as an efficient early warning system for HF mortality. BOO-ST was introduced to tackle the challenge of class imbalance, while CBCEC was trained on the processed features selected by the Feature Importance (FI) and Information Gain (IG) feature selection techniques. We also conducted an explicit and intuitive analysis to explore the impact of potential characteristics correlating with the fatality cases of HF. The experimental results demonstrated that the proposed classifier CBCEC achieves an accuracy of 93.67% in the early forecasting of HF mortality. Therefore, the proposed methods (BOO-ST and CBCEC) can play a crucial role in reducing the death rate of HF and reducing stress in the healthcare sector.
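One ingredient of BOO-ST named in this abstract, Tomek links, can be sketched in a few lines. A Tomek link is a pair of samples from different classes that are each other's nearest neighbor; such pairs sit on the class boundary and are commonly removed when cleaning an imbalanced dataset. The sketch below only detects the links under an assumed Euclidean distance; the function names and toy points are hypothetical, and the boosting and SMOTE parts of the method are not shown.

```python
import math


def nearest_neighbor(index, points):
    """Index of the point closest (Euclidean) to points[index], excluding itself."""
    candidates = (j for j in range(len(points)) if j != index)
    return min(candidates, key=lambda j: math.dist(points[index], points[j]))


def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, of opposite-class mutual nearest neighbors."""
    neighbors = [nearest_neighbor(i, points) for i in range(len(points))]
    links = []
    for i, j in enumerate(neighbors):
        if i < j and neighbors[j] == i and labels[i] != labels[j]:
            links.append((i, j))
    return links


# Hypothetical toy data: points 0 and 1 are close but belong to different classes,
# points 2 and 3 are close and share a class.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.2, 5.0)]
labels = [0, 1, 1, 1]
links = tomek_links(points, labels)
```

Only the pair `(0, 1)` is reported here, because points 2 and 3 are mutual nearest neighbors of the same class. This brute-force search is quadratic in the number of samples, which is acceptable for a sketch but not for large datasets.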
Affiliation(s)
- Ananda Sutradhar
  - Department of Computer Science and Engineering, Daffodil International University, Daffodil Smart City (DSC), Birulia, Savar, Dhaka, 1216, Bangladesh
- Mustahsin Al Rafi
  - Department of Computer Science and Engineering, Daffodil International University, Daffodil Smart City (DSC), Birulia, Savar, Dhaka, 1216, Bangladesh
- F M Javed Mehedi Shamrat
  - Department of Computer System and Technology, University of Malaya, 50603, Kuala Lumpur, Malaysia
- Pronab Ghosh
  - Department of Computer Science, Lakehead University, 955 Oliver Rd, Thunder Bay, ON, P7B 5E1, Canada
- Subrata Das
  - Department of Computer Science, Lakehead University, 955 Oliver Rd, Thunder Bay, ON, P7B 5E1, Canada
- Md Anaytul Islam
  - Department of Computer Science, Lakehead University, 955 Oliver Rd, Thunder Bay, ON, P7B 5E1, Canada
- Kawsar Ahmed
  - Department of Electrical and Computer Engineering, University of Saskatchewan, 57 Campus Drive, Saskatoon, SK, S7N 5A9, Canada
  - Department of Information and Communication Technology, Mawlana Bhashani Science and Technology University, Santosh, Tangail, 1902, Bangladesh
  - Health Informatics Research Lab, Department of Computer Science and Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh
- Xujuan Zhou
  - School of Business, University of Southern Queensland, Toowoomba, Australia
- A K M Azad
  - Department of Mathematics and Statistics, Faculty of Science, Imam Mohammad Ibn Saud Islamic University (IMSIU), 13318, Riyadh, Saudi Arabia
- Salem A Alyami
  - Department of Mathematics and Statistics, Faculty of Science, Imam Mohammad Ibn Saud Islamic University (IMSIU), 13318, Riyadh, Saudi Arabia
- Mohammad Ali Moni
  - Centre for AI & Digital Health Technology, Artificial Intelligence & Cyber Future Institute, Charles Sturt University, Bathurst, NSW, 2795, Australia
|
27
|
Saeed MH, Hama JI. Cardiac disease prediction using AI algorithms with SelectKBest. Med Biol Eng Comput 2023; 61:3397-3408. [PMID: 37679578 DOI: 10.1007/s11517-023-02918-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Accepted: 08/21/2023] [Indexed: 09/09/2023]
Abstract
Atherosclerotic cardiovascular disease (ASCVD), which includes coronary heart disease (CHD) and ischemic stroke, is the leading cause of mortality globally. According to the European Society of Cardiology (ESC), 26 million people worldwide have heart disease, with 3.6 million diagnosed each year. Early detection of heart disease will aid in lowering the mortality rate. The lack of diversity in training data and the difficulty in comprehending the findings of complicated AI models are the key issues in current research for heart disease prediction using artificial intelligence. To overcome this, this paper proposes cardiac disease prediction using AI algorithms with SelectKBest. Features are standardized, balanced, and selected using the StandardScaler, SMOTE, and SelectKBest techniques. Machine learning models such as support vector machine (SVM), K-nearest neighbor (KNN), decision tree (DT), logistic regression (LR), adaptive boosting (AB), naive Bayes (NB), random forest (RF), and extra tree (ET) and deep learning models such as vanilla long short-term memory (LSTM), bidirectional LSTM, stacked LSTM, and deep neural network (DNN) are assessed using the Alizadeh Sani, combined (Cleveland, Hungarian, Switzerland, Long Beach VA, and Statlog), and Pakistan heart failure datasets. In the evaluation, the proposed DNN with SelectKBest gave promising heart disease predictions. Unweighted accuracies of 99% on Alizadeh Sani, 98% on combined, and 97% on Pakistan were obtained in tenfold cross-validation experiments. The suggested approach can be utilized to diagnose heart disease in its early stages.
Affiliation(s)
- Mariwan Hama Saeed
  - College of Basic Education, University of Halabja, Halabja, 46018, Iraq
|
28
|
Papp L, Haberl D, Ecsedi B, Spielvogel CP, Krajnc D, Grahovac M, Moradi S, Drexler W. DEBI-NN: Distance-encoding biomorphic-informational neural networks for minimizing the number of trainable parameters. Neural Netw 2023; 167:517-532. [PMID: 37690213 DOI: 10.1016/j.neunet.2023.08.026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2022] [Revised: 08/11/2023] [Accepted: 08/17/2023] [Indexed: 09/12/2023]
Abstract
Modern artificial intelligence (AI) approaches mainly rely on neural network (NN) or deep NN methodologies. However, these approaches require large amounts of data to train, given that the number of their trainable parameters has a polynomial relationship to their neuron counts. This property renders deep NNs challenging to apply in fields operating with small, albeit representative, datasets such as healthcare. In this paper, we propose a novel neural network architecture which trains spatial positions of neural soma and axon pairs, where weights are calculated by axon-soma distances of connected neurons. We refer to this method as distance-encoding biomorphic-informational (DEBI) neural network. This concept significantly reduces the number of trainable parameters compared to conventional neural networks. We demonstrate that DEBI models can yield comparable predictive performance in tabular and imaging datasets, where they require a fraction of trainable parameters compared to conventional NNs, resulting in a highly scalable solution.
Affiliation(s)
- Laszlo Papp
  - Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Vienna, Austria
- David Haberl
  - Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
- Boglarka Ecsedi
  - Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Vienna, Austria; Georgia Institute of Technology, Atlanta, GA, USA
- Denis Krajnc
  - Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Vienna, Austria
- Marko Grahovac
  - Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
- Sasan Moradi
  - Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Vienna, Austria
- Wolfgang Drexler
  - Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Vienna, Austria
|
29
|
Bottrighi A, Pennisi M. Exploring the State of Machine Learning and Deep Learning in Medicine: A Survey of the Italian Research Community. Information 2023; 14:513. [DOI: 10.3390/info14090513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2025] Open
Abstract
Artificial intelligence (AI) is becoming increasingly important, especially in the medical field. While AI has been used in medicine for some time, its growth in the last decade is remarkable. Specifically, machine learning (ML) and deep learning (DL) techniques in medicine have been increasingly adopted due to the growing abundance of health-related data, the improved suitability of such techniques for managing large datasets, and more computational power. ML and DL methodologies are fostering the development of new “intelligent” tools and expert systems to process data, to automatize human–machine interactions, and to deliver advanced predictive systems that are changing every aspect of the scientific research, industry, and society. The Italian scientific community was instrumental in advancing this research area. This article aims to conduct a comprehensive investigation of the ML and DL methodologies and applications used in medicine by the Italian research community in the last five years. To this end, we selected all the papers published in the last five years with at least one of the authors affiliated to an Italian institution that in the title, in the abstract, or in the keywords present the terms “machine learning” or “deep learning” and reference a medical area. We focused our research on journal papers under the hypothesis that Italian researchers prefer to present novel but well-established research in scientific journals. We then analyzed the selected papers considering different dimensions, including the medical topic, the type of data, the pre-processing methods, the learning methods, and the evaluation methods. As a final outcome, a comprehensive overview of the Italian research landscape is given, highlighting how the community has increasingly worked on a very heterogeneous range of medical problems.
Affiliation(s)
- Alessio Bottrighi
  - Dipartimento di Scienze e Innovazione Tecnologica (DiSIT), Computer Science Institute, Università del Piemonte Orientale, 15121 Alessandria, Italy
  - Laboratorio Integrato di Intelligenza Artificiale e Informatica in Medicina, Azienda Ospedaliera SS. Antonio e Biagio e Cesare Arrigo, Alessandria—e DiSIT—Università del Piemonte Orientale, 15121 Alessandria, Italy
- Marzio Pennisi
  - Dipartimento di Scienze e Innovazione Tecnologica (DiSIT), Computer Science Institute, Università del Piemonte Orientale, 15121 Alessandria, Italy
  - Laboratorio Integrato di Intelligenza Artificiale e Informatica in Medicina, Azienda Ospedaliera SS. Antonio e Biagio e Cesare Arrigo, Alessandria—e DiSIT—Università del Piemonte Orientale, 15121 Alessandria, Italy
|
30
|
Mohseni N, Ghaniee Zarich M, Afshar S, Hosseini M. Identification of Novel Biomarkers for Response to Preoperative Chemoradiation in Locally Advanced Rectal Cancer with Genetic Algorithm-Based Gene Selection. J Gastrointest Cancer 2023; 54:937-950. [PMID: 36534304 DOI: 10.1007/s12029-022-00873-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/05/2022] [Indexed: 12/23/2022]
Abstract
BACKGROUND The conventional treatment for patients with locally advanced colorectal tumors is preoperative chemo-radiotherapy (PCRT) preceding surgery. This treatment strategy has some long-term side effects, and some patients do not respond to it. Therefore, an evaluation of biomarkers that may help predict patients' response to PCRT is essential. METHODS We took advantage of genetic algorithm to search the space of possible combinations of features to choose subsets of genes that would yield convenient performance in differentiating PCRT responders from non-responders using a logistic regression model as our classifier. RESULTS We developed two gene signatures; first, to achieve the maximum prediction accuracy, the algorithm yielded 39 genes, and then, aiming to reduce the feature numbers as much as possible (while maintaining acceptable performance), a 5-gene signature was chosen. The performance of the two gene signatures was (accuracy = 0.97 and 0.81, sensitivity = 0.96 and 0.83, and specificity = 86 and 0.77) using a logistic regression classifier. Through analyzing bias and variance decomposition of the model error, we further investigated the involved genes by discovering and validating another 28-gene signature which possibly points towards two different sub-systems involved in the response of the patients to treatment. CONCLUSIONS Using genetic algorithm as our gene selection method, we have identified two groups of genes that can differentiate PCRT responders from non-responders in patients of the studied dataset with considerable performance. IMPACT After passing standard requirements, our gene signatures may be applicable as a robust and effective PCRT response prediction tool for colorectal cancer patients in clinical settings and may also help future studies aiming to further investigate involved pathways gain a clearer picture for the course of their research.
Affiliation(s)
- Nima Mohseni
  - Department of Biology, Faculty of Science, Lund University, Skåne, Sweden
- Saeid Afshar
  - Research Center for Molecular Medicine, Hamadan University of Medical Sciences, Hamadan, Iran
|
31
|
Kim Y, Kim J, Kim S, Youn H, Choi J, Seo K. Machine learning-based risk prediction model for canine myxomatous mitral valve disease using electronic health record data. Front Vet Sci 2023; 10:1189157. [PMID: 37720471 PMCID: PMC10500836 DOI: 10.3389/fvets.2023.1189157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2023] [Accepted: 08/15/2023] [Indexed: 09/19/2023] Open
Abstract
Introduction Myxomatous mitral valve disease (MMVD) is the most common cause of heart failure in dogs, and assessing the risk of heart failure in dogs with MMVD is often challenging. Machine learning applied to electronic health records (EHRs) is an effective tool for predicting prognosis in the medical field. This study aimed to develop machine learning-based heart failure risk prediction models for dogs with MMVD using a dataset of EHRs. Methods A total of 143 dogs with MMVD between May 2018 and May 2022 were included. Complete medical records were reviewed for all patients. Demographic data, radiographic measurements, echocardiographic values, and laboratory results were obtained from the clinical database. Four machine-learning algorithms (random forest, K-nearest neighbors, naïve Bayes, support vector machine) were used to develop risk prediction models. Model performance was represented by plotting the receiver operating characteristic (ROC) curve and calculating the area under the curve (AUC). The best-performing model was chosen for the feature-ranking process. Results The random forest model showed superior performance to the other models (AUC = 0.88), while the K-nearest neighbors model showed the lowest performance (AUC = 0.69). The top three models showed excellent performance (AUC ≥ 0.8). According to the random forest algorithm's feature ranking, echocardiographic and radiographic variables had the highest predictive values for heart failure, followed by packed cell volume (PCV) and respiratory rates. Among the electrolyte variables, chloride had the highest predictive value for heart failure. Discussion These machine-learning models will support clinicians' decision-making in estimating the prognosis of patients with MMVD.
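The AUC used above to compare models can be computed directly from predicted risk scores. The sketch below is a minimal illustration, not the study's code: it uses the rank-based reading of AUC (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half). The function name `roc_auc` and the toy labels and scores are assumptions.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = developed heart failure, 0 = did not.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

In this toy example eight of the nine positive/negative pairs are ordered correctly, so the AUC is 8/9. The pairwise loop is quadratic; library implementations sort the scores instead.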
Affiliation(s)
- Yunji Kim
  - Department of Veterinary Internal Medicine, College of Veterinary Medicine, Seoul, Republic of Korea
- Jaejin Kim
  - School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Sehoon Kim
  - Department of Veterinary Internal Medicine, College of Veterinary Medicine, Seoul, Republic of Korea
- Hwayoung Youn
  - Department of Veterinary Internal Medicine, College of Veterinary Medicine, Seoul, Republic of Korea
- Jihye Choi
  - Department of Veterinary Medical Imaging, College of Veterinary Medicine, Seoul National University, Seoul, Republic of Korea
- Kyoungwon Seo
  - Department of Veterinary Internal Medicine, College of Veterinary Medicine, Seoul, Republic of Korea
|
32
|
Li X, Shang C, Xu C, Wang Y, Xu J, Zhou Q. Development and comparison of machine learning-based models for predicting heart failure after acute myocardial infarction. BMC Med Inform Decis Mak 2023; 23:165. [PMID: 37620904 PMCID: PMC10463624 DOI: 10.1186/s12911-023-02240-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2022] [Accepted: 07/13/2023] [Indexed: 08/26/2023] Open
Abstract
AIMS Heart failure (HF) is one of the common adverse cardiovascular events after acute myocardial infarction (AMI), but the predictive efficacy of the numerous models built with machine learning (ML) is unclear. This study aimed to build an optimal model to predict the occurrence of HF in AMI patients by comparing seven ML algorithms. METHODS Cohort 1 included AMI patients from 2018 to 2019 divided into HF and control groups. All first routine test data of the study subjects were collected as the features to be selected for the model, and seven ML algorithms with screenable features were evaluated. Cohort 2 contained AMI patients from 2020 to 2021 and was used to establish an early warning model with external validation. ROC and DCA curves were used to analyze the diagnostic efficacy and clinical benefit of the model, respectively. RESULTS The best performer among the seven ML algorithms was XGBoost, and troponin I, triglycerides, urine red blood cell count, γ-glutamyl transpeptidase, glucose, urine specific gravity, prothrombin time, prealbumin, and urea ranked high in its feature importance. The AUC of the HF-Lab9 prediction model built by the XGBoost algorithm was 0.966, and the model had good clinical benefits. CONCLUSIONS This study identified XGBoost as the optimal ML algorithm and developed the HF-Lab9 model, which will improve the accuracy of clinicians in assessing the occurrence of HF after AMI and provide a reference for the selection of subsequent model-building algorithms.
Affiliation(s)
- Xuewen Li
  - Department of Laboratory Medicine, First Hospital of Jilin University, Changchun, China
- Chengming Shang
  - Information Center, First Hospital of Jilin University, Changchun, China
- Changyan Xu
  - Medical Department, First Hospital of Jilin University, Changchun, China
- Yiting Wang
  - Department of Laboratory Medicine, First Hospital of Jilin University, Changchun, China
- Jiancheng Xu
  - Department of Laboratory Medicine, First Hospital of Jilin University, Changchun, China
- Qi Zhou
  - Department of Pediatrics, First Hospital of Jilin University, 1 Xinmin Street, Changchun, 130021, Jilin, China
|
33
|
Moreno-Sánchez PA. Improvement of a prediction model for heart failure survival through explainable artificial intelligence. Front Cardiovasc Med 2023; 10:1219586. [PMID: 37600061 PMCID: PMC10434534 DOI: 10.3389/fcvm.2023.1219586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Accepted: 07/17/2023] [Indexed: 08/22/2023] Open
Abstract
Cardiovascular diseases and their associated disorder of heart failure (HF) are major causes of death globally, making it a priority for doctors to detect and predict their onset and medical consequences. Artificial Intelligence (AI) allows doctors to discover clinical indicators and enhance their diagnoses and treatments. Specifically, "eXplainable AI" (XAI) offers tools to improve the clinical prediction models that experience poor interpretability of their results. This work presents an explainability analysis and evaluation of two HF survival prediction models using a dataset that includes 299 patients who have experienced HF. The first model utilizes survival analysis, considering death events and time as target features, while the second model approaches the problem as a classification task to predict death. The model employs an optimization data workflow pipeline capable of selecting the best machine learning algorithm as well as the optimal collection of features. Moreover, different post hoc techniques have been used for the explainability analysis of the model. The main contribution of this paper is an explainability-driven approach to select the best HF survival prediction model balancing prediction performance and explainability. Therefore, the most balanced explainable prediction models are the Survival Gradient Boosting model for the survival analysis and Random Forest for the classification approach, with a c-index of 0.714 and a balanced accuracy of 0.74 (std 0.03), respectively. The selection of features by the SCI-XAI in the two models is similar: "serum_creatinine", "ejection_fraction", and "sex" are selected in both approaches, with the addition of "diabetes" for the survival analysis model. Moreover, the application of post hoc XAI techniques also confirms common findings from both approaches by placing "serum_creatinine" as the most relevant feature for the predicted outcome, followed by "ejection_fraction".
The explainable prediction models for HF survival presented in this paper would improve the further adoption of clinical prediction models by providing doctors with insights to better understand the reasoning behind usually "black-box" AI clinical solutions and make more reasonable and data-driven decisions.
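The balanced accuracy reported above is the mean of sensitivity and specificity, which makes it less flattering than plain accuracy when one class dominates. The following sketch is illustrative only: the function name `balanced_accuracy` and the toy labels are assumptions and do not come from the study's dataset.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Hypothetical imbalanced toy data: 2 deaths (1) among 10 patients.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal_acc = balanced_accuracy(y_true, y_pred)
print(plain_accuracy, bal_acc)
```

Here the classifier misses one of the two positive cases: plain accuracy is 0.9, while balanced accuracy is (0.5 + 1.0) / 2 = 0.75.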
|
34
|
Khashei M, Bakhtiarvand N. Discrete learning-based intelligent methodology for heart disease diagnosis. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2023.104700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/01/2023]
|
35
|
Lee Y, Seo J. Suggestion of statistical validation on feature importance of machine learning. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2023; 2023:1-4. [PMID: 38083557 DOI: 10.1109/embc40787.2023.10340208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/18/2023]
Abstract
Feature importance methods are widely used in machine learning analysis for medical datasets as both primary and subsidiary tools. These methods aid in selecting biomarkers or markers indicating target diseases, and can provide valuable insight into the mechanism of a disease. However, simply listing features with their corresponding importance rank is not sufficient to determine the statistical significance of these features. In this paper, we propose a simple method for evaluating the statistical significance of feature importance values and selecting the optimal number of biomarkers. We demonstrate the application of this method using a public open dataset on heart failure. Clinical Relevance: In order for important indicators to be clinically useful, their statistical significance must be defined. By proposing a simple method for calculating statistical significance, this paper enables clinicians to select a group of biomarkers based on their feature importance in a machine learning model. This approach improves the accuracy and effectiveness of clinical decision-making, leading to more precise diagnosis, treatment, and management of various medical conditions.
|
36
|
Ma M, Hao X, Zhao J, Luo S, Liu Y, Li D. Predicting heart failure in-hospital mortality by integrating longitudinal and category data in electronic health records. Med Biol Eng Comput 2023:10.1007/s11517-023-02816-z. [PMID: 36959414 DOI: 10.1007/s11517-023-02816-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/22/2022] [Accepted: 03/02/2023] [Indexed: 03/25/2023]
Abstract
Heart failure is a life-threatening syndrome that is diagnosed in 3.6 million people worldwide each year. We propose a deep fusion learning model (DFL-IMP) that uses time series and category data from electronic health records to predict in-hospital mortality in patients with heart failure. We considered 41 time series features (platelets, white blood cells, urea nitrogen, etc.) and 17 category features (gender, insurance, marital status, etc.) as predictors, all of which were available within the time of the patient's last hospitalization, and a total of 7696 patients participated in the observational study. Our model was evaluated against different time windows. The best performance was achieved with an AUC of 0.914 when the observation window was 5 days and the prediction window was 30 days. It outperformed other baseline models, including LR (0.708), RF (0.717), SVM (0.675), LSTM (0.757), GRU (0.759), GRU-U (0.766) and MTSSP (0.770). This tool allows us to predict the expected pathway of heart failure patients and intervene early in the treatment process, which has significant implications for improving the life expectancy of heart failure patients.
Affiliation(s)
- Meikun Ma
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan, 030024, China
  - Key Laboratory of Big Data Fusion Analysis and Application of Shanxi Province, Taiyuan, 030024, China
  - Technology Research Center of Spatial Information Network Engineering of Shanxi, Taiyuan, 030024, China
- Xiaoyan Hao
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan, 030024, China
- Jumin Zhao
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan, 030024, China
  - Key Laboratory of Big Data Fusion Analysis and Application of Shanxi Province, Taiyuan, 030024, China
  - Intelligent Perception Engineering Technology Center of Shanxi, Taiyuan, 030024, China
- Shijie Luo
  - College of Information and Computer, Taiyuan University of Technology, Taiyuan, 030024, China
- Yi Liu
  - Key Laboratory of Big Data Fusion Analysis and Application of Shanxi Province, Taiyuan, 030024, China
  - Technology Research Center of Spatial Information Network Engineering of Shanxi, Taiyuan, 030024, China
  - College of Data Science, Taiyuan University of Technology, Taiyuan, 030024, China
- Dengao Li
  - Key Laboratory of Big Data Fusion Analysis and Application of Shanxi Province, Taiyuan, 030024, China
  - Technology Research Center of Spatial Information Network Engineering of Shanxi, Taiyuan, 030024, China
  - College of Data Science, Taiyuan University of Technology, Taiyuan, 030024, China
|
37
|
Ay Ş, Ekinci E, Garip Z. A comparative analysis of meta-heuristic optimization algorithms for feature selection on ML-based classification of heart-related diseases. The Journal of Supercomputing 2023; 79:11797-11826. [PMID: 37304052 PMCID: PMC9983547 DOI: 10.1007/s11227-023-05132-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Accepted: 02/21/2023] [Indexed: 06/13/2023]
Abstract
This study aims to use a machine learning (ML)-based enhanced diagnosis and survival model to predict heart disease and survival in heart failure by combining the cuckoo search (CS), flower pollination algorithm (FPA), whale optimization algorithm (WOA), and Harris hawks optimization (HHO) algorithms, which are meta-heuristic feature selection algorithms. To achieve this, experiments are conducted on the Cleveland heart disease dataset and the heart failure dataset collected from the Faisalabad Institute of Cardiology published at UCI. The CS, FPA, WOA, and HHO algorithms for feature selection are applied for different population sizes and are evaluated based on the best fitness values. For the original heart disease dataset, the maximum prediction F-score of 88% is obtained using K-nearest neighbour (KNN) when compared to logistic regression (LR), support vector machine (SVM), Gaussian Naive Bayes (GNB), and random forest (RF). With the proposed approach, a heart disease prediction F-score of 99.72% is obtained using KNN for a population size of 60 with FPA by selecting eight features. For the original heart failure dataset, the maximum prediction F-score of 70% is obtained using LR and RF compared to SVM, GNB, and KNN. With the proposed approach, a heart failure prediction F-score of 97.45% is obtained using KNN for a population size of 10 with HHO by selecting five features. Experimental findings show that the applied meta-heuristic algorithms with ML algorithms significantly improve prediction performances compared to performances obtained from the original datasets. The motivation of this paper is to select the most critical and informative feature subset through meta-heuristic algorithms to improve classification accuracy.
Affiliation(s)
- Şevket Ay
  - Computer Engineering Department, Faculty of Technology, Sakarya University of Applied Sciences, Sakarya, 54187 Turkey
- Ekin Ekinci
  - Computer Engineering Department, Faculty of Technology, Sakarya University of Applied Sciences, Sakarya, 54187 Turkey
- Zeynep Garip
  - Computer Engineering Department, Faculty of Technology, Sakarya University of Applied Sciences, Sakarya, 54187 Turkey
|
38
|
Aram KY, Lam SS, Khasawneh MT. Cost-sensitive max-margin feature selection for SVM using alternated sorting method genetic algorithm. Knowl Based Syst 2023. [DOI: 10.1016/j.knosys.2023.110421] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/07/2023]
|
39
|
Distinct Subtypes of Hepatorenal Syndrome and Associated Outcomes as Identified by Machine Learning Consensus Clustering. Diseases 2023; 11:diseases11010018. [PMID: 36810532 PMCID: PMC9944494 DOI: 10.3390/diseases11010018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2022] [Revised: 01/15/2023] [Accepted: 01/20/2023] [Indexed: 01/31/2023] Open
Abstract
BACKGROUND The utilization of multi-dimensional patient data to subtype hepatorenal syndrome (HRS) can individualize patient care. Machine learning (ML) consensus clustering may identify HRS subgroups with unique clinical profiles. In this study, we aim to identify clinically meaningful clusters of hospitalized patients for HRS using an unsupervised ML clustering approach. METHODS Consensus clustering analysis was performed based on patient characteristics in 5564 patients primarily admitted for HRS in the National Inpatient Sample from 2003-2014 to identify clinically distinct HRS subgroups. We applied standardized mean difference to evaluate key subgroup features, and compared in-hospital mortality between assigned clusters. RESULTS The algorithm revealed four best distinct HRS subgroups based on patient characteristics. Cluster 1 patients (n = 1617) were older, and more likely to have non-alcoholic fatty liver disease, cardiovascular comorbidities, hypertension, and diabetes. Cluster 2 patients (n = 1577) were younger and more likely to have hepatitis C, and less likely to have acute liver failure. Cluster 3 patients (n = 642) were younger, and more likely to have non-elective admission, acetaminophen overdose, acute liver failure, to develop in-hospital medical complications and organ system failure, and to require supporting therapies, including renal replacement therapy, and mechanical ventilation. Cluster 4 patients (n = 1728) were younger, and more likely to have alcoholic cirrhosis and to smoke. Thirty-three percent of patients died in hospital. In-hospital mortality was higher in cluster 1 (OR 1.53; 95% CI 1.31-1.79) and cluster 3 (OR 7.03; 95% CI 5.73-8.62), compared to cluster 2, while cluster 4 had comparable in-hospital mortality (OR 1.13; 95% CI 0.97-1.32). CONCLUSIONS Consensus clustering analysis provides the pattern of clinical characteristics and clinically distinct HRS phenotypes with different outcomes.
|
40
|
Menshawi A, Hassan MM, Allheeib N, Fortino G. A Hybrid Generic Framework for Heart Problem Diagnosis Based on a Machine Learning Paradigm. Sensors (Basel, Switzerland) 2023; 23:s23031392. [PMID: 36772430 PMCID: PMC9921250 DOI: 10.3390/s23031392] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/01/2022] [Revised: 11/25/2022] [Accepted: 11/27/2022] [Indexed: 05/31/2023]
Abstract
The early, valid prediction of heart problems would minimize life threats and save lives, while lack of prediction and false diagnosis can be fatal. Addressing a single dataset alone to build a machine learning model for the identification of heart problems is not practical because each country and hospital has its own data schema, structure, and quality. On this basis, a generic framework has been built for heart problem diagnosis. This framework is a hybrid framework that employs multiple machine learning and deep learning techniques and votes for the best outcome based on a novel voting technique with the intention to remove bias from the model. The framework contains two consequent layers. The first layer contains simultaneous machine learning models running over a given dataset. The second layer consolidates the outputs of the first layer and classifies them as a second classification layer based on novel voting techniques. Prior to the classification process, the framework selects the top features using a proposed feature selection framework. It starts by filtering the columns using multiple feature selection methods and considers the top common features selected. Results from the proposed framework, with 95.6% accuracy, show its superiority over the single machine learning model, classical stacking technique, and traditional voting technique. The main contribution of this work is to demonstrate how the prediction probabilities of multiple models can be exploited for the purpose of creating another layer for final output; this step neutralizes any model bias. Another experimental contribution is proving the complete pipeline's ability to be retrained and used for other datasets collected using different measurements and with different distributions.
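The abstract does not specify its novel voting technique, so the sketch below shows only the conventional baseline it builds on: soft voting, in which the class-1 probabilities predicted by several first-layer models are averaged before thresholding. The function name `soft_vote`, the optional weights, the 0.5 threshold, and the toy probabilities are all assumptions for illustration.

```python
def soft_vote(model_probs, weights=None, threshold=0.5):
    """Average per-model class-1 probabilities and threshold the result.

    model_probs: one list of probabilities per model, all of equal length.
    weights: optional per-model weights (defaults to equal weighting).
    """
    n_models = len(model_probs)
    if weights is None:
        weights = [1.0] * n_models
    total = sum(weights)
    n_samples = len(model_probs[0])
    averaged = [
        sum(w * probs[i] for w, probs in zip(weights, model_probs)) / total
        for i in range(n_samples)
    ]
    predictions = [1 if p >= threshold else 0 for p in averaged]
    return averaged, predictions


# Hypothetical outputs of three first-layer models for three patients.
model_probs = [
    [0.9, 0.4, 0.2],
    [0.6, 0.6, 0.1],
    [0.7, 0.3, 0.4],
]
averaged, predictions = soft_vote(model_probs)
print(averaged, predictions)
```

Averaging probabilities lets a confident model outweigh two lukewarm ones, which hard (majority) voting cannot do; a second classification layer such as the one the abstract describes would replace the fixed average with a learned combination.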
Affiliation(s)
- Alaa Menshawi
  - Information Systems Department, College of Computer and Information Science, King Saud University, Riyadh 11543, Saudi Arabia
- Mohammad Mehedi Hassan
  - Information Systems Department, College of Computer and Information Science, King Saud University, Riyadh 11543, Saudi Arabia
- Nasser Allheeib
  - Information Systems Department, College of Computer and Information Science, King Saud University, Riyadh 11543, Saudi Arabia
- Giancarlo Fortino
  - Department of Informatics, Modeling, Electronics, and Systems, University of Calabria, 87036 Rende, Italy
|
41
|
Mpanya D, Celik T, Klug E, Ntsinjana H. Predicting in-hospital all-cause mortality in heart failure using machine learning. Front Cardiovasc Med 2023; 9:1032524. [PMID: 36712268 PMCID: PMC9875063 DOI: 10.3389/fcvm.2022.1032524] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/30/2022] [Accepted: 12/23/2022] [Indexed: 01/12/2023] Open
Abstract
Background The age of onset and causes of heart failure differ between high-income and low-and-middle-income countries (LMIC). Heart failure patients in LMIC also experience a higher mortality rate. Innovative ways that can risk stratify heart failure patients in this region are needed. The aim of this study was to demonstrate the utility of machine learning in predicting all-cause mortality in heart failure patients hospitalised in a tertiary academic centre. Methods Six supervised machine learning algorithms were trained to predict in-hospital all-cause mortality using data from 500 consecutive heart failure patients with a left ventricular ejection fraction (LVEF) less than 50%. Results The mean age was 55.2 ± 16.8 years. There were 271 (54.2%) males, and the mean LVEF was 29 ± 9.2%. The median duration of hospitalisation was 7 days (interquartile range: 4-11), and it did not differ between patients discharged alive and those who died. After a prediction window of 4 years (interquartile range: 2-6), 84 (16.8%) patients died before discharge from the hospital. The area under the receiver operating characteristic curve was 0.82, 0.78, 0.77, 0.76, 0.75, and 0.62 for random forest, logistic regression, support vector machines (SVM), extreme gradient boosting, multilayer perceptron (MLP), and decision trees, and the accuracy during the test phase was 88, 87, 86, 82, 78, and 76% for random forest, MLP, SVM, extreme gradient boosting, decision trees, and logistic regression. The support vector machines were the best performing algorithm, and furosemide, beta-blockers, spironolactone, early diastolic murmur, and a parasternal heave had a positive coefficient with the target feature, whereas coronary artery disease, potassium, oedema grade, ischaemic cardiomyopathy, and right bundle branch block on electrocardiogram had negative coefficients. 
Conclusion Despite a small sample size, supervised machine learning algorithms successfully predicted all-cause mortality with modest accuracy. The SVM model will be externally validated using data from multiple cardiology centres in South Africa before developing a uniquely African risk prediction tool that can potentially transform heart failure management through precision medicine.
Affiliation(s)
- Dineo Mpanya (corresponding author)
  - Division of Cardiology, Department of Internal Medicine, School of Clinical Medicine, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
  - Wits Institute of Data Science, University of the Witwatersrand, Johannesburg, South Africa
- Turgay Celik
  - Wits Institute of Data Science, University of the Witwatersrand, Johannesburg, South Africa
  - School of Electrical and Information Engineering, Faculty of Engineering and Built Environment, University of the Witwatersrand, Johannesburg, South Africa
- Eric Klug
  - Netcare Sunninghill, Sunward Park Hospitals and Division of Cardiology, Department of Internal Medicine, School of Clinical Medicine, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Hopewell Ntsinjana
  - Department of Paediatrics and Child Health, School of Clinical Medicine, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
|
42
|
Diaz Ochoa JG, Maier L, Csiszar O. Bayesian logical neural networks for human-centered applications in medicine. Frontiers in Bioinformatics 2023; 3:1082941. [PMID: 36875147 PMCID: PMC9975151 DOI: 10.3389/fbinf.2023.1082941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Accepted: 02/01/2023] [Indexed: 02/17/2023] Open
Abstract
Background: Medicine is characterized by its inherent uncertainty, i.e., the difficulty of identifying and obtaining exact outcomes from available data. Electronic Health Records aim to improve the exactitude of health management, for instance using automatic data recording techniques or the integration of structured as well as unstructured data. However, this data is far from perfect and is usually noisy, implying that epistemic uncertainty is almost always present in all biomedical research fields. This impairs the correct use and interpretation of the data not only by health professionals but also in modeling techniques and AI models incorporated in professional recommender systems. Method: In this work, we report a novel modeling methodology combining structural explainable models, defined on Logic Neural Networks which replace conventional deep-learning methods with logical gates embedded in neural networks, and Bayesian Networks to model data uncertainties. This means that we do not account for the variability of the input data, but we train single models according to the data and deliver different Logic-Operator neural network models that could adapt to the input data, for instance, medical procedures (Therapy Keys), depending on the inherent uncertainty of the observed data. Result: Thus, our model does not only aim to assist physicians in their decisions by providing accurate recommendations; it is above all a user-centered solution that informs the physician when a given recommendation, in this case, a therapy, is uncertain and must be carefully evaluated. As a result, the physician must be a professional who does not solely rely on automatic recommendations. This novel methodology was tested on a database for patients with heart insufficiency and can be the basis for future applications of recommender systems in medicine.
Affiliation(s)
- Juan G Diaz Ochoa
  - Data Science & Machine Learning Division, PERMEDIQ GmbH, Wang, Germany
- Lukas Maier
  - Data Science & Machine Learning Division, PERMEDIQ GmbH, Wang, Germany
- Orsolya Csiszar
  - Faculty of Electrical Engineering and Computer Science, Hochschule Aalen, Aalen, Germany
  - John von Neumann Faculty of Informatics, Óbuda University, Budapest, Hungary
|
43
|
Duan S, Liu C, Han P, Jin X, Zhang X, He T, Pan H, Xiang X. HT-Fed-GAN: Federated Generative Model for Decentralized Tabular Data Synthesis. Entropy (Basel, Switzerland) 2022; 25:88. [PMID: 36673229 PMCID: PMC9858387 DOI: 10.3390/e25010088] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/20/2022] [Revised: 12/24/2022] [Accepted: 12/28/2022] [Indexed: 06/17/2023]
Abstract
In this paper, we study the problem of privacy-preserving data synthesis (PPDS) for tabular data in a distributed multi-party environment. In a decentralized setting, for PPDS, federated generative models with differential privacy are used by the existing methods. Unfortunately, the existing models apply only to images or text data and not to tabular data. Unlike images, tabular data usually consist of mixed data types (discrete and continuous attributes) and real-world datasets with highly imbalanced data distributions. Existing methods hardly model such scenarios due to the multimodal distributions in the decentralized continuous columns and highly imbalanced categorical attributes of the clients. To solve these problems, we propose a federated generative model for decentralized tabular data synthesis (HT-Fed-GAN). There are three important parts of HT-Fed-GAN: the federated variational Bayesian Gaussian mixture model (Fed-VB-GMM), which is designed to solve the problem of multimodal distributions; federated conditional one-hot encoding with conditional sampling for global categorical attribute representation and rebalancing; and a privacy consumption-based federated conditional GAN for privacy-preserving decentralized data modeling. The experimental results on five real-world datasets show that HT-Fed-GAN obtains the best trade-off between the data utility and privacy level. For the data utility, the tables generated by HT-Fed-GAN are the most statistically similar to the original tables and the evaluation scores show that HT-Fed-GAN outperforms the state-of-the-art model in terms of machine learning tasks.
Collapse
Affiliation(s)
- Shaoming Duan
- School of Computer Science, Harbin Institute of Technology (Shenzhen), Shenzhen 518055, China
- Institute of Data Security, Harbin Institute of Technology (Shenzhen), Shenzhen 518055, China
- Chuanyi Liu
- School of Computer Science, Harbin Institute of Technology (Shenzhen), Shenzhen 518055, China
- Institute of Data Security, Harbin Institute of Technology (Shenzhen), Shenzhen 518055, China
- Peng Cheng Laboratory, Department of New Networks, Shenzhen 518000, China
- Peiyi Han
- School of Computer Science, Harbin Institute of Technology (Shenzhen), Shenzhen 518055, China
- Institute of Data Security, Harbin Institute of Technology (Shenzhen), Shenzhen 518055, China
- Peng Cheng Laboratory, Department of New Networks, Shenzhen 518000, China
- Xiaopeng Jin
- College of Big Data and Internet, Shenzhen Technology University, Shenzhen 518100, China
- Xinyi Zhang
- School of Computer Science and Technology, The Chinese University of Hong Kong, Shenzhen 518172, China
- Tianyu He
- School of Computer Science, Harbin Institute of Technology (Shenzhen), Shenzhen 518055, China
- Institute of Data Security, Harbin Institute of Technology (Shenzhen), Shenzhen 518055, China
- Hezhong Pan
- Peng Cheng Laboratory, Department of New Networks, Shenzhen 518000, China
- Xiayu Xiang
- Peng Cheng Laboratory, Department of New Networks, Shenzhen 518000, China
|
44
|
Cardozo G, Tirloni SF, Pereira Moro AR, Marques JLB. Use of Artificial Intelligence in the Search for New Information Through Routine Laboratory Tests: Systematic Review. JMIR Bioinformatics and Biotechnology 2022; 3:e40473. [PMID: 36644762; PMCID: PMC9828303; DOI: 10.2196/40473] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2022] [Revised: 08/28/2022] [Accepted: 10/31/2022] [Indexed: 11/05/2022]
Abstract
Background: In recent decades, the use of artificial intelligence has been widely explored in health care. Similarly, the amount of data generated in the most varied medical processes has practically doubled every year, requiring new methods of analysis and treatment of these data. Mainly aimed at aiding in the diagnosis and prevention of diseases, this precision medicine has shown great potential in different medical disciplines. Laboratory tests, for example, almost always present their results separately as individual values. However, physicians need to analyze a set of results to propose a supposed diagnosis, which leads us to think that sets of laboratory tests may contain more information than those presented separately for each result. In this way, the processes of medical laboratories can be strongly affected by these techniques.
Objective: In this sense, we sought to identify scientific research that used laboratory tests and machine learning techniques to predict hidden information and diagnose diseases.
Methods: The methodology adopted used the population, intervention, comparison, and outcomes principle, searching the main engineering and health sciences databases. The search terms were defined based on the list of terms used in the Medical Subject Heading database. Data from this study were presented descriptively and followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses; 2020) statement flow diagram and the National Institutes of Health tool for quality assessment of articles. During the analysis, the inclusion and exclusion criteria were independently applied by 2 authors, with a third author being consulted in cases of disagreement.
Results: Following the defined requirements, 40 studies presenting good quality in the analysis process were selected and evaluated. We found that, in recent years, there has been a significant increase in the number of works that have used this methodology, mainly because of COVID-19. In general, the studies used machine learning classification models to predict new information, and the most used parameters were data from routine laboratory tests such as the complete blood count.
Conclusions: Finally, we conclude that laboratory tests, together with machine learning techniques, can predict new tests, thus helping the search for new diagnoses. This process has proved to be advantageous and innovative for medical laboratories. It is making it possible to discover hidden information and propose additional tests, reducing the number of false negatives and helping in the early discovery of unknown diseases.
Affiliation(s)
- Glauco Cardozo
- Federal Institute of Santa Catarina, Florianópolis, Brazil
|
45
|
Liu Z, Zhang R, Xv Y, Wang J, Chen J, Zhou X. A Novel Nomogram Integrated with Systemic Inflammation Markers and Traditional Prognostic Factors for Adverse Events' Prediction in Patients with Chronic Heart Failure in the Southwest of China. J Inflamm Res 2022; 15:6785-6800. [PMID: 36573109; PMCID: PMC9789703; DOI: 10.2147/jir.s366903] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/24/2022] [Accepted: 10/18/2022] [Indexed: 12/24/2022] Open
Abstract
Objective: Inflammation contributes to the pathogenesis and progression of heart failure (HF). This study aimed to construct a nomogram based on systemic inflammatory markers and traditional prognostic factors to assess the risk of adverse outcomes (cardiovascular readmission and all-cause death) in patients with chronic heart failure (CHF).
Methods: Data were retrospectively collected from patients with HF admitted to the Department of Cardiovascular Medicine at the First Affiliated Hospital of Chongqing Medical University from January 2018 to April 2020, and each patient had complete follow-up information. The follow-up duration was from June 2018 to May 31, 2022. A total of 550 patients were included and randomly assigned to the derivation and validation cohorts at a ratio of 7:3, and prognostic risk factors of CHF were identified by Cox regression analysis. A nomogram scoring model was then constructed.
Results: The Cox multivariate regression analysis showed that traditional prognostic factors such as age (P=0.011), BMI (P=0.048), NYHA classification (P<0.001), and creatinine (P<0.001), and systemic inflammatory markers including LMR (P=0.001) and PLR (P=0.015), were independent prognostic factors for CHF patients. A nomogram integrating the traditional and inflammatory prognostic factors was established, which yielded a C-index of 0.739 (95% CI: 0.714-0.764) in the derivation cohort and 0.713 (95% CI: 0.668-0.758) in the validation cohort. The calibration curves exhibited good performance of the nomogram in predicting the adverse outcomes for patients with CHF. In subgroups (HFrEF, HFmrEF, and HFpEF groups), the systemic inflammatory marker-based nomograms also proved to be effective prediction tools for patients' adverse outcomes.
Conclusion: The nomogram combining systemic inflammatory markers and traditional risk factors has satisfactory predictive performance for adverse outcomes (mortality and readmission) in patients with CHF.
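For readers unfamiliar with the C-index reported above, the following is a minimal sketch of Harrell's concordance index, one common definition; the abstract does not state which estimator the authors used. The function name `harrell_c_index` and the four-patient toy data are hypothetical and are not taken from the study.

```python
def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored data (illustrative sketch).

    times  : follow-up time for each subject
    events : 1 if the adverse event was observed, 0 if censored
    risks  : predicted risk score (higher means earlier event expected)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # model ranks the earlier event as riskier
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, two observed events.
times = [5, 8, 12, 20]
events = [1, 0, 1, 0]
risks = [0.9, 0.4, 0.1, 0.6]
c_index = harrell_c_index(times, events, risks)
print(c_index)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.739 and 0.713 should be read.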
Affiliation(s)
- Zhaojun Liu
- Department of Cardiology, First Affiliated Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
- Ren Zhang
- Department of Cardiology, First Affiliated Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
- Yingjie Xv
- Department of Urology, First Affiliated Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
- Jinkui Wang
- Department of Urology; Ministry of Education Key Laboratory of Child Development and Disorders; National Clinical Research Center for Child Health and Disorders (Chongqing); China International Science and Technology Cooperation Base of Child Development and Critical Disorders; Chongqing Key Laboratory of Pediatrics; Children’s Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
- Jie Chen
- Department of Cardiology, First Affiliated Hospital of Chongqing Medical University, Chongqing, People’s Republic of China
- Xiaoli Zhou
- Department of Cardiology, First Affiliated Hospital of Chongqing Medical University, Chongqing, People’s Republic of China (correspondence: Xiaoli Zhou)
|
46
|
Lausser L, Szekely R, Schmid F, Maucher M, Kestler HA. Efficient cross-validation traversals in feature subset selection. Sci Rep 2022; 12:21485. [PMID: 36509882; PMCID: PMC9744898; DOI: 10.1038/s41598-022-25942-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2021] [Accepted: 12/07/2022] [Indexed: 12/15/2022] Open
Abstract
Sparse and robust classification models have the potential for revealing common predictive patterns that not only allow for categorizing objects into classes but also for generating mechanistic hypotheses. Identifying a small and informative subset of features is their main ingredient. However, the exponential search space of feature subsets and the heuristic nature of selection algorithms limit the coverage of these analyses, even for low-dimensional datasets. We present methods for reducing the computational complexity of feature selection criteria allowing for higher efficiency and coverage of screenings. We achieve this by reducing the preparation costs of high-dimensional subsets [Formula: see text] to those of one-dimensional ones [Formula: see text]. Our methods are based on a tight interaction between a parallelizable cross-validation traversal strategy and distance-based classification algorithms and can be used with any product distance or kernel. We evaluate the traversal strategy exemplarily in exhaustive feature subset selection experiments (perfect coverage). Its runtime, fitness landscape, and predictive performance are analyzed on publicly available datasets. Even in low-dimensional settings, we achieve approximately a 15-fold increase in exhaustively generating distance matrices for feature combinations bringing a new level of evaluations into reach.
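The idea of reducing the preparation cost of multi-feature subsets to that of single features can be illustrated with the squared Euclidean distance, which is a sum over per-feature terms. The sketch below is not the authors' implementation and does not include their cross-validation traversal; it only shows, under that assumption, how one-dimensional distance matrices can be computed once and reused for every subset. The helper names `per_feature_sq_dists` and `subset_sq_dists` and the random data are hypothetical.

```python
from itertools import combinations

import numpy as np


def per_feature_sq_dists(X):
    """Return D with D[f, i, j] = (X[i, f] - X[j, f]) ** 2 for every feature f."""
    diffs = X[:, None, :] - X[None, :, :]        # shape (n, n, d)
    return np.transpose(diffs ** 2, (2, 0, 1))   # shape (d, n, n)


def subset_sq_dists(D, subset):
    """Squared Euclidean distance matrix for a feature subset,
    obtained by summing the precomputed one-dimensional matrices."""
    return D[list(subset)].sum(axis=0)


# Hypothetical toy data: 6 samples, 4 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))

D = per_feature_sq_dists(X)  # computed once

# Exhaustive enumeration of all non-empty feature subsets (2**4 - 1 = 15).
subset_matrices = {
    s: subset_sq_dists(D, s)
    for k in range(1, X.shape[1] + 1)
    for s in combinations(range(X.shape[1]), k)
}
print(len(subset_matrices))
```

Each subset matrix is then a cheap sum rather than a fresh pass over the raw data, which is the kind of reuse that makes exhaustive screening more tractable for small feature counts.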
Affiliation(s)
- Ludwig Lausser
- Institute of Medical Systems Biology, Ulm University, Ulm, Germany; Faculty of Computer Science, Technische Hochschule Ingolstadt, Ingolstadt, Germany
- Robin Szekely
- Institute of Medical Systems Biology, Ulm University, Ulm, Germany
- Florian Schmid
- Institute of Medical Systems Biology, Ulm University, Ulm, Germany
- Markus Maucher
- Institute of Medical Systems Biology, Ulm University, Ulm, Germany
- Hans A. Kestler
- Institute of Medical Systems Biology, Ulm University, Ulm, Germany
|
47
|
Krzyziński M, Spytek M, Baniecki H, Biecek P. SurvSHAP(t): Time-dependent explanations of machine learning survival models. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.110234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/01/2023]
|
48
|
Golbus JR, Joo H, Janda AM, Maile MD, Aaronson KD, Engoren MC, Cassidy RB, Kheterpal S, Mathis MR. Preoperative clinical diagnostic accuracy of heart failure among patients undergoing major noncardiac surgery: a single-centre prospective observational analysis. BJA Open 2022; 4:100113. [PMID: 36643721; PMCID: PMC9835767; DOI: 10.1016/j.bjao.2022.100113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Revised: 10/16/2022] [Accepted: 11/09/2022] [Indexed: 12/12/2022]
Abstract
Background: Reliable diagnosis of heart failure during preoperative evaluation is important for perioperative management and long-term care. We aimed to quantify preoperative heart failure diagnostic accuracy and explore characteristics of patients with heart failure misdiagnoses.
Methods: We performed an observational cohort study of adults undergoing major noncardiac surgery at an academic hospital between 2015 and 2019. A preoperative clinical diagnosis of heart failure was defined using keywords from the history and clinical examination or administrative documentation. Across stratified subsamples of cases with and without clinically diagnosed heart failure, health records were intensively reviewed by an expert panel to develop an adjudicated heart failure reference standard using diagnostic criteria congruent with consensus guidelines. We calculated agreement among experts, and analysed performance of clinically diagnosed heart failure compared with the adjudicated reference standard.
Results: Across 40 555 major noncardiac procedures, a stratified subsample of 511 patients was reviewed by the expert panel. The prevalence of heart failure was 9.1% based on clinical diagnosis compared with 13.3% (95% confidence interval [CI], 10.3-16.2%) estimated by the expert panel. Overall agreement and inter-rater reliability (kappa) among heart failure experts were 95% and 0.79, respectively. Based upon expert adjudication, heart failure was clinically diagnosed with an accuracy of 92.8% (90.6-95.1%), sensitivity 57.4% (53.1-61.7%), specificity 98.3% (97.1-99.4%), positive predictive value 83.5% (80.3-86.8%), and negative predictive value 93.8% (91.7-95.9%).
Conclusions: Limitations exist to the preoperative clinical diagnosis of heart failure, with nearly half of cases undiagnosed preoperatively. Considering the risks of undiagnosed heart failure, efforts to improve preoperative heart failure diagnoses are warranted.
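The performance measures listed in the results all derive from a 2x2 table of clinical diagnosis against the reference standard. The sketch below shows those standard definitions on hypothetical counts; the numbers are invented for illustration and are not the study's data. The published estimates come from a stratified subsample and presumably account for the sampling design, which this plain count-based calculation does not attempt.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic accuracy measures from a 2x2 confusion table.

    tp: diagnosed and truly diseased     fp: diagnosed but not diseased
    fn: missed cases                     tn: correctly not diagnosed
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # share of true cases detected
        "specificity": tn / (tn + fp),   # share of non-cases correctly ruled out
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = diagnostic_metrics(tp=30, fp=5, fn=20, tn=445)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these toy counts, sensitivity is 30 / 50 and specificity is 445 / 450, which mirrors the pattern in the abstract of high specificity alongside much lower sensitivity.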
Affiliation(s)
- Jessica R. Golbus
- Department of Internal Medicine, Division of Cardiovascular Medicine, Michigan Medicine - University of Michigan, Ann Arbor, MI, USA
- Hyeon Joo
- Department of Anesthesiology, Michigan Medicine - University of Michigan, Ann Arbor, MI, USA
- Allison M. Janda
- Department of Anesthesiology, Michigan Medicine - University of Michigan, Ann Arbor, MI, USA
- Michael D. Maile
- Department of Anesthesiology, Michigan Medicine - University of Michigan, Ann Arbor, MI, USA
- Keith D. Aaronson
- Department of Internal Medicine, Division of Cardiovascular Medicine, Michigan Medicine - University of Michigan, Ann Arbor, MI, USA
- Milo C. Engoren
- Department of Anesthesiology, Michigan Medicine - University of Michigan, Ann Arbor, MI, USA
- Ruth B. Cassidy
- Department of Anesthesiology, Michigan Medicine - University of Michigan, Ann Arbor, MI, USA
- Sachin Kheterpal
- Department of Anesthesiology, Michigan Medicine - University of Michigan, Ann Arbor, MI, USA
- Michael R. Mathis
- Department of Anesthesiology, Michigan Medicine - University of Michigan, Ann Arbor, MI, USA
- Department of Computational Bioinformatics, Michigan Medicine - University of Michigan, Ann Arbor, MI, USA
|
49
|
A Genetically-optimised Artificial Life Algorithm for Complexity-based Synthetic Dataset Generation. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.11.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
|
50
|
Liu G, Xie Y, Gao X. Three-way reduction for formal decision contexts. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.10.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
|