51
Li Y, Wang H, Luo Y. Improving Fairness in the Prediction of Heart Failure Length of Stay and Mortality by Integrating Social Determinants of Health. Circ Heart Fail 2022; 15:e009473. [PMID: 36378761 PMCID: PMC9673161 DOI: 10.1161/circheartfailure.122.009473] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 01/07/2022] [Accepted: 06/24/2022] [Indexed: 11/16/2022]
Abstract
BACKGROUND Machine learning (ML) approaches have been broadly applied to the prediction of length of stay and mortality in hospitalized patients. ML may also reduce societal health burdens, assist in health resources planning and improve health outcomes. However, the fairness of these ML models across ethnoracial or socioeconomic subgroups is rarely assessed or discussed. In this study, we aim (1) to quantify the algorithmic bias of ML models when predicting the probability of long-term hospitalization or in-hospital mortality for different heart failure (HF) subpopulations, and (2) to propose a novel method that can improve the fairness of our models without compromising predictive power. METHODS We built 5 ML classifiers to predict the composite outcome of hospitalization length-of-stay and in-hospital mortality for 210 368 HF patients extracted from the Get With The Guidelines-Heart Failure registry data set. We integrated 15 social determinants of health variables, including the Social Deprivation Index and the Area Deprivation Index, into the feature space of ML models based on patients' geographies to mitigate the algorithmic bias. RESULTS The best-performing random forest model demonstrated modest predictive power but selectively underdiagnosed underserved subpopulations, for example, female, Black, and socioeconomically disadvantaged patients. The integration of social determinants of health variables can significantly improve fairness without compromising model performance. CONCLUSIONS We quantified algorithmic bias against underserved subpopulations in the prediction of the composite outcome for HF patients. We provide a potential direction to reduce disparities of ML-based predictive models by integrating social determinants of health variables. We urge fellow researchers to strongly consider ML fairness when developing predictive models for HF patients.
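The subgroup underdiagnosis described in the abstract above can be illustrated with a small per-group error-rate comparison. This is only a sketch: the abstract does not say which fairness metric the authors used, so treating "underdiagnosis" as the false-negative rate is an assumption, and the function name, group labels, and toy data are all hypothetical.

```python
from collections import defaultdict


def false_negative_rate_by_group(y_true, y_pred, groups):
    """Return {group: FNR}, where FNR = missed positives / all true positives.

    Assumption: "underdiagnosis" is read here as a false negative, i.e. a
    patient with the outcome (label 1) whom the model predicted as 0.
    """
    positives = defaultdict(int)
    missed = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            positives[group] += 1
            if pred == 0:
                missed[group] += 1
    return {g: missed[g] / positives[g] for g in positives}


# Toy, made-up labels for two hypothetical subgroups (not registry data).
y_true = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
groups = ["A"] * 5 + ["B"] * 5

rates = false_negative_rate_by_group(y_true, y_pred, groups)
gap = rates["B"] - rates["A"]  # a positive gap means group B is missed more often
print(rates, gap)
```

A gap near zero would indicate similar miss rates across subgroups under this particular metric; other fairness definitions can give different pictures of the same model.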
Affiliation(s)
- Yikuan Li
- Division of Health and Biomedical Informatics, Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
- Hanyin Wang
- Division of Health and Biomedical Informatics, Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
- Yuan Luo
- Division of Health and Biomedical Informatics, Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
52
Nagamine T, Gillette B, Kahoun J, Burghaus R, Lippert J, Saxena M. Data-driven identification of heart failure disease states and progression pathways using electronic health records. Sci Rep 2022; 12:17871. [PMID: 36284167 PMCID: PMC9596465 DOI: 10.1038/s41598-022-22398-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/18/2021] [Accepted: 10/13/2022] [Indexed: 01/20/2023] Open
Abstract
Heart failure (HF) is a leading cause of morbidity, healthcare costs, and mortality. Guideline based segmentation of HF into distinct subtypes is coarse and unlikely to reflect the heterogeneity of etiologies and disease trajectories of patients. While analyses of electronic health records show promise in expanding our understanding of complex syndromes like HF in an evidence-driven way, limitations in data quality have presented challenges for large-scale EHR-based insight generation and decision-making. We present a hypothesis-free approach to generating real-world characteristics and progression patterns of HF. Patient disease state snapshots are extracted from the complaints mentioned in unstructured clinical notes. Typical disease states are generated by clustering and characterized in terms of their distinguishing features, temporal relationships, and risk of important clinical events. Our analysis generates a comprehensive "disease phenome" of real-world patients computed from large, noisy, secondary-use EHR datasets created in a routine clinical setting.
Affiliation(s)
- Brian Gillette
- Department of Surgery, NYU Langone Long Island, Mineola, NY, USA
- Department of Foundations of Medicine, NYU Long Island School of Medicine, Mineola, NY, USA
- John Kahoun
- Droice Research, New York, NY, USA
- Clinical Informatics, CityMD, New York, NY, USA
53
Zamzami IF, Pathoee K, Gupta BB, Mishra A, Rawat D, Alhalabi W. Machine learning algorithms for smart and intelligent healthcare system in Society 5.0. INT J INTELL SYST 2022. [DOI: 10.1002/int.23061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Affiliation(s)
- Ikhlas Fuad Zamzami
- Department of Management Information System, Faculty of Business, King Abdulaziz University, Rabigh, Saudi Arabia
- Brij B. Gupta
- Department of Computer Science and Information Engineering, International Center for AI and Cyber Security Research and Innovations, Asia University, Taichung, Taiwan
- Lebanese American University, Beirut, Lebanon
- Center for Interdisciplinary Research, University of Petroleum and Energy Studies (UPES), Dehradun, Uttarakhand, India
- Department of Computer Science, King Abdulaziz University, Jeddah, Saudi Arabia
- Anupama Mishra
- Department of Computer Science and Engineering, Himalayan School of Science & Technology, Swami Rama Himalayan University, India
- Deepesh Rawat
- Department of Electronics and Communication Engineering, Himalayan School of Science & Technology, Swami Rama Himalayan University, India
- Wadee Alhalabi
- Department of Computer Science, King Abdulaziz University, Jeddah, Saudi Arabia
54
Hassan CAU, Iqbal J, Irfan R, Hussain S, Algarni AD, Bukhari SSH, Alturki N, Ullah SS. Effectively Predicting the Presence of Coronary Heart Disease Using Machine Learning Classifiers. SENSORS (BASEL, SWITZERLAND) 2022; 22:7227. [PMID: 36236325 PMCID: PMC9573101 DOI: 10.3390/s22197227] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/07/2022] [Revised: 06/03/2022] [Accepted: 07/27/2022] [Indexed: 06/16/2023]
Abstract
Coronary heart disease is one of the major causes of death around the globe. Predicting heart disease is one of the most challenging tasks in the field of clinical data analysis. Machine learning (ML) is useful for diagnostic assistance, supporting decision making and prediction on the basis of the data produced by the healthcare sector globally. ML techniques have also been employed in the medical field for disease prediction. In this regard, numerous research studies have been conducted on heart disease prediction using ML classifiers. In this paper, we used eleven ML classifiers to identify key features, which improved the predictability of heart disease. To introduce the prediction model, various feature combinations and well-known classification algorithms were used. We achieved 95% accuracy with gradient boosted trees and multilayer perceptron in the heart disease prediction model. The Random Forest gives a better performance level in heart disease prediction, with an accuracy level of 96%.
Affiliation(s)
- Ch. Anwar ul Hassan
- Department of Creative Technologies, Air University Islamabad, Islamabad 44000, Pakistan
- Jawaid Iqbal
- Department of Computer Science, Capital University of Science and Technology, Islamabad 44000, Pakistan
- Rizwana Irfan
- Department of Computer Science, University of Jeddah, P.O. Box 123456, Jeddah 21959, Saudi Arabia
- Saddam Hussain
- School of Digital Science, Universiti Brunei Darussalam, Jalan Tungku Link, Gadong BE1410, Brunei
- Abeer D. Algarni
- Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Nazik Alturki
- Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Syed Sajid Ullah
- Department of Information and Communication Technology, University of Agder (UiA), N-4898 Grimstad, Norway
55
Liu T, Chi X, Du Y, Yang H, Xi Y, Guo J. IMLBoost for intelligent diagnosis with imbalanced medical records. INTELL DATA ANAL 2022. [DOI: 10.3233/ida-216050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Class imbalance of medical records is a critical challenge for disease classification in intelligent diagnosis. Existing machine learning algorithms usually assign equal weights to all classes, which may reduce classification accuracy on imbalanced records. In this paper, a new Imbalance Lessened Boosting (IMLBoost) algorithm is proposed to better classify imbalanced medical records, highlighting the contribution of samples in minor classes as well as hard and boundary samples. A tailored Cost-Fitting Loss (CFL) function is proposed to assign befitting costs to these critical samples. The first and second derivatives of the CFL are then derived and embedded into the classical XGBoost framework. In addition, some feature analysis techniques are utilized to further improve the performance of IMLBoost, which can also speed up model training. Experimental results on five UCI imbalanced medical datasets have demonstrated the effectiveness of the proposed algorithm. Compared with other existing classification methods, IMLBoost has improved the classification performance in terms of F1-score, G-mean and AUC.
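The idea of supplying a loss to a boosting framework through its first and second derivatives can be sketched with a plain class-weighted logistic loss. This is not the paper's Cost-Fitting Loss, whose form the abstract does not give; the weighting scheme, the function name, and the `minority_weight` parameter are hypothetical stand-ins. Gradient boosting libraries such as XGBoost accept custom objectives that return exactly these two per-sample quantities.

```python
import math


def weighted_logloss_grad_hess(raw_scores, labels, minority_weight=3.0):
    """Per-sample gradient and Hessian of a class-weighted logistic loss.

    Loss per sample: L = -w * [y*log(p) + (1-y)*log(1-p)], with
    p = sigmoid(score). Differentiating with respect to the raw score gives
    dL/ds = w * (p - y) and d2L/ds2 = w * p * (1 - p).
    Assumption: the minority class is labelled 1 and is up-weighted by
    `minority_weight`; this stands in for, and is not, the paper's CFL.
    """
    grads, hessians = [], []
    for score, y in zip(raw_scores, labels):
        p = 1.0 / (1.0 + math.exp(-score))
        w = minority_weight if y == 1 else 1.0
        grads.append(w * (p - y))
        hessians.append(w * p * (1.0 - p))
    return grads, hessians


# At a raw score of 0 the predicted probability is 0.5 for both samples.
grads, hessians = weighted_logloss_grad_hess([0.0, 0.0], [1, 0])
print(grads, hessians)
```

Up-weighting the minority class scales both derivatives, so misclassified minority samples pull harder on each boosting step than majority samples do.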
Affiliation(s)
- Tongtong Liu
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong, China
- Xiaofan Chi
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong, China
- Yukun Du
- The Affiliated Hospital of Qingdao University Spine Surgery, Qingdao, Shandong, China
- Huan Yang
- College of Computer Science and Technology, Qingdao University, Qingdao, Shandong, China
- Yongming Xi
- The Affiliated Hospital of Qingdao University Spine Surgery, Qingdao, Shandong, China
- Jianwei Guo
- The Affiliated Hospital of Qingdao University Spine Surgery, Qingdao, Shandong, China
56
Combining symbolic regression with the Cox proportional hazards model improves prediction of heart failure deaths. BMC Med Inform Decis Mak 2022; 22:196. [PMID: 35879758 PMCID: PMC9316394 DOI: 10.1186/s12911-022-01943-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/18/2021] [Accepted: 07/20/2022] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND Heart failure is a clinical syndrome characterised by a reduced ability of the heart to pump blood. Patients with heart failure have a high mortality rate, and physicians need reliable prognostic predictions to make informed decisions about the appropriate application of devices, transplantation, medications, and palliative care. In this study, we demonstrate that combining symbolic regression with the Cox proportional hazards model improves the ability to predict death due to heart failure compared to using the Cox proportional hazards model alone. METHODS We used a newly invented symbolic regression method called the QLattice to analyse a data set of medical records for 299 Pakistani patients diagnosed with heart failure. The QLattice identified non-linear mathematical transformations of the available covariates, which we then used in a Cox model to predict survival. RESULTS An exponential function of age, the inverse of ejection fraction, and the inverse of serum creatinine were identified as the best risk factors for predicting heart failure deaths. A Cox model fitted on these transformed covariates had improved predictive performance compared with a Cox model on the same covariates without mathematical transformations. CONCLUSION Symbolic regression is a way to find transformations of covariates from patients' medical records which can improve the performance of survival regression models. At the same time, these simple functions are intuitive and easy to apply in clinical settings. The direct interpretability of the simple forms may help researchers gain new insights into the actual causal pathways leading to deaths.
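The covariate transformations named in the abstract above (an exponential of age, the inverse of ejection fraction, the inverse of serum creatinine) can be sketched as a preprocessing step ahead of a Cox model. The abstract gives no scaling constants, units, or fitted coefficients, so the `age_scale` parameter, the units in the argument names, and the example coefficients below are purely illustrative assumptions.

```python
import math


def transform_covariates(age_years, ejection_fraction_pct,
                         serum_creatinine_mg_dl, age_scale=10.0):
    """Apply the three transformation families named in the abstract.

    `age_scale` is a hypothetical constant that keeps exp(age) in a usable
    numeric range; the abstract does not specify any scaling.
    """
    return {
        "exp_age": math.exp(age_years / age_scale),
        "inv_ejection_fraction": 1.0 / ejection_fraction_pct,
        "inv_serum_creatinine": 1.0 / serum_creatinine_mg_dl,
    }


def relative_hazard(features, coefficients):
    """Cox relative hazard exp(beta . x); the baseline hazard cancels out."""
    linear_predictor = sum(coefficients[name] * value
                           for name, value in features.items())
    return math.exp(linear_predictor)


features = transform_covariates(60, 40, 2.0)
# Made-up coefficients for illustration only; not fitted values from the study.
example_coefs = {"exp_age": 0.001, "inv_ejection_fraction": 10.0,
                 "inv_serum_creatinine": -0.5}
print(features, relative_hazard(features, example_coefs))
```

In practice the coefficients would come from fitting a Cox model on the transformed columns; the point of the sketch is only that the transformations are applied before the model sees the data.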
57
Exploring the predictive capability of machine learning models in identifying foot and mouth disease outbreak occurrences in cattle farms in an endemic setting of Thailand. Prev Vet Med 2022; 207:105706. [DOI: 10.1016/j.prevetmed.2022.105706] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2022] [Revised: 06/09/2022] [Accepted: 07/01/2022] [Indexed: 11/20/2022]
58
Zhou X, Nakamura K, Sahara N, Asami M, Toyoda Y, Enomoto Y, Hara H, Noro M, Sugi K, Moroi M, Nakamura M, Huang M, Zhu X. Exploring and Identifying Prognostic Phenotypes of Patients with Heart Failure Guided by Explainable Machine Learning. Life (Basel) 2022; 12:life12060776. [PMID: 35743806 PMCID: PMC9224610 DOI: 10.3390/life12060776] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2022] [Revised: 05/20/2022] [Accepted: 05/23/2022] [Indexed: 12/05/2022] Open
Abstract
Identifying patient prognostic phenotypes facilitates precision medicine. This study aimed to explore phenotypes of patients with heart failure (HF) corresponding to prognostic condition (risk of mortality) and to identify the phenotype of new patients by machine learning (ML). An unsupervised ML method was applied to explore phenotypes of patients in a derivation dataset (n = 562) based on their medical records. Thereafter, supervised ML models were trained on the derivation dataset to classify these identified phenotypes. Then, the trained classifiers were further validated on an independent validation dataset (n = 168). Finally, Shapley additive explanations were used to interpret the decision making of phenotype classification. Three patient phenotypes corresponding to stratified mortality risk (high, low, and intermediate) were identified. Kaplan−Meier survival curves differed significantly among the three phenotypes (pairwise comparison p < 0.05). The hazard ratio of all-cause mortality between patients in phenotype 1 (n = 91; high risk) and phenotype 3 (n = 329; intermediate risk) was 2.08 (95%CI 1.29−3.37, p = 0.003), and 0.26 (95%CI 0.11−0.61, p = 0.002) between phenotype 2 (n = 142; low risk) and phenotype 3. For phenotype classification by random forest, AUCs of phenotypes 1, 2, and 3 were 0.736 ± 0.038, 0.815 ± 0.035, and 0.721 ± 0.03, respectively, slightly better than the decision tree. Then, the classifier effectively identified the phenotypes for new patients in the validation dataset, with significant differences in survival curves and hazard ratios. Finally, age and creatinine clearance rate were identified as the top two most important predictors. ML could effectively identify patient prognostic phenotypes, facilitating reasonable management and treatment considering prognostic condition.
Affiliation(s)
- Xue Zhou
- Biomedical Information Engineering Lab, The University of Aizu, Aizuwakamatsu 965-8580, Japan
- Keijiro Nakamura
- Division of Cardiovascular Medicine, Toho University Ohashi Medical Center, Tokyo 153-8515, Japan
- Correspondence: (K.N.); (X.Z.); Tel.: +81-3-468-1251 (K.N.); +81-242-37-2771 (X.Z.)
- Naohiko Sahara
- Division of Cardiovascular Medicine, Toho University Ohashi Medical Center, Tokyo 153-8515, Japan
- Masako Asami
- Division of Cardiovascular Medicine, Toho University Ohashi Medical Center, Tokyo 153-8515, Japan
- Yasutake Toyoda
- Division of Cardiovascular Medicine, Toho University Ohashi Medical Center, Tokyo 153-8515, Japan
- Yoshinari Enomoto
- Division of Cardiovascular Medicine, Toho University Ohashi Medical Center, Tokyo 153-8515, Japan
- Hidehiko Hara
- Division of Cardiovascular Medicine, Toho University Ohashi Medical Center, Tokyo 153-8515, Japan
- Mahito Noro
- Division of Cardiovascular Medicine, Odawara Cardiovascular Hospital, Odawara 250-0873, Japan
- Kaoru Sugi
- Division of Cardiovascular Medicine, Odawara Cardiovascular Hospital, Odawara 250-0873, Japan
- Masao Moroi
- Division of Cardiovascular Medicine, Toho University Ohashi Medical Center, Tokyo 153-8515, Japan
- Masato Nakamura
- Division of Cardiovascular Medicine, Toho University Ohashi Medical Center, Tokyo 153-8515, Japan
- Ming Huang
- Division of Information Science, Nara Institute of Science and Technology, Ikoma 630-0192, Japan
- Xin Zhu
- Biomedical Information Engineering Lab, The University of Aizu, Aizuwakamatsu 965-8580, Japan
- Correspondence: (K.N.); (X.Z.); Tel.: +81-3-468-1251 (K.N.); +81-242-37-2771 (X.Z.)
59
Manouchehri N, Bouguila N. A nonparametric Bayesian learning model using accelerated variational inference on multivariate Beta mixture models for medical applications. INTERNATIONAL JOURNAL OF SEMANTIC COMPUTING 2022. [DOI: 10.1142/s1793351x22500039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
60
Özbay Karakuş M, Er O. A comparative study on prediction of survival event of heart failure patients using machine learning algorithms. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07201-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
61
Uddin S, Haque I, Lu H, Moni MA, Gide E. Comparative performance analysis of K-nearest neighbour (KNN) algorithm and its different variants for disease prediction. Sci Rep 2022; 12:6256. [PMID: 35428863 PMCID: PMC9012855 DOI: 10.1038/s41598-022-10358-x] [Citation(s) in RCA: 73] [Impact Index Per Article: 24.3] [Received: 01/05/2022] [Accepted: 04/05/2022] [Indexed: 11/17/2022] Open
Abstract
Disease risk prediction is a rising challenge in the medical domain. Researchers have widely used machine learning algorithms to solve this challenge. The k-nearest neighbour (KNN) algorithm is the most frequently used among the wide range of machine learning algorithms. This paper presents a study on different KNN variants (Classic one, Adaptive, Locally adaptive, k-means clustering, Fuzzy, Mutual, Ensemble, Hassanat and Generalised mean distance) and their performance comparison for disease prediction. This study analysed these variants in-depth through implementations and experimentations using eight machine learning benchmark datasets obtained from Kaggle, UCI Machine learning repository and OpenML. The datasets were related to different disease contexts. We considered the performance measures of accuracy, precision and recall for comparative analysis. The average accuracy values of these variants ranged from 64.22% to 83.62%. The Hassanat KNN showed the highest average accuracy (83.62%), followed by the ensemble approach KNN (82.34%). A relative performance index is also proposed based on each performance measure to assess each variant and compare the results. This study identified Hassanat KNN as the best performing variant based on the accuracy-based version of this index, followed by the ensemble approach KNN. This study also provided a relative comparison among KNN variants based on precision and recall measures. Finally, this paper summarises which KNN variant is the most promising candidate to follow under the consideration of three performance measures (accuracy, precision and recall) for disease prediction. Healthcare researchers and stakeholders could use the findings of this study to select the appropriate KNN variant for predictive disease risk analytics.
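For orientation, the classic variant that the other KNN variants in the abstract above build on can be sketched in a few lines. This is a generic majority-vote KNN with Euclidean distance, not code from the study; the function name, the toy points, and the labels are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classic KNN: majority vote among the k nearest training points.

    Distances are Euclidean. Variants such as the Hassanat KNN replace the
    distance function, which is not shown here.
    """
    by_distance = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Toy two-feature data with made-up class labels.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["healthy"] * 3 + ["disease"] * 3

near_origin = knn_predict(points, labels, (0.5, 0.5))
near_far_cluster = knn_predict(points, labels, (5.5, 5.5))
print(near_origin, near_far_cluster)
```

The choice of `k` and of the distance function are the two main knobs; the variants compared in the study differ largely in how they set or adapt these.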
Affiliation(s)
- Shahadat Uddin
- School of Project Management, Faculty of Engineering, The University of Sydney, Forest Lodge, NSW, 2037, Australia.
- Ibtisham Haque
- School of Electrical and Information Engineering, Faculty of Engineering, The University of Sydney, Darlington, NSW, 2008, Australia
- Haohui Lu
- School of Project Management, Faculty of Engineering, The University of Sydney, Forest Lodge, NSW, 2037, Australia
- Mohammad Ali Moni
- School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, The University of Queensland, St Lucia, QLD, 4072, Australia
- Ergun Gide
- School of Engineering and Technology, CQUniversity (Sydney), Sydney, NSW, 2000, Australia
62
Srujana B, Verma D, Naqvi S. Machine Learning vs. survival analysis models: a study on right censored heart failure data. COMMUN STAT-SIMUL C 2022. [DOI: 10.1080/03610918.2022.2060510] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
Affiliation(s)
- B. Srujana
- Department of Mathematics, Indian Institute of Technology Hyderabad, Hyderabad, India
- Dhananjay Verma
- Department of Mathematics, Indian Institute of Technology Hyderabad, Hyderabad, India
- Sameen Naqvi
- Department of Mathematics, Indian Institute of Technology Hyderabad, Hyderabad, India
63
Tuning hyperparameters of machine learning algorithms and deep neural networks using metaheuristics: A bioinformatics study on biomedical and biological cases. Comput Biol Chem 2022; 97:107619. [DOI: 10.1016/j.compbiolchem.2021.107619] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 06/21/2021] [Revised: 08/23/2021] [Accepted: 12/17/2021] [Indexed: 12/14/2022]
64
Rudar J, Porter TM, Wright M, Golding GB, Hajibabaei M. LANDMark: an ensemble approach to the supervised selection of biomarkers in high-throughput sequencing data. BMC Bioinformatics 2022; 23:110. [PMID: 35361114 PMCID: PMC8969335 DOI: 10.1186/s12859-022-04631-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/12/2021] [Accepted: 03/07/2022] [Indexed: 11/10/2022] Open
Abstract
Background Identification of biomarkers, which are measurable characteristics of biological datasets, can be challenging. Although amplicon sequence variants (ASVs) can be considered potential biomarkers, identifying important ASVs in high-throughput sequencing datasets is challenging. Noise, algorithmic failures to account for specific distributional properties, and feature interactions can complicate the discovery of ASV biomarkers. In addition, these issues can impact the replicability of various models and elevate false-discovery rates. Contemporary machine learning approaches can be leveraged to address these issues. Ensembles of decision trees are particularly effective at classifying the types of data commonly generated in high-throughput sequencing (HTS) studies due to their robustness when the number of features in the training data is orders of magnitude larger than the number of samples. In addition, when combined with appropriate model introspection algorithms, machine learning algorithms can also be used to discover and select potential biomarkers. However, the construction of these models could introduce various biases which potentially obfuscate feature discovery. Results We developed a decision tree ensemble, LANDMark, which uses oblique and non-linear cuts at each node. In synthetic and toy tests LANDMark consistently ranked as the best classifier and often outperformed the Random Forest classifier. When trained on the full metabarcoding dataset obtained from Canada’s Wood Buffalo National Park, LANDMark was able to create highly predictive models and achieved an overall balanced accuracy score of 0.96 ± 0.06. The use of recursive feature elimination did not impact LANDMark’s generalization performance and, when trained on data from the BE amplicon, it was able to outperform the Linear Support Vector Machine, Logistic Regression models, and Stochastic Gradient Descent models (p ≤ 0.05). 
Finally, LANDMark distinguishes itself due to its ability to learn smoother non-linear decision boundaries. Conclusions Our work introduces LANDMark, a meta-classifier which blends the characteristics of several machine learning models into a decision tree and ensemble learning framework. To our knowledge, this is the first study to apply this type of ensemble approach to amplicon sequencing data and we have shown that analyzing these datasets using LANDMark can produce highly predictive and consistent models.
Affiliation(s)
- Josip Rudar
- Department of Integrative Biology & Centre for Biodiversity Genomics, University of Guelph, 50 Stone Road East, Guelph, ON, N1G 2W1, Canada.
- Teresita M Porter
- Department of Integrative Biology & Centre for Biodiversity Genomics, University of Guelph, 50 Stone Road East, Guelph, ON, N1G 2W1, Canada
- Michael Wright
- Department of Integrative Biology & Centre for Biodiversity Genomics, University of Guelph, 50 Stone Road East, Guelph, ON, N1G 2W1, Canada
- G Brian Golding
- Department of Biology, McMaster University, 1280 Main St. West, Hamilton, ON, L8S 4K1, Canada
- Mehrdad Hajibabaei
- Department of Integrative Biology & Centre for Biodiversity Genomics, University of Guelph, 50 Stone Road East, Guelph, ON, N1G 2W1, Canada.
65
Umer M, Sadiq S, Karamti H, Karamti W, Majeed R, NAPPI M. IoT Based Smart Monitoring of Patients' with Acute Heart Failure. SENSORS (BASEL, SWITZERLAND) 2022; 22:2431. [PMID: 35408045 PMCID: PMC9003513 DOI: 10.3390/s22072431] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 02/03/2022] [Revised: 03/01/2022] [Accepted: 03/04/2022] [Indexed: 12/05/2022]
Abstract
The prediction of heart failure survivors is a challenging task and helps medical professionals to make the right decisions about patients. Expertise and experience of medical professionals are required to care for heart failure patients. Machine Learning models can help with understanding symptoms of cardiac disease. However, manual feature engineering is challenging and requires expertise to select the appropriate technique. This study proposes a smart healthcare framework using the Internet-of-Things (IoT) and cloud technologies that improves heart failure patients' survival prediction without manual feature engineering. The smart IoT-based framework monitors patients on the basis of real-time data and provides timely, effective, and quality healthcare services to heart failure patients. The proposed model also investigates deep learning models in classifying heart failure patients as alive or deceased. The framework employs IoT-based sensors to obtain signals and send them to the cloud web server for processing. These signals are further processed by deep learning models to determine the state of patients. Patients' health records and processing results are shared with a medical professional who will provide emergency help if required. The dataset used in this study, known as Heart Failure Clinical Records, contains 13 features and was obtained from the UCI repository. The experimental results revealed that the CNN model is superior to other deep learning and machine learning models with a 0.9289 accuracy value.
Affiliation(s)
- Muhammad Umer
- Department of Computer Science, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan 64200, Pakistan; (M.U.); (S.S.)
- Department of Computer Science Information Technology, The Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan
- Saima Sadiq
- Department of Computer Science, Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan 64200, Pakistan
- Hanen Karamti
- Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Walid Karamti
- Department of Computer Science, College of Computer, Qassim University, Buraydah 51452, Saudi Arabia
- Data Engineering and Semantics Research Unit, Faculty of Sciences of Sfax, University of Sfax, Sfax 3052, Tunisia
- Rizwan Majeed
- Directorate of Information Technology, The Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan
- Michele NAPPI
- Department of Computer Science, University of Salerno, 84084 Fisciano, Italy
66
Seedahmed MI, Mogilnicka I, Zeng S, Luo G, Whooley MA, McCulloch CE, Koth L, Arjomandi M. Performance of a Computational Phenotyping Algorithm for Sarcoidosis Using Diagnostic Codes in Electronic Medical Records: Case Validation Study From 2 Veterans Affairs Medical Centers. JMIR Form Res 2022; 6:e31615. [PMID: 35081036 PMCID: PMC8928044 DOI: 10.2196/31615] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/29/2021] [Revised: 01/24/2022] [Accepted: 01/24/2022] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND Electronic medical records (EMRs) offer the promise of computationally identifying sarcoidosis cases. However, the accuracy of identifying these cases in the EMR is unknown. OBJECTIVE The aim of this study is to determine the statistical performance of using the International Classification of Diseases (ICD) diagnostic codes to identify patients with sarcoidosis in the EMR. METHODS We used the ICD diagnostic codes to identify sarcoidosis cases by searching the EMRs of the San Francisco and Palo Alto Veterans Affairs medical centers and randomly selecting 200 patients. To improve the diagnostic accuracy of the computational algorithm in cases where histopathological data are unavailable, we developed an index of suspicion to identify cases with a high index of suspicion for sarcoidosis (confirmed and probable) based on clinical and radiographic features alone using the American Thoracic Society practice guideline. Through medical record review, we determined the positive predictive value (PPV) of diagnosing sarcoidosis by two computational methods: using ICD codes alone and using ICD codes plus the high index of suspicion. RESULTS Among the 200 patients, 158 (79%) had a high index of suspicion for sarcoidosis. Of these 158 patients, 142 (89.9%) had documentation of nonnecrotizing granuloma, confirming biopsy-proven sarcoidosis. The PPV of using ICD codes alone was 79% (95% CI 78.6%-80.5%) for identifying sarcoidosis cases and 71% (95% CI 64.7%-77.3%) for identifying histopathologically confirmed sarcoidosis in the EMRs. The inclusion of the generated high index of suspicion to identify confirmed sarcoidosis cases increased the PPV significantly to 100% (95% CI 96.5%-100%). Histopathology documentation alone was 90% sensitive compared with high index of suspicion. CONCLUSIONS ICD codes are reasonable classifiers for identifying sarcoidosis cases within EMRs with a PPV of 79%. 
Using a computational algorithm to capture index of suspicion data elements could significantly improve the case-identification accuracy.
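The PPV figures in the abstract above follow from simple proportions over the 200 sampled patients (158/200 = 79%, 142/200 = 71%). The sketch below is illustrative only: the function name is hypothetical, and the use of a normal-approximation (Wald) interval is an assumption, since the abstract does not say how the intervals were computed. Under that assumption, the 142/200 case reproduces the reported 64.7%-77.3% interval.

```python
import math
from typing import Tuple


def ppv_with_wald_ci(true_positives: int, flagged: int,
                     z: float = 1.959964) -> Tuple[float, float, float]:
    """Return (PPV, lower, upper) using a normal-approximation (Wald) interval.

    true_positives: flagged patients confirmed as cases on chart review.
    flagged: all patients the algorithm identified (here, by ICD code).
    The Wald interval is an assumption; the source does not name its method.
    """
    if flagged <= 0 or not 0 <= true_positives <= flagged:
        raise ValueError("need 0 <= true_positives <= flagged and flagged > 0")
    ppv = true_positives / flagged
    half_width = z * math.sqrt(ppv * (1.0 - ppv) / flagged)
    # Clamp to [0, 1] because the Wald interval can overshoot near the bounds.
    return ppv, max(0.0, ppv - half_width), min(1.0, ppv + half_width)


# Histopathologically confirmed cases among ICD-flagged patients: 142 of 200.
ppv, lower, upper = ppv_with_wald_ci(142, 200)
print(f"PPV = {ppv:.1%}, 95% CI {lower:.1%} to {upper:.1%}")
# prints "PPV = 71.0%, 95% CI 64.7% to 77.3%"
```

The Wald interval behaves poorly when the proportion is close to 0% or 100%, so an exact or Wilson interval is usually preferred in those cases.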
Affiliation(s)
- Mohamed I Seedahmed
- Division of Pulmonary, Critical Care, Allergy and Immunology, and Sleep, Department of Medicine, University of California San Francisco, San Francisco, CA, United States
- San Francisco Veterans Affairs Medical Center, San Francisco, CA, United States
| | - Izabella Mogilnicka
- San Francisco Veterans Affairs Medical Center, San Francisco, CA, United States
- Department of Experimental Physiology and Pathophysiology, Laboratory of the Centre for Preclinical Research, Medical University of Warsaw, Warsaw, Poland
| | - Siyang Zeng
- San Francisco Veterans Affairs Medical Center, San Francisco, CA, United States
- Department of Biomedical Informatics and Medical Education, School of Medicine, University of Washington, Seattle, WA, United States
| | - Gang Luo
- Department of Biomedical Informatics and Medical Education, School of Medicine, University of Washington, Seattle, WA, United States
| | - Mary A Whooley
- San Francisco Veterans Affairs Medical Center, San Francisco, CA, United States
- Department of Medicine, University of California San Francisco, San Francisco, CA, United States
- Measurement Science Quality Enhancement Research Initiative, San Francisco Veterans Affairs Healthcare System, San Francisco, CA, United States
| | - Charles E McCulloch
- Department of Epidemiology & Biostatistics, University of California San Francisco, San Francisco, CA, United States
| | - Laura Koth
- Division of Pulmonary, Critical Care, Allergy and Immunology, and Sleep, Department of Medicine, University of California San Francisco, San Francisco, CA, United States
| | - Mehrdad Arjomandi
- Division of Pulmonary, Critical Care, Allergy and Immunology, and Sleep, Department of Medicine, University of California San Francisco, San Francisco, CA, United States
- San Francisco Veterans Affairs Medical Center, San Francisco, CA, United States
| |
|
67
|
hyOPTXg: OPTUNA hyper-parameter optimization framework for predicting cardiovascular disease using XGBoost. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2021.103456] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/27/2022]
|
68
|
Kibria HB, Matin A. The Severity Prediction of The Binary And Multi-Class Cardiovascular Disease - A Machine Learning-Based Fusion Approach. Comput Biol Chem 2022; 98:107672. [DOI: 10.1016/j.compbiolchem.2022.107672] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/08/2021] [Revised: 02/25/2022] [Accepted: 03/26/2022] [Indexed: 12/22/2022]
|
69
|
Cheap robust learning of data anomalies with analytically solvable entropic outlier sparsification. Proc Natl Acad Sci U S A 2022; 119:2119659119. [PMID: 35197293 PMCID: PMC8917346 DOI: 10.1073/pnas.2119659119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/30/2022] [Indexed: 11/21/2022]
Abstract
Entropic outlier sparsification (EOS) is proposed as a cheap and robust computational strategy for learning in the presence of data anomalies and outliers. EOS dwells on the derived analytic solution of the (weighted) expected loss minimization problem subject to Shannon entropy regularization. An identified closed-form solution is proven to impose additional costs that depend linearly on statistics size and are independent of data dimension. Obtained analytic results also explain why the mixtures of spherically symmetric Gaussians—used heuristically in many popular data analysis algorithms—represent an optimal and least-biased choice for the nonparametric probability distributions when working with squared Euclidean distances. The performance of EOS is compared to a range of commonly used tools on synthetic problems and on partially mislabeled supervised classification problems from biomedicine. Applying EOS for coinference of data anomalies during learning is shown to allow reaching an accuracy of 97%±2% when predicting patient mortality after heart failure, statistically significantly outperforming predictive performance of common learning tools for the same data.
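To illustrate why entropy regularization can give a cheap closed-form answer, the sketch below solves a textbook version of the problem: minimize the weighted loss sum_i w_i * l_i plus alpha * sum_i w_i * log(w_i) over weights on the probability simplex. The minimizer is the Gibbs form w_i proportional to exp(-l_i / alpha). This is an assumed, simplified formulation chosen for illustration; it is not taken from the paper, and the function name and the parameter `alpha` are hypothetical.

```python
import math
from typing import List


def entropic_weights(losses: List[float], alpha: float) -> List[float]:
    """Closed-form weights for entropy-regularized weighted loss minimization.

    Minimizes sum(w_i * loss_i) + alpha * sum(w_i * log(w_i)) subject to
    w_i >= 0 and sum(w_i) == 1. The solution is w_i ~ exp(-loss_i / alpha).
    Cost is linear in the number of samples and does not depend on the data
    dimension, because only the per-sample losses are needed.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if not losses:
        raise ValueError("losses must be non-empty")
    smallest = min(losses)  # shift by the minimum for numerical stability
    unnormalized = [math.exp(-(loss - smallest) / alpha) for loss in losses]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Three ordinary samples and one sample with an anomalously large loss.
weights = entropic_weights([0.10, 0.20, 0.15, 5.0], alpha=0.5)
print([round(w, 4) for w in weights])
```

Samples with large losses receive weights close to zero, which is the down-weighting of outliers that the abstract describes; a smaller `alpha` makes the weighting sparser, a larger one makes it closer to uniform.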
|
70
|
Effectiveness of Artificial Intelligence Models for Cardiovascular Disease Prediction: Network Meta-Analysis. Computational Intelligence and Neuroscience 2022; 2022:5849995. [PMID: 35251153 PMCID: PMC8894073 DOI: 10.1155/2022/5849995] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/28/2021] [Accepted: 01/18/2022] [Indexed: 11/23/2022]
Abstract
Heart failure is the most common cause of death in both males and females around the world. Cardiovascular diseases (CVDs), in particular, are the main cause of death worldwide, accounting for 30% of all fatalities in the United States and 45% in Europe. Artificial intelligence (AI) approaches such as machine learning (ML) and deep learning (DL) models are playing an important role in the advancement of heart failure therapy. The main objective of this study was to perform a network meta-analysis of patients with heart failure, stroke, hypertension, and diabetes by comparing the ML and DL models. A comprehensive search of five electronic databases was performed using ScienceDirect, EMBASE, PubMed, Web of Science, and IEEE Xplore. The search strategy was performed according to the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) statement. The methodological quality of studies was assessed by following the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) guidelines. The random-effects network meta-analysis forest plot with categorical data was used, as were subgroups testing for all four types of treatments and calculating odds ratio (OR) with a 95% confidence interval (CI). Pooled network forest, funnel plots, and the league table, which show the best algorithms for each outcome, were analyzed. Seventeen studies, with a total of 285,213 patients with CVDs, were included in the network meta-analysis. The statistical evidence indicated that the DL algorithms performed well in the prediction of heart failure with AUC of 0.843 and CI [0.840–0.845], while in the ML algorithm, the gradient boosting machine (GBM) achieved an average accuracy of 91.10% in predicting heart failure. An artificial neural network (ANN) performed well in the prediction of diabetes with an OR and CI of 0.0905 [0.0489; 0.1673]. Support vector machine (SVM) performed better for the prediction of stroke with OR and CI of 25.0801 [11.4824; 54.7803]. 
Random forest (RF) results performed well in the prediction of hypertension with OR and CI of 10.8527 [4.7434; 24.8305]. The findings of this work suggest that the DL models can effectively advance the prediction of and knowledge about heart failure, but there is a lack of literature regarding DL methods in the field of CVDs. As a result, more DL models should be applied in this field. To confirm our findings, more meta-analysis (e.g., Bayesian network) and thorough research with a larger number of patients are encouraged.
|
71
|
Moradi S, Brandner C, Spielvogel C, Krajnc D, Hillmich S, Wille R, Drexler W, Papp L. Clinical data classification with noisy intermediate scale quantum computers. Sci Rep 2022; 12:1851. [PMID: 35115630 PMCID: PMC8814029 DOI: 10.1038/s41598-022-05971-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/27/2021] [Accepted: 01/21/2022] [Indexed: 11/09/2022]
Abstract
Quantum machine learning has experienced significant progress in both software and hardware development in the recent years and has emerged as an applicable area of near-term quantum computers. In this work, we investigate the feasibility of utilizing quantum machine learning (QML) on real clinical datasets. We propose two QML algorithms for data classification on IBM quantum hardware: a quantum distance classifier (qDS) and a simplified quantum-kernel support vector machine (sqKSVM). We utilize these different methods using the linear time quantum data encoding technique ([Formula: see text]) for embedding classical data into quantum states and estimating the inner product on the 15-qubit IBMQ Melbourne quantum computer. We match the predictive performance of our QML approaches with prior QML methods and with their classical counterpart algorithms for three open-access clinical datasets. Our results imply that the qDS in small sample and feature count datasets outperforms kernel-based methods. In contrast, quantum kernel approaches outperform qDS in high sample and feature count datasets. We demonstrate that the [Formula: see text] encoding increases predictive performance with up to + 2% area under the receiver operator characteristics curve across all quantum machine learning approaches, thus, making it ideal for machine learning tasks executed in Noisy Intermediate Scale Quantum computers.
Affiliation(s)
- S Moradi
- Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Währinger Gürtel 18-20, 1090, Vienna, Austria
| | - C Brandner
- Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Währinger Gürtel 18-20, 1090, Vienna, Austria
| | - C Spielvogel
- Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
| | - D Krajnc
- Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Währinger Gürtel 18-20, 1090, Vienna, Austria
| | - S Hillmich
- Institute for Integrated Circuits, Johannes Kepler University Linz, Linz, Austria
| | - R Wille
- Institute for Integrated Circuits, Johannes Kepler University Linz, Linz, Austria
- Software Competence Center Hagenberg GmbH, Hagenberg, Austria
| | - W Drexler
- Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Währinger Gürtel 18-20, 1090, Vienna, Austria
| | - L Papp
- Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Währinger Gürtel 18-20, 1090, Vienna, Austria.
| |
|
72
|
Cheng L, Qiu Y, Schmidt BJ, Wei GW. Review of applications and challenges of quantitative systems pharmacology modeling and machine learning for heart failure. J Pharmacokinet Pharmacodyn 2022; 49:39-50. [PMID: 34637069 PMCID: PMC8837528 DOI: 10.1007/s10928-021-09785-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 06/24/2021] [Accepted: 09/22/2021] [Indexed: 12/24/2022]
Abstract
Quantitative systems pharmacology (QSP) is an important approach in pharmaceutical research and development that facilitates in silico generation of quantitative mechanistic hypotheses and enables in silico trials. As demonstrated by applications from numerous industry groups and interest from regulatory authorities, QSP is becoming an increasingly critical component in clinical drug development. With rapidly evolving computational tools and methods, QSP modeling has achieved important progress in pharmaceutical research and development, including for heart failure (HF). However, various challenges exist in the QSP modeling and clinical characterization of HF. Machine/deep learning (ML/DL) methods have had success in a wide variety of fields and disciplines. They provide data-driven approaches in HF diagnosis and modeling, and offer a novel strategy to inform QSP model development and calibration. The combination of ML/DL and QSP modeling is becoming an emergent direction in the understanding of HF and the clinical development of new therapies. In this work, we review the current status and achievements in QSP and ML/DL for HF, and discuss remaining challenges and future perspectives in the field.
Affiliation(s)
- Limei Cheng
- Quantitative Systems Pharmacology and Physiologically Based Pharmacokinetics, Bristol Myers Squibb, Princeton, NJ, 08536, USA.
| | - Yuchi Qiu
- Department of Mathematics, Michigan State University, East Lansing, MI, 48824, USA
| | - Brian J Schmidt
- Quantitative Systems Pharmacology and Physiologically Based Pharmacokinetics, Bristol Myers Squibb, Princeton, NJ, 08536, USA
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, MI, 48824, USA
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI, 48824, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, 48824, USA
| |
|
73
|
An Optimized Machine Learning Model Accurately Predicts In-Hospital Outcomes at Admission to a Cardiac Unit. Diagnostics (Basel) 2022; 12:diagnostics12020241. [PMID: 35204333 PMCID: PMC8871182 DOI: 10.3390/diagnostics12020241] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/30/2021] [Revised: 01/14/2022] [Accepted: 01/14/2022] [Indexed: 11/21/2022]
Abstract
Risk stratification at the time of hospital admission is of paramount significance in triaging the patients and providing timely care. In the present study, we aim at predicting multiple clinical outcomes using the data recorded during admission to a cardiac care unit via an optimized machine learning method. This study involves a total of 11,498 patients admitted to a cardiac care unit over two years. Patient demographics, admission type (emergency or outpatient), patient history, lab tests, and comorbidities were used to predict various outcomes. We employed a fully connected neural network architecture and optimized the models for various subsets of input features. Using 10-fold cross-validation, our optimized machine learning model predicted mortality with a mean area under the receiver operating characteristic curve (AUC) of 0.967 (95% confidence interval (CI): 0.963–0.972), heart failure AUC of 0.838 (CI: 0.825–0.851), ST-segment elevation myocardial infarction AUC of 0.832 (CI: 0.821–0.842), pulmonary embolism AUC of 0.802 (CI: 0.764–0.84), and estimated the duration of stay (DOS) with a mean absolute error of 2.543 days (CI: 2.499–2.586) of data with a mean and median DOS of 6.35 and 5.0 days, respectively. Further, we objectively quantified the importance of each feature and its correlation with the clinical assessment of the corresponding outcome. The proposed method accurately predicts various cardiac outcomes and can be used as a clinical decision support system to provide timely care and optimize hospital resources.
|
74
|
Almazroi AA. Survival prediction among heart patients using machine learning techniques. Mathematical Biosciences and Engineering: MBE 2022; 19:134-145. [PMID: 34902984 DOI: 10.3934/mbe.2022007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Cardiovascular diseases are regarded as the most common reason for worldwide deaths. As per World Health Organization, nearly 17.9 million people die of heart-related diseases each year. The high shares of cardiovascular-related diseases in total worldwide deaths motivated researchers to focus on ways to reduce the numbers. In this regard, several works focused on the development of machine learning techniques/algorithms for early detection, diagnosis, and subsequent treatment of cardiovascular-related diseases. These works focused on a variety of issues such as finding important features to effectively predict the occurrence of heart-related diseases to calculate the survival probability. This research contributes to the body of literature by selecting a standard well defined, and well-curated dataset as well as a set of standard benchmark algorithms to independently verify their performance based on a set of different performance evaluation metrics. From our experimental evaluation, it was observed that decision tree is the best performing algorithm in comparison to logistic regression, support vector machines, and artificial neural networks. Decision trees achieved 14% better accuracy than the average performance of the remaining techniques. In contrast to other studies, this research observed that artificial neural networks are not as competitive as the decision tree or support vector machine.
Affiliation(s)
- Abdulwahab Ali Almazroi
- University of Jeddah, College of Computing and Information Technology at Khulais, Department of Information Technology, Jeddah, Saudi Arabia
| |
|
75
|
Hodgson S, Cheema S, Rani Z, Olaniyan D, O'Leary E, Price H, Dambha-Miller H. Population stratification in type 2 diabetes mellitus: A systematic review. Diabet Med 2022; 39:e14688. [PMID: 34519086 DOI: 10.1111/dme.14688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2021] [Accepted: 09/11/2021] [Indexed: 12/01/2022]
Abstract
AIMS There is increasing interest in using stratification in type 2 diabetes to target resources, individualise care and improve outcomes. We aim to systematically review and collate literature that has utilised population stratification methods in the study of adults with type 2 diabetes; and to describe and compare stratification methodologies, population characteristics, variables used to stratify and outcome variables. METHODS The MEDLINE, EMBASE, CINAHL and Cochrane databases were searched from inception to July 2020. Studies included adults with type 2 diabetes using population stratification methods. The review protocol was registered on PROSPERO (ID: CRD42020206604) and conducted in line with PRISMA guidance. Extracted data included study aims; study setting (primary or secondary care); population characteristics; stratification variables and outcomes; and methodological approach to stratification. RESULTS Across 348 included studies, there were a total of 10,776,009 participants with a mean age of 61.0 years (SD 5.94). 6.7% of studies used data-driven methods and the rest employed expert-driven approaches using pre-defined stratification criteria. The commonest variable used to stratify populations was HbA1c (n = 57, 16.4%); few studies stratified using clinically important non-traditional variables such as health behaviours and beliefs. CONCLUSIONS Most studies performing population stratification in type 2 diabetes used expert-driven approaches with the aim of predicting outcomes in glycaemic control, mortality and cardiovascular complications. We identified relatively few studies using data-driven approaches, which offer opportunities to generate hypotheses beyond current expert knowledge. We describe important research gaps including stratification with regard to disease remission.
Affiliation(s)
- Sam Hodgson
- NIHR Academic Clinical Fellow, Primary Care Research Centre, University of Southampton, Southampton, UK
| | | | - Zareena Rani
- Medical Student, University of Southampton, Southampton, UK
| | - Doyinsola Olaniyan
- General Medicine Department, Hinchingbrooke Hospital, North West Anglia NHS Trust, Huntingdon, UK
| | - Ellen O'Leary
- Medical Student, St. George's University of London, London, UK
| | - Hermione Price
- Honorary Senior Lecturer, Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, UK
| | - Hajira Dambha-Miller
- NIHR Clinical Lecturer, Primary Care Research Centre, University of Southampton, Southampton, UK
| |
|
76
|
Zhang F, Yang J, Liu L, Yu Y. Generalized linear–quadratic model with a change point due to a covariate threshold. J Stat Plan Inference 2022. [DOI: 10.1016/j.jspi.2021.05.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
77
|
Yang B, Zhu Y, Lu X, Shen C. A Novel Composite Indicator of Predicting Mortality Risk for Heart Failure Patients With Diabetes Admitted to Intensive Care Unit Based on Machine Learning. Front Endocrinol (Lausanne) 2022; 13:917838. [PMID: 35846312 PMCID: PMC9277005 DOI: 10.3389/fendo.2022.917838] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/11/2022] [Accepted: 05/11/2022] [Indexed: 12/22/2022]
Abstract
BACKGROUND Patients with heart failure (HF) with diabetes may face a poorer prognosis and higher mortality than patients with either disease alone, especially those in the intensive care unit. So far, there is no precise mortality risk prediction indicator for this kind of patient. METHOD Two high-quality critically ill databases, the Medical Information Mart for Intensive Care IV (MIMIC-IV) database and the Telehealth Intensive Care Unit (eICU) Collaborative Research Database (eICU-CRD), were used for study participants' screening as well as internal and external validation. Nine machine learning models were compared, and the best one was selected to define indicators associated with hospital mortality for patients with HF with diabetes. Existing attributes most related to hospital mortality were identified using a visualization method developed for machine learning, namely, the Shapley Additive Explanations (SHAP) method. A new composite indicator, ASL, was established using logistic regression for patients with HF with diabetes based on major existing indicators. Then, the new index was compared with existing indicators to confirm its discrimination ability and clinical value using the receiver operating characteristic (ROC) curve, decision curve, and calibration curve. RESULTS The random forest model performed best among the nine models, with an area under the ROC curve (AUC) of 0.92 after hyper-parameter optimization. By using this model, the top 20 attributes associated with hospital mortality in these patients were identified among all the attributes based on the SHAP method. Acute Physiology Score (APS) III, Sepsis-related Organ Failure Assessment (SOFA), and Max lactate were selected as major attributes related to mortality risk, and a new composite indicator, named ASL, was developed by combining these three indicators. 
Both in the initial and external cohort, the new indicator, ASL, had greater risk discrimination ability with AUC higher than 0.80 in both low- and high-risk groups compared with existing attributes. The decision curve and calibration curve indicated that this indicator also had a respectable clinical value compared with APS III and SOFA. In addition, this indicator had a good risk stratification ability when the patients were divided into three risk levels. CONCLUSION A new composite indicator for predicting mortality risk in patients with HF with diabetes admitted to intensive care unit was developed on the basis of attributes identified by the random forest model. Compared with existing attributes such as APS III and SOFA, the new indicator had better discrimination ability and clinical value, which had potential value in reducing the mortality risk of these patients.
Affiliation(s)
- Boshen Yang
- Department of Cardiology, Shanghai Jiao Tong University Affiliated Sixth People’s Hospital, Shanghai, China
| | - Yuankang Zhu
- Department of Gerontology, Xinhua Hospital affiliated to Shanghai Jiaotong University School of Medicine, Shanghai, China
| | - Xia Lu
- Department of Cardiology, Shanghai Jiao Tong University Affiliated Sixth People’s Hospital, Shanghai, China
- *Correspondence: Chengxing Shen; Xia Lu
| | - Chengxing Shen
- Department of Cardiology, Shanghai Jiao Tong University Affiliated Sixth People’s Hospital, Shanghai, China
- *Correspondence: Chengxing Shen; Xia Lu
| |
|
78
|
Synthetic sampling from small datasets: A modified mega-trend diffusion approach using k-nearest neighbors. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2021.107687] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
|
79
|
Score and Correlation Coefficient-Based Feature Selection for Predicting Heart Failure Diagnosis by Using Machine Learning Algorithms. Computational and Mathematical Methods in Medicine 2021; 2021:8500314. [PMID: 34966445 PMCID: PMC8712170 DOI: 10.1155/2021/8500314] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 09/19/2021] [Revised: 11/20/2021] [Accepted: 12/03/2021] [Indexed: 11/23/2022]
Abstract
Cardiovascular disease (CVD) is one of the most common causes of death, killing approximately 17 million people annually. The main reasons behind CVD are myocardial infarction and the failure of the heart to pump blood normally. Doctors could diagnose heart failure (HF) through electronic medical records on the basis of a patient's symptoms and clinical laboratory investigations. However, accurate diagnosis of HF requires medical resources and expert practitioners that are not always available, which makes diagnosis challenging. Therefore, predicting the patients' condition by using machine learning algorithms is a necessity to save time and effort. This paper proposed a machine-learning-based approach that distinguishes the most important correlated features amongst patients' electronic clinical records. The SelectKBest function was applied with the chi-squared statistical method to determine the most important features, and then a feature engineering method was applied to create new, strongly correlated features in order to train machine learning models and obtain promising results. Optimised hyperparameter classification algorithms SVM, KNN, Decision Tree, Random Forest, and Logistic Regression were used to train two different datasets. The first dataset, called Cleveland, consisted of 303 records. The second dataset, which was used for predicting HF, consisted of 299 records. Experimental results showed that the Random Forest algorithm achieved accuracy, precision, recall, and F1 scores of 95%, 97.62%, 95.35%, and 96.47%, respectively, during the test phase for the second dataset. The same algorithm achieved accuracy scores of 100% for the first dataset and 97.68% for the second dataset, while 100% precision, recall, and F1 scores were reached for both datasets.
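The chi-squared, top-k feature selection step named in the abstract above can be sketched in plain Python for the simplest case of binary features and a binary label. This is an illustration under stated assumptions, not the paper's pipeline: it uses the classical Pearson chi-squared statistic for a 2x2 contingency table, which is not the same computation as scikit-learn's `chi2` scorer, and the function names and toy feature names are hypothetical.

```python
from typing import Dict, List


def chi2_2x2(feature: List[int], label: List[int]) -> float:
    """Pearson chi-squared statistic for a binary feature vs. a binary label."""
    pairs = list(zip(feature, label))
    n = len(pairs)
    a = sum(1 for f, y in pairs if f == 1 and y == 1)  # feature on, label on
    b = sum(1 for f, y in pairs if f == 1 and y == 0)  # feature on, label off
    c = sum(1 for f, y in pairs if f == 0 and y == 1)  # feature off, label on
    d = sum(1 for f, y in pairs if f == 0 and y == 0)  # feature off, label off
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # a constant feature or label carries no information
    return n * (a * d - b * c) ** 2 / denominator


def select_k_best(features: Dict[str, List[int]], label: List[int],
                  k: int) -> List[str]:
    """Return the names of the k features with the highest chi-squared score."""
    scores = {name: chi2_2x2(column, label) for name, column in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data: 8 records, hypothetical binary features, binary outcome.
outcome = [1, 1, 1, 1, 0, 0, 0, 0]
toy_features = {
    "f_perfect": [1, 1, 1, 1, 0, 0, 0, 0],  # matches the outcome exactly
    "f_partial": [1, 1, 1, 0, 1, 0, 0, 0],  # agrees on 6 of 8 records
    "f_noise":   [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the outcome
}
print(select_k_best(toy_features, outcome, k=2))
# prints "['f_perfect', 'f_partial']"
```

With only a few hundred records, as in the two datasets described, scores computed on the full dataset before splitting can leak information into the test set, so selection is normally fitted on the training fold only.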
|
80
|
Jasinska-Piadlo A, Bond R, Biglarbeigi P, Brisk R, Campbell P, McEneaneny D. What can machines learn about heart failure? A systematic literature review. International Journal of Data Science and Analytics 2021. [DOI: 10.1007/s41060-021-00300-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/11/2022]
Abstract
This paper presents a systematic literature review with respect to application of data science and machine learning (ML) to heart failure (HF) datasets with the intention of generating both a synthesis of relevant findings and a critical evaluation of approaches, applicability and accuracy in order to inform future work within this field. This paper has a particular intention to consider ways in which the low uptake of ML techniques within clinical practice could be resolved. Literature searches were performed on Scopus (2014-2021), ProQuest and Ovid MEDLINE databases (2014-2021). Search terms included ‘heart failure’ or ‘cardiomyopathy’ and ‘machine learning’, ‘data analytics’, ‘data mining’ or ‘data science’. 81 out of 1688 articles were included in the review. The majority of studies were retrospective cohort studies. The median size of the patient cohort across all studies was 1944 (min 46, max 93260). The largest patient samples were used in readmission prediction models with the median sample size of 5676 (min. 380, max. 93260). Machine learning methods focused on common HF problems: detection of HF from available datasets, prediction of hospital readmission following index hospitalization, mortality prediction, classification and clustering of HF cohorts into subgroups with distinctive features and response to HF treatment. The most common ML methods used were logistic regression, decision trees, random forest and support vector machines. Information on validation of models was scarce. Based on the authors’ affiliations, there was a median 3:1 ratio between IT specialists and clinicians. Over half of studies were co-authored by a collaboration of medical and IT specialists. Approximately 25% of papers were authored solely by IT specialists who did not seek clinical input in data interpretation. 
The application of ML to datasets, in particular clustering methods, enabled the development of classification models assisting in testing the outcomes of patients with HF. There is, however, a tendency to over-claim the potential usefulness of ML models for clinical practice. The next body of work that is required for this research discipline is the design of randomised controlled trials (RCTs) with the use of ML in an intervention arm in order to prospectively validate these algorithms for real-world clinical utility.
|
81
|
|
82
|
Reddy KVV, Elamvazuthi I, Aziz AA, Paramasivam S, Chua HN, Pranavanand S. Prediction of Heart Disease Risk Using Machine Learning with Correlation-based Feature Selection and Optimization Techniques. 2021 7th International Conference on Signal Processing and Communication (ICSC) 2021. [DOI: 10.1109/icsc53193.2021.9673490] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 09/02/2023]
|
83
|
Alkhodari M, Jelinek HF, Karlas A, Soulaidopoulos S, Arsenos P, Doundoulakis I, Gatzoulis KA, Tsioufis K, Hadjileontiadis LJ, Khandoker AH. Deep Learning Predicts Heart Failure With Preserved, Mid-Range, and Reduced Left Ventricular Ejection Fraction From Patient Clinical Profiles. Front Cardiovasc Med 2021; 8:755968. [PMID: 34881307 PMCID: PMC8645593 DOI: 10.3389/fcvm.2021.755968] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/09/2021] [Accepted: 10/19/2021] [Indexed: 02/03/2023]
Abstract
Background: Left ventricular ejection fraction (LVEF) is the gold standard for evaluating heart failure (HF) in coronary artery disease (CAD) patients. It is an essential metric in categorizing HF patients as preserved (HFpEF), mid-range (HFmEF), and reduced (HFrEF) ejection fraction but differs, depending on whether the ASE/EACVI or ESC guidelines are used to classify HF. Objectives: We sought to investigate the effectiveness of using deep learning as an automated tool to predict LVEF from patient clinical profiles using regression and classification trained models. We further investigate the effect of utilizing other LVEF-based thresholds to examine the discrimination ability of deep learning between HF categories grouped with narrower ranges. Methods: Data from 303 CAD patients were obtained from American and Greek patient databases and categorized based on the American Society of Echocardiography and the European Association of Cardiovascular Imaging (ASE/EACVI) guidelines into HFpEF (EF > 55%), HFmEF (50% ≤ EF ≤ 55%), and HFrEF (EF < 50%). Clinical profiles included 13 demographical and clinical markers grouped as cardiovascular risk factors, medication, and history. The most significant and important markers were determined using linear regression fitting and Chi-squared test combined with a novel dimensionality reduction algorithm based on arc radial visualization (ArcViz). Two deep learning-based models were then developed and trained using convolutional neural networks (CNN) to estimate LVEF levels from the clinical information and for classification into one of three LVEF-based HF categories. Results: A total of seven clinical markers were found important for discriminating between the three HF categories. Using statistical analysis, diabetes, diuretics medication, and prior myocardial infarction were found statistically significant (p < 0.001). 
Furthermore, age, body mass index (BMI), anti-arrhythmics medication, and previous ventricular tachycardia were found important after projections on the ArcViz convex hull with an average nearest centroid (NC) accuracy of 94%. The regression model estimated LVEF levels successfully with an overall accuracy of 90%, average root mean square error (RMSE) of 4.13, and correlation coefficient of 0.85. A significant improvement was then obtained with the classification model, which predicted HF categories with an accuracy ≥93%, sensitivity ≥89%, 1-specificity <5%, and average area under the receiver operating characteristics curve (AUROC) of 0.98. Conclusions: Our study suggests the potential of implementing deep learning-based models clinically to ensure faster, yet accurate, automatic prediction of HF based on the ASE/EACVI LVEF guidelines with only clinical profiles and corresponding information as input to the models. Invasive, expensive, and time-consuming clinical testing could thus be avoided, enabling reduced stress in patients and simpler triage for further intervention.
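The ASE/EACVI thresholds quoted in the abstract above can be written as a small lookup. This is only an illustrative sketch: the function name `ase_eacvi_category` and the range check are assumptions made here, not part of the cited study, and the study's own CNN models are not reproduced.

```python
def ase_eacvi_category(lvef_percent: float) -> str:
    """Map an LVEF value (in percent) to an HF category.

    Thresholds are the ones quoted in the abstract:
    HFpEF: EF > 55%, HFmEF: 50% <= EF <= 55%, HFrEF: EF < 50%.
    The function name and input validation are hypothetical.
    """
    if not 0.0 <= lvef_percent <= 100.0:
        raise ValueError("LVEF must be a percentage between 0 and 100")
    if lvef_percent > 55.0:
        return "HFpEF"
    if lvef_percent >= 50.0:
        return "HFmEF"
    return "HFrEF"
```

A regression estimate of LVEF could be passed through such a function to obtain a category, which is one way to compare a regression model with a direct three-class classifier.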
Affiliation(s)
- Mohanad Alkhodari
  - Department of Biomedical Engineering, Healthcare Engineering Innovation Center (HEIC), Khalifa University, Abu Dhabi, United Arab Emirates
- Herbert F Jelinek
  - Department of Biomedical Engineering, Healthcare Engineering Innovation Center (HEIC), Khalifa University, Abu Dhabi, United Arab Emirates
  - Department of Biomedical Engineering, Biotechnology Center (BTC), Khalifa University, Abu Dhabi, United Arab Emirates
- Angelos Karlas
  - Chair of Biological Imaging, Center for Translational Cancer Research (TranslaTUM), Technical University of Munich, Munich, Germany
  - Institute of Biological and Medical Imaging, Helmholtz Zentrum München, Neuherberg, Germany
  - Department for Vascular and Endovascular Surgery, Rechts der Isar University Hospital, Technical University of Munich, Munich, Germany
  - DZHK (German Centre for Cardiovascular Research), Partner Site Munich Heart Alliance, Munich, Germany
- Stergios Soulaidopoulos
  - First Cardiology Department, School of Medicine, "Hippokration" General Hospital, National and Kapodistrian University of Athens, Athens, Greece
- Petros Arsenos
  - First Cardiology Department, School of Medicine, "Hippokration" General Hospital, National and Kapodistrian University of Athens, Athens, Greece
- Ioannis Doundoulakis
  - First Cardiology Department, School of Medicine, "Hippokration" General Hospital, National and Kapodistrian University of Athens, Athens, Greece
- Konstantinos A Gatzoulis
  - First Cardiology Department, School of Medicine, "Hippokration" General Hospital, National and Kapodistrian University of Athens, Athens, Greece
- Konstantinos Tsioufis
  - First Cardiology Department, School of Medicine, "Hippokration" General Hospital, National and Kapodistrian University of Athens, Athens, Greece
- Leontios J Hadjileontiadis
  - Department of Biomedical Engineering, Healthcare Engineering Innovation Center (HEIC), Khalifa University, Abu Dhabi, United Arab Emirates
  - Department of Electrical Engineering and Computer Science, Khalifa University, Abu Dhabi, United Arab Emirates
  - Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Ahsan H Khandoker
  - Department of Biomedical Engineering, Healthcare Engineering Innovation Center (HEIC), Khalifa University, Abu Dhabi, United Arab Emirates
|
84
|
van der Galiën OP, Hoekstra RC, Gürgöze MT, Manintveld OC, van den Bunt MR, Veenman CJ, Boersma E. Prediction of long-term hospitalisation and all-cause mortality in patients with chronic heart failure on Dutch claims data: a machine learning approach. BMC Med Inform Decis Mak 2021; 21:303. [PMID: 34724933 PMCID: PMC8561992 DOI: 10.1186/s12911-021-01657-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2021] [Accepted: 10/15/2021] [Indexed: 11/21/2022] Open
Abstract
BACKGROUND Accurately predicting which patients with chronic heart failure (CHF) are particularly vulnerable for adverse outcomes is of crucial importance to support clinical decision making. The goal of the current study was to examine the predictive value on long term heart failure (HF) hospitalisation and all-cause mortality in CHF patients, by exploring and exploiting machine learning (ML) and traditional statistical techniques on a Dutch health insurance claims database. METHODS Our study population consisted of 25,776 patients with a CHF diagnosis code between 2012 and 2014 and one year and three years follow-up HF hospitalisation (1446 and 3220 patients respectively) and all-cause mortality (2434 and 7882 patients respectively) were measured from 2015 to 2018. The area under the receiver operating characteristic (ROC) curve (AUC) was calculated after modelling the data using Logistic Regression, Random Forest, Elastic Net regression and Neural Networks. RESULTS AUC rates ranged from 0.710 to 0.732 for 1-year HF hospitalisation, 0.705-0.733 for 3-years HF hospitalisation, 0.765-0.787 for 1-year mortality and 0.764-0.791 for 3-years mortality. Elastic Net performed best for all endpoints. Differences between techniques were small and only statistically significant between Elastic Net and Logistic Regression compared with Random Forest for 3-years HF hospitalisation. CONCLUSION In this study based on a health insurance claims database we found clear predictive value for predicting long-term HF hospitalisation and mortality of CHF patients by using ML techniques compared to traditional statistics.
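The AUC values reported above have a simple rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes that quantity directly by pair counting. The function name `pairwise_auc` and the toy data in the usage note are assumptions for illustration and are unrelated to the claims database used in the study.

```python
def pairwise_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 outcomes; scores: predicted risks.
    Counts, over all (positive, negative) pairs, how often the positive
    case has the higher score; ties contribute 0.5.
    This O(n_pos * n_neg) form is meant for clarity, not for large data.
    """
    labels = list(labels)
    scores = list(scores)
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` ranks three of the four positive-negative pairs correctly and returns 0.75.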
Affiliation(s)
- Muhammed T Gürgöze
  - Department of Cardiology, Thorax Centre, Erasmus MC, University Medical Centre Rotterdam, Rotterdam, The Netherlands
- Olivier C Manintveld
  - Department of Cardiology, Thorax Centre, Erasmus MC, University Medical Centre Rotterdam, Rotterdam, The Netherlands
- Cor J Veenman
  - TNO, Leiden, The Netherlands
  - Leiden University, Leiden, The Netherlands
- Eric Boersma
  - Department of Cardiology, Thorax Centre, Erasmus MC, University Medical Centre Rotterdam, Rotterdam, The Netherlands
|
85
|
Maurya MR, Riyaz NUSS, Reddy MSB, Yalcin HC, Ouakad HM, Bahadur I, Al-Maadeed S, Sadasivuni KK. A review of smart sensors coupled with Internet of Things and Artificial Intelligence approach for heart failure monitoring. Med Biol Eng Comput 2021; 59:2185-2203. [PMID: 34611787 DOI: 10.1007/s11517-021-02447-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 01/21/2021] [Accepted: 09/01/2021] [Indexed: 02/07/2023]
Abstract
Over the last decade, there has been a huge demand for health care technologies such as sensors-based prediction using digital health. With the continuous rise in the human population, these technologies showed to be potentially effective solutions to life-threatening diseases such as heart failure (HF). Besides being a potential for early death, HF has a significantly reduced quality of life (QoL). Heart failure has no cure. However, treatment can help you live a longer and more active life with fewer symptoms. Thus, it is essential to develop technological aid solutions allowing early diagnosis and consequently, effective treatment with possibly delayed mortality. Commonly, forecasts of HF are based on the generation of vast volumes of data usually collected from an individual patient by different components of the family history, physical examination, basic laboratory results, and other medical records. Though, these data are not effectively useful for predicting this failure, nevertheless, with the aid of advanced medical technology such as interconnected multi-sensory-based devices, and based on several medical history characteristics, the broad data provided machine learning algorithms to predict risk factors for heart disease of an individual is beneficial. There will be many challenges for the next decade of advancements in HF care: exploiting an increasingly growing repertoire of interconnected internal and external sensors for the benefit of patients and processing large, multimodal datasets with new Artificial Intelligence (AI) software. Various methods for predicting heart failure and, primarily the significance of invasive and non-invasive sensors along with different strategies for machine learning to predict heart failure are presented and summarized in the present study.
Affiliation(s)
- Muni Raj Maurya
  - Center for Advanced Materials, Qatar University, P.O. Box 2713, Doha, Qatar
  - Department of Mechanical and Industrial Engineering, Qatar University, P.O. Box 2713, Doha, Qatar
- M Sai Bhargava Reddy
  - Center for Nanoscience and Technology, Institute of Science and Technology, Jawaharlal Nehru Technological University, Hyderabad, Telangana State, 500085, India
- Hassen M Ouakad
  - Mechanical and Industrial Engineering Department, College of Engineering, Sultan Qaboos University, Al-Khoudh, 123, PO-BOX 33, Muscat, Oman
- Issam Bahadur
  - Mechanical and Industrial Engineering Department, College of Engineering, Sultan Qaboos University, Al-Khoudh, 123, PO-BOX 33, Muscat, Oman
- Somaya Al-Maadeed
  - Department of Computer Engineering, Qatar University, P.O. Box 2713, Doha, Qatar
|
86
|
Kumar D, Verma C, Dahiya S, Singh PK, Raboaca MS, Illés Z, Bakariya B. Cardiac Diagnostic Feature and Demographic Identification (CDF-DI): An IoT Enabled Healthcare Framework Using Machine Learning. SENSORS 2021; 21:s21196584. [PMID: 34640904 PMCID: PMC8512891 DOI: 10.3390/s21196584] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/27/2021] [Revised: 09/25/2021] [Accepted: 09/28/2021] [Indexed: 12/24/2022]
Abstract
The incidence of cardiovascular diseases and cardiovascular burden (the number of deaths) are continuously rising worldwide. Heart disease leads to heart failure (HF) in affected patients. Therefore any additional aid to current medical support systems is crucial for the clinician to forecast the survival status for these patients. The collaborative use of machine learning and IoT devices has become very important in today's intelligent healthcare systems. This paper presents a Public Key Infrastructure (PKI) secured IoT enabled framework entitled Cardiac Diagnostic Feature and Demographic Identification (CDF-DI) systems with significant Models that recognize several Cardiac disease features related to HF. To achieve this goal, we used statistical and machine learning techniques to analyze the Cardiac secondary dataset. The Elevated Serum Creatinine (SC) levels and Serum Sodium (SS) could cause renal problems and are well established in HF patients. The Mann Whitney U test found that SC and SS levels affected the survival status of patients (p < 0.05). Anemia, diabetes, and BP features had no significant impact on the SS and SC level in the patient (p > 0.05). The Cox regression model also found a significant association of age group with the survival status using follow-up months. Furthermore, the present study also proposed important features of Cardiac disease that identified the patient's survival status, age group, and gender. The most prominent algorithm was the Random Forest (RF) suggesting five key features to determine the survival status of the patient with an accuracy of 96%: Follow-up months, SC, Ejection Fraction (EF), Creatinine Phosphokinase (CPK), and platelets. Additionally, the RF selected five prominent features (smoking habits, CPK, platelets, follow-up month, and SC) in recognition of gender with an accuracy of 94%. 
Moreover, the five vital features such as CPK, SC, follow-up month, platelets, and EF were found to be significant predictors for the patient's age group with an accuracy of 96%. The Kaplan Meier plot revealed that mortality was high in the extremely old age group (χ2 (1) = 8.565). The recommended features have possible effects on clinical practice and would be supportive aid to the existing medical support system to identify the possibility of the survival status of the heart patient. The doctor should primarily concentrate on the follow-up month, SC, EF, CPK, and platelet count for the patient's survival in the situation.
Affiliation(s)
- Deepak Kumar
  - Apex Institute of Technology, Chandigarh University, Mohali 140413, Punjab, India
- Chaman Verma
  - Department of Media and Educational Informatics, Faculty of Informatics, Eötvös Loránd University, 1053 Budapest, Hungary
  - Correspondence: (C.V.); (P.K.S.); (M.S.R.)
- Sanjay Dahiya
  - Department of Computer Science and Engineering, Ch. Devi Lal State Institute of Engineering & Technology, Sirsa 125077, Haryana, India
- Pradeep Kumar Singh
  - Department of Computer Science, KIET Group of Institutions, Ghaziabad 201206, Uttar Pradesh, India
  - Correspondence: (C.V.); (P.K.S.); (M.S.R.)
- Maria Simona Raboaca
  - ICSI Energy, National Research and Development Institute for Cryogenic and Isotopic Technologies, 240050 Ramnicu Valcea, Romania
  - Faculty of Electrical Engineering and Computer Science, “Stefan cel Mare” University of Suceava, 720229 Suceava, Romania
  - Technical University of Cluj-Napoca, 400114 Cluj-Napoca, Romania
  - Doctoral School Polytechnic University of Bucharest, 061071 Bucharest, Romania
  - Correspondence: (C.V.); (P.K.S.); (M.S.R.)
- Zoltán Illés
  - Department of Media and Educational Informatics, Faculty of Informatics, Eötvös Loránd University, 1053 Budapest, Hungary
- Brijesh Bakariya
  - Department of Computer Application, I.K. Gujral Punjab Technical University, Jalandhar 144603, Punjab, India
|
87
|
Improved Heart Disease Prediction Using Particle Swarm Optimization Based Stacked Sparse Autoencoder. ELECTRONICS 2021. [DOI: 10.3390/electronics10192347] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/14/2022]
Abstract
Heart disease is the leading cause of death globally. The most common type of heart disease is coronary heart disease, which occurs when there is a build-up of plaque inside the arteries that supply blood to the heart, making blood circulation difficult. The prediction of heart disease is a challenge in clinical machine learning. Early detection of people at risk of the disease is vital in preventing its progression. This paper proposes a deep learning approach to achieve improved prediction of heart disease. An enhanced stacked sparse autoencoder network (SSAE) is developed to achieve efficient feature learning. The network consists of multiple sparse autoencoders and a softmax classifier. Additionally, in deep learning models, the algorithm’s parameters need to be optimized appropriately to obtain efficient performance. Hence, we propose a particle swarm optimization (PSO) based technique to tune the parameters of the stacked sparse autoencoder. The optimization by the PSO improves the feature learning and classification performance of the SSAE. Meanwhile, the multilayer architecture of autoencoders usually leads to internal covariate shift, a problem that affects the generalization ability of the network; hence, batch normalization is introduced to prevent this problem. The experimental results show that the proposed method effectively predicts heart disease by obtaining a classification accuracy of 0.973 and 0.961 on the Framingham and Cleveland heart disease datasets, respectively, thereby outperforming other machine learning methods and similar studies.
|
88
|
van Egmond MB, Spini G, van der Galien O, IJpma A, Veugen T, Kraaij W, Sangers A, Rooijakkers T, Langenkamp P, Kamphorst B, van de L'Isle N, Kooij-Janic M. Privacy-preserving dataset combination and Lasso regression for healthcare predictions. BMC Med Inform Decis Mak 2021; 21:266. [PMID: 34530824 PMCID: PMC8445286 DOI: 10.1186/s12911-021-01582-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 03/05/2021] [Accepted: 06/29/2021] [Indexed: 11/12/2022] Open
Abstract
Background Recent developments in machine learning have shown its potential impact for clinical use such as risk prediction, prognosis, and treatment selection. However, relevant data are often scattered across different stakeholders and their use is regulated, e.g. by GDPR or HIPAA. As a concrete use-case, hospital Erasmus MC and health insurance company Achmea have data on individuals in the city of Rotterdam, which would in theory enable them to train a regression model in order to identify high-impact lifestyle factors for heart failure. However, privacy and confidentiality concerns make it unfeasible to exchange these data. Methods This article describes a solution where vertically-partitioned synthetic data of Achmea and of Erasmus MC are combined using Secure Multi-Party Computation. First, a secure inner join protocol takes place to securely determine the identifiers of the patients that are represented in both datasets. Then, a secure Lasso Regression model is trained on the securely combined data. The involved parties thus obtain the prediction model but no further information on the input data of the other parties. Results We implement our secure solution and describe its performance and scalability: we can train a prediction model on two datasets with 5000 records each and a total of 30 features in less than one hour, with a minimal difference from the results of standard (non-secure) methods. Conclusions This article shows that it is possible to combine datasets and train a Lasso regression model on this combination in a secure way. Such a solution thus further expands the potential of privacy-preserving data analysis in the medical domain.
Affiliation(s)
- Marie Beth van Egmond
  - Unit ICT, TNO (Dutch Organization for Applied Scientific Research), The Hague, The Netherlands
- Gabriele Spini
  - Unit ICT, TNO (Dutch Organization for Applied Scientific Research), The Hague, The Netherlands
- Thijs Veugen
  - Unit ICT, TNO (Dutch Organization for Applied Scientific Research), The Hague, The Netherlands
  - Cryptology Research Group, Centrum Wiskunde and Informatica (CWI), Amsterdam, The Netherlands
- Wessel Kraaij
  - Unit ICT, TNO (Dutch Organization for Applied Scientific Research), The Hague, The Netherlands
  - Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands
- Alex Sangers
  - Unit ICT, TNO (Dutch Organization for Applied Scientific Research), The Hague, The Netherlands
- Thomas Rooijakkers
  - Unit ICT, TNO (Dutch Organization for Applied Scientific Research), The Hague, The Netherlands
- Peter Langenkamp
  - Unit ICT, TNO (Dutch Organization for Applied Scientific Research), The Hague, The Netherlands
- Bart Kamphorst
  - Unit ICT, TNO (Dutch Organization for Applied Scientific Research), The Hague, The Netherlands
- Milena Kooij-Janic
  - Unit ICT, TNO (Dutch Organization for Applied Scientific Research), The Hague, The Netherlands
|
89
|
Kaliappan J, Srinivasan K, Mian Qaisar S, Sundararajan K, Chang CY, C S. Performance Evaluation of Regression Models for the Prediction of the COVID-19 Reproduction Rate. Front Public Health 2021; 9:729795. [PMID: 34595149 PMCID: PMC8476853 DOI: 10.3389/fpubh.2021.729795] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/23/2021] [Accepted: 08/16/2021] [Indexed: 01/28/2023] Open
Abstract
This paper aims to evaluate the performance of multiple non-linear regression techniques, such as support-vector regression (SVR), k-nearest neighbor (KNN), Random Forest Regressor, Gradient Boosting, and XGBOOST for COVID-19 reproduction rate prediction and to study the impact of feature selection algorithms and hyperparameter tuning on prediction. Sixteen features (for example, Total_cases_per_million and Total_deaths_per_million) related to significant factors, such as testing, death, positivity rate, active cases, stringency index, and population density are considered for the COVID-19 reproduction rate prediction. These 16 features are ranked using Random Forest, Gradient Boosting, and XGBOOST feature selection algorithms. Seven features are selected from the 16 features according to the ranks assigned by most of the above mentioned feature-selection algorithms. Predictions by historical statistical models are based solely on the predicted feature and the assumption that future instances resemble past occurrences. However, techniques, such as Random Forest, XGBOOST, Gradient Boosting, KNN, and SVR considered the influence of other significant features for predicting the result. The performance of reproduction rate prediction is measured by mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), R-Squared, relative absolute error (RAE), and root relative squared error (RRSE) metrics. The performances of algorithms with and without feature selection are similar, but a remarkable difference is seen with hyperparameter tuning. The results suggest that the reproduction rate is highly dependent on many features, and the prediction should not be based solely upon past values. In the case without hyperparameter tuning, the minimum value of RAE is 0.117315935 with feature selection and 0.0968989 without feature selection, respectively. 
The KNN attains a low MAE value of 0.0008 and performs well without feature selection and with hyperparameter tuning. The results show that predictions performed using all features and hyperparameter tuning is more accurate than predictions performed using selected features.
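The six error metrics named in the abstract above follow standard textbook definitions, sketched below in plain Python. The function name `regression_metrics` and the toy numbers in the usage note are assumptions for illustration; none of the figures reported in the abstract are reproduced here.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, R2, RAE and RRSE for paired observations.

    RAE and RRSE compare the model's errors with those of a baseline
    that always predicts the mean of y_true, so values below 1 indicate
    an improvement over that baseline.
    """
    y_true = list(y_true)
    y_pred = list(y_pred)
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_true = sum(y_true) / n
    abs_err = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    sq_err = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    abs_dev = sum(abs(t - mean_true) for t in y_true)
    sq_dev = sum((t - mean_true) ** 2 for t in y_true)
    if sq_dev == 0:
        raise ValueError("y_true is constant; relative metrics are undefined")
    mse = sq_err / n
    return {
        "MAE": abs_err / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "R2": 1.0 - sq_err / sq_dev,
        "RAE": abs_err / abs_dev,
        "RRSE": math.sqrt(sq_err / sq_dev),
    }
```

With `y_true = [1, 2, 3, 4]` and `y_pred = [1, 2, 3, 5]`, the sketch gives MAE 0.25, MSE 0.25, RMSE 0.5, R2 0.8, and RAE 0.25.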
Affiliation(s)
- Jayakumar Kaliappan
  - School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India
- Kathiravan Srinivasan
  - School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India
- Saeed Mian Qaisar
  - Electrical and Computer Engineering Department, Effat University, Jeddah, Saudi Arabia
- Karpagam Sundararajan
  - School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India
- Chuan-Yu Chang
  - Department of Computer Science and Information Engineering, National Yunlin University of Science and Technology, Douliu, Taiwan
- Suganthan C
  - School of Social Sciences and Languages, Vellore Institute of Technology, Vellore, India
|
90
|
Reddy KVV, Elamvazuthi I, Aziz AA, Paramasivam S, Chua HN, Pranavanand S. Heart Disease Risk Prediction Using Machine Learning Classifiers with Attribute Evaluators. APPLIED SCIENCES 2021; 11:8352. [DOI: 10.3390/app11188352] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 09/02/2023]
Abstract
Cardiovascular diseases (CVDs) kill about 20.5 million people every year. Early prediction can help people to change their lifestyles and to ensure proper medical treatment if necessary. In this research, ten machine learning (ML) classifiers from different categories, such as Bayes, functions, lazy, meta, rules, and trees, were trained for efficient heart disease risk prediction using the full set of attributes of the Cleveland heart dataset and the optimal attribute sets obtained from three attribute evaluators. The performance of the algorithms was appraised using a 10-fold cross-validation testing option. Finally, we performed tuning of the hyperparameter number of nearest neighbors, namely, ‘k’ in the instance-based (IBk) classifier. The sequential minimal optimization (SMO) achieved an accuracy of 85.148% using the full set of attributes and 86.468% was the highest accuracy value using the optimal attribute set obtained from the chi-squared attribute evaluator. Meanwhile, the meta classifier bagging with logistic regression (LR) provided the highest ROC area of 0.91 using both the full and optimal attribute sets obtained from the ReliefF attribute evaluator. Overall, the SMO classifier stood as the best prediction method compared to other techniques, and IBk achieved an 8.25% accuracy improvement by tuning the hyperparameter ‘k’ to 9 with the chi-squared attribute set.
|
91
|
Sadilek A, Liu L, Nguyen D, Kamruzzaman M, Serghiou S, Rader B, Ingerman A, Mellem S, Kairouz P, Nsoesie EO, MacFarlane J, Vullikanti A, Marathe M, Eastham P, Brownstein JS, Arcas BAY, Howell MD, Hernandez J. Privacy-first health research with federated learning. NPJ Digit Med 2021; 4:132. [PMID: 34493770 PMCID: PMC8423792 DOI: 10.1038/s41746-021-00489-2] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Received: 03/08/2021] [Accepted: 07/21/2021] [Indexed: 11/29/2022] Open
Abstract
Privacy protection is paramount in conducting health research. However, studies often rely on data stored in a centralized repository, where analysis is done with full access to the sensitive underlying content. Recent advances in federated learning enable building complex machine-learned models that are trained in a distributed fashion. These techniques facilitate the calculation of research study endpoints such that private data never leaves a given device or healthcare system. We show, on a diverse set of single and multi-site health studies, that federated models can achieve similar accuracy, precision, and generalizability, and lead to the same interpretation as standard centralized statistical models while achieving considerably stronger privacy protections and without significantly raising computational costs. This work is the first to apply modern and general federated learning methods that explicitly incorporate differential privacy to clinical and epidemiological research, across a spectrum of units of federation, model architectures, complexity of learning tasks and diseases. As a result, it enables health research participants to remain in control of their data and still contribute to advancing science, aspects that used to be at odds with each other.
Affiliation(s)
- Dung Nguyen
  - Biocomplexity Institute, University of Virginia, Charlottesville, VA, USA
  - Department of Computer Science, University of Virginia, Charlottesville, VA, USA
- Methun Kamruzzaman
  - Biocomplexity Institute, University of Virginia, Charlottesville, VA, USA
- Benjamin Rader
  - Computational Epidemiology Lab, Boston Children's Hospital, Boston, MA, USA
  - Department of Epidemiology, Boston University, Boston, MA, USA
- Anil Vullikanti
  - Biocomplexity Institute, University of Virginia, Charlottesville, VA, USA
  - Department of Computer Science, University of Virginia, Charlottesville, VA, USA
- Madhav Marathe
  - Biocomplexity Institute, University of Virginia, Charlottesville, VA, USA
  - Department of Computer Science, University of Virginia, Charlottesville, VA, USA
- John S Brownstein
  - Computational Epidemiology Lab, Boston Children's Hospital, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
|
92
|
Wang K, Tian J, Zheng C, Yang H, Ren J, Liu Y, Han Q, Zhang Y. Interpretable prediction of 3-year all-cause mortality in patients with heart failure caused by coronary heart disease based on machine learning and SHAP. Comput Biol Med 2021; 137:104813. [PMID: 34481185 DOI: 10.1016/j.compbiomed.2021.104813] [Citation(s) in RCA: 81] [Impact Index Per Article: 20.3] [Received: 06/17/2021] [Revised: 08/25/2021] [Accepted: 08/25/2021] [Indexed: 01/14/2023]
Abstract
BACKGROUND This study sought to evaluate the performance of machine learning (ML) models and establish an explainable ML model with good prediction of 3-year all-cause mortality in patients with heart failure (HF) caused by coronary heart disease (CHD). METHODS We established six ML models using follow-up data to predict 3-year all-cause mortality. Through comprehensive evaluation, the best performing model was used to predict and stratify patients. The log-rank test was used to assess the difference between Kaplan-Meier curves. The association between ML risk and 3-year all-cause mortality was also assessed using multivariable Cox regression. Finally, an explainable approach based on ML and the SHapley Additive exPlanations (SHAP) method was deployed to calculate 3-year all-cause mortality risk and to generate individual explanations of the model's decisions. RESULTS The best performing extreme gradient boosting (XGBoost) model was selected to predict and stratify patients. Subjects with a higher ML score had a high hazard of suffering events (hazard ratio [HR]: 10.351; P < 0.001), and this relationship persisted with a multivariable analysis (adjusted HR: 5.343; P < 0.001). Age, N-terminal pro-B-type natriuretic peptide, occupation, New York Heart Association classification, and nitrate drug use were important factors for both genders. CONCLUSIONS The ML-based risk stratification tool was able to accurately assess and stratify the risk of 3-year all-cause mortality in patients with HF caused by CHD. ML combined with SHAP could provide an explicit explanation of individualized risk prediction and give physicians an intuitive understanding of the influence of key features in the model.
Affiliation(s)
- Ke Wang
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, People's Republic of China
  - Department of Epidemiology and Biostatistics, Xuzhou Medical University, Xuzhou, People's Republic of China
  - Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Shanxi Medical University, Taiyuan, People's Republic of China
- Jing Tian
  - Department of Cardiology, The First Affiliated Hospital of Shanxi Medical University, Taiyuan, People's Republic of China
- Chu Zheng
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, People's Republic of China
  - Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Shanxi Medical University, Taiyuan, People's Republic of China
- Hong Yang
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, People's Republic of China
  - Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Shanxi Medical University, Taiyuan, People's Republic of China
- Jia Ren
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, People's Republic of China
- Yanling Liu
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, People's Republic of China
  - Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Shanxi Medical University, Taiyuan, People's Republic of China
- Qinghua Han
  - Department of Cardiology, The First Affiliated Hospital of Shanxi Medical University, Taiyuan, People's Republic of China
- Yanbo Zhang
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, People's Republic of China
  - Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Shanxi Medical University, Taiyuan, People's Republic of China
|
93
|
Erdaw Y, Tachbele E. Machine Learning Model Applied on Chest X-Ray Images Enables Automatic Detection of COVID-19 Cases with High Accuracy. Int J Gen Med 2021; 14:4923-4931. [PMID: 34483682 PMCID: PMC8409602 DOI: 10.2147/ijgm.s325609] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/19/2021] [Accepted: 08/05/2021] [Indexed: 12/21/2022] [Open Access]
Abstract
PURPOSE This research was designed to investigate the application of artificial intelligence (AI) in the rapid and accurate diagnosis of coronavirus disease 2019 (COVID-19) using digital chest X-ray images, and to develop a robust computer-aided application for the automatic classification of COVID-19 pneumonia from other pneumonia and normal images. MATERIALS AND METHODS A total of 1100 chest X-ray images were randomly selected from three different open sources, containing 300 X-ray images of confirmed COVID-19 patients, 400 images of other pneumonia patients, and 400 normal X-ray images. In this study, a classical machine learning approach was employed. The model was built using the support vector machine (SVM) classifier algorithm. The SVM was trained on 630 features obtained from the HOG descriptor, which was quantized into 30 orientation bins in the range between 0 and 360. The model was validated using a 10-fold cross-validation method. The performance of the model was evaluated using appropriate classification metrics, including sensitivity, specificity, area under the curve, positive predictive value, negative predictive value, kappa, and accuracy. RESULTS The multi-level classification model was able to distinguish COVID-19 patients with a sensitivity of 97.92% and specificity of 98.91%, for the internal testing or cross-validation. For the independent external testing, the model showed sensitivity of 95% and specificity of 98.13%, for distinguishing COVID-19 from other pneumonia and no-findings. The binary classification model was able to distinguish COVID-19 patients with a sensitivity of 99.58% and specificity of 99.69%, for the internal testing. For the independent external testing, the model showed a sensitivity of 98.33% and specificity of 100%, for distinguishing COVID-19 from normal images. CONCLUSION The model can achieve the rapid and accurate identification of COVID-19 patients from chest X-rays with more than 97% accuracy. This high accuracy and very rapid computer-aided diagnostic approach would be very helpful to control the pandemic.
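The classification metrics listed in this abstract (sensitivity, specificity, positive and negative predictive value, kappa, accuracy) all follow from the four cells of a binary confusion matrix. A minimal sketch is given below; the function name `binary_metrics` and the counts passed to it are hypothetical and are not the study's confusion matrix.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Agreement expected by chance, used by Cohen's kappa
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((tn + fn) / total) * ((tn + fp) / total)
    p_expected = p_yes + p_no
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": accuracy,
        "kappa": (accuracy - p_expected) / (1 - p_expected),
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow
metrics = binary_metrics(tp=90, fp=20, tn=80, fn=10)
```

With these made-up counts, sensitivity is 90/100 and specificity is 80/100; kappa is lower than accuracy because it discounts the agreement expected by chance.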
Affiliation(s)
- Yabsera Erdaw
  - Electrical and Mechanical Engineering, Addis Ababa Science & Technology University, Addis Ababa, Ethiopia
- Erdaw Tachbele
  - Nursing & Midwifery, College of Health Sciences, Addis Ababa University, Addis Ababa, Ethiopia
|
94
|
Taheri Soodejani M, Tabatabaei SM, Mahmoudimanesh M. Bayesian statistics versus classical statistics in survival analysis: an applicable example. American Journal of Cardiovascular Disease 2021; 11:484-488. [PMID: 34548947 PMCID: PMC8449193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2021] [Accepted: 06/29/2021] [Indexed: 06/13/2023]
Abstract
BACKGROUND Heart disease is the leading cause of death in the world, and 17 million people die from cardiovascular diseases each year, so finding factors that affect the survival of these patients is of particular importance. Finding the best model to analyze patient survival can therefore help to obtain more accurate results. METHODS Several approaches to survival analysis assess one or more risk factors: the classic Kaplan-Meier method, Cox regression, parametric survival models, and newer models such as Bayesian survival models. Cox regression is the most common and is generally used for time-dependent data. The main difference between Cox regression and Bayesian models is that the prior distribution in Bayesian models can affect the values of the parameters. Some survival analysis models have certain conditions that need to be considered before analyzing the data. In this paper, we use a dataset from Kaggle and discuss these conditions. This dataset contains medical records of 299 patients with heart failure collected at the Faisalabad Institute of Cardiology and the Allied Hospital in Faisalabad (Punjab, Pakistan) from April to December 2015. RESULTS This paper argues that if the effective sample size is not sufficient, Bayesian survival models can be used to achieve more accurate results because this model is not affected by the sample size. The results of both methods are shown on a sample of cardiac data. Based on the results of the Bayesian Cox regression model, age, anemia, ejection fraction, high blood pressure, and serum creatinine affected patient survival. CONCLUSION Bayesian models are much more accurate for estimating survival and identifying risk factors when dealing with data on rare diseases or diseases with low mortality, including heart patients, whose survival probability is higher than that of cancer patients.
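The classic Kaplan-Meier method named in this abstract estimates the survival function as a running product over the observed event times. The sketch below is a plain-Python illustration under stated assumptions: the function name `kaplan_meier` and the follow-up times and event flags are hypothetical and are not drawn from the Faisalabad dataset.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival) at each event time.

    times  : follow-up duration for each patient
    events : 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set
        at_risk -= removed
    return curve


# Hypothetical follow-up data (days) for six patients
times = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
```

Censored patients never lower the curve directly; they only shrink the risk set, which makes each later death count for more.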
Affiliation(s)
- Moslem Taheri Soodejani
  - Center for Healthcare Data Modeling, Department of Biostatistics and Epidemiology, School of Public Health, Shahid Sadoughi University of Medical Sciences, Yazd, Iran
- Seyyed Mohammad Tabatabaei
  - Medical Informatics Department, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
  - Clinical Research Unit, Imam Reza Hospital, Mashhad University of Medical Sciences, Mashhad, Iran
- Marzieh Mahmoudimanesh
  - PhD Student in Biostatistics, Department of Biostatistics and Epidemiology, School of Health, Kerman University of Medical Sciences, Kerman, Iran
|
95
|
Zheng X, Fang F, Nong W, Feng D, Yang Y. Development and validation of a model to estimate the risk of acute ischemic stroke in geriatric patients with primary hypertension. BMC Geriatr 2021; 21:458. [PMID: 34372766 PMCID: PMC8353783 DOI: 10.1186/s12877-021-02392-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/27/2020] [Accepted: 07/18/2021] [Indexed: 12/29/2022] [Open Access]
Abstract
OBJECTIVES This study aimed to construct and validate a prediction model of acute ischemic stroke in geriatric patients with primary hypertension. METHODS This retrospective file review collected information on 1367 geriatric patients diagnosed with primary hypertension, with or without acute ischemic stroke, between October 2018 and May 2020. The study cohort was randomly divided into a training set and a testing set at a ratio of 70% to 30%. A total of 15 clinical indicators were assessed using the chi-square test and then multivariable logistic regression analysis to develop the prediction model. We employed the area under the curve (AUC) and calibration curves to assess the performance of the model and a nomogram for visualization. Internal verification by bootstrap resampling (1000 times) and external verification with the independent testing set determined the accuracy of the model. Finally, this model was compared with four machine learning algorithms to identify the most effective method for predicting the risk of stroke. RESULTS The prediction model identified six variables (smoking, alcohol abuse, blood pressure management, stroke history, diabetes, and carotid artery stenosis). The AUC was 0.736 in the training set, 0.730 after resampling, and 0.725 in the external verification. The calibration curve illustrated a close overlap between the predicted and actual diagnosis of stroke in both the training and testing sets. The multivariable logistic regression analysis and the support vector machine with radial basis function kernel were the best models with an AUC of 0.710. CONCLUSION The prediction model using multiple logistic regression analysis has considerable accuracy and can be visualized in a nomogram, which is convenient for its clinical application.
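The internal verification described in this abstract, bootstrap resampling of the AUC, can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' procedure: the helper names `auc` and `bootstrap_auc`, the percentile interval, and the eight labelled scores are all hypothetical.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Ties between a positive and a negative count as half a win
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc(labels, scores, n_boot=1000, seed=0):
    """Percentile 95% interval for the AUC from resampling with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample needs both classes to define an AUC
            estimates.append(auc(ys, [scores[i] for i in idx]))
    estimates.sort()
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]


# Hypothetical predicted risks and observed outcomes
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9, 0.2, 0.6]
point_estimate = auc(labels, scores)
lower, upper = bootstrap_auc(labels, scores, n_boot=1000)
```

The pairwise definition of the AUC is quadratic in the sample size, which is acceptable for a sketch; a rank-based formula would be the usual choice for a cohort of this study's size.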
Affiliation(s)
- Xifeng Zheng
  - Department of Geriatrics, Affiliated Hospital of Guangdong Medical University, No.57, South of Renming Road, Zhanjiang, Guangdong, 524001, People's Republic of China
- Fang Fang
  - Department of General Medicine, Huazhong University of Science and Technology Union Shenzhen Hospital, Shenzhen, Guangdong, China
- Weidong Nong
  - Department of Neurology, Affiliated Minzu Hospital of Guangxi Medical University, Nanning, Guangxi, China
- Dehui Feng
  - Department of Geriatrics, Affiliated Hospital of Guangdong Medical University, No.57, South of Renming Road, Zhanjiang, Guangdong, 524001, People's Republic of China
- Yu Yang
  - Department of Geriatrics, Affiliated Hospital of Guangdong Medical University, No.57, South of Renming Road, Zhanjiang, Guangdong, 524001, People's Republic of China
|
96
|
PHOTONAI-A Python API for rapid machine learning model development. PLoS One 2021; 16:e0254062. [PMID: 34288935 PMCID: PMC8294542 DOI: 10.1371/journal.pone.0254062] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 03/30/2021] [Accepted: 06/20/2021] [Indexed: 12/03/2022] [Open Access]
Abstract
PHOTONAI is a high-level Python API designed to simplify and accelerate machine learning model development. It functions as a unifying framework allowing the user to easily access and combine algorithms from different toolboxes into custom algorithm sequences. It is especially designed to support the iterative model development process and automates the repetitive training, hyperparameter optimization and evaluation tasks. Importantly, the workflow ensures unbiased performance estimates while still allowing the user to fully customize the machine learning analysis. PHOTONAI extends existing solutions with a novel pipeline implementation supporting more complex data streams, feature combinations, and algorithm selection. Metrics and results can be conveniently visualized using the PHOTONAI Explorer, and predictive models are shareable in a standardized format for further external validation or application. A growing add-on ecosystem allows researchers to offer data-modality-specific algorithms to the community and enhance machine learning in the life sciences. Its practical utility is demonstrated on an exemplary medical machine learning problem, achieving a state-of-the-art solution in a few lines of code. Source code is publicly available on GitHub, while examples and documentation can be found at www.photon-ai.com.
|
97
|
Comoretto RI, Azzolina D, Amigoni A, Stoppa G, Todino F, Wolfler A, Gregori D. Predicting Hemodynamic Failure Development in PICU Using Machine Learning Techniques. Diagnostics (Basel) 2021; 11:diagnostics11071299. [PMID: 34359385 PMCID: PMC8303657 DOI: 10.3390/diagnostics11071299] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/31/2021] [Revised: 07/12/2021] [Accepted: 07/16/2021] [Indexed: 11/16/2022] [Open Access]
Abstract
The present work aims to identify the predictors of hemodynamic failure (HF) developing during a pediatric intensive care unit (PICU) stay by testing a set of machine learning techniques (MLTs) and comparing their ability to predict the outcome of interest. The study involved patients admitted to PICUs between 2010 and 2020. Data were extracted from the Italian Network of Pediatric Intensive Care Units (TIPNet) registry. The algorithms considered were generalized linear model (GLM), recursive partition tree (RPART), random forest (RF), neural network models, and extreme gradient boosting (XGB). Since the outcome is rare, upsampling and downsampling algorithms were applied for imbalance control. For each approach, the main performance measures were reported. Among an overall sample of 29,494 subjects, only 399 developed HF during the PICU stay. The median age was about two years, and male patients were the most prevalent. The XGB algorithm outperformed other MLTs in predicting HF development, with a median ROC measure of 0.780 (IQR 0.770-0.793). PIM 3, age, and base excess were found to be the strongest predictors of outcome. The present work provides insights for the prediction of HF development during PICU stay using machine-learning algorithms.
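With 399 events among 29,494 subjects (about 1.4%), the outcome in this study is heavily imbalanced, which is why upsampling and downsampling were applied. The sketch below shows one simple form of random upsampling; it is an assumption-laden illustration, not the authors' algorithm: `upsample_minority` is a hypothetical helper, label 1 is assumed to mark the rare class, and the toy data are invented.

```python
import random


def upsample_minority(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until both classes are equal in size.

    Assumes label 1 marks the rare outcome and label 0 the common one.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    # Draw extra minority indices with replacement to close the gap
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = majority + minority + extra
    rng.shuffle(idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]


# Hypothetical toy data: 95 patients without the outcome, 5 with it
rows = [[i] for i in range(100)]
labels = [0] * 95 + [1] * 5
rows_up, labels_up = upsample_minority(rows, labels)
```

Resampling of this kind is normally applied to the training portion only, so that performance is still measured on data with the original event rate.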
Affiliation(s)
- Rosanna I. Comoretto
  - Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, 35131 Padova, Italy
- Danila Azzolina
  - Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, 35131 Padova, Italy
  - Department of Medical Sciences, University of Ferrara, 44100 Ferrara, Italy
- Angela Amigoni
  - Pediatric Intensive Care Unit, Department of Women’s and Children’s Health, University Hospital of Padua, Via Giustiniani 2, 35128 Padova, Italy
- Giorgia Stoppa
  - Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, 35131 Padova, Italy
- Federica Todino
  - Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, 35131 Padova, Italy
- Andrea Wolfler
  - Department of Anaesthesia, Gaslini Hospital, 16147 Genova, Italy
- Dario Gregori
  - Unit of Biostatistics, Epidemiology and Public Health, Department of Cardiac, Thoracic, Vascular Sciences and Public Health, University of Padova, 35131 Padova, Italy
  - Correspondence: Tel.: +39-049-8275-384; Fax: +39-02-700-445-089
|
98
|
Reddy KVV, Elamvazuthi I, Aziz AA, Paramasivam S, Chua HN. Heart Disease Risk Prediction using Machine Learning with Principal Component Analysis. 2020 8th International Conference on Intelligent and Advanced Systems (ICIAS) 2021. [DOI: 10.1109/icias49414.2021.9642676] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 09/02/2023]
|
99
|
A CNN-based novel solution for determining the survival status of heart failure patients with clinical record data: numeric to image. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2021.102716] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/18/2022]
|
100
|
Onder A, Incebay O, Sen MA, Yapici R, Kalyoncu M. Heuristic optimization of impeller sidewall gaps-based on the bees algorithm for a centrifugal blood pump by CFD. Int J Artif Organs 2021; 44:765-772. [PMID: 34128420 DOI: 10.1177/03913988211023773] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/15/2022]
Abstract
Optimization studies on blood pumps that require complex designs are gradually increasing in number. The essential design criteria of a centrifugal blood pump are minimum shear stress and maximal efficiency. The geometry of the impeller sidewall gaps (blade tip clearance, axial gap, radial gap) strongly affects both criteria. Therefore, rather than relying on trial and error, the optimal dimensions of these gaps should be determined with a heuristic method, which gives more effective results. In this study, the Bees Algorithm (BA), a population-based heuristic method, is used to find the gaps that best satisfy these two design criteria. First, Computational Fluid Dynamics (CFD) analyses are performed on sample pump models that are selected according to an orthogonal array and pre-designed with different gaps. The dimensions of the gaps are optimized through this mathematical model. The simulation results for the improved pump model are nearly identical to those predicted by the BA. The improved pump model, designed with the optimal gap dimensions so obtained, meets the design criteria better than all existing sample pumps. Compared with average values, the optimal gap dimensions provided a 42% reduction in aWSS and a 20% increase in efficiency. Moreover, an original approach to the design of impeller sidewall gaps was developed. The results show that computational costs are significantly reduced by using the BA in blood pump geometry design.
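The Bees Algorithm used in this study combines local search around the best sites found so far with random global scouting. The sketch below is a basic textbook-style variant under stated assumptions, not the authors' implementation: all parameter names and defaults are hypothetical, and `toy_objective` is an invented two-variable quadratic standing in for the CFD-derived model of shear stress and efficiency.

```python
import random


def bees_algorithm(objective, bounds, n_scouts=20, n_best=5, n_elite=2,
                   recruits_elite=8, recruits_best=4, patch=0.1,
                   iterations=100, seed=0):
    """Minimise `objective` over box `bounds` with a basic Bees Algorithm."""
    rng = random.Random(seed)

    def random_point():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def neighbour(site):
        # Sample inside a patch around the site, clipped to the bounds
        return [min(hi, max(lo, x + rng.uniform(-patch, patch) * (hi - lo)))
                for x, (lo, hi) in zip(site, bounds)]

    scouts = sorted((random_point() for _ in range(n_scouts)), key=objective)
    for _ in range(iterations):
        next_gen = []
        for rank, site in enumerate(scouts[:n_best]):
            # Elite sites receive more recruited bees than the other best sites
            recruits = recruits_elite if rank < n_elite else recruits_best
            candidates = [site] + [neighbour(site) for _ in range(recruits)]
            next_gen.append(min(candidates, key=objective))
        # The remaining bees scout at random (global search)
        next_gen += [random_point() for _ in range(n_scouts - n_best)]
        scouts = sorted(next_gen, key=objective)
    return scouts[0], objective(scouts[0])


def toy_objective(gaps):
    """Hypothetical stand-in for a CFD-based cost; minimum at (0.3, 0.7)."""
    a, b = gaps
    return (a - 0.3) ** 2 + (b - 0.7) ** 2


best, value = bees_algorithm(toy_objective, [(0.0, 1.0), (0.0, 1.0)])
```

Because each selected site is kept among its own candidates, the best value found never gets worse from one iteration to the next. In a real pump study each objective evaluation would be far more expensive than this toy function, which is why the evaluation budget matters.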
Affiliation(s)
- Ahmet Onder
  - Technical Sciences Vocational School, Mechanical and Metal Technologies Department, Konya Technical University, Konya, Turkey
- Omer Incebay
  - Faculty of Engineering and Natural Science, Mechanical Engineering Department, Konya Technical University, Konya, Turkey
- Muhammed Arif Sen
  - Faculty of Engineering and Natural Science, Mechanical Engineering Department, Konya Technical University, Konya, Turkey
- Rafet Yapici
  - Faculty of Engineering and Natural Science, Mechanical Engineering Department, Konya Technical University, Konya, Turkey
- Mete Kalyoncu
  - Faculty of Engineering and Natural Science, Mechanical Engineering Department, Konya Technical University, Konya, Turkey
|