1. Ameksa M, Elamrani Abou Elassad Z, Lamjadli S, Mousannif H. Predicting stroke events with a proactive fusion system: a comprehensive study on imbalance class handling in computational biomechanics. Comput Methods Biomech Biomed Engin 2024:1-18. [PMID: 38902976 DOI: 10.1080/10255842.2024.2363946] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Accepted: 05/28/2024] [Indexed: 06/22/2024]
Abstract
Stroke, as a critical global health concern and the second leading cause of death, occurs when blood flow to the brain is interrupted. Although machine learning has advanced in medical safety, there is limited research on stroke prediction using information fusion systems. This study presents a fusion framework that combines multiple base classifiers and a Meta classifier to improve stroke prediction performance. The research utilizes Grid Search optimized models, such as Random Forest, Support Vector Machine, K Nearest Neighbors, AdaBoost, Gradient Boosting, Light Gradient Boosting, Categorical Boosting, and eXtreme Gradient Boosting for in-depth stroke analysis. Since stroke events are rare and unpredictable, classification outcomes can be deceptive due to imbalanced data. This article examines stroke probability by comparing three data balancing methods: over-sampling, under-sampling, and tomek-link synthetic minority over-sampling (SMOTE-TL) to enhance prediction accuracy. The findings reveal that AdaBoost as a meta-classifier attains the highest performance in the fusion framework, with a peak of 88.09% Recall and 83.66% F1 score. This innovative approach provides crucial insights into stroke prediction and can be a valuable resource for strengthening intervention efforts in advanced healthcare systems.
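As a rough illustration of two of the balancing methods named in the abstract above, the sketch below shows plain random over-sampling and random under-sampling. The abstract does not say which variants the authors used, so the random variants, the function names, and the toy data are all assumptions made for illustration only.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes
    match the size of the largest class (assumed variant, for illustration)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out

def random_undersample(X, y, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        rows = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(rows, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out

# Hypothetical toy data: 2 positive (rare) cases and 8 negative cases.
X = [[i] for i in range(10)]
y = [1, 1] + [0] * 8

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print(Counter(y_over), Counter(y_under))
```

Resampling of this kind is normally applied to the training split only, so that evaluation data keeps its original class distribution.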
Affiliation(s)
- Mohammed Ameksa
- LISI Laboratory, Computer Science Department, FSSM, Cadi Ayyad University, Marrakesh, Morocco
- Saad Lamjadli
- Immunology Laboratory, Arrazi Hospital, CHU Mohamed VI, Marrakech, Morocco
- Hajar Mousannif
- LISI Laboratory, Computer Science Department, FSSM, Cadi Ayyad University, Marrakesh, Morocco
2. Bhardwaj P, Tyagi A, Tyagi S, Antão J, Deng Q. Machine learning model for classification of predominantly allergic and non-allergic asthma among preschool children with asthma hospitalization. J Asthma 2023; 60:487-495. [PMID: 35344453 DOI: 10.1080/02770903.2022.2059763] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 10/18/2022]
Abstract
OBJECTIVE Asthma is the most frequent chronic airway illness in preschool children and is difficult to diagnose due to the disease's heterogeneity. This study aimed to investigate different machine learning models and suggest the most effective one to classify two forms of asthma in preschool children (predominantly allergic asthma and non-allergic asthma) using a minimum number of features. METHODS After pre-processing, 127 patients (70 with non-allergic asthma and 57 with predominantly allergic asthma) were chosen for final analysis from the Frankfurt dataset, which had asthma-related information on 205 patients. The Random Forest algorithm and Chi-square were used to select the key features from a total of 63 features. Six machine learning models (random forest, extreme gradient boosting, support vector machines, adaptive boosting, extra tree classifier, and logistic regression) were then trained and tested using 10-fold stratified cross-validation. RESULTS Among all features, age, weight, C-reactive protein, eosinophilic granulocytes, oxygen saturation, pre-medication inhaled corticosteroid + long-acting beta2-agonist (PM-ICS + LABA), PM-other (other pre-medication), H-Pulmicort/celestamine (Pulmicort/celestamine during hospitalization), and H-azithromycin (azithromycin during hospitalization) were found to be highly important. The support vector machine approach with a linear kernel was able to differentiate between predominantly allergic asthma and non-allergic asthma with higher accuracy (77.8%), a precision of 0.81, a true positive rate of 0.73, a true negative rate of 0.81, an F1 score of 0.81, and a ROC-AUC score of 0.79. Logistic regression was found to be the second-best classifier with an overall accuracy of 76.2%. CONCLUSION Predominantly allergic and non-allergic asthma can be classified using machine learning approaches based on nine features. Supplemental data for this article is available online at www.tandfonline.com/ijas .
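For readers less familiar with the metrics quoted in the abstract above (accuracy, precision, true positive rate, true negative rate, F1), the following sketch computes them from the four cells of a binary confusion matrix using the standard definitions. The function name and the counts are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    tpr = tp / (tp + fn)          # true positive rate (recall, sensitivity)
    tnr = tn / (tn + fp)          # true negative rate (specificity)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "tpr": tpr,
        "tnr": tnr,
        "f1": 2 * precision * tpr / (precision + tpr),
    }

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = binary_metrics(tp=8, fp=2, tn=6, fn=4)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With stratified cross-validation, such counts would typically be accumulated or averaged over the folds before being reported.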
Affiliation(s)
- Piyush Bhardwaj
- Centre for Advanced Computational Solutions (C-fACS), Department of Molecular Biosciences, Lincoln University, Lincoln, Christchurch, New Zealand
- Ashish Tyagi
- Department of Forensic Medicine & Toxicology, SHKM Govt. Medical College, Nuh, Haryana, India
- Shashank Tyagi
- Department of Forensic Medicine & Toxicology, Lady Hardinge Medical College & Associated Hospitals, New Delhi, India
- Joana Antão
- Lab3R-Respiratory Research and Rehabilitation Laboratory, School of Health Sciences (ESSUA), Department of Medical Sciences, Institute of Biomedicine (iBiMED), University of Aveiro, Aveiro, Portugal; Department of Research and Education, CIRO, Horn, The Netherlands
- Qichen Deng
- Department of Research and Education, CIRO, Horn, The Netherlands; Department of Respiratory Medicine, NUTRIM School of Nutrition and Translational Research in Metabolism, Maastricht University Medical Centre, Maastricht, The Netherlands; Faculty of Health, Medicine and Life Sciences, Maastricht University Medical Centre, Limburg, The Netherlands
3. Han M, Guo H, Li J, Wang W. Global-local information based oversampling for multi-class imbalanced data. INT J MACH LEARN CYB 2022. [DOI: 10.1007/s13042-022-01746-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
4. Davazdahemami B, Zolbanin HM, Delen D. An explanatory machine learning framework for studying pandemics: The case of COVID-19 emergency department readmissions. DECISION SUPPORT SYSTEMS 2022; 161:113730. [PMID: 35068629 PMCID: PMC8763415 DOI: 10.1016/j.dss.2022.113730] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/27/2020] [Revised: 08/21/2021] [Accepted: 01/10/2022] [Indexed: 05/10/2023]
Abstract
One of the major challenges that confront medical experts during a pandemic is the time required to identify and validate the risk factors of the novel disease and to develop an effective treatment protocol. Traditionally, this process involves numerous clinical trials that may take up to several years, during which strict preventive measures must be in place to control the outbreak and reduce deaths. Advanced data analytics techniques, however, can be leveraged to guide and speed up this process. In this study, we combine evolutionary search algorithms, deep learning, and advanced model interpretation methods to develop a holistic exploratory-predictive-explanatory machine learning framework that can assist clinical decision-makers in reacting to the challenges of a pandemic in a timely manner. The proposed framework is showcased in studying emergency department (ED) readmissions of COVID-19 patients using ED visits from a real-world electronic health records database. After an exploratory feature selection phase using a genetic algorithm, we develop and train a deep artificial neural network to predict early (i.e., 7-day) readmissions (AUC = 0.883). Lastly, a SHAP model is formulated to estimate additive Shapley values (i.e., importance scores) of the features and to interpret the magnitude and direction of their effects. The findings are mostly in line with those reported by lengthy and expensive clinical trial studies.
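To make the idea of additive Shapley values in the abstract above more concrete, here is a minimal sketch that computes exact Shapley values for a tiny toy model by averaging each feature's marginal contribution over all feature orderings. This is brute-force enumeration, not the SHAP estimation procedure the authors used, and the feature names and the value function are hypothetical.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: the average marginal contribution of each
    player over every possible ordering (feasible only for few players)."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}

def toy_value(coalition):
    """Hypothetical model output when only the features in `coalition` are known."""
    v = 0.0
    if "age" in coalition:
        v += 2.0
    if "fever" in coalition:
        v += 1.0
    if "age" in coalition and "fever" in coalition:
        v += 1.0  # interaction term, shared equally by the two features
    return v

phi = shapley_values(["age", "fever", "noise"], toy_value)
print(phi)
```

The values sum to the difference between the full-model output and the empty-coalition output, which is the additivity property that makes them usable as importance scores with a sign and a magnitude.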
Affiliation(s)
- Behrooz Davazdahemami
- Department of IT & Supply Chain Management, University of Wisconsin-Whitewater, United States
- Hamed M Zolbanin
- Department of MIS, Operations & Supply Chain Management, Business Analytics, University of Dayton, United States
- Dursun Delen
- Center for Health Systems Innovation, Spears School of Business, Oklahoma State University, United States
- School of Business, Ibn Haldun University, Istanbul, Turkey
5. Mansouri A, Noei M, Saniee Abadeh M. A hybrid machine learning approach for early mortality prediction of ICU patients. PROGRESS IN ARTIFICIAL INTELLIGENCE 2022. [DOI: 10.1007/s13748-022-00288-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2022]
6. Applying Machine Learning Techniques to the Audit of Antimicrobial Prophylaxis. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12052586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
High rates of inappropriate use of surgical antimicrobial prophylaxis have been reported in many countries. Auditing prophylactic antimicrobial use across large volumes of medical records by manual review is labor-intensive and time-consuming. The purpose of this study is to develop accurate and efficient machine learning models for auditing the appropriateness of surgical antimicrobial prophylaxis. Supervised machine learning classifiers (Auto-WEKA, multilayer perceptron, decision tree, SimpleLogistic, Bagging, and AdaBoost) were applied to an antimicrobial prophylaxis dataset, which contained 601 instances with 26 attributes. Multilayer perceptron, SimpleLogistic selected by Auto-WEKA, and decision tree algorithms had outstanding discrimination with weighted average AUC > 0.97. The Bagging and SMOTE algorithms could improve the predictive performance of the decision tree on imbalanced datasets. Although they achieved better performance measures, multilayer perceptron and Auto-WEKA took more execution time than the other algorithms. Multilayer perceptron, SimpleLogistic, and decision tree algorithms have outstanding performance measures for identifying the appropriateness of surgical prophylaxis. The efficient models developed by machine learning can be used to assist the antimicrobial stewardship team in the audit of surgical antimicrobial prophylaxis. Future research still faces the challenge, and the opportunity, of enriching the datasets with more useful clinical information to improve the performance of the algorithms.
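The abstract above mentions SMOTE as a way to handle imbalanced data. The core idea is to create synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below shows only that interpolation step in plain Python. It is not the WEKA implementation used in the study, and the function name, parameters, and toy points are assumptions.

```python
import math
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points, each on the line segment between a
    random minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Hypothetical minority-class points with two numeric features.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_sketch(minority, n_new=5)
print(new_points)
```

Because every synthetic point lies between two existing minority points, the method only suits numeric features as written. Categorical attributes need a different treatment.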
7. Upadhyay K, Kaur P, Verma DK. Evaluating the Performance of Data Level Methods Using KEEL Tool to Address Class Imbalance Problem. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING 2021. [DOI: 10.1007/s13369-021-06377-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
8. Selecting the Suitable Resampling Strategy for Imbalanced Data Classification Regarding Dataset Properties. An Approach Based on Association Models. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app11188546] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/20/2022]
Abstract
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class. This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples. Thus, the prediction model is unreliable although the overall model accuracy can be acceptable. Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class. However, their effectiveness depends on several factors mainly related to data intrinsic characteristics, such as imbalance ratio, dataset size and dimensionality, overlapping between classes or borderline examples. In this work, the impact of these factors is analyzed through a comprehensive comparative study involving 40 datasets from different application areas. The objective is to obtain models for automatic selection of the best resampling strategy for any dataset based on its characteristics. These models allow us to check several factors simultaneously considering a wide range of values since they are induced from very varied datasets that cover a broad spectrum of conditions. This differs from most studies that focus on the individual analysis of the characteristics or cover a small range of values. In addition, the study encompasses both basic and advanced resampling strategies that are evaluated by means of eight different performance metrics, including new measures specifically designed for imbalanced data classification. The general nature of the proposal allows the choice of the most appropriate method regardless of the domain, avoiding the search for special purpose techniques that could be valid for the target data.
9. Bbosa FF, Nabukenya J, Nabende P, Wesonga R. On the goodness of fit of parametric and non-parametric data mining techniques: the case of malaria incidence thresholds in Uganda. HEALTH AND TECHNOLOGY 2021. [DOI: 10.1007/s12553-021-00551-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 10/21/2022]
10. Predicting healthcare-associated infections, length of stay, and mortality with the nursing intensity of care index. Infect Control Hosp Epidemiol 2021; 43:298-305. [PMID: 33858546 DOI: 10.1017/ice.2021.114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
OBJECTIVES The objectives of this study were (1) to develop and validate a simulation model to estimate daily probabilities of healthcare-associated infections (HAIs), length of stay (LOS), and mortality using time varying patient- and unit-level factors including staffing adequacy and (2) to examine whether HAI incidence varies with staffing adequacy. SETTING The study was conducted at 2 tertiary- and quaternary-care hospitals, a pediatric acute care hospital, and a community hospital within a single New York City healthcare network. PATIENTS All patients discharged from 2012 through 2016 (N = 562,435). METHODS We developed a non-Markovian simulation to estimate daily conditional probabilities of bloodstream, urinary tract, surgical site, and Clostridioides difficile infection, pneumonia, length of stay, and mortality. Staffing adequacy was modeled based on total nurse staffing (care supply) and the Nursing Intensity of Care Index (care demand). We compared model performance with logistic regression, and we generated case studies to illustrate daily changes in infection risk. We also described infection incidence by unit-level staffing and patient care demand on the day of infection. RESULTS Most model estimates fell within 95% confidence intervals of actual outcomes. The predictive power of the simulation model exceeded that of logistic regression (area under the curve [AUC], 0.852 and 0.816, respectively). HAI incidence was greatest when staffing was lowest and nursing care intensity was highest. CONCLUSIONS This model has potential clinical utility for identifying modifiable conditions in real time, such as low staffing coupled with high care demand.
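The abstract above compares models by area under the ROC curve (AUC). As a reminder of what that number measures, the sketch below computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, counting ties as one half. The function name, labels, and scores are hypothetical and unrelated to the study's data.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores
    (equivalent to the normalised Mann-Whitney U statistic)."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = event occurred) and predicted daily probabilities.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)
```

The pairwise form is quadratic in the number of cases, so for a cohort of the size reported in the abstract a sort-based rank computation would be the practical choice.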
11. GT2FS-SMOTE: An Intelligent Oversampling Approach Based Upon General Type-2 Fuzzy Sets to Detect Web Spam. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING 2020. [DOI: 10.1007/s13369-020-04995-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/29/2022]