1
Ketabi M, Andishgar A, Fereidouni Z, Sani MM, Abdollahi A, Vali M, Alkamel A, Tabrizi R. Predicting the risk of mortality and rehospitalization in heart failure patients: A retrospective cohort study by machine learning approach. Clin Cardiol 2024; 47:e24239. [PMID: 38402566 PMCID: PMC10894620 DOI: 10.1002/clc.24239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Revised: 01/17/2024] [Accepted: 02/09/2024] [Indexed: 02/26/2024] Open
Abstract
BACKGROUND Heart failure (HF) is a global problem, affecting more than 26 million people worldwide. This study evaluated the performance of 10 machine learning (ML) algorithms and chose the best algorithm to predict mortality and readmission of HF patients by using The Fasa Registry on Systolic HF (FaRSH) database.
HYPOTHESIS ML algorithms may better identify patients at increased risk of HF readmission or death with demographic and clinical data.
METHODS Through comprehensive evaluation, the best-performing model was used for prediction. Finally, all the trained models were applied to the test data, which included 20% of the total data. For the final evaluation and comparison of the models, five metrics were used: accuracy, F1-score, sensitivity, specificity and Area Under Curve (AUC).
RESULTS Ten ML algorithms were evaluated. The CatBoost (CAT) algorithm uses a series of decision tree models to create a nonlinear model, and this CAT algorithm performed the best of the 10 models studied. According to the three final outcomes from this study, which involved 2488 participants, 366 (14.7%) of the patients were readmitted to the hospital, 97 (3.9%) of the patients died within 1 month of the follow-up, and 342 (13.7%) of the patients died within 1 year of the follow-up. The most significant variables to predict the events were length of stay in the hospital, hemoglobin level, and family history of MI.
CONCLUSIONS The ML-based risk stratification tool was able to assess the risk of 5-year all-cause mortality and readmission in patients with HF. ML could provide an explicit explanation of individualized risk prediction and give physicians an intuitive understanding of the influence of critical features in the model.
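The five evaluation metrics named in this abstract (accuracy, F1-score, sensitivity, specificity, AUC) can be illustrated with a minimal sketch. This is not the authors' code: the function names `confusion_counts`, `classification_metrics`, and `rank_auc` are hypothetical, labels are assumed to be binary (1 = event, 0 = no event), and AUC is computed with the pairwise-ranking definition rather than any particular library routine.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, where 1 marks the event."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:  # truth == 1 and pred == 0
            fn += 1
    return tp, fp, tn, fn


def _ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and F1 from hard predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = _ratio(tp, tp + fn)  # true positive rate (recall)
    specificity = _ratio(tn, tn + fp)  # true negative rate
    precision = _ratio(tp, tp + fp)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def rank_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a pair.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return _ratio(wins, len(positives) * len(negatives))
```

With a rare outcome such as the 3.9% one-month mortality reported here, accuracy alone can look high for a model that never predicts the event, which is one reason sensitivity, specificity, and AUC are usually reported alongside it.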
Affiliation(s)
- Marzieh Ketabi
  - Student Research Committee, Fasa University of Medical Sciences, Fasa, Iran
- Zhila Fereidouni
  - Department of Medical Surgical Nursing, Fasa University of Medical Science, Fars, Iran
- Ashkan Abdollahi
  - School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran
- Mohebat Vali
  - Student Research Committee, Shiraz University of Medical Sciences, Shiraz, Iran
- Abdulhakim Alkamel
  - Noncommunicable Diseases Research Center, Fasa University of Medical Science, Fasa, Iran
- Reza Tabrizi
  - Noncommunicable Diseases Research Center, Fasa University of Medical Science, Fasa, Iran
  - Clinical Research Development Unit, Fasa University of Medical Sciences, Fasa, Iran
2
Lin LS, Kao CH, Li YJ, Chen HH, Chen HY. Improved support vector machine classification for imbalanced medical datasets by novel hybrid sampling combining modified mega-trend-diffusion and bagging extreme learning machine model. Math Biosci Eng 2023; 20:17672-17701. [PMID: 38052532 DOI: 10.3934/mbe.2023786] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/07/2023]
Abstract
To handle imbalanced datasets in machine learning or deep learning models, some studies suggest sampling techniques to generate virtual examples of minority classes to improve the models' prediction accuracy. However, for kernel-based support vector machines (SVM), some sampling methods suggest generating synthetic examples in an original data space rather than in a high-dimensional feature space. This may be ineffective in improving SVM classification for imbalanced datasets. To address this problem, we propose a novel hybrid sampling technique termed modified mega-trend-diffusion-extreme learning machine (MMTD-ELM) to effectively move the SVM decision boundary toward a region of the majority class. By this movement, the prediction of SVM for minority class examples can be improved. The proposed method combines α-cut fuzzy number method for screening representative examples of majority class and MMTD method for creating new examples of the minority class. Furthermore, we construct a bagging ELM model to monitor the similarity between new examples and original data. In this paper, four datasets are used to test the efficiency of the proposed MMTD-ELM method in imbalanced data prediction. Additionally, we deployed two SVM models to compare prediction performance of the proposed MMTD-ELM method with three state-of-the-art sampling techniques in terms of geometric mean (G-mean), F-measure (F1), index of balanced accuracy (IBA) and area under curve (AUC) metrics. Furthermore, paired t-test is used to elucidate whether the suggested method has statistically significant differences from the other sampling techniques in terms of the four evaluation metrics. The experimental results demonstrated that the proposed method achieves the best average values in terms of G-mean, F1, IBA and AUC. Overall, the suggested MMTD-ELM method outperforms these sampling methods for imbalanced datasets.
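Two of the imbalance-aware metrics named in this abstract, G-mean and the index of balanced accuracy (IBA), can be written out as a short sketch. The formulas used are the commonly cited definitions: G-mean is the geometric mean of the true positive rate and true negative rate, and IBA weights the squared G-mean by a dominance term. The function names are hypothetical, and the weighting factor `alpha = 0.1` is an assumption, since the abstract does not state the value the authors used.

```python
import math


def g_mean(tpr, tnr):
    """Geometric mean of sensitivity (TPR) and specificity (TNR)."""
    return math.sqrt(tpr * tnr)


def index_of_balanced_accuracy(tpr, tnr, alpha=0.1):
    """IBA = (1 + alpha * (TPR - TNR)) * G-mean**2.

    alpha is an assumed weighting factor, not a value taken from the paper.
    The dominance term (TPR - TNR) rewards classifiers that do better on the
    positive (minority) class than on the negative class.
    """
    dominance = tpr - tnr
    return (1.0 + alpha * dominance) * tpr * tnr
```

Both scores fall to zero when either class is never predicted correctly, so a classifier that labels every example as the majority class cannot score well on them.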
Affiliation(s)
- Liang-Sian Lin
  - Department of Information Management, National Taipei University of Nursing and Health Sciences, Taipei 112303, Taiwan
- Chen-Huan Kao
  - Department of Information Management, National Taipei University of Nursing and Health Sciences, Taipei 112303, Taiwan
- Yi-Jie Li
  - Department of Information Management, National Taipei University of Nursing and Health Sciences, Taipei 112303, Taiwan
- Hao-Hsuan Chen
  - Department of Information Management, National Taipei University of Nursing and Health Sciences, Taipei 112303, Taiwan
- Hung-Yu Chen
  - Department of Information Management, National Chin-Yi University of Technology, Taichung 411030, Taiwan
3
Venugopal G, Khan ZH, Dash R, Tulsian V, Agrawal S, Rout S, Mahajan P, Ramadass B. Predictive association of gut microbiome and NLR in anemic low middle-income population of Odisha- a cross-sectional study. Front Nutr 2023; 10:1200688. [PMID: 37528994 PMCID: PMC10390256 DOI: 10.3389/fnut.2023.1200688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Accepted: 06/27/2023] [Indexed: 08/03/2023] Open
Abstract
Background Iron is abundant on earth but not readily available for colonizing bacteria due to its low solubility in the human body. Hosts and microbiota compete fiercely for iron. Less than 15% of supplemented iron is absorbed in the small bowel, and the remaining iron is a source of dysbiosis. Gut microbiome signatures capable of predicting anemia among low-middle-income populations are unknown. The present study was conducted to identify gut microbiome signatures that have predictive potential in association with neutrophil-to-lymphocyte ratio (NLR) and mean corpuscular volume (MCV) in anemia.
Methods One hundred and four participants between 10 and 70 years were recruited from Odisha's Low Middle-Income (LMI) rural population. Hematological parameters such as hemoglobin (HGB), NLR, and MCV were measured, and NLR was categorized using percentiles. The microbiome signatures were analyzed from 61 anemic and 43 non-anemic participants using 16S rRNA sequencing, followed by bioinformatics analysis to identify the diversity, correlations, and indicator species. The Multi-Layered Perceptron Neural Network (MLPNN) model was applied to predict anemia.
Results Significant microbiome diversity among anemic participants was observed between the lower, middle, and upper quartile NLR groups. For anemic participants with NLR in the lower quartile, alpha indices indicated bacterial overgrowth, and consistently, we found that R. faecis and B. uniformis predominated. Using ROC analysis, R. faecis had better distinction (AUC = 0.803) to predict anemia with lower NLR. In contrast, E. biforme and H. parainfluenzae were indicators of the NLR in the middle and upper quartile, respectively. In non-anemic participants with low MCV, the bacterial alteration was inversely related to gender. Furthermore, our MLPNN models provided 89% accuracy in predicting anemic or non-anemic status from the top 20 OTUs, HGB level, NLR, MCV, and indicator species.
Conclusion These findings strongly associate anemic hematological parameters with the microbiome. Such predictive association between the gut microbiome and NLR could be further evaluated and utilized to design precision nutrition models and to predict iron supplementation and dietary intervention responses in both community and clinical settings.
Affiliation(s)
- Giriprasad Venugopal
  - Center of Excellence for Clinical Microbiome Research (CCMR), All India Institute of Medical Sciences (AIIMS), Bhubaneswar, Odisha, India
- Zaiba Hasan Khan
  - Center of Excellence for Clinical Microbiome Research (CCMR), All India Institute of Medical Sciences (AIIMS), Bhubaneswar, Odisha, India
- Rishikesh Dash
  - Center of Excellence for Clinical Microbiome Research (CCMR), All India Institute of Medical Sciences (AIIMS), Bhubaneswar, Odisha, India
- Vinay Tulsian
  - Center of Excellence for Clinical Microbiome Research (CCMR), All India Institute of Medical Sciences (AIIMS), Bhubaneswar, Odisha, India
- Siwani Agrawal
  - Department of Biochemistry, All India Institute of Medical Sciences, Bhubaneswar, Odisha, India
- Sudeshna Rout
  - Department of Biochemistry, All India Institute of Medical Sciences, Bhubaneswar, Odisha, India
- Preetam Mahajan
  - Department of Community Medicine and Family Medicine, All India Institute of Medical Sciences (AIIMS), Bhubaneswar, Odisha, India
- Balamurugan Ramadass
  - Center of Excellence for Clinical Microbiome Research (CCMR), All India Institute of Medical Sciences (AIIMS), Bhubaneswar, Odisha, India
  - Department of Biochemistry, All India Institute of Medical Sciences, Bhubaneswar, Odisha, India
  - Adelaide Medical School Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, SA, Australia
4
Ding Y, Zhang J, Zhuang W, Gao Z, Kuang K, Tian D, Deng C, Wu H, Chen R, Lu G, Chen G, Mendogni P, Migliore M, Kang MW, Kanzaki R, Tang Y, Yang J, Shi Q, Qiao G. Improving the efficiency of identifying malignant pulmonary nodules before surgery via a combination of artificial intelligence CT image recognition and serum autoantibodies. Eur Radiol 2023; 33:3092-3102. [PMID: 36480027 DOI: 10.1007/s00330-022-09317-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 06/28/2022] [Revised: 10/21/2022] [Accepted: 11/24/2022] [Indexed: 12/09/2022]
Abstract
OBJECTIVE To construct a new pulmonary nodule diagnostic model that is highly efficient, noninvasive, and simple to measure.
METHODS This study included 424 patients with radioactive pulmonary nodules who underwent preoperative 7-autoantibody (7-AAB) panel testing, CT-based AI diagnosis, and pathological diagnosis by surgical resection. The patients were randomly divided into a training set (n = 212) and a validation set (n = 212). The nomogram was developed through forward stepwise logistic regression based on the predictive factors identified by univariate and multivariate analyses in the training set and was verified internally in the validation set.
RESULTS A diagnostic nomogram was constructed based on the statistically significant variables of age as well as CT-based AI diagnostic, 7-AAB panel, and CEA test results. In the validation set, the sensitivity, specificity, positive predictive value, and AUC were 82.29%, 90.48%, 97.24%, and 0.899 (95% CI, 0.851-0.936), respectively. The nomogram showed significantly higher sensitivity than the 7-AAB panel test result (82.29% vs. 35.88%, p < 0.001) and CEA (82.29% vs. 18.82%, p < 0.001); it also had a significantly higher specificity than AI diagnosis (90.48% vs. 69.04%, p = 0.022). For lesions with a diameter of ≤ 2 cm, the specificity of the nomogram was higher than that of the AI diagnostic system (90.00% vs. 67.50%, p = 0.022).
CONCLUSIONS Based on the combination of a 7-AAB panel, an AI diagnostic system, and other clinical features, our nomogram demonstrated good diagnostic performance in distinguishing lung nodules, especially those with ≤ 2 cm diameters.
KEY POINTS
• A novel diagnostic model of lung nodules was constructed by combining high-specific tumor markers with a high-sensitivity artificial intelligence diagnostic system.
• The diagnostic model has good diagnostic performance in distinguishing malignant and benign pulmonary nodules, especially for nodules smaller than 2 cm.
• The diagnostic model can assist the clinical decision-making of pulmonary nodules, with the advantages of high diagnostic efficiency, noninvasiveness, and simple measurement.
Affiliation(s)
- Yu Ding
  - Department of Thoracic Surgery, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, No.106, Zhongshan 2nd Road, Guangzhou, 510080, China
  - The Second School of Clinical Medicine, Southern Medical University, Guangzhou, China
- Jingyu Zhang
  - State Key Laboratory of Ultrasound in Medicine and Engineering, College of Biomedical Engineering, Chongqing Medical University, No. 1, Medical College Road, Yuzhong District, Chongqing, 400016, China
- Weitao Zhuang
  - Department of Thoracic Surgery, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, No.106, Zhongshan 2nd Road, Guangzhou, 510080, China
- Zhen Gao
  - The Second School of Clinical Medicine, Southern Medical University, Guangzhou, China
- Dan Tian
  - Department of Thoracic Surgery, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, No.106, Zhongshan 2nd Road, Guangzhou, 510080, China
- Cheng Deng
  - Department of Thoracic Surgery, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, No.106, Zhongshan 2nd Road, Guangzhou, 510080, China
- Hansheng Wu
  - The Second School of Clinical Medicine, Southern Medical University, Guangzhou, China
  - Department of Thoracic Surgery, The First Affiliated Hospital of Shantou University Medical College, Shantou, China
- Rixin Chen
  - Research Center of Medical Sciences, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Guojie Lu
  - Department of Thoracic Surgery, Guangzhou Panyu Central Hospital, Guangzhou, China
- Gang Chen
  - Department of Thoracic Surgery, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, No.106, Zhongshan 2nd Road, Guangzhou, 510080, China
- Paolo Mendogni
  - Thoracic Surgery and Lung Transplant Unit, Foundation IRCCS Ca' Granda Ospedale Maggiore Policlinico, Milan, Italy
- Marcello Migliore
  - Thoracic Surgery, Cardio-Thoracic Department, University Hospital of Wales, Cardiff, UK
  - Minimally Invasive Surgery and New Technology, University Hospital of Catania, Department of Surgery and Medical Specialties, University of Catania, Catania, Italy
- Min-Woong Kang
  - Department of Thoracic and Cardiovascular Surgery, Chungnam National University School of Medicine, Daejeon, South Korea
- Ryu Kanzaki
  - Department of General Thoracic Surgery, Osaka University Graduate School of Medicine, Osaka, Japan
- Yong Tang
  - Department of Thoracic Surgery, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, No.106, Zhongshan 2nd Road, Guangzhou, 510080, China
- Jiancheng Yang
  - Dianei Technology, Shanghai, China
  - Computer Vision Laboratory (CVLab), Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Qiuling Shi
  - State Key Laboratory of Ultrasound in Medicine and Engineering, College of Biomedical Engineering, Chongqing Medical University, No. 1, Medical College Road, Yuzhong District, Chongqing, 400016, China
- Guibin Qiao
  - Department of Thoracic Surgery, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, No.106, Zhongshan 2nd Road, Guangzhou, 510080, China
  - The Second School of Clinical Medicine, Southern Medical University, Guangzhou, China
5
Krajnc D, Spielvogel CP, Grahovac M, Ecsedi B, Rasul S, Poetsch N, Traub-Weidinger T, Haug AR, Ritter Z, Alizadeh H, Hacker M, Beyer T, Papp L. Automated data preparation for in vivo tumor characterization with machine learning. Front Oncol 2022; 12:1017911. [PMID: 36303841 PMCID: PMC9595446 DOI: 10.3389/fonc.2022.1017911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2022] [Accepted: 09/23/2022] [Indexed: 11/23/2022] Open
Abstract
Background This study proposes machine learning-driven data preparation (MLDP) for optimal data preparation (DP) prior to building prediction models for cancer cohorts.
Methods A collection of well-established DP methods was incorporated for building the DP pipelines for various clinical cohorts prior to machine learning. Evolutionary algorithm principles combined with hyperparameter optimization were employed to iteratively select the best fitting subset of data preparation algorithms for the given dataset. The proposed method was validated for glioma and prostate single center cohorts by a 100-fold Monte Carlo (MC) cross-validation scheme with an 80-20% training-validation split ratio. In addition, a dual-center diffuse large B-cell lymphoma (DLBCL) cohort was utilized with Center 1 as training and Center 2 as independent validation datasets to predict cohort-specific clinical endpoints. Five machine learning (ML) classifiers were employed for building prediction models across all analyzed cohorts. Predictive performance was estimated by confusion matrix analytics over the validation sets of each cohort. The performance of each model with and without MLDP, as well as with manually-defined DP, was compared in each of the four cohorts.
Results Sixteen of twenty established predictive models demonstrated area under the receiver operating characteristic curve (AUC) performance increase utilizing the MLDP. The MLDP resulted in the highest performance increase for random forest (RF) (+0.16 AUC) and support vector machine (SVM) (+0.13 AUC) model schemes for predicting 36-months survival in the glioma cohort. Single center cohorts resulted in complex (6-7 DP steps) DP pipelines, with a high occurrence of outlier detection, feature selection and synthetic majority oversampling technique (SMOTE). In contrast, the optimal DP pipeline for the dual-center DLBCL cohort only included outlier detection and SMOTE DP steps.
Conclusions This study demonstrates that data preparation prior to ML prediction model building in cancer cohorts should be ML-driven itself, yielding optimal prediction models in both single and multi-centric settings.
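The validation scheme described in the Methods (100-fold Monte Carlo cross-validation with an 80-20% training-validation split) can be sketched as repeated random splits of the sample indices. This is a generic illustration, not the authors' pipeline: the function name `monte_carlo_splits`, the fixed seed, and the absence of stratification are all assumptions.

```python
import random


def monte_carlo_splits(n_samples, n_folds=100, train_fraction=0.8, seed=0):
    """Yield (train_indices, validation_indices) for repeated random splits.

    Unlike k-fold cross-validation, each fold is drawn independently, so a
    sample may land in the validation set of many folds or of none.
    """
    rng = random.Random(seed)  # fixed seed (assumed) for reproducibility
    n_train = int(round(n_samples * train_fraction))
    indices = list(range(n_samples))
    for _ in range(n_folds):
        shuffled = indices[:]  # copy so every fold starts from the same order
        rng.shuffle(shuffled)
        yield shuffled[:n_train], shuffled[n_train:]
```

A model would be fitted on each training subset and scored on the matching validation subset, and the 100 scores would then be averaged; any data preparation step that learns from the data (outlier detection, feature selection, oversampling) is normally fitted on the training subset only.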
Affiliation(s)
- Denis Krajnc
  - QIMP Team, Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Vienna, Austria
- Clemens P. Spielvogel
  - Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
  - Christian Doppler Laboratory for Applied Metabolomics, Medical University of Vienna, Vienna, Austria
- Marko Grahovac
  - Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
- Boglarka Ecsedi
  - QIMP Team, Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Vienna, Austria
- Sazan Rasul
  - Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
- Nina Poetsch
  - Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
- Tatjana Traub-Weidinger
  - Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
- Alexander R. Haug
  - Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
  - Christian Doppler Laboratory for Applied Metabolomics, Medical University of Vienna, Vienna, Austria
- Zsombor Ritter
  - Department of Medical Imaging, University of Pécs, Medical School, Pécs, Hungary
- Hussain Alizadeh
  - 1st Department of Internal Medicine, University of Pécs, Medical School, Pécs, Hungary
- Marcus Hacker
  - Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
- Thomas Beyer
  - QIMP Team, Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Vienna, Austria
- Laszlo Papp
  - QIMP Team, Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Vienna, Austria
  - Applied Quantum Computing group, Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Vienna, Austria
6
Bhatia A, Chug A, Singh AP, Singh D. A hybrid approach for noise reduction-based optimal classifier using genetic algorithm: A case study in plant disease prediction. Intell Data Anal 2022. [DOI: 10.3233/ida-216011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Plant diseases can cause significant losses to agricultural productivity; therefore, their early prediction is much needed. So far, many machine learning-based plant disease prediction models have been recommended, but these models face a problem of noisy class label dataset that degrades the performance. Noisy class label dataset results from the improper assignment of positive class labels into negative class data samples or vice versa. Hence, a precise and noise-free plant disease model is required for a better prediction. The current study proposes noise reduction-based hybridized classifiers for plant disease prediction. One tomato and four soybean disease datasets have been selected to conduct the proposed research. The Adaptive Sampling-based Class Label Noise Reduction (AS-CLNR) method has been used along with the Support Vector Machine (SVM) approach for noise reduction. The noise-minimized datasets have been fed into the Extreme Learning Machine (ELM), Decision Tree (DT), and Random Forest (RF) classifiers whose parameters are optimized using Genetic Algorithm (GA) for developing plant disease prediction models. The performances of all these models viz. Hybrid SVM-GA-ELM, Hybrid SVM-GA-DT, and Hybrid SVM-GA-RF have been evaluated using Accuracy, Area under ROC Curve, and F1-Score metrics. Further, these classifiers have been ranked using the statistical Friedman Test in which the Hybrid SVM-GA-RF classifier performed the best. Lastly, the Nemenyi test has also been performed to find out if significant differences exist between various classifiers or not. It was found that 33.33% of the total pairs of hybrid classifiers show a remarkably different performance from one another.
Affiliation(s)
- Anshul Bhatia
  - University School of Information, Communication and Technology, GGSIP University, Dwarka, New Delhi, India
- Anuradha Chug
  - University School of Information, Communication and Technology, GGSIP University, Dwarka, New Delhi, India
- Amit Prakash Singh
  - University School of Information, Communication and Technology, GGSIP University, Dwarka, New Delhi, India
- Dinesh Singh
  - Division of Plant Pathology, Indian Agricultural Research Institute (IARI), New Delhi, India
7
Javeed A, Ali L, Mohammed Seid A, Ali A, Khan D, Imrana Y. A Clinical Decision Support System (CDSS) for Unbiased Prediction of Caesarean Section Based on Features Extraction and Optimized Classification. Comput Intell Neurosci 2022; 2022:1901735. [PMID: 35707186 PMCID: PMC9192258 DOI: 10.1155/2022/1901735] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/22/2022] [Accepted: 04/16/2022] [Indexed: 12/14/2022]
Abstract
Nowadays, caesarean section (CS) is given preference over vaginal birth and this trend is rapidly rising around the globe, although CS has serious complications such as pregnancy scar, scar dehiscence, and morbidly adherent placenta. Thus, CS should only be performed when it is absolutely necessary for mother and fetus. To avoid unnecessary CS, researchers have developed different machine-learning- (ML-) based clinical decision support systems (CDSS) for CS prediction using electronic health records of pregnant women. However, previously proposed methods suffer from the problems of poor accuracy and bias in ML. To overcome these problems, we have designed a novel CDSS where the random oversampling example (ROSE) technique has been used to eliminate the problem of minority classes in the dataset. Furthermore, principal component analysis (PCA) has been employed for feature extraction from the dataset while, for classification, a random forest (RF) model is deployed. We have fine-tuned the hyperparameters of RF using a grid search algorithm for optimal classification performance. The newly proposed system is named ROSE-PCA-RF and it is trained and tested using an online CS dataset available on the UCI repository. In the first experiment, a conventional RF model is trained and tested on the dataset, while in the second experiment, the proposed model is tested. The proposed ROSE-PCA-RF model improved the performance of traditional RF by 4.5% with reduced time complexity, while only using two features extracted through PCA. Moreover, the proposed model obtained 96.29% accuracy on training data and 97.12% accuracy on testing data.
Affiliation(s)
- Ashir Javeed
  - Aging Research Center, Karolinska Institute, Solna, Sweden
- Liaqat Ali
  - Department of Electrical Engineering, University of Science and Technology Bannu, Bannu, Pakistan
- Abegaz Mohammed Seid
  - Information & Computing Technology Division, College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
- Arif Ali
  - Department of Computer Science, University of Science and Technology Bannu, Bannu, Pakistan
- Dilpazir Khan
  - Department of Computer Science, University of Science and Technology Bannu, Bannu, Pakistan
- Yakubu Imrana
  - School of Engineering, University for Development Studies, Tamale, Ghana
  - School of Computer Science and Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu, China
8
Huang X, Cao T, Chen L, Li J, Tan Z, Xu B, Xu R, Song Y, Zhou Z, Wang Z, Wei Y, Zhang Y, Li J, Huo Y, Qin X, Wu Y, Wang X, Wang H, Cheng X, Xu X, Liu L. Novel Insights on Establishing Machine Learning-Based Stroke Prediction Models Among Hypertensive Adults. Front Cardiovasc Med 2022; 9:901240. [PMID: 35600480 PMCID: PMC9120532 DOI: 10.3389/fcvm.2022.901240] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/21/2022] [Accepted: 04/05/2022] [Indexed: 11/13/2022] Open
Abstract
Background Stroke is a major global health burden, and risk prediction is essential for the primary prevention of stroke. However, uncertainty remains about the optimal prediction model for analyzing stroke risk. In this study, we aim to determine the most effective stroke prediction method in a Chinese hypertensive population using machine learning and establish a general methodological pipeline for future analysis.
Methods The training set included 70% of data (n = 14,491) from the China Stroke Primary Prevention Trial (CSPPT). Internal validation was performed with the remaining 30% of CSPPT data (n = 6,211), and external validation was conducted using a nested case–control (NCC) dataset (n = 2,568). The primary outcome was the first stroke. Four established analysis methods were applied and compared: logistic regression (LR), stepwise logistic regression (SLR), extreme gradient boosting (XGBoost), and random forest (RF). Population characteristic data with inclusion and exclusion of laboratory variables were separately analyzed. Accuracy, sensitivity, specificity, kappa, and area under receiver operating characteristic curves (AUCs) were used to make model assessments, with AUC as the primary metric. Data balancing techniques, including random under-sampling (RUS) and synthetic minority over-sampling technique (SMOTE), were applied to process this unbalanced training set.
Results The best model performance was observed in the RUS-applied RF model with laboratory variables. Compared with null models (sensitivity = 0, specificity = 100, and mean AUCs = 0.643), data balancing techniques improved overall performance, with RUS demonstrating a more satisfactory effect in the current study (RUS: sensitivity = 63.9; specificity = 53.7; and mean AUCs = 0.624). Adding laboratory variables improved the performance of analysis methods. All results were reconfirmed in validation sets. The top 10 important variables were determined by the analysis method with the best performance.
Conclusion Among the tested methods, the most effective stroke prediction model in the targeted population is RUS-applied RF. From the insights revealed by the current study, we provide general frameworks for building machine learning-based prediction models.
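The random under-sampling (RUS) step mentioned in this abstract can be sketched in a few lines: every class is reduced to the size of the smallest class by discarding randomly chosen examples. This is a generic illustration rather than the study's implementation; the function name `random_under_sample` and the fixed seed are assumptions.

```python
import random
from collections import defaultdict


def random_under_sample(features, labels, seed=0):
    """Balance classes by keeping a random subset of each larger class.

    Returns new (features, labels) lists in which every class has as many
    examples as the smallest class in the input.
    """
    rng = random.Random(seed)  # fixed seed (assumed) for reproducibility
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)
    n_keep = min(len(indices) for indices in indices_by_class.values())
    kept = []
    for indices in indices_by_class.values():
        kept.extend(rng.sample(indices, n_keep))  # sample without replacement
    kept.sort()  # preserve the original ordering of the retained rows
    return [features[i] for i in kept], [labels[i] for i in kept]
```

As in the abstract, balancing of this kind is applied to the training set only; validation data keep their natural class proportions so that sensitivity and specificity reflect the real event rate.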
Collapse
Affiliation(s)
- Xiao Huang
  - Department of Cardiology, The Second Affiliated Hospital of Nanchang University, Nanchang, China
  - *Correspondence: Xiao Huang
- Tianyu Cao
  - Biological Anthropology, University of California, Santa Barbara, Santa Barbara, CA, United States
- Liangziqian Chen
  - Department of Data Management, Shenzhen Evergreen Medical Institute, Shenzhen, China
- Junpei Li
  - Department of Cardiology, The Second Affiliated Hospital of Nanchang University, Nanchang, China
- Ziheng Tan
  - Department of Cardiology, The Second Affiliated Hospital of Nanchang University, Nanchang, China
- Benjamin Xu
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, United States
- Richard Xu
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
- Yun Song
  - Department of Data Management, Shenzhen Evergreen Medical Institute, Shenzhen, China
  - Institute of Biomedicine, Anhui Medical University, Hefei, China
- Ziyi Zhou
  - Department of Biomedical Engineering, Graduate School at Shenzhen, Tsinghua University, Shenzhen, China
- Zhuo Wang
  - Key Laboratory of Precision Nutrition and Food Quality, Ministry of Education, Department of Nutrition and Health, College of Food Sciences and Nutritional Engineering, China Agricultural University, Beijing, China
- Yaping Wei
  - Key Laboratory of Precision Nutrition and Food Quality, Ministry of Education, Department of Nutrition and Health, College of Food Sciences and Nutritional Engineering, China Agricultural University, Beijing, China
- Yan Zhang
  - Department of Cardiology, Peking University First Hospital, Beijing, China
- Jianping Li
  - Department of Cardiology, Peking University First Hospital, Beijing, China
- Yong Huo
  - Department of Cardiology, Peking University First Hospital, Beijing, China
- Xianhui Qin
  - National Clinical Research Study Center for Kidney Disease, The State Key Laboratory for Organ Failure Research, Renal Division, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Yanqing Wu
  - Department of Cardiology, The Second Affiliated Hospital of Nanchang University, Nanchang, China
- Xiaobin Wang
  - Department of Population, Family and Reproductive Health, Johns Hopkins University Bloomberg School of Public Health, Baltimore, MD, United States
- Hong Wang
  - Department of Cardiovascular Science, Temple University Lewis Katz School of Medicine, Philadelphia, PA, United States
- Xiaoshu Cheng
  - Department of Cardiology, The Second Affiliated Hospital of Nanchang University, Nanchang, China
- Xiping Xu
  - Key Laboratory of Precision Nutrition and Food Quality, Ministry of Education, Department of Nutrition and Health, College of Food Sciences and Nutritional Engineering, China Agricultural University, Beijing, China
- Lishun Liu
  - Department of Data Management, Shenzhen Evergreen Medical Institute, Shenzhen, China
  - Department of Biomedical Engineering, Graduate School at Shenzhen, Tsinghua University, Shenzhen, China
|
9
|
Wang J, Wang S, Zhu MX, Yang T, Yin Q, Hou Y. Risk Prediction of Major Adverse Cardiovascular Events Occurrence Within 6 Months After Coronary Revascularization: Machine Learning Study. JMIR Med Inform 2022; 10:e33395. [PMID: 35442202 PMCID: PMC9069286 DOI: 10.2196/33395] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/06/2021] [Revised: 02/25/2022] [Accepted: 03/02/2022] [Indexed: 12/13/2022]
Abstract
Background As a major health hazard, the incidence of coronary heart disease has been increasing year by year. Although coronary revascularization, mainly percutaneous coronary intervention, has played an important role in the treatment of coronary heart disease, major adverse cardiovascular events (MACE) such as recurrent or persistent angina pectoris after coronary revascularization remain a very difficult problem in clinical practice. Objective Given the high probability of MACE after coronary revascularization, the aim of this study was to develop and validate a predictive model for MACE occurrence within 6 months based on machine learning algorithms. Methods A retrospective study was performed including 1004 patients who had undergone coronary revascularization at The People’s Hospital of Liaoning Province and Affiliated Hospital of Liaoning University of Traditional Chinese Medicine from June 2019 to December 2020. According to the characteristics of available data, an oversampling strategy was adopted for initial preprocessing. We then employed six machine learning algorithms, including decision tree, random forest, logistic regression, naïve Bayes, support vector machine, and extreme gradient boosting (XGBoost), to develop prediction models for MACE depending on clinical information and 6-month follow-up information. Among all samples, 70% were randomly selected for training and the remaining 30% were used for model validation. Model performance was assessed based on accuracy, precision, recall, F1-score, confusion matrix, area under the receiver operating characteristic (ROC) curve (AUC), and visualization of the ROC curve. Results Univariate analysis showed that 21 patient characteristic variables were statistically significant (P<.05) between the groups without and with MACE. 
Using these significant factors, XGBoost stood out among the six machine learning algorithms with an accuracy of 0.7788, precision of 0.8058, recall of 0.7345, F1-score of 0.7685, and AUC of 0.8599. Further exploration of the models to identify factors affecting the occurrence of MACE revealed that use of anticoagulant drugs and course of the disease consistently ranked as the top two predictive factors in three of the developed models. Conclusions The machine learning risk models constructed in this study can achieve acceptable performance in MACE prediction, with XGBoost performing the best, providing a valuable reference for targeted intervention and clinical decision-making in MACE prevention.
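As a quick consistency check on the figures above, the F1-score is the harmonic mean of precision and recall. The sketch below recomputes it from the reported precision and recall; the function name `f1_from_precision_recall` is an assumption for illustration, and the calculation says nothing about how the authors computed their metrics.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for XGBoost in the abstract.
f1 = f1_from_precision_recall(0.8058, 0.7345)
print(round(f1, 4))  # prints 0.7685, matching the reported F1-score
```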
Affiliation(s)
- Jinwan Wang
  - School of Information Management, Nanjing University, Nanjing, China
- Shuai Wang
  - First Department of Cardiology, The Affiliated Hospital of Liaoning University of Traditional Chinese Medicine, Shenyang, China
- Mark Xuefang Zhu
  - School of Information Management, Nanjing University, Nanjing, China
- Tao Yang
  - School of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing, China
- Qingfeng Yin
  - Jiangsu Famous Medical Technology Co Ltd, Nanjing, China
- Ya Hou
  - Jiangsu Famous Medical Technology Co Ltd, Nanjing, China
|
10
|
Qian X, Zhou Z, Hu J, Zhu J, Huang H, Dai Y. A comparative study of kernel-based vector machines with probabilistic outputs for medical diagnosis. Biocybern Biomed Eng 2021. [DOI: 10.1016/j.bbe.2021.09.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|
11
|
Xiong Y, Ye M, Wu C. Cancer Classification with a Cost-Sensitive Naive Bayes Stacking Ensemble. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2021; 2021:5556992. [PMID: 33986823 PMCID: PMC8093037 DOI: 10.1155/2021/5556992] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 01/06/2021] [Revised: 03/17/2021] [Accepted: 04/15/2021] [Indexed: 02/07/2023]
Abstract
Ensemble learning combines multiple learners and offers good flexibility and higher generalization performance. To achieve higher-quality cancer classification, in this study, the fast correlation-based feature selection (FCBF) method was used to preprocess the data to eliminate irrelevant and redundant features. Then, the classification was carried out in the stacking ensemble learner. A library for support vector machine (LIBSVM), K-nearest neighbor (KNN), decision tree C4.5 (C4.5), and random forest (RF) were used as the primary learners of the stacking ensemble. Given the imbalanced characteristics of cancer gene expression data, an embedded cost-sensitive naive Bayes classifier was used as the metalearner of the stacking ensemble, which is denoted CSNB stacking. The proposed CSNB stacking method was applied to nine cancer datasets to further verify the classification performance of the model. Compared with other classification methods, such as single classifier algorithms and ensemble algorithms, the experimental results showed the effectiveness and robustness of the proposed method in processing different types of cancer data. This method may therefore help guide cancer diagnosis and research.
Affiliation(s)
- Yueling Xiong
  - School of Medical Information, Wannan Medical College, Wuhu 241002, China
- Mingquan Ye
  - School of Medical Information, Wannan Medical College, Wuhu 241002, China
- Changrong Wu
  - School of Computer and Information, Anhui Normal University, Wuhu 241002, China
|
12
|
Shaw SS, Ahmed S, Malakar S, Garcia-Hernandez L, Abraham A, Sarkar R. Hybridization of ring theory-based evolutionary algorithm and particle swarm optimization to solve class imbalance problem. COMPLEX INTELL SYST 2021. [DOI: 10.1007/s40747-021-00314-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 01/01/2023]
Abstract
Many real-life datasets are imbalanced in nature: the number of samples in one class (the minority class) is far smaller than the number of samples in the other class (the majority class). Hence, if we directly fit these datasets to a standard classifier for training, it often overlooks the minority class samples while estimating the class-separating hyperplane(s) and, as a result, misclassifies them. To solve this problem, researchers have followed many different approaches over the years. However, the selection of truly representative samples from the majority class is still considered an open research problem. A better solution to this problem would be helpful in many applications, such as fraud detection, disease prediction, and text classification. Recent studies also show that the problem requires analyzing not only the disproportion between classes but also other difficulties rooted in the nature of the data, and therefore calls for a more flexible, self-adaptable, computationally efficient, real-time method for selecting majority class samples without losing much important information. Keeping this in mind, we have proposed a hybrid model combining Particle Swarm Optimization (PSO), a popular swarm intelligence-based meta-heuristic algorithm, and the Ring Theory (RT)-based Evolutionary Algorithm (RTEA), a recently proposed physics-based meta-heuristic algorithm. We have named the algorithm RT-based PSO, or RTPSO for short. RTPSO can select the most representative samples from the majority class because it takes advantage of the efficient exploration and exploitation phases of its parent algorithms to strengthen the search process. We have used an AdaBoost classifier to observe the final classification results of our model. The effectiveness of our proposed method has been evaluated on 15 standard real-life datasets with low to extreme imbalance ratios.
The performance of RTPSO has been compared with PSO, RTEA, and other standard undersampling methods. The results demonstrate the superiority of RTPSO over the state-of-the-art class imbalance problem-solvers considered here for comparison. The source code of this work is available at https://github.com/Sayansurya/RTPSO_Class_imbalance.
|
13
|
Convolution-GRU Based on Independent Component Analysis for fMRI Analysis with Small and Imbalanced Samples. APPLIED SCIENCES-BASEL 2020. [DOI: 10.3390/app10217465] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/27/2022]
Abstract
Functional magnetic resonance imaging (fMRI) is a commonly used method of brain research. However, due to the complexity and particularity of the fMRI task, it is difficult to find enough subjects, resulting in a small and, often, imbalanced dataset. A dataset with small samples causes overfitting of the learning model, and the imbalance will make the model insensitive to the minority class, which has been a problem in classification. It is of great significance to classify fMRI data with small and imbalanced samples. In the present study, we propose a 3-step method on a small and imbalanced fMRI dataset from a word-scene memory task. The steps of the method are as follows: (1) An independent component analysis is performed to reduce the dimension of data; (2) The synthetic minority oversampling technique is used to generate new samples of the minority class to balance data; (3) A convolution-Gated Recurrent Unit (GRU) network is used to classify the independent component signals, indicating whether the subjects are performing episodic memory tasks. The accuracy of the proposed method is 72.2%, which improves the classification performance compared with traditional classifiers such as support vector machines (SVM), logistic regression (LGR), linear discriminant analysis (LDA) and k-nearest neighbor (KNN), and this study gives a biomarker for evaluating the reactivation of episodic memory.
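Step (2) of the method, the synthetic minority oversampling technique, creates a new sample by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below illustrates that interpolation only; the function name `smote_samples`, the parameters `k` and `n_new`, and the toy points are assumptions, and it is not the implementation used in the study.

```python
import math
import random


def smote_samples(minority, k=3, n_new=5, seed=0):
    """Generate n_new synthetic points from a list of minority-class vectors.
    (Hypothetical helper; requires at least two minority points.)"""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the chosen base point.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + u * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class (assumed): the four corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_samples(minority)
print(len(new_points))  # prints 5
```

Each synthetic point lies on the line segment between two existing minority points, so the new samples stay within the region the minority class already occupies.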
|
14
|
Okawara H, Shinomiya K, Fujita M, Koda T, Nishioka A, Nonomura Y. Nonlinear friction dynamics in the cognitive process of food textures: Thickness of polysaccharide solution. J Texture Stud 2020; 51:779-788. [DOI: 10.1111/jtxs.12538] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/17/2020] [Revised: 05/16/2020] [Accepted: 05/17/2020] [Indexed: 11/28/2022]
Affiliation(s)
- Hina Okawara
  - Department of Biochemical Engineering, Graduate School of Science and Engineering, Yamagata University, Yonezawa, Japan
- Koki Shinomiya
  - Department of Biochemical Engineering, Graduate School of Science and Engineering, Yamagata University, Yonezawa, Japan
- Minoru Fujita
  - Department of Organic Materials Science, Graduate School of Organic Materials Science, Yamagata University, Yonezawa, Japan
- Tomonori Koda
  - Department of Organic Materials Science, Graduate School of Organic Materials Science, Yamagata University, Yonezawa, Japan
- Akihiro Nishioka
  - Department of Organic Materials Science, Graduate School of Organic Materials Science, Yamagata University, Yonezawa, Japan
- Yoshimune Nonomura
  - Department of Biochemical Engineering, Graduate School of Science and Engineering, Yamagata University, Yonezawa, Japan
|
15
|
Yang K, Liu J, Tang W, Zhang H, Zhang R, Gu J, Zhu R, Xiong J, Ru X, Wu J. Identification of benign and malignant pulmonary nodules on chest CT using improved 3D U-Net deep learning framework. Eur J Radiol 2020; 129:109013. [PMID: 32505895 DOI: 10.1016/j.ejrad.2020.109013] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/04/2019] [Revised: 02/11/2020] [Accepted: 04/07/2020] [Indexed: 12/23/2022]
Abstract
PURPOSE To accurately distinguish benign from malignant pulmonary nodules on CT using partial structures of 3D U-Net integrated with Capsule Networks (CapNets) and to provide a reference for the early diagnosis of lung cancer. METHOD The dataset consisted of 1177 samples (benign/malignant: 414/763) from 997 patients provided by a collaborating hospital. All nodules were biopsy or surgery proven, and pathologic results were regarded as the gold standard. This study utilized a partial U-Net to capture the low-level information (edges, corners, etc.) and CapNets to preserve the high-level (semantic) information of nodules. For CapNets, each capsule had a 4 × 4 matrix representing the pose and an activation probability representing the presence of an object. Furthermore, we chose accuracy (ACC), area under the curve (AUC), sensitivity (SE), and specificity (SP) to evaluate the generalization of the proposed architecture and compared its identification performance with that of 3D U-Net and experienced radiologists. RESULTS The AUC of our architecture (0.84) was superior to that of the original 3D U-Net (0.81) (p = 0.04, DeLong's test). Moreover, the ACC (84.5 %) and SE (92.9 %) of our model were clearly higher than the radiologists' ACC (81.0 %) and SE (84.3 %) at the optimal operating point. However, the SP (70 %) of our model was slightly lower than the radiologists' SP (75 %), which might be the result of class imbalance, with limited benign samples involved in algorithm training. CONCLUSIONS Our architecture showed high performance in identifying benign and malignant pulmonary nodules, indicating that the improved model has promising clinical applications.
Affiliation(s)
- Kaiqiang Yang
  - Department of Radiology, Zhongshan Hospital, Dalian University, Dalian, Liaoning Province, China; Infervision, Beijing, China
- Jinsha Liu
  - Department of Radiology, Zhongshan Hospital, Dalian University, Dalian, Liaoning Province, China
- Jun Gu
  - Infervision, Beijing, China
- Ruiping Zhu
  - Department of Pathology, Zhongshan Hospital, Dalian University, Dalian, Liaoning Province, China
- Jingtong Xiong
  - Second Hospital, Dalian Medical University, Dalian, Liaoning Province, China
- Jianlin Wu
  - Department of Radiology, Zhongshan Hospital, Dalian University, Dalian, Liaoning Province, China
|
16
|
Stroke Prediction with Machine Learning Methods among Older Chinese. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2020; 17:ijerph17061828. [PMID: 32178250 PMCID: PMC7142983 DOI: 10.3390/ijerph17061828] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 02/18/2020] [Revised: 03/10/2020] [Accepted: 03/10/2020] [Indexed: 12/21/2022]
Abstract
Timely stroke diagnosis and intervention are necessary considering its high prevalence. Previous studies have mainly focused on stroke prediction with balanced data. Thus, this study aimed to develop machine learning models for predicting stroke with imbalanced data in an elderly population in China. Data were obtained from a prospective cohort that included 1131 participants (56 stroke patients and 1075 non-stroke participants) in 2012 and 2014, respectively. Data balancing techniques including random over-sampling (ROS), random under-sampling (RUS), and synthetic minority over-sampling technique (SMOTE) were used to process the imbalanced data in this study. Machine learning methods such as regularized logistic regression (RLR), support vector machine (SVM), and random forest (RF) were used to predict stroke with demographic, lifestyle, and clinical variables. Accuracy, sensitivity, specificity, and areas under the receiver operating characteristic curves (AUCs) were used for performance comparison. The top five variables for stroke prediction were selected for each machine learning method based on the SMOTE-balanced data set. The total prevalence of stroke was high in 2014 (4.95%), with men experiencing much higher prevalence than women (6.76% vs. 3.25%). The three machine learning methods performed poorly in the imbalanced data set with extremely low sensitivity (approximately 0.00) and AUC (approximately 0.50). After using data balancing techniques, the sensitivity and AUC considerably improved with moderate accuracy and specificity, and the maximum values for sensitivity and AUC reached 0.78 (95% CI, 0.73–0.83) for RF and 0.72 (95% CI, 0.71–0.73) for RLR. Using AUCs for RLR, SVM, and RF in the imbalanced data set as references, a significant improvement was observed in the AUCs of all three machine learning methods (p < 0.05) in the balanced data sets. 
Considering RLR in each data set as a reference, only RF in the imbalanced data set and SVM in the ROS-balanced data set were superior to RLR in terms of AUC. Sex, hypertension, and uric acid were common predictors in all three machine learning methods. Blood glucose level was included in both RLR and RF. Drinking, age and high-sensitivity C-reactive protein level, and low-density lipoprotein cholesterol level were also included in RLR, SVM, and RF, respectively. Our study suggests that machine learning methods with data balancing techniques are effective tools for stroke prediction with imbalanced data.
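The near-zero sensitivity on the imbalanced data can be illustrated with the cohort counts from the abstract (56 stroke patients, 1075 non-stroke participants). The sketch below assumes a hypothetical classifier that labels every participant as non-stroke; it is not one of the study's fitted models, and the function name `confusion_metrics` is an assumption.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return accuracy, sensitivity, specificity


# Hypothetical "always predict non-stroke" classifier on the reported cohort:
# all 56 stroke cases are missed, all 1075 non-stroke cases are correct.
acc, sens, spec = confusion_metrics(tp=0, fp=0, tn=1075, fn=56)
print(round(acc, 3), sens, spec)  # prints "0.95 0.0 1.0"
```

Accuracy stays high because the majority class dominates, which is why sensitivity and AUC are the more informative measures on imbalanced data.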
|