Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Zhang J, Chen L. Clustering-based undersampling with random over sampling examples and support vector machine for imbalanced classification of breast cancer diagnosis. Comput Assist Surg (Abingdon) 2019;24:62-72. [PMID: 31403330 DOI: 10.1080/24699322.2019.1649074] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 10/26/2022]

Cited by Other Article(s)

Ketabi M, Andishgar A, Fereidouni Z, Sani MM, Abdollahi A, Vali M, Alkamel A, Tabrizi R. Predicting the risk of mortality and rehospitalization in heart failure patients: A retrospective cohort study by machine learning approach. Clin Cardiol 2024;47:e24239. [PMID: 38402566 PMCID: PMC10894620 DOI: 10.1002/clc.24239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Revised: 01/17/2024] [Accepted: 02/09/2024] [Indexed: 02/26/2024]

Lin LS, Kao CH, Li YJ, Chen HH, Chen HY. Improved support vector machine classification for imbalanced medical datasets by novel hybrid sampling combining modified mega-trend-diffusion and bagging extreme learning machine model. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023;20:17672-17701. [PMID: 38052532 DOI: 10.3934/mbe.2023786] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/07/2023]

Abstract

To handle imbalanced datasets in machine learning or deep learning models, some studies suggest sampling techniques to generate virtual examples of minority classes to improve the models' prediction accuracy. However, for kernel-based support vector machines (SVM), some sampling methods suggest generating synthetic examples in an original data space rather than in a high-dimensional feature space. This may be ineffective in improving SVM classification for imbalanced datasets. To address this problem, we propose a novel hybrid sampling technique termed modified mega-trend-diffusion-extreme learning machine (MMTD-ELM) to effectively move the SVM decision boundary toward a region of the majority class. By this movement, the prediction of SVM for minority class examples can be improved. The proposed method combines α-cut fuzzy number method for screening representative examples of majority class and MMTD method for creating new examples of the minority class. Furthermore, we construct a bagging ELM model to monitor the similarity between new examples and original data. In this paper, four datasets are used to test the efficiency of the proposed MMTD-ELM method in imbalanced data prediction. Additionally, we deployed two SVM models to compare prediction performance of the proposed MMTD-ELM method with three state-of-the-art sampling techniques in terms of geometric mean (G-mean), F-measure (F1), index of balanced accuracy (IBA) and area under curve (AUC) metrics. Furthermore, paired t-test is used to elucidate whether the suggested method has statistically significant differences from the other sampling techniques in terms of the four evaluation metrics. The experimental results demonstrated that the proposed method achieves the best average values in terms of G-mean, F1, IBA and AUC. Overall, the suggested MMTD-ELM method outperforms these sampling methods for imbalanced datasets.
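Two of the metrics named in this abstract, G-mean and F1, can be computed directly from binary confusion counts. The following is a minimal sketch, not the authors' code; the function name and the example counts are hypothetical, and the minority class is assumed to be the positive class.

```python
import math

def imbalance_metrics(tp, fn, fp, tn):
    """Return (sensitivity, specificity, g_mean, f1) from binary confusion counts.

    Assumption: the minority class is treated as the positive class.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on the minority class
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on the majority class
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # G-mean: geometric mean of the two per-class recalls.
    g_mean = math.sqrt(sensitivity * specificity)
    # F1: harmonic mean of precision and sensitivity.
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return sensitivity, specificity, g_mean, f1

# Hypothetical counts for an imbalanced test set (10 positives, 90 negatives).
sens, spec, g_mean, f1 = imbalance_metrics(tp=8, fn=2, fp=10, tn=80)

# A classifier that always predicts the majority class scores zero on both metrics.
null_metrics = imbalance_metrics(tp=0, fn=10, fp=0, tn=90)
```

Both metrics drop to zero when the minority class is never predicted, which is why they are often preferred over plain accuracy for imbalanced data.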

Venugopal G, Khan ZH, Dash R, Tulsian V, Agrawal S, Rout S, Mahajan P, Ramadass B. Predictive association of gut microbiome and NLR in anemic low middle-income population of Odisha- a cross-sectional study. Front Nutr 2023;10:1200688. [PMID: 37528994 PMCID: PMC10390256 DOI: 10.3389/fnut.2023.1200688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Accepted: 06/27/2023] [Indexed: 08/03/2023]

Abstract

Background

Iron is abundant on earth but not readily available for colonizing bacteria due to its low solubility in the human body. Hosts and microbiota compete fiercely for iron. Less than 15% of supplemented iron is absorbed in the small bowel, and the remaining iron is a source of dysbiosis. Gut microbiome signatures capable of predicting anemia among low-middle-income populations are unknown. The present study was conducted to identify gut microbiome signatures that have predictive potential in association with neutrophil-to-lymphocyte ratio (NLR) and mean corpuscular volume (MCV) in anemia.

Methods

One hundred and four participants between 10 and 70 years of age were recruited from Odisha's Low Middle-Income (LMI) rural population. Hematological parameters such as hemoglobin (HGB), NLR, and MCV were measured, and NLR was categorized using percentiles. The microbiome signatures of 61 anemic and 43 non-anemic participants were analyzed using 16S rRNA sequencing, followed by bioinformatics analysis to identify the diversity, correlations, and indicator species. A Multi-Layered Perceptron Neural Network (MLPNN) model was applied to predict anemia.

Results

Significant microbiome diversity among anemic participants was observed between the lower, middle, and upper quartile NLR groups. For anemic participants with NLR in the lower quartile, alpha indices indicated bacterial overgrowth, and, consistently, we identified R. faecis and B. uniformis as predominant. In ROC analysis, R. faecis showed better discrimination (AUC = 0.803) in predicting anemia with lower NLR. In contrast, E. biforme and H. parainfluenzae were indicators of NLR in the middle and upper quartiles, respectively. In non-anemic participants with low MCV, the bacterial alteration was inversely related to gender. Furthermore, our Multi-Layered Perceptron Neural Network (MLPNN) models achieved 89% accuracy in predicting anemic or non-anemic status from the top 20 OTUs, HGB level, NLR, MCV, and indicator species.

Conclusion

These findings strongly associate anemic hematological parameters with the microbiome. Such predictive association between the gut microbiome and NLR could be further evaluated and utilized to design precision nutrition models and to predict responses to iron supplementation and dietary intervention in both community and clinical settings.

Ding Y, Zhang J, Zhuang W, Gao Z, Kuang K, Tian D, Deng C, Wu H, Chen R, Lu G, Chen G, Mendogni P, Migliore M, Kang MW, Kanzaki R, Tang Y, Yang J, Shi Q, Qiao G. Improving the efficiency of identifying malignant pulmonary nodules before surgery via a combination of artificial intelligence CT image recognition and serum autoantibodies. Eur Radiol 2023;33:3092-3102. [PMID: 36480027 DOI: 10.1007/s00330-022-09317-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 06/28/2022] [Revised: 10/21/2022] [Accepted: 11/24/2022] [Indexed: 12/09/2022]

Abstract

OBJECTIVE

To construct a new pulmonary nodule diagnostic model that is non-invasive, simple to measure, and has high diagnostic efficiency.

METHODS

This study included 424 patients with radioactive pulmonary nodules who underwent preoperative 7-autoantibody (7-AAB) panel testing, CT-based AI diagnosis, and pathological diagnosis by surgical resection. The patients were randomly divided into a training set (n = 212) and a validation set (n = 212). The nomogram was developed through forward stepwise logistic regression based on the predictive factors identified by univariate and multivariate analyses in the training set and was validated internally in the validation set.

RESULTS

A diagnostic nomogram was constructed based on the statistically significant variables of age as well as CT-based AI diagnostic, 7-AAB panel, and CEA test results. In the validation set, the sensitivity, specificity, positive predictive value, and AUC were 82.29%, 90.48%, 97.24%, and 0.899 (95% CI, 0.851-0.936), respectively. The nomogram showed significantly higher sensitivity than the 7-AAB panel test result (82.29% vs. 35.88%, p < 0.001) and CEA (82.29% vs. 18.82%, p < 0.001); it also had a significantly higher specificity than AI diagnosis (90.48% vs. 69.04%, p = 0.022). For lesions with a diameter of ≤ 2 cm, the specificity of the nomogram was higher than that of the AI diagnostic system (90.00% vs. 67.50%, p = 0.022).

CONCLUSIONS

Based on the combination of a 7-AAB panel, an AI diagnostic system, and other clinical features, our Nomogram demonstrated good diagnostic performance in distinguishing lung nodules, especially those with ≤ 2 cm diameters.

KEY POINTS

• A novel diagnostic model of lung nodules was constructed by combining highly specific tumor markers with a high-sensitivity artificial intelligence diagnostic system.
• The diagnostic model has good diagnostic performance in distinguishing malignant and benign pulmonary nodules, especially for nodules smaller than 2 cm.
• The diagnostic model can assist the clinical decision-making of pulmonary nodules, with the advantages of high diagnostic efficiency, noninvasiveness, and simple measurement.

Affiliation(s)

Yu Ding Department of Thoracic Surgery, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, No.106, Zhongshan 2nd Road, Guangzhou, 510080, China The Second School of Clinical Medicine, Southern Medical University, Guangzhou, China
Jingyu Zhang State Key Laboratory of Ultrasound in Medicine and Engineering, College of Biomedical Engineering, Chongqing Medical University, No. 1, Medical College Road, Yuzhong District, Chongqing, 400016, China
Weitao Zhuang Department of Thoracic Surgery, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, No.106, Zhongshan 2nd Road, Guangzhou, 510080, China
Zhen Gao The Second School of Clinical Medicine, Southern Medical University, Guangzhou, China
Kaiming Kuang Dianei Technology, Shanghai, China
Dan Tian Department of Thoracic Surgery, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, No.106, Zhongshan 2nd Road, Guangzhou, 510080, China
Cheng Deng Department of Thoracic Surgery, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, No.106, Zhongshan 2nd Road, Guangzhou, 510080, China
Hansheng Wu The Second School of Clinical Medicine, Southern Medical University, Guangzhou, China Department of Thoracic Surgery, The First Affiliated Hospital of Shantou University Medical College, Shantou, China
Rixin Chen Research Center of Medical Sciences, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
Guojie Lu Department of Thoracic Surgery, Guangzhou Panyu Central Hospital, Guangzhou, China
Gang Chen Department of Thoracic Surgery, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, No.106, Zhongshan 2nd Road, Guangzhou, 510080, China
Paolo Mendogni Thoracic Surgery and Lung Transplant Unit, Foundation IRCCS Ca' Granda Ospedale Maggiore Policlinico, Milan, Italy
Marcello Migliore Thoracic Surgery, Cardio-Thoracic Department, University Hospital of Wales, Cardiff, UK Minimally Invasive Surgery and New Technology, University Hospital of Catania, Department of Surgery and Medical Specialties, University of Catania, Catania, Italy
Min-Woong Kang Department of Thoracic and Cardiovascular Surgery, Chungnam National University School of Medicine, Daejeon, South Korea
Ryu Kanzaki Department of General Thoracic Surgery, Osaka University Graduate School of Medicine, Osaka, Japan
Yong Tang Department of Thoracic Surgery, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, No.106, Zhongshan 2nd Road, Guangzhou, 510080, China
Jiancheng Yang Dianei Technology, Shanghai, China Computer Vision Laboratory (CVLab), Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
Qiuling Shi State Key Laboratory of Ultrasound in Medicine and Engineering, College of Biomedical Engineering, Chongqing Medical University, No. 1, Medical College Road, Yuzhong District, Chongqing, 400016, China.
Guibin Qiao Department of Thoracic Surgery, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, No.106, Zhongshan 2nd Road, Guangzhou, 510080, China. The Second School of Clinical Medicine, Southern Medical University, Guangzhou, China.

Krajnc D, Spielvogel CP, Grahovac M, Ecsedi B, Rasul S, Poetsch N, Traub-Weidinger T, Haug AR, Ritter Z, Alizadeh H, Hacker M, Beyer T, Papp L. Automated data preparation for in vivo tumor characterization with machine learning. Front Oncol 2022;12:1017911. [PMID: 36303841 PMCID: PMC9595446 DOI: 10.3389/fonc.2022.1017911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2022] [Accepted: 09/23/2022] [Indexed: 11/23/2022]

Abstract

Background

This study proposes machine learning-driven data preparation (MLDP) for optimal data preparation (DP) prior to building prediction models for cancer cohorts.

Methods

A collection of well-established DP methods were incorporated for building the DP pipelines for various clinical cohorts prior to machine learning. Evolutionary algorithm principles combined with hyperparameter optimization were employed to iteratively select the best fitting subset of data preparation algorithms for the given dataset. The proposed method was validated for glioma and prostate single center cohorts by 100-fold Monte Carlo (MC) cross-validation scheme with 80-20% training-validation split ratio. In addition, a dual-center diffuse large B-cell lymphoma (DLBCL) cohort was utilized with Center 1 as training and Center 2 as independent validation datasets to predict cohort-specific clinical endpoints. Five machine learning (ML) classifiers were employed for building prediction models across all analyzed cohorts. Predictive performance was estimated by confusion matrix analytics over the validation sets of each cohort. The performance of each model with and without MLDP, as well as with manually-defined DP were compared in each of the four cohorts.
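The 100-fold Monte Carlo cross-validation scheme with an 80-20% split mentioned above amounts to repeated random partitioning of the sample indices. The following is a minimal sketch of that idea, not the study's implementation; the function name, the seed, and the cohort size of 50 are hypothetical.

```python
import random

def monte_carlo_splits(n_samples, n_repeats=100, train_fraction=0.8, seed=0):
    """Yield (train_indices, validation_indices) pairs from repeated random splits.

    Unlike k-fold cross-validation, each repeat is drawn independently, so a
    sample may appear in the validation set of several repeats or of none.
    """
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    n_train = int(round(n_samples * train_fraction))
    indices = list(range(n_samples))
    for _ in range(n_repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield shuffled[:n_train], shuffled[n_train:]

# Hypothetical cohort of 50 cases: 100 independent 40/10 splits.
splits = list(monte_carlo_splits(n_samples=50))
```

A model would be refit on each training subset and scored on the matching validation subset, with performance averaged over the repeats.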

Results

Sixteen of twenty established predictive models demonstrated area under the receiver operator characteristics curve (AUC) performance increase utilizing the MLDP. The MLDP resulted in the highest performance increase for random forest (RF) (+0.16 AUC) and support vector machine (SVM) (+0.13 AUC) model schemes for predicting 36-months survival in the glioma cohort. Single center cohorts resulted in complex (6-7 DP steps) DP pipelines, with a high occurrence of outlier detection, feature selection and synthetic majority oversampling technique (SMOTE). In contrast, the optimal DP pipeline for the dual-center DLBCL cohort only included outlier detection and SMOTE DP steps.

Conclusions

This study demonstrates that data preparation prior to ML prediction model building in cancer cohorts shall be ML-driven itself, yielding optimal prediction models in both single and multi-centric settings.

Affiliation(s)

Denis Krajnc QIMP Team, Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Vienna, Austria
Clemens P. Spielvogel Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria Christian Doppler Laboratory for Applied Metabolomics, Medical University of Vienna, Vienna, Austria
Marko Grahovac Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
Boglarka Ecsedi QIMP Team, Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Vienna, Austria
Sazan Rasul Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
Nina Poetsch Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
Tatjana Traub-Weidinger Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
Alexander R. Haug Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria Christian Doppler Laboratory for Applied Metabolomics, Medical University of Vienna, Vienna, Austria
Zsombor Ritter Department of Medical Imaging, University of Pécs, Medical School, Pécs, Hungary
Hussain Alizadeh 1st Department of Internal Medicine, University of Pécs, Medical School, Pécs, Hungary
Marcus Hacker Department of Biomedical Imaging and Image-guided Therapy, Division of Nuclear Medicine, Medical University of Vienna, Vienna, Austria
Thomas Beyer QIMP Team, Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Vienna, Austria
Laszlo Papp QIMP Team, Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Vienna, Austria Applied Quantum Computing group, Center for Medical Physics and Biomedical Engineering, Medical University of Vienna, Vienna, Austria

Bhatia A, Chug A, Singh AP, Singh D. A hybrid approach for noise reduction-based optimal classifier using genetic algorithm: A case study in plant disease prediction. INTELL DATA ANAL 2022. [DOI: 10.3233/ida-216011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]

Javeed A, Ali L, Mohammed Seid A, Ali A, Khan D, Imrana Y. A Clinical Decision Support System (CDSS) for Unbiased Prediction of Caesarean Section Based on Features Extraction and Optimized Classification. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022;2022:1901735. [PMID: 35707186 PMCID: PMC9192258 DOI: 10.1155/2022/1901735] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/22/2022] [Accepted: 04/16/2022] [Indexed: 12/14/2022]

Huang X, Cao T, Chen L, Li J, Tan Z, Xu B, Xu R, Song Y, Zhou Z, Wang Z, Wei Y, Zhang Y, Li J, Huo Y, Qin X, Wu Y, Wang X, Wang H, Cheng X, Xu X, Liu L. Novel Insights on Establishing Machine Learning-Based Stroke Prediction Models Among Hypertensive Adults. Front Cardiovasc Med 2022;9:901240. [PMID: 35600480 PMCID: PMC9120532 DOI: 10.3389/fcvm.2022.901240] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/21/2022] [Accepted: 04/05/2022] [Indexed: 11/13/2022]

Abstract

Background

Stroke is a major global health burden, and risk prediction is essential for the primary prevention of stroke. However, uncertainty remains about the optimal prediction model for analyzing stroke risk. In this study, we aim to determine the most effective stroke prediction method in a Chinese hypertensive population using machine learning and establish a general methodological pipeline for future analysis.

Methods

The training set included 70% of the data (n = 14,491) from the China Stroke Primary Prevention Trial (CSPPT). Internal validation was performed with the remaining 30% of the CSPPT data (n = 6,211), and external validation was conducted using a nested case–control (NCC) dataset (n = 2,568). The primary outcome was the first stroke. Four accepted analysis methods were applied and compared: logistic regression (LR), stepwise logistic regression (SLR), extreme gradient boosting (XGBoost), and random forest (RF). Population characteristic data with and without laboratory variables were analyzed separately. Accuracy, sensitivity, specificity, kappa, and area under the receiver operating characteristic curve (AUC) were used to assess the models, with AUC as the primary criterion. Data balancing techniques, including random under-sampling (RUS) and synthetic minority over-sampling technique (SMOTE), were applied to process this unbalanced training set.

Results

The best model performance was observed in the RUS-applied RF model with laboratory variables. Compared with null models (sensitivity = 0, specificity = 100, and mean AUCs = 0.643), data balancing techniques improved overall performance, with RUS demonstrating a more satisfactory effect in the current study (RUS: sensitivity = 63.9; specificity = 53.7; and mean AUCs = 0.624). Adding laboratory variables improved the performance of the analysis methods. All results were reconfirmed in validation sets. The top 10 important variables were determined by the analysis method with the best performance.

Conclusion

Among the tested methods, the most effective stroke prediction model in the targeted population is RUS-applied RF. Based on the insights from the current study, we provide general frameworks for building machine learning-based prediction models.

Affiliation(s)

Xiao Huang Department of Cardiology, The Second Affiliated Hospital of Nanchang University, Nanchang, China *Correspondence: Xiao Huang
Tianyu Cao Biological Anthropology, University of California, Santa Barbara, Santa Barbara, CA, United States
Liangziqian Chen Department of Data Management, Shenzhen Evergreen Medical Institute, Shenzhen, China
Junpei Li Department of Cardiology, The Second Affiliated Hospital of Nanchang University, Nanchang, China
Ziheng Tan Department of Cardiology, The Second Affiliated Hospital of Nanchang University, Nanchang, China
Benjamin Xu Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, United States
Richard Xu Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
Yun Song Department of Data Management, Shenzhen Evergreen Medical Institute, Shenzhen, China Institute of Biomedicine, Anhui Medical University, Hefei, China
Ziyi Zhou Department of Biomedical Engineering, Graduate School at Shenzhen, Tsinghua University, Shenzhen, China
Zhuo Wang Key Laboratory of Precision Nutrition and Food Quality, Ministry of Education, Department of Nutrition and Health, College of Food Sciences and Nutritional Engineering, China Agricultural University, Beijing, China
Yaping Wei Key Laboratory of Precision Nutrition and Food Quality, Ministry of Education, Department of Nutrition and Health, College of Food Sciences and Nutritional Engineering, China Agricultural University, Beijing, China
Yan Zhang Department of Cardiology, Peking University First Hospital, Beijing, China
Jianping Li Department of Cardiology, Peking University First Hospital, Beijing, China
Yong Huo Department of Cardiology, Peking University First Hospital, Beijing, China
Xianhui Qin National Clinical Research Study Center for Kidney Disease, The State Key Laboratory for Organ Failure Research, Renal Division, Nanfang Hospital, Southern Medical University, Guangzhou, China
Yanqing Wu Department of Cardiology, The Second Affiliated Hospital of Nanchang University, Nanchang, China
Xiaobin Wang Department of Population, Family and Reproductive Health, Johns Hopkins University Bloomberg School of Public Health, Baltimore, MD, United States
Hong Wang Department of Cardiovascular Science, Temple University Lewis Katz School of Medicine, Philadelphia, PA, United States
Xiaoshu Cheng Department of Cardiology, The Second Affiliated Hospital of Nanchang University, Nanchang, China
Xiping Xu Key Laboratory of Precision Nutrition and Food Quality, Ministry of Education, Department of Nutrition and Health, College of Food Sciences and Nutritional Engineering, China Agricultural University, Beijing, China
Lishun Liu Department of Data Management, Shenzhen Evergreen Medical Institute, Shenzhen, China Department of Biomedical Engineering, Graduate School at Shenzhen, Tsinghua University, Shenzhen, China Lishun Liu

Wang J, Wang S, Zhu MX, Yang T, Yin Q, Hou Y. Risk Prediction of Major Adverse Cardiovascular Events Occurrence Within 6 Months After Coronary Revascularization: Machine Learning Study. JMIR Med Inform 2022;10:e33395. [PMID: 35442202 PMCID: PMC9069286 DOI: 10.2196/33395] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/06/2021] [Revised: 02/25/2022] [Accepted: 03/02/2022] [Indexed: 12/13/2022]

Abstract

Background

As a major health hazard, the incidence of coronary heart disease has been increasing year by year. Although coronary revascularization, mainly percutaneous coronary intervention, has played an important role in the treatment of coronary heart disease, major adverse cardiovascular events (MACE) such as recurrent or persistent angina pectoris after coronary revascularization remain a very difficult problem in clinical practice.

Objective

Given the high probability of MACE after coronary revascularization, the aim of this study was to develop and validate a predictive model for MACE occurrence within 6 months based on machine learning algorithms.

Methods

A retrospective study was performed including 1004 patients who had undergone coronary revascularization at The People’s Hospital of Liaoning Province and Affiliated Hospital of Liaoning University of Traditional Chinese Medicine from June 2019 to December 2020. According to the characteristics of available data, an oversampling strategy was adopted for initial preprocessing. We then employed six machine learning algorithms, including decision tree, random forest, logistic regression, naïve Bayes, support vector machine, and extreme gradient boosting (XGBoost), to develop prediction models for MACE depending on clinical information and 6-month follow-up information. Among all samples, 70% were randomly selected for training and the remaining 30% were used for model validation. Model performance was assessed based on accuracy, precision, recall, F1-score, confusion matrix, area under the receiver operating characteristic (ROC) curve (AUC), and visualization of the ROC curve.

Results

Univariate analysis showed that 21 patient characteristic variables were statistically significant (P<.05) between the groups without and with MACE. Coupled with these significant factors, among the six machine learning algorithms, XGBoost stood out with an accuracy of 0.7788, precision of 0.8058, recall of 0.7345, F1-score of 0.7685, and AUC of 0.8599. Further exploration of the models to identify factors affecting the occurrence of MACE revealed that use of anticoagulant drugs and course of the disease consistently ranked in the top two predictive factors in three developed models.

Conclusions

The machine learning risk models constructed in this study can achieve acceptable performance of MACE prediction, with XGBoost performing the best, providing a valuable reference for pointed intervention and clinical decision-making in MACE prevention.

Qian X, Zhou Z, Hu J, Zhu J, Huang H, Dai Y. A comparative study of kernel-based vector machines with probabilistic outputs for medical diagnosis. Biocybern Biomed Eng 2021. [DOI: 10.1016/j.bbe.2021.09.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]

Xiong Y, Ye M, Wu C. Cancer Classification with a Cost-Sensitive Naive Bayes Stacking Ensemble. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2021;2021:5556992. [PMID: 33986823 PMCID: PMC8093037 DOI: 10.1155/2021/5556992] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 01/06/2021] [Revised: 03/17/2021] [Accepted: 04/15/2021] [Indexed: 02/07/2023]

Shaw SS, Ahmed S, Malakar S, Garcia-Hernandez L, Abraham A, Sarkar R. Hybridization of ring theory-based evolutionary algorithm and particle swarm optimization to solve class imbalance problem. COMPLEX INTELL SYST 2021. [DOI: 10.1007/s40747-021-00314-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 01/01/2023]

Abstract

Many real-life datasets are imbalanced in nature, which implies that the number of samples present in one class (minority class) is exceptionally small compared to the number of samples found in the other class (majority class). Hence, if we directly fit these datasets to a standard classifier for training, it often overlooks the minority class samples while estimating class-separating hyperplane(s) and, as a result, misclassifies the minority class samples. To solve this problem, over the years, many researchers have followed different approaches. However, the selection of truly representative samples from the majority class is still considered an open research problem. A better solution for this problem would be helpful in many applications like fraud detection, disease prediction and text classification. Also, recent studies show that the problem requires not only analyzing the disproportion between classes but also other difficulties rooted in the nature of different data, and thereby it needs a more flexible, self-adaptable, computationally efficient and real-time method for selecting majority class samples without losing much important data. Keeping this in mind, we have proposed a hybrid model constituting Particle Swarm Optimization (PSO), a popular swarm intelligence-based meta-heuristic algorithm, and Ring Theory (RT)-based Evolutionary Algorithm (RTEA), a recently proposed physics-based meta-heuristic algorithm. We have named the algorithm RT-based PSO, or RTPSO for short. RTPSO can select the most representative samples from the majority class as it takes advantage of the efficient exploration and exploitation phases of its parent algorithms for strengthening the search process. We have used an AdaBoost classifier to observe the final classification results of our model. The effectiveness of our proposed method has been evaluated on 15 standard real-life datasets having low to extreme imbalance ratios. The performance of RTPSO has been compared with PSO, RTEA and other standard undersampling methods. The obtained results demonstrate the superiority of RTPSO over the state-of-the-art class imbalance problem-solvers considered here for comparison. The source code of this work is available at https://github.com/Sayansurya/RTPSO_Class_imbalance.

Convolution-GRU Based on Independent Component Analysis for fMRI Analysis with Small and Imbalanced Samples. APPLIED SCIENCES-BASEL 2020. [DOI: 10.3390/app10217465] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/27/2022]

Okawara H, Shinomiya K, Fujita M, Koda T, Nishioka A, Nonomura Y. Nonlinear friction dynamics in the cognitive process of food textures: Thickness of polysaccharide solution. J Texture Stud 2020;51:779-788. [DOI: 10.1111/jtxs.12538] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2020] [Revised: 05/16/2020] [Accepted: 05/17/2020] [Indexed: 11/28/2022]

Yang K, Liu J, Tang W, Zhang H, Zhang R, Gu J, Zhu R, Xiong J, Ru X, Wu J. Identification of benign and malignant pulmonary nodules on chest CT using improved 3D U-Net deep learning framework. Eur J Radiol 2020;129:109013. [PMID: 32505895 DOI: 10.1016/j.ejrad.2020.109013] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2019] [Revised: 02/11/2020] [Accepted: 04/07/2020] [Indexed: 12/23/2022]

Stroke Prediction with Machine Learning Methods among Older Chinese. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2020;17:ijerph17061828. [PMID: 32178250 PMCID: PMC7142983 DOI: 10.3390/ijerph17061828] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/18/2020] [Revised: 03/10/2020] [Accepted: 03/10/2020] [Indexed: 12/21/2022]

Abstract

Timely stroke diagnosis and intervention are necessary considering its high prevalence. Previous studies have mainly focused on stroke prediction with balanced data. Thus, this study aimed to develop machine learning models for predicting stroke with imbalanced data in an elderly population in China. Data were obtained from a prospective cohort that included 1131 participants (56 stroke patients and 1075 non-stroke participants) in 2012 and 2014, respectively. Data balancing techniques including random over-sampling (ROS), random under-sampling (RUS), and synthetic minority over-sampling technique (SMOTE) were used to process the imbalanced data in this study. Machine learning methods such as regularized logistic regression (RLR), support vector machine (SVM), and random forest (RF) were used to predict stroke with demographic, lifestyle, and clinical variables. Accuracy, sensitivity, specificity, and areas under the receiver operating characteristic curves (AUCs) were used for performance comparison. The top five variables for stroke prediction were selected for each machine learning method based on the SMOTE-balanced data set. The total prevalence of stroke was high in 2014 (4.95%), with men experiencing much higher prevalence than women (6.76% vs. 3.25%). The three machine learning methods performed poorly in the imbalanced data set with extremely low sensitivity (approximately 0.00) and AUC (approximately 0.50). After using data balancing techniques, the sensitivity and AUC considerably improved with moderate accuracy and specificity, and the maximum values for sensitivity and AUC reached 0.78 (95% CI, 0.73–0.83) for RF and 0.72 (95% CI, 0.71–0.73) for RLR. Using AUCs for RLR, SVM, and RF in the imbalanced data set as references, a significant improvement was observed in the AUCs of all three machine learning methods (p < 0.05) in the balanced data sets. 
Considering RLR in each data set as a reference, only RF in the imbalanced data set and SVM in the ROS-balanced data set were superior to RLR in terms of AUC. Sex, hypertension, and uric acid were common predictors in all three machine learning methods. Blood glucose level was included in both RLR and RF. Drinking, age and high-sensitivity C-reactive protein level, and low-density lipoprotein cholesterol level were also included in RLR, SVM, and RF, respectively. Our study suggests that machine learning methods with data balancing techniques are effective tools for stroke prediction with imbalanced data.
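The counts reported in this abstract (56 stroke patients, 1075 non-stroke participants) make it easy to see why accuracy alone is misleading on imbalanced data. The sketch below is illustrative only: the "never predicts stroke" model is an assumed degenerate classifier, not one of the study's RLR, SVM or RF models, and the helper names are hypothetical. It also shows random over-sampling (ROS) in its simplest form.

```python
import random

N_STROKE, N_NON_STROKE = 56, 1075  # counts reported in the abstract

y_true = [1] * N_STROKE + [0] * N_NON_STROKE
# Assumption: a degenerate model that labels every participant as non-stroke.
y_pred = [0] * len(y_true)


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on class 1) and specificity (recall on class 0)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def random_oversample_indices(labels, seed=0):
    """Indices that balance a binary label list by resampling class 1
    (assumed to be the minority) with replacement."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra


metrics = binary_metrics(y_true, y_pred)
prevalence = N_STROKE / len(y_true)
balanced = [y_true[i] for i in random_oversample_indices(y_true)]

print(f"prevalence: {prevalence:.2%}")
print(metrics)
print("balanced counts:", balanced.count(1), balanced.count(0))
```

The degenerate model scores about 95% accuracy while detecting no stroke cases, which mirrors the near-zero sensitivity the abstract reports for the unbalanced data set. ROS only duplicates existing minority samples; SMOTE, also used in the study, instead synthesizes new minority points and is not shown here.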
