Reference Citation Analysis

For: Kovács G. An empirical comparison and evaluation of minority oversampling techniques on a large number of imbalanced datasets. Appl Soft Comput 2019;83:105662. [DOI: 10.1016/j.asoc.2019.105662] [Citation(s) in RCA: 79] [Impact Index Per Article: 15.8] [Indexed: 11/22/2022]

Cited by Other Article(s)

Prinzi F, Orlando A, Gaglio S, Vitabile S. Interpretable Radiomic Signature for Breast Microcalcification Detection and Classification. Journal of Imaging Informatics in Medicine 2024;37:1038-1053. [PMID: 38351223 PMCID: PMC11169144 DOI: 10.1007/s10278-024-01012-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Revised: 11/20/2023] [Accepted: 12/05/2023] [Indexed: 06/13/2024]

Demircioğlu A. Applying oversampling before cross-validation will lead to high bias in radiomics. Sci Rep 2024;14:11563. [PMID: 38773233 PMCID: PMC11109211 DOI: 10.1038/s41598-024-62585-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Accepted: 05/20/2024] [Indexed: 05/23/2024]

Abstract

Class imbalance is often unavoidable for radiomic data collected from clinical routine. It can create problems during classifier training because the majority class can dominate the minority class. Consequently, resampling methods such as oversampling or undersampling are applied to class-balance the data. However, resampling must not be applied upfront to all data, because this would lead to data leakage and, therefore, to erroneous results. This study aims to measure the extent of this bias. Five-fold cross-validation with 30 repeats was performed on a set of 15 radiomic datasets to train predictive models. The training involved two scenarios: first, the models were trained correctly by applying the resampling methods during the cross-validation. Second, the models were trained incorrectly by performing the resampling on all the data before cross-validation. The bias was defined empirically as the difference between the best-performing models in both scenarios in terms of area under the receiver operating characteristic curve (AUC), sensitivity, specificity, balanced accuracy, and the Brier score. In addition, a simulation study was performed on a randomly generated dataset for verification. The results demonstrated that incorrectly applying the oversampling methods to all data resulted in a large positive bias (up to 0.34 in AUC, 0.33 in sensitivity, 0.31 in specificity, and 0.37 in balanced accuracy). The bias depended on the data balance, and an increase of approximately 0.10 in AUC was observed for each increase in imbalance. The models also showed a bias in calibration measured using the Brier score, which differed by up to -0.18 between the correctly and incorrectly trained models. The undersampling methods were not significantly affected by bias. These results emphasize that any resampling method should be applied only to the training data to avoid data leakage and, consequently, biased model performance and calibration.
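The leakage mechanism described in this abstract can be reproduced in a few lines. The following is a minimal Python sketch, not the study's pipeline. It assumes random duplication as the oversampler, a 1-nearest-neighbour classifier, and a hypothetical random dataset of 200 samples with a 10% minority class whose labels are independent of the features. Because the labels are random, no honest model can genuinely detect the minority class, so any high sensitivity is bias. All function names are illustrative.

```python
import random


def make_random_data(n=200, minority_frac=0.1, dim=5, seed=0):
    """Random features with random labels: no model can truly beat chance here."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    n_min = int(n * minority_frac)
    y = [1] * n_min + [0] * (n - n_min)
    rng.shuffle(y)
    return X, y


def random_oversample(X, y, rng):
    """Duplicate randomly chosen minority (label 1) rows until the classes are equal."""
    minority = [i for i, label in enumerate(y) if label == 1]
    n_extra = (len(y) - len(minority)) - len(minority)
    extra = [rng.choice(minority) for _ in range(n_extra)]
    return X + [X[i] for i in extra], y + [1] * n_extra


def predict_1nn(X_train, y_train, x):
    """Label of the nearest training row (squared Euclidean distance)."""
    best = min(range(len(X_train)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], x)))
    return y_train[best]


def cv_sensitivity(X, y, k=5, oversample_first=False, seed=0):
    """Pooled sensitivity (recall of label 1) over k-fold cross-validation."""
    rng = random.Random(seed)
    if oversample_first:
        # WRONG: copies are made before splitting, so duplicates of one minority
        # sample can end up in both the training folds and the test fold.
        X, y = random_oversample(X, y, rng)
    idx = list(range(len(y)))
    rng.shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    tp = fn = 0
    for test_idx in folds:
        held_out = set(test_idx)
        X_tr = [X[i] for i in idx if i not in held_out]
        y_tr = [y[i] for i in idx if i not in held_out]
        if not oversample_first:
            # RIGHT: resample the training folds only; the test fold is untouched.
            X_tr, y_tr = random_oversample(X_tr, y_tr, rng)
        for i in test_idx:
            if y[i] == 1:
                if predict_1nn(X_tr, y_tr, X[i]) == 1:
                    tp += 1
                else:
                    fn += 1
    return tp / (tp + fn)


X, y = make_random_data()
correct = cv_sensitivity(X, y, oversample_first=False)
leaky = cv_sensitivity(X, y, oversample_first=True)
print(f"resampling inside each training fold: sensitivity = {correct:.2f}")
print(f"resampling before the split:          sensitivity = {leaky:.2f}")
```

With the correct protocol, the 1-NN model should rarely recover minority cases, as expected for random labels. The leaky protocol typically finds an exact copy of each test sample in the training folds and reports near-perfect sensitivity. In practice, the imbalanced-learn package provides `imblearn.pipeline.Pipeline`, which applies a sampler only when fitting and so keeps resampling inside each training fold.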

Suttie M, Kable J, Mahnke AH, Bandoli G. Machine learning approaches to the identification of children affected by prenatal alcohol exposure: A narrative review. Alcohol, Clinical & Experimental Research 2024;48:585-595. [PMID: 38302824 PMCID: PMC11015982 DOI: 10.1111/acer.15271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Revised: 12/05/2023] [Accepted: 01/14/2024] [Indexed: 02/03/2024]

Louro PL, Redinho H, Malheiro R, Paiva RP, Panda R. A Comparison Study of Deep Learning Methodologies for Music Emotion Recognition. Sensors (Basel, Switzerland) 2024;24:2201. [PMID: 38610412 PMCID: PMC11014202 DOI: 10.3390/s24072201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2024] [Revised: 03/20/2024] [Accepted: 03/26/2024] [Indexed: 04/14/2024]

Arizmendi CJ, Bernacki ML, Raković M, Plumley RD, Urban CJ, Panter AT, Greene JA, Gates KM. Predicting student outcomes using digital logs of learning behaviors: Review, current standards, and suggestions for future work. Behav Res Methods 2023;55:3026-3054. [PMID: 36018483 PMCID: PMC10556130 DOI: 10.3758/s13428-022-01939-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/20/2022] [Indexed: 11/08/2022]

Nouri Z, Choi SW, Choi IJ, Ryu KW, Woo SM, Park SJ, Lee WJ, Choi W, Jung YS, Myung SK, Lee JH, Park JY, Praveen Z, Woo YJ, Park JH, Kim MK. Exploring Connections between Oral Microbiota, Short-Chain Fatty Acids, and Specific Cancer Types: A Study of Oral Cancer, Head and Neck Cancer, Pancreatic Cancer, and Gastric Cancer. Cancers (Basel) 2023;15:cancers15112898. [PMID: 37296861 DOI: 10.3390/cancers15112898] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 03/28/2023] [Revised: 04/25/2023] [Accepted: 05/22/2023] [Indexed: 06/12/2023]

Abstract

The association between oral microbiota and cancer development has been a topic of intense research in recent years, with compelling evidence suggesting that the oral microbiome may play a significant role in cancer initiation and progression. However, the causal connections between the two remain a subject of debate, and the underlying mechanisms are not fully understood. In this case-control study, we aimed to identify common oral microbiota associated with several cancer types and to investigate the potential mechanisms that may trigger immune responses and initiate cancer upon cytokine secretion. Saliva and blood samples were collected from 309 adult cancer patients and 745 healthy controls to analyze the oral microbiome and the mechanisms involved in cancer initiation. Machine learning techniques revealed that six bacterial genera were associated with cancer. The abundance of Leuconostoc, Streptococcus, Abiotrophia, and Prevotella was reduced in the cancer group, while the abundance of Haemophilus and Neisseria was increased. G protein-coupled receptor kinase, H+-transporting ATPase, and futalosine hydrolase were found to be significantly enriched in the cancer group. Total short-chain fatty acid (SCFA) concentrations and free fatty acid receptor 2 (FFAR2) expression levels were higher in the control group than in the cancer group, while serum tumor necrosis factor alpha induced protein 8 (TNFAIP8), interleukin-6 (IL6), and signal transducer and activator of transcription 3 (STAT3) levels were higher in the cancer group than in the control group. These results suggest that alterations in the composition of oral microbiota can contribute to a reduction in SCFAs and FFAR2 expression that may initiate an inflammatory response through the upregulation of TNFAIP8 and the IL-6/STAT3 pathway, which could ultimately increase the risk of cancer onset.

Affiliation(s)

Zahra Nouri Cancer Epidemiology Branch, Division of Cancer Epidemiology and Prevention, National Cancer Center, 323 Ilsandong-gu, Goyang-si 10408, Gyeonggi-do, Republic of Korea
Sung Weon Choi Oral Oncology Clinic, Research Institute and Hospital, National Cancer Center, 323 Ilsandong-gu, Goyang-si 10408, Gyeonggi-do, Republic of Korea
Il Ju Choi Center for Gastric Cancer, National Cancer Center, 323 Ilsandong-gu, Goyang-si 10408, Gyeonggi-do, Republic of Korea
Keun Won Ryu Center for Gastric Cancer, National Cancer Center, 323 Ilsandong-gu, Goyang-si 10408, Gyeonggi-do, Republic of Korea
Sang Myung Woo Center for Liver and Pancreatobiliary Cancer, National Cancer Center, 323 Ilsandong-gu, Goyang-si 10408, Gyeonggi-do, Republic of Korea
Sang-Jae Park Center for Liver and Pancreatobiliary Cancer, National Cancer Center, 323 Ilsandong-gu, Goyang-si 10408, Gyeonggi-do, Republic of Korea
Woo Jin Lee Center for Liver and Pancreatobiliary Cancer, National Cancer Center, 323 Ilsandong-gu, Goyang-si 10408, Gyeonggi-do, Republic of Korea
Wonyoung Choi Center for Rare Cancers, National Cancer Center, 323 Ilsandong-gu, Goyang-si 10408, Gyeonggi-do, Republic of Korea
Yuh-Seog Jung Department of Otorhinolaryngology, National Cancer Center, 323 Ilsandong-gu, Goyang-si 10408, Gyeonggi-do, Republic of Korea
Seung-Kwon Myung Department of Cancer AI & Digital Health, National Cancer Center Graduate School of Cancer Science and Policy, 323 Ilsandong-gu, Goyang-si 10408, Gyeonggi-do, Republic of Korea
Jong-Ho Lee Oral Oncology Clinic, Research Institute and Hospital, National Cancer Center, 323 Ilsandong-gu, Goyang-si 10408, Gyeonggi-do, Republic of Korea
Joo-Yong Park Oral Oncology Clinic, Research Institute and Hospital, National Cancer Center, 323 Ilsandong-gu, Goyang-si 10408, Gyeonggi-do, Republic of Korea
Zeba Praveen Cancer Epidemiology Branch, Division of Cancer Epidemiology and Prevention, National Cancer Center, 323 Ilsandong-gu, Goyang-si 10408, Gyeonggi-do, Republic of Korea
Yun Jung Woo Cancer Epidemiology Branch, Division of Cancer Epidemiology and Prevention, National Cancer Center, 323 Ilsandong-gu, Goyang-si 10408, Gyeonggi-do, Republic of Korea
Jin Hee Park Cancer Epidemiology Branch, Division of Cancer Epidemiology and Prevention, National Cancer Center, 323 Ilsandong-gu, Goyang-si 10408, Gyeonggi-do, Republic of Korea
Mi Kyung Kim Cancer Epidemiology Branch, Division of Cancer Epidemiology and Prevention, National Cancer Center, 323 Ilsandong-gu, Goyang-si 10408, Gyeonggi-do, Republic of Korea

Hassanzadeh R, Farhadian M, Rafieemehr H. Hospital mortality prediction in traumatic injuries patients: comparing different SMOTE-based machine learning algorithms. BMC Med Res Methodol 2023;23:101. [PMID: 37087425 PMCID: PMC10122327 DOI: 10.1186/s12874-023-01920-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2022] [Accepted: 04/13/2023] [Indexed: 04/24/2023]

Abstract

BACKGROUND

Trauma is one of the most critical public health issues worldwide, causing death and disability across all age groups. There is therefore great interest in models that predict mortality in trauma patients admitted to the ICU. The main objective of the present study is to develop and evaluate SMOTE-based machine learning tools for predicting hospital mortality in trauma patients with imbalanced data.

METHODS

This retrospective cohort study was conducted on 126 trauma patients admitted to an intensive care unit at Besat hospital in Hamadan Province, western Iran, from March 2020 to March 2021. Data were extracted from the patients' medical records. Because the data were imbalanced, SMOTE techniques, namely SMOTE, Borderline-SMOTE1, Borderline-SMOTE2, SMOTE-NC, and SVM-SMOTE, were applied during preprocessing. Then, Decision Tree (DT), Random Forest (RF), Naive Bayes (NB), Artificial Neural Network (ANN), Support Vector Machine (SVM), and Extreme Gradient Boosting (XGBoost) methods were used to predict hospital mortality in patients with traumatic injuries. Performance was evaluated by sensitivity, specificity, Positive Predictive Value (PPV), Negative Predictive Value (NPV), accuracy, Area Under the Curve (AUC), Geometric Mean (G-means), F1 score, and the P-value of McNemar's test.
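All five SMOTE variants named above build on the same core step. A synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. The plain-Python sketch below shows only that basic SMOTE step. It omits the Borderline, SMOTE-NC and SVM-SMOTE refinements, which change how seed samples are chosen or how categorical features are handled. The function name `smote` and the toy points are illustrative assumptions.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples by basic SMOTE interpolation.

    minority: list of feature vectors (lists of floats) from the minority class only;
    at least two samples are required.
    """
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    # Precompute the k nearest minority neighbours of every minority sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: sq_dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))   # random seed sample
        j = rng.choice(neighbours[i])      # one of its k nearest minority neighbours
        gap = rng.random()                 # uniform in [0, 1)
        # The new point lies on the segment between sample i and neighbour j.
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic


# Hypothetical minority-class points in two dimensions.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [5.0, 5.0]]
new_points = smote(minority, n_new=3, k=2)
print(new_points)
```

For real work, imbalanced-learn provides `SMOTE`, `BorderlineSMOTE`, `SMOTENC` and `SVMSMOTE` in `imblearn.over_sampling`. Like any resampler, they should be fitted on training data only.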

RESULTS

Of the 126 patients admitted to the ICU, 117 (92.9%) survived and 9 (7.1%) died. The mean follow-up time from the date of trauma to the date of outcome was 3.98 ± 4.65 days. The ML algorithms performed poorly on the imbalanced data, whereas the SMOTE-based ML algorithms performed significantly better. The mean area under the ROC curve (AUC) of all SMOTE-based models exceeded 91%. Before the dataset was balanced, F1-score and G-means were below 70% for all ML models except ANN. In contrast, F1-score and G-means on the balanced datasets exceeded 90% for all SMOTE-based models. Among all SMOTE-based ML methods, RF and ANN based on SMOTE and XGBoost based on SMOTE-NC achieved the highest values for all evaluation criteria.
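The threshold-based criteria used in this study follow directly from a 2x2 confusion matrix, and they show why accuracy alone is misleading at this level of imbalance. The Python sketch below is illustrative. The helper names are hypothetical, the zero-division convention (return 0) is an assumption, and the McNemar statistic uses the common continuity-corrected form, which may differ from the variant the authors used. As a usage example, a trivial classifier that predicts survival for all 126 patients scores an accuracy of about 0.929, while its sensitivity, F1-score and G-mean are all 0.

```python
import math


def _div(num, den):
    """Safe division: returns 0.0 when the denominator is zero (assumed convention)."""
    return num / den if den else 0.0


def binary_metrics(tp, fn, tn, fp):
    """Threshold metrics from a 2x2 confusion matrix; the positive class is the event (death)."""
    sensitivity = _div(tp, tp + fn)          # recall, true positive rate
    specificity = _div(tn, tn + fp)          # true negative rate
    ppv = _div(tp, tp + fp)                  # precision
    npv = _div(tn, tn + fn)
    accuracy = _div(tp + tn, tp + fn + tn + fp)
    f1 = _div(2 * ppv * sensitivity, ppv + sensitivity)
    g_mean = math.sqrt(sensitivity * specificity)
    return {"sensitivity": sensitivity, "specificity": specificity, "ppv": ppv,
            "npv": npv, "accuracy": accuracy, "f1": f1, "g_mean": g_mean}


def mcnemar(b, c):
    """Continuity-corrected McNemar test on the discordant pairs of two classifiers.

    b: cases classifier A got right and classifier B got wrong; c: the reverse.
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    stat = _div(max(abs(b - c) - 1, 0) ** 2, b + c)
    p_value = math.erfc(math.sqrt(stat / 2))  # survival function of chi-square(1)
    return stat, p_value


# Hypothetical "everyone survives" classifier on the 126 patients (9 deaths).
m = binary_metrics(tp=0, fn=9, tn=117, fp=0)
print({name: round(value, 3) for name, value in m.items()})
print(mcnemar(10, 2))  # hypothetical discordant counts
```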

CONCLUSIONS

This study has shown that SMOTE-based ML algorithms predict outcomes in traumatic injuries better than ML algorithms trained without resampling. They have the potential to assist ICU physicians in making clinical decisions.

Dials J, Demirel D, Sanchez-Arias R, Halic T, Kruger U, De S, Gromski MA. Skill-level classification and performance evaluation for endoscopic sleeve gastroplasty. Surg Endosc 2023:10.1007/s00464-023-09955-2. [PMID: 36897405 PMCID: PMC10000349 DOI: 10.1007/s00464-023-09955-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/30/2022] [Accepted: 02/12/2023] [Indexed: 03/11/2023]

Abstract

BACKGROUND

We previously developed grading metrics for quantitatively measuring performance in simulated endoscopic sleeve gastroplasty (ESG), creating a scalar reference for classifying subjects as experts or novices. In this work, we used synthetic data generation and expanded our skill-level analysis using machine learning techniques.

METHODS

We used the synthetic data generation algorithm SMOTE to expand and balance our dataset of seven actual simulated ESG procedures. We performed an optimization to find the optimal metrics for classifying experts and novices by identifying the most critical and distinctive sub-tasks. We used support vector machine (SVM), AdaBoost, K-nearest neighbors (KNN), kernel Fisher discriminant analysis (KFDA), random forest, and decision tree classifiers to classify surgeons as experts or novices after grading. Furthermore, we used an optimization model to create weights for each task and to separate the clusters by maximizing the distance between the expert and novice scores.

RESULTS

We split our dataset into a training set of 15 samples and a testing set of five samples. We applied six classifiers (SVM, KFDA, AdaBoost, KNN, random forest, and decision tree) to this dataset, obtaining training accuracies of 0.94, 0.94, 1.00, 1.00, 1.00, and 1.00, respectively, and a testing accuracy of 1.00 for SVM and AdaBoost. Our optimization model increased the distance between the expert and novice groups from 2 to 53.72.

CONCLUSION

This paper shows that feature reduction can be combined with classification algorithms such as SVM and KNN to classify endoscopists as experts or novices based on the results recorded with our grading metrics. Furthermore, this work introduces a nonlinear constrained optimization that separates the two clusters and identifies the most important tasks using weights.

Li D, Zheng C, Zhao J, Liu Y. Diagnosis of heart failure from imbalance datasets using multi-level classification. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]

Szeghalmy S, Fazekas A. A Comparative Study of the Use of Stratified Cross-Validation and Distribution-Balanced Stratified Cross-Validation in Imbalanced Learning. Sensors (Basel, Switzerland) 2023;23:2333. [PMID: 36850931 PMCID: PMC9967638 DOI: 10.3390/s23042333] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 12/22/2022] [Revised: 02/06/2023] [Accepted: 02/15/2023] [Indexed: 06/18/2023]

A Comparison of Undersampling, Oversampling, and SMOTE Methods for Dealing with Imbalanced Classification in Educational Data Mining. Information 2023. [DOI: 10.3390/info14010054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2023]

Instance hardness and multivariate Gaussian distribution-based oversampling technique for imbalance classification. Pattern Anal Appl 2023. [DOI: 10.1007/s10044-022-01129-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2023]

Sowjanya AM, Mrudula O. Effective treatment of imbalanced datasets in health care using modified SMOTE coupled with stacked deep learning algorithms. Applied Nanoscience 2023;13:1829-1840. [PMID: 35132368 PMCID: PMC8811587 DOI: 10.1007/s13204-021-02063-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 07/21/2021] [Accepted: 08/28/2021] [Indexed: 12/03/2022]

Ghaderi Zefrehi H, Altınçay H. MaMiPot: a paradigm shift for the classification of imbalanced data. J Intell Inf Syst 2022. [DOI: 10.1007/s10844-022-00763-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]

An imbalanced binary classification method via space mapping using normalizing flows with class discrepancy constraints. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.12.029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]

Class-imbalanced positive instances augmentation via three-line hybrid. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]

Hashimoto-Roth E, Surendra A, Lavallée-Adam M, Bennett SAL, Čuperlović-Culf M. METAbolomics data Balancing with Over-sampling Algorithms (META-BOA): an online resource for addressing class imbalance. Bioinformatics 2022;38:5326-5327. [PMID: 36222566 DOI: 10.1093/bioinformatics/btac649] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2022] [Revised: 09/04/2022] [Indexed: 12/24/2022]

Perturbation-based oversampling technique for imbalanced classification problems. Int J Mach Learn Cyb 2022. [DOI: 10.1007/s13042-022-01662-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]

Distance-based arranging oversampling technique for imbalanced data. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07828-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]

Kumar A. A new fitness function in genetic programming for classification of imbalanced data. J Exp Theor Artif In 2022. [DOI: 10.1080/0952813x.2022.2120087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]

Li M, Zhou H, Liu Q, Wang G. SW: A weighted space division framework for imbalanced problems with label noise. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]

Wang G, Yin Z, Zhao M, Tian Y, Sun Z. Identification of human mental workload levels in a language comprehension task with imbalance neurophysiological data. Computer Methods and Programs in Biomedicine 2022;224:107011. [PMID: 35863122 DOI: 10.1016/j.cmpb.2022.107011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2021] [Revised: 05/23/2022] [Accepted: 07/06/2022] [Indexed: 06/15/2023]

Kumar V, Lalotra GS, Kumar RK. Improving performance of classifiers for diagnosis of critical diseases to prevent COVID risk. Computers & Electrical Engineering: An International Journal 2022;102:108236. [PMID: 35915590 PMCID: PMC9329734 DOI: 10.1016/j.compeleceng.2022.108236] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/28/2021] [Revised: 07/03/2022] [Accepted: 07/13/2022] [Indexed: 06/15/2023]

Zhang A, Yu H, Zhou S, Huan Z, Yang X. Instance weighted SMOTE by indirectly exploring the data distribution. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108919] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/26/2022]

Teji JS, Jain S, Gupta SK, Suri JS. NeoAI 1.0: Machine learning-based paradigm for prediction of neonatal and infant risk of death. Comput Biol Med 2022;147:105639. [DOI: 10.1016/j.compbiomed.2022.105639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2021] [Revised: 05/01/2022] [Accepted: 05/01/2022] [Indexed: 11/29/2022]

Kumar V, Lalotra GS, Sasikala P, Rajput DS, Kaluri R, Lakshmanna K, Shorfuzzaman M, Alsufyani A, Uddin M. Addressing Binary Classification over Class Imbalanced Clinical Datasets Using Computationally Intelligent Techniques. Healthcare (Basel) 2022;10:healthcare10071293. [PMID: 35885819 PMCID: PMC9322725 DOI: 10.3390/healthcare10071293] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 05/18/2022] [Revised: 07/03/2022] [Accepted: 07/07/2022] [Indexed: 11/16/2022]

Ortiz-Toro C, García-Pedrero A, Lillo-Saavedra M, Gonzalo-Martín C. Automatic detection of pneumonia in chest X-ray images using textural features. Comput Biol Med 2022;145:105466. [PMID: 35585732 PMCID: PMC8966154 DOI: 10.1016/j.compbiomed.2022.105466] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 11/03/2021] [Revised: 03/25/2022] [Accepted: 03/26/2022] [Indexed: 12/16/2022]

Clustering-based adaptive data augmentation for class-imbalance in machine learning (CADA): additive manufacturing use case. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07347-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]

A Highly Adaptive Oversampling Approach to Address the Issue of Data Imbalance. Computers 2022. [DOI: 10.3390/computers11050073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2022]

Ren J, Wang Y, Mao M, Cheung YM. Equalization ensemble for large scale highly imbalanced data classification. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108295] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/19/2022]

Santos MS, Abreu PH, Japkowicz N, Fernández A, Soares C, Wilk S, Santos J. On the joint-effect of class imbalance and overlap: a critical review. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10150-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]

An imbalanced learning method by combining SMOTE with Center Offset Factor. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.108618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]

Handling imbalanced datasets through Optimum-Path Forest. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108445] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022]

RDPVR: Random Data Partitioning with Voting Rule for Machine Learning from Class-Imbalanced Datasets. Electronics 2022. [DOI: 10.3390/electronics11020228] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 12/22/2022]

Serra A, Cattelani L, Fratello M, Fortino V, Kinaret PAS, Greco D. Supervised Methods for Biomarker Detection from Microarray Experiments. Methods Mol Biol 2022;2401:101-120. [PMID: 34902125 DOI: 10.1007/978-1-0716-1839-4_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]

Tedesco S, Andrulli M, Larsson MÅ, Kelly D, Alamäki A, Timmons S, Barton J, Condell J, O’Flynn B, Nordström A. Comparison of Machine Learning Techniques for Mortality Prediction in a Prospective Cohort of Older Adults. International Journal of Environmental Research and Public Health 2021;18:12806. [PMID: 34886532 PMCID: PMC8657506 DOI: 10.3390/ijerph182312806] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/28/2021] [Revised: 12/01/2021] [Accepted: 12/02/2021] [Indexed: 12/16/2022]

Abstract

As global demographics change, ageing is a worldwide phenomenon of growing interest in our rapidly changing society. The use of appropriate prognostic indices in clinical decisions about mortality prediction has therefore become important for personalized risk management (i.e., identifying patients at high or low risk of death) and for ensuring effective healthcare services. Consequently, prognostic modelling in the form of all-cause mortality prediction is an important step for effective patient management. Machine learning has the potential to transform prognostic modelling. This paper reports the development of machine learning models for all-cause mortality prediction in a cohort of healthy older adults. The models are based on features covering anthropometric variables, physical and laboratory examinations, questionnaires, and lifestyle, as well as wearable data collected in free-living settings, obtained from the "Healthy Ageing Initiative" study of 2291 recruited participants. Several machine learning techniques, including feature engineering, feature selection, data augmentation, and resampling, were investigated for this purpose. A detailed empirical comparison of the impact of the different techniques is presented and discussed. The achieved performance was also compared with that of a standard epidemiological model. For the dataset under consideration, the best results were achieved with Random UnderSampling in conjunction with Random Forest (with or without probability calibration). However, while probability calibration slightly reduced the average performance, it increased model robustness, as indicated by the lower 95% confidence intervals.
The analysis showed that machine learning models could provide comparable results to standard epidemiological models while being completely data-driven and disease-agnostic, thus demonstrating the opportunity for building machine learning models on health records data for research and clinical practice. However, further testing is required to significantly improve the model performance and its robustness.
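Random undersampling, the resampler that worked best in this study together with Random Forest, discards majority-class rows at random until the classes are the same size. The minimal sketch below assumes plain Python lists, and its helper name is hypothetical. It does not reproduce the study's Random Forest or probability-calibration steps. As with any resampler, it should be applied only to the training portion of each split.

```python
import random


def random_undersample(X, y, seed=0):
    """Randomly drop rows of larger classes until every class matches the smallest one."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    n_keep = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, n_keep):  # sampling without replacement
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: 8 majority rows (label 0) and 2 minority rows (label 1).
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2
X_bal, y_bal = random_undersample(X, y)
print(y_bal)
```

The design choice is simple: undersampling keeps every minority row and only discards majority rows, so no synthetic data are introduced. The cost is that some majority information is discarded.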

Importance-SMOTE: a synthetic minority oversampling method for noisy imbalanced data. Soft Comput 2021. [DOI: 10.1007/s00500-021-06532-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 10/19/2022]

Dudjak M, Martinović G. An empirical study of data intrinsic characteristics that make learning from imbalanced data difficult. Expert Systems with Applications 2021;182:115297. [DOI: 10.1016/j.eswa.2021.115297] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 09/02/2023]

Zoumpekas T, Puig A, Salamó M, García-Sellés D, Blanco Nuñez L, Guinau M. An intelligent framework for end-to-end rockfall detection. Int J Intell Syst 2021. [DOI: 10.1002/int.22557] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/09/2023]

Li Y, Li M, Yuan J, Lu J, Abdel-Aty M. Analysis and prediction of intersection traffic violations using automated enforcement system data. Accident; Analysis and Prevention 2021;162:106422. [PMID: 34607246 DOI: 10.1016/j.aap.2021.106422] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/27/2021] [Revised: 09/01/2021] [Accepted: 09/22/2021] [Indexed: 06/13/2023]

SP-SMOTE: A novel space partitioning based synthetic minority oversampling technique. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.107269] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/20/2022]

Liu J, Wong ZSY, So HY, Tsui KL. Evaluating resampling methods and structured features to improve fall incident report identification by the severity level. J Am Med Inform Assoc 2021;28:1756-1764. [PMID: 34010385 DOI: 10.1093/jamia/ocab048] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/05/2020] [Revised: 02/24/2021] [Accepted: 04/27/2021] [Indexed: 11/14/2022]

Abstract

OBJECTIVE

This study aims to improve the classification of the fall incident severity level by considering data imbalance issues and structured features through machine learning.

MATERIALS AND METHODS

We present an incident report classification (IRC) framework that classifies the severity level of in-hospital fall incidents by addressing the imbalanced class problem and incorporating structured attributes. After text preprocessing, bag-of-words features, structured text features, and structured clinical features were extracted from the reports. Resampling techniques were then incorporated into the training process, and machine learning algorithms were used to build classification models. IRC systems were trained, validated, and tested using repeated, randomly stratified shuffle-split cross-validation. Finally, we evaluated system performance using the F1-measure, precision, and recall over 15 stratified test sets.

RESULTS

The experimental results showed that the system setting that both addressed data imbalance and used structured features outperformed the other settings (mean macro-averaged F1-measure of 0.733). Relative to the baseline system setting, this setting significantly improved the mean F1-measure for the rare class by 30.88% (P < .001) and the mean macro-averaged F1-measure by 8.26% (P < .001). Overall, the system combining the random forest algorithm with random oversampling performed best.

CONCLUSIONS

Structured features provide essential information for categorizing the fall incident severity level. Resampling methods help rebalance the class distribution of the original incident report data, which improves the performance of machine learning models. The IRC framework presented in this study effectively automates the identification of fall incident reports by the severity level.
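The methods describe resampling incorporated only into the training process, repeated stratified shuffle-split validation over 15 test sets, and evaluation by macro-averaged F1. A minimal plain-Python sketch of those pieces follows. It assumes random oversampling (duplicating minority-class samples, the method the results single out), a hypothetical 20% test fraction, and hypothetical "minor"/"severe" labels; the function names are illustrative, and this is not the authors' implementation.

```python
import random
from collections import Counter, defaultdict


def stratified_shuffle_split(y, test_frac, rng):
    """Split indices so each class contributes about test_frac of its samples to the test set."""
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for members in by_class.values():
        members = members[:]
        rng.shuffle(members)
        n_test = max(1, round(len(members) * test_frac))
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


def random_oversample(train_idx, y, rng):
    """Duplicate random minority-class training indices until all classes match the largest one."""
    by_class = defaultdict(list)
    for i in train_idx:
        by_class[y[i]].append(i)
    target = max(len(m) for m in by_class.values())
    resampled = list(train_idx)
    for members in by_class.values():
        resampled.extend(rng.choice(members) for _ in range(target - len(members)))
    return resampled


def repeated_splits(y, n_repeats=15, test_frac=0.2, seed=0):
    """Yield (oversampled_train_idx, test_idx); resampling never touches the test set."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        train_idx, test_idx = stratified_shuffle_split(y, test_frac, rng)
        yield random_oversample(train_idx, y, rng), test_idx


def macro_f1(y_true, y_pred):
    """Return (macro-averaged F1, per-class F1), with F1 = 2TP / (2TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class[c] = 2 * tp / denom if denom else 0.0
    return sum(per_class.values()) / len(per_class), per_class


# Toy data: 90 "minor" and 10 "severe" reports (hypothetical labels).
y = ["minor"] * 90 + ["severe"] * 10
train, test = next(repeated_splits(y))
print(Counter(y[i] for i in train), Counter(y[i] for i in test))

# Missing one of two rare cases costs far more in macro-F1 than in accuracy (0.9 here).
score, per_class = macro_f1(["minor"] * 8 + ["severe"] * 2, ["minor"] * 9 + ["severe"])
print(round(score, 3), {k: round(v, 3) for k, v in per_class.items()})
```

Keeping the oversampling inside each split is the design point: duplicating minority samples before splitting would let copies of the same report land in both the training and test sets and inflate the measured scores.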


Soui M, Mansouri N, Alhamad R, Kessentini M, Ghedira K. NSGA-II as feature selection technique and AdaBoost classifier for COVID-19 prediction using patient's symptoms. NONLINEAR DYNAMICS 2021;106:1453-1475. [PMID: 34025034 PMCID: PMC8129611 DOI: 10.1007/s11071-021-06504-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 01/02/2021] [Accepted: 04/28/2021] [Indexed: 05/20/2023]

Abstract

Humanity is currently facing one of its most dangerous pandemics, COVID-19. Because it is highly contagious between people, COVID-19 is spreading rapidly across the world. Infected patients often suffer from symptoms ranging from mild to severe, including cough, fever, sore throat, and body aches. In the most serious cases, patients develop breathing difficulties that can lead to organ failure and death. Medical staff worldwide are overloaded by the rapidly growing number of infections, so screening for the disease is strained by limited testing capacity. In addition, test results can take a long time to arrive, during which infected patients may spread the virus to others. To reduce the chance of infection, we propose a prediction model that identifies infected COVID-19 cases from clinical symptoms and features. The model can help individuals detect a likely infection without visiting a hospital, and it can help medical staff triage patients when medical resources are scarce. In this paper, we use the non-dominated sorting genetic algorithm (NSGA-II) to select relevant features by finding the best trade-offs between two conflicting objectives: minimizing the number of features and maximizing the weights of the selected features. A classification phase is then conducted using an AdaBoost classifier. The proposed model is evaluated on two different datasets. To improve the results, we tuned the classifier's hyperparameters with the genetic algorithm. The results demonstrate the effectiveness of NSGA-II as a feature selection algorithm combined with an AdaBoost classifier, which achieves higher classification performance than existing methods.
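The abstract frames feature selection as a two-objective problem: minimize the number of selected features and maximize the weights of the selected features. The ranking step that gives NSGA-II its name is fast non-dominated sorting, which groups candidate feature subsets into Pareto fronts. The plain-Python sketch below shows only that ranking step. The per-feature weights are hypothetical placeholders (the abstract does not say how the weights are computed), and crowding distance, selection, crossover, mutation, and the AdaBoost stage are all omitted.

```python
from itertools import combinations


def objectives(subset, weights):
    """Both objectives are minimized: subset size, and the negated total weight."""
    return (len(subset), -sum(weights[i] for i in subset))


def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(points):
    """Rank points into Pareto fronts (lists of indices); front 0 is the non-dominated set."""
    n = len(points)
    dominated_set = [[] for _ in range(n)]  # indices that point p dominates
    domination_count = [0] * n              # number of points that dominate p
    fronts = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(points[p], points[q]):
                dominated_set[p].append(q)
            elif dominates(points[q], points[p]):
                domination_count[p] += 1
        if domination_count[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        next_front = []
        for p in fronts[i]:
            for q in dominated_set[p]:
                domination_count[q] -= 1
                if domination_count[q] == 0:
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front


# Hypothetical weights for three features; enumerate every non-empty subset.
weights = [0.9, 0.5, 0.1]
candidates = [c for r in range(1, 4) for c in combinations(range(3), r)]
fronts = fast_non_dominated_sort([objectives(c, weights) for c in candidates])
print([candidates[i] for i in fronts[0]])  # -> [(0,), (0, 1), (0, 1, 2)]
```

Exhaustive enumeration only works for a handful of features; in NSGA-II the points being sorted would be the current population of candidate subsets produced by the genetic operators.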


Anyaso-Samuel S, Sachdeva A, Guha S, Datta S. Metagenomic Geolocation Prediction Using an Adaptive Ensemble Classifier. Front Genet 2021;12:642282. [PMID: 33959149 PMCID: PMC8093763 DOI: 10.3389/fgene.2021.642282] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/15/2020] [Accepted: 03/18/2021] [Indexed: 11/13/2022]

Imbalanced data classification based on diverse sample generation and classifier fusion. INT J MACH LEARN CYB 2021. [DOI: 10.1007/s13042-021-01321-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/26/2022]

A New Oversampling Method Based on the Classification Contribution Degree. Symmetry (Basel) 2021. [DOI: 10.3390/sym13020194] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Indexed: 12/23/2022]

Silva WA, Villela SM. Improving the one-against-all binary approach for multiclass classification using balancing techniques. APPL INTELL 2021. [DOI: 10.1007/s10489-020-01805-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]

Teh K, Armitage P, Tesfaye S, Selvarajah D, Wilkinson ID. Imbalanced learning: Improving classification of diabetic neuropathy from magnetic resonance imaging. PLoS One 2020;15:e0243907. [PMID: 33320890 PMCID: PMC7737960 DOI: 10.1371/journal.pone.0243907] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/26/2020] [Accepted: 12/01/2020] [Indexed: 11/21/2022]

Zhu Y, Yan Y, Zhang Y, Zhang Y. EHSO: Evolutionary Hybrid Sampling in overlapping scenarios for imbalanced learning. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.08.060] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 10/23/2022]

Vandewiele G, Dehaene I, Kovács G, Sterckx L, Janssens O, Ongenae F, De Backere F, De Turck F, Roelens K, Decruyenaere J, Van Hoecke S, Demeester T. Overly optimistic prediction results on imbalanced data: a case study of flaws and benefits when applying over-sampling. Artif Intell Med 2020;111:101987. [PMID: 33461687 DOI: 10.1016/j.artmed.2020.101987] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 01/15/2020] [Revised: 09/09/2020] [Accepted: 11/12/2020] [Indexed: 01/10/2023]