Narwane SV, Sawarkar SD. Is handling unbalanced datasets for machine learning uplifts system performance?: A case of diabetic prediction.
Diabetes Metab Syndr 2022;
16:102609. [PMID:
36099677 DOI:
10.1016/j.dsx.2022.102609]
[Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/05/2022] [Revised: 08/21/2022] [Accepted: 08/23/2022] [Indexed: 11/30/2022]
Abstract
BACKGROUND AND AIMS
Healthcare is a sensitive sector, and addressing the class imbalance in the healthcare domain is a time-consuming task for machine learning-based systems due to the vast amount of data. This study looks into the impact of socioeconomic disparities on the healthcare data of diabetic patients to make accurate disease predictions.
METHODS
This study proposed a systematic approach of Closest Distance Ranking and Principal Component Analysis to deal with the unbalanced dataset. A typical machine learning technique was used to analyze the proposed approach. The data set of pregnant diabetic women is analysed for accurate detection.
RESULTS
The results of the case are analysed using sensitivity, which demonstrates that the minority class's lack of information makes it impossible to forecast the results. On the other hand, the unbalanced dataset was treated using the proposed technique and evaluated with the machine learning algorithm which significantly increased the performance of the system.
CONCLUSION
The performance of the machine learning-based system was significantly enhanced by the unbalanced dataset which was processed with the proposed technique and evaluated with the machine learning algorithm. For the first time, an unbalanced dataset was treated with a combination of Closest Distance Ranking and Principal Component Analysis.
Collapse