Endalie D, Tegegne T. Designing a hybrid dimension reduction for improving the performance of Amharic news document classification.
PLoS One 2021;
16:e0251902. [PMID:
34019571 PMCID:
PMC8139506 DOI:
10.1371/journal.pone.0251902]
[Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2021] [Accepted: 05/05/2021] [Indexed: 11/18/2022] Open
Abstract
The volume of Amharic digital documents has grown rapidly in recent years. As a result, automatic document categorization is highly essential. In this paper, we present a novel dimension reduction approach for improving classification accuracy by combining feature selection and feature extraction. The new dimension reduction method utilizes Information Gain (IG), Chi-square test (CHI), and Document Frequency (DF) to select important features and Principal Component Analysis (PCA) to refine the features that have been selected. We evaluate the proposed dimension reduction method with a dataset containing 9 news categories. Our experimental results verified that the proposed dimension reduction method outperforms other methods. Classification accuracy with the new dimension reduction is 92.60%, which is 13.48%, 16.51% and 10.19% higher than with IG, CHI, and DF respectively. Further work is required since classification accuracy still decreases as we reduce the feature size to save computational time.
Collapse