1
Xu Y, Yu Z, Chen CLP, Liu Z. Adaptive Subspace Optimization Ensemble Method for High-Dimensional Imbalanced Data Classification. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2023; 34:2284-2297. [PMID: 34469316 DOI: 10.1109/tnnls.2021.3106306] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/04/2023]
Abstract
It is hard to construct an optimal classifier for high-dimensional imbalanced data, on which classifier performance degrades seriously. Although many approaches, such as resampling, cost-sensitive, and ensemble learning methods, have been proposed to deal with skewed data, they are constrained by the noise and redundancy of high-dimensional data. In this study, we propose an adaptive subspace optimization ensemble method (ASOEM) for high-dimensional imbalanced data classification to overcome these limitations. To construct accurate and diverse base classifiers, a novel adaptive subspace optimization (ASO) method, based on an adaptive subspace generation (ASG) process and a rotated subspace optimization (RSO) process, is designed to generate multiple robust and discriminative subspaces. A resampling scheme is then applied to each optimized subspace to build class-balanced data for each base classifier. To verify its effectiveness, ASOEM is implemented with different resampling strategies on 24 real-world high-dimensional imbalanced datasets. Experimental results demonstrate that the proposed methods outperform other mainstream imbalance learning approaches and classifier ensemble methods.
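The general pattern described in this abstract (one feature subspace per base classifier, followed by resampling to balance the classes) can be sketched roughly as below. This is not the ASG/RSO procedure of the paper: the subspace is simply drawn at random and the resampling scheme is assumed to be random undersampling of the majority class. The function name `balanced_subspace` and the toy data are hypothetical.

```python
import random

def balanced_subspace(X, y, n_features, rng):
    """Pick a random feature subspace, then undersample to balance classes.

    Assumption: random subspace + random undersampling stand in for the
    paper's optimized subspaces and resampling strategies.
    """
    n_total = len(X[0])
    feats = sorted(rng.sample(range(n_total), n_features))
    # Group row indices by class label.
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    # Keep as many rows per class as the smallest class has.
    n_min = min(len(idx) for idx in by_class.values())
    rows = []
    for idx in by_class.values():
        rows.extend(rng.sample(idx, n_min))
    rows.sort()
    X_sub = [[X[i][j] for j in feats] for i in rows]
    y_sub = [y[i] for i in rows]
    return feats, X_sub, y_sub

# Toy imbalanced data: 2 minority rows, 8 majority rows, 4 features.
rng = random.Random(0)
X = [[i, i + 1, i + 2, i + 3] for i in range(10)]
y = [1, 1] + [0] * 8
views = [balanced_subspace(X, y, 2, rng) for _ in range(3)]
```

Each element of `views` would feed one base classifier; the diversity of the ensemble comes from the different feature subsets and the different majority-class samples.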
2
Lung Cancer Prediction Using Robust Machine Learning and Image Enhancement Methods on Extracted Gray-Level Co-Occurrence Matrix Features. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12136517] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 01/27/2023]
Abstract
In the present era, cancer is the leading cause of death in both men and women worldwide, with low survival rates due to inefficient diagnostic techniques. Recently, researchers have been devising methods to improve prediction performance. In medical image processing, image enhancement can further improve prediction performance. This study aimed to improve lung cancer image quality by employing various image enhancement methods, such as image adjustment, gamma correction, contrast stretching, thresholding, and histogram equalization. We extracted gray-level co-occurrence matrix (GLCM) features from the enhanced images, and applied and optimized robust machine learning classification algorithms, such as the decision tree (DT), naïve Bayes, and support vector machine (SVM) with Gaussian, radial basis function (RBF), and polynomial kernels. Without image enhancement, the highest performance was obtained using SVM with polynomial and RBF kernels, with an accuracy of 99.89%. The image enhancement methods, such as image adjustment, contrast stretching at threshold (0.02, 0.98), and gamma correction at a gamma value of 0.9, improved the prediction performance of our analysis on 945 images provided by the Lung Cancer Alliance MRI dataset, yielding 100% accuracy and an AUC of 1.00 using SVM with RBF and polynomial kernels. The results revealed that the proposed methodology can be very helpful in improving lung cancer prediction for further diagnosis and prognosis by expert radiologists, to decrease the mortality rate.
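Two of the enhancement steps named in this abstract, gamma correction and contrast stretching, are simple per-pixel transforms. The sketch below is illustrative only: it assumes intensities normalized to [0, 1] and treats the reported threshold (0.02, 0.98) as intensity limits, which the abstract does not confirm (the values could also be percentile fractions). The function names and the tiny image are hypothetical.

```python
def gamma_correct(img, gamma):
    """Apply out = in ** gamma to intensities in [0, 1]."""
    return [[p ** gamma for p in row] for row in img]

def contrast_stretch(img, low, high):
    """Linearly map [low, high] to [0, 1], clipping values outside it."""
    scale = high - low
    return [[min(1.0, max(0.0, (p - low) / scale)) for p in row]
            for row in img]

# A 2x2 toy image with normalized intensities.
img = [[0.0, 0.25], [0.5, 1.0]]
bright = gamma_correct(img, 0.9)          # gamma < 1 brightens mid-tones
stretched = contrast_stretch(img, 0.02, 0.98)
```

With a gamma below 1, mid-range intensities increase while 0 and 1 stay fixed; stretching saturates the darkest and brightest pixels and spreads the remainder over the full range.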
3
Xu Y, Yu Z, Chen CLP. Classifier Ensemble Based on Multiview Optimization for High-Dimensional Imbalanced Data Classification. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2022; PP:870-883. [PMID: 35657843 DOI: 10.1109/tnnls.2022.3177695] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
High-dimensional class-imbalanced data seriously degrade the performance of classification algorithms. Because of the large number of redundant or invalid features and the class imbalance, it is difficult to construct an optimal classifier for such data. Classifier ensembles have attracted intensive attention since they can achieve better performance than an individual classifier. In this work, we propose a multiview optimization (MVO) to learn more effective and robust features from high-dimensional imbalanced data, based on which an accurate and robust ensemble system is designed. Specifically, an optimized subview generation (OSG) in MVO is first proposed to generate multiple optimized subviews from different scenarios, which can strengthen the classification ability of features and increase the diversity of ensemble members simultaneously. Second, a new evaluation criterion that considers the distribution of data in each optimized subview is developed, based on which a selective ensemble of optimized subviews (SEOS) is designed to perform the subview selective ensemble. Finally, an oversampling approach is executed on the optimized view to obtain a new class-rebalanced subset for the classifier. Experimental results on 25 high-dimensional class-imbalanced datasets indicate that the proposed method outperforms other mainstream classifier ensemble methods.
4
Yuan A, You M, He D, Li X. Convex Non-Negative Matrix Factorization With Adaptive Graph for Unsupervised Feature Selection. IEEE TRANSACTIONS ON CYBERNETICS 2022; 52:5522-5534. [PMID: 33237876 DOI: 10.1109/tcyb.2020.3034462] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 06/11/2023]
Abstract
Unsupervised feature selection (UFS) aims to remove the redundant information and select the most representative feature subset from the original data, so it occupies a core position for high-dimensional data preprocessing. Many proposed approaches use self-expression to explore the correlation between the data samples or use pseudolabel matrix learning to learn the mapping between the data and labels. Furthermore, the existing methods have tried to add constraints to either of these two modules to reduce the redundancy, but no prior literature embeds them into a joint model to select the most representative features by the computed top ranking scores. To address the aforementioned issue, this article presents a novel UFS method via a convex non-negative matrix factorization with an adaptive graph constraint (CNAFS). Through convex matrix factorization with adaptive graph constraint, it can dig up the correlation between the data and keep the local manifold structure of the data. To our knowledge, it is the first work that integrates pseudo label matrix learning into the self-expression module and optimizes them simultaneously for the UFS solution. Besides, two different manifold regularizations are constructed for the pseudolabel matrix and the encoding matrix to keep the local geometrical structure. Eventually, extensive experiments on the benchmark datasets are conducted to prove the effectiveness of our method. The source code is available at: https://github.com/misteru/CNAFS.
5
Qian W, Xiong Y, Yang J, Shu W. Feature selection for label distribution learning via feature similarity and label correlation. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2021.08.076] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 11/30/2022]
6
Machine Learning for Android Scareware Detection. JOURNAL OF INFORMATION TECHNOLOGY RESEARCH 2022. [DOI: 10.4018/jitr.298326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Abstract
With the steady rise in the use of smartphones, specifically Android smartphones, there is an ongoing need to build strong Intrusion Detection Systems to protect ourselves from malicious software attacks. This work focuses on a sub-group of Android malware, scareware. The novelty of this work lies in being able to detect the various scareware families individually using a small number of network attributes, determined by a recursive feature elimination process based on information gain. No work has yet been done on analyzing the scareware families individually. Results of this work show that the number of bytes initially sent back and forth, packet size, amount of time between flows, and flow duration are the most important attributes needed to classify a scareware attack. Three classifiers, Decision Tree, Naïve Bayes, and OneR, were used for classification. The highest average classification accuracy (79.5%) was achieved by the Decision Tree classifier with a minimum of 44 attributes.
7
Dlamini WMD, Simelane SP, Nhlabatsi NM. Bayesian network-based spatial predictive modelling reveals COVID-19 transmission dynamics in Eswatini. SPATIAL INFORMATION RESEARCH 2022; 30:183-194. [PMCID: PMC8602516 DOI: 10.1007/s41324-021-00421-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/19/2021] [Revised: 10/24/2021] [Accepted: 10/26/2021] [Indexed: 07/18/2023]
Abstract
The first case of COVID-19 in Eswatini was reported in March 2020, posing an unprecedented challenge to the country’s health and socio-economic systems. Using geographic information system (GIS) data comprising 15 socioeconomic, demographic and environmental variables, we model the spatial variability of COVID-19 transmission risk based on case data for the period under strict lockdown (up to 8th May 2020) and after the lockdown regulations were gradually eased (up to 30th June 2020). We implemented and tested 13 spatial data-driven Bayesian network (BN) learning algorithms to examine the factors that determine the spatial distribution of COVID-19 transmission risk. All the BN models performed very well in predicting the COVID-19 cases as evidenced by low log loss (0.705–0.683) and high recall values (0.821–0.836). The tree-augmented naïve (TAN) model outperformed all other BN learning algorithms. The proximity to major health facilities, churches, shopping centres and supermarkets as well as average annual traffic density were the strongest predictors of transmission risk during strict lockdown. After gradual relaxation of the lockdown, the proportion of the youth (15–40 years old) in an area became the strongest predictor of COVID-19 transmission in addition to the proximity to areas where people congregate, excluding churches. The study provides useful insights on the spatio-temporal dynamics of COVID-19 transmission drivers, thereby aiding the design of geographically targeted interventions. The findings also point to the robustness of BN models in spatial predictive modelling and in graphically explaining spatial phenomena under uncertainty and with limited data.
Affiliation(s)
- Wisdom M. D. Dlamini
- Department of Geography, Environmental Science and Planning, University of Eswatini, Kwaluseni, Eswatini
8
A Novel Rank Aggregation-Based Hybrid Multifilter Wrapper Feature Selection Method in Software Defect Prediction. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2021; 2021:5069016. [PMID: 34868291 PMCID: PMC8635927 DOI: 10.1155/2021/5069016] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/27/2021] [Accepted: 10/22/2021] [Indexed: 11/21/2022]
Abstract
The high dimensionality of software metric features has long been noted as a data quality problem that affects the performance of software defect prediction (SDP) models. This drawback makes it necessary to apply feature selection (FS) algorithm(s) in SDP processes. FS approaches can be categorized into three types, namely, filter FS (FFS), wrapper FS (WFS), and hybrid FS (HFS). HFS has been established as superior because it combines the strength of both FFS and WFS methods. However, selecting the most appropriate FFS (filter rank selection problem) for HFS is a challenge because the performance of FFS methods depends on the choice of datasets and classifiers. In addition, the local optima stagnation and high computational costs of WFS due to large search spaces are inherited by the HFS method. Therefore, as a solution, this study proposes a novel rank aggregation-based hybrid multifilter wrapper feature selection (RAHMFWFS) method for the selection of relevant and irredundant features from software defect datasets. The proposed RAHMFWFS is divided into two stepwise stages. The first stage involves a rank aggregation-based multifilter feature selection (RMFFS) method that addresses the filter rank selection problem by aggregating individual rank lists from multiple filter methods, using a novel rank aggregation method to generate a single, robust, and non-disjoint rank list. In the second stage, the aggregated ranked features are further preprocessed by an enhanced wrapper feature selection (EWFS) method based on a dynamic reranking strategy that is used to guide the feature subset selection process of the HFS method. This, in turn, reduces the number of evaluation cycles while amplifying or maintaining its prediction performance. The feasibility of the proposed RAHMFWFS was demonstrated on benchmarked software defect datasets with Naïve Bayes and Decision Tree classifiers, based on accuracy, the area under the curve (AUC), and F-measure values. 
The experimental results showed the effectiveness of RAHMFWFS in addressing filter rank selection and local optima stagnation problems in HFS, as well as its ability to select optimal features from SDP datasets while maintaining or enhancing the performance of SDP models. To conclude, the proposed RAHMFWFS achieved good performance by improving the prediction performance of SDP models across the selected datasets, compared to existing state-of-the-art HFS methods.
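The first stage described above merges several filter rankings into one list. The abstract does not specify its novel aggregation rule, so the sketch below substitutes a plain mean-position (Borda-style) aggregation to show the general idea. The filter names, feature names, and the function `aggregate_ranks` are all hypothetical.

```python
def aggregate_ranks(rank_lists):
    """Combine ranked feature lists by mean position (lower is better).

    Assumption: every list ranks the same set of features. This is a
    generic Borda-style rule, not the paper's aggregation method.
    """
    features = rank_lists[0]
    totals = {f: 0 for f in features}
    for ranking in rank_lists:
        for position, f in enumerate(ranking):
            totals[f] += position
    # Sort by mean position; ties are broken by name for determinism.
    return sorted(features, key=lambda f: (totals[f] / len(rank_lists), f))

# Three illustrative filter rankings over four software metrics.
info_gain = ["loc", "cyclo", "fanout", "depth"]
chi_square = ["cyclo", "loc", "depth", "fanout"]
relief = ["loc", "depth", "cyclo", "fanout"]
combined = aggregate_ranks([info_gain, chi_square, relief])
```

A wrapper stage would then search for a feature subset guided by the order in `combined` rather than by any single filter's ranking.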
9
Abstract
Naive Bayes (NB) is easy to construct but surprisingly effective, and it is one of the top ten classification algorithms in data mining. The conditional independence assumption of NB ignores the dependency between attributes, so its probability estimates are often suboptimal. Hidden naive Bayes (HNB) adds a hidden parent to each attribute, which can reflect dependencies from all the other attributes. Compared with other Bayesian network algorithms, it offers significant improvements in classification performance and avoids structure learning. However, HNB's assumption that every instance is equivalent in terms of probability estimation is not always true in real-world applications. To reflect the different influences of different instances, the HNB model is modified into an improved HNB model, and a novel hybrid approach called instance weighted hidden naive Bayes (IWHNB) is proposed in this paper. IWHNB combines instance weighting with the improved HNB model in one uniform framework: instance weights are incorporated into the improved HNB model to calculate probability estimates. Extensive experimental results show that IWHNB obtains significant improvements in classification performance compared with NB, HNB, and other state-of-the-art competitors. Meanwhile, IWHNB maintains the low time complexity that characterizes HNB.
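To illustrate what incorporating instance weights into probability estimates can mean, the sketch below applies weights to ordinary naive Bayes: each training instance contributes its weight, rather than a count of 1, to the class and attribute-value frequencies. It does not implement the hidden parents of HNB or the weighting scheme of IWHNB; Laplace smoothing and all names are assumptions made for the example.

```python
from collections import defaultdict

def fit_weighted_nb(X, y, weights):
    """Collect weighted class and attribute-value frequencies."""
    class_w = defaultdict(float)
    value_w = defaultdict(float)   # key: (attribute index, value, class)
    values = defaultdict(set)      # distinct values seen per attribute
    for row, label, w in zip(X, y, weights):
        class_w[label] += w
        for j, v in enumerate(row):
            value_w[(j, v, label)] += w
            values[j].add(v)
    return class_w, value_w, values

def predict(model, row):
    """Return the class with the largest smoothed posterior score."""
    class_w, value_w, values = model
    total = sum(class_w.values())
    scores = {}
    for c, cw in class_w.items():
        p = (cw + 1.0) / (total + len(class_w))   # Laplace-smoothed prior
        for j, v in enumerate(row):
            p *= (value_w.get((j, v, c), 0.0) + 1.0) / (cw + len(values[j]))
        scores[c] = p
    return max(scores, key=scores.get)

# One attribute, three instances; only the weights differ between models.
X = [["a"], ["a"], ["a"]]
y = ["pos", "neg", "neg"]
uniform = fit_weighted_nb(X, y, [1.0, 1.0, 1.0])
skewed = fit_weighted_nb(X, y, [5.0, 1.0, 1.0])
pred_uniform = predict(uniform, ["a"])
pred_skewed = predict(skewed, ["a"])
```

With uniform weights the majority class wins; giving the single positive instance a weight of 5 shifts the estimated prior enough to flip the prediction.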
10
Hussain L, Huang P, Nguyen T, Lone KJ, Ali A, Khan MS, Li H, Suh DY, Duong TQ. Machine learning classification of texture features of MRI breast tumor and peri-tumor of combined pre- and early treatment predicts pathologic complete response. Biomed Eng Online 2021; 20:63. [PMID: 34183038 PMCID: PMC8240261 DOI: 10.1186/s12938-021-00899-z] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 02/09/2021] [Accepted: 06/09/2021] [Indexed: 12/02/2022] Open
Abstract
Purpose This study used machine learning classification of texture features from MRI of breast tumor and peri-tumor at multiple treatment time points, in conjunction with molecular subtypes, to predict eventual pathological complete response (PCR) to neoadjuvant chemotherapy. Materials and method This study employed a subset of patients (N = 166) with PCR data from the I-SPY-1 TRIAL (2002–2006). This cohort consisted of patients with stage 2 or 3 breast cancer who underwent anthracycline–cyclophosphamide and taxane treatment. Magnetic resonance imaging (MRI) was acquired pre-neoadjuvant chemotherapy, early in treatment, and mid-treatment. Texture features were extracted from post-contrast-enhanced MRI, pre- and post-contrast subtraction images, and with morphological dilation to include peri-tumoral tissue. Molecular subtypes and Ki67 were also included in the prediction model. Performance of the classification models was assessed using receiver operating characteristic (ROC) curve analysis, including area under the curve (AUC). Statistical analysis was done using unpaired two-tailed t-tests. Results Molecular subtypes alone yielded moderate prediction performance of PCR (AUC = 0.82, p = 0.07). Pre-, early, and mid-treatment data alone yielded moderate performance (AUC = 0.88, 0.72, and 0.78, p = 0.03, 0.13, 0.44, respectively). The combined pre- and early treatment data markedly improved performance (AUC = 0.96, p = 0.0003). Addition of molecular subtypes improved performance slightly for individual time points but substantially for the combined pre- and early treatment (AUC = 0.98, p = 0.0003). The optimal morphological dilation was 3–5 pixels. Subtraction of post- and pre-contrast MRI further improved performance (AUC = 0.98, p = 0.00003). Finally, among the machine-learning algorithms evaluated, the RUSBoosted Tree machine-learning method yielded the highest performance.
Conclusion AI-classification of texture features from MRI of breast tumor at multiple treatment time points accurately predicts eventual PCR. Longitudinal changes in texture features and peri-tumoral features further improve PCR prediction performance. Accurate assessment of treatment efficacy early on could minimize unnecessary toxic chemotherapy and enable mid-treatment modification for patients to achieve better clinical outcomes.
Affiliation(s)
- Lal Hussain
- Department of Computer Science & IT, Neelum Campus, The University of Azad Jammu and Kashmir, Muzaffarabad, Azad Kashmir, Pakistan
- Department of Computer Science & IT, King Abdullah Campus, The University of Azad Jammu and Kashmir, Muzaffarabad, Azad Kashmir, Pakistan
- Department of Radiology, Renaissance School of Medicine at Stony Brook University, 101 Nicolls Rd, Stony Brook, NY, 11794, USA
- Department of Radiology, Albert Einstein College of Medicine and Montefiore Medical Center, 111 East 210th Street, Bronx, NY, 10467, USA
- Pauline Huang
- Department of Radiology, Renaissance School of Medicine at Stony Brook University, 101 Nicolls Rd, Stony Brook, NY, 11794, USA
- Tony Nguyen
- Department of Radiology, Renaissance School of Medicine at Stony Brook University, 101 Nicolls Rd, Stony Brook, NY, 11794, USA
- Kashif J Lone
- Department of Computer Science & IT, King Abdullah Campus, The University of Azad Jammu and Kashmir, Muzaffarabad, Azad Kashmir, Pakistan
- Amjad Ali
- Department of Computer Science, COMSATS University Islamabad, Lahore Campus, Lahore, Pakistan
- Muhammad Salman Khan
- Department of Computer Science, COMSATS University Islamabad, Lahore Campus, Lahore, Pakistan
- Haifang Li
- Department of Radiology, Renaissance School of Medicine at Stony Brook University, 101 Nicolls Rd, Stony Brook, NY, 11794, USA
- Doug Young Suh
- College of Electronics and Convergence Engineering, Kyung Hee University, Seoul, South Korea
- Tim Q Duong
- Department of Radiology, Albert Einstein College of Medicine and Montefiore Medical Center, 111 East 210th Street, Bronx, NY, 10467, USA
11
Tena A, Claria F, Solsona F, Meister E, Povedano M. Detection of Bulbar Involvement in Patients With Amyotrophic Lateral Sclerosis by Machine Learning Voice Analysis: Diagnostic Decision Support Development Study. JMIR Med Inform 2021; 9:e21331. [PMID: 33688838 PMCID: PMC7991994 DOI: 10.2196/21331] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/11/2020] [Revised: 10/26/2020] [Accepted: 01/17/2021] [Indexed: 11/13/2022] Open
Abstract
Background Bulbar involvement is a term used in amyotrophic lateral sclerosis (ALS) that refers to motor neuron impairment in the corticobulbar area of the brainstem, which produces a dysfunction of speech and swallowing. One of the earliest symptoms of bulbar involvement is voice deterioration characterized by grossly defective articulation; extremely slow, laborious speech; marked hypernasality; and severe harshness. Bulbar involvement requires well-timed and carefully coordinated interventions. Therefore, early detection is crucial to improving the quality of life and lengthening the life expectancy of patients with ALS who present with this dysfunction. Recent research efforts have focused on voice analysis to capture bulbar involvement. Objective The main objective of this paper was (1) to design a methodology for diagnosing bulbar involvement efficiently through the acoustic parameters of uttered vowels in Spanish, and (2) to demonstrate that the performance of the automated diagnosis of bulbar involvement is superior to human diagnosis. Methods The study focused on the extraction of features from the phonatory subsystem—jitter, shimmer, harmonics-to-noise ratio, and pitch—from the utterance of the five Spanish vowels. Then, we used various supervised classification algorithms, preceded by principal component analysis of the features obtained. Results To date, support vector machines have performed better (accuracy 95.8%) than the models analyzed in the related work. We also show how the model can improve human diagnosis, which can often misdiagnose bulbar involvement. Conclusions The results obtained are very encouraging and demonstrate the efficiency and applicability of the automated model presented in this paper. It may be an appropriate tool to help in the diagnosis of ALS by multidisciplinary clinical teams, in particular to improve the diagnosis of bulbar involvement.
Affiliation(s)
- Alberto Tena
- Information and Communication Technologies Group, International Centre for Numerical Methods in Engineering, Barcelona, Spain
- Francec Claria
- Department of Computer Science, Universitat de Lleida, Lleida, Spain
- Francesc Solsona
- Department of Computer Science, Universitat de Lleida, Lleida, Spain
- Einar Meister
- Institute of Cybernetics, Tallinn University of Technology, Tallinn, Estonia
- Monica Povedano
- Motoneuron Functional Unit, Hospital Universitari de Bellvitge, Barcelona, Spain
12
Alizadeh SH, Hediehloo A, Harzevili NS. Multi independent latent component extension of naive Bayes classifier. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2020.106646] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 10/22/2022]
13
Hussain L, Nguyen T, Li H, Abbasi AA, Lone KJ, Zhao Z, Zaib M, Chen A, Duong TQ. Machine-learning classification of texture features of portable chest X-ray accurately classifies COVID-19 lung infection. Biomed Eng Online 2020; 19:88. [PMID: 33239006 PMCID: PMC7686836 DOI: 10.1186/s12938-020-00831-x] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 08/19/2020] [Accepted: 11/17/2020] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND The large volume and suboptimal image quality of portable chest X-rays (CXRs) as a result of the COVID-19 pandemic could pose significant challenges for radiologists and frontline physicians. Deep-learning artificial intelligence (AI) methods have the potential to help improve diagnostic efficiency and accuracy for reading portable CXRs. PURPOSE The study aimed at developing an AI imaging analysis tool to classify COVID-19 lung infection based on portable CXRs. MATERIALS AND METHODS Public datasets of COVID-19 (N = 130), bacterial pneumonia (N = 145), non-COVID-19 viral pneumonia (N = 145), and normal (N = 138) CXRs were analyzed. Texture and morphological features were extracted. Five supervised machine-learning AI algorithms were used to classify COVID-19 from other conditions. Two-class and multi-class classification were performed. Statistical analysis was done using unpaired two-tailed t tests with unequal variance between groups. Performance of the classification models was assessed using receiver-operating characteristic (ROC) curve analysis. RESULTS For the two-class classification, the accuracy, sensitivity and specificity were, respectively, 100%, 100%, and 100% for COVID-19 vs normal; 96.34%, 95.35% and 97.44% for COVID-19 vs bacterial pneumonia; and 97.56%, 97.44% and 97.67% for COVID-19 vs non-COVID-19 viral pneumonia. For the multi-class classification, the combined accuracy and AUC were 79.52% and 0.87, respectively. CONCLUSION AI classification of texture and morphological features of portable CXRs accurately distinguishes COVID-19 lung infection in patients in multi-class datasets. Deep-learning methods have the potential to improve diagnostic efficiency and accuracy for portable CXRs.
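For readers less familiar with the three metrics reported in the results, a minimal sketch of how accuracy, sensitivity, and specificity follow from a two-class confusion matrix is given below. The labels and predictions are made up for illustration and are unrelated to the study's data; `binary_metrics` is a hypothetical helper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity, and specificity for two classes."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }

# Illustrative labels: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

Here three of four positives and five of six negatives are classified correctly, so sensitivity is 0.75, specificity is 5/6, and accuracy is 0.8.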
Affiliation(s)
- Lal Hussain
- Department of Computer Science and IT, King Abdullah Campus, University of Azad Jammu and Kashmir, Muzaffarabad, 13100, Azad Kashmir, Pakistan.
- Department of Computer Science and IT, Neelum Campus, University of Azad Jammu and Kashmir, Athmuqam, 13230, Azad Kashmir, Pakistan.
- Tony Nguyen
- Department of Radiology, Renaissance School of Medicine at Stony Brook University, 101 Nicolls Rd, Stony Brook, NY, 11794, USA
- Haifang Li
- Department of Radiology, Renaissance School of Medicine at Stony Brook University, 101 Nicolls Rd, Stony Brook, NY, 11794, USA
- Adeel A Abbasi
- Department of Computer Science and IT, King Abdullah Campus, University of Azad Jammu and Kashmir, Muzaffarabad, 13100, Azad Kashmir, Pakistan
- Kashif J Lone
- Department of Computer Science and IT, King Abdullah Campus, University of Azad Jammu and Kashmir, Muzaffarabad, 13100, Azad Kashmir, Pakistan
- Zirun Zhao
- Department of Radiology, Renaissance School of Medicine at Stony Brook University, 101 Nicolls Rd, Stony Brook, NY, 11794, USA
- Mahnoor Zaib
- Department of Computer Science and IT, Neelum Campus, University of Azad Jammu and Kashmir, Athmuqam, 13230, Azad Kashmir, Pakistan
- Anne Chen
- Department of Radiology, Renaissance School of Medicine at Stony Brook University, 101 Nicolls Rd, Stony Brook, NY, 11794, USA
- Tim Q Duong
- Department of Radiology, Renaissance School of Medicine at Stony Brook University, 101 Nicolls Rd, Stony Brook, NY, 11794, USA
14
An Effective Multi-Label Feature Selection Model Towards Eliminating Noisy Features. APPLIED SCIENCES-BASEL 2020. [DOI: 10.3390/app10228093] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
Abstract
Considerable effort has consistently been devoted to feature selection as a means of dimension reduction for various machine learning tasks. Existing feature selection models focus on selecting the most discriminative features for learning targets. However, this strategy is weak in handling two kinds of features, namely the irrelevant and the redundant ones, which are collectively referred to as noisy features. These features may hamper the construction of optimal low-dimensional subspaces and compromise the learning performance of downstream tasks. In this study, we propose a novel multi-label feature selection approach that embeds label correlations (dubbed ELC) to address these issues. In particular, we extract label correlations for reliable label space structures and employ them to steer feature selection. In this way, label and feature spaces can be expected to be consistent, and noisy features can be effectively eliminated. An extensive experimental evaluation on public benchmarks validated the superiority of ELC.
15
Hussain L, Saeed S, Awan IA, Idris A, Nadeem MSA, Chaudhry QUA. Detecting Brain Tumor using Machines Learning Techniques Based on Different Features Extracting Strategies. Curr Med Imaging 2020; 15:595-606. [PMID: 32008569 DOI: 10.2174/1573405614666180718123533] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 11/06/2017] [Revised: 05/26/2018] [Accepted: 07/10/2018] [Indexed: 02/08/2023]
Abstract
BACKGROUND Brain tumor is the leading cause of death worldwide. The chances of survival can be increased if the tumor is identified and properly classified at an initial stage. Magnetic Resonance Imaging (MRI) is one tool for brain tumor detection and is extensively used in brain diagnosis to detect blood clots. In the past, many researchers developed Computer-Aided Diagnosis (CAD) systems that help the radiologist to detect abnormalities in an efficient manner. OBJECTIVE The aim of this research is to improve brain tumor detection performance by proposing a multimodal feature extraction strategy and employing machine learning techniques. METHODS In this study, we extracted multimodal features such as texture, morphological, entropy-based, Scale Invariant Feature Transform (SIFT), and Elliptic Fourier Descriptors (EFDs) from a brain tumor imaging database. The tumor was detected using robust machine learning techniques such as Support Vector Machine (SVM) with polynomial, Radial Basis Function (RBF), and Gaussian kernels; Decision Tree (DT); and Naïve Bayes. The commonly used jack-knife 10-fold Cross-Validation (CV) was used for testing and validation on the dataset. RESULTS The performance was evaluated in terms of specificity, sensitivity, Positive Predictive Value (PPV), Negative Predictive Value (NPV), False Positive Rate (FPR), Total Accuracy (TA), Area under the receiver operating Curve (AUC), and P-value. The highest performance of 100% in terms of specificity, sensitivity, PPV, NPV, TA, and AUC was obtained using Naïve Bayes classifiers based on entropy, morphological, SIFT, and texture features, followed by the Decision Tree classifier with texture features (TA=97.81%, AUC=1.0) and the SVM polynomial kernel with texture features (TA=94.63%). The most significant p-value was obtained using SVM polynomial with texture features (P-value 2.65e-104), followed by SVM RBF with texture features (P-value 1.96e-98).
CONCLUSION The results reveal that Naïve Bayes, followed by Decision Tree, gives the highest detection accuracy based on entropy, morphological, SIFT, and texture features.
Affiliation(s)
- Lal Hussain
- Department of Computer Sciences & Information Technology, University of Azad Jammu and Kashmir, City Campus 13100, Muzaffarabad, Azad Kashmir, Pakistan
| | - Sharjil Saeed
- Department of Computer Sciences & Information Technology, University of Azad Jammu and Kashmir, City Campus 13100, Muzaffarabad, Azad Kashmir, Pakistan
| | - Imtiaz Ahmed Awan
- Department of Computer Sciences & Information Technology, University of Azad Jammu and Kashmir, City Campus 13100, Muzaffarabad, Azad Kashmir, Pakistan
| | - Adnan Idris
- Department of Computer Sciences & Information Technology, University of Poonch Rawalakot, Rawalakot, Pakistan
| | - Malik Sajjad Ahmed Nadeem
- Department of Computer Sciences & Information Technology, University of Azad Jammu and Kashmir, City Campus 13100, Muzaffarabad, Azad Kashmir, Pakistan
| | - Qurat-Ul-Ain Chaudhry
- Department of Computer Sciences & Information Technology, University of Azad Jammu and Kashmir, City Campus 13100, Muzaffarabad, Azad Kashmir, Pakistan
| |
|
16
|
Wang X, Ding W, Liu H, Huang X. Shape recognition through multi-level fusion of features and classifiers. GRANULAR COMPUTING 2020. [DOI: 10.1007/s41066-019-00164-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|
17
|
Huang Q, Xie L, Yin G, Ran M, Liu X, Zheng J. Acoustic signal analysis for detecting defects inside an arc magnet using a combination of variational mode decomposition and beetle antennae search. ISA TRANSACTIONS 2020; 102:347-364. [PMID: 32173040 DOI: 10.1016/j.isatra.2020.02.036] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/18/2019] [Revised: 02/29/2020] [Accepted: 02/29/2020] [Indexed: 06/10/2023]
Abstract
An accurate, rapid signal analysis is crucial in the acoustic-based detection of internal defects in arc magnets. Benefiting from adaptive decomposition without mode mixing, variational mode decomposition (VMD) has emerged as a promising technology for processing and analyzing acoustic signals. However, improper parameter settings are the root cause of inaccurate VMD results, while existing optimization methods for VMD parameters are only applicable to a single signal with exclusive signal characteristics, rather than different signals with similar features. Therefore, we developed a new acoustic signal analysis method combining VMD, beetle antennae search (BAS), and naive Bayes classification (NBC), and then applied it to detecting internal defects of arc magnets. In this method, multiple optimizations for different signals are simplified to a one-time optimization for the whole signal group by a specially designed parameter-related fitness function. Since the coordinates of the function's maximum value in the parameter space correspond to the unified parameter setting generating the overall optimal processing effect for all signals, BAS is introduced to achieve a rapid search for these coordinates. With the obtained unified parameter setting, each acoustic signal of arc magnets can be consistently processed by VMD. Next, two modes stemming from VMD are screened out by an energy threshold, and their specific frequency information is extracted as features representing the internal defects. NBC is carried out to learn and identify the extracted features. The experimental validation of the proposed method was conducted by detecting various arc magnets. Experimental results indicate that the identification accuracy reaches 100% and the detection time per arc magnet ranges approximately between 1.7 and 4.5 s.
This work provides not only a new strategy for the parameter optimization of VMD, but also a practical solution for the internal defect detection of arc magnets.
Affiliation(s)
- Qinyuan Huang
- School of Automation and Information Engineering, Sichuan University of Science and Engineering, Zigong, Sichuan 643000, PR China; Department of Chemical and Biomolecular Engineering, The University of Akron, Akron, OH, 44325, USA.
| | - Luofeng Xie
- School of Manufacturing Science and Engineering, Sichuan University, Chengdu 610065, PR China
| | - Guofu Yin
- School of Manufacturing Science and Engineering, Sichuan University, Chengdu 610065, PR China
| | - Maoxia Ran
- School of Automation and Information Engineering, Sichuan University of Science and Engineering, Zigong, Sichuan 643000, PR China
| | - Xin Liu
- School of Automation and Information Engineering, Sichuan University of Science and Engineering, Zigong, Sichuan 643000, PR China
| | - Jie Zheng
- Department of Chemical and Biomolecular Engineering, The University of Akron, Akron, OH, 44325, USA
| |
|
18
|
Multi-objective feature selection using hybridization of a genetic algorithm and direct multisearch for key quality characteristic selection. Inf Sci (N Y) 2020. [DOI: 10.1016/j.ins.2020.03.032] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
|
19
|
|
20
|
|
21
|
A Comparison Study of Algorithms to Detect Drug-Adverse Event Associations: Frequentist, Bayesian, and Machine-Learning Approaches. Drug Saf 2020; 42:743-750. [PMID: 30762164 DOI: 10.1007/s40264-018-00792-0] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]
Abstract
INTRODUCTION It is important to monitor the safety profile of drugs, and mining for strong associations between drugs and adverse events is an effective and inexpensive method of post-marketing safety surveillance. OBJECTIVE The objective of our work was to compare the accuracy of both common and innovative methods of data mining for pharmacovigilance purposes. METHODS We used the reference standard provided by the Observational Medical Outcomes Partnership, which contains 398 drug-adverse event pairs (165 positive controls, 233 negative controls). Ten methods and algorithms were applied to the US FDA Adverse Event Reporting System data to investigate the 398 pairs. The ten methods include popular methods in the pharmacovigilance literature, newly developed pharmacovigilance methods as at 2018, and popular methods in the genome-wide association study literature. We compared their performance using the receiver operating characteristic (ROC) plot, area under the curve (AUC), and Youden's index. RESULTS The Bayesian confidence propagation neural network had the highest AUC overall. Monte Carlo expectation maximization, a method developed in 2018, had the second highest AUC and the highest Youden's index, and performed very well in terms of high specificity. The regression-adjusted gamma Poisson shrinkage model performed best under high-sensitivity requirements. CONCLUSION Our results will be useful to help choose a method for a given desired level of specificity. Methods popular in the genome-wide association study literature did not perform well because of the sparsity of data and will need modification before their properties can be used in the drug-adverse event association problem.
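Youden's index, one of the comparison measures named in this abstract, is the maximum over decision thresholds of sensitivity + specificity − 1. The following is a minimal sketch of that definition only; the function name `youden_index`, the "score ≥ threshold means positive" convention, and the toy data are assumptions for illustration, not the authors' procedure or data.

```python
def youden_index(labels, scores):
    """Return (J, threshold) maximising sensitivity + specificity - 1.

    labels: 1 for positive controls, 0 for negative controls.
    scores: association scores; a case is called positive when score >= threshold.
    Assumes both classes are present in `labels`.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    best_j, best_t = -1.0, None
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        j = tp / pos + tn / neg - 1.0
        if j > best_j:  # strict comparison keeps the lowest threshold on ties
            best_j, best_t = j, t
    return best_j, best_t


# Toy data, purely illustrative.
j, threshold = youden_index([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(j, threshold)
```

A value of J near 1 indicates a threshold that separates positive and negative controls almost perfectly, while J near 0 indicates performance no better than chance.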
|
22
|
|
23
|
Semi-wrapper feature subset selector for feed-forward neural networks: Applications to binary and multi-class classification problems. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2018.05.133] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
|
24
|
A benchmarking study of classification techniques for behavioral data. INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS 2019. [DOI: 10.1007/s41060-019-00185-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
|
25
|
A novel somatic cancer gene-based biomedical document feature ranking and clustering model. INFORMATICS IN MEDICINE UNLOCKED 2019. [DOI: 10.1016/j.imu.2019.100188] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open
|
26
|
Hussain L, Ahmed A, Saeed S, Rathore S, Awan IA, Shah SA, Majid A, Idris A, Awan AA. Prostate cancer detection using machine learning techniques by employing combination of features extracting strategies. Cancer Biomark 2018; 21:393-413. [PMID: 29226857 DOI: 10.3233/cbm-170643] [Citation(s) in RCA: 54] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/20/2023]
Abstract
Prostate cancer is the second leading cause of cancer deaths among men. Early detection can effectively reduce the mortality rate caused by prostate cancer. The high and multiresolution nature of prostate MRIs requires proper diagnostic systems and tools. In the past, researchers developed computer-aided diagnosis (CAD) systems that help the radiologist detect abnormalities. In this research paper, we employed novel machine learning techniques such as the Bayesian approach, support vector machine (SVM) kernels (polynomial, radial basis function (RBF), and Gaussian), and Decision Tree for detecting prostate cancer. Moreover, different feature extraction strategies are proposed to improve the detection performance. The feature extraction strategies are based on texture, morphological, scale invariant feature transform (SIFT), and elliptic Fourier descriptors (EFDs) features. The performance was evaluated based on single features as well as combinations of features using machine learning classification techniques. Cross-validation (Jack-knife k-fold) was performed, and performance was evaluated in terms of the receiver operating curve (ROC), specificity, sensitivity, positive predictive value (PPV), negative predictive value (NPV), and false positive rate (FPR). Based on single feature extraction strategies, the SVM Gaussian kernel gives the highest accuracy of 98.34% with an AUC of 0.999. Using combinations of feature extraction strategies, the SVM Gaussian kernel with texture + morphological and EFDs + morphological features gives the highest accuracy of 99.71% and an AUC of 1.00.
Affiliation(s)
- Lal Hussain
- QEC, The University of Azad Jammu and Kashmir, Muzaffarabad, Azad Kashmir, Pakistan; Department of CS and IT, The University of Azad Jammu and Kashmir, Muzaffarabad, Azad Kashmir, Pakistan
| | - Adeel Ahmed
- Department of CS and IT, The University of Azad Jammu and Kashmir, Muzaffarabad, Azad Kashmir, Pakistan
| | - Sharjil Saeed
- Department of CS and IT, The University of Azad Jammu and Kashmir, Muzaffarabad, Azad Kashmir, Pakistan
| | - Saima Rathore
- Department of Radiology, University of Pennsylvania, Philadelphia, PA, USA
| | - Imtiaz Ahmed Awan
- Department of CS and IT, The University of Azad Jammu and Kashmir, Muzaffarabad, Azad Kashmir, Pakistan
| | - Saeed Arif Shah
- Department of CS and IT, The University of Azad Jammu and Kashmir, Muzaffarabad, Azad Kashmir, Pakistan
| | - Abdul Majid
- Department of CS and IT, The University of Azad Jammu and Kashmir, Muzaffarabad, Azad Kashmir, Pakistan
| | - Adnan Idris
- Department of CS and IT, University of Poonch Rawalakot, Rawalakot, Azad Kashmir, Pakistan
| | - Anees Ahmed Awan
- Department of CS and IT, The University of Azad Jammu and Kashmir, Muzaffarabad, Azad Kashmir, Pakistan
| |
|
27
|
Kale A, Sonavane S. F-WSS$^{++}$: incremental wrapper subset selection algorithm for fuzzy extreme learning machine. INT J MACH LEARN CYB 2018. [DOI: 10.1007/s13042-018-0859-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]
|
28
|
Nizami IF, Majid M, Afzal H, Khurshid K. Impact of Feature Selection Algorithms on Blind Image Quality Assessment. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING 2018. [DOI: 10.1007/s13369-017-2803-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
|
29
|
López-Cabrera JD, Lorenzo-Ginori JV. Feature selection for the classification of traced neurons. J Neurosci Methods 2018; 303:41-54. [DOI: 10.1016/j.jneumeth.2018.04.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2017] [Revised: 03/19/2018] [Accepted: 04/04/2018] [Indexed: 10/17/2022]
|
30
|
Spam filtering using integrated distribution-based balancing approach and regularized deep neural networks. APPL INTELL 2018. [DOI: 10.1007/s10489-018-1161-y] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
|
31
|
Sitting Posture Monitoring System Based on a Low-Cost Load Cell Using Machine Learning. SENSORS 2018; 18:s18010208. [PMID: 29329261 PMCID: PMC5796304 DOI: 10.3390/s18010208] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/23/2017] [Revised: 12/23/2017] [Accepted: 01/11/2018] [Indexed: 11/26/2022]
Abstract
Sitting posture monitoring systems (SPMSs) help assess the posture of a seated person in real-time and improve sitting posture. To date, SPMS studies reported have required many sensors mounted on the backrest plate and seat plate of a chair. The present study, therefore, developed a system that measures a total of six sitting postures including the posture that applied a load to the backrest plate, with four load cells mounted only on the seat plate. Various machine learning algorithms were applied to the body weight ratio measured by the developed SPMS to identify the method that most accurately classified the actual sitting posture of the seated person. After classifying the sitting postures using several classifiers, average and maximum classification rates of 97.20% and 97.94%, respectively, were obtained from nine subjects with a support vector machine using the radial basis function kernel; the results obtained by this classifier showed a statistically significant difference from the results of multiple classifications using other classifiers. The proposed SPMS was able to classify six sitting postures including the posture with loading on the backrest and showed the possibility of classifying the sitting posture even though the number of sensors is reduced.
|
32
|
Hui KH, Ooi CS, Lim MH, Leong MS, Al-Obaidi SM. An improved wrapper-based feature selection method for machinery fault diagnosis. PLoS One 2017; 12:e0189143. [PMID: 29261689 PMCID: PMC5738058 DOI: 10.1371/journal.pone.0189143] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/04/2017] [Accepted: 11/20/2017] [Indexed: 11/21/2022] Open
Abstract
A major issue of machinery fault diagnosis using vibration signals is that it is over-reliant on personnel knowledge and experience in interpreting the signal. Thus, machine learning has been adapted for machinery fault diagnosis. The quantity and quality of the input features, however, influence the fault classification performance. Feature selection plays a vital role in selecting the most representative feature subset for the machine learning algorithm. In contrast, the trade-off relationship between capability when selecting the best feature subset and computational effort is inevitable in the wrapper-based feature selection (WFS) method. This paper proposes an improved WFS technique before integration with a support vector machine (SVM) model classifier as a complete fault diagnosis system for a rolling element bearing case study. The bearing vibration dataset made available by the Case Western Reserve University Bearing Data Centre was executed using the proposed WFS and its performance has been analysed and discussed. The results reveal that the proposed WFS secures the best feature subset with a lower computational effort by eliminating the redundancy of re-evaluation. The proposed WFS has therefore been found to be capable and efficient to carry out feature selection tasks.
Affiliation(s)
- Kar Hoou Hui
- Institute of Noise and Vibration, Universiti Teknologi Malaysia, Kuala Lumpur, Malaysia
| | - Ching Sheng Ooi
- Institute of Noise and Vibration, Universiti Teknologi Malaysia, Kuala Lumpur, Malaysia
| | - Meng Hee Lim
- Institute of Noise and Vibration, Universiti Teknologi Malaysia, Kuala Lumpur, Malaysia
| | - Mohd Salman Leong
- Institute of Noise and Vibration, Universiti Teknologi Malaysia, Kuala Lumpur, Malaysia
| | - Salah Mahdi Al-Obaidi
- Institute of Noise and Vibration, Universiti Teknologi Malaysia, Kuala Lumpur, Malaysia
| |
|
33
|
|
34
|
A novel ensemble-based wrapper method for feature selection using extreme learning machine and genetic algorithm. Knowl Inf Syst 2017. [DOI: 10.1007/s10115-017-1131-4] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
|
35
|
Babu DK, Ramadevi Y, Ramana K. PGNBC: Pearson Gaussian Naïve Bayes classifier for data stream classification with recurring concept drift. INTELL DATA ANAL 2017. [DOI: 10.3233/ida-163020] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Affiliation(s)
- D. Kishore Babu
- Institute of Aeronautical Engineering (AUTONOMOUS), Dundigal 500043, India
| | | | | |
|
36
|
A novel Hybrid Genetic Local Search Algorithm for feature selection and weighting with an application in strategic decision making in innovation management. Inf Sci (N Y) 2017. [DOI: 10.1016/j.ins.2017.04.009] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
|
37
|
|
38
|
Nam LNH, Quoc HB. The Hybrid Filter Feature Selection Methods for Improving High-Dimensional Text Categorization. INT J UNCERTAIN FUZZ 2017. [DOI: 10.1142/s021848851750009x] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
The bag-of-words technique is often used to represent a document in text categorization. However, for a large set of documents where the dimension of the bag-of-words vector is very high, text categorization becomes a serious challenge as a result of sparse data, over-fitting, and irrelevant features. A filter feature selection method reduces the number of features by eliminating irrelevant features from the bag-of-words vector. In this paper, we analyze the weak points and strong points of two filter feature selection approaches: the frequency-based approach and the cluster-based approach. Based on this analysis, we propose hybrid filter feature selection methods, named the Frequency-Cluster Feature Selection (FCFS) and the Detailed Frequency-Cluster Feature Selection (DtFCFS), to further improve the performance of the filter feature selection process in text categorization. The FCFS is a combination of the frequency-based approach and the cluster-based approach, while the DtFCFS, a detailed version of the FCFS, is a comprehensively hybrid cluster-based method. We conduct experiments with four benchmark datasets (the Reuters-21578 and Newsgroup datasets for news classification, the Ohsumed dataset for medical document classification, and the LingSpam dataset for email classification) to compare the proposed methods with six related well-known methods: the Comprehensive Measurement Feature Selection (CMFS), the Optimal Orthogonal Centroid Feature Selection (OCFS), the Crossed Centroid Feature Selection (CIIC), the Information Gain (IG), the Chi-square (CHI), and the Deviation from Poisson Feature Selection (DFPFS). In terms of the Micro-F1, the Macro-F1, and the dimension reduction rate, the DtFCFS is superior to the other methods, while the FCFS shows competitive and even superior performance to the good methods, especially for the Macro-F1.
Affiliation(s)
- Le Nguyen Hoai Nam
- Department of Information System, The School of Information Technology, VNUHCM – the University of Science, Ho Chi Minh City, Vietnam
| | - Ho Bao Quoc
- Department of Information System, The School of Information Technology, VNUHCM – the University of Science, Ho Chi Minh City, Vietnam
| |
|
39
|
|
40
|
Li AD, He Z, Zhang Y. Bi-objective variable selection for key quality characteristics selection based on a modified NSGA-II and the ideal point method. COMPUT IND 2016. [DOI: 10.1016/j.compind.2016.05.008] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
|
41
|
Affiliation(s)
- Mitra Montazeri
- Computer Engineering Department, Shahid Bahonar University, Kerman, Iran
- Medical Informatics Research Center, Institute for Futures Studies in Health, Kerman University of Medical Sciences, Kerman, Iran
| |
|
42
|
Zhang L, Jiang L, Li C, Kong G. Two feature weighting approaches for naive Bayes text classifiers. Knowl Based Syst 2016. [DOI: 10.1016/j.knosys.2016.02.017] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
|
43
|
|
44
|
|
45
|
A metaheuristic optimization framework for informative gene selection. INFORMATICS IN MEDICINE UNLOCKED 2016. [DOI: 10.1016/j.imu.2016.09.003] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open
|
46
|
Tian J, Li M, Chen F, Feng N. Learning Subspace-Based RBFNN Using Coevolutionary Algorithm for Complex Classification Tasks. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2016; 27:47-61. [PMID: 25823042 DOI: 10.1109/tnnls.2015.2411615] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
Many real-world classification problems are characterized by samples of a complex distribution in the input space. The classification accuracy is determined by intrinsic properties of all samples in subspaces of features. This paper proposes a novel algorithm for the construction of radial basis function neural network (RBFNN) classifier based on subspace learning. In this paper, feature subspaces are obtained for every hidden node of the RBFNN during the learning process. The connection weights between the input layer and the hidden layer are adjusted to produce various subspaces with dominative features for different hidden nodes. The network structure and dominative features are encoded in two subpopulations that are cooperatively coevolved using the coevolutionary algorithm to achieve a better global optimality for the estimated RBFNN. Experimental results illustrate that the proposed algorithm is able to obtain RBFNN models with both better classification accuracy and simpler network structure when compared with other learning algorithms. Thus, the proposed model provides a more flexible and efficient approach to complex classification tasks by employing the local characteristics of samples in subspaces.
|
47
|
Optimized Tumor Breast Cancer Classification Using Combining Random Subspace and Static Classifiers Selection Paradigms. INTELLIGENT SYSTEMS REFERENCE LIBRARY 2016. [DOI: 10.1007/978-3-319-21212-8_13] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/18/2023]
|
48
|
|
49
|
|
50
|
Degree of contribution (DoC) feature selection algorithm for structural brain MRI volumetric features in depression detection. Int J Comput Assist Radiol Surg 2014; 10:1003-16. [DOI: 10.1007/s11548-014-1130-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2014] [Accepted: 11/04/2014] [Indexed: 10/24/2022]
|