Reference Citation Analysis

For: Huang X, Zhang L, Wang B, Li F, Zhang Z. Feature clustering based support vector machine recursive feature elimination for gene selection. APPL INTELL 2018;48:594-607. [DOI: 10.1007/s10489-017-0992-2] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Indexed: 10/19/2022]

Cited by Other Article(s)

Yasin P, Yimit Y, Cai X, Aimaiti A, Sheng W, Mamat M, Nijiati M. Machine learning-enabled prediction of prolonged length of stay in hospital after surgery for tuberculosis spondylitis patients with unbalanced data: a novel approach using explainable artificial intelligence (XAI). Eur J Med Res 2024;29:383. [PMID: 39054495 PMCID: PMC11270948 DOI: 10.1186/s40001-024-01988-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2023] [Accepted: 07/18/2024] [Indexed: 07/27/2024] Open

Abstract

BACKGROUND

Tuberculosis spondylitis (TS), commonly known as Pott's disease, is a severe type of skeletal tuberculosis that typically requires surgical treatment. However, this treatment option has increased healthcare costs because of prolonged postoperative length of stay (PLOS). Identifying risk factors associated with extended PLOS is therefore necessary. In this research, we aimed to develop an interpretable machine learning model that predicts extended PLOS and can provide valuable insights for treatment, and we implemented it as a web-based application.

METHODS

We obtained patient data from the spine surgery department at our hospital. Extended postoperative length of stay (PLOS) refers to a hospitalization duration equal to or exceeding the 75th percentile following spine surgery. To identify relevant variables, we employed several approaches, such as the least absolute shrinkage and selection operator (LASSO), recursive feature elimination (RFE) based on support vector machine classification (SVC), correlation analysis, and permutation importance values. Several models were implemented, and some of them were ensembled using soft voting. Models were constructed using grid search with nested cross-validation. The performance of each algorithm was assessed with metrics including the area under the receiver operating characteristic curve (AUC) and the Brier score. Model interpretation used Shapley additive explanations (SHAP), the Gini impurity index, permutation importance, and local interpretable model-agnostic explanations (LIME). Furthermore, to facilitate practical application of the model, a web-based interface was developed and deployed.
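
The SVC-based RFE step named above (and the SVM-RFE method of the cited Huang et al. paper) rests on one loop: fit a linear SVM, rank features by their squared weights, drop the weakest, and refit. The following is a minimal pure-Python sketch of that loop, not the authors' code. The Pegasos-style subgradient solver, the absence of a bias term, and the helper names `train_linear_svm` and `svm_rfe` are all assumptions made for illustration.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Fit a linear SVM (no bias term) with Pegasos-style subgradient steps.

    X: list of feature vectors; y: labels in {-1, +1}.
    Samples are visited in a fixed order, so the result is deterministic.
    """
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            # Regularization step: shrink the weights toward zero.
            w = [(1 - eta * lam) * wj for wj in w]
            # Hinge-loss step: move toward the sample if its margin is below 1.
            if margin < 1:
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def svm_rfe(X, y, n_keep=1):
    """Drop the lowest-ranked feature per round until n_keep features remain.

    Returns (indices of surviving features, indices in elimination order).
    """
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > n_keep:
        # Refit the SVM on the currently surviving columns only.
        sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(sub, y)
        # SVM-RFE ranking criterion: the squared weight of each feature.
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return remaining, eliminated
```

In practice, scikit-learn's `RFE` class wrapped around a linear estimator such as `LinearSVC` implements the same idea and can remove several features per round through its `step` parameter, which matters when thousands of genes are involved.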

RESULTS

The study included a cohort of 580 patients, and 11 features were selected (CRP, transfusions, infusion volume, blood loss, X-ray bone bridge, X-ray osteophyte, CT-vertebral destruction, CT-paravertebral abscess, MRI-paravertebral abscess, MRI-epidural abscess, and postoperative drainage). Most of the classifiers performed well; the XGBoost model had a higher AUC (0.86) and a lower Brier score (0.126) and was chosen as the optimal model. The calibration and decision curve analysis (DCA) plots indicate that XGBoost achieved promising performance. In tenfold cross-validation, the XGBoost model achieved a mean AUC of 0.85 ± 0.09. SHAP and LIME were used to display each variable's contribution to the predicted value. Stacked bar plots indicated that infusion volume was the primary contributor according to the Gini index, permutation feature importance (PFI), and the LIME algorithm.
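
Soft voting, the AUC, and the Brier score reported above are simple enough to state directly. This minimal sketch assumes binary 0/1 labels and per-model positive-class probabilities; the function names are hypothetical, and the pairwise AUC is the Mann-Whitney formulation, which is quadratic in sample size and meant only for illustration.

```python
def soft_vote(prob_lists, weights=None):
    """Average per-model positive-class probabilities (soft voting)."""
    weights = weights or [1.0] * len(prob_lists)
    total = sum(weights)
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(len(prob_lists[0]))
    ]


def roc_auc(y_true, scores):
    """Probability that a random positive scores above a random negative.

    Ties count as half a win.
    """
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, y_true)) / len(y_true)
```

A lower Brier score means better-calibrated probabilities, whereas the AUC depends only on how the scores rank patients, which is why the two metrics are reported together.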

CONCLUSIONS

Our methods not only effectively predicted extended PLOS but also identified risk factors that can be utilized for future treatments. The XGBoost model developed in this study is easily accessible through the deployed web application and can aid in clinical research.

Ahn S, Sung Y, Song W. Machine Learning-Based Identification of Diagnostic Biomarkers for Korean Male Sarcopenia Through Integrative DNA Methylation and Methylation Risk Score: From the Korean Genomic Epidemiology Study (KoGES). J Korean Med Sci 2024;39:e200. [PMID: 38978487 PMCID: PMC11231442 DOI: 10.3346/jkms.2024.39.e200] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Accepted: 05/21/2024] [Indexed: 07/10/2024] Open

Abstract

BACKGROUND

Sarcopenia, characterized by a progressive decline in muscle mass, strength, and function, is primarily attributable to aging. DNA methylation, influenced by both genetic predispositions and environmental exposures, plays a significant role in sarcopenia occurrence. This study employed machine learning (ML) methods to identify differentially methylated probes (DMPs) capable of diagnosing sarcopenia in middle-aged individuals. We also investigated the relationship between muscle strength, muscle mass, age, and sarcopenia risk as reflected in methylation profiles.

METHODS

Data from 509 male participants in the urban cohort of the Korean Genome Epidemiology Study_Health Examinee study were categorized into quartile groups based on the sarcopenia criteria for appendicular skeletal muscle index (ASMI) and handgrip strength (HG). To identify diagnostic biomarkers for sarcopenia, we used recursive feature elimination with cross-validation (RFECV) to pinpoint DMPs significantly associated with sarcopenia. An ensemble model leveraging majority voting was used for evaluation. Furthermore, a methylation risk score (MRS) was calculated, and its correlation with muscle strength, function, and age was assessed using likelihood ratio analysis and multinomial logistic regression.
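
The abstract does not name the base classifiers in the majority-voting ensemble, so the sketch below shows only the voting step, assuming each model outputs hard class labels (for example 1 = sarcopenia, 0 = normal). The function name `majority_vote` is hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Hard (majority) voting over per-model class labels.

    predictions: one list of predicted labels per model, all the same length.
    With equal counts, Counter.most_common returns the label first seen in
    model order, so ties go to the earliest-listed model's prediction.
    """
    n_samples = len(predictions[0])
    return [
        Counter(model[i] for model in predictions).most_common(1)[0][0]
        for i in range(n_samples)
    ]
```

Using an odd number of binary classifiers avoids ties altogether.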

RESULTS

Participants were classified into two groups based on quartile thresholds: sarcopenia (n = 37) with ASMI and HG in the lowest quartile, and normal ranges (n = 48) in the highest. In total, 238 DMPs were identified and eight probes were selected using RFECV. These DMPs were used to build an ensemble model with robust diagnostic capabilities for sarcopenia, as evidenced by an area under the receiver operating characteristic curve of 0.94. Based on eight probes, the MRS was calculated and then validated by analyzing age, HG, and ASMI among the control group (n = 424). Age was positively correlated with high MRS (coefficient, 1.2494; odds ratio [OR], 3.4882), whereas ASMI and HG were negatively correlated with high MRS (ASMI coefficient, -0.4275; OR, 0.6521; HG coefficient, -0.3116; OR, 0.7323).
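
Each odds ratio above is the exponential of its regression coefficient, OR = exp(β), so the reported values can be checked directly from the coefficients. This short check uses only the numbers stated in the abstract.

```python
import math

# Coefficients for "high MRS" as reported in the abstract above.
coefficients = {"age": 1.2494, "ASMI": -0.4275, "HG": -0.3116}

# For a logistic (including multinomial logistic) model, OR = exp(coefficient).
odds_ratios = {name: round(math.exp(beta), 4) for name, beta in coefficients.items()}
print(odds_ratios)  # {'age': 3.4882, 'ASMI': 0.6521, 'HG': 0.7323}
```

A negative coefficient gives an OR below 1, which is why higher ASMI and HG are associated with lower odds of a high MRS.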

CONCLUSION

Overall, this study identified key epigenetic markers of sarcopenia in Korean males and developed an ML model with high diagnostic accuracy for sarcopenia. The MRS also revealed significant correlations between these markers and age, HG, and ASMI. These findings suggest that both diagnostic models and the MRS can play an important role in managing sarcopenia in middle-aged populations.

Islam MA, Majumder MZH, Miah MS, Jannaty S. Precision healthcare: A deep dive into machine learning algorithms and feature selection strategies for accurate heart disease prediction. Comput Biol Med 2024;176:108432. [PMID: 38744014 DOI: 10.1016/j.compbiomed.2024.108432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Revised: 04/06/2024] [Accepted: 04/07/2024] [Indexed: 05/16/2024]

Ding X, Li Y, Chen S. Maximum margin and global criterion based-recursive feature selection. Neural Netw 2024;169:597-606. [PMID: 37956576 DOI: 10.1016/j.neunet.2023.10.037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Revised: 06/19/2023] [Accepted: 10/22/2023] [Indexed: 11/15/2023]

Zieliński K, Drabczyk D, Kunicki M, Drzyzga D, Kloska A, Rumiński J. Evaluating the risk of endometriosis based on patients' self-assessment questionnaires. Reprod Biol Endocrinol 2023;21:102. [PMID: 37898817 PMCID: PMC10612251 DOI: 10.1186/s12958-023-01156-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/06/2023] [Accepted: 10/23/2023] [Indexed: 10/30/2023] Open

Abstract

BACKGROUND

Endometriosis is a condition that significantly affects the quality of life of about 10% of reproductive-aged women. It is characterized by the presence of tissue similar to the uterine lining (endometrium) outside the uterus, which can lead to scarring, adhesions, pain, and fertility issues. While numerous factors associated with endometriosis are documented, a wide range of symptoms may still be undiscovered.

METHODS

In this study, we employed machine learning algorithms to predict endometriosis based on patient symptoms extracted from 13,933 questionnaires. We compared the feature selection results of various algorithms (i.e., the Boruta algorithm and Recursive Feature Selection) with experts' decisions. As the benchmark model architecture, we used a LightGBM algorithm, with Multivariate Imputation by Chained Equations (MICE) and k-nearest neighbors (KNN) for missing-data imputation. Our primary objective was to assess the model's performance and feature importance compared with existing studies.
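
KNN imputation, one of the two imputation strategies named above, fills a missing entry from the most similar records. The abstract does not state k or the distance measure, so this minimal sketch assumes k = 2 and a Euclidean distance over the columns both rows observe; `knn_impute` is a hypothetical name.

```python
import math


def knn_impute(rows, k=2):
    """Return a copy of rows with each None replaced by a k-nearest-neighbour mean.

    Distances are Euclidean over the columns that both rows observe, and only
    rows that observe the missing column are eligible donors. Donor values are
    always taken from the original data, never from earlier imputations.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue  # no overlap, so no meaningful distance
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:  # leave None if no donor exists
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled
```

scikit-learn's `KNNImputer` is an off-the-shelf alternative; it rescales distances to account for missing coordinates, so its results can differ from this sketch.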

RESULTS

We identified the top 20 predictors of endometriosis, uncovering previously overlooked features such as Cesarean section, ovarian cysts, and hernia. Notably, the model's performance metrics were maximized when utilizing a combination of multiple feature selection methods. Specifically, the final model achieved an area under the receiver operator characteristic curve (AUC) of 0.85 on the training dataset and an AUC of 0.82 on the testing dataset.

CONCLUSIONS

The application of machine learning in diagnosing endometriosis has the potential to significantly impact clinical practice, streamlining the diagnostic process and enhancing efficiency. Our questionnaire-based prediction approach empowers individuals with endometriosis to proactively identify potential symptoms, facilitating informed discussions with healthcare professionals about diagnosis and treatment options.

Zheng J, Li Y, Billor N, Ahmed MI, Fang YHD, Pat B, Denney TS, Dell’Italia LJ. Understanding post-surgical decline in left ventricular function in primary mitral regurgitation using regression and machine learning models. Front Cardiovasc Med 2023;10:1112797. [PMID: 37153472 PMCID: PMC10160646 DOI: 10.3389/fcvm.2023.1112797] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Accepted: 03/28/2023] [Indexed: 05/09/2023] Open

Abstract

Background

Under Class I echocardiographic guidelines for primary mitral regurgitation (PMR), patients risk a left ventricular ejection fraction (LVEF) < 50% after mitral valve surgery even with a pre-surgical LVEF > 60%. There are no models that use cardiac magnetic resonance (CMR) to predict LVEF < 50% after surgery, given the complex interplay of increased preload and facilitated ejection in PMR.

Objective

Use regression and machine learning models to identify a combination of CMR LV remodeling and function parameters that predict LVEF < 50% after mitral valve surgery.

Methods

CMR with tissue tagging was performed in 51 pre-surgery PMR patients (median CMR LVEF 64%), 49 asymptomatic PMR patients (median CMR LVEF 63%), and age-matched controls (median CMR LVEF 64%). To predict post-surgery LVEF < 50%, least absolute shrinkage and selection operator (LASSO), random forest (RF), extreme gradient boosting (XGBoost), and support vector machine (SVM) models were developed and validated in pre-surgery PMR patients. Recursive feature elimination and LASSO reduced the number of features and model complexity. Data were split and tested 100 times, and models were evaluated via stratified cross-validation to avoid overfitting. The final RF model was tested in asymptomatic PMR patients to predict whether they would have post-surgical LVEF < 50% if they had undergone mitral valve surgery.
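
Stratified cross-validation keeps the class ratio (here, patients with and without post-surgical LVEF < 50%) roughly equal across folds, which matters when only 13 of the 51 pre-surgery patients are positive. This is a minimal sketch under assumed settings (five folds, a seeded shuffle per repetition); `stratified_folds` is a hypothetical helper, and repeating the split 100 times would correspond to calling it with 100 different seeds.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, seed=None):
    """Return a fold index (0..n_folds-1) for every sample, preserving class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    fold_of = [0] * len(labels)
    offset = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)  # a different seed gives a different repetition
        for rank, idx in enumerate(members):
            # Deal samples of this class round-robin, continuing from where
            # the previous class stopped so that fold sizes stay balanced.
            fold_of[idx] = (offset + rank) % n_folds
        offset += len(members)
    return fold_of
```

Each fold then serves once as the test set while the model is fitted on the remaining folds.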

Results

Thirteen pre-surgery PMR patients had LVEF < 50% after mitral valve surgery. In addition to LVEF (P = 0.005) and LVESD (P = 0.13), the LV sphericity index (P = 0.047) and LV mid-systolic circumferential strain rate (P = 0.024) were predictors of post-surgery LVEF < 50%. Using these four parameters, logistic regression achieved 77.92% classification accuracy, while RF improved the accuracy to 86.17%. Applied to asymptomatic PMR patients, this final RF model predicted that 14 (28.57%) of 49 would have post-surgery LVEF < 50% if they had mitral valve surgery.

Conclusions

These preliminary findings call for a longitudinal study to determine whether LV sphericity index and circumferential strain rate, or other combination of parameters, accurately predict post-surgical LVEF in PMR.

Hou X, Hou J, Huang G. Bi-dimensional principal gene feature selection from big gene expression data. PLoS One 2022;17:e0278583. [PMID: 36477666 PMCID: PMC9728919 DOI: 10.1371/journal.pone.0278583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2022] [Accepted: 11/20/2022] [Indexed: 12/12/2022] Open

iEnhancer-MRBF: Identifying enhancers and their strength with a multiple Laplacian-regularized radial basis function network. Methods 2022;208:1-8. [DOI: 10.1016/j.ymeth.2022.10.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2022] [Revised: 09/26/2022] [Accepted: 10/03/2022] [Indexed: 11/07/2022] Open

Abdelwahab O, Awad N, Elserafy M, Badr E. A feature selection-based framework to identify biomarkers for cancer diagnosis: A focus on lung adenocarcinoma. PLoS One 2022;17:e0269126. [PMID: 36067196 PMCID: PMC9447897 DOI: 10.1371/journal.pone.0269126] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/14/2021] [Accepted: 05/15/2022] [Indexed: 12/23/2022] Open

Wang T, Jiao M, Wang X. Link Prediction in Complex Networks Using Recursive Feature Elimination and Stacking Ensemble Learning. ENTROPY (BASEL, SWITZERLAND) 2022;24:1124. [PMID: 36010793 PMCID: PMC9407261 DOI: 10.3390/e24081124] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/12/2022] [Revised: 08/11/2022] [Accepted: 08/12/2022] [Indexed: 06/15/2023]

Li Y, Shen Y, Fan X, Huang X, Yu H, Zhao G, Ma W. A novel EEG-based major depressive disorder detection framework with two-stage feature selection. BMC Med Inform Decis Mak 2022;22:209. [PMID: 35933348 PMCID: PMC9357341 DOI: 10.1186/s12911-022-01956-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/20/2021] [Accepted: 07/29/2022] [Indexed: 11/16/2022] Open

Abstract

Background

Major depressive disorder (MDD) is a common mental illness, characterized by persistent depression, sadness, despair, and related symptoms, that seriously disrupts people’s daily life and work.

Methods

In this work, we present a novel automatic MDD detection framework based on EEG signals. First, we derive highly MDD-correlated features by calculating the ratio of features extracted from EEG signals in the β and α frequency bands. Then, a two-stage feature selection method named PAR is presented, which sequentially combines the Pearson correlation coefficient (PCC) and recursive feature elimination (RFE); its advantage lies in minimizing the feature search space. Finally, we employ widely used machine learning methods, namely support vector machine (SVM), logistic regression (LR), and linear regression (LNR), for MDD detection, with the merit of feature interpretability.
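
The first stage of a PCC-then-RFE pipeline such as PAR can be illustrated as a correlation filter applied to β/α ratio features. The correlation threshold, the feature names, and the helper names below are assumptions for illustration; the abstract gives neither PAR's actual threshold nor the RFE estimator used in the second stage.

```python
import math


def band_ratio(beta_values, alpha_values):
    """Element-wise beta/alpha ratio of a feature computed in both bands."""
    return [b / a for b, a in zip(beta_values, alpha_values)]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # constant inputs would divide by zero


def pcc_filter(features, target, threshold=0.5):
    """Stage 1: keep features whose |r| with the target reaches the threshold.

    features maps a feature name to its per-subject values; the survivors
    would then go to RFE (stage 2, not shown here).
    """
    return [name for name, values in features.items()
            if abs(pearson(values, target)) >= threshold]
```

Filtering first shrinks the set that the more expensive RFE stage has to search, which is the advantage the abstract attributes to PAR.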

Results

Experimental results show that our proposed MDD detection framework achieves competitive results. The accuracy and F1 score reach up to 0.9895 and 0.9846, respectively. Meanwhile, the coefficient of determination R² for MDD severity assessment reaches up to 0.9479. Compared with existing MDD detection methods, whose best results are an accuracy of 0.9840 and an F1 score of 0.97, our proposed framework achieves state-of-the-art MDD detection performance.

Conclusions

This MDD detection framework could potentially be deployed in a medical system to help physicians screen for patients with MDD.

Simic V, Ebadi Torkayesh A, Ijadi Maghsoodi A. Locating a disinfection facility for hazardous healthcare waste in the COVID-19 era: a novel approach based on Fermatean fuzzy ITARA-MARCOS and random forest recursive feature elimination algorithm. ANNALS OF OPERATIONS RESEARCH 2022;328:1-46. [PMID: 35821664 PMCID: PMC9263821 DOI: 10.1007/s10479-022-04822-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Accepted: 06/07/2022] [Indexed: 05/09/2023]

Abstract

The hazardous healthcare waste (HCW) management system is one of the urban systems most critically affected by the COVID-19 pandemic, owing to the increased waste generation rate in hospitals and medical centers treating infected patients and to the greater hazardousness of waste exposed to the virus. In this regard, the waste network flow would face severe problems if hazardous waste were not treated in disinfection facilities. For this purpose, this study develops an advanced decision support system based on a multi-stage model that combines the random forest recursive feature elimination (RF-RFE) algorithm, the indifference threshold-based attribute ratio analysis (ITARA) method, and the measurement of alternatives and ranking according to compromise solution (MARCOS) method into a unique framework under the Fermatean fuzzy environment. In the first stage, the innovative Fermatean fuzzy RF-RFE algorithm extracts core criteria from a finite set of initial criteria. In the second stage, the novel Fermatean fuzzy ITARA determines the semi-objective importance of the core criteria. In the third stage, the new Fermatean fuzzy MARCOS method ranks alternatives. A real-life case study in Istanbul, Turkey, illustrates the applicability of the introduced methodology. Our empirical findings indicate that "Pendik" is the best of five candidate locations for siting a new disinfection facility for hazardous HCW in Istanbul. The sensitivity and comparative analyses confirmed that our approach is highly robust and reliable. This approach could be used to tackle other critical multi-dimensional problems related to COVID-19 and to support sustainability and the circular economy.

Supplementary Information

The online version contains supplementary material available at 10.1007/s10479-022-04822-0.

Virtual reality for the observation of oncology models (VROOM): immersive analytics for oncology patient cohorts. Sci Rep 2022;12:11337. [PMID: 35790803 PMCID: PMC9256599 DOI: 10.1038/s41598-022-15548-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/16/2022] [Accepted: 06/24/2022] [Indexed: 11/08/2022] Open

Xue Y, Cai X, Neri F. A multi-objective evolutionary algorithm with interval based initialization and self-adaptive crossover operator for large-scale feature selection in classification. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109420] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/05/2023]

Deng X, Li M, Wang L, Wan Q. RFCBF: Enhance the Performance and Stability of Fast Correlation-Based Filter. INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE AND APPLICATIONS 2022. [DOI: 10.1142/s1469026822500092] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022]

Bhandari A, Tripathy BK, Jawad K, Bhatia S, Rahmani MKI, Mashat A. Cancer Detection and Prediction Using Genetic Algorithms. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022;2022:1871841. [PMID: 35615545 PMCID: PMC9126682 DOI: 10.1155/2022/1871841] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/07/2022] [Revised: 04/08/2022] [Accepted: 04/21/2022] [Indexed: 01/07/2023]

Ji M, Xie W, Zhao M, Qian X, Chow CY, Lam KY, Yan J, Hao T. Probabilistic Prediction of Nonadherence to Psychiatric Disorder Medication from Mental Health Forum Data: Developing and Validating Bayesian Machine Learning Classifiers. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022;2022:6722321. [PMID: 35463247 PMCID: PMC9033323 DOI: 10.1155/2022/6722321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2022] [Revised: 02/16/2022] [Accepted: 03/19/2022] [Indexed: 11/18/2022]

Abstract

Background

Medication nonadherence represents a major burden on national health systems. According to the World Health Organization, increasing medication adherence may have a greater impact on public health than any improvement in specific medical treatments. More research is needed to better predict populations at risk of medication nonadherence.

Objective

To develop clinically informative, easy-to-interpret machine learning classifiers to predict people with psychiatric disorders at risk of medication nonadherence based on the syntactic and structural features of written posts on health forums.

Methods

All data were collected from posts made between 2016 and 2021 on a mental health forum administered by Together 4 Change, a long-running not-for-profit organisation based in Oxford, UK. The original social media data were annotated using the Tool for the Automatic Analysis of Syntactic Sophistication and Complexity (TAASSC) system. By applying multiple feature optimisation techniques, we developed a best-performing model using a relevance vector machine (RVM) for the probabilistic prediction of medication nonadherence among online mental health forum discussants.

Results

The best-performing RVM model reached a mean AUC of 0.762, accuracy of 0.763, sensitivity of 0.779, and specificity of 0.742 on the testing dataset. It outperformed competing classifiers with more complex feature sets, with statistically significant improvements in sensitivity and specificity after the alpha levels were adjusted with the Benjamini-Hochberg correction procedure.

Discussion

We used the forest plot of multiple logistic regression to explore the association between written post features in the best-performing RVM model and the binary outcome of medication adherence among online post contributors with psychiatric disorders. We found that increased quantities of 3 syntactic complexity features were negatively associated with psychiatric medication adherence: "dobj_stdev" (standard deviation of dependents per direct object of nonpronouns) (OR, 1.486; 95% CI, 1.202-1.838; P < 0.001), "cl_av_deps" (dependents per clause) (OR, 1.597; 95% CI, 1.202-2.122; P = 0.001), and "VP_T" (verb phrases per T-unit) (OR, 2.23; 95% CI, 1.211-4.104; P = 0.010). Finally, we illustrated the clinical use of the classifier with Bayes' nomogram, which gives the posterior odds and their 95% CI of positive (nonadherence) versus negative (adherence) cases as predicted by the best-performing classifier. The odds ratio of the posterior probability of positive cases was 3.9, which means that around 10 in every 13 psychiatric patients with a positive result as predicted by our model were following their medication regimen. The odds ratio of the posterior probability of true negative cases was 0.4, meaning that around 10 in every 14 psychiatric patients with a negative test result after screening by our classifier were not adhering to their medications.
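
Bayes' nomogram is a graphical form of Bayes' theorem in odds form: posterior odds = prior odds × likelihood ratio, with LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. With the sensitivity (0.779) and specificity (0.742) reported above, LR+ is about 3.02 and LR- about 0.30. The abstract does not state the prior used, so this sketch keeps it as a parameter; the helper names are hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary classifier."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def posterior_odds(prior_probability, lr):
    """Bayes' theorem in odds form: posterior odds = prior odds * LR."""
    prior_odds = prior_probability / (1 - prior_probability)
    return prior_odds * lr


def odds_to_probability(odds):
    """Convert odds o into the probability o / (1 + o)."""
    return odds / (1 + odds)


# Sensitivity and specificity taken from the Results above.
lr_pos, lr_neg = likelihood_ratios(0.779, 0.742)
```

Converting the reported posterior odds back to probabilities, `odds_to_probability(3.9)` is about 0.80 and `odds_to_probability(0.4)` is about 0.29.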

Conclusion

Psychiatric medication nonadherence is a large and increasing burden on national health systems. Using Bayesian machine learning techniques and publicly accessible online health forum data, our study illustrates the viability of developing cost-effective, informative decision aids to support the monitoring and prediction of patients at risk of medication nonadherence.

Xu J, Qu K, Meng X, Sun Y, Hou Q. Feature selection based on multiview entropy measures in multiperspective rough set. INT J INTELL SYST 2022. [DOI: 10.1002/int.22878] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/17/2023]

Yu K, Huang M, Chen S, Feng C, Li W. GSEnet: feature extraction of gene expression data and its application to Leukemia classification. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2022;19:4881-4891. [PMID: 35430845 DOI: 10.3934/mbe.2022228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]

Canayaz M, Şehribanoğlu S, Özdağ R, Demir M. COVID-19 diagnosis on CT images with Bayes optimization-based deep neural networks and machine learning algorithms. Neural Comput Appl 2022;34:5349-5365. [PMID: 35250180 PMCID: PMC8884105 DOI: 10.1007/s00521-022-07052-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/06/2021] [Accepted: 02/01/2022] [Indexed: 12/24/2022]

Jaddi NS, Saniee Abadeh M. Cell separation algorithm with enhanced search behaviour in miRNA feature selection for cancer diagnosis. INFORM SYST 2022. [DOI: 10.1016/j.is.2021.101906] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]

Deng X, Li M, Deng S, Wang L. Hybrid gene selection approach using XGBoost and multi-objective genetic algorithm for cancer classification. Med Biol Eng Comput 2022;60:663-681. [PMID: 35028863 DOI: 10.1007/s11517-021-02476-x] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 04/10/2021] [Accepted: 11/23/2021] [Indexed: 12/15/2022]

A new feature extraction technique based on improved owl search algorithm: a case study in copper electrorefining plant. Neural Comput Appl 2022. [DOI: 10.1007/s00521-021-06881-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]

Conditional mutual information-based feature selection algorithm for maximal relevance minimal redundancy. APPL INTELL 2022. [DOI: 10.1007/s10489-021-02412-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/12/2023]

Gu X, Guo J, Xiao L, Li C. Conditional mutual information-based feature selection algorithm for maximal relevance minimal redundancy. APPL INTELL 2022. [DOI: 10.1007/s10489-021-02412-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]

A Machine-Learning Approach Combining Wavelet Packet Denoising with Catboost for Weather Forecasting. ATMOSPHERE 2021. [DOI: 10.3390/atmos12121618] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/16/2022]

Extraction of Kenyan Grassland Information Using PROBA-V Based on RFE-RF Algorithm. REMOTE SENSING 2021. [DOI: 10.3390/rs13234762] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/16/2022]

Abstract

Africa has the largest grassland area in the world. Kenya is a typical agricultural and pastoral country in Africa, and animal husbandry plays an important role in the region. Surveying grassland resources and promptly determining their quantity and spatial distribution are therefore of great significance for the stable development of the local animal husbandry economy. This paper uses Kenya as the study area to investigate an effective and fast approach to grassland mapping at 100-m resolution using open resources on the Google Earth Engine cloud platform. The main conclusions are as follows. (1) For feature-combination optimization, the scores and standard deviations of several common machine learning algorithms combined with RFE were compared; the combination of RFE and the random forest algorithm gave the most stable models and the best feature optimization. (2) RFE-RF feature optimization reduced the number of features from 12 to 8 (elevation, NDVI, EVI, SWIR, RVI, BLUE, RED, and LSWI), compressing the original feature space and reducing feature redundancy. Random forest classification with this optimal feature combination achieved an overall accuracy of 0.87 and a Kappa coefficient of 0.85. (3) Topographic features differ greatly among the local land types in the study area, so adding them improves the recognition and classification of the various land types. Pixel-based classification shows a “salt-and-pepper” effect; future research will combine the RFE-RF algorithm with a segmentation algorithm to achieve object-oriented land cover classification.
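
The overall accuracy and Kappa coefficient reported above are both computed from the classification confusion matrix. The study's matrix is not given, so the following sketch is exercised only on made-up counts to show the computation; the function name is hypothetical.

```python
def overall_accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    confusion[i][j] = number of samples of reference class i labelled as class j.
    """
    n = sum(sum(row) for row in confusion)
    k = len(confusion)
    observed = sum(confusion[i][i] for i in range(k)) / n  # overall accuracy
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa
```

Kappa discounts the agreement a classifier would reach by chance, so it is typically lower than overall accuracy, as in the 0.87 versus 0.85 reported here.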

Li Y, Li G, Guo L. Feature Selection for Regression Based on Gamma Test Nested Monte Carlo Tree Search. ENTROPY 2021;23:e23101331. [PMID: 34682055 PMCID: PMC8535147 DOI: 10.3390/e23101331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2021] [Revised: 10/06/2021] [Accepted: 10/07/2021] [Indexed: 12/03/2022]

Bommert A, Welchowski T, Schmid M, Rahnenführer J. Benchmark of filter methods for feature selection in high-dimensional gene expression survival data. Brief Bioinform 2021;23:6366322. [PMID: 34498681 PMCID: PMC8769710 DOI: 10.1093/bib/bbab354] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 06/18/2021] [Revised: 08/05/2021] [Accepted: 08/10/2021] [Indexed: 11/30/2022]

Yaseen ZM. An insight into machine learning models era in simulating soil, water bodies and adsorption heavy metals: Review, challenges and solutions. CHEMOSPHERE 2021;277:130126. [PMID: 33774235 DOI: 10.1016/j.chemosphere.2021.130126] [Citation(s) in RCA: 80] [Impact Index Per Article: 26.7] [Received: 07/20/2020] [Revised: 01/23/2021] [Accepted: 02/23/2021] [Indexed: 06/12/2023]

Sheikhi G, Altınçay H. A novel dissimilarity metric based on feature-to-feature scatter frequencies for clustering-based feature selection in biomedical data. Comput Intell 2021. [DOI: 10.1111/coin.12470] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]

Maâtouk O, Ayadi W, Bouziri H, Duval B. Evolutionary Local Search Algorithm for the biclustering of gene expression data based on biological knowledge. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107177] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 10/22/2022]

Zhang H, Wang X, Liu C, Li Y, Liu Y, Li P, Yao L, Wang J, Jiao Y. A Method for Detecting Coronary Artery Stenosis Based on ECG Signals. J MECH MED BIOL 2021. [DOI: 10.1142/s0219519421500032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]

Xie W, Ji M, Zhao M, Zhou T, Yang F, Qian X, Chow CY, Lam KY, Hao T. Detecting Symptom Errors in Neural Machine Translation of Patient Health Information on Depressive Disorders: Developing Interpretable Bayesian Machine Learning Classifiers. Front Psychiatry 2021;12:771562. [PMID: 34744846 PMCID: PMC8566668 DOI: 10.3389/fpsyt.2021.771562] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/06/2021] [Accepted: 09/24/2021] [Indexed: 11/13/2022]

Abstract

Background: Owing to its convenience, wide availability, and low cost, neural machine translation (NMT) is increasingly used in diverse clinical settings and in web-based self-diagnosis of diseases. Because NMT tools are still developing, this can pose safety risks to multicultural communities with limited bilingual skills, low education, and low health literacy. Research is needed to scrutinise the reliability, credibility, and usability of automatically translated patient health information.

Objective: We aimed to develop high-performing Bayesian machine learning classifiers to help clinical professionals and healthcare workers assess the quality and usability of NMT of information on depressive disorders. The tool does not require frontline health and medical professionals to have any knowledge of the target language used by patients.

Methods: We used the Relevance Vector Machine (RVM) to increase the generalisability and clinical interpretability of the classifiers; it is a typical sparse Bayesian classifier that is less prone to overfitting on small training datasets. We optimised the RVM by combining automatic recursive feature elimination with expert feature refinement from the perspective of health linguistics. We evaluated the diagnostic utility of the Bayesian classifier under different probability cut-offs in terms of sensitivity, specificity, and positive and negative likelihood ratios against clinical thresholds for diagnostic tests. Finally, we illustrated how the RVM tool can be interpreted in the clinic using Bayes' nomogram.

Results: After automatic and expert-based feature optimisation, the best-performing RVM classifier (RVM_DUFS12) achieved the highest AUC (0.8872) among 52 competing models with distinct optimised, normalised feature sets. It also had statistically higher sensitivity and specificity than the other models. Evaluated with Bayes' nomogram, the best-performing model had a positive likelihood ratio (LR+) of 4.62 (95% C.I.: 2.53, 8.43) and an associated posterior probability (odds) of 83% (5.0) (95% C.I.: 73%, 90%), meaning that approximately 10 in 12 English texts with a positive test are likely to contain information that would cause clinically significant conceptual errors if translated by Google. It had a negative likelihood ratio (LR-) of 0.18 (95% C.I.: 0.10, 0.35) and an associated posterior probability (odds) of 16% (0.2) (95% C.I.: 10%, 27%), meaning that about 10 in 12 English texts with a negative test can be safely translated using Google.
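
The nomogram reasoning above is Bayes' theorem in odds form: posterior odds equal pre-test odds multiplied by the likelihood ratio, where LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The Python sketch below is illustrative only, and the function names are hypothetical. The abstract does not report the pre-test probability; the value 0.52 used here is an assumption, back-calculated because it reproduces both reported posteriors (83% and 16%).

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (LR+, LR-) for a binary diagnostic test."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def posterior_probability(pretest_prob: float, likelihood_ratio: float) -> float:
    """Bayes' theorem in odds form: posterior odds = pre-test odds * LR."""
    pretest_odds = pretest_prob / (1.0 - pretest_prob)
    posterior_odds = pretest_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


PRETEST = 0.52  # assumption: inferred from the reported posteriors, not stated
for label, lr in (("LR+", 4.62), ("LR-", 0.18)):
    # Prints 83% for LR+ and 16% for LR- under the assumed pre-test probability.
    print(f"{label} = {lr}: posterior probability {posterior_probability(PRETEST, lr):.0%}")
```

A likelihood ratio of 1 leaves the pre-test probability unchanged, so the further LR+ rises above 1 and LR- falls below it, the more a test result shifts the estimate.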

A feature selection algorithm based on redundancy analysis and interaction weight. APPL INTELL 2020. [DOI: 10.1007/s10489-020-01936-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 10/23/2022]

Guo J, Jin M, Chen Y, Liu J. An embedded gene selection method using knockoffs optimizing neural network. BMC Bioinformatics 2020;21:414. [PMID: 32962627 PMCID: PMC7510330 DOI: 10.1186/s12859-020-03717-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/03/2020] [Accepted: 08/19/2020] [Indexed: 11/30/2022]

Pang QQ, Zhang L. Semi-supervised neighborhood discrimination index for feature selection. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.106224] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 10/23/2022]

A survey on single and multi omics data mining methods in cancer data classification. J Biomed Inform 2020;107:103466. [DOI: 10.1016/j.jbi.2020.103466] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 12/11/2019] [Revised: 05/01/2020] [Accepted: 05/31/2020] [Indexed: 01/09/2023]

A machine learning-based framework for Predicting Treatment Failure in tuberculosis: A case study of six countries. Tuberculosis (Edinb) 2020;123:101944. [PMID: 32741529 DOI: 10.1016/j.tube.2020.101944] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/25/2019] [Revised: 02/19/2020] [Accepted: 04/22/2020] [Indexed: 11/24/2022]

Heuristic filter feature selection methods for medical datasets. Genomics 2020;112:1173-1181. [DOI: 10.1016/j.ygeno.2019.07.002] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 03/14/2019] [Revised: 06/19/2019] [Accepted: 07/01/2019] [Indexed: 11/23/2022]

Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M. Benchmark for filter methods for feature selection in high-dimensional classification data. Comput Stat Data Anal 2020. [DOI: 10.1016/j.csda.2019.106839] [Citation(s) in RCA: 206] [Impact Index Per Article: 51.5] [Indexed: 02/06/2023]

Role of microRNAs as Clinical Cancer Biomarkers for Ovarian Cancer: A Short Overview. Cells 2020;9:cells9010169. [PMID: 31936634 PMCID: PMC7016727 DOI: 10.3390/cells9010169] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Revised: 12/28/2019] [Accepted: 01/06/2020] [Indexed: 12/15/2022]

Abstract

Ovarian cancer has the highest mortality rate among gynecological cancers. Early clinical signs are absent, and early-diagnosis biomarkers are urgently needed; microRNAs are promising candidates in this respect. In this paper, we review the most recent advances regarding microRNA alterations in ovarian cancer. We briefly describe the contribution of miRNAs to the mechanisms of ovarian cancer invasion, metastasis, and chemotherapy sensitivity. Unlike previous reviews, which focused only on circulating microRNAs as biomarkers, we also summarize the alterations of microRNAs in solid ovarian tumors, in animal models of ovarian cancer, and in various ovarian cancer cell lines. In this context, we consider that biomarker screening should not be limited to circulating microRNAs per se but should extend to the simultaneous detection of the same microRNA alteration in solid tumors, in order to understand the differences between detecting nucleic acids in early and late stages of cancer. Moreover, these microRNAs should also be validated in in vitro and in vivo models, which could serve as very helpful preclinical testing platforms for pharmacological and/or molecular genetic approaches targeting microRNAs. The enormous quantity of data produced by preclinical and clinical studies on microRNAs that act synergistically in tumorigenesis mechanisms associated with ovarian cancer subtypes should be gathered, integrated, and compared using adequate methods, including molecular clustering. Such molecular clustering analysis should contribute to the discovery of the best microRNA-based biomarker assays, enabling rapid, efficient, and cost-effective detection of ovarian cancer in early stages. In conclusion, identifying appropriate microRNAs as clinical biomarkers for ovarian cancer might improve patients' quality of life.

Object-Based Tree Species Classification Using Airborne Hyperspectral Images and LiDAR Data. FORESTS 2019. [DOI: 10.3390/f11010032] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 02/05/2023]

Abstract

Tree species identification is one of the most basic and important indicators in forest resource monitoring; it is of great significance in actual forest resource surveys and can comprehensively improve monitoring efficiency. Related research has mainly focused on single tree species rather than multiple species, so the ability to classify tree species in complex stands remains unclear, especially in the subtropical monsoon climate region of southern China. This study combined airborne hyperspectral data with simultaneously acquired LiDAR data to evaluate how well different feature combinations and k-nearest neighbor (KNN) and support vector machine (SVM) classifiers identify tree species in southern China. First, stratified classification was used to remove non-forest land. Second, feature variables were extracted from the airborne hyperspectral image and LiDAR data, including independent component analysis (ICA) transformation images, spectral indices, texture features, and a canopy height model (CHM). Third, random forest and recursive feature elimination methods were adopted for feature selection. Finally, different feature combinations were classified with the KNN and SVM classifiers. The SVM classifier achieved higher classification accuracy than KNN, with a highest accuracy of 94.68% and a Kappa coefficient of 0.937. Feature elimination further improved the accuracy and performance of the SVM classifier, and SVM-based recursive feature elimination outperformed random forest for feature selection. Among the spectral indices, the newly constructed slope spectral index SL2 helped improve tree species classification accuracy. Texture features and CHM height information can effectively distinguish tree species with similar spectral features, and the height information plays an important role in improving the classification accuracy of other broad-leaved species. In general, combining different features can improve classification accuracy, and the proposed strategies and methods are effective for identifying tree species in complex forest types in southern China.
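
Cohen's Kappa coefficient, reported here alongside overall accuracy, discounts the agreement expected by chance from the class totals: κ = (p_o - p_e) / (1 - p_e), where p_o is observed agreement (overall accuracy) and p_e is chance agreement. A minimal Python sketch follows; the function name is illustrative and the confusion matrix in the example is hypothetical, not taken from the paper.

```python
from typing import List


def cohens_kappa(confusion: List[List[int]]) -> float:
    """Cohen's kappa for a square confusion matrix (rows: reference, cols: predicted)."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    # Observed agreement: the diagonal share, i.e. overall accuracy.
    p_o = sum(confusion[i][i] for i in range(k)) / n
    # Chance agreement: sum of products of matching row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row_totals[c] * col_totals[c] for c in range(k)) / (n * n)
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical two-class matrix: 85% accuracy, but kappa is lower.
example = [[45, 5],
           [10, 40]]
print(round(cohens_kappa(example), 3))  # 0.7
```

The same function applies unchanged to a multi-species matrix; with more, and more balanced, classes p_e shrinks, so kappa moves closer to overall accuracy.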

Logistic local hyperplane-Relief: A feature weighting method for classification. Knowl Based Syst 2019. [DOI: 10.1016/j.knosys.2019.04.011] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022]

Sun L, Zhang X, Qian Y, Xu J, Zhang S. Feature selection using neighborhood entropy-based uncertainty measures for gene expression data classification. Inf Sci (N Y) 2019. [DOI: 10.1016/j.ins.2019.05.072] [Citation(s) in RCA: 109] [Impact Index Per Article: 21.8] [Indexed: 02/01/2023]

MLW-gcForest: A Multi-Weighted gcForest Model for Cancer Subtype Classification by Methylation Data. APPLIED SCIENCES-BASEL 2019. [DOI: 10.3390/app9173589] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 11/16/2022]

Maâtouk O, Ayadi W, Bouziri H, Duval B. Evolutionary biclustering algorithms: an experimental study on microarray data. Soft comput 2019. [DOI: 10.1007/s00500-018-3394-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]

Su R, Liu X, Wei L. MinE-RFE: determine the optimal subset from RFE by minimizing the subset-accuracy–defined energy. Brief Bioinform 2019;21:687-698. [DOI: 10.1093/bib/bbz021] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 11/24/2018] [Revised: 01/24/2019] [Accepted: 02/02/2019] [Indexed: 01/18/2023]

Zhao D, Liu H, Zheng Y, He Y, Lu D, Lyu C. Whale optimized mixed kernel function of support vector machine for colorectal cancer diagnosis. J Biomed Inform 2019;92:103124. [PMID: 30796977 DOI: 10.1016/j.jbi.2019.103124] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/17/2018] [Revised: 01/15/2019] [Accepted: 02/04/2019] [Indexed: 12/17/2022]

Sun L, Zhang X, Xu J, Zhang S. An Attribute Reduction Method Using Neighborhood Entropy Measures in Neighborhood Rough Sets. ENTROPY 2019;21:e21020155. [PMID: 33266871 PMCID: PMC7514638 DOI: 10.3390/e21020155] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 12/08/2018] [Revised: 01/22/2019] [Accepted: 02/01/2019] [Indexed: 11/16/2022]

Abstract

Attribute reduction is an important preprocessing step in data mining and has become a hot research topic in rough set theory. Neighborhood rough set theory overcomes a shortcoming of classical rough set theory, which may lose useful information when continuous-valued data sets are discretized. To improve classification performance on complex data, this paper proposes a novel attribute reduction method for neighborhood rough sets that uses neighborhood entropy measures combining the algebra view with the information view; it can handle continuous data while preserving the classification information of the original attributes. First, to analyze the uncertainty of knowledge in neighborhood rough sets efficiently, a new average neighborhood entropy is presented that combines neighborhood approximation precision with neighborhood entropy, exploiting the strong complementarity between the algebraic definition of attribute significance and its information-view definition. Next, a decision neighborhood entropy is introduced to handle the uncertainty and noise of neighborhood decision systems; it integrates their credibility degree with their coverage degree to fully reflect the decision ability of attributes. Several properties of these measures are derived and the relationships among them are established, which helps clarify the knowledge content and uncertainty of neighborhood decision systems. Finally, a heuristic attribute reduction algorithm is proposed to improve classification performance on complex data sets. Experimental results on an illustrative instance and several public data sets show that the proposed method is very effective at selecting the most relevant attributes while achieving strong classification performance.
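
The measures above build on neighborhood entropy. The Python sketch below computes the commonly used form NH_δ = -(1/n) Σ log(|δ(x_i)| / n) for continuous attributes, where δ(x_i) is the set of samples within distance δ of x_i. It is not the paper's average or decision neighborhood entropy, which additionally incorporate approximation precision and credibility/coverage degrees; the Euclidean distance and the function name are assumptions for illustration.

```python
import math
from typing import List, Sequence


def neighborhood_entropy(samples: List[Sequence[float]], delta: float) -> float:
    """NH_delta = -(1/n) * sum_i log(|delta(x_i)| / n).

    delta(x_i) is the set of samples within Euclidean distance `delta` of x_i,
    x_i itself included, so every neighborhood has at least one member.
    """
    n = len(samples)
    total = 0.0
    for x in samples:
        size = sum(1 for z in samples if math.dist(x, z) <= delta)
        total -= math.log(size / n)  # accumulate the negated log-ratio
    return total / n


# Hypothetical 1-D data: two tight clusters of two samples each.
data = [(0.0,), (0.1,), (5.0,), (5.2,)]
print(neighborhood_entropy(data, delta=0.5))   # each neighborhood holds 2 of 4 samples: about 0.693 (log 2)
print(neighborhood_entropy(data, delta=10.0))  # 0.0: one neighborhood holds every sample
```

Shrinking δ yields smaller neighborhoods (finer information granules) and therefore higher entropy, which is the uncertainty such attribute-reduction heuristics track as attributes are added or removed.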
