1
Mohamed TIA, Ezugwu AE, Fonou-Dombeu JV, Mohammed M, Greeff J, Elbashir MK. A novel feature selection algorithm for identifying hub genes in lung cancer. Sci Rep 2023; 13:21671. [PMID: 38066059 PMCID: PMC10709567 DOI: 10.1038/s41598-023-48953-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Accepted: 12/01/2023] [Indexed: 12/18/2023] Open
Abstract
Lung cancer, a life-threatening disease primarily affecting lung tissue, remains a significant contributor to mortality in both developed and developing nations. Accurate biomarker identification is imperative for effective cancer diagnosis and therapeutic strategies. This study introduces the Voting-Based Enhanced Binary Ebola Optimization Search Algorithm (VBEOSA), an innovative ensemble-based approach combining binary optimization and the Ebola optimization search algorithm. VBEOSA harnesses the collective power of state-of-the-art classification models through soft voting. Moreover, our research applies VBEOSA to an extensive lung cancer gene expression dataset obtained from TCGA, following essential preprocessing steps including outlier detection and removal, data normalization, and filtration. VBEOSA aids in feature selection, leading to the discovery of key hub genes closely associated with lung cancer, validated through comprehensive protein-protein interaction analysis. Notably, our investigation reveals ten significant hub genes (ADRB2, ACTB, ARRB2, GNGT2, ADRB1, ACTG1, ACACA, ATP5A1, ADCY9, and ADRA1B), each demonstrating substantial involvement in lung cancer. Furthermore, our pathway analysis sheds light on the prominence of strategic pathways such as salivary secretion and the calcium signaling pathway, providing invaluable insights into the intricate molecular mechanisms underpinning lung cancer. We also utilize the weighted gene co-expression network analysis (WGCNA) method to identify gene modules exhibiting strong correlations with clinical attributes associated with lung cancer. Our findings underscore the efficacy of VBEOSA in feature selection and offer profound insights into the multifaceted molecular landscape of lung cancer.
Finally, we are confident that this research has the potential to improve diagnostic capabilities and further enrich our understanding of the disease, thus setting the stage for future advancements in the clinical management of lung cancer. The VBEOSA source code is publicly available at https://github.com/TEHNAN/VBEOSA-A-Novel-Feature-Selection-Algorithm-for-Identifying-hub-Genes-in-Lung-Cancer .
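As an illustration of the soft-voting idea mentioned in this abstract, the following minimal Python sketch averages the class probabilities produced by several classifiers and picks the class with the highest mean. The function name `soft_vote`, the data layout, and the toy probabilities are assumptions made for this example; they are not taken from the VBEOSA repository.

```python
from typing import List, Sequence


def soft_vote(prob_sets: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Average class probabilities over classifiers and pick the top class.

    prob_sets[c][i][k] is the probability that classifier c assigns to
    class k for sample i (a hypothetical layout chosen for this sketch).
    """
    n_classifiers = len(prob_sets)
    n_samples = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])
    labels = []
    for i in range(n_samples):
        # Mean probability per class across all classifiers.
        mean = [
            sum(prob_sets[c][i][k] for c in range(n_classifiers)) / n_classifiers
            for k in range(n_classes)
        ]
        # The predicted label is the class with the highest mean probability.
        labels.append(max(range(n_classes), key=lambda k: mean[k]))
    return labels


# Toy example: three classifiers, two samples, two classes.
toy = [
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.6, 0.4], [0.3, 0.7]],
    [[0.2, 0.8], [0.45, 0.55]],
]
predictions = soft_vote(toy)
```

Unlike hard (majority) voting, soft voting lets a confident classifier outweigh several uncertain ones, which is presumably why ensembles of probabilistic classifiers often use it.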
Affiliation(s)
- Tehnan I A Mohamed
  - School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, South Africa
  - Department of Computer Science, Faculty of Mathematical and Computer Sciences, University of Gezira, Wad Madani, 11123, Sudan
- Absalom E Ezugwu
  - Unit for Data Science and Computing, North-West University, Potchefstroom, South Africa
- Jean Vincent Fonou-Dombeu
  - School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, South Africa
- Mohanad Mohammed
  - School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, South Africa
- Japie Greeff
  - School of Computer Science and Information Systems, Faculty of Natural and Agricultural Sciences, North-West University, Vanderbijlpark, South Africa
- Murtada K Elbashir
  - Department of Information Systems, College of Computer and Information Sciences, Jouf University, 72388, Sakaka, Saudi Arabia
2
Kamalov F, Sulieman H, Moussa S, Reyes JA, Safaraliev M. Nested ensemble selection: An effective hybrid feature selection method. Heliyon 2023; 9:e19686. [PMID: 37809839 PMCID: PMC10558945 DOI: 10.1016/j.heliyon.2023.e19686] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Revised: 08/29/2023] [Accepted: 08/30/2023] [Indexed: 10/10/2023] Open
Abstract
It has been shown that while feature selection algorithms are able to distinguish between relevant and irrelevant features, they fail to differentiate relevant features from redundant and correlated ones. To address this issue, we propose a highly effective approach, called Nested Ensemble Selection (NES), that is based on a combination of filter and wrapper methods. The proposed feature selection algorithm differs from existing filter-wrapper hybrid methods in its simplicity, efficiency, and precision. The new algorithm is able to separate the relevant variables from the irrelevant as well as the redundant and correlated features. Furthermore, we provide a robust heuristic for identifying the optimal number of selected features, which remains one of the greatest challenges in feature selection. Numerical experiments on synthetic and real-life data demonstrate the effectiveness of the proposed method. The NES algorithm achieves perfect precision on the synthetic data and near-optimal accuracy on the real-life data. The proposed method is compared against several popular algorithms including mRMR, Boruta, genetic, recursive feature elimination, Lasso, and Elastic Net. The results show that NES significantly outperforms the benchmark algorithms, especially on multi-class datasets.
Affiliation(s)
- Firuz Kamalov
  - Department of Electrical Engineering, Canadian University Dubai, Dubai, United Arab Emirates
- Hana Sulieman
  - Department of Mathematics and Statistics, American University of Sharjah, Sharjah, United Arab Emirates
- Sherif Moussa
  - Department of Electrical Engineering, Canadian University Dubai, Dubai, United Arab Emirates
- Jorge Avante Reyes
  - Department of Electrical Engineering, Canadian University Dubai, Dubai, United Arab Emirates
- Murodbek Safaraliev
  - Department of Automated Electrical Systems, Ural Federal University, Yekaterinburg, Russian Federation
3
Alweshah M, Aldabbas Y, Abu-Salih B, Oqeil S, Hasan HS, Alkhalaileh S, Kassaymeh S. Hybrid black widow optimization with iterated greedy algorithm for gene selection problems. Heliyon 2023; 9:e20133. [PMID: 37809602 PMCID: PMC10559925 DOI: 10.1016/j.heliyon.2023.e20133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Revised: 09/03/2023] [Accepted: 09/12/2023] [Indexed: 10/10/2023] Open
Abstract
Gene Selection (GS) is a strategy for reducing redundancy, limited expressiveness, and low informativeness in gene expression datasets obtained by DNA microarray technology. These datasets are high-dimensional, with a large discrepancy between the number of samples and the number of genes. The complexities of GS are especially noticeable in microarray expression data analysis, owing to this inherent data imbalance. The main goal of this study is to offer a simplified and computationally effective approach to attribute selection in microarray gene expression data. To this end, we apply the Black Widow Optimization algorithm (BWO) to GS in two variants: the unaltered BWO and a hybridized BWO combined with the Iterated Greedy algorithm (BWO-IG). By improving the local search capabilities of BWO, this hybridization aims to promote more efficient gene selection. For empirical validation, a series of tests was carried out on nine benchmark datasets obtained from the gene expression data repository. The results show that the BWO-IG technique performs better than the traditional BWO algorithm. Notably, BWO-IG excels in local search efficiency, making it easier to identify relevant genes and producing more reliable results in terms of accuracy and the degree of gene pruning. Additionally, BWO-IG is compared against five modern wrapper Feature Selection (FS) methods, namely BIMFOHHO, BMFO, BHHO, BCS, and BBA, to put its effectiveness into context. This comparison highlights BWO-IG's advantage in reducing the number of selected genes while also obtaining high classification accuracy.
The key findings were an average classification accuracy of 94.426, average fitness values of 0.061, and an average number of selected genes of 2933.767.
Affiliation(s)
- Mohammed Alweshah
  - Prince Abdullah Bin Ghazi Faculty of Information and Communication Technology, Al-Balqa Applied University, Al-Salt, Jordan
- Yasmeen Aldabbas
  - Prince Abdullah Bin Ghazi Faculty of Information and Communication Technology, Al-Balqa Applied University, Al-Salt, Jordan
- Bilal Abu-Salih
  - Department of Computer Science, King Abdullah II School of Information Technology, The University of Jordan, Amman, Jordan
- Saleh Oqeil
  - Prince Abdullah Bin Ghazi Faculty of Information and Communication Technology, Al-Balqa Applied University, Al-Salt, Jordan
- Hazem S. Hasan
  - Department of Plant Production and Protection, Faculty of Agricultural Technology, Al-Balqa Applied University, Al-Salt, Jordan
- Saleh Alkhalaileh
  - Prince Abdullah Bin Ghazi Faculty of Information and Communication Technology, Al-Balqa Applied University, Al-Salt, Jordan
- Sofian Kassaymeh
  - Software Engineering Department, Faculty of Information Technology, Aqaba University of Technology, Aqaba, Jordan
4
A new hybrid algorithm for three-stage gene selection based on whale optimization. Sci Rep 2023; 13:3783. [PMID: 36882446 PMCID: PMC9992521 DOI: 10.1038/s41598-023-30862-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Accepted: 03/02/2023] [Indexed: 03/09/2023] Open
Abstract
In biomedical data mining, the gene dimension is often much larger than the sample size. To address this problem, a feature selection algorithm is needed to select feature gene subsets that correlate strongly with the phenotype, ensuring the accuracy of subsequent analysis. This paper presents a new three-stage hybrid feature gene selection method that combines a variance filter, an extremely randomized tree, and the whale optimization algorithm. First, a variance filter is used to reduce the dimension of the feature gene space, and an extremely randomized tree is used to further reduce the feature gene set. Finally, the whale optimization algorithm is used to select the optimal feature gene subset. We evaluate the proposed method with three different classifiers on seven published gene expression profile datasets and compare it with other advanced feature selection algorithms. The results show that the proposed method has significant advantages across a variety of evaluation indicators.
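The first stage described in this abstract, the variance filter, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `variance_filter`, the use of population variance, and the threshold value are hypothetical choices, and the later stages (extremely randomized tree, whale optimization) are not shown.

```python
from statistics import pvariance
from typing import List, Sequence


def variance_filter(rows: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Return indices of columns (genes) whose population variance exceeds threshold."""
    n_genes = len(rows[0])
    kept = []
    for j in range(n_genes):
        # Collect the expression values of gene j across all samples.
        column = [row[j] for row in rows]
        if pvariance(column) > threshold:
            kept.append(j)
    return kept


# Toy expression matrix: 4 samples x 3 genes.
expression = [
    [1.0, 5.0, 0.0],
    [1.0, 7.0, 0.1],
    [1.0, 5.0, 0.0],
    [1.0, 7.0, 0.1],
]
selected = variance_filter(expression, threshold=0.01)
```

Genes that barely vary across samples cannot separate phenotypes, so removing them first shrinks the search space that the more expensive stages have to explore.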
5
Abbasi Mesrabadi H, Faez K, Pirgazi J. Drug-target interaction prediction based on protein features, using wrapper feature selection. Sci Rep 2023; 13:3594. [PMID: 36869062 PMCID: PMC9984486 DOI: 10.1038/s41598-023-30026-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 11/19/2022] [Accepted: 02/14/2023] [Indexed: 03/05/2023] Open
Abstract
Drug-target interaction prediction is a vital stage in drug development, and many methods have been proposed for it. Experimental methods that identify these relationships on the basis of clinical remedies are time-consuming, costly, laborious, and complex, which introduces many challenges. Computational methods are a newer alternative. More accurate computational methods can be preferable to experimental methods in terms of total cost and time. In this paper, a new computational model to predict drug-target interaction (DTI) is proposed, consisting of three phases: feature extraction, feature selection, and classification. In the feature extraction phase, different features such as EAAC and PSSM are extracted from protein sequences, and fingerprint features are extracted from drugs. These extracted features are then combined. In the next step, because of the large amount of extracted data, a wrapper feature selection method named IWSSR is applied. The selected features are then given to a rotation forest classifier for more efficient prediction. The innovation of our work is that we extract different features and then select among them using IWSSR. The accuracy of the rotation forest classifier, based on tenfold cross-validation on the gold standard datasets (enzyme, ion channels, G-protein-coupled receptors, nuclear receptors), is 98.12, 98.07, 96.82, and 95.64, respectively. The experimental results indicate that the proposed model has an acceptable rate in DTI prediction and is compatible with the methods proposed in other papers.
Affiliation(s)
- Hengame Abbasi Mesrabadi
  - Faculty of Computer and Information Technology Engineering, Qazvin Branch, Islamic Azad University, Qazvin, Iran
- Karim Faez
  - Department of Electrical Engineering, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran
- Jamshid Pirgazi
  - Department of Computer Engineering, University of Science and Technology of Mazandaran, Behshahr, Iran
6
Marjit S, Bhattacharyya T, Chatterjee B, Sarkar R. Simulated annealing aided genetic algorithm for gene selection from microarray data. Comput Biol Med 2023; 158:106854. [PMID: 37023541 DOI: 10.1016/j.compbiomed.2023.106854] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/07/2022] [Revised: 02/26/2023] [Accepted: 03/30/2023] [Indexed: 04/03/2023]
Abstract
In recent times, microarray gene expression datasets have gained significant popularity due to their usefulness in identifying different types of cancer directly through bio-markers. These datasets possess a high gene-to-sample ratio and high dimensionality, with only a few genes functioning as bio-markers. Consequently, a significant amount of data is redundant, and it is essential to filter out important genes carefully. In this paper, we propose the Simulated Annealing aided Genetic Algorithm (SAGA), a meta-heuristic approach to identify informative genes from high-dimensional datasets. SAGA utilizes a two-way mutation-based Simulated Annealing (SA) as well as a Genetic Algorithm (GA) to ensure a good trade-off between exploitation and exploration of the search space, respectively. The naive version of GA often gets stuck in a local optimum and depends on the initial population, leading to premature convergence. To address this, we have blended a clustering-based population generation with SA to distribute the initial population of GA over the entire feature space. To further enhance the performance, we reduce the initial search space by a score-based filter approach called the Mutually Informed Correlation Coefficient (MICC). The proposed method is evaluated on 6 microarray and 6 omics datasets. Comparison of SAGA with contemporary algorithms has shown that SAGA performs much better than its peers. Our code is available at https://github.com/shyammarjit/SAGA.
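To make the simulated-annealing component of this abstract more concrete, here is a minimal Python sketch of one annealing step on a binary gene mask. It uses the standard Metropolis acceptance rule with a single-bit flip, which is a simplification and not the two-way mutation that SAGA describes; all function names, the toy cost, and the cooling rate are assumptions for illustration and are not taken from the SAGA repository.

```python
import math
import random
from typing import Callable, List


def acceptance_probability(delta: float, temperature: float) -> float:
    """Metropolis rule: always accept improvements, sometimes accept worse moves."""
    if delta <= 0:
        return 1.0
    return math.exp(-delta / temperature)


def flip_one_gene(mask: List[int], rng: random.Random) -> List[int]:
    """Return a copy of the binary gene mask with one randomly chosen bit flipped."""
    neighbour = list(mask)
    idx = rng.randrange(len(neighbour))
    neighbour[idx] = 1 - neighbour[idx]
    return neighbour


def anneal_step(mask: List[int], cost: Callable[[List[int]], float],
                temperature: float, rng: random.Random) -> List[int]:
    """One SA step on a gene mask; cost is any function to minimise."""
    candidate = flip_one_gene(mask, rng)
    delta = cost(candidate) - cost(mask)
    if rng.random() < acceptance_probability(delta, temperature):
        return candidate
    return mask


# Toy run: the (hypothetical) cost is the Hamming distance to a target mask.
rng = random.Random(0)
target = [1, 0, 1, 0]


def toy_cost(mask: List[int]) -> int:
    return sum(a != b for a, b in zip(mask, target))


mask = [0, 0, 0, 0]
temperature = 1.0
for _ in range(200):
    mask = anneal_step(mask, toy_cost, temperature, rng)
    temperature *= 0.95  # geometric cooling, an arbitrary choice for this sketch
```

At high temperature the search accepts many worsening moves and explores widely; as the temperature falls it behaves increasingly like greedy local search. In a real gene selection setting the cost would typically come from a classifier's validation error rather than a known target mask.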
7
Liu J, Feng H, Tang Y, Zhang L, Qu C, Zeng X, Peng X. A novel hybrid algorithm based on Harris Hawks for tumor feature gene selection. PeerJ Comput Sci 2023; 9:e1229. [PMID: 37346505 PMCID: PMC10280456 DOI: 10.7717/peerj-cs.1229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2022] [Accepted: 01/09/2023] [Indexed: 06/23/2023]
Abstract
Background: Gene expression data are often used to classify cancer genes. In such high-dimensional datasets, however, only a few feature genes are closely related to tumors. Therefore, it is important to accurately select a subset of feature genes with high contributions to cancer classification. Methods: In this article, a new three-stage hybrid gene selection method is proposed that combines a variance filter, an extremely randomized tree, and Harris Hawks (VEH). In the first stage, we evaluated each gene in the dataset through the variance filter and selected the feature genes that met the variance threshold. In the second stage, we used an extremely randomized tree to further eliminate irrelevant genes. Finally, we used the Harris Hawks algorithm to select the optimal feature gene subset from the genes retained by the previous two stages. Results: We evaluated the proposed method using three different classifiers on eight published microarray gene expression datasets. The results showed a 100% classification accuracy for VEH in gastric cancer, acute lymphoblastic leukemia and ovarian cancer, and an average classification accuracy of 95.33% across a variety of other cancers. Compared with other advanced feature selection algorithms, VEH has obvious advantages when measured by many evaluation criteria.
Affiliation(s)
- Junjian Liu
  - Department of Statistics, Hunan Normal University College of Mathematics and Statistics, Changsha, Hunan, China
- Huicong Feng
  - Department of Pathology and Pathophysiology, Hunan Normal University School of Medicine, Changsha, Hunan, China
- Yifan Tang
  - Department of Pathology and Pathophysiology, Hunan Normal University School of Medicine, Changsha, Hunan, China
- Lupeng Zhang
  - Department of Biochemistry and Molecular Biology, Jishou University School of Medicine, Jishou, Hunan, China
- Chiwen Qu
  - Department of Statistics, Hunan Normal University College of Mathematics and Statistics, Changsha, Hunan, China
- Xiaomin Zeng
  - Department of Epidemiology and Health Statistics, Xiangya Public Health School, Central South University, Changsha, Hunan, China
- Xiaoning Peng
  - Department of Statistics, Hunan Normal University College of Mathematics and Statistics, Changsha, Hunan, China
  - Department of Pathology and Pathophysiology, Hunan Normal University School of Medicine, Changsha, Hunan, China
8
Sun J, Liu Q, Wang Y, Wang L, Song X, Zhao X. Five-year prognosis model of esophageal cancer based on genetic algorithm improved deep neural network. Ing Rech Biomed 2023. [DOI: 10.1016/j.irbm.2022.100748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2023]
9
Zhu Y, Li W, Li T. A hybrid Artificial Immune optimization for high-dimensional feature selection. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.110111] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
10
Abdelwahed NM, El-Tawel GS, Makhlouf MA. Effective hybrid feature selection using different bootstrap enhances cancers classification performance. BioData Min 2022; 15:24. [PMID: 36175944 PMCID: PMC9523996 DOI: 10.1186/s13040-022-00304-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2022] [Accepted: 08/31/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Machine learning can be used to predict the onset of different human cancers. High-dimensional data pose many complicated problems, including an excessive number of genes, over-fitting, long fitting time, and reduced classification accuracy. Recursive Feature Elimination (RFE) is a wrapper method for selecting the subset of features that yields the best accuracy. Despite the high performance of RFE, computation time and over-fitting are two disadvantages of this algorithm. Random forest for selection (RFS) has proven effective in selecting relevant features and mitigating over-fitting. METHOD: This paper proposes a method, positions first bootstrap step (PFBS) random forest selection recursive feature elimination (RFS-RFE), abbreviated PFBS-RFS-RFE, to enhance cancer classification performance. It applies a bootstrap at one of several positions: the outer first bootstrap step (OFBS), the inner first bootstrap step (IFBS), and the outer/inner first bootstrap step (O/IFBS). In the first position, OFBS is applied as a resampling method (bootstrap) with replacement before the selection step. The RFS is applied with bootstrap = false, i.e., the whole dataset is used to build each tree. The important features are then combined with RFE to select the most relevant subset of features. In the second position, IFBS is applied as a resampling method (bootstrap) with replacement during RFS. The important features are again combined with RFE. In the third position, O/IFBS is applied as a hybrid of the first and second positions. RFE uses logistic regression (LR) as its estimator. The proposed methods are combined with four classifiers to solve the feature selection problem and improve the performance of RFE, and five datasets of different sizes are used to assess the performance of PFBS-RFS-RFE.
RESULTS: The results showed that O/IFBS-RFS-RFE achieved the best performance compared with previous work, bringing the accuracy, variance, and ROC area to 99.994%, 0.0000004, and 1.000 for the RNA gene dataset and to 100.000%, 0.0, and 1.000 for the dermatology erythemato-squamous diseases dataset. CONCLUSION: High-dimensional datasets and the RFE algorithm face many difficulties in cancer classification. PFBS-RFS-RFE is proposed to address these difficulties using different bootstrap positions. The important features extracted by RFS are used with RFE to obtain the most effective features.
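The elimination loop at the core of RFE can be sketched as follows. This is a dependency-free illustration, not the authors' pipeline: the abstract states that RFE uses logistic regression as its estimator, whereas this sketch swaps in the absolute Pearson correlation with the label as a stand-in importance score. The function names and toy data are hypothetical, and the bootstrap and random forest steps are not shown.

```python
from typing import Callable, List, Sequence

Matrix = Sequence[Sequence[float]]


def abs_correlation_scores(X: Matrix, y: Sequence[float]) -> List[float]:
    """Stand-in importance: |Pearson correlation| between each column and y."""
    n = len(y)
    mean_y = sum(y) / n
    var_y = sum((v - mean_y) ** 2 for v in y)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean_c = sum(col) / n
        var_c = sum((v - mean_c) ** 2 for v in col)
        cov = sum((c - mean_c) * (v - mean_y) for c, v in zip(col, y))
        denom = (var_c * var_y) ** 0.5
        # Constant columns have zero variance and get a score of 0.
        scores.append(abs(cov / denom) if denom > 0 else 0.0)
    return scores


def recursive_feature_elimination(
    X: Matrix,
    y: Sequence[float],
    n_keep: int,
    scorer: Callable[[Matrix, Sequence[float]], List[float]] = abs_correlation_scores,
) -> List[int]:
    """Drop the lowest-scoring feature until n_keep features remain."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        reduced = [[row[j] for j in remaining] for row in X]
        scores = scorer(reduced, y)  # re-score on the current feature subset
        worst = min(range(len(scores)), key=lambda k: scores[k])
        del remaining[worst]
    return remaining


# Toy data: column 0 equals the label, column 1 is unrelated, column 2 is constant.
X_toy = [
    [0.0, 1.0, 5.0],
    [1.0, 0.0, 5.0],
    [0.0, 0.0, 5.0],
    [1.0, 1.0, 5.0],
]
y_toy = [0, 1, 0, 1]
kept = recursive_feature_elimination(X_toy, y_toy, n_keep=1)
```

With a univariate scorer like this one the ranking does not change between rounds; the re-scoring step matters when the scorer is a fitted model (such as logistic regression coefficients), because importances shift as features are removed.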
Affiliation(s)
- Noura Mohammed Abdelwahed
  - Department of Information Systems, Faculty of Computers and Informatics, Suez Canal University, Ismailia, Egypt
- Gh S El-Tawel
  - Department of Computer Science, Faculty of Computers and Informatics, Suez Canal University, Ismailia, Egypt
- M A Makhlouf
  - Department of Information Systems, Faculty of Computers and Informatics, Suez Canal University, Ismailia, Egypt
11
Liu Y, Heidari AA, Cai Z, Liang G, Chen H, Pan Z, Alsufyani A, Bourouis S. Simulated annealing-based dynamic step shuffled frog leaping algorithm: Optimal performance design and feature selection. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.06.075] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 01/21/2023]
12
Akinola OO, Ezugwu AE, Agushaka JO, Zitar RA, Abualigah L. Multiclass feature selection with metaheuristic optimization algorithms: a review. Neural Comput Appl 2022; 34:19751-19790. [PMID: 36060097 PMCID: PMC9424068 DOI: 10.1007/s00521-022-07705-4] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 01/19/2022] [Accepted: 08/02/2022] [Indexed: 11/24/2022]
Abstract
Selecting relevant feature subsets is vital in machine learning, and multiclass feature selection is harder to perform since most classifications are binary. The feature selection problem aims at reducing the dimension of the feature set while maintaining model accuracy. Datasets can be classified using various methods. Nevertheless, metaheuristic algorithms attract substantial attention for solving different optimization problems. For this reason, this paper presents a systematic literature survey on solving multiclass feature selection problems with metaheuristic algorithms that can help classifiers select optimal or near-optimal features faster and more accurately. Metaheuristic algorithms are presented in four primary behavior-based categories, i.e., evolutionary-based, swarm-intelligence-based, physics-based, and human-based, even though some works in the literature present more categories. Further, lists of metaheuristic algorithms are given for each of these categories. Only articles from 2000 to 2022 on metaheuristic algorithms used for multiclass feature selection problems were reviewed, with attention to their categories and detailed descriptions. We considered some application areas for some of the metaheuristic algorithms applied to multiclass feature selection, along with their variations. Popular multiclass classifiers for feature selection were also examined. Moreover, we presented the challenges of metaheuristic algorithms for feature selection and identified gaps for further research.
Affiliation(s)
- Olatunji O. Akinola
  - School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, KwaZulu-Natal, South Africa
- Absalom E. Ezugwu
  - School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, KwaZulu-Natal, South Africa
- Jeffrey O. Agushaka
  - School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, KwaZulu-Natal, South Africa
- Raed Abu Zitar
  - Sorbonne Center of Artificial Intelligence, Sorbonne University-Abu Dhabi, 38044 Abu Dhabi, United Arab Emirates
- Laith Abualigah
  - Hourani Center for Applied Scientific Research, Al-Ahliyya Amman University, Amman, 19328, Jordan
  - Faculty of Information Technology, Middle East University, Amman, 11831, Jordan
13
Ebiaredoh-Mienye SA, Swart TG, Esenogho E, Mienye ID. A Machine Learning Method with Filter-Based Feature Selection for Improved Prediction of Chronic Kidney Disease. Bioengineering (Basel) 2022; 9:350. [PMID: 36004875 PMCID: PMC9405039 DOI: 10.3390/bioengineering9080350] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 05/31/2022] [Revised: 07/06/2022] [Accepted: 07/21/2022] [Indexed: 11/25/2022] Open
Abstract
The high prevalence of chronic kidney disease (CKD) is a significant public health concern globally. The condition has a high mortality rate, especially in developing countries. CKD often goes undetected since there are no obvious early-stage symptoms. Meanwhile, early detection and timely clinical intervention are necessary to slow disease progression. Machine learning (ML) models can provide an efficient and cost-effective computer-aided diagnosis to assist clinicians in achieving early CKD detection. This research proposed an approach to effectively detect CKD by combining the information-gain-based feature selection technique and a cost-sensitive adaptive boosting (AdaBoost) classifier. An approach like this could save CKD screening time and cost since only a few clinical test attributes would be needed for the diagnosis. The proposed approach was benchmarked against recently proposed CKD prediction methods and well-known classifiers. Among these classifiers, the proposed cost-sensitive AdaBoost trained with the reduced feature set achieved the best classification performance with an accuracy, sensitivity, and specificity of 99.8%, 100%, and 99.8%, respectively. Additionally, the experimental results show that the feature selection positively impacted the performance of the various classifiers. The proposed approach has produced an effective predictive model for CKD diagnosis and could be applied to more imbalanced medical datasets for effective disease detection.
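The information-gain criterion named in this abstract can be illustrated with a short Python sketch for discrete features. The function names and the toy labels and feature values below are hypothetical and do not come from the paper's dataset; continuous clinical attributes would first need to be binned, which is not shown.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a label sequence."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Entropy of the labels minus their expected entropy after splitting on feature."""
    total = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy data (made up for illustration).
labels = ["ckd", "ckd", "healthy", "healthy"]
informative = ["high", "high", "low", "low"]  # separates the classes perfectly
uninformative = ["a", "b", "a", "b"]          # carries no class information
gain_informative = information_gain(informative, labels)
gain_uninformative = information_gain(uninformative, labels)
```

A filter method of this kind ranks each attribute by its gain and keeps the top-scoring ones before any classifier is trained, which is what makes it cheap compared with wrapper methods.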
Affiliation(s)
- Sarah A. Ebiaredoh-Mienye
  - Center for Telecommunications, Department of Electrical and Electronic Engineering Science, University of Johannesburg, Johannesburg 2006, South Africa
- Theo G. Swart
  - Center for Telecommunications, Department of Electrical and Electronic Engineering Science, University of Johannesburg, Johannesburg 2006, South Africa
- Ebenezer Esenogho
  - Center for Telecommunications, Department of Electrical and Electronic Engineering Science, University of Johannesburg, Johannesburg 2006, South Africa
- Ibomoiye Domor Mienye
  - Department of Electrical and Electronic Engineering Science, University of Johannesburg, Johannesburg 2006, South Africa
14
Hidayat SN, Julian T, Dharmawan AB, Puspita M, Chandra L, Rohman A, Julia M, Rianjanu A, Nurputra DK, Triyana K, Wasisto HS. Hybrid learning method based on feature clustering and scoring for enhanced COVID-19 breath analysis by an electronic nose. Artif Intell Med 2022; 129:102323. [PMID: 35659391 PMCID: PMC9110307 DOI: 10.1016/j.artmed.2022.102323] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 12/12/2021] [Revised: 05/05/2022] [Accepted: 05/12/2022] [Indexed: 01/31/2023]
Abstract
Breath pattern analysis based on an electronic nose (e-nose), which is a noninvasive, fast, and low-cost method, has been continuously used for detecting human diseases, including the coronavirus disease 2019 (COVID-19). Nevertheless, having big data with several available features is not always beneficial because only a few of them will be relevant and useful to distinguish different breath samples (i.e., positive and negative COVID-19 samples). In this study, we develop a hybrid machine learning-based algorithm combining hierarchical agglomerative clustering analysis and the permutation feature importance method to improve the data analysis of a portable e-nose for COVID-19 detection (GeNose C19). Utilizing this learning approach, we can obtain an effective and optimum feature combination, enabling the number of employed sensors to be halved without degrading the classification model performance. Based on the cross-validation test results on the training data, the hybrid algorithm yields accuracy, sensitivity, and specificity values of (86 ± 3)%, (88 ± 6)%, and (84 ± 6)%, respectively. Meanwhile, for the testing data, a value of 87% is obtained for all three metrics. These results demonstrate the feasibility of using this hybrid filter-wrapper feature-selection method to pave the way for optimizing the GeNose C19 performance.
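The permutation feature importance idea referred to in this abstract can be sketched in plain Python as follows. Only that half of the hybrid method is illustrated; the hierarchical clustering step is not shown. The fixed threshold "model", the sensor readings, and all function names are made up for this example and are not GeNose C19 data or code.

```python
import random
from typing import Callable, List, Sequence


def accuracy(predict: Callable[[List[float]], int],
             X: Sequence[List[float]], y: Sequence[int]) -> float:
    """Fraction of samples for which predict(row) equals the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict: Callable[[List[float]], int],
                           X: Sequence[List[float]], y: Sequence[int],
                           n_repeats: int = 20, seed: int = 0) -> List[float]:
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving every other feature untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, shuffled, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical fixed model that only looks at sensor 0.
def toy_model(row: List[float]) -> int:
    return 1 if row[0] > 0.5 else 0


X_toy = [[0.9, 0.3], [0.8, 0.7], [0.1, 0.2], [0.2, 0.9], [0.7, 0.5], [0.3, 0.4]]
y_toy = [1, 1, 0, 0, 1, 0]
scores = permutation_importance(toy_model, X_toy, y_toy)
```

A sensor whose shuffling leaves accuracy unchanged contributes nothing the model uses, so it is a candidate for removal; this is the kind of reasoning that allows a sensor array to be reduced without hurting classification.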
Collapse
Affiliation(s)
- Shidiq Nur Hidayat
- PT Nanosense Instrument Indonesia, Umbulharjo, Yogyakarta 55167, Indonesia; Department of Physics, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada, Sekip Utara, BLS 21, Yogyakarta 55281, Indonesia
- Trisna Julian
- PT Nanosense Instrument Indonesia, Umbulharjo, Yogyakarta 55167, Indonesia
- Agus Budi Dharmawan
- PT Nanosense Instrument Indonesia, Umbulharjo, Yogyakarta 55167, Indonesia; Faculty of Information Technology, Universitas Tarumanagara, Jl. Letjen S. Parman No. 1, Jakarta 11440, Indonesia
- Mayumi Puspita
- PT Nanosense Instrument Indonesia, Umbulharjo, Yogyakarta 55167, Indonesia
- Lily Chandra
- RS Bhayangkara Polda Daerah Istimewa Yogyakarta, Jl. Raya Solo-Yogyakarta KM. 14, Sleman 55571, Indonesia
- Abdul Rohman
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, Universitas Gadjah Mada, Jl. Farmako Sekip Utara, Yogyakarta 55281, Indonesia
- Madarina Julia
- Department of Child Health, Faculty of Medicine, Public Health and Nursing, Universitas Gadjah Mada, Jl. Farmako Sekip Utara, Yogyakarta 55281, Indonesia
- Aditya Rianjanu
- Department of Materials Engineering, Institut Teknologi Sumatera, Terusan Ryacudu, Way Hui, Jati Agung, Lampung 35365, Indonesia
- Dian Kesumapramudya Nurputra
- Department of Child Health, Faculty of Medicine, Public Health and Nursing, Universitas Gadjah Mada, Jl. Farmako Sekip Utara, Yogyakarta 55281, Indonesia
- Kuwat Triyana
- Department of Physics, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada, Sekip Utara, BLS 21, Yogyakarta 55281, Indonesia; corresponding author
|
15
|
Recognition of cancer mediating biomarkers using rough approximations enabled intuitionistic fuzzy soft sets based similarity measure. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109052] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/12/2023]
|
16
|
A Comparative Analysis of Swarm Intelligence and Evolutionary Algorithms for Feature Selection in SVM-Based Hyperspectral Image Classification. REMOTE SENSING 2022. [DOI: 10.3390/rs14133019] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/17/2023]
Abstract
Feature selection (FS) is vital in hyperspectral image (HSI) classification. It is an NP-hard problem, and Swarm Intelligence and Evolutionary Algorithms (SIEAs) have proved effective in solving it. However, the high dimensionality of HSIs still leads to the inefficient operation of SIEAs. In addition, many SIEAs exist, but few studies have conducted a comparative analysis of them for HSI FS. Thus, our study has two goals: (1) to propose a new filter–wrapper (F–W) framework that can improve the SIEAs’ performance; and (2) to apply ten SIEAs under the F–W framework (F–W–SIEAs) to optimize the support vector machine (SVM) and compare their performance concerning five aspects, namely the accuracy, the number of selected bands, the convergence rate, and the relative runtime. Based on three HSIs (i.e., Indian Pines, Salinas, and Kennedy Space Center (KSC)), we demonstrate how the proposed framework helps improve these SIEAs’ performances. The ten algorithms differ across the five aspects, but some have similar optimization capacities. On average, the F–W–Genetic Algorithm (F–W–GA) and F–W–Grey Wolf Optimizer (F–W–GWO) have the strongest optimization abilities, while the F–W–GWO requires the least runtime among the ten. The F–W–Marine Predators Algorithm (F–W–MPA) is second only to these two and slightly better than F–W–Differential Evolution (F–W–DE). The F–W–Ant Lion Optimizer (F–W–ALO), F–W–I-Ching Divination Evolutionary Algorithm (F–W–IDEA), and F–W–Whale Optimization Algorithm (F–W–WOA) have middling optimization abilities, and F–W–IDEA requires the most runtime. Moreover, the F–W–SIEAs outperform other commonly used FS techniques in overall accuracy, especially in complex scenes.
|
17
|
Sorkhi AG, Pirgazi J, Ghasemi V. A hybrid feature extraction scheme for efficient malonylation site prediction. Sci Rep 2022; 12:5756. [PMID: 35388017 PMCID: PMC8987080 DOI: 10.1038/s41598-022-08555-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2021] [Accepted: 03/07/2022] [Indexed: 11/09/2022] Open
Abstract
Lysine malonylation is one of the most important post-translational modifications (PTMs). It affects the functionality of cells. Predicting malonylation sites in proteins can reveal the mechanisms of cellular functions. Experimental methods are one way to identify these sites, but they are typically costly and time-consuming to implement. Recently, methods based on machine learning have been proposed to tackle this problem. Such methods have been shown to reduce cost and time and to increase accuracy. However, these approaches also have specific shortcomings, including inappropriate feature extraction from protein sequences, high-dimensional features, and inefficient underlying classifiers. A machine learning-based method is proposed in this paper to cope with these problems. In the proposed approach, seven different features are extracted. Then, the extracted features are combined, ranked based on the Fisher's score (F-score), and the most efficient ones are selected. Afterward, malonylation sites are predicted using various classifiers. Simulation results show that the proposed method has acceptable performance compared with some state-of-the-art approaches. In addition, the XGBOOST classifier, built on extracted features such as TFCRF, has a higher prediction rate than the other methods. The codes are publicly available at: https://github.com/jimy2020/Malonylation-site-prediction.
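To illustrate the ranking step described in this abstract, the minimal Python sketch below scores each feature with one common form of the Fisher score (between-class scatter divided by within-class scatter) and keeps the top k. The function names `fisher_scores` and `top_k_features`, the toy data, and the exact score formula are assumptions made for illustration; the authors' own implementation (linked in the abstract) may differ.

```python
from collections import defaultdict


def fisher_scores(X, y):
    """Return one Fisher score per feature column of X (a list of rows)."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)

    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)
        between = 0.0  # class-size-weighted spread of the class means
        within = 0.0   # class-size-weighted variance inside each class
        for rows in groups.values():
            values = [row[j] for row in rows]
            mean_k = sum(values) / len(values)
            var_k = sum((v - mean_k) ** 2 for v in values) / len(values)
            between += len(values) * (mean_k - overall_mean) ** 2
            within += len(values) * var_k
        if within > 0:
            scores.append(between / within)
        else:
            # Zero within-class variance: either a perfect separator or a constant.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def top_k_features(X, y, k):
    """Return the indices of the k highest-scoring features, best first."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
X_demo = [[1.0, 5.0], [1.2, 1.0], [3.0, 5.1], [3.2, 0.9]]
y_demo = [0, 0, 1, 1]
print(top_k_features(X_demo, y_demo, k=1))
```

A filter of this kind is cheap because each feature is scored independently, which is also why it cannot detect redundancy between features; that is usually left to a later wrapper or classifier stage.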
Affiliation(s)
- Ali Ghanbari Sorkhi
- Department of Computer Engineering, University of Science and Technology of Mazandaran, Behshahr, Iran
- Jamshid Pirgazi
- Department of Computer Engineering, University of Science and Technology of Mazandaran, Behshahr, Iran
- Vahid Ghasemi
- Department of Computer Engineering, Faculty of Information Technology, Kermanshah University of Technology, Kermanshah, Iran
|
18
|
Bansal SR, Wadhawan S, Goel R. mRMR-PSO: A Hybrid Feature Selection Technique with a Multiobjective Approach for Sign Language Recognition. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING 2022. [DOI: 10.1007/s13369-021-06456-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
19
|
A Modified Memetic Algorithm with an Application to Gene Selection in a Sheep Body Weight Study. Animals (Basel) 2022; 12:ani12020201. [PMID: 35049823 PMCID: PMC8772977 DOI: 10.3390/ani12020201] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/26/2021] [Revised: 01/06/2022] [Accepted: 01/14/2022] [Indexed: 02/04/2023] Open
Abstract
Simple Summary: Because it lacks exploitation capability, the traditional genetic algorithm (GA) cannot accurately identify the minimal best gene subset. An improved splicing method is therefore introduced into the GA to enhance its exploitation capability and to balance exploitation and exploration. The method can effectively identify true gene subsets with high probability. Furthermore, a dataset of the body weight of Hu sheep has been used to show that the proposed method can obtain a better minimal subset of genes within a few iterations, compared with all considered algorithms, including the genetic algorithm and the adaptive best-subset selection algorithm.
Abstract: Selecting the minimal best subset out of a huge number of factors influencing the response is a fundamental and very challenging NP-hard problem: the presence of many redundant genes easily results in over-fitting, missing an important gene can have a more detrimental impact on predictions, and exhaustive search is computationally prohibitive. We propose a modified memetic algorithm (MA) based on an improved splicing method to overcome the problems of the traditional genetic algorithm in exploitation capability and in dimension reduction of the predictor variables. The new algorithm accelerates the search for the minimal best subset of genes by incorporating the improved splicing method into a new local search operator. The improvement is also due to two further novel aspects: (a) updating subsets of genes iteratively until splicing yields no further reduction in the loss function, which increases the probability of selecting the true subsets of genes; and (b) introducing add and del operators based on backward sacrifice into the splicing method to limit the size of gene subsets. Moreover, the mutation operator is replaced by this operator to enhance exploitation capability, and the initial individuals are improved by it to enhance search efficiency. A dataset of the body weight of Hu sheep was used to evaluate the superiority of the modified MA against the genetic algorithm. According to our experimental results, the proposed optimizer can obtain a better minimal subset of genes within a few iterations, compared with all considered algorithms, including the most advanced adaptive best-subset selection algorithm.
|
20
|
Azaiz MA, Bensaber DA. An Efficient Parallel Hybrid Feature Selection Approach for Big Data Analysis. INTERNATIONAL JOURNAL OF SWARM INTELLIGENCE RESEARCH 2022. [DOI: 10.4018/ijsir.308291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Classification algorithms face runtime complexity due to high data dimension, especially in the context of big data. Feature selection (FS) is a technique for reducing dimensions and improving learning performance. In this paper, the authors proposed a hybrid FS algorithm for classification in the context of big data. Firstly, only the most relevant features are selected using symmetric uncertainty (SU) as a measure of correlation. The features are distributed into subsets using Apache Spark to calculate SU between each feature and target class in parallel. Then a Binary PSO (BPSO) algorithm is used to find the optimal FS. The BPSO has limited convergence and restricted inertial weight adjustment, so the authors suggested using a multiple inertia weight strategy to influence the changes in particle motions so that the search process is more varied. Also, the authors proposed a parallel fitness evaluation for particles under Spark to accelerate the algorithm. The results showed that the proposed FS achieved higher classification performance with a smaller size in reasonable time.
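The symmetric uncertainty (SU) filter mentioned in this abstract can be sketched in a few lines of Python. SU between a discrete feature X and the class Y is conventionally defined as 2·I(X;Y) / (H(X) + H(Y)), which lies in [0, 1]. The sketch below runs on a single machine and assumes already-discretized values; the function names and toy vectors are illustrative assumptions, and the Spark-based parallelization described by the authors is not reproduced here.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(x, y) = 2 * I(x; y) / (H(x) + H(y)); 0 when both are constant."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0
    h_xy = entropy(list(zip(x, y)))      # joint entropy H(x, y)
    mutual_info = h_x + h_y - h_xy       # I(x; y)
    return 2.0 * mutual_info / (h_x + h_y)


# Hypothetical toy vectors: one feature copies the class, one is independent of it.
target = [0, 0, 1, 1]
copy_feature = [0, 0, 1, 1]
independent_feature = [0, 1, 0, 1]
print(symmetric_uncertainty(copy_feature, target))
print(symmetric_uncertainty(independent_feature, target))
```

Because each feature's SU with the class is computed independently of the others, the scores can be evaluated in parallel over feature subsets, which is consistent with the distribution strategy the abstract describes.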
|
21
|
Abd-elnaby M, Alfonse M, Roushdy M. A Hybrid Mutual Information-LASSO-Genetic Algorithm Selection Approach for Classifying Breast Cancer. DIGITAL TRANSFORMATION TECHNOLOGY 2022:547-560. [DOI: 10.1007/978-981-16-2275-5_36] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/02/2023]
|
22
|
Ouadfel S, Abd Elaziz M. Efficient high-dimension feature selection based on enhanced equilibrium optimizer. EXPERT SYSTEMS WITH APPLICATIONS 2022; 187:115882. [DOI: 10.1016/j.eswa.2021.115882] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/02/2023]
|
23
|
Bhattacharjee S, Ikromjanov K, Carole KS, Madusanka N, Cho NH, Hwang YB, Sumon RI, Kim HC, Choi HK. Cluster Analysis of Cell Nuclei in H&E-Stained Histological Sections of Prostate Cancer and Classification Based on Traditional and Modern Artificial Intelligence Techniques. Diagnostics (Basel) 2021; 12:diagnostics12010015. [PMID: 35054182 PMCID: PMC8774423 DOI: 10.3390/diagnostics12010015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2021] [Revised: 12/14/2021] [Accepted: 12/20/2021] [Indexed: 11/16/2022] Open
Abstract
Biomarker identification is very important to differentiate the grade groups in the histopathological sections of prostate cancer (PCa). Assessing the cluster of cell nuclei is essential for pathological investigation. In this study, we present a computer-based method for cluster analysis of cell nuclei and apply traditional (i.e., unsupervised) and modern (i.e., supervised) artificial intelligence (AI) techniques to distinguish the grade groups of PCa. Two datasets on PCa were collected to carry out this research. Histopathology samples were obtained from whole slides stained with hematoxylin and eosin (H&E). In this research, state-of-the-art approaches were proposed for color normalization, cell nuclei segmentation, feature selection, and classification. A traditional minimum spanning tree (MST) algorithm was employed to identify the clusters and better capture the proliferation and community structure of cell nuclei. K-medoids clustering and stacked ensemble machine learning (ML) approaches were used to perform traditional and modern AI-based classification. Binary and multiclass classifications were derived to compare the model quality and results between the grades of PCa. Furthermore, a comparative analysis was carried out between traditional and modern AI techniques using different performance metrics (i.e., statistical parameters). Cluster features of the cell nuclei can be useful information for cancer grading. However, further validation of the cluster analysis is required to achieve strong classification results.
Affiliation(s)
- Kobiljon Ikromjanov
- Department of Digital Anti-Aging Healthcare, u-AHRC, Inje University, Gimhae 50834, Korea; (K.I.); (K.S.C.); (Y.-B.H.); (R.I.S.); (H.-C.K.)
- Kouayep Sonia Carole
- Department of Digital Anti-Aging Healthcare, u-AHRC, Inje University, Gimhae 50834, Korea; (K.I.); (K.S.C.); (Y.-B.H.); (R.I.S.); (H.-C.K.)
- Nuwan Madusanka
- School of Computing & IT, Sri Lanka Technological Campus, Paduka 10500, Sri Lanka
- Nam-Hoon Cho
- Department of Pathology, Yonsei University Hospital, Seoul 03722, Korea
- Yeong-Byn Hwang
- Department of Digital Anti-Aging Healthcare, u-AHRC, Inje University, Gimhae 50834, Korea; (K.I.); (K.S.C.); (Y.-B.H.); (R.I.S.); (H.-C.K.)
- Rashadul Islam Sumon
- Department of Digital Anti-Aging Healthcare, u-AHRC, Inje University, Gimhae 50834, Korea; (K.I.); (K.S.C.); (Y.-B.H.); (R.I.S.); (H.-C.K.)
- Hee-Cheol Kim
- Department of Digital Anti-Aging Healthcare, u-AHRC, Inje University, Gimhae 50834, Korea; (K.I.); (K.S.C.); (Y.-B.H.); (R.I.S.); (H.-C.K.)
- Heung-Kook Choi
- Department of Computer Engineering, u-AHRC, Inje University, Gimhae 50834, Korea
- Correspondence: Tel.: +82-10-6733-3437
|
24
|
Maurya R, Pathak VK, Burget R, Dutta MK. Automated detection of bioimages using novel deep feature fusion algorithm and effective high-dimensional feature selection approach. Comput Biol Med 2021; 137:104862. [PMID: 34534793 DOI: 10.1016/j.compbiomed.2021.104862] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2021] [Revised: 08/26/2021] [Accepted: 09/07/2021] [Indexed: 11/30/2022]
Abstract
The classification of bioimages plays an important role in several biological studies, such as subcellular localisation, phenotype identification and other types of histopathological examinations. The objective of the present study was to develop a computer-aided bioimage classification method for the classification of bioimages across nine diverse benchmark datasets. A novel algorithm was developed, which systematically fused the features extracted from nine different convolution neural network architectures. A systematic fusion of features boosts the performance of a classifier but at the cost of the high dimensionality of the fused feature set. Therefore, non-discriminatory and redundant features need to be removed from a high-dimensional fused feature set to improve the classification performance and reduce the time complexity. To achieve this aim, a method based on analysis of variance and evolutionary feature selection was developed to select an optimal set of discriminatory features from the fused feature set. The proposed method was evaluated on nine different benchmark datasets. The experimental results showed that the proposed method achieved superior performance, with a significant reduction in the dimensionality of the fused feature set for most bioimage datasets. The performance of the proposed feature selection method was better than that of some of the most recent and classical methods used for feature selection. Thus, the proposed method was desirable because of its superior performance and high compression ratio, which significantly reduced the computational complexity.
Affiliation(s)
- Ritesh Maurya
- Centre for Advanced Studies, Dr A.P.J. Abdul Kalam Technical University, Lucknow, India.
- Radim Burget
- Department of Telecommunications, Faculty of Electrical Engineering and Communication, BRNO University of Technology, Czech Republic.
- Malay Kishore Dutta
- Centre for Advanced Studies, Dr A.P.J. Abdul Kalam Technical University, Lucknow, India.
|
25
|
Hyperspectral Dimensionality Reduction Based on Inter-Band Redundancy Analysis and Greedy Spectral Selection. REMOTE SENSING 2021. [DOI: 10.3390/rs13183649] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
Abstract
Hyperspectral imaging systems are becoming widely used due to their increasing accessibility and their ability to provide detailed spectral responses based on hundreds of spectral bands. However, the resulting hyperspectral images (HSIs) come at the cost of increased storage requirements, increased computational time to process, and highly redundant data. Thus, dimensionality reduction techniques are necessary to decrease the number of spectral bands while retaining the most useful information. Our contribution is two-fold: First, we propose a filter-based method called interband redundancy analysis (IBRA) based on a collinearity analysis between a band and its neighbors. This analysis helps to remove redundant bands and dramatically reduces the search space. Second, we apply a wrapper-based approach called greedy spectral selection (GSS) to the results of IBRA to select bands based on their information entropy values and train a compact convolutional neural network to evaluate the performance of the current selection. We also propose a feature extraction framework that consists of two main steps: first, it reduces the total number of bands using IBRA; then, it can use any feature extraction method to obtain the desired number of feature channels. We present classification results obtained from our methods and compare them to other dimensionality reduction methods on three hyperspectral image datasets. Additionally, we used the original hyperspectral data cube to simulate the process of using actual filters in a multispectral imager.
|
26
|
Mukherjee K, Colón YJ. Machine learning and descriptor selection for the computational discovery of metal-organic frameworks. MOLECULAR SIMULATION 2021. [DOI: 10.1080/08927022.2021.1916014] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
Affiliation(s)
- Krishnendu Mukherjee
- Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, IN, USA
- Yamil J. Colón
- Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, IN, USA
|
27
|
Chen Y, Song HS, Yang YN, Wang GF. Fault detection in mixture production process based on wavelet packet and support vector machine. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2021. [DOI: 10.3233/jifs-201803] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
Mixture production equipment is widely employed in road construction, and the quality of the produced mixture is the essential factor in ensuring the quality of road construction. To assess the quality of the mixture as it is produced and to overcome the lag of laboratory testing, a new fault detection method for the mixture production process is proposed, based on wavelet packet decomposition (WPD) and a support vector machine (SVM). The proposed scheme includes feature extraction, feature selection, SVM classification, and an optimization algorithm. During feature extraction, a wavelet basis function is used to decompose the real-time aggregate and asphalt mixing data over four layers. The energy computed from the wavelet packet coefficients serves as the extracted feature. During feature selection, a method combining the chi-square test and a wrapper (CSW) is used to select the optimal feature subset from the WPD features. Finally, an SVM trained on the optimal feature subset classifies the various faults. Its parameters are optimized by a differential evolution (DE) algorithm. In the test stage, multiple faults involving different specifications of aggregates and asphalt are detected in the mixture production process. The results demonstrate that (1) the accuracy produced by the CSW method with WPD features is 4.33% higher than that of the PCA method with statistical features; and (2) the SVM classification method optimized by the DE algorithm increases the accuracy of recognizing different types of mixture production faults produced by different equipment. Compared with other available methods, the proposed algorithm shows outstanding detection performance.
Affiliation(s)
- Yan Chen
- School of Information Engineering, Chang’an University, Xi’an, China
- School of Foreign Studies, Chang’an University, Xi’an, China
- Huan-sheng Song
- School of Information Engineering, Chang’an University, Xi’an, China
- Yan-ni Yang
- School of Information Engineering, Chang’an University, Xi’an, China
- Gang-feng Wang
- School of Foreign Studies, Chang’an University, Xi’an, China
|
28
|
A machine learning approach to unmask novel gene signatures and prediction of Alzheimer's disease within different brain regions. Genomics 2021; 113:1778-1789. [PMID: 33878365 DOI: 10.1016/j.ygeno.2021.04.028] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2021] [Accepted: 04/14/2021] [Indexed: 01/11/2023]
Abstract
Alzheimer's disease (AD) is a progressive neurodegenerative disorder whose aetiology is currently unknown. Although numerous studies have attempted to identify the genetic risk factor(s) of AD, the interpretability and/or the prediction accuracies achieved by these studies remained unsatisfactory, reducing their clinical significance. Here, we apply an ensemble of random forest and a regularized regression model (LASSO) to AD-associated microarray datasets from four brain regions (prefrontal cortex, middle temporal gyrus, hippocampus, and entorhinal cortex) to discover novel genetic biomarkers through a machine learning-based feature-selection classification scheme. The proposed scheme identified the optimal and most biologically significant classifiers within each brain region, which achieved by far the highest prediction accuracy of AD in 5-fold cross-validation (99% on average). Interestingly, along with the novel and prominent biomarkers, including CORO1C, SLC25A46, RAE1, ANKIB1, CRLF3, and PDYN, numerous non-coding RNA genes were also observed as discriminators, of which AK057435 and BC037880 are uncharacterized long non-coding RNA genes.
|
29
|
RIFS2D: A two-dimensional version of a randomly restarted incremental feature selection algorithm with an application for detecting low-ranked biomarkers. Comput Biol Med 2021; 133:104405. [PMID: 33930763 DOI: 10.1016/j.compbiomed.2021.104405] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2021] [Revised: 04/13/2021] [Accepted: 04/13/2021] [Indexed: 12/20/2022]
Abstract
The era of big data introduces both opportunities and challenges for biomedical researchers. One of the inherent difficulties in the biomedical research field is to recruit large cohorts of samples, while high-throughput biotechnologies may produce thousands or even millions of features for each sample. Researchers tend to evaluate the individual correlation of each feature with the class label and use the incremental feature selection (IFS) strategy to select the top-ranked features with the best prediction performance. Recent experimental data showed that a subset of consecutively ranked features randomly restarted from a low-ranked feature (an RIFS block) may outperform the subset of top-ranked features. This study proposed a feature selection algorithm, RIFS2D, that integrates multiple RIFS blocks. A comprehensive comparative experiment against IFS, RIFS, and existing feature selection algorithms demonstrated that a subset of low-ranked features may also achieve promising prediction performance. This study suggested that a prediction model with promising performance may be trained on low-ranked features, even when top-ranked features do not achieve satisfactory prediction performance. Further comparative experiments were conducted between RIFS2D and t-tests for the detection of early-stage breast cancer. The data showed that the RIFS2D-recommended features achieved better prediction accuracy and were targeted by more drugs than the t-test top-ranked features.
|
30
|
Pirgazi J, Pirmohammadi A, Shams R. A New Optimal Ensemble Algorithm Based on SVDD Sampling for Imbalanced Data Classification. INT J PATTERN RECOGN 2020. [DOI: 10.1142/s0218001421500208] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
Imbalanced data classification is a hot topic in data mining, and several valuable studies have recently been conducted to overcome certain difficulties in the field. Moreover, approaches based on ensemble classifiers have achieved reasonable results. Despite the success of these works, there are still many unsolved issues, such as disregarding the importance of samples in balancing, determining the proper number of classifiers, and optimizing the weights of base classifiers in the voting stage of ensemble methods. This paper intends to find an admissible solution for these challenges. The solution suggested in this paper applies the support vector data descriptor (SVDD) for sampling both minority and majority classes. After determining the optimal number of base classifiers, the selected samples are utilized to adjust base classifiers. Finally, genetic algorithm optimization is used to find the optimum weight of each base classifier in the voting stage. The proposed method is compared with some existing algorithms. The results of experiments confirm its effectiveness.
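The weighted voting stage described in this abstract can be illustrated with a short Python sketch. It combines the class-probability outputs of several base classifiers using one weight per classifier. The abstract does not say whether the votes are hard or soft, so the use of class probabilities here is an assumption, as are the function name `weighted_soft_vote` and the example weights; in the paper the weights are found by a genetic algorithm, which is not reproduced here.

```python
def weighted_soft_vote(probabilities, weights):
    """Combine per-classifier class probabilities for one sample.

    probabilities: one list of class probabilities per base classifier.
    weights: one non-negative weight per base classifier (not all zero).
    Returns (predicted_class_index, combined_probabilities).
    """
    total = sum(weights)
    n_classes = len(probabilities[0])
    combined = [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=lambda c: combined[c])
    return predicted, combined


# Hypothetical outputs of three base classifiers for a two-class problem.
base_outputs = [[0.9, 0.1], [0.4, 0.6], [0.3, 0.7]]

# Equal weights: the confident first classifier tips the vote to class 0.
print(weighted_soft_vote(base_outputs, [1.0, 1.0, 1.0]))

# Down-weighting the first classifier flips the decision to class 1.
print(weighted_soft_vote(base_outputs, [0.2, 1.0, 1.0]))
```

The example shows why the choice of weights matters: the same base predictions yield different ensemble decisions, so searching the weight vector (for instance with a genetic algorithm scored on validation performance) can change the ensemble's behavior on minority-class samples.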
Affiliation(s)
- Jamshid Pirgazi
- Department of Electrical and Computer Engineering, University of Science and Technology of Mazandaran, Behshahr, Iran
- Reza Shams
- Faculty of Information Technology and Computer Engineering, Shahrood University of Technology, Shahrood, Iran
|