1
Ayub H, Khan MA, Shehryar Ali Naqvi S, Faseeh M, Kim J, Mehmood A, Kim YJ. Unraveling the Potential of Attentive Bi-LSTM for Accurate Obesity Prognosis: Advancing Public Health towards Sustainable Cities. Bioengineering (Basel) 2024; 11:533. [PMID: 38927769 PMCID: PMC11200407 DOI: 10.3390/bioengineering11060533] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2024] [Revised: 05/13/2024] [Accepted: 05/19/2024] [Indexed: 06/28/2024] Open
Abstract
The global prevalence of obesity presents a pressing challenge to public health and healthcare systems, necessitating accurate prediction and understanding for effective prevention and management strategies. This article addresses the need for improved obesity prediction models by conducting a comprehensive analysis of existing machine learning (ML) and deep learning (DL) approaches. This study introduces a novel hybrid model, Attention-based Bi-LSTM (ABi-LSTM), which integrates attention mechanisms with bidirectional Long Short-Term Memory (Bi-LSTM) networks to enhance interpretability and performance in obesity prediction. Our study fills a crucial gap by bridging healthcare and urban planning domains, offering insights into data-driven approaches to promote healthier living within urban environments. The proposed ABi-LSTM model demonstrates exceptional performance, achieving a remarkable accuracy of 96.5% in predicting obesity levels. Comparative analysis showcases its superiority over conventional approaches, with superior precision, recall, and overall classification balance. This study highlights significant advancements in predictive accuracy and positions the ABi-LSTM model as a pioneering solution for accurate obesity prognosis. The implications extend beyond healthcare, offering a precise tool to address the global obesity epidemic and foster sustainable development in smart cities.
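The attention step that the abstract pairs with the Bi-LSTM can be illustrated with a minimal sketch. It assumes the per-timestep hidden states are already computed and uses plain dot-product scoring against a hypothetical learned `query` vector; the paper's actual scoring function and architecture are not given in the abstract, so this is a generic illustration only.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, query):
    """Weight each hidden state by its softmaxed dot product with `query`
    and return the weighted sum (context vector) plus the weights."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return context, weights

# Hypothetical Bi-LSTM outputs for three timesteps (values are illustrative).
hidden_states = [[0.1, 0.3], [0.7, 0.2], [0.4, 0.9]]
query = [1.0, 0.5]  # assumed learned parameter
context, weights = attention_pool(hidden_states, query)
```

The weights sum to one, so they can be read as a per-timestep importance profile, which is the usual basis for the interpretability claim made for attention models.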
Affiliation(s)
- Hina Ayub
- Interdisciplinary Graduate Program in Advance Convergence Technology and Science, Jeju National University, Jeju 63243, Republic of Korea
- Murad-Ali Khan
- Department of Computer Engineering, Jeju National University, Jeju 63243, Republic of Korea
- Syed Shehryar Ali Naqvi
- Department of Electronics Engineering, Jeju National University, Jeju 63243, Republic of Korea
- Muhammad Faseeh
- Department of Electronics Engineering, Jeju National University, Jeju 63243, Republic of Korea
- Jungsuk Kim
- Department of Biomedical Engineering, College of IT Convergence, Gachon University, 1342 Seongnamdaero, Sujeong-gu, Seongnam-si 13120, Republic of Korea
- Asif Mehmood
- Department of Biomedical Engineering, College of IT Convergence, Gachon University, 1342 Seongnamdaero, Sujeong-gu, Seongnam-si 13120, Republic of Korea
- Young-Jin Kim
- Medical Device Development Center, Osong Medical Innovation Foundation, Cheongju 28160, Republic of Korea

2
Yang G, Li W, Xie W, Wang L, Yu K. An improved binary particle swarm optimization algorithm for clinical cancer biomarker identification in microarray data. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2024; 244:107987. [PMID: 38157825 DOI: 10.1016/j.cmpb.2023.107987] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 11/04/2023] [Accepted: 12/16/2023] [Indexed: 01/03/2024]
Abstract
BACKGROUND AND OBJECTIVE: The limited number of samples and high-dimensional features in microarray data make selecting a small number of features for disease diagnosis a challenging problem. Traditional feature selection methods based on evolutionary algorithms struggle to find the optimal set of features in a limited time when dealing with the high-dimensional feature selection problem. New solutions are proposed to solve the above problems. METHODS: In this paper, we propose a hybrid feature selection method (C-IFBPFE) for biomarker identification in microarray data, which combines clustering and improved binary particle swarm optimization while incorporating an embedded feature elimination strategy. Firstly, an adaptive redundant feature judgment method based on correlation clustering is proposed for feature screening to reduce the search space in the subsequent stage. Secondly, we propose an improved flipping probability-based binary particle swarm optimization (IFBPSO), which is better suited to the binary particle swarm optimization problem. Finally, we also design a new feature elimination (FE) strategy embedded in the binary particle swarm optimization algorithm. This strategy gradually removes poorer features during iterations to reduce the number of features and improve accuracy. RESULTS: We compared C-IFBPFE with other published hybrid feature selection methods on eight public datasets and analyzed the impact of each improvement. The proposed method outperforms other current state-of-the-art feature selection methods in terms of accuracy, number of features, sensitivity, and specificity. The ablation study of this method validates the efficacy of each component; in particular, the proposed feature elimination strategy significantly improves the performance of the algorithm. CONCLUSIONS: The hybrid feature selection method proposed in this paper helps address the issue of high-dimensional microarray data with few samples. It can select a small subset of features and achieve high classification accuracy on microarray datasets. Additionally, independent validation of the selected features shows that those chosen by C-IFBPFE have strong correlations with disease phenotypes and can identify important biomarkers from data related to biomedical problems.
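For orientation, the baseline that such methods build on can be sketched as one update step of standard binary particle swarm optimization with a sigmoid transfer function. This is the textbook variant, not the paper's improved flipping-probability rule (IFBPSO), which the abstract does not specify; the parameter values (`inertia`, `c1`, `c2`, `v_max`) are illustrative assumptions.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def bpso_step(position, velocity, personal_best, global_best, rng,
              inertia=0.7, c1=1.5, c2=1.5, v_max=4.0):
    """One standard BPSO update: each bit marks a feature as kept (1) or dropped (0)."""
    new_position, new_velocity = [], []
    for x, v, p, g in zip(position, velocity, personal_best, global_best):
        # Velocity is pulled toward the particle's own best and the swarm's best.
        v = inertia * v + c1 * rng.random() * (p - x) + c2 * rng.random() * (g - x)
        v = max(-v_max, min(v_max, v))  # clamp so the sigmoid does not saturate
        # The sigmoid maps velocity to the probability that the bit is set to 1.
        bit = 1 if rng.random() < sigmoid(v) else 0
        new_velocity.append(v)
        new_position.append(bit)
    return new_position, new_velocity

rng = random.Random(0)
new_position, new_velocity = bpso_step(
    position=[0, 1, 0, 1],
    velocity=[0.0, 0.0, 0.0, 0.0],
    personal_best=[1, 1, 0, 0],
    global_best=[1, 0, 0, 1],
    rng=rng,
)
```

In a full feature-selection loop, each new position would be scored by a classifier trained on the selected columns, and the personal and global bests updated accordingly.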
Affiliation(s)
- Guicheng Yang
- College of Computer Science and Engineering, Northeastern University, Shenyang, 110000, Liaoning, China.
- Wei Li
- Key Laboratory of Intelligent Computing in Medical Image (MIIC), Northeastern University, Ministry of Education, Shenyang, 110000, Liaoning, China; National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Shenyang, 110819, Liaoning, China.
- Weidong Xie
- College of Computer Science and Engineering, Northeastern University, Shenyang, 110000, Liaoning, China.
- Linjie Wang
- College of Computer Science and Engineering, Northeastern University, Shenyang, 110000, Liaoning, China.
- Kun Yu
- College of Medicine and Bioinformation Engineering, Northeastern University, Shenyang, 110819, Liaoning, China.

3
Yaqoob A, Verma NK, Aziz RM. Optimizing Gene Selection and Cancer Classification with Hybrid Sine Cosine and Cuckoo Search Algorithm. J Med Syst 2024; 48:10. [PMID: 38193948 DOI: 10.1007/s10916-023-02031-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Accepted: 12/28/2023] [Indexed: 01/10/2024]
Abstract
Gene expression datasets offer a wide range of information about various biological processes. However, it is difficult to find the important genes among the high-dimensional biological data due to the existence of redundant and unimportant ones. Numerous Feature Selection (FS) techniques have been created to overcome this obstacle. Improving the efficacy and precision of FS methodologies is crucial in order to identify significant genes in complex biological data. In this work, we present a novel approach to gene selection called the Sine Cosine and Cuckoo Search Algorithm (SCACSA). This hybrid method is designed to work with a well-known machine learning classifier, the Support Vector Machine (SVM). Using a dataset on breast cancer, the hybrid gene selection algorithm's performance is carefully assessed and compared to other feature selection methods. To improve the quality of the feature set, we use minimum Redundancy Maximum Relevance (mRMR) as a filtering strategy in the first step. The hybrid SCACSA method is then used to enhance and optimize the gene selection procedure. Lastly, we classify the dataset according to the chosen genes by using the SVM classifier. Given the pivotal role gene selection plays in unraveling complex biological datasets, SCACSA stands out as an invaluable tool for the classification of cancer datasets. The findings help medical practitioners make well-informed decisions about cancer diagnosis and provide them with a valuable tool for navigating the complex world of gene expression data.
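The mRMR filtering step mentioned in the abstract can be sketched as a greedy selection that trades relevance against mean redundancy with the genes already chosen. The sketch assumes relevance and pairwise redundancy scores (for example, mutual information estimates) are precomputed; the numbers below are made up for illustration and the difference form of the criterion is one common choice, not necessarily the one used in the paper.

```python
def mrmr_select(relevance, redundancy, k):
    """Greedy mRMR (difference form): repeatedly pick the feature whose
    relevance minus mean redundancy with the selected set is largest."""
    selected = []
    remaining = list(range(len(relevance)))
    while remaining and len(selected) < k:
        def score(j):
            if not selected:
                return relevance[j]
            mean_redundancy = sum(redundancy[j][s] for s in selected) / len(selected)
            return relevance[j] - mean_redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Illustrative scores: features 0 and 1 are both relevant but nearly duplicates.
relevance = [0.9, 0.85, 0.3]
redundancy = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
chosen = mrmr_select(relevance, redundancy, k=2)
```

Here the second pick is feature 2 rather than the more relevant feature 1, because feature 1 adds little beyond feature 0.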
Affiliation(s)
- Abrar Yaqoob
- School of Advanced Sciences and Languages, VIT Bhopal University, Kothrikalan, Sehore, 466114, India.
- Navneet Kumar Verma
- School of Advanced Sciences and Languages, VIT Bhopal University, Kothrikalan, Sehore, 466114, India
- Rabia Musheer Aziz
- School of Advanced Sciences and Languages, VIT Bhopal University, Kothrikalan, Sehore, 466114, India

4
Gao B, Fan B, Wang J, Wu X, Xin Q. A Method for Optimizing the Dwell Time of Optical Components in Magnetorheological Finishing Based on Particle Swarm Optimization. MICROMACHINES 2023; 15:18. [PMID: 38276846 PMCID: PMC11154564 DOI: 10.3390/mi15010018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Revised: 12/16/2023] [Accepted: 12/18/2023] [Indexed: 01/27/2024]
Abstract
In this paper, a dwell time optimization method based on the particle swarm optimization algorithm is proposed according to the pulse iteration principle in order to achieve high-precision magnetorheological finishing of optical components. The dwell time optimization method explores the optimal solution in the solution space by comparing the accuracy value of the final surface with the set value. In this way, the dwell time optimization method was able to achieve global optimization of the overall dwell times and each dwell time point, ultimately realizing the high-precision processing of a surface. Through the simulation of two Φ156 mm asphaltic mirrors (1# and 2#), the root-mean-square (RMS) and peak-valley (PV) values of 1# converged from the initial values of 169.164 nm and 1161.69 nm to 24.79 nm and 911.53 nm. Similarly, the RMS and PV values of 2# converged from the initial values of 187.27 nm and 1694.05 nm to 31.76 nm and 1045.61 nm. The simulation results showed that compared with the general pulse iteration method, the proposed algorithm could obtain a more accurate dwell time distribution of each point under the condition of almost the same processing time, subsequently acquiring a better convergence surface and reducing mid-spatial error. Finally, the accuracy of the optimization algorithm was verified through experiments. The experimental results demonstrated that the optimized algorithm could be used to perform high-precision surface machining. Overall, this optimization method provides a solution for dwell time calculation in the process of the magnetorheological finishing of optical components.
Affiliation(s)
- Bo Gao
- National Key Laboratory of Optical Field Manipulation Science and Technology, Chengdu 610209, China
- Advanced Manufacturing Center of Optics, Chinese Academy of Sciences, Chengdu 610209, China
- Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu 610209, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Bin Fan
- National Key Laboratory of Optical Field Manipulation Science and Technology, Chengdu 610209, China
- Advanced Manufacturing Center of Optics, Chinese Academy of Sciences, Chengdu 610209, China
- Jia Wang
- National Key Laboratory of Optical Field Manipulation Science and Technology, Chengdu 610209, China
- Advanced Manufacturing Center of Optics, Chinese Academy of Sciences, Chengdu 610209, China
- Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu 610209, China
- Xiang Wu
- National Key Laboratory of Optical Field Manipulation Science and Technology, Chengdu 610209, China
- Advanced Manufacturing Center of Optics, Chinese Academy of Sciences, Chengdu 610209, China
- Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu 610209, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Qiang Xin
- National Key Laboratory of Optical Field Manipulation Science and Technology, Chengdu 610209, China
- Advanced Manufacturing Center of Optics, Chinese Academy of Sciences, Chengdu 610209, China
- Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu 610209, China

5
Guan R, Liu W, Li N, Cui Z, Cai R, Wang Y, Zhao C. Machine learning models based on residue interaction network for ABCG2 transportable compounds recognition. ENVIRONMENTAL POLLUTION (BARKING, ESSEX : 1987) 2023; 337:122620. [PMID: 37769706 DOI: 10.1016/j.envpol.2023.122620] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Revised: 09/03/2023] [Accepted: 09/25/2023] [Indexed: 10/02/2023]
Abstract
ABCG2 is one of the most important proteins in the placental transport of environmental substances, so identifying the molecules it transports is a key step in assessing the risk of placental exposure to environmental chemicals. Here, a residue interaction network (RIN) was used to explore the differences in ABCG2 binding conformations between transportable and non-transportable compounds. The RINs were treated as a special kind of quantitative data on protein conformation, which not only reflected the changes in the conformation of single amino acids in the protein, but also indicated the changes in distance and interaction type between amino acids. Based on the quantitative RIN, four machine learning algorithms were applied to establish classification and recognition models for 1100 compounds with the potential to be transported by ABCG2. The random forest (RF) models constructed with RIN showed the best predictive ability, with accuracies of 0.97 on the training set and 0.96 on the test set. In conclusion, the construction of the residue interaction network provided a new perspective for the quantitative characterization of protein conformation and the establishment of prediction models for transporter molecular recognition. The ABCG2 transport molecular recognition model based on the residue interaction network provides a possible way of screening environmental chemicals transported across the placenta.
Affiliation(s)
- Ruining Guan
- School of Pharmacy, Lanzhou University, Lanzhou, 730000, China
- Wencheng Liu
- School of Pharmacy, Lanzhou University, Lanzhou, 730000, China
- Ningqi Li
- School of Pharmacy, Lanzhou University, Lanzhou, 730000, China
- Zeyang Cui
- School of Information Science & Engineering, Lanzhou University, Lanzhou, 730000, China
- Ruitong Cai
- School of Pharmacy, Lanzhou University, Lanzhou, 730000, China
- Yawei Wang
- Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing, 100085, China
- Chunyan Zhao
- School of Pharmacy, Lanzhou University, Lanzhou, 730000, China.

6
Pacheco J, Saiz O, Casado S, Ubillos S. A multistart tabu search-based method for feature selection in medical applications. Sci Rep 2023; 13:17140. [PMID: 37816874 PMCID: PMC10564765 DOI: 10.1038/s41598-023-44437-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Accepted: 10/08/2023] [Indexed: 10/12/2023] Open
Abstract
In the design of classification models, irrelevant or noisy features are often generated. In some cases, there may even be negative interactions among features. These weaknesses can degrade the performance of the models. Feature selection is a task that searches for a small subset of relevant features from the original set that generate the most efficient models possible. In addition to improving the efficiency of the models, feature selection confers other advantages, such as greater ease in the generation of the necessary data as well as clearer and more interpretable models. In the case of medical applications, feature selection may help to distinguish which characteristics, habits, and factors have the greatest impact on the onset of diseases. However, feature selection is a complex task due to the large number of possible solutions. In the last few years, methods based on different metaheuristic strategies, mainly evolutionary algorithms, have been proposed. The motivation of this work is to develop a method that outperforms previous methods, with the benefits that this implies especially in the medical field. More precisely, the present study proposes a simple method based on tabu search and multistart techniques. The proposed method was analyzed and compared to other methods by testing their performance on several medical databases. Specifically, eight databases from the well-known repository of the University of California in Irvine and one of our own design were used. In these computational tests, the proposed method outperformed other recent methods as gauged by various metrics and classifiers. The analyses were accompanied by statistical tests, the results of which showed that the superiority of our method is significant and therefore strengthened these conclusions. In short, the contribution of this work is the development of a method that, on the one hand, is based on different strategies than those used in recent methods, and on the other hand, improves the performance of these methods.

7
Houssein EH, Samee NA, Mahmoud NF, Hussain K. Dynamic Coati Optimization Algorithm for Biomedical Classification Tasks. Comput Biol Med 2023; 164:107237. [PMID: 37467535 DOI: 10.1016/j.compbiomed.2023.107237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Revised: 06/13/2023] [Accepted: 07/07/2023] [Indexed: 07/21/2023]
Abstract
Medical datasets are primarily made up of numerous pointless and redundant elements in a collection of patient records. None of these characteristics are necessary for a medical decision-making process. Conversely, a large amount of data leads to increased dimensionality and decreased classifier performance in terms of machine learning. Numerous approaches have recently been put out to address this issue, and the results indicate that feature selection can be a successful remedy. To meet the various needs of input patterns, medical diagnostic tasks typically involve learning a suitable categorization model. The k-Nearest Neighbors algorithm (kNN) classifier's classification performance is typically decreased by the input variables' abundance of irrelevant features. To simplify the kNN classifier, essential attributes of the input variables have been searched using the feature selection approach. This paper presents the Coati Optimization Algorithm (DCOA) in a dynamic form as a feature selection technique where each iteration of the optimization process involves the introduction of a different feature. We enhance the exploration and exploitation capability of DCOA by employing dynamic opposing candidate solutions. The most impressive feature of DCOA is that it does not require any preparatory parameter fine-tuning to the most popular metaheuristic algorithms. The CEC'22 test suite and nine medical datasets with various dimension sizes were used to evaluate the performance of the original COA and the proposed dynamic version. The statistical results were validated using the Bonferroni-Dunn test and Kendall's W test and showed the superiority of DCOA over seven well-known metaheuristic algorithms with an overall accuracy of 89.7%, a feature selection of 24%, a sensitivity of 93.35%, a specificity of 96.81%, and a precision of 93.90%.
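The idea of "dynamic opposing candidate solutions" can be illustrated with a generic opposition-based learning sketch. It assumes the common formulation in which the opposite of a candidate is its reflection within per-dimension bounds, and in which the "dynamic" variant takes those bounds from the current population rather than the fixed search range; the abstract does not state DCOA's exact rule, so both function names and the formulation are assumptions.

```python
def opposite(candidate, lower, upper):
    """Reflect a candidate inside [lower, upper] in every dimension."""
    return [lo + hi - x for x, lo, hi in zip(candidate, lower, upper)]

def dynamic_bounds(population):
    """Per-dimension min and max over the current population."""
    dims = range(len(population[0]))
    lower = [min(ind[d] for ind in population) for d in dims]
    upper = [max(ind[d] for ind in population) for d in dims]
    return lower, upper

# Illustrative population of three 2-dimensional candidates.
population = [[0.2, 0.9], [0.6, 0.1], [0.4, 0.5]]
lower, upper = dynamic_bounds(population)
opposites = [opposite(ind, lower, upper) for ind in population]
```

A typical use is to evaluate both each candidate and its opposite and keep the fitter of the two, which widens exploration at the cost of extra fitness evaluations.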
Affiliation(s)
- Essam H Houssein
- Faculty of Computers and Information, Minia University, Minia, Egypt.
- Nagwan Abdel Samee
- Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia.
- Noha F Mahmoud
- Rehabilitation Sciences Department, Health and Rehabilitation Sciences College, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia.
- Kashif Hussain
- Department of Science and Engineering, Solent University, East Park Terrace, Southampton, SO14 0YN, United Kingdom.

8
Anđelić N, Baressi Šegota S. Development of Symbolic Expressions Ensemble for Breast Cancer Type Classification Using Genetic Programming Symbolic Classifier and Decision Tree Classifier. Cancers (Basel) 2023; 15:3411. [PMID: 37444522 DOI: 10.3390/cancers15133411] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Revised: 06/20/2023] [Accepted: 06/26/2023] [Indexed: 07/15/2023] Open
Abstract
Breast cancer is a type of cancer with several sub-types. It occurs when cells in breast tissue grow out of control. The accurate sub-type classification of a patient diagnosed with breast cancer is mandatory for the application of proper treatment. Breast cancer classification based on gene expression is challenging even for artificial intelligence (AI) due to the large number of gene expressions. The idea in this paper is to utilize the genetic programming symbolic classifier (GPSC) on the publicly available dataset to obtain a set of symbolic expressions (SEs) that can classify the breast cancer sub-type using gene expressions with high classification accuracy. The initial problem with the used dataset is a large number of input variables (54,676 gene expressions), a small number of dataset samples (151 samples), and six classes of breast cancer sub-types that are highly imbalanced. The large number of input variables is solved with principal component analysis (PCA), while the small number of samples and the large imbalance between class samples are solved with the application of different oversampling methods generating different dataset variations. On each oversampled dataset, the GPSC with random hyperparameter values search (RHVS) method is trained using 5-fold cross validation (5CV) to obtain a set of SEs. The best set of SEs is chosen based on mean values of accuracy (ACC), the area under the receiver operating characteristic curve (AUC), precision, recall, and F1-score values. In this case, the highest classification accuracy is equal to 0.992 across all evaluation metric methods. The best set of SEs is additionally combined with a decision tree classifier, which slightly improves ACC to 0.994.
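The class-balancing step can be illustrated with the simplest oversampling scheme, random duplication of minority-class samples. The abstract mentions "different oversampling methods" without naming them, so this sketch is a stand-in rather than the authors' procedure; the function name and toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict

def random_oversample(samples, labels, rng):
    """Duplicate randomly chosen samples of each smaller class until
    every class has as many samples as the largest one."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    target = max(len(rows) for rows in by_class.values())
    out_x, out_y = [], []
    for y, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for x in rows + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y

# Toy data: three classes with 3, 1, and 2 samples.
samples = [[1.0], [1.1], [0.9], [5.0], [9.0], [9.2]]
labels = ["a", "a", "a", "b", "c", "c"]
balanced_x, balanced_y = random_oversample(samples, labels, random.Random(0))
class_counts = Counter(balanced_y)
```

When oversampling is combined with cross-validation, it is generally applied to the training folds only, since duplicating samples before splitting lets copies of the same sample appear in both training and validation data.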
Affiliation(s)
- Nikola Anđelić
- Department of Automation and Electronics, Faculty of Engineering, University of Rijeka, Vukovarska 58, 51000 Rijeka, Croatia
- Sandi Baressi Šegota
- Department of Automation and Electronics, Faculty of Engineering, University of Rijeka, Vukovarska 58, 51000 Rijeka, Croatia

9
Guo H, Ma J, Wang R, Zhou Y. Feature library-assisted surrogate model for evolutionary wrapper-based feature selection and classification. Appl Soft Comput 2023. [DOI: 10.1016/j.asoc.2023.110241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/30/2023]

10
Zhong J, Xuan W, Lu S, Cui S, Zhou Y, Tang M, Qu X, Lu W, Huo H, Zhang C, Zhang N, Niu B. Discovery of ANO1 Inhibitors based on Machine learning and molecule docking simulation approaches. Eur J Pharm Sci 2023; 184:106408. [PMID: 36842513 DOI: 10.1016/j.ejps.2023.106408] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/13/2022] [Revised: 02/05/2023] [Accepted: 02/19/2023] [Indexed: 02/28/2023]
Abstract
Calcium-activated chloride channels (CaCCs) are chloride channels that are regulated according to intracellular calcium ion concentrations. The channel protein ANO1 is widely present in cells and is involved in physiological activities including cellular secretion, signaling, cell proliferation and vasoconstriction and diastole. In this study, the ANO1 inhibitors were investigated with machine learning and molecular simulation. Two-dimensional structure-activity relationship (2D-SAR) and three-dimensional quantitative structure-activity relationship (3D-QSAR) models were developed for the qualitative and quantitative prediction of ANO1 inhibitors. The results showed that the prediction accuracies of the model were 85.9% and 87.8% for the training and test sets, respectively, and 85.9% and 87.8% for the rotating forest (RF) in the 2D-SAR model. The CoMFA and CoMSIA methods were then used for 3D QSAR modeling of ANO1 inhibitors, respectively. The q2 coefficients for model cross-validation were all greater than 0.5, implying that we were able to obtain a stable model for drug activity prediction. Molecular docking was further used to simulate the interactions between the five most promising compounds predicted by the model and the ANO1 protein. The total score for the docking results between all five compounds and the target protein was greater than 6, indicating that they interacted strongly in the form of hydrogen bonds. Finally, simulations of amino acid mutations around the docking cavity of the target proteins showed that each molecule had two or more sites of reduced affinity following a single mutation, indicating outstanding specificity of the screened drug molecules and their protein ligands.
Affiliation(s)
- Junjie Zhong
- School of life Science, Shanghai University, 99 Shangda Road, 200444, China.
- Wendi Xuan
- School of life Science, Shanghai University, 99 Shangda Road, 200444, China.
- Sheng Lu
- Department of General Surgery, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, PR China.
- Shihao Cui
- School of life Science, Shanghai University, 99 Shangda Road, 200444, China.
- Yuhang Zhou
- School of life Science, Shanghai University, 99 Shangda Road, 200444, China.
- Mengting Tang
- School of life Science, Shanghai University, 99 Shangda Road, 200444, China.
- Xiaosheng Qu
- National Engineering laboratory of Southwest Endangered Medicinal Resources Development, Guangxi Botanical Garden of Medicinal Plants, China.
- Wencong Lu
- Chemistry Department, College of Science, Shanghai University, 99 Shangda Road, 200444, China
- Haizhong Huo
- Department of General Surgery, Shanghai Ninth People's Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai 200011, China.
- Chi Zhang
- Huaxia Eye Hospital of Foshan, Huaxia Eye Hospital Group, Foshan, Guangdong 528000, China.
- Ning Zhang
- Department of Hepatic Surgery, Fudan University Shanghai Cancer Center, Shanghai 200032, China.
- Bing Niu
- School of life Science, Shanghai University, 99 Shangda Road, 200444, China.

11
Sadeghian Z, Akbari E, Nematzadeh H, Motameni H. A review of feature selection methods based on meta-heuristic algorithms. J EXP THEOR ARTIF IN 2023. [DOI: 10.1080/0952813x.2023.2183267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/07/2023]
Affiliation(s)
- Zohre Sadeghian
- Department of Computer Engineering, Sari Branch, Islamic Azad University, Sari, Iran
- Ebrahim Akbari
- Department of Computer Engineering, Sari Branch, Islamic Azad University, Sari, Iran
- Hossein Nematzadeh
- Department of Computer Engineering, Sari Branch, Islamic Azad University, Sari, Iran
- Homayun Motameni
- Department of Computer Engineering, Sari Branch, Islamic Azad University, Sari, Iran

12
Alromema N, Syed AH, Khan T. A Hybrid Machine Learning Approach to Screen Optimal Predictors for the Classification of Primary Breast Tumors from Gene Expression Microarray Data. Diagnostics (Basel) 2023; 13:diagnostics13040708. [PMID: 36832196 PMCID: PMC9955903 DOI: 10.3390/diagnostics13040708] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/05/2023] [Revised: 01/30/2023] [Accepted: 02/07/2023] [Indexed: 02/16/2023] Open
Abstract
The high dimensionality and sparsity of the microarray gene expression data make it challenging to analyze and screen the optimal subset of genes as predictors of breast cancer (BC). The authors in the present study propose a novel hybrid Feature Selection (FS) sequential framework involving minimum Redundancy-Maximum Relevance (mRMR), a two-tailed unpaired t-test, and meta-heuristics to screen the most optimal set of gene biomarkers as predictors for BC. The proposed framework identified a set of three most optimal gene biomarkers, namely, MAPK 1, APOBEC3B, and ENAH. In addition, the state-of-the-art supervised Machine Learning (ML) algorithms, namely Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Neural Net (NN), Naïve Bayes (NB), Decision Tree (DT), eXtreme Gradient Boosting (XGBoost), and Logistic Regression (LR) were used to test the predictive capability of the selected gene biomarkers and select the most effective breast cancer diagnostic model with higher values of performance metrics. Our study found that the XGBoost-based model was the superior performer with an accuracy of 0.976 ± 0.027, an F1-Score of 0.974 ± 0.030, and an AUC value of 0.961 ± 0.035 when tested on an independent test dataset. The screened gene biomarkers-based classification system efficiently detects primary breast tumors from normal breast samples.
Affiliation(s)
- Nashwan Alromema
- Department of Computer Science, Faculty of Computing and Information Technology Rabigh (FCITR), King Abdulaziz University, Jeddah 22254, Saudi Arabia
- Correspondence:
- Asif Hassan Syed
- Department of Computer Science, Faculty of Computing and Information Technology Rabigh (FCITR), King Abdulaziz University, Jeddah 22254, Saudi Arabia
- Tabrej Khan
- Department of Information Systems, Faculty of Computing and Information Technology Rabigh (FCITR), King Abdulaziz University, Jeddah 22254, Saudi Arabia

13
Hybrid Filter and Genetic Algorithm-Based Feature Selection for Improving Cancer Classification in High-Dimensional Microarray Data. Processes (Basel) 2023. [DOI: 10.3390/pr11020562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/16/2023] Open
Abstract
The advancements in intelligent systems have contributed tremendously to the fields of bioinformatics, health, and medicine. Intelligent classification and prediction techniques have been used in studying microarray datasets, which store information about the ways used to express the genes, to assist greatly in diagnosing chronic diseases, such as cancer in its earlier stage, which is important and challenging. However, the high-dimensionality and noisy nature of the microarray data lead to slow performance and low cancer classification accuracy while using machine learning techniques. In this paper, a hybrid filter-genetic feature selection approach has been proposed to solve the high-dimensional microarray datasets problem which ultimately enhances the performance of cancer classification precision. First, the filter feature selection methods including information gain, information gain ratio, and Chi-squared are applied in this study to select the most significant features of cancerous microarray datasets. Then, a genetic algorithm has been employed to further optimize and enhance the selected features in order to improve the proposed method’s capability for cancer classification. To test the proficiency of the proposed scheme, four cancerous microarray datasets were used in the study—this primarily included breast, lung, central nervous system, and brain cancer datasets. The experimental results show that the proposed hybrid filter-genetic feature selection approach achieved better performance of several common machine learning methods in terms of Accuracy, Recall, Precision, and F-measure.
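One of the filter scores named in the abstract, information gain, can be written out for a single discrete feature. The sketch assumes expression values have already been discretized (for example into "low" and "high"), which is an assumption not stated in the abstract; the toy labels are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

labels = [0, 0, 1, 1]
gain_informative = information_gain(["low", "low", "high", "high"], labels)
gain_uninformative = information_gain(["low", "high", "low", "high"], labels)
```

A filter stage would rank all genes by such a score and pass only the top-ranked ones to the genetic algorithm, which shrinks the search space the wrapper has to explore.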
|
14
|
Nekouie N, Romoozi M, Esmaeili M. A New Evolutionary Ensemble Learning of Multimodal Feature Selection from Microarray Data. Neural Process Lett 2023. [DOI: 10.1007/s11063-023-11159-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/15/2023]
|
15
|
Feature selection in high dimensional data: A specific preordonnances-based memetic algorithm. Knowl Based Syst 2023. [DOI: 10.1016/j.knosys.2023.110420] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2023]
|
16
|
Wu Y, Zhu D, Wang X. Tree enhanced deep adaptive network for cancer prediction with high dimension low sample size microarray data. Appl Soft Comput 2023. [DOI: 10.1016/j.asoc.2023.110078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
|
17
|
Senthilkumar D, Reshmy A, Paulraj S. Dimensionality reduction strategy for Multi-Target Regression paradigm. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2023. [DOI: 10.3233/jifs-220412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/07/2023]
Abstract
Multi-Target Regression (MTR) is used to study the relationship between the same set of input variables and multiple continuous target variables simultaneously. The prime issue in MTR is that a dataset with many input and output variables makes building a prediction model computationally complex. Dimensionality reduction with multiple target variables is also a challenging and essential task: it aims to reduce the size of the dataset, shorten analysis time, and remove redundant and irrelevant variables. This paper proposes an efficient feature selection strategy, Multi-Target Feature Subset Selection (MTFSS), for MTR that constructs a unique subset of features by considering multiple targets. Two feature evaluators, correlation and ReliefF, support the MTR dataset without discretization. Furthermore, two new score functions, a weighted mean aggregation strategy and a threshold function, are introduced to identify the significant features. To evaluate the effectiveness of the proposed MTFSS, experiments were carried out on a benchmark dataset. The experimental results demonstrate that the proposed MTFSS selects fewer features and performs better than models built on the original dataset. Also, the correlation-based feature evaluator performs better than ReliefF.
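As a rough illustration of scoring one feature against several targets at once, the sketch below aggregates absolute Pearson correlations with a weighted mean and then applies a threshold. The abstract does not give the exact score functions, so the aggregation formula, the default threshold (the mean score), and all names here are assumptions rather than the MTFSS definitions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def aggregate_scores(features, targets, weights=None):
    """Weighted mean of |correlation| between each feature and every target."""
    weights = weights or [1.0] * len(targets)
    total = sum(weights)
    return [
        sum(w * abs(pearson(f, t)) for w, t in zip(weights, targets)) / total
        for f in features
    ]


def select_features(scores, threshold=None):
    """Keep features whose aggregated score reaches the threshold.

    The default threshold (mean of all scores) is an illustrative choice.
    """
    if threshold is None:
        threshold = sum(scores) / len(scores)
    return [j for j, s in enumerate(scores) if s >= threshold]
```

Because the scores are computed directly on continuous values, no discretization of the targets is needed, which matches the property the abstract highlights for its evaluators.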
Affiliation(s)
- D. Senthilkumar
  - Department of Computer Science and Engineering, University College of Engineering, Anna University, Tiruchirappalli, Tamil Nadu, India
- A.K. Reshmy
  - Department of Computational Intelligence, School of Computing, College of Engineering and Technology, SRM Institute of Science and Technology, Kattankulathur Campus, Chengalpattu, Tamil Nadu, India
- S. Paulraj
  - Department of Mathematics, College of Engineering Guindy Campus, Anna University, Chennai, Tamil Nadu, India
|
18
|
Xie W, Wang L, Yu K, Shi T, Li W. Improved multi-layer binary firefly algorithm for optimizing feature selection and classification of microarray data. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
19
|
Li M, Ke L, Wang L, Deng S, Yu X. A novel hybrid gene selection for tumor identification by combining multifilter integration and a recursive flower pollination search algorithm. Knowl Based Syst 2023. [DOI: 10.1016/j.knosys.2022.110250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/04/2023]
|
20
|
Abd Al Rahman E, Intan Raihana Ruhaiyem N, Bouchahma M, Imran Musa K. Framework for a Computer-Aided Treatment Prediction (CATP) System for Breast Cancer. INTELLIGENT AUTOMATION & SOFT COMPUTING 2023; 36:3007-3028. [DOI: 10.32604/iasc.2023.032580] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/02/2023]
|
21
|
Chamlal H, Ouaderhman T, Aaboub F. A graph based preordonnances theoretic supervised feature selection in high dimensional data. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109899] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
|
22
|
Interaction-based clustering algorithm for feature selection: a multivariate filter approach. INT J MACH LEARN CYB 2022. [DOI: 10.1007/s13042-022-01726-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
|
23
|
Improved swarm-optimization-based filter-wrapper gene selection from microarray data for gene expression tumor classification. Pattern Anal Appl 2022. [DOI: 10.1007/s10044-022-01117-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
|
24
|
Wang Z, Gao S, Zhang Y, Guo L. Symmetric uncertainty-incorporated probabilistic sequence-based ant colony optimization for feature selection in classification. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109874] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
|
25
|
Li Q, Wang P, Yuan J, Zhou Y, Mei Y, Ye M. A two-stage hybrid gene selection algorithm combined with machine learning models to predict the rupture status in intracranial aneurysms. Front Neurosci 2022; 16:1034971. [PMID: 36340761 PMCID: PMC9631203 DOI: 10.3389/fnins.2022.1034971] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/02/2022] [Accepted: 09/30/2022] [Indexed: 07/31/2023] Open
Abstract
An intracranial aneurysm (IA) is an abnormal swelling of cerebral vessels, and a subset of IAs can rupture, causing aneurysmal subarachnoid hemorrhage (aSAH), which often results in death or severe disability. Few studies have combined an appropriate feature selection method with machine learning to analyze transcriptomic sequencing data and identify new molecular biomarkers. Following gene ontology (GO) and enrichment analysis of all 913 differentially expressed genes, we found that the distinct status of IAs could lead to differential innate immune responses. Because many of these genes are irrelevant or redundant, we propose a mixed filter- and wrapper-based feature selection. First, we used the Fast Correlation-Based Filter (FCBF) algorithm to filter out a large number of irrelevant and redundant genes in the raw dataset. We then applied a wrapper feature selection method based on the Multi-layer Perceptron (MLP) neural network and Particle Swarm Optimization (PSO), with accuracy (ACC) and mean square error (MSE) as the evaluation criteria. Finally, we constructed a novel 10-gene signature (YIPF1, RAB32, WDR62, ANPEP, LRRCC1, AADAC, GZMK, WBP2NL, PBX1, and TOR1B) with the proposed two-stage hybrid algorithm FCBF-MLP-PSO and used different machine learning models to predict the rupture status in IAs. The highest ACC value increased from 0.817 to 0.919 (12.5% increase), the highest area under the ROC curve (AUC) value increased from 0.87 to 0.94 (8.0% increase), and all evaluation metrics improved by approximately 10% after processing by our proposed gene selection algorithm. Therefore, these 10 informative genes can complement imaging examinations in the clinic for predicting the rupture status of IAs, and the selected gene signature also provides new targets and approaches for the treatment of ruptured IAs.
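The filter stage named in this abstract, FCBF, ranks features by symmetric uncertainty (SU) and discards a feature when an already kept feature is at least as informative about it as it is about the class. The sketch below shows that idea for discrete features only. It is a simplified illustration with hypothetical names, not the authors' pipeline, and it assumes the expression values have already been discretized.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value between 0 and 1."""
    hx, hy = entropy(x), entropy(y)
    denom = hx + hy
    if denom == 0:
        return 0.0  # both variables are constant
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / denom


def fcbf_style_filter(columns, labels, delta=0.0):
    """Keep relevant features that are not redundant with a stronger kept feature."""
    relevance = [symmetric_uncertainty(col, labels) for col in columns]
    order = sorted((j for j, r in enumerate(relevance) if r > delta),
                   key=lambda j: relevance[j], reverse=True)
    selected = []
    for j in order:
        # Feature j is redundant if some kept feature i satisfies
        # SU(i, j) >= SU(j, class).
        if all(symmetric_uncertainty(columns[i], columns[j]) < relevance[j]
               for i in selected):
            selected.append(j)
    return selected
```

The surviving indices would then be handed to the wrapper stage, which in the abstract is an MLP evaluated inside a PSO search.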
Affiliation(s)
- Qingqing Li
  - School of Medical Information, Wannan Medical College, Wuhu, Anhui, China
  - Research Center of Health Big Data Mining and Applications, Wannan Medical College, Wuhu, Anhui, China
- Peipei Wang
  - School of Medical Information, Wannan Medical College, Wuhu, Anhui, China
  - Research Center of Health Big Data Mining and Applications, Wannan Medical College, Wuhu, Anhui, China
- Jinlong Yuan
  - Department of Neurosurgery, Yijishan Hospital of Wannan Medical College, Wannan Medical College, Wuhu, Anhui, China
- Yunfeng Zhou
  - Department of Radiology, Yijishan Hospital of Wannan Medical College, Wannan Medical College, Wuhu, Anhui, China
- Yaxin Mei
  - School of Medical Information, Wannan Medical College, Wuhu, Anhui, China
  - Research Center of Health Big Data Mining and Applications, Wannan Medical College, Wuhu, Anhui, China
- Mingquan Ye
  - School of Medical Information, Wannan Medical College, Wuhu, Anhui, China
  - Research Center of Health Big Data Mining and Applications, Wannan Medical College, Wuhu, Anhui, China
|
26
|
An Efficient Hybrid Feature Selection Method Using the Artificial Immune Algorithm for High-Dimensional Data. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:1452301. [PMID: 36275946 PMCID: PMC9584659 DOI: 10.1155/2022/1452301] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/11/2022] [Revised: 07/31/2022] [Accepted: 08/29/2022] [Indexed: 12/02/2022]
Abstract
Feature selection provides the optimal subset of features for data mining models. However, current feature selection methods for high-dimensional data still need a better balance between feature subset quality and computational cost. In this paper, an efficient hybrid feature selection method (HFIA) based on artificial immune algorithm optimization is proposed to solve the feature selection problem of high-dimensional data. The algorithm combines filter algorithms with an improved clonal selection algorithm to explore the feature space of high-dimensional data. According to the target requirements of feature selection, and drawing on biological research results, the method introduces a lethal mutation mechanism and the Cauchy operator to improve the search performance of the algorithm. Moreover, an adaptive adjustment factor is introduced in the mutation and update phases of the algorithm. The effective combination of these mechanisms gives the algorithm better search ability and lower computational cost. Experimental comparisons with 19 state-of-the-art feature selection methods are conducted on 25 high-dimensional benchmark datasets. The results show that the feature reduction rate for all datasets is above 99%, and the performance improvement for the classifier is between 5% and 48.33%. Compared with five classical filter feature selection methods, HFIA has a lower computational cost than two of them and is far better than all five in terms of feature reduction rate and classification accuracy improvement. Compared with the 14 hybrid feature selection methods reported in the latest literature, the average winning rates in terms of classification accuracy, feature reduction rate, and computational cost are 85.83%, 88.33%, and 96.67%, respectively.
|
27
|
Devi Priya R, Sivaraj R, Abraham A, Pravin T, Sivasankar P, Anitha N. Multi-Objective Particle Swarm Optimization Based Preprocessing of Multi-Class Extremely Imbalanced Datasets. INT J UNCERTAIN FUZZ 2022. [DOI: 10.1142/s0218488522500209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Today’s datasets are usually very large with many features, and analyzing such datasets is a tedious task. Especially in classification, selecting the attributes that are salient for the process is a demanding task. It is more difficult when the target class attribute has many class labels, and hence many researchers have introduced methods to select features for classification on multi-class attributes. The process becomes more tedious when the attribute values are imbalanced, a problem for which researchers have contributed many methods. However, there is not sufficient research that handles extreme imbalance and feature selection together, and this paper aims to bridge that gap. Here, Particle Swarm Optimization (PSO), an efficient evolutionary algorithm, is used to handle imbalanced datasets, and the feature selection process is also enhanced with the required functionalities. First, Multi-objective Particle Swarm Optimization is used to transform the imbalanced datasets into balanced ones, and then another version of Multi-objective Particle Swarm Optimization is used to select the significant features. The proposed methodology is applied to eight multi-class extremely imbalanced datasets, and the experimental results are found to be better than those of other existing methods in terms of classification accuracy, G mean, and F measure. The results, validated using the Friedman test, also confirm that the proposed methodology effectively balances the dataset with fewer features than other methods.
Affiliation(s)
- R. Devi Priya
  - Department of Computer Science and Engineering, Centre for IoT and Artificial Intelligence, KPR Institute of Engineering and Technology, Coimbatore, TamilNadu, India
- R. Sivaraj
  - Department of Computer Science and Engineering, Nandha Engineering College, Erode, TamilNadu, India
- Ajith Abraham
  - Center for Artificial Intelligence, Innopolis University, Innopolis, Russia
  - Machine Intelligence Research Labs (MIR Labs), Auburn, Washington 98071, USA
- T. Pravin
  - Department of Mechanical Engineering, SNS College of Engineering, Coimbatore, India
- P. Sivasankar
  - Department of Petroleum Engineering & Earth Sciences, Indian Institute of Petroleum and Energy, Visakhapatnam, India
- N. Anitha
  - Department of Information Technology, Kongu Engineering College, Erode, TamilNadu, India
|
28
|
Vahmiyan M, Kheirabadi M, Akbari E. Feature selection methods in microarray gene expression data: a systematic mapping study. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07661-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/07/2022]
|
29
|
Hybrid binary COOT algorithm with simulated annealing for feature selection in high-dimensional microarray data. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07780-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
|
30
|
Pashaei E. Mutation-based Binary Aquila optimizer for gene selection in cancer classification. Comput Biol Chem 2022; 101:107767. [PMID: 36084602 DOI: 10.1016/j.compbiolchem.2022.107767] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2022] [Revised: 07/10/2022] [Accepted: 08/29/2022] [Indexed: 11/19/2022]
Abstract
Microarray data classification is one of the hottest issues in the field of bioinformatics due to its efficiency in diagnosing patients' ailments. The difficulty is that microarrays contain a huge number of genes, the majority of which are redundant or irrelevant, which degrades classification accuracy. To address this issue, a mutated binary Aquila Optimizer (MBAO) with a time-varying mirrored S-shaped (TVMS) transfer function is proposed as a new wrapper gene (or feature) selection method to find the optimal subset of informative genes. The suggested hybrid method uses Minimum Redundancy Maximum Relevance (mRMR) as a filtering approach to choose top-ranked genes in the first stage and then uses MBAO-TVMS as an efficient wrapper approach to identify the most discriminative genes in the second stage. TVMS is adopted to transform the continuous version of the Aquila Optimizer (AO) into a binary one, and a mutation mechanism is incorporated into the binary AO to help the algorithm escape local optima and improve its global search capabilities. The suggested method was tested on eleven well-known benchmark microarray datasets and compared to other current state-of-the-art methods. Based on the obtained results, mRMR-MBAO confirms its superiority over the mRMR-BAO algorithm and the other comparative gene selection (GS) approaches on the majority of the medical datasets in terms of classification accuracy and the number of selected genes. R codes of MBAO are available at https://github.com/el-pashaei/MBAO.
Affiliation(s)
- Elham Pashaei
  - Department of Computer Engineering, Istanbul Gelisim University, Istanbul, Turkey.
|
31
|
Gokhale M, Mohanty SK, Ojha A. A stacked autoencoder based gene selection and cancer classification framework. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2022.103999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
|
32
|
|
33
|
Murphy RG, Gilmore A, Senevirathne S, O'Reilly PG, LaBonte Wilson M, Jain S, McArt DG. Particle Swarm Optimization Artificial Intelligence technique for gene signature discovery in transcriptomic cohorts. Comput Struct Biotechnol J 2022; 20:5547-5563. [PMID: 36249564 PMCID: PMC9556859 DOI: 10.1016/j.csbj.2022.09.033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2022] [Revised: 09/22/2022] [Accepted: 09/22/2022] [Indexed: 11/12/2022] Open
Abstract
EBPSO identifies unique, accurate, and succinct gene signatures. Key genes within the signatures provide biological insights into their associated functions. A web-based micro-framework was developed for ease of use and real-time visualizations. EBPSO is a promising alternative to traditional single gene signature generation. Downstream analysis will better translate these signatures towards clinical translation.
The development of gene signatures is key for delivering personalized medicine, despite only a few signatures being available for use in the clinic for cancer patients. Gene signature discovery tends to revolve around identifying a single signature. However, it has been shown that various highly predictive signatures can be produced from the same dataset. This study assumes that the presentation of top ranked signatures will allow greater efforts in the selection of gene signatures for validation on external datasets and for their clinical translation. Particle swarm optimization (PSO) is an evolutionary algorithm often used as a search strategy and largely represented as binary PSO (BPSO) in this domain. BPSO, however, fails to produce succinct feature sets for complex optimization problems, thus affecting its overall runtime and optimization performance. Enhanced BPSO (EBPSO) was developed to overcome these shortcomings. Thus, this study will validate unique candidate gene signatures for different underlying biology from EBPSO on transcriptomics cohorts. EBPSO was consistently seen to be as accurate as BPSO with substantially smaller feature signatures and significantly faster runtimes. 100% accuracy was achieved in all but two of the selected data sets. Using clinical transcriptomics cohorts, EBPSO has demonstrated the ability to identify accurate, succinct, and significantly prognostic signatures that are unique from one another. This has been proposed as a promising alternative to overcome the issues regarding traditional single gene signature generation. Interpretation of key genes within the signatures provided biological insights into the associated functions that were well correlated to their cancer type.
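For readers unfamiliar with the baseline this abstract improves on, the following is a minimal sketch of a standard binary PSO (BPSO) for feature subset search, in which velocities are passed through a sigmoid to give the probability that a bit is set. It does not reproduce EBPSO, whose modifications are not described in the abstract. The parameter values and the caller-supplied `fitness` function are illustrative assumptions.

```python
import math
import random


def binary_pso(n_bits, fitness, n_particles=15, iterations=40,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Maximize `fitness` over 0/1 vectors with a basic sigmoid-based binary PSO."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    vel = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(n_bits):
                # Velocity update: inertia + pull towards personal and global bests.
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                v = max(-v_max, min(v_max, v))  # clamp to keep probabilities away from 0 and 1
                vel[i][d] = v
                # Sigmoid of the velocity is the probability that the bit is 1.
                pos[i][d] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-v)) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit
```

A fitness function that rewards accuracy and penalizes the number of selected bits is one common way to push such a search towards the succinct signatures the abstract emphasizes.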
|
34
|
Limam H, Hasni O, Alaya IB. A novel hybrid approach for feature selection enhancement: COVID-19 case study. Comput Methods Biomech Biomed Engin 2022:1-15. [PMID: 35993576 DOI: 10.1080/10255842.2022.2112185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
Abstract
Feature selection is a promising Artificial Intelligence technique for screening, analysing, predicting, and tracking current COVID-19 patients and likely future patients. Significant applications are developed to track data of confirmed, recovered, and death cases. In this work, we propose a new feature selection method based on a new way of hybridization between filter and wrapper methods. The proposed approach is expected to achieve high classification accuracy with a small feature subset. Specifically, the main contribution of this work is a four steps-based approach organized as follows: First, we remove consecutively duplicate and constant features. Then, we select the highest-ranked feature with Mutual Information. In the last step, we run the 'Backward Feature Elimination' algorithm to delete features from the active subset until a stopping criterion based on the degradation of classification performance is met. We applied the proposed approach to a COVID-19 dataset to test its ability to find the relevant feature for characterizing the disease, such as new cases infected with the virus, people vaccinated, and the number of deaths, to better assess the situation. For evaluation purposes, experiments are conducted at the first stage on the COVID-19 dataset, then on six benchmark datasets that have a high dimensional and large size. The method performance is tracked and measured on these datasets and a comparison with many approaches is provided.
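The steps listed in this abstract (drop duplicate and constant features, rank by mutual information, then run backward elimination until performance degrades) can be sketched as below. This is an illustration under assumptions, not the authors' code: features are treated as discrete, `score` is a hypothetical callable standing in for classifier performance on a feature subset, and the `tolerance` parameter is an invented stand-in for the paper's stopping criterion.

```python
import math
from collections import Counter


def drop_constant_and_duplicate(columns):
    """Indices of columns that are neither constant nor exact copies of an earlier column."""
    kept, seen = [], set()
    for j, col in enumerate(columns):
        key = tuple(col)
        if len(set(col)) > 1 and key not in seen:
            seen.add(key)
            kept.append(j)
    return kept


def mutual_information(x, y):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(c / n * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def rank_by_mutual_information(columns, labels, candidates):
    """Candidate indices ordered from most to least informative about the labels."""
    return sorted(candidates,
                  key=lambda j: mutual_information(columns[j], labels),
                  reverse=True)


def backward_elimination(active, score, tolerance=0.0):
    """Repeatedly remove the least harmful feature until every removal
    drops the score more than `tolerance` below the best score seen."""
    active = list(active)
    best = score(active)
    while len(active) > 1:
        cand_score, removed = max(
            (score([f for f in active if f != r]), r) for r in active)
        if cand_score < best - tolerance:
            break
        active.remove(removed)
        best = max(best, cand_score)
    return active
```

Running the cheap filters first keeps the expensive wrapper step, which retrains a model for every candidate removal, limited to a short list of features.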
Affiliation(s)
- Hela Limam
  - Institut Supérieur d'Informatique, Université de Tunis El Manar, Tunisia and Laboratoire BestMod, Institut Supérieur de Gestion de Tunis, Tunis, Tunisia
- Oumaima Hasni
  - Laboratoire BestMod, Institut Supérieur de Gestion de Tunis, Tunis, Tunisia
- Ines Ben Alaya
  - Higher Institute of Medical Technology of Tunis, Laboratory of Biophysics and Medical Technology, Tunis El Manar University, Tunis, Tunisia
|
35
|
MICQ-IPSO: An effective two-stage hybrid feature selection algorithm for high-dimensional data. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.05.048] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/24/2022]
|
36
|
Gao W, Dang Q, Gong M. An adaptive framework to select the coordinate systems for evolutionary algorithms. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109585] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
|
37
|
A novel biomarker selection method combining graph neural network and gene relationships applied to microarray data. BMC Bioinformatics 2022; 23:303. [PMID: 35883022 PMCID: PMC9327232 DOI: 10.1186/s12859-022-04848-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2022] [Accepted: 07/15/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The discovery of critical biomarkers is significant for clinical diagnosis, drug research and development. Researchers usually obtain biomarkers from microarray data, which suffers from the curse of dimensionality. Feature selection in machine learning is usually used to solve this problem. However, most methods do not fully consider feature dependence, especially the real pathway relationship of genes. RESULTS Experimental results show that the proposed method is superior to classical algorithms and advanced methods in feature number and accuracy, and the selected features have more significance. METHOD This paper proposes a feature selection method based on a graph neural network. The proposed method uses the actual dependencies between features and the Pearson correlation coefficient to construct graph-structured data. The information dissemination and aggregation operations of the graph neural network are applied to fuse node information on the graph-structured data. The redundant features are clustered by the spectral clustering method. Then, a feature ranking aggregation model using eight feature evaluation methods acts on each cluster to select features. CONCLUSION The proposed method can effectively remove redundant features. The algorithm's output has high stability and classification accuracy, and it can potentially identify biomarkers.
|
38
|
Rezaee K, Jeon G, Khosravi MR, Attar HH, Sabzevari A. Deep learning‐based microarray cancer classification and ensemble gene selection approach. IET Syst Biol 2022; 16:120-131. [PMID: 35790076 PMCID: PMC9290776 DOI: 10.1049/syb2.12044] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/31/2022] [Revised: 04/04/2022] [Accepted: 05/31/2022] [Indexed: 12/19/2022] Open
Abstract
Malignancies and diseases of various genetic origins can be diagnosed and classified with microarray data. There are many obstacles to overcome due to the large number of genes and the small number of samples in the microarray. A combination strategy for gene expression in a variety of diseases is described in this paper, consisting of two steps: identifying the most effective genes via soft ensembling and classifying them with a novel deep neural network. The feature selection approach combines three strategies to select wrapper genes and rank them according to the k‐nearest neighbour algorithm, resulting in a very generalisable model with low error levels. Using soft ensembling, the most effective subsets of genes were identified from three microarray datasets of diffuse large cell lymphoma, leukaemia, and prostate cancer. A stacked deep neural network was used to classify all three datasets, achieving an average accuracy of 97.51%, 99.6%, and 96.34%, respectively. In addition, two previously unreported datasets from small, round blue cell tumors (SRBCTs) and multiple sclerosis‐related brain tissue lesions were examined to show the generalisability of the method.
Affiliation(s)
- Khosro Rezaee
  - Department of Biomedical Engineering Meybod University Meybod Iran
- Gwanggil Jeon
  - Department of Embedded Systems Engineering College of Information Technology Incheon National University Incheon Korea
- Hani H. Attar
  - Department of Energy Engineering Zarqa University Zarqa Jordan
|
39
|
Ahmed H, Soliman H, Elmogy M. Early detection of Alzheimer's disease using single nucleotide polymorphisms analysis based on gradient boosting tree. Comput Biol Med 2022; 146:105622. [PMID: 35751201 DOI: 10.1016/j.compbiomed.2022.105622] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/10/2021] [Revised: 03/25/2022] [Accepted: 03/29/2022] [Indexed: 11/18/2022]
Abstract
Alzheimer's disease (AD) is a degenerative disorder that attacks nerve cells in the brain. AD leads to memory loss and cognitive and intellectual impairments that can influence social activities and decision-making. The most common type of human genetic variation is single nucleotide polymorphisms (SNPs). SNPs are beneficial markers of complex gene-disease. Many common and serious diseases, such as AD, have associated SNPs. Detection of SNP biomarkers linked with AD could help in the early prediction and diagnosis of this disease. The main objective of this paper is to predict and diagnose AD in its early stages, based on SNP biomarkers, with high classification accuracy. One of the most concerning problems is the high number of features. Thus, the paper proposes a comprehensive framework for early AD detection and for identifying the most significant genes based on SNP analysis. The use of machine learning (ML) techniques to identify new biomarkers of AD is also suggested. In the proposed system, two feature selection techniques are checked separately: the information gain filter and the Boruta wrapper. Both were used to select the most significant genes related to AD. Filter methods measure the relevance of features by their correlation with dependent variables, while wrapper methods measure the usefulness of a subset of features by training a model on it. Gradient boosting tree (GBT) has been applied to all AD genetic data from the Alzheimer's Disease Neuroimaging Initiative phase 1 (ADNI-1) and Whole-Genome Sequencing (WGS) datasets using the two feature selection techniques. In the whole-genome approach on ADNI-1, results revealed that the GBT learning algorithm scored an overall accuracy of 99.06% when using Boruta feature selection. Using information gain feature selection, the proposed system achieved an average accuracy of 94.87%. The results show that the proposed system is preferable for the early detection of AD. Also, the results revealed that the Boruta wrapper feature selection is superior to the information gain filter technique.
Affiliation(s)
- Hala Ahmed
  - Information Technology Dept., Faculty of Computers and Information, Mansoura University, Mansoura, P.O.35516, Egypt
- Hassan Soliman
  - Information Technology Dept., Faculty of Computers and Information, Mansoura University, Mansoura, P.O.35516, Egypt
- Mohammed Elmogy
  - Information Technology Dept., Faculty of Computers and Information, Mansoura University, Mansoura, P.O.35516, Egypt.
|
40
|
Binary Aquila Optimizer for Selecting Effective Features from Medical Data: A COVID-19 Case Study. MATHEMATICS 2022. [DOI: 10.3390/math10111929] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Indexed: 12/19/2022]
Abstract
Medical technological advancements have led to the creation of various large datasets with numerous attributes. The presence of redundant and irrelevant features in datasets negatively influences algorithms and decreases their performance. Using effective features in data mining and analysis tasks such as classification can increase the accuracy of the results and of the decisions that decision-makers base on them. This increase can become more acute when dealing with challenging, large-scale problems in medical applications. Nature-inspired metaheuristics show superior performance in finding optimal feature subsets in the literature. As a seminal attempt, a wrapper feature selection approach based on the newly proposed Aquila optimizer (AO) is presented in this work. The wrapper approach uses AO as a search algorithm to discover the most effective feature subset. S-shaped binary Aquila optimizer (SBAO) and V-shaped binary Aquila optimizer (VBAO) are two binary algorithms suggested for feature selection in medical datasets. Binary position vectors are generated using S- and V-shaped transfer functions while the search space stays continuous. The suggested algorithms are compared to six recent binary optimization algorithms on seven benchmark medical datasets. The results demonstrate that both proposed BAO variants can improve the classification accuracy on these medical datasets relative to the comparative algorithms. The proposed algorithm is also tested on a real COVID-19 dataset. The findings show that SBAO outperforms the comparative algorithms, selecting the fewest features with the highest accuracy.
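The role of S- and V-shaped transfer functions, mapping a continuous position to a binary feature mask, can be shown with a short sketch. The specific functions used here (the logistic sigmoid and `|tanh|`) and the two update rules are commonly used choices assumed for illustration. The abstract does not state which variants SBAO and VBAO use.

```python
import math
import random


def s_shaped(x):
    """S-shaped transfer: logistic sigmoid, read as the probability that the bit is 1."""
    return 1.0 / (1.0 + math.exp(-x))


def v_shaped(x):
    """V-shaped transfer: |tanh(x)|, read as the probability of flipping the bit."""
    return abs(math.tanh(x))


def binarize_s(position, rng):
    """S-shaped rule: set each bit to 1 with probability s_shaped(x)."""
    return [1 if rng.random() < s_shaped(x) else 0 for x in position]


def binarize_v(position, current_bits, rng):
    """V-shaped rule: flip the current bit with probability v_shaped(x), else keep it."""
    return [1 - b if rng.random() < v_shaped(x) else b
            for x, b in zip(position, current_bits)]
```

The practical difference is that the S-shaped rule ignores the previous bit, while the V-shaped rule leaves bits unchanged when the continuous value is near zero and only flips them when its magnitude is large.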
|
41
|
Azadifar S, Rostami M, Berahmand K, Moradi P, Oussalah M. Graph-based relevancy-redundancy gene selection method for cancer diagnosis. Comput Biol Med 2022; 147:105766. [DOI: 10.1016/j.compbiomed.2022.105766] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 04/10/2022] [Revised: 06/12/2022] [Accepted: 06/18/2022] [Indexed: 11/26/2022]
|
42
|
Rashno A, Shafipour M, Fadaei S. Particle ranking: An Efficient Method for Multi-Objective Particle Swarm Optimization Feature Selection. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108640] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/18/2022]
|
43
|
Nature-inspired metaheuristics model for gene selection and classification of biomedical microarray data. Med Biol Eng Comput 2022; 60:1627-1646. [PMID: 35399141 DOI: 10.1007/s11517-022-02555-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 05/18/2021] [Accepted: 03/16/2022] [Indexed: 12/19/2022]
Abstract
Identifying a small subset of informative genes from a gene expression dataset is an important process for sample classification in the fields of bioinformatics and machine learning. This process has two objectives: first, to minimize the number of selected genes, and second, to maximize the classification accuracy of the classifier used. In this paper, a hybrid machine learning framework based on the nature-inspired cuckoo search (CS) algorithm is proposed to solve this problem. The framework is obtained by incorporating the CS algorithm with an artificial bee colony (ABC) in the exploitation and exploration of the genetic algorithm (GA). These strategies are used to maintain an appropriate balance between the exploitation and exploration phases of the ABC and GA algorithms in the search process. In preprocessing, the independent component analysis (ICA) method extracts the important genes from the dataset. Then, the proposed gene selection algorithms, together with the Naive Bayes (NB) classifier and leave-one-out cross-validation (LOOCV), are applied to find a small set of informative genes that maximizes the classification accuracy. For a comprehensive performance study, the proposed algorithms were applied to six benchmark gene expression datasets. The experimental comparison shows that the proposed framework (ICA and CS-based hybrid algorithm with NB classifier) performs a deeper search in the iterative process, which can avoid premature convergence and produce better results than the previously published feature selection algorithm for the NB classifier.
|
44
|
Liu S, Yao W. Prediction of lung cancer using gene expression and deep learning with KL divergence gene selection. BMC Bioinformatics 2022; 23:175. [PMID: 35549644 PMCID: PMC9103042 DOI: 10.1186/s12859-022-04689-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/19/2021] [Accepted: 04/13/2022] [Indexed: 11/24/2022]
Abstract
Background: Lung cancer is one of the cancers with the highest mortality rate in China. With the rapid development of high-throughput sequencing technology and the growing application of deep learning methods, deep neural networks based on gene expression have become an active research direction in lung cancer diagnosis and provide an effective route to early diagnosis. Building a deep neural network model is therefore of great significance for the early diagnosis of lung cancer. However, the main challenges in mining gene expression datasets are the curse of dimensionality and imbalanced data. Existing methods cannot address high dimensionality and class imbalance, because the number of measured variables (genes) is overwhelming compared with the small number of samples, which results in poor performance in early lung cancer diagnosis.
Method: Given that gene expression datasets are small, high-dimensional, and imbalanced, this paper proposes a gene selection method based on KL divergence, which selects genes with higher KL divergence as model features. A deep neural network model is then built using Focal Loss as the loss function, and k-fold cross-validation with k = 5 is used to verify and select the best model.
Result: The proposed deep learning model based on KL divergence gene selection achieves an AUC of 0.99 on the validation set. The generalization performance of the model is high.
Conclusion: The deep neural network model based on KL divergence gene selection proposed in this paper is shown to be an accurate and effective method for lung cancer prediction.
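To make the two ingredients named in this abstract concrete, here is a minimal Python sketch of (a) ranking genes by the KL divergence between their per-class expression histograms and (b) a binary focal loss. The histogram-based density estimate, the bin count, the `alpha` and `gamma` defaults, and all function names are assumptions for illustration; the abstract does not describe how the authors estimate the distributions or configure the loss.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def histogram(values, lo, hi, bins):
    """Normalized equal-width histogram of `values` over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins or 1.0  # guard against a constant column
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def rank_genes_by_kl(samples, labels, bins=4):
    """Return gene indices sorted by descending KL divergence between the
    class-1 and class-0 histograms. Assumes both classes are present."""
    scores = []
    for g in range(len(samples[0])):
        col = [row[g] for row in samples]
        lo, hi = min(col), max(col)
        pos = [v for v, y in zip(col, labels) if y == 1]
        neg = [v for v, y in zip(col, labels) if y == 0]
        score = kl_divergence(histogram(pos, lo, hi, bins),
                              histogram(neg, lo, hi, bins))
        scores.append((score, g))
    return [g for _, g in sorted(scores, key=lambda t: -t[0])]


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for predicted probability p of class 1 and label y.
    The (1 - p_t) ** gamma factor down-weights well-classified examples."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, 1e-12))


# Hypothetical toy data: gene 0 is uninformative, gene 1 separates the classes.
toy_samples = [[1.0, 0.0], [2.0, 0.1], [1.0, 5.0], [2.0, 5.1]]
toy_labels = [0, 0, 1, 1]
ranking = rank_genes_by_kl(toy_samples, toy_labels)
```

With `gamma = 0` and `alpha = 0.5` the focal loss reduces to half the ordinary cross-entropy, which is a convenient sanity check when wiring it into a training loop.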
Affiliation(s)
- Suli Liu, College of Public Health, Zhengzhou University, Zhengzhou, 450001, China
- Wu Yao, College of Public Health, Zhengzhou University, Zhengzhou, 450001, China
|
45
|
A Hybrid Feature Selection Framework Using Improved Sine Cosine Algorithm with Metaheuristic Techniques. ENERGIES 2022. [DOI: 10.3390/en15103485] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/04/2022]
Abstract
Feature selection is the procedure of extracting the optimal subset of features from an initial feature set in order to reduce the dimensionality of the data. It is an important part of improving the accuracy of classification algorithms for big data. Hybrid metaheuristics are among the most popular methods for dealing with optimization problems. This article proposes a novel feature selection technique called MetaSCA, derived from the standard sine cosine algorithm (SCA). Building on the SCA, a golden sine section coefficient is added to narrow the search area for feature selection. In addition, a multi-level adjustment factor strategy is adopted to balance exploration and exploitation. The performance of MetaSCA was assessed using the following evaluation indicators: average fitness, worst fitness, optimal fitness, classification accuracy, average proportion of optimal feature subsets, feature selection time, and standard deviation. Performance was measured on UCI data sets and compared with three algorithms: the sine cosine algorithm (SCA), particle swarm optimization (PSO), and the whale optimization algorithm (WOA). The simulation results showed that, in most cases, MetaSCA achieved the best accuracy and the best feature subset on the UCI data sets.
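For readers unfamiliar with the baseline this abstract builds on, the following is a minimal Python sketch of the standard sine cosine algorithm (SCA) position update applied to a continuous minimization problem. It does not implement MetaSCA: the golden sine section coefficient and the multi-level adjustment factor are not specified in the abstract and are left out. The function names, parameter defaults, and the sphere objective are assumptions for illustration, and this variant updates the destination point as soon as any agent improves on it.

```python
import math
import random


def sphere(x):
    """Example objective (assumed for illustration): sum of squares."""
    return sum(v * v for v in x)


def sca_minimize(objective, dim, bounds, n_agents=20, n_iter=100, a=2.0, seed=0):
    """Standard SCA: each coordinate moves toward or around the best-known
    point using a sine or cosine term whose amplitude r1 shrinks over time."""
    rng = random.Random(seed)
    lo, hi = bounds
    agents = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_agents)]
    best = min(agents, key=objective)[:]
    best_val = objective(best)
    for t in range(n_iter):
        r1 = a - t * a / n_iter  # linearly decreasing amplitude
        for agent in agents:
            for j in range(dim):
                r2 = rng.uniform(0.0, 2.0 * math.pi)
                r3 = rng.uniform(0.0, 2.0)
                r4 = rng.random()
                wave = math.sin(r2) if r4 < 0.5 else math.cos(r2)
                agent[j] += r1 * wave * abs(r3 * best[j] - agent[j])
                agent[j] = min(max(agent[j], lo), hi)  # clamp to the bounds
            val = objective(agent)
            if val < best_val:
                best, best_val = agent[:], val
    return best, best_val


best_point, best_value = sca_minimize(sphere, dim=3, bounds=(-5.0, 5.0),
                                      n_agents=10, n_iter=50, seed=1)
```

For feature selection, the continuous positions would additionally be mapped to binary masks and the objective replaced by a classifier-based fitness; both steps are outside the scope of this sketch.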
|
46
|
T-Friedman Test: A New Statistical Test for Multiple Comparison with an Adjustable Conservativeness Measure. INT J COMPUT INT SYS 2022. [DOI: 10.1007/s44196-022-00083-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/05/2022]
Abstract
To show that a given algorithm is superior to benchmark algorithms, statistical hypothesis tests are commonly applied to experimental results on a number of datasets. Some statistical hypothesis tests yield more conservative results than others, but it is not yet possible to characterize quantitatively the degree of conservativeness of such a test. Building on existing nonparametric statistical tests, this paper proposes a new statistical test for multiple comparison, named the t-Friedman test, which combines the t test with the Friedman test. The confidence level of the t test is adopted as a measure of the conservativeness of the proposed t-Friedman test: a higher confidence level implies a higher degree of conservativeness, and vice versa. Based on synthetic results generated by Monte Carlo simulations with predefined distributions, the performance of several state-of-the-art multiple comparison tests and post hoc procedures is first qualitatively analyzed. The influences of the type of predefined distribution, the number of benchmark algorithms, and the number of datasets are explored in the experiments. The conservativeness measure of the proposed method is also validated and verified in the experiments. Finally, some suggestions for the application of these nonparametric statistical tests are provided.
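As background for the test this abstract extends, the following is a minimal Python sketch of the classic Friedman chi-square statistic for comparing k algorithms over N datasets. It covers only the standard Friedman step; how the t-Friedman test combines it with the t test is not described in the abstract and is not reproduced here. The function names are assumptions, rank 1 is given to the smallest value in each row (so the inputs are best read as error rates), and no tie correction is applied to the statistic.

```python
def average_ranks(row):
    """Rank the values in one row (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(scores):
    """scores[d][a] is the result of algorithm a on dataset d.
    Returns (chi-square statistic, mean rank of each algorithm)."""
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        for a, r in enumerate(average_ranks(row)):
            rank_sums[a] += r
    mean_ranks = [s / n for s in rank_sums]
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4
    )
    return chi2, mean_ranks


# Hypothetical error rates for 3 algorithms on 4 datasets.
toy_errors = [
    [0.10, 0.20, 0.30],
    [0.05, 0.15, 0.25],
    [0.12, 0.18, 0.40],
    [0.08, 0.09, 0.11],
]
chi2, mean_ranks = friedman_statistic(toy_errors)
```

Under the null hypothesis the statistic is approximately chi-square distributed with k - 1 degrees of freedom, which is what a post hoc procedure would build on.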
|
47
|
Li X, Li K. Imbalanced data classification based on improved EIWAPSO-AdaBoost-C ensemble algorithm. APPL INTELL 2022. [DOI: 10.1007/s10489-021-02708-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|
48
|
Alrefai N, Ibrahim O. Optimized feature selection method using particle swarm intelligence with ensemble learning for cancer classification based on microarray datasets. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07147-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/19/2022]
|
49
|
Adaptive feature selection framework for DNA methylation-based age prediction. Soft comput 2022. [DOI: 10.1007/s00500-022-06844-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
|
50
|
Yan C, Li M, Ma J, Liao Y, Luo H, Wang J, Luo J. A Novel Feature Selection Method Based on MRMR and Enhanced Flower Pollination Algorithm for High Dimensional Biomedical Data. Curr Bioinform 2022. [DOI: 10.2174/1574893616666210624130124] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/22/2022]
Abstract
Background: The massive amount of biomedical data accumulated in the past decades can be utilized for diagnosing disease.
Objective: However, the high dimensionality, small sample sizes, and irrelevant features of the data often have a negative influence on the accuracy and speed of disease prediction. Some existing machine learning models cannot capture the patterns in these datasets accurately without feature selection.
Methods: Filter and wrapper are the two prevailing feature selection methods. The filter method is fast but has low prediction accuracy, while the wrapper method can obtain high accuracy but has a formidable computational cost. Given the drawbacks of using filter or wrapper methods individually, a novel feature selection method called MRMR-EFPATS is proposed, which hybridizes the filter method Minimum Redundancy Maximum Relevance (MRMR) with a wrapper method based on an improved Flower Pollination Algorithm (FPA). First, MRMR is employed to rank and quickly screen out some important features. These features are then used to initialize the individuals of the population in the wrapper method, for faster convergence and less computational time. Then, owing to its efficiency and flexibility, FPA is adopted to further discover an optimal feature subset.
Result: FPA still has some drawbacks, such as a slow convergence rate, inadequate search for new solutions, and a tendency to become trapped in local optima. In our work, an elite strategy is adopted to improve the convergence speed of the FPA. Tabu search and Adaptive Gaussian Mutation are employed to improve the search capability of FPA and to escape from local optima. Here, the KNN classifier with 5-fold cross-validation is utilized to evaluate the classification accuracy.
Conclusion: Extensive experimental results on six public high-dimensional biomedical datasets show that the proposed MRMR-EFPATS achieves superior performance compared to other state-of-the-art methods.
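The MRMR filter stage described in this abstract can be sketched as a greedy selection that trades relevance to the target against redundancy with features already chosen. The Python sketch below assumes discrete feature values, uses plug-in mutual information estimates, and uses the difference form of the criterion (relevance minus mean redundancy); these choices and all function names are assumptions for illustration, and the FPA wrapper stage, Tabu search, and Gaussian mutation are not covered.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def mrmr(features, target, k):
    """Greedy MRMR. `features` is a list of columns (one list per feature).
    Returns the indices of k selected features, in selection order."""
    relevance = [mutual_information(col, target) for col in features]
    selected = [max(range(len(features)), key=lambda i: relevance[i])]
    while len(selected) < k:
        best_idx, best_score = None, float("-inf")
        for i in range(len(features)):
            if i in selected:
                continue
            redundancy = sum(
                mutual_information(features[i], features[j]) for j in selected
            ) / len(selected)
            score = relevance[i] - redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
    return selected


# Hypothetical toy data: f1 duplicates f0, while f2 carries different information.
y = [0, 0, 0, 0, 1, 1, 1, 1]
f0 = [0, 0, 0, 1, 1, 1, 1, 1]
f1 = list(f0)
f2 = [0, 0, 1, 0, 1, 1, 0, 1]
chosen = mrmr([f0, f1, f2], y, k=2)
```

On this toy data the duplicate feature `f1` is passed over in favour of `f2`, even though `f1` is individually more relevant, which is the behaviour the redundancy term is meant to produce.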
Affiliation(s)
- Chaokun Yan, School of Computer and Information Engineering, Henan University, Kaifeng, China
- Mengyuan Li, School of Computer and Information Engineering, Henan University, Kaifeng, China
- Yi Liao, Academy of Arts & Design, Tsinghua University, Beijing, China
- Huimin Luo, School of Computer and Information Engineering, Henan University, Kaifeng, China
- Jianlin Wang, School of Computer and Information Engineering, Henan University, Kaifeng, China
- Junwei Luo, College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
|