1
Geng Y, Li Y, Deng C. An Improved Binary Walrus Optimizer with Golden Sine Disturbance and Population Regeneration Mechanism to Solve Feature Selection Problems. Biomimetics (Basel) 2024; 9:501. [PMID: 39194480 DOI: 10.3390/biomimetics9080501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2024] [Revised: 08/13/2024] [Accepted: 08/14/2024] [Indexed: 08/29/2024] [Open Access]
Abstract
Feature selection (FS) is an important dimensionality reduction technique in machine learning and data mining that handles high-dimensional data efficiently and improves model performance. Metaheuristic algorithms have become one of the most promising approaches to FS owing to their powerful search capabilities. In this paper, a novel improved binary walrus optimizer (WO) algorithm that uses the golden sine strategy, elite opposition-based learning (EOBL), and a population regeneration mechanism (BGEPWO) is proposed for FS. First, the population is initialized using the iterative chaotic map with infinite collapses (ICMIC) to improve diversity. Second, a safe signal is obtained by introducing an adaptive operator to enhance the stability of the WO and optimize the trade-off between exploration and exploitation. Third, BGEPWO introduces a population regeneration mechanism that continuously eliminates hopeless individuals and generates new promising ones, which keeps the population moving toward the optimal solution and accelerates convergence. Fourth, EOBL is used to guide the escape behavior of the walrus to expand the search range. Finally, the golden sine strategy perturbs the population in late iterations to improve the algorithm's capacity to escape local optima. BGEPWO was evaluated on 21 datasets of different sizes and compared with the BWO algorithm and 10 other representative optimization algorithms. The experimental results demonstrate that BGEPWO outperforms these competing algorithms in terms of fitness value, number of selected features, and F1-score on most datasets. The proposed algorithm achieves higher accuracy, better feature reduction ability, and stronger convergence by increasing population diversity, continuously balancing exploration and exploitation, and effectively escaping local optima.
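The chaotic initialization step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly cited ICMIC recurrence x_{k+1} = sin(a / x_k); the control parameter `a`, the mapping from [-1, 1] to a binary feature mask, and the function names are illustrative assumptions, not details taken from the paper.

```python
import math
import random


def icmic_sequence(length, a=2.0, x0=0.7):
    """Generate `length` values of the ICMIC map x_{k+1} = sin(a / x_k).

    Every value lies in [-1, 1]. The parameter `a` and the start value
    `x0` are assumed here; the abstract does not state them.
    """
    values = []
    x = x0
    for _ in range(length):
        if x == 0.0:
            x = 1e-12  # guard against division by zero
        x = math.sin(a / x)
        values.append(x)
    return values


def init_binary_population(pop_size, n_features, a=2.0, seed=0):
    """Build binary feature masks from chaotic sequences (hypothetical scheme).

    Each individual gets its own random start value, the chaotic values are
    rescaled from [-1, 1] to [0, 1], and a feature is selected when the
    rescaled value exceeds 0.5.
    """
    rng = random.Random(seed)
    population = []
    for _ in range(pop_size):
        x0 = rng.uniform(0.1, 0.9)
        chaos = icmic_sequence(n_features, a=a, x0=x0)
        mask = [1 if (c + 1.0) / 2.0 > 0.5 else 0 for c in chaos]
        if not any(mask):
            # keep at least one feature so the subset is never empty
            mask[rng.randrange(n_features)] = 1
        population.append(mask)
    return population


if __name__ == "__main__":
    for individual in init_binary_population(pop_size=4, n_features=10):
        print(individual)
```

Each mask marks which features an individual selects; a wrapper-style FS method would then score every mask with a classifier-based fitness function.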
Affiliation(s)
- Yanyu Geng
  - College of Computer Science and Technology, Jilin University, Changchun 130012, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
- Ying Li
  - College of Computer Science and Technology, Jilin University, Changchun 130012, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
- Chunyan Deng
  - College of Computer Science and Technology, Jilin University, Changchun 130012, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
2
Song S, Wang Q, Zou X, Li Z, Ma Z, Jiang D, Fu Y, Liu Q. High-precision prediction of blood glucose concentration utilizing Fourier transform Raman spectroscopy and an ensemble machine learning algorithm. Spectrochimica Acta. Part A, Molecular and Biomolecular Spectroscopy 2023; 303:123176. [PMID: 37494812 DOI: 10.1016/j.saa.2023.123176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Revised: 07/17/2023] [Accepted: 07/19/2023] [Indexed: 07/28/2023]
Abstract
Raman spectroscopy has gained popularity in analyzing blood glucose levels due to its non-invasive identification and minimal interference from water. However, accurately predicting blood glucose concentrations in human blood using Raman spectroscopy remains a challenge. This paper investigates a novel integrated machine learning algorithm called Bagging-ABC-ELM. The optimal input weights and biases of the extreme learning machine (ELM) model are obtained by the artificial bee colony (ABC) algorithm. The bagging algorithm is used to obtain better model stability and higher performance than the ELM algorithm alone. The results show that the mean coefficient of determination is 0.9928 and the root mean square error is 0.1928. Compared to other regression models, the Bagging-ABC-ELM model exhibited superior prediction accuracy, robustness, and generalization capability. The Bagging-ABC-ELM model presents a promising alternative for analyzing blood glucose levels in clinical and research settings.
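The two metrics reported in the abstract, and the averaging step that bagging applies to regression outputs, can be sketched as follows. This is a generic illustration using the standard definitions; the function names and the sample numbers are invented for the example and are not taken from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def bagged_mean(per_model_predictions):
    """Average the predictions of several base models, sample by sample.

    `per_model_predictions` is a list with one prediction list per model.
    """
    return [sum(col) / len(col) for col in zip(*per_model_predictions)]


if __name__ == "__main__":
    # Made-up reference values and base-model outputs for illustration only.
    reference = [1.0, 2.0, 3.0, 4.0]
    model_outputs = [
        [1.2, 1.8, 3.3, 3.9],
        [1.0, 2.0, 3.1, 3.7],
    ]
    ensemble = bagged_mean(model_outputs)
    print("RMSE:", rmse(reference, ensemble))
    print("R^2:", r_squared(reference, ensemble))
```

In a bagging setup each base model would be trained on a bootstrap resample of the training spectra before its predictions are averaged.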
Affiliation(s)
- Shuai Song
  - College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning Province 110819, China
- Qiaoyun Wang
  - College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning Province 110819, China; Hebei Key Laboratory of Micro-Nano Precision Optical Sensing and Measurement Technology, Qinhuangdao 066004, China
- Xin Zou
  - College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning Province 110819, China
- Zhigang Li
  - College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning Province 110819, China
- Zhenhe Ma
  - College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning Province 110819, China
- Daying Jiang
  - Zhongyou BSS (Qinhuangdao) Petropipe Company Limited, Qinhuangdao 066004, China
- YongQing Fu
  - Faculty of Engineering and Environment, Northumbria University, Newcastle upon Tyne NE1 8ST, UK
- Qiang Liu
  - College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning Province 110819, China; Hebei Key Laboratory of Micro-Nano Precision Optical Sensing and Measurement Technology, Qinhuangdao 066004, China
3
Ersoz NS, Bakir-Gungor B, Yousef M. GeNetOntology: identifying affected gene ontology terms via grouping, scoring, and modeling of gene expression data utilizing biological knowledge-based machine learning. Front Genet 2023; 14:1139082. [PMID: 37671046 PMCID: PMC10476493 DOI: 10.3389/fgene.2023.1139082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2023] [Accepted: 07/05/2023] [Indexed: 09/07/2023] [Open Access]
Abstract
Introduction: Identifying significant sets of genes that are up/downregulated under specific conditions is vital to understand disease development mechanisms at the molecular level. Along this line, in order to analyze transcriptomic data, several computational feature selection (i.e., gene selection) methods have been proposed. On the other hand, uncovering the core functions of the selected genes provides a deep understanding of diseases. In order to address this problem, biological domain knowledge-based feature selection methods have been proposed. Unlike computational gene selection approaches, these domain knowledge-based methods take the underlying biology into account and integrate knowledge from external biological resources. Gene Ontology (GO) is one such biological resource that provides ontology terms for defining the molecular function, cellular component, and biological process of the gene product.
Methods: In this study, we developed a tool named GeNetOntology which performs GO-based feature selection for gene expression data analysis. In the proposed approach, the process of Grouping, Scoring, and Modeling (G-S-M) is used to identify significant GO terms. GO information has been used as the grouping information, which has been embedded into a machine learning (ML) algorithm to select informative ontology terms. The genes annotated with the selected ontology terms have been used in the training part to carry out the classification task of the ML model. The output is an important set of ontologies for the two-class classification task applied to gene expression data for a given phenotype.
Results: Our approach has been tested on 11 different gene expression datasets, and the results showed that GeNetOntology successfully identified important disease-related ontology terms to be used in the classification model.
Discussion: GeNetOntology will assist geneticists and scientists to identify a range of disease-related genes and ontologies in transcriptomic data analysis, and it will also help doctors design diagnosis platforms and improve patient treatment plans.
Affiliation(s)
- Nur Sebnem Ersoz
  - Department of Bioengineering, Graduate School of Engineering and Science, Abdullah Gul University, Kayseri, Türkiye
- Burcu Bakir-Gungor
  - Department of Computer Engineering, Faculty of Engineering, Abdullah Gul University, Kayseri, Türkiye
  - Department of Bioengineering, Faculty of Life and Natural Sciences, Abdullah Gul University, Kayseri, Türkiye
- Malik Yousef
  - Department of Information Systems, Zefat Academic College, Zefat, Israel
  - Galilee Digital Health Research Center (GDH), Zefat Academic College, Zefat, Israel
4
Kuzudisli C, Bakir-Gungor B, Bulut N, Qaqish B, Yousef M. Review of feature selection approaches based on grouping of features. PeerJ 2023; 11:e15666. [PMID: 37483989 PMCID: PMC10358338 DOI: 10.7717/peerj.15666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Accepted: 06/08/2023] [Indexed: 07/25/2023] [Open Access]
Abstract
With the rapid development in technology, large amounts of high-dimensional data have been generated. This high dimensionality, including redundancy and irrelevancy, poses a great challenge in data analysis and decision making. Feature selection (FS) is an effective way to reduce dimensionality by eliminating redundant and irrelevant data. Most traditional FS approaches score and rank each feature individually, and then perform FS either by eliminating lower-ranked features or by retaining highly ranked features. In this review, we discuss an emerging approach to FS that is based on initially grouping features, then scoring groups of features rather than scoring individual features. Despite the presence of reviews on clustering and FS algorithms, to the best of our knowledge, this is the first review focusing on FS techniques based on grouping. The typical idea behind FS through grouping is to generate groups of similar features with dissimilarity between groups, then select representative features from each cluster. Approaches under supervised, unsupervised, semi-supervised and integrative frameworks are explored. The comparison of experimental results indicates the effectiveness of sequential, optimization-based (i.e., fuzzy or evolutionary), hybrid and multi-method approaches. When it comes to biological data, the involvement of external biological sources can improve analysis results. We hope this work's findings can guide effective design of new FS approaches using feature grouping.
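The "group similar features, then pick a representative from each group" idea described in the abstract can be sketched with one simple, hypothetical instance: greedy grouping by absolute Pearson correlation, with the representative chosen as the group member most correlated with the target. The threshold, the greedy strategy, and all function names are assumptions made for illustration; the review covers many other grouping schemes.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def group_features(columns, threshold=0.9):
    """Greedily assign each feature to the first group whose seed it resembles.

    `columns` is a list of feature columns. A feature joins a group when its
    absolute correlation with the group's first member reaches `threshold`;
    otherwise it starts a new group.
    """
    groups = []
    for j, col in enumerate(columns):
        for group in groups:
            if abs(pearson(columns[group[0]], col)) >= threshold:
                group.append(j)
                break
        else:
            groups.append([j])
    return groups


def select_representatives(columns, target, threshold=0.9):
    """Keep, per group, the feature index most correlated with the target."""
    return [
        max(group, key=lambda j: abs(pearson(columns[j], target)))
        for group in group_features(columns, threshold)
    ]


if __name__ == "__main__":
    # Toy data: features 0 and 1 are nearly collinear, feature 2 is not.
    columns = [[1, 2, 3, 4], [2, 4, 6, 8.1], [4, 1, 3, 2]]
    target = [1, 2, 3, 4]
    print(group_features(columns))
    print(select_representatives(columns, target))
```

Replacing the correlation threshold with a clustering algorithm, or the representative rule with a group-level score, gives other variants of the same pattern.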
Affiliation(s)
- Cihan Kuzudisli
  - Department of Computer Engineering, Hasan Kalyoncu University, Gaziantep, Turkey
  - Department of Electrical and Computer Engineering, Abdullah Gul University, Kayseri, Turkey
- Burcu Bakir-Gungor
  - Department of Computer Engineering, Abdullah Gul University, Kayseri, Turkey
- Nurten Bulut
  - Department of Computer Engineering, Abdullah Gul University, Kayseri, Turkey
- Bahjat Qaqish
  - Department of Biostatistics, University of North Carolina at Chapel Hill, North Carolina, Chapel Hill, United States of America
- Malik Yousef
  - Department of Information Systems, Zefat Academic College, Zefat, Israel
  - Galilee Digital Health Research Center, Zefat Academic College, Zefat, Israel
5
Zheng K, Li B, Li Y, Chang P, Sun G, Li H, Zhang J. Fall detection based on dynamic key points incorporating preposed attention. Mathematical Biosciences and Engineering: MBE 2023; 20:11238-11259. [PMID: 37322980 DOI: 10.3934/mbe.2023498] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
Accidental falls pose a significant threat to the elderly population, and accurate fall detection from surveillance videos can significantly reduce the negative impact of falls. Although most fall detection algorithms based on video deep learning focus on training and detecting human posture or key points in pictures or videos, we have found that the human pose-based model and key points-based model can complement each other to improve fall detection accuracy. In this paper, we propose a preposed attention capture mechanism for images that will be fed into the training network, and a fall detection model based on this mechanism. We accomplish this by fusing the human dynamic key point information with the original human posture image. We first propose the concept of dynamic key points to account for incomplete pose key point information in the fall state. We then introduce an attention expectation that predicates the original attention mechanism of the depth model by automatically labeling dynamic key points. Finally, the depth model trained with human dynamic key points is used to correct the detection errors of the depth model with raw human pose images. Our experiments on the Fall Detection Dataset and the UP-Fall Detection Dataset demonstrate that our proposed fall detection algorithm can effectively improve the accuracy of fall detection and provide better support for elderly care.
Affiliation(s)
- Kun Zheng
  - Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Bin Li
  - Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Yu Li
  - Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Peng Chang
  - Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Guangmin Sun
  - Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Hui Li
  - Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Junjie Zhang
  - Smart Learning Institute, Beijing Normal University, Beijing 100875, China
6
Murshed BAH, Mallappa S, Abawajy J, Saif MAN, Al-ariki HDE, Abdulwahab HM. Short text topic modelling approaches in the context of big data: taxonomy, survey, and analysis. Artif Intell Rev 2022; 56:5133-5260. [PMID: 36320612 PMCID: PMC9607740 DOI: 10.1007/s10462-022-10254-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Accepted: 08/17/2022] [Indexed: 12/01/2022]
Abstract
Social media platforms such as Twitter, Facebook, and Weibo are being increasingly embraced by individuals, groups, and organizations as a valuable source of information. This social media generated information comes in the form of tweets or posts, and is normally characterized as short, high-volume, sparse, and low-density text. Since many real-world applications need semantic interpretation of such short texts, research in Short Text Topic Modeling (STTM) has recently gained a lot of interest as a way to reveal unique and cohesive latent topics. This article examines the current state of the art in STTM algorithms. It presents a comprehensive survey and taxonomy of STTM algorithms. The article also includes a qualitative and quantitative study of the STTM algorithms, as well as analyses of the various strengths and drawbacks of STTM techniques. Moreover, a comparative analysis of the topic quality and performance of representative STTM models is presented. The performance evaluation is conducted on two real-world Twitter datasets, the Real-World Pandemic Twitter (RW-Pand-Twitter) dataset and the Real-World Cyberbullying Twitter (RW-CB-Twitter) dataset, in terms of several metrics such as topic coherence, purity, NMI, and accuracy. Finally, the open challenges and future research directions in this promising field are discussed to highlight the trends of research in STTM. The work presented in this paper is useful for researchers interested in learning state-of-the-art short text topic modelling and researchers focusing on developing new algorithms for short text topic modelling.
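Of the evaluation metrics listed in the abstract, purity is the simplest to state: each cluster (here, the dominant topic assigned to a document) is credited with its most frequent true label, and the credited documents are divided by the total. The sketch below uses that standard definition; the function name and the toy labels are illustrative and do not come from the survey.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, true_labels):
    """Cluster purity: share of documents that carry their cluster's majority label."""
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have the same length")
    labels_by_cluster = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        labels_by_cluster[cluster][label] += 1
    majority_total = sum(max(counts.values()) for counts in labels_by_cluster.values())
    return majority_total / len(true_labels)


if __name__ == "__main__":
    # Toy example: six short texts, two inferred topics, two true classes.
    topics = [0, 0, 0, 1, 1, 1]
    classes = ["health", "health", "sport", "sport", "sport", "health"]
    print(purity(topics, classes))
```

Purity rewards fine-grained clusterings (one cluster per document scores 1.0), which is one reason it is usually reported alongside NMI.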
Affiliation(s)
- Belal Abdullah Hezam Murshed
  - Department of Studies in Computer Science, Mysore University, Mysore, 570006 Karnataka India
  - Department of Computer Science, College of Engineering and IT, Amran University, Amran, Yemen
- Suresha Mallappa
  - Department of Studies in Computer Science, Mysore University, Mysore, 570006 Karnataka India
- Jemal Abawajy
  - School of Information Technology, Faculty of Science, Engineering and Built Environment, Deakin University, Geelong, VIC 3220 Australia
- Mufeed Ahmed Naji Saif
  - Department of Computer Applications, Sri Jayachamarajendra College of Engineering, VTU, Mysore, Karnataka India
- Hasib Daowd Esmail Al-ariki
  - Department of Computer Networks and Distributed Systems, Al Saeed Faculty for Engineering and IT, Taiz University, Taiz, Yemen
  - Department of Computer Networks Engineering and Technologies, Sana’a Community College, Sana’a, Yemen