1
Rakhshaninejad M, Fathian M, Shirkoohi R, Barzinpour F, Gandomi AH. Refining breast cancer biomarker discovery and drug targeting through an advanced data-driven approach. BMC Bioinformatics 2024; 25:33. [PMID: 38253993 PMCID: PMC10810249 DOI: 10.1186/s12859-024-05657-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Accepted: 01/15/2024] [Indexed: 01/24/2024] Open
Abstract
Breast cancer remains a major public health challenge worldwide. The identification of accurate biomarkers is critical for the early detection and effective treatment of breast cancer. This study utilizes an integrative machine learning approach to analyze breast cancer gene expression data for superior biomarker and drug target discovery. Gene expression datasets, obtained from the GEO database, were merged post-preprocessing. From the merged dataset, differential expression analysis between breast cancer and normal samples revealed 164 differentially expressed genes. Meanwhile, a separate gene expression dataset revealed 350 differentially expressed genes. Additionally, the BGWO_SA_Ens algorithm, integrating binary grey wolf optimization and simulated annealing with an ensemble classifier, was employed on gene expression datasets to identify predictive genes including TOP2A, AKR1C3, EZH2, MMP1, EDNRB, S100B, and SPP1. From over 10,000 genes, BGWO_SA_Ens identified 1404 in the merged dataset (F1 score: 0.981, PR-AUC: 0.998, ROC-AUC: 0.995) and 1710 in the GSE45827 dataset (F1 score: 0.965, PR-AUC: 0.986, ROC-AUC: 0.972). The intersection of DEGs and BGWO_SA_Ens selected genes revealed 35 superior genes that were consistently significant across methods. Enrichment analyses uncovered the involvement of these superior genes in key pathways such as AMPK, Adipocytokine, and PPAR signaling. Protein-protein interaction network analysis highlighted subnetworks and central nodes. Finally, a drug-gene interaction investigation revealed connections between superior genes and anticancer drugs. Collectively, the machine learning workflow identified a robust gene signature for breast cancer, illuminated their biological roles, interactions and therapeutic associations, and underscored the potential of computational approaches in biomarker discovery and precision oncology.
Affiliation(s)
- Morteza Rakhshaninejad
  - Industrial Engineering Department, Iran University of Science and Technology, Hengam Street, Tehran, 1684613114, Tehran, Iran
- Mohammad Fathian
  - Industrial Engineering Department, Iran University of Science and Technology, Hengam Street, Tehran, 1684613114, Tehran, Iran
- Reza Shirkoohi
  - Cancer Biology Research Center, Cancer Institute, Imam Khomeini Hospital Complex, Tehran University of Medical Sciences, Keshavarz Boulevard, Tehran, 1419733141, Tehran, Iran
- Farnaz Barzinpour
  - Industrial Engineering Department, Iran University of Science and Technology, Hengam Street, Tehran, 1684613114, Tehran, Iran
- Amir H Gandomi
  - Faculty of Engineering and Information Technology, University of Technology Sydney, Ultimo, 2007, NSW, Australia
  - University Research and Innovation Center (EKIK), Óbuda University, Budapest, 1034, Hungary
2
Rahimi MR, Makarem D, Sarspy S, Mahdavi SA, Albaghdadi MF, Armaghan SM. Classification of cancer cells and gene selection based on microarray data using MOPSO algorithm. J Cancer Res Clin Oncol 2023; 149:15171-15184. [PMID: 37634207 DOI: 10.1007/s00432-023-05308-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Accepted: 08/16/2023] [Indexed: 08/29/2023]
Abstract
PURPOSE Microarray information is crucial for the identification and categorisation of malignant tissues. The very limited sample size of microarray data has always been a challenge for classifier design in cancer research. As a result, gene selection is applied as a pre-processing step, and genes that carry no useful information are removed from the microarray data prior to categorisation. In essence, an appropriate gene selection technique can significantly increase the accuracy of disease (cancer) classification. METHODS For the classification of high-dimensional microarray data, a novel approach based on a hybrid model of multi-objective particle swarm optimisation (MOPSO) is proposed in this research. First, a binary vector representing each particle's position is initialised at random. Each bit represents a gene: bit 0 denotes that the corresponding characteristic (gene) is not selected, while bit 1 denotes that the gene is selected. Therefore, the position of each particle represents a set of genes, and the linear Bayesian discriminant analysis classification algorithm calculates each particle's degree of fitness to assess the quality of the gene set that particle has chosen. RESULTS The proposed algorithm has been applied to four cancer datasets and its results have been compared with those of other existing methods. The results show that the proposed algorithm improves classification accuracy over the other methods by 25.84% on average across the four datasets: 18.63% on the blood cancer dataset, 24.25% on the lung cancer dataset, 27.73% on the breast cancer dataset, and 32.80% on the prostate cancer dataset. Therefore, the proposed algorithm is able to identify a small set of informative genes in a way that increases classification accuracy. CONCLUSION Our proposed solution is used for data classification and also improves classification accuracy. This is possible because the MOPSO model reduces the number of redundant genes by considering how genes are correlated with each other.
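The binary particle encoding described in the METHODS of this abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names, the toy data, and in particular the fitness function, which uses a simple nearest-centroid rule as a stand-in rather than the linear Bayesian discriminant analysis classifier named in the abstract. The swarm update itself is not shown.

```python
import random

def random_position(n_genes, rng):
    """Random binary position: bit 1 selects a gene, bit 0 leaves it out."""
    return [rng.randint(0, 1) for _ in range(n_genes)]

def selected_genes(position):
    """Indices of the genes whose bit is set to 1."""
    return [i for i, bit in enumerate(position) if bit == 1]

def fitness(position, samples, labels):
    """Stand-in fitness (assumption): resubstitution accuracy of a
    nearest-centroid rule restricted to the selected genes."""
    genes = selected_genes(position)
    if not genes:
        return 0.0  # an empty gene set cannot classify anything
    centroids = {}
    for label in sorted(set(labels)):
        rows = [s for s, y in zip(samples, labels) if y == label]
        centroids[label] = [sum(r[g] for r in rows) / len(rows) for g in genes]
    correct = 0
    for s, y in zip(samples, labels):
        def sq_dist(label):
            return sum((s[g] - c) ** 2 for g, c in zip(genes, centroids[label]))
        if min(centroids, key=sq_dist) == y:
            correct += 1
    return correct / len(samples)

# Toy data (hypothetical): 4 samples, 3 genes; only gene 0 separates the classes.
SAMPLES = [[0.1, 5.0, 0.9], [0.2, 4.0, 1.1], [0.9, 5.1, 1.0], [1.0, 4.1, 0.8]]
LABELS = [0, 0, 1, 1]

rng = random.Random(0)
position = random_position(3, rng)
print(position, fitness(position, SAMPLES, LABELS))
```

In this toy setting a particle that selects only gene 0 classifies every sample correctly, while one that selects only the uninformative gene 1 does no better than chance.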
Affiliation(s)
- Dorna Makarem
  - Escuela Tecnica Superior de Ingenieros de Telecomunicacion Politecnica de Madrid, Madrid, Spain
- Sliva Sarspy
  - Department of Computer Science, College of Science, Cihan University-Erbil, Erbil, Iraq
3
Wang R, Wang H, Shi L, Han C, He Q, Che Y, Luo L. A novel framework of MOPSO-GDM in recognition of Alzheimer's EEG-based functional network. Front Aging Neurosci 2023; 15:1160534. [PMID: 37455939 PMCID: PMC10339813 DOI: 10.3389/fnagi.2023.1160534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 06/13/2023] [Indexed: 07/18/2023] Open
Abstract
Background Most patients with Alzheimer's disease (AD) have an insidious onset and frequently atypical clinical symptoms, which are often regarded as a normal consequence of aging, making it difficult to diagnose AD clinically. However, accurate diagnosis is critical to prevent degeneration and provide early treatment for AD patients. Objective This study aims to establish a novel EEG-based classification framework with deep learning methods for AD recognition. Methods First, considering the network interactions in different frequency bands (δ, θ, α, β, and γ), multiplex networks are reconstructed by the phase synchronization index (PSI) method, and fourteen topology features are subsequently extracted, forming a high-dimensional feature vector. However, not all features in the combination provide effective information for recognition. Moreover, combining features by manual selection is time-consuming and laborious. Thus, a feature selection optimization algorithm called MOPSO-GDM was proposed by combining the multi-objective particle swarm optimization (MOPSO) algorithm with the Gaussian differential mutation (GDM) algorithm. In addition to considering the classification error rates of support vector machine, naive Bayes, and discriminant analysis classifiers, our algorithm also considers a distance measure as an optimization objective. Results The proposed method achieves an excellent classification error rate of 0.0531 (5.31%) with a feature vector size of 8 under a ten-fold cross-validation strategy. Conclusion These findings show that our framework can adaptively combine the best brain network features to explore network synchronization and functional interactions and to characterize brain functional abnormalities, which can improve the recognition efficiency of diseases. While improving the classification accuracy of application algorithms, we aim to expand our understanding of the brain function of patients with neurological disorders through the analysis of brain networks.
Affiliation(s)
- Ruofan Wang
  - School of Information Technology Engineering, Tianjin University of Technology and Education, Tianjin, China
- Haodong Wang
  - School of Information Technology Engineering, Tianjin University of Technology and Education, Tianjin, China
- Lianshuan Shi
  - School of Information Technology Engineering, Tianjin University of Technology and Education, Tianjin, China
- Chunxiao Han
  - Tianjin Key Laboratory of Information Sensing and Intelligent Control, School of Automation and Electrical Engineering, Tianjin University of Technology and Education, Tianjin, China
- Qiguang He
  - School of Information Technology Engineering, Tianjin University of Technology and Education, Tianjin, China
- Yanqiu Che
  - Tianjin Key Laboratory of Information Sensing and Intelligent Control, School of Automation and Electrical Engineering, Tianjin University of Technology and Education, Tianjin, China
- Li Luo
  - College of Agronomy, Sichuan Agricultural University, Chengdu, China
4
Fu Q, Li Q, Li X. An improved multi-objective marine predator algorithm for gene selection in classification of cancer microarray data. Comput Biol Med 2023; 160:107020. [PMID: 37196457 DOI: 10.1016/j.compbiomed.2023.107020] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/15/2023] [Revised: 04/09/2023] [Accepted: 05/05/2023] [Indexed: 05/19/2023]
Abstract
Gene selection (GS) is an important branch of feature selection that is widely used in cancer classification. It provides essential insights into the pathogenesis of cancer and enables a deeper understanding of cancer data. In cancer classification, GS is essentially a multi-objective optimization problem, which aims to simultaneously optimize two objectives: classification accuracy and the size of the gene subset. The marine predator algorithm (MPA) has been successfully employed in practical applications. However, its random initialization can lead to blindness, which may adversely affect the convergence of the algorithm. Furthermore, the elite individuals that guide evolution are randomly chosen from the Pareto solutions, which may degrade the good exploration performance of the population. To overcome these limitations, a multi-objective improved MPA with continuous mapping initialization and leader selection strategies is proposed. In this work, a new continuous mapping initialization with ReliefF overcomes the defect of having less information in late evolution. Moreover, an improved elite selection mechanism with a Gaussian distribution guides the population to evolve towards a better Pareto front. Finally, an efficient mutation method is adopted to prevent evolutionary stagnation. To evaluate its effectiveness, the proposed algorithm was compared with 9 well-known algorithms. The experimental results on 16 datasets demonstrate that the proposed algorithm can significantly reduce the data dimension and obtain the highest classification accuracy on most of the high-dimensional cancer microarray datasets.
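The two-objective view of gene selection in this abstract (classification accuracy versus gene subset size) can be made concrete with a short Python sketch of Pareto dominance. This is a generic illustration and not the authors' algorithm; the function names and candidate values are hypothetical, and accuracy is expressed as an error rate so that both objectives are minimized.

```python
def dominates(a, b):
    """True if solution a dominates b.

    Each solution is a pair (error_rate, n_genes), both to be minimized:
    a must be no worse in both objectives and strictly better in at least one.
    """
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better

def pareto_front(solutions):
    """Keep only the solutions that no other solution dominates."""
    return [s for s in solutions
            if not any(dominates(other, s) for other in solutions)]

# Hypothetical candidate gene subsets as (error_rate, n_genes).
candidates = [(0.10, 50), (0.08, 120), (0.10, 80), (0.20, 10), (0.25, 10)]
print(pareto_front(candidates))
```

Here (0.10, 80) is dropped because (0.10, 50) reaches the same error with fewer genes, and (0.25, 10) is dropped because (0.20, 10) has a lower error with the same number of genes. The remaining solutions are trade-offs that a leader selection strategy would choose among.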
Affiliation(s)
- Qiyong Fu
  - School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China
- Qi Li
  - School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China
- Xiaobo Li
  - School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China
5
Chen Z, Xuan P, Heidari AA, Liu L, Wu C, Chen H, Escorcia-Gutierrez J, Mansour RF. An artificial bee bare-bone hunger games search for global optimization and high-dimensional feature selection. iScience 2023; 26:106679. [PMID: 37216098 PMCID: PMC10193239 DOI: 10.1016/j.isci.2023.106679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Revised: 03/01/2023] [Accepted: 04/12/2023] [Indexed: 05/24/2023] Open
Abstract
The domains of contemporary medicine and biology have generated substantial high-dimensional genetic data. Identifying representative genes and decreasing the dimensionality of the data can be challenging. The goal of gene selection is to minimize computing costs and enhance classification precision. Therefore, this article designs a new wrapper gene selection algorithm named artificial bee bare-bone hunger games search (ABHGS), which is the hunger games search (HGS) integrated with an artificial bee strategy and a Gaussian bare-bone structure to address this issue. To evaluate and validate the performance of our proposed method, ABHGS is compared to HGS and a single strategy embedded in HGS, six classic algorithms, and ten advanced algorithms on the CEC 2017 functions. The experimental results demonstrate that the bABHGS outperforms the original HGS. Compared to peers, it increases classification accuracy and decreases the number of selected features, indicating its actual engineering utility in spatial search and feature selection.
Affiliation(s)
- Zhiqing Chen
  - School of Intelligent Manufacturing, Wenzhou Polytechnic, Wenzhou 325035, China
- Ping Xuan
  - Department of Computer Science, School of Engineering, Shantou University, Shantou 515063, China
- Ali Asghar Heidari
  - Key Laboratory of Intelligent Informatics for Safety & Emergency of Zhejiang Province, Wenzhou University, Wenzhou 325035, China
- Lei Liu
  - College of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China
- Chengwen Wu
  - Key Laboratory of Intelligent Informatics for Safety & Emergency of Zhejiang Province, Wenzhou University, Wenzhou 325035, China
- Huiling Chen
  - Key Laboratory of Intelligent Informatics for Safety & Emergency of Zhejiang Province, Wenzhou University, Wenzhou 325035, China
- José Escorcia-Gutierrez
  - Department of Computational Science and Electronics, Universidad de la Costa, CUC, Barranquilla 080002, Colombia
- Romany F. Mansour
  - Department of Mathematics, Faculty of Science, New Valley University, El-Kharga 72511, Egypt
6
Devi SS, Prithiviraj K. Breast Cancer Classification With Microarray Gene Expression Data Based on Improved Whale Optimization Algorithm. International Journal of Swarm Intelligence Research 2023. [DOI: 10.4018/ijsir.317091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Abstract
Breast cancer is one of the most common and dangerous cancer types in women worldwide. Since it is generally a genetic disease, microarray technology-based cancer prediction is technically significant among the many diagnosis methods. Microarray gene expression data contain few samples with many redundant and noisy genes, which leads to inaccurate diagnosis and low prediction accuracy. To overcome these difficulties, this paper proposes an Improved Whale Optimization Algorithm (IWOA) for wrapper-based feature selection in gene expression data. The proposed IWOA incorporates modified crossover and mutation operations to enhance the exploration and exploitation of the classical WOA. The proposed IWOA adopts a multiobjective fitness function, which simultaneously balances minimization of the error rate and of the number of selected features. The experimental analysis demonstrated that the proposed IWOA with a Gradient Boost Classifier (GBC) achieves a high classification accuracy of 97.7% with a minimum subset of features and also converges quickly for the breast cancer dataset.
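A fitness function that balances error rate against the number of selected features is often written as a weighted sum in wrapper-based feature selection. The Python sketch below shows that common form only; the abstract does not give the exact formula used by IWOA, so the function name, the weight `alpha`, and its default value are assumptions.

```python
def weighted_fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Scalarized fitness to be minimized (assumed form, not the paper's).

    alpha weights the classification error; (1 - alpha) weights the
    fraction of features kept, so smaller subsets score slightly better.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return alpha * error_rate + (1.0 - alpha) * (n_selected / n_total)

# Two hypothetical subsets with the same error: the smaller one wins.
small = weighted_fitness(error_rate=0.05, n_selected=10, n_total=1000)
large = weighted_fitness(error_rate=0.05, n_selected=500, n_total=1000)
print(small, large)
```

With a large `alpha`, the error rate dominates the score and subset size mainly acts as a tie-breaker between subsets of similar accuracy.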
Affiliation(s)
- S. Sathiya Devi
  - University College of Engineering, Birla Institute of Technology, Trichy, India
- Prithiviraj K.
  - University College of Engineering, Birla Institute of Technology, Trichy, India
7
Panigrahi A, Pati A, Sahu B, Das MN, Nayak DSK, Sahoo G, Kant S. En-MinWhale: An Ensemble Approach Based on MRMR and Whale Optimization for Cancer Diagnosis. IEEE Access 2023; 11:113526-113542. [DOI: 10.1109/access.2023.3318261] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 01/04/2025]
Affiliation(s)
- Amrutanshu Panigrahi
  - Department of Computer Science and Engineering, Siksha ‘O’ Anusandhan (Deemed to be University), Bhubaneswar, Odisha, India
- Abhilash Pati
  - Department of Computer Science and Engineering, Siksha ‘O’ Anusandhan (Deemed to be University), Bhubaneswar, Odisha, India
- Bibhuprasad Sahu
  - Department of AI&DS, Vardhaman College of Engineering (Autonomous), Hyderabad, Telangana, India
- Manmath Nath Das
  - Department of AI&DS, VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad, Telangana, India
- Debasish Swapnesh Kumar Nayak
  - Department of Computer Science and Engineering, Siksha ‘O’ Anusandhan (Deemed to be University), Bhubaneswar, Odisha, India
- Ghanashyam Sahoo
  - Department of Computer Science and Engineering, GITA Autonomous College, Bhubaneswar, Odisha, India
- Shashi Kant
  - Department of Management, College of Business and Economics, Bule Hora University, Bule Hora, Ethiopia
8
Dual Regularized Unsupervised Feature Selection Based on Matrix Factorization and Minimum Redundancy with application in gene selection. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109884] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
9
Qiu F, Zheng P, Heidari AA, Liang G, Chen H, Karim FK, Elmannai H, Lin H. Mutational Slime Mould Algorithm for Gene Selection. Biomedicines 2022; 10:2052. [PMID: 36009599 PMCID: PMC9406076 DOI: 10.3390/biomedicines10082052] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/22/2022] [Revised: 08/14/2022] [Accepted: 08/16/2022] [Indexed: 02/02/2023] Open
Abstract
A large volume of high-dimensional genetic data has been produced in modern medicine and biology fields. Data-driven decision-making is particularly crucial to clinical practice and relevant procedures. However, high-dimensional data in these fields increase the processing complexity and scale. Identifying representative genes and reducing the data's dimensions is often challenging. The purpose of gene selection is to eliminate irrelevant or redundant features to reduce the computational cost and improve classification accuracy. The wrapper gene selection model is based on a feature set, which can reduce the number of features and improve classification accuracy. This paper proposes a wrapper gene selection method based on the slime mould algorithm (SMA) to solve this problem. SMA is a new algorithm with a lot of application space in the feature selection field. This paper improves the original SMA by combining the Cauchy mutation mechanism with the crossover mutation strategy based on differential evolution (DE). Then, the transfer function converts the continuous optimizer into a binary version to solve the gene selection problem. Firstly, the continuous version of the method, ISMA, is tested on 33 classical continuous optimization problems. Then, the effect of the discrete version, or BISMA, was thoroughly studied by comparing it with other gene selection methods on 14 gene expression datasets. Experimental results show that the continuous version of the algorithm achieves an optimal balance between local exploitation and global search capabilities, and the discrete version of the algorithm has the highest accuracy when selecting the least number of genes.
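The step in which a transfer function turns a continuous optimizer into a binary gene selector can be sketched in Python as follows. The abstract does not say which transfer function BISMA uses, so the S-shaped (sigmoid) function here is an assumption chosen because it is a common option; the function names are hypothetical.

```python
import math
import random

def s_shaped_transfer(x):
    """Map a continuous coordinate to a selection probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def binarize(continuous_position, rng):
    """Convert a continuous position into a binary gene mask.

    Each gene is selected (bit 1) with probability given by the transfer
    function applied to its coordinate.
    """
    return [1 if rng.random() < s_shaped_transfer(x) else 0
            for x in continuous_position]

rng = random.Random(0)
mask = binarize([2.5, -3.0, 0.0, 1.2], rng)
print(mask)
```

Large positive coordinates are almost always mapped to 1 and large negative ones to 0, while coordinates near zero are selected about half the time, which keeps some randomness in the search.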
Affiliation(s)
- Feng Qiu
  - Department of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou 325035, China
- Pan Zheng
  - Information Systems, University of Canterbury, Christchurch 8014, New Zealand
- Ali Asghar Heidari
  - Department of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou 325035, China
- Guoxi Liang
  - Department of Information Technology, Wenzhou Polytechnic, Wenzhou 325035, China
- Huiling Chen
  - Department of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou 325035, China
- Faten Khalid Karim
  - Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Hela Elmannai
  - Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Haiping Lin
  - Department of Information Engineering, Hangzhou Vocational & Technical College, Hangzhou 310018, China
10
Uncertainty measurement for a gene space based on class-consistent technology: an application in gene selection. Appl Intell 2022. [DOI: 10.1007/s10489-022-03657-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
11
Rostami M, Forouzandeh S, Berahmand K, Soltani M, Shahsavari M, Oussalah M. Gene selection for microarray data classification via multi-objective graph theoretic-based method. Artif Intell Med 2022; 123:102228. [PMID: 34998517 DOI: 10.1016/j.artmed.2021.102228] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 10/31/2020] [Revised: 11/23/2021] [Accepted: 11/27/2021] [Indexed: 12/20/2022]
Abstract
In recent decades, improvements in computer technology have accelerated the growth of high-dimensional microarray data. Thus, data mining methods for DNA microarray data classification usually involve samples consisting of thousands of genes. One efficient strategy for handling this problem is gene selection, which improves the accuracy of microarray data classification and also decreases computational complexity. In this paper, a novel social network analysis-based gene selection approach is proposed. The proposed method has two main objectives: maximizing the relevance and minimizing the redundancy of the selected genes. In each iteration of this method, a maximum community is selected. Then, among the genes in this community, the appropriate genes are selected using a node centrality-based criterion. The reported results indicate that the developed gene selection algorithm increases the classification accuracy of microarray data while also decreasing the time complexity.
Affiliation(s)
- Mehrdad Rostami
  - Centre of Machine Vision and Signal Processing, Faculty of Information Technology, University of Oulu, Oulu, Finland
- Saman Forouzandeh
  - Department of Computer Engineering, University of Applied Science and Technology, Center of Tehran Municipality ICT org., Tehran, Iran
- Kamal Berahmand
  - School of Computer Sciences, Science and Engineering Faculty, Queensland University of Technology (QUT), Brisbane, Australia
- Mina Soltani
  - Department of Nutrition, Kashan University of Medical Sciences, Kashan, Iran
- Meisam Shahsavari
  - Department of engineering physics, Tsinghua University, Beijing, China
- Mourad Oussalah
  - Centre of Machine Vision and Signal Processing, Faculty of Information Technology, University of Oulu, Oulu, Finland; Research Unit of Medical Imaging, Physics, and Technology, Faculty of Medicine, University of Oulu, Finland
12
Multi-objective feature selection based on quasi-oppositional based Jaya algorithm for microarray data. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2021.107804] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/23/2022]
13
Rostami M, Oussalah M. A novel explainable COVID-19 diagnosis method by integration of feature selection with random forest. Informatics in Medicine Unlocked 2022; 30:100941. [PMID: 35399333 PMCID: PMC8985417 DOI: 10.1016/j.imu.2022.100941] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 01/24/2022] [Revised: 04/01/2022] [Accepted: 04/01/2022] [Indexed: 12/12/2022] Open
Abstract
Several Artificial Intelligence-based models have been developed for COVID-19 disease diagnosis. In spite of the promise of artificial intelligence, there are very few models which bridge the gap between traditional human-centered diagnosis and the potential future of machine-centered disease diagnosis. Under the concept of human-computer interaction design, this study proposes a new explainable artificial intelligence method that exploits graph analysis for feature visualization and optimization for the purpose of COVID-19 diagnosis from blood test samples. In the developed model, an explainable decision forest classifier is employed for COVID-19 classification based on routinely available patient blood test data. The approach enables the clinician to use the decision tree and feature visualization to guide the explainability and interpretability of the prediction model. By utilizing this novel feature selection phase, the proposed diagnosis model not only improves diagnosis accuracy but also decreases the execution time.
Affiliation(s)
- Mehrdad Rostami
  - Centre for Machine Vision and Signal Processing, Faculty of Information Technology, University of Oulu, Oulu, Finland
- Mourad Oussalah
  - Centre for Machine Vision and Signal Processing, Faculty of Information Technology, University of Oulu, Oulu, Finland
  - Research Unit of Medical Imaging, Physics, and Technology, Faculty of Medicine, University of Oulu, Finland
14
An efficient feature selection framework based on information theory for high dimensional data. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107729] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 01/09/2023]
15
A novel bio-inspired hybrid multi-filter wrapper gene selection method with ensemble classifier for microarray data. Neural Comput Appl 2021; 35:11531-11561. [PMID: 34539088 PMCID: PMC8435304 DOI: 10.1007/s00521-021-06459-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/10/2020] [Accepted: 08/26/2021] [Indexed: 01/04/2023]
Abstract
Microarray technology is known as one of the most important tools for collecting DNA expression data. This technology allows researchers to investigate and examine types of diseases and their origins. However, microarray data are often associated with a small sample size, a significant number of genes, imbalanced data, etc., making classification models inefficient. Thus, a new hybrid solution based on a multi-filter and an adaptive chaotic multi-objective forest optimization algorithm (AC-MOFOA) is presented to solve the gene selection problem and construct the Ensemble Classifier. In the proposed solution, a multi-filter model (i.e., ensemble filter) is proposed as a preprocessing step to reduce the dataset's dimensions, using a combination of five filter methods to remove redundant and irrelevant genes. Accordingly, the results of the five filter methods are combined using a voting-based function. Additionally, the results of the proposed multi-filter indicate that it has good capability in reducing the gene subset size and selecting relevant genes. Then, an AC-MOFOA based on the concepts of non-dominated sorting, crowding distance, chaos theory, and adaptive operators is presented. As a wrapper method, AC-MOFOA aims to reduce dataset dimensions, optimize KELM, and increase classification accuracy simultaneously. Next, an ensemble classifier model is built from the AC-MOFOA results to classify microarray data. The performance of the proposed algorithm was evaluated on nine public microarray datasets, and its results were compared in terms of the number of selected genes, classification efficiency, execution time, time complexity, hypervolume indicator, and spacing metric with five hybrid multi-objective methods and three hybrid single-objective methods. According to the results, the proposed hybrid method could increase the accuracy of the KELM in most datasets by reducing the dataset's dimensions and achieve similar or superior performance compared to other multi-objective methods. Furthermore, the proposed Ensemble Classifier model could provide better classification accuracy and generalizability in seven of the nine microarray datasets compared to conventional ensemble methods. Moreover, the comparison of the Ensemble Classifier model with three state-of-the-art ensemble generation methods indicates its competitive performance, with the proposed ensemble model achieving better results in five of the nine datasets.
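The voting-based combination of several filter methods mentioned in this abstract can be sketched in Python as a simple vote count. The abstract does not describe the actual voting function, so the threshold rule, the function name, and the toy filter outputs below are assumptions for illustration only.

```python
from collections import Counter

def vote_select(filter_outputs, min_votes):
    """Keep genes that at least `min_votes` filters selected.

    filter_outputs: one collection of gene names per filter method.
    Returns the surviving gene names in sorted order.
    """
    votes = Counter()
    for genes in filter_outputs:
        votes.update(set(genes))  # each filter votes at most once per gene
    return sorted(g for g, n in votes.items() if n >= min_votes)

# Hypothetical top-ranked genes from five filter methods.
filters = [
    {"g1", "g2", "g3"},
    {"g1", "g3", "g4"},
    {"g1", "g2", "g5"},
    {"g3", "g1", "g6"},
    {"g2", "g7", "g1"},
]
print(vote_select(filters, min_votes=3))
```

Raising `min_votes` makes the pre-filter stricter and shrinks the gene pool passed on to the wrapper stage; lowering it keeps more candidate genes at the cost of more redundancy.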
16
Vafamand A, Vafamand N, Zarei J, Razavi-Far R, Saif M. Multi-objective NSBGA-II control of HIV therapy with monthly output measurement. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2021.102561] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/30/2022]
17
Hou Z, Lao W, Wang Y, Lu W. Homotopy-based hyper-heuristic searching approach for reciprocal feedback inversion of groundwater contamination source and aquifer parameters. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107191] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 10/22/2022]
18
Mahendran N, Durai Raj Vincent PM, Srinivasan K, Chang CY. Machine Learning Based Computational Gene Selection Models: A Survey, Performance Evaluation, Open Issues, and Future Research Directions. Front Genet 2020; 11:603808. [PMID: 33362861 PMCID: PMC7758324 DOI: 10.3389/fgene.2020.603808] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 09/08/2020] [Accepted: 10/29/2020] [Indexed: 12/20/2022] Open
Abstract
Gene expression is the process that determines the physical characteristics of living beings by generating the necessary proteins. Gene expression takes place in two steps, transcription and translation. It is the flow of information from DNA to RNA with the help of enzymes, and the end products are proteins and other biochemical molecules. Many technologies can capture gene expression from DNA or RNA. One such technique is the DNA microarray. Besides being expensive, the main issue with DNA microarrays is that they generate high-dimensional data with a minimal sample size. The difficulty in handling such a dataset is that the learning model will be over-fitted. This problem should be addressed by considerably reducing the dimension of the data source. In recent years, Machine Learning has gained popularity in the field of genomic studies. Many Machine Learning-based gene selection approaches have been discussed in the literature, proposed to improve the precision of dimensionality reduction. This paper extensively reviews recent work on Machine Learning-based gene selection, along with an analysis of its performance. The study categorizes feature selection algorithms under Supervised, Unsupervised, and Semi-supervised learning. Recent work on reducing the features used for diagnosing tumors is discussed in detail. Furthermore, the performance of several methods discussed in the literature is analyzed. This study also lists and briefly discusses the open issues in handling high-dimensional, small-sample-size data.
Affiliation(s)
- Nivedhitha Mahendran, School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India
- P. M. Durai Raj Vincent, School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India
- Kathiravan Srinivasan, School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India
- Chuan-Yu Chang, Department of Computer Science and Information Engineering, National Yunlin University of Science and Technology, Douliu, Taiwan
|
19
|
Mallick PK, Mohapatra SK, Chae GS, Mohanty MN. Convergent learning-based model for leukemia classification from gene expression. Personal and Ubiquitous Computing 2020; 27:1103-1110. [PMID: 33100943 PMCID: PMC7567412 DOI: 10.1007/s00779-020-01467-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/28/2020] [Accepted: 09/28/2020] [Indexed: 05/05/2023]
Abstract
Microarray data analysis has been a major and challenging field of research in recent years. Machine learning-based automated classification of gene data is essential for diagnosing gene-related malfunctions and diseases. Because the data are very large, it is essential to design a suitable classifier that can process huge amounts of data. Deep learning is one of the advanced machine learning techniques used to mitigate these problems. Owing to its larger number of hidden layers, it can easily handle large amounts of data. We present a classification method for understanding the convergence of training a deep neural network (DNN). It is assumed that the inputs do not degenerate, that the network is over-parameterized, and that the number of hidden neurons is sufficiently large. In this work, a DNN is used to classify gene expression data. The dataset contains bone marrow gene expression data from 72 leukemia patients. A five-layer DNN classifier is designed to classify acute lymphocytic (ALL) and acute myelocytic (AML) leukemia samples. The network is trained on 80% of the data, and the remaining 20% is used for validation. The proposed DNN classifier gives satisfactory results compared with other classifiers. The two types of leukemia are classified with 98.2% accuracy, 96.59% sensitivity, and 97.9% specificity. Such computer-aided analyses of genes can also be helpful to genetics and virology researchers in the future.
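The accuracy, sensitivity, and specificity quoted in this abstract are standard confusion-matrix ratios. The sketch below shows how they are usually computed for a binary task such as ALL versus AML. The function name `binary_metrics` and the counts in the usage line are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn: positive-class samples predicted correctly / incorrectly.
    tn/fp: negative-class samples predicted correctly / incorrectly.
    """
    total = tp + fn + tn + fp
    if total == 0:
        raise ValueError("confusion matrix is empty")
    positives = tp + fn
    negatives = tn + fp
    return {
        # Fraction of all samples that were classified correctly.
        "accuracy": (tp + tn) / total,
        # True-positive rate; undefined (NaN) if there are no positives.
        "sensitivity": tp / positives if positives else math.nan,
        # True-negative rate; undefined (NaN) if there are no negatives.
        "specificity": tn / negatives if negatives else math.nan,
    }


# Hypothetical counts for a small held-out validation split (not from the paper).
metrics = binary_metrics(tp=9, fn=1, tn=4, fp=1)
print(metrics)
```

With a validation split as small as 20% of 72 samples, each of these ratios moves in coarse steps, so a single misclassified sample can shift the reported percentages noticeably.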
Affiliation(s)
- Pradeep Kumar Mallick, School of Computer Engineering, KIIT (Deemed to be University), Bhubaneswar, Odisha, India
- Saumendra Kumar Mohapatra, Department of Computer Science and Engineering, ITER, Siksha ‘O’ Anusandhan (Deemed to be University), Bhubaneswar, Odisha, India
- Gyoo-Soo Chae, Division of Information and Communication Engineering, Baekseok University, Cheonan, 330-704, South Korea
- Mihir Narayan Mohanty, Department of Electronics and Communication Engineering, ITER, Siksha ‘O’ Anusandhan (Deemed to be University), Bhubaneswar, Odisha, India
|
20
|
Sharma A, Rani R. Drug sensitivity prediction framework using ensemble and multi-task learning. Int J Mach Learn Cyb 2019. [DOI: 10.1007/s13042-019-01034-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 11/28/2022]
|
21
|
Mariani MC, Tweneboah OK, Bhuiyan MAM. Supervised machine learning models applied to disease diagnosis and prognosis. AIMS Public Health 2019; 6:405-423. [PMID: 31909063 PMCID: PMC6940574 DOI: 10.3934/publichealth.2019.4.405] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 08/18/2019] [Accepted: 10/08/2019] [Indexed: 11/24/2022] Open
Abstract
This work analyses the diagnosis and prognosis of cancer and heart disease data using five Machine Learning (ML) algorithms. We compare the predictive ability of all the ML algorithms on breast cancer and heart disease data. The important variables that cause cancer and heart disease are also studied. We predict the test data based on the important variables and compute the prediction accuracy using the Receiver Operating Characteristic (ROC) curve. Random Forest (RF) and Principal Component Regression (PCR) provide the best performance in analyzing the breast cancer and heart disease data, respectively.
Affiliation(s)
- Maria C Mariani, Department of Mathematical Sciences, University of Texas, El Paso, United States
- Osei K Tweneboah, Computational Science Program, University of Texas, El Paso, United States
|