1
Jeyananthan P. Performance comparison between multi-level gene expression data in cancer subgroup classification. Pathol Res Pract 2024; 260:155419. [PMID: 38955118 DOI: 10.1016/j.prp.2024.155419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Revised: 06/06/2024] [Accepted: 06/19/2024] [Indexed: 07/04/2024]
Abstract
Cancer is a serious disease that can affect various parts of the body, such as the breast, colon, lung or stomach. Each of these cancers has its own treatment-dependent histological subgroups. Hence, correctly identifying the cancer subgroup is almost as important as timely diagnosis of the cancer. This remains a challenging task, and a highly accurate system is essential. Current research is moving towards analyzing the gene expression data of cancer patients for various purposes, including biomarker identification and the study of differentially expressed genes, using gene expression data measured at a single level (selected from different gene levels including genome, transcriptome or translation). However, previous studies showed that the information carried by one level of gene expression is not similar to that carried by another level. This shows the importance of integrating multi-level omics data in these studies. Hence, this study uses tumor gene expression data measured at various gene levels, along with the integration of those data, in the subgroup classification of nine different cancers. This is a comprehensive analysis in which four types of gene expression data (transcriptome, miRNA, methylation and proteome) are used for subgrouping, and the performances of the models are compared to reveal the best model.
2
M S K, Rajaguru H, Nair AR. Enhancement of Classifier Performance with Adam and RanAdam Hyper-Parameter Tuning for Lung Cancer Detection from Microarray Data-In Pursuit of Precision. Bioengineering (Basel) 2024; 11:314. [PMID: 38671736 PMCID: PMC11047746 DOI: 10.3390/bioengineering11040314] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2024] [Revised: 03/18/2024] [Accepted: 03/20/2024] [Indexed: 04/28/2024] [Open Access]
Abstract
Microarray gene expression analysis is a powerful technique used in cancer classification and research to identify and understand gene expression patterns that can differentiate between cancer types, subtypes, and stages. However, microarray databases are highly redundant, inherently nonlinear, and noisy. Therefore, extracting meaningful information from such a huge database is challenging. The paper adopts the Fast Fourier Transform (FFT) and Mixture Model (MM) for dimensionality reduction and utilises the Dragonfly optimisation algorithm as the feature selection technique. The classifiers employed in this research are Nonlinear Regression, Naïve Bayes, Decision Tree, Random Forest and SVM (RBF). The classifiers' performances are analysed with and without feature selection. Finally, Adaptive Moment Estimation (Adam) and Random Adaptive Moment Estimation (RanAdam) hyper-parameter tuning techniques are used to improve the classifiers. The SVM (RBF) classifier with FFT dimensionality reduction and Dragonfly feature selection achieved the highest accuracy among the classifiers, 98.343%, with RanAdam hyper-parameter tuning.
Affiliation(s)
- Karthika M S
  - Department of Information Technology, Bannari Amman Institute of Technology, Sathyamangalam 638401, India
- Harikumar Rajaguru
  - Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology, Sathyamangalam 638401, India
- Ajin R. Nair
  - Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology, Sathyamangalam 638401, India
3
Zhou M, Wang J, Shi J, Zhai G, Zhou X, Ye L, Li L, Hu M, Zhou Y. Prediction model of radiotherapy outcome for Ocular Adnexal Lymphoma using informative features selected by chemometric algorithms. Comput Biol Med 2024; 170:108067. [PMID: 38301513 DOI: 10.1016/j.compbiomed.2024.108067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2023] [Revised: 12/28/2023] [Accepted: 01/27/2024] [Indexed: 02/03/2024]
Abstract
BACKGROUND Ocular Adnexal Lymphoma (OAL) is a non-Hodgkin's lymphoma that most often appears in the tissues near the eye, and radiotherapy is the currently preferred treatment. The prognostic factors for systemic failure of OAL radiotherapy remain controversial, so a thorough evaluation prior to radiotherapy is highly recommended to improve the patient's prognosis and minimize the likelihood of adverse effects. PURPOSE To investigate the risk factors that contribute to incomplete remission in OAL radiotherapy and to establish a hybrid model for predicting radiotherapy outcomes in OAL patients. METHODS A retrospective chart review was performed for 87 consecutive patients with OAL who received radiotherapy between February 2011 and August 2022 in our center. Seven image features, derived from MRI sequences, were integrated with 122 clinical features to form comprehensive patient feature sets. Chemometric algorithms were then employed to distill highly informative features from these sets. Based on these refined features, SVM and XGBoost classifiers were used to classify the effect of radiotherapy. RESULTS The clinical records of 87 OAL patients (median age: 60 months, IQR: 52-68 months; 62.1% male) treated with radiotherapy were reviewed. Analysis of Lasso (AUC = 0.75, 95% CI: 0.72-0.77) and Random Forest (AUC = 0.67, 95% CI: 0.62-0.70) algorithms revealed four potential features, resulting in an intersection AUC of 0.80 (95% CI: 0.75-0.82). Logistic Regression (AUC = 0.75, 95% CI: 0.72-0.77) identified two features. Furthermore, the integration of chemometric methods such as CARS (AUC = 0.66, 95% CI: 0.62-0.72), UVE (AUC = 0.71, 95% CI: 0.66-0.75), and GA (AUC = 0.65, 95% CI: 0.60-0.69) highlighted six features in total, with an intersection AUC of 0.82 (95% CI: 0.78-0.83). These features included enophthalmos, diplopia, tenderness, elevated ALT count, HBsAg positivity, and CD43 positivity in immunohistochemical tests. CONCLUSION The findings suggest the effectiveness of chemometric algorithms in pinpointing OAL risk factors, and the prediction model we proposed shows promise in helping clinicians identify OAL patients likely to achieve complete remission via radiotherapy. Notably, patients with a history of exophthalmos, diplopia, tenderness, elevated ALT levels, HBsAg positivity, and CD43 positivity are less likely to attain complete remission after radiotherapy. These insights offer more targeted management strategies for OAL patients. The developed model is accessible online at: https://lzz.testop.top/.
Affiliation(s)
- Min Zhou
  - Ophthalmology Department, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, 639 Zhizaoju Road, Shanghai 200011, China; Shanghai Key Laboratory of Orbital Diseases and Ocular Oncology, Shanghai 200011, China
- Jiaqi Wang
  - Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University, 500 Dongchuan Road, Shanghai 200241, China
- Jiahao Shi
  - Ophthalmology Department, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, 639 Zhizaoju Road, Shanghai 200011, China; Shanghai Key Laboratory of Orbital Diseases and Ocular Oncology, Shanghai 200011, China
- Guangtao Zhai
  - Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China
- Xiaowen Zhou
  - Ophthalmology Department, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, 639 Zhizaoju Road, Shanghai 200011, China; Shanghai Key Laboratory of Orbital Diseases and Ocular Oncology, Shanghai 200011, China
- Lulu Ye
  - Department of Oral and Maxillofacial-Head Neck Oncology, Shanghai Ninth People's Hospital, College of Stomatology, Shanghai Jiao Tong University School of Medicine, 639 Zhizaoju Road, Shanghai 200011, China
- Lunhao Li
  - Ophthalmology Department, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, 639 Zhizaoju Road, Shanghai 200011, China; Shanghai Key Laboratory of Orbital Diseases and Ocular Oncology, Shanghai 200011, China
- Menghan Hu
  - Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University, 500 Dongchuan Road, Shanghai 200241, China
- Yixiong Zhou
  - Ophthalmology Department, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, 639 Zhizaoju Road, Shanghai 200011, China; Shanghai Key Laboratory of Orbital Diseases and Ocular Oncology, Shanghai 200011, China
4
Abbasi EY, Deng Z, Ali Q, Khan A, Shaikh A, Reshan MSA, Sulaiman A, Alshahrani H. A machine learning and deep learning-based integrated multi-omics technique for leukemia prediction. Heliyon 2024; 10:e25369. [PMID: 38352790 PMCID: PMC10862685 DOI: 10.1016/j.heliyon.2024.e25369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Revised: 12/13/2023] [Accepted: 01/25/2024] [Indexed: 02/16/2024] [Open Access]
Abstract
In recent years, scientific data on cancer has expanded, providing potential for a better understanding of malignancies and improved tailored care. Advances in Artificial Intelligence (AI) processing power and algorithmic development position Machine Learning (ML) and Deep Learning (DL) as crucial players in predicting leukemia, a blood cancer, using integrated multi-omics technology. However, realizing these goals demands novel approaches to harness this data deluge. This study introduces a novel leukemia diagnosis approach that analyzes multi-omics data using ML and DL algorithms and compares their accuracy. ML techniques, including Random Forest (RF), Naive Bayes (NB), Decision Tree (DT), Logistic Regression (LR), and Gradient Boosting (GB), and DL methods such as Recurrent Neural Networks (RNN) and Feedforward Neural Networks (FNN) are compared. GB achieved 97% accuracy among the ML techniques, while RNN performed best among the DL methods with 98% accuracy. This approach filters unclassified data effectively, demonstrating the significance of DL for leukemia prediction. The testing validation was based on 17 different features such as patient age, sex, mutation type, treatment methods, chromosomes, and others. Our study compares ML and DL techniques and chooses the technique that gives the best results. The study emphasizes the implications of high-throughput technology in healthcare, offering improved patient care.
Affiliation(s)
- Erum Yousef Abbasi
  - State Key Laboratory of Wireless Network Positioning and Communication Engineering Integration Research, School of Electronics Engineering, Beijing University of Posts and Telecommunications, Beijing, China
- Zhongliang Deng
  - State Key Laboratory of Wireless Network Positioning and Communication Engineering Integration Research, School of Electronics Engineering, Beijing University of Posts and Telecommunications, Beijing, China
- Qasim Ali
  - Department of Software Engineering, Mehran University of Engineering and Technology, Jamshoro, Pakistan
- Adil Khan
  - State Key Laboratory of Wireless Network Positioning and Communication Engineering Integration Research, School of Electronics Engineering, Beijing University of Posts and Telecommunications, Beijing, China
- Asadullah Shaikh
  - Department of Information Systems, College of Computer Science and Information Systems, Najran University, Najran, 61441, Saudi Arabia
- Mana Saleh Al Reshan
  - Department of Information Systems, College of Computer Science and Information Systems, Najran University, Najran, 61441, Saudi Arabia
  - Scientific and Engineering Research Centre, Najran University, Najran, 61441, Saudi Arabia
- Adel Sulaiman
  - Department of Computer Science, College of Computer Science and Information Systems, Najran University, Najran, 61441, Saudi Arabia
- Hani Alshahrani
  - Department of Computer Science, College of Computer Science and Information Systems, Najran University, Najran, 61441, Saudi Arabia
5
Yang G, Li W, Xie W, Wang L, Yu K. An improved binary particle swarm optimization algorithm for clinical cancer biomarker identification in microarray data. Comput Methods Programs Biomed 2024; 244:107987. [PMID: 38157825 DOI: 10.1016/j.cmpb.2023.107987] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 11/04/2023] [Accepted: 12/16/2023] [Indexed: 01/03/2024]
Abstract
BACKGROUND AND OBJECTIVE The limited number of samples and high-dimensional features in microarray data make selecting a small number of features for disease diagnosis a challenging problem. Traditional feature selection methods based on evolutionary algorithms struggle to find the optimal set of features in a limited time when dealing with high-dimensional feature selection. New solutions are proposed to solve these problems. METHODS In this paper, we propose a hybrid feature selection method (C-IFBPFE) for biomarker identification in microarray data, which combines clustering and improved binary particle swarm optimization while incorporating an embedded feature elimination strategy. Firstly, an adaptive redundant feature judgment method based on correlation clustering is proposed for feature screening to reduce the search space in the subsequent stage. Secondly, we propose an improved flipping probability-based binary particle swarm optimization (IFBPSO), which is better suited to the binary particle swarm optimization problem. Finally, we design a new feature elimination (FE) strategy embedded in the binary particle swarm optimization algorithm. This strategy gradually removes poorer features during iterations to reduce the number of features and improve accuracy. RESULTS We compared C-IFBPFE with other published hybrid feature selection methods on eight public datasets and analyzed the impact of each improvement. The proposed method outperforms other current state-of-the-art feature selection methods in terms of accuracy, number of features, sensitivity, and specificity. The ablation study validates the efficacy of each component; in particular, the proposed feature elimination strategy significantly improves the performance of the algorithm. CONCLUSIONS The hybrid feature selection method proposed in this paper helps address the issue of high-dimensional microarray data with few samples. It can select a small subset of features and achieve high classification accuracy on microarray datasets. Additionally, independent validation of the selected features shows that those chosen by C-IFBPFE have strong correlations with disease phenotypes and can identify important biomarkers from data related to biomedical problems.
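For orientation, the following is a minimal sketch of plain binary particle swarm optimization with a sigmoid transfer function, the baseline that methods such as IFBPSO modify. It is not the C-IFBPFE algorithm: the clustering step, the improved flipping probability and the feature elimination strategy are not reproduced here. The function names, the parameter values and the toy fitness function (which rewards three hypothetical "informative" feature indices and penalizes subset size) are all assumptions made for illustration.

```python
import math
import random

# Hypothetical ground truth for the toy problem: indices of useful features.
INFORMATIVE = (0, 3, 7)

def toy_fitness(mask):
    """Toy objective: reward informative features, penalize subset size."""
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.1 * sum(mask)

def binary_pso(fitness, n_features, n_particles=20, n_iters=50,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Plain binary PSO (sigmoid transfer); maximizes `fitness`."""
    rng = random.Random(seed)
    # Each particle is a 0/1 mask over the features.
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                v = max(-6.0, min(6.0, v))  # clamp velocity
                vel[i][d] = v
                # The sigmoid of the velocity is the probability of a 1 bit.
                prob_one = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob_one else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit

best_mask, best_fit = binary_pso(toy_fitness, n_features=10)
print(best_mask, best_fit)
```

In a real microarray setting the fitness would typically be a cross-validated classifier score on the selected genes, possibly combined with a penalty on the number of genes, rather than this toy objective.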
Affiliation(s)
- Guicheng Yang
  - College of Computer Science and Engineering, Northeastern University, Shenyang, 110000, Liaoning, China
- Wei Li
  - Key Laboratory of Intelligent Computing in Medical Image (MIIC), Northeastern University, Ministry of Education, Shenyang, 110000, Liaoning, China; National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Shenyang, 110819, Liaoning, China
- Weidong Xie
  - College of Computer Science and Engineering, Northeastern University, Shenyang, 110000, Liaoning, China
- Linjie Wang
  - College of Computer Science and Engineering, Northeastern University, Shenyang, 110000, Liaoning, China
- Kun Yu
  - College of Medicine and Bioinformation Engineering, Northeastern University, Shenyang, 110819, Liaoning, China
6
Zhou K, Yin Z, Gu J, Zeng Z. A Feature Selection Method Based on Graph Theory for Cancer Classification. Comb Chem High Throughput Screen 2024; 27:650-660. [PMID: 37056061 DOI: 10.2174/1386207326666230413085646] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Revised: 02/02/2023] [Accepted: 02/24/2023] [Indexed: 04/15/2023]
Abstract
OBJECTIVE Gene expression profile data are a good source for studying tumors, but they are high-dimensional and redundant. Therefore, gene selection is a very important step in microarray data classification. METHODS In this paper, a feature selection method based on the maximum mutual information coefficient and graph theory is proposed. Each feature of the gene expression data is treated as a vertex of a graph, and the maximum mutual information coefficient between genes is used to measure the relationship between vertices and construct an undirected graph. The core and coritivity theory is then used to determine the feature subset of the gene data. RESULTS In this work, we used three different classification models and three evaluation metrics (accuracy, F1-score, and AUC) to evaluate classification performance, to avoid reliance on any one classifier or evaluation metric. The experimental results on six different types of genetic data show that our proposed algorithm has high accuracy and robustness compared to other advanced feature selection methods. CONCLUSION This method considers the importance and correlation of features at the same time and addresses the problem of gene selection in microarray data classification.
Affiliation(s)
- Kai Zhou
  - School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai, 201620, China
- Zhixiang Yin
  - School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai, 201620, China
- Jiaying Gu
  - School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai, 201620, China
- Zhiliang Zeng
  - School of Mathematics, Physics and Statistics, Shanghai University of Engineering Science, Shanghai, 201620, China
7
Mohamed TIA, Ezugwu AE, Fonou-Dombeu JV, Mohammed M, Greeff J, Elbashir MK. A novel feature selection algorithm for identifying hub genes in lung cancer. Sci Rep 2023; 13:21671. [PMID: 38066059 PMCID: PMC10709567 DOI: 10.1038/s41598-023-48953-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Accepted: 12/01/2023] [Indexed: 12/18/2023] [Open Access]
Abstract
Lung cancer, a life-threatening disease primarily affecting lung tissue, remains a significant contributor to mortality in both developed and developing nations. Accurate biomarker identification is imperative for effective cancer diagnosis and therapeutic strategies. This study introduces the Voting-Based Enhanced Binary Ebola Optimization Search Algorithm (VBEOSA), an innovative ensemble-based approach combining binary optimization and the Ebola optimization search algorithm. VBEOSA harnesses the collective power of state-of-the-art classification models through soft voting. Moreover, our research applies VBEOSA to an extensive lung cancer gene expression dataset obtained from TCGA, following essential preprocessing steps including outlier detection and removal, data normalization, and filtration. VBEOSA aids in feature selection, leading to the discovery of key hub genes closely associated with lung cancer, validated through comprehensive protein-protein interaction analysis. Notably, our investigation reveals ten significant hub genes (ADRB2, ACTB, ARRB2, GNGT2, ADRB1, ACTG1, ACACA, ATP5A1, ADCY9, and ADRA1B), each demonstrating substantial involvement in the domain of lung cancer. Furthermore, our pathway analysis sheds light on the prominence of strategic pathways such as salivary secretion and the calcium signaling pathway, providing invaluable insights into the intricate molecular mechanisms underpinning lung cancer. We also utilize the weighted gene co-expression network analysis (WGCNA) method to identify gene modules exhibiting strong correlations with clinical attributes associated with lung cancer. Our findings underscore the efficacy of VBEOSA in feature selection and offer profound insights into the multifaceted molecular landscape of lung cancer. Finally, we are confident that this research has the potential to improve diagnostic capabilities and further enrich our understanding of the disease, thus setting the stage for future advancements in the clinical management of lung cancer. The VBEOSA source code is publicly available at https://github.com/TEHNAN/VBEOSA-A-Novel-Feature-Selection-Algorithm-for-Identifying-hub-Genes-in-Lung-Cancer.
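The soft voting mentioned in this abstract can be illustrated in a few lines. The sketch below is a generic version of the technique, not code from the VBEOSA repository: each model reports class probabilities for a sample, the probabilities are averaged (optionally with weights), and the class with the highest average wins. The function name `soft_vote` and the example probabilities are assumptions made for illustration.

```python
def soft_vote(prob_rows, weights=None):
    """Average per-model class probabilities for one sample.

    prob_rows: list of probability lists, one per model, same class order.
    weights:   optional per-model weights (defaults to equal weights).
    Returns (winning class index, averaged probabilities).
    """
    n_models = len(prob_rows)
    if weights is None:
        weights = [1.0] * n_models
    total = sum(weights)
    n_classes = len(prob_rows[0])
    avg = [
        sum(w * row[c] for w, row in zip(weights, prob_rows)) / total
        for c in range(n_classes)
    ]
    best = max(range(n_classes), key=lambda c: avg[c])
    return best, avg

# Hypothetical outputs of three classifiers for one sample (classes 0 and 1).
model_probs = [
    [0.90, 0.10],  # model A is confident in class 0
    [0.40, 0.60],  # model B leans towards class 1
    [0.45, 0.55],  # model C leans towards class 1
]
label, averaged = soft_vote(model_probs)
print(label, averaged)
```

In this example two of the three models favor class 1, so a majority (hard) vote would pick class 1, but the averaged probabilities favor class 0 because model A is much more confident. That sensitivity to confidence is the usual reason for preferring soft voting when the base models produce reasonably calibrated probabilities.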
Affiliation(s)
- Tehnan I A Mohamed
  - School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, South Africa
  - Department of Computer Science, Faculty of Mathematical and Computer Sciences, University of Gezira, Wad Madani, 11123, Sudan
- Absalom E Ezugwu
  - Unit for Data Science and Computing, North-West University, Potchefstroom, South Africa
- Jean Vincent Fonou-Dombeu
  - School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, South Africa
- Mohanad Mohammed
  - School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, South Africa
- Japie Greeff
  - School of Computer Science and Information Systems, Faculty of Natural and Agricultural Sciences, North-West University, Vanderbijlpark, South Africa
- Murtada K Elbashir
  - Department of Information Systems, College of Computer and Information Sciences, Jouf University, 72388, Sakaka, Saudi Arabia
8
Huang HH, Lu CJ, Jhou MJ, Liu TC, Yang CT, Hsieh SJ, Yang WJ, Chang HC, Chen MS. Using a Decision Tree Algorithm Predictive Model for Sperm Count Assessment and Risk Factors in Health Screening Population. Risk Manag Healthc Policy 2023; 16:2469-2478. [PMID: 38024496 PMCID: PMC10658962 DOI: 10.2147/rmhp.s433193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Accepted: 11/01/2023] [Indexed: 12/01/2023] [Open Access]
Abstract
Purpose Approximately 20% of couples face infertility challenges and struggle to conceive naturally. Despite advances in artificial reproduction, its success hinges on sperm quality. Our previous study used five machine learning (ML) algorithms, random forest, stochastic gradient boosting, least absolute shrinkage and selection operator regression, ridge regression, and extreme gradient boosting, to model health data from 1375 Taiwanese males and identified ten risk factors affecting sperm count. Methods We employed the CART algorithm to generate decision trees using identified risk factors to predict healthy sperm counts. Four error metrics, SMAPE, RAE, RRSE, and RMSE, were used to evaluate the decision trees. We identified the top five decision trees based on their low errors and discussed in detail the tree with the least error. Results The decision tree featuring the least error, comprising BMI, UA, ST, T-Cho/HDL-C ratio, and BUN, corroborated the negative impacts of metabolic syndrome, particularly high BMI, on sperm count, while emphasizing the link between good sleep and male fertility. Our study also sheds light on the potentially significant influence of high BUN on spermatogenesis. Two novel risk factors, T-Cho/HDL-C and UA, warrant further investigation. Conclusion The ML algorithm established a predictive model for healthcare personnel to assess low sperm counts. Refinement of the model using additional data is crucial for improved precision. The risk factors identified offer avenues for future investigations.
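The four error metrics named in this abstract have simple closed forms. The sketch below uses common textbook definitions; the abstract does not state which variants the authors used (SMAPE in particular has several), so the exact formulas here are an assumption. RAE and RRSE compare the model against a baseline that always predicts the mean of the observed values.

```python
import math

def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in percent (0 to 200)."""
    terms = []
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        # Treat a 0/0 term as zero error.
        terms.append(0.0 if denom == 0 else abs(a - p) / denom)
    return 100 * sum(terms) / len(terms)

def rmse(actual, predicted):
    """Root mean squared error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def rae(actual, predicted):
    """Relative absolute error: total |error| relative to a mean predictor."""
    mean_a = sum(actual) / len(actual)
    num = sum(abs(a - p) for a, p in zip(actual, predicted))
    den = sum(abs(a - mean_a) for a in actual)
    return num / den

def rrse(actual, predicted):
    """Root relative squared error relative to a mean predictor."""
    mean_a = sum(actual) / len(actual)
    num = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    den = sum((a - mean_a) ** 2 for a in actual)
    return math.sqrt(num / den)

# Hypothetical observed and predicted values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
forecast = [1.1, 1.9, 3.2, 3.7]
for metric in (smape, rmse, rae, rrse):
    print(metric.__name__, metric(observed, forecast))
```

With these definitions, a model that always predicts the mean of the observed values scores exactly 1 on both RAE and RRSE, so values below 1 indicate an improvement over that baseline.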
Affiliation(s)
- Hung-Hsiang Huang
  - Department of Urology, Surgery, Far Eastern Memorial Hospital, New Taipei City, 220, Taiwan
- Chi-Jie Lu
  - Artificial Intelligence Development Center, Fu Jen Catholic University, New Taipei City, 242, Taiwan
  - Department of Information Management, Fu Jen Catholic University, New Taipei City, 242, Taiwan
  - Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City, 242, Taiwan
- Mao-Jhen Jhou
  - Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City, 242, Taiwan
- Tzu-Chi Liu
  - Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City, 242, Taiwan
- Chih-Te Yang
  - Department of Business Administration, Tamkang University, New Taipei City, 251, Taiwan
- Shang-Ju Hsieh
  - Department of Urology, Surgery, Far Eastern Memorial Hospital, New Taipei City, 220, Taiwan
- Wen-Jen Yang
  - Health Screening Center, Chi Hsin Clinic, Taipei City, 104, Taiwan
- Hsiao-Chun Chang
  - Department of Urology, Surgery, Far Eastern Memorial Hospital, New Taipei City, 220, Taiwan
- Ming-Shu Chen
  - Department of Healthcare Administration, Asia Eastern University of Science and Technology, New Taipei City, 220, Taiwan
9
Lv G, Xia Y, Qi Z, Zhao Z, Tang L, Chen C, Yang S, Wang Q, Gu L. LncRNA-protein interaction prediction with reweighted feature selection. BMC Bioinformatics 2023; 24:410. [PMID: 37904080 PMCID: PMC10617115 DOI: 10.1186/s12859-023-05536-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2023] [Accepted: 10/16/2023] [Indexed: 11/01/2023] [Open Access]
Abstract
LncRNA-protein interactions are ubiquitous in organisms and play a crucial role in a variety of biological processes and complex diseases. Experimental techniques for detecting lncRNA-protein interactions are laborious and time-consuming, and many computational methods have therefore been reported for lncRNA-protein interaction prediction. To improve such prediction, this paper proposes a reweighting boosting feature selection (RBFS) method to select key features. Specifically, a reweighted approach adjusts the contribution of each observed sample to the fitting of the learning model, so that samples with higher weights have more influence than those with lower weights. Feature selection with boosting can efficiently rank and iterate over important features to obtain the optimal feature subset. In the experiments, the RBFS method is applied to the prediction of lncRNA-protein interactions. The experimental results demonstrate that our method achieves higher accuracy and less redundancy with fewer features.
Affiliation(s)
- Guohao Lv
  - School of Information and Computer, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Yingchun Xia
  - School of Information and Computer, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Zhao Qi
  - School of Information and Computer, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Zihao Zhao
  - School of Information and Computer, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Lianggui Tang
  - School of Information and Computer, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Cheng Chen
  - School of Information and Computer, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Shuai Yang
  - School of Information and Computer, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Qingyong Wang
  - School of Information and Computer, Anhui Agricultural University, Hefei, 230036, Anhui, China
- Lichuan Gu
  - School of Information and Computer, Anhui Agricultural University, Hefei, 230036, Anhui, China
10
Morabito F, Adornetto C, Monti P, Amaro A, Reggiani F, Colombo M, Rodriguez-Aldana Y, Tripepi G, D’Arrigo G, Vener C, Torricelli F, Rossi T, Neri A, Ferrarini M, Cutrona G, Gentile M, Greco G. Genes selection using deep learning and explainable artificial intelligence for chronic lymphocytic leukemia predicting the need and time to therapy. Front Oncol 2023; 13:1198992. [PMID: 37719021 PMCID: PMC10501728 DOI: 10.3389/fonc.2023.1198992] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/02/2023] [Accepted: 07/31/2023] [Indexed: 09/19/2023] [Open Access]
Abstract
Analyzing gene expression profiles (GEP) through artificial intelligence provides meaningful insight into cancer disease. This study introduces DeepSHAP Autoencoder Filter for Genes Selection (DSAF-GS), a novel deep learning and explainable artificial intelligence-based approach for feature selection in genomics-scale data. DSAF-GS exploits the autoencoder's reconstruction capabilities without changing the original feature space, enhancing the interpretation of the results. Explainable artificial intelligence is then used to select the informative genes for chronic lymphocytic leukemia prognosis of 217 cases from a GEP database comprising roughly 20,000 genes. The model for prognosis prediction achieved an accuracy of 86.4%, a sensitivity of 85.0%, and a specificity of 87.5%. According to the proposed approach, predictions were strongly influenced by CEACAM19 and PIGP, moderately influenced by MKL1 and GNE, and poorly influenced by other genes. The 10 most influential genes were selected for further analysis. Among them, FADD, FIBP, FIBP, GNE, IGF1R, MKL1, PIGP, and SLC39A6 were identified in the Reactome pathway database as involved in signal transduction, transcription, protein metabolism, immune system, cell cycle, and apoptosis. Moreover, according to the network model of the 3D protein-protein interaction (PPI) explored using the NetworkAnalyst tool, FADD, FIBP, IGF1R, QTRT1, GNE, SLC39A6, and MKL1 appear coupled into a complex network. Finally, all 10 selected genes showed predictive power for time to first treatment (TTFT) in univariate analyses on a basic prognostic model including IGHV mutational status, del(11q) and del(17p), NOTCH1 mutations, β2-microglobulin, Rai stage, and B-lymphocytosis, which are known to predict TTFT in CLL. However, only the IGF1R (hazard ratio [HR] 1.41, 95% CI 1.08-1.84, P=0.013), COL28A1 (HR 0.32, 95% CI 0.10-0.97, P=0.045), and QTRT1 (HR 7.73, 95% CI 2.48-24.04, P<0.001) genes were significantly associated with TTFT in multivariable analyses when combined with the prognostic factors of the basic model, ultimately increasing Harrell's c-index and the explained variation to 78.6% (versus 76.5% for the basic prognostic model) and 52.6% (versus 42.2% for the basic prognostic model), respectively. The goodness of model fit was also enhanced (χ2 = 20.1, P=0.002), indicating improved performance over the basic prognostic model. In conclusion, DSAF-GS identified a group of significant genes for CLL prognosis, suggesting future directions for bio-molecular research.
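Harrell's c-index, reported above as 78.6% versus 76.5%, measures how often a model ranks pairs of patients in the right order. The sketch below implements the standard pairwise definition for right-censored data. It is not the authors' code, it ignores pairs with tied event times for simplicity, and the example times, event flags and risk scores are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times:       observed time for each subject (event or censoring time).
    events:      1 if the event was observed, 0 if the subject was censored.
    risk_scores: model output; higher means the event is expected sooner.
    A pair (i, j) is comparable when subject i had an observed event
    strictly before subject j's observed time. Tied risk scores count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical data: time to first treatment in months, event flag, risk.
ttft = [2, 4, 6, 8]
treated = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.8, 0.1]
print(harrell_c_index(ttft, treated, risk))
```

In this toy example 4 of the 5 comparable pairs are ordered correctly. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported gain from 76.5% to 78.6% in context.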
Affiliation(s)
- Carlo Adornetto
- Department of Mathematics and Computer Science, University of Calabria, Cosenza, Italy
- Paola Monti
- Mutagenesis and Cancer Prevention Unit, Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) Ospedale Policlinico San Martino, Genoa, Italy
- Adriana Amaro
- Tumor Epigenetics Unit, Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) Ospedale Policlinico San Martino, Genoa, Italy
- Francesco Reggiani
- Tumor Epigenetics Unit, Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) Ospedale Policlinico San Martino, Genoa, Italy
- Monica Colombo
- Molecular Pathology Unit, Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) Ospedale Policlinico San Martino, Genoa, Italy
- Giovanni Tripepi
- Consiglio Nazionale delle Ricerche, Istituto di Fisiologia Clinica del Consiglio Nazionale delle Ricerche (CNR), Reggio Calabria, Italy
- Graziella D’Arrigo
- Consiglio Nazionale delle Ricerche, Istituto di Fisiologia Clinica del Consiglio Nazionale delle Ricerche (CNR), Reggio Calabria, Italy
- Claudia Vener
- Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy
- Federica Torricelli
- Laboratory of Translational Research, Azienda Unità Sanitaria Locale - Istituto di Ricovero e Cura a Carattere Scientifico (USL-IRCCS) of Reggio Emilia, Reggio Emilia, Italy
- Teresa Rossi
- Laboratory of Translational Research, Azienda Unità Sanitaria Locale - Istituto di Ricovero e Cura a Carattere Scientifico (USL-IRCCS) of Reggio Emilia, Reggio Emilia, Italy
- Antonino Neri
- Scientific Directorate, Azienda Unità Sanitaria Locale - Istituto di Ricovero e Cura a Carattere Scientifico (USL-IRCCS) of Reggio Emilia, Reggio Emilia, Italy
- Manlio Ferrarini
- Unità Operativa (UO) Molecular Pathology, Ospedale Policlinico San Martino Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS), Genoa, Italy
- Giovanna Cutrona
- Molecular Pathology Unit, Istituto di Ricovero e Cura a Carattere Scientifico (IRCCS) Ospedale Policlinico San Martino, Genoa, Italy
- Massimo Gentile
- Hematology Unit, Department of Onco-Hematology, Azienda Ospedaliera (A.O.) of Cosenza, Cosenza, Italy
- Department of Pharmacy and Health and Nutritional Sciences, University of Calabria, Cosenza, Italy
- Gianluigi Greco
- Department of Mathematics and Computer Science, University of Calabria, Cosenza, Italy
|
11
|
Kabzinski J, Kucharska-Lusina A, Majsterek I. RNA-Based Liquid Biopsy in Head and Neck Cancer. Cells 2023; 12:1916. [PMID: 37508579 PMCID: PMC10377854 DOI: 10.3390/cells12141916] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Revised: 07/17/2023] [Accepted: 07/21/2023] [Indexed: 07/30/2023] Open
Abstract
Head and neck cancer (HNC) is a prevalent and diverse group of malignancies with substantial morbidity and mortality rates. Early detection and monitoring of HNC are crucial for improving patient outcomes. Liquid biopsy, a non-invasive diagnostic approach, has emerged as a promising tool for cancer detection and monitoring. In this article, we review the application of RNA-based liquid biopsy in HNC. Various types of RNA, including messenger RNA (mRNA), microRNA (miRNA), long non-coding RNA (lncRNA), small nuclear RNA (snRNA), small nucleolar RNA (snoRNA), circular RNA (circRNA) and PIWI-interacting RNA (piRNA), are explored as potential biomarkers in HNC liquid-based diagnostics. The roles of RNAs in HNC diagnosis, metastasis, tumor resistance to radio and chemotherapy, and overall prognosis are discussed. RNA-based liquid biopsy holds great promise for the early detection, prognosis, and personalized treatment of HNC. Further research and validation are necessary to translate these findings into clinical practice and improve patient outcomes.
Affiliation(s)
- Jacek Kabzinski
- Department of Clinical Chemistry and Biochemistry, Medical University of Lodz, MolecoLAB A6, Mazowiecka 5, 92-215 Lodz, Poland
- Aleksandra Kucharska-Lusina
- Department of Clinical Chemistry and Biochemistry, Medical University of Lodz, MolecoLAB A6, Mazowiecka 5, 92-215 Lodz, Poland
- Ireneusz Majsterek
- Department of Clinical Chemistry and Biochemistry, Medical University of Lodz, MolecoLAB A6, Mazowiecka 5, 92-215 Lodz, Poland
|
12
|
Fu Q, Li Q, Li X. An improved multi-objective marine predator algorithm for gene selection in classification of cancer microarray data. Comput Biol Med 2023; 160:107020. [PMID: 37196457 DOI: 10.1016/j.compbiomed.2023.107020] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/15/2023] [Revised: 04/09/2023] [Accepted: 05/05/2023] [Indexed: 05/19/2023]
Abstract
Gene selection (GS) is an important branch of feature selection that is widely used in cancer classification. It provides essential insights into the pathogenesis of cancer and enables a deeper understanding of cancer data. In cancer classification, GS is essentially a multi-objective optimization problem, which aims to simultaneously optimize two objectives: classification accuracy and the size of the gene subset. The marine predator algorithm (MPA) has been successfully employed in practical applications; however, its random initialization can lead to blindness, which may adversely affect the convergence of the algorithm. Furthermore, the elite individuals that guide evolution are randomly chosen from the Pareto solutions, which may degrade the good exploration performance of the population. To overcome these limitations, a multi-objective improved MPA with continuous mapping initialization and leader selection strategies is proposed. In this work, a new continuous mapping initialization with ReliefF overcomes the defect of having less information in late evolution. Moreover, an improved elite selection mechanism with Gaussian distribution guides the population to evolve towards a better Pareto front. Finally, an efficient mutation method is adopted to prevent evolutionary stagnation. To evaluate its effectiveness, the proposed algorithm was compared with nine well-known algorithms. The experimental results on 16 datasets demonstrate that the proposed algorithm can significantly reduce the data dimension and obtain the highest classification accuracy on most of the high-dimensional cancer microarray datasets.
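The abstract above frames gene selection as a two-objective problem and refers to Pareto solutions. The sketch below illustrates only that concept, non-dominated filtering of candidate gene subsets, and not the improved MPA itself. The candidate tuples `(classification_error, subset_size)` are hypothetical values, and both objectives are treated as quantities to minimize.

```python
def dominates(a, b):
    """Return True if solution `a` Pareto-dominates `b` (all objectives minimized).

    `a` dominates `b` when it is no worse on every objective and strictly
    better on at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(solutions):
    """Keep only the solutions that no other solution dominates."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


# Hypothetical candidates: (classification error, number of selected genes).
candidates = [
    (0.10, 50),
    (0.08, 120),
    (0.10, 30),
    (0.15, 10),
    (0.08, 200),
    (0.20, 10),
]
front = pareto_front(candidates)
print(front)
```

Each point left on the front is a different trade-off between accuracy and subset size; a multi-objective optimizer of the kind described returns such a set instead of a single best subset.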
Affiliation(s)
- Qiyong Fu
- School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China
- Qi Li
- School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China
- Xiaobo Li
- School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China
|
13
|
Vahabzadeh V, Moattar MH. Robust microarray data feature selection using a correntropy based distance metric learning approach. Comput Biol Med 2023; 161:107056. [PMID: 37235945 DOI: 10.1016/j.compbiomed.2023.107056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Revised: 04/18/2023] [Accepted: 05/20/2023] [Indexed: 05/28/2023]
Abstract
Classification of high-dimensional microarray data is a challenge in bioinformatics and genetic data processing. One of the challenging issues of feature selection is the presence of outliers. The Euclidean distance metric is sensitive to outliers. In this study, a distance metric learning based feature selection approach that uses the correntropy function as the discrimination metric is proposed. For this purpose, the metric learning problem is formulated as an optimization problem and solved using the Lagrange method. The output of the approach signifies the most important and robust features. After feature selection, different classification methods such as SVM, decision trees, and NN classifiers are used to investigate the classification accuracy of the proposed method as well as precision, recall, and F-measure. Experiments are carried out on 13 high-dimensional datasets and show that the proposed method outperforms the previous models in terms of accuracy and robustness.
Affiliation(s)
- Venus Vahabzadeh
- Department of Software Engineering, Mashhad Branch, Islamic Azad University, Mashhad, Iran
|
14
|
Wang Z, Zhou Y, Takagi T, Song J, Tian YS, Shibuya T. Genetic algorithm-based feature selection with manifold learning for cancer classification using microarray data. BMC Bioinformatics 2023; 24:139. [PMID: 37031189 PMCID: PMC10082986 DOI: 10.1186/s12859-023-05267-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2022] [Accepted: 04/02/2023] [Indexed: 04/10/2023] Open
Abstract
BACKGROUND Microarray data have been widely utilized for cancer classification. The main characteristic of microarray data is "large p and small n": the data contain a small number of subjects but a large number of genes, which may affect the validity of the classification. Thus, there is a pressing demand for techniques able to select genes relevant to cancer classification. RESULTS This study proposed a novel feature (gene) selection method, Iso-GA, for cancer classification. Iso-GA incorporates the manifold learning algorithm Isomap into the genetic algorithm (GA) to account for the latent nonlinear structure of the gene expression in the microarray data. The Davies-Bouldin index is adopted to evaluate the candidate solutions in Isomap and to avoid the classifier dependency problem. Additionally, a probability-based framework is introduced to reduce the possibility of genes being randomly selected by GA. The performance of Iso-GA was evaluated on eight benchmark microarray datasets of cancers. Iso-GA outperformed other benchmarking gene selection methods, leading to good classification accuracy with fewer critical genes selected. CONCLUSIONS The proposed Iso-GA method can effectively select fewer but critical genes from microarray data to achieve competitive classification performance.
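The abstract above uses the Davies-Bouldin index to score candidate solutions without training a classifier. The following is a minimal sketch of the standard index definition, not the Iso-GA implementation: it assumes samples have already been embedded as coordinate tuples and carry class labels, and the function name and toy points are illustrative. It requires Python 3.8 or later for `math.dist`.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index: mean over clusters of the worst-case ratio
    (scatter_i + scatter_j) / distance(centroid_i, centroid_j).
    Lower values indicate more compact, better separated clusters.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    # Centroid of each cluster, coordinate by coordinate.
    centroids = {
        label: [sum(coord) / len(members) for coord in zip(*members)]
        for label, members in clusters.items()
    }
    # Scatter: mean distance of a cluster's members to its centroid.
    scatter = {
        label: sum(math.dist(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }

    total = 0.0
    for i in clusters:
        # Assumes distinct centroids; coincident centroids would divide by zero.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Hypothetical 2-D embedding of four samples under two labelings.
embedded = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
well_separated = davies_bouldin(embedded, [0, 0, 1, 1])
poorly_separated = davies_bouldin(embedded, [0, 1, 0, 1])
print(well_separated, poorly_separated)
```

Because the score depends only on geometry and labels, a gene subset can be ranked this way without committing to a particular classifier, which is the classifier-independence the abstract mentions.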
Affiliation(s)
- Zixuan Wang
- Division of Medical Data Informatics, Human Genome Center, Institute of Medical Science, The University of Tokyo, Tokyo, 108-8639, Japan
- Yi Zhou
- Beijing International Center for Mathematical Research, Peking University, Beijing, 100871, China
- Tatsuya Takagi
- Graduate School of Pharmaceutical Sciences, Osaka University, 1-6 Yamadaoka, Suita, Osaka, 565-0871, Japan
- Jiangning Song
- Biomedicine Discovery Institute and Monash Data Futures Institute, Monash University, Melbourne, VIC, 3800, Australia
- Yu-Shi Tian
- Graduate School of Pharmaceutical Sciences, Osaka University, 1-6 Yamadaoka, Suita, Osaka, 565-0871, Japan
- Tetsuo Shibuya
- Division of Medical Data Informatics, Human Genome Center, Institute of Medical Science, The University of Tokyo, Tokyo, 108-8639, Japan
|
15
|
Alhenawi E, Al-Sayyed R, Hudaib A, Mirjalili S. Improved intelligent water drop-based hybrid feature selection method for microarray data processing. Comput Biol Chem 2023; 103:107809. [PMID: 36696844 DOI: 10.1016/j.compbiolchem.2022.107809] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/10/2021] [Revised: 12/13/2022] [Accepted: 12/30/2022] [Indexed: 01/15/2023]
Abstract
Classifying microarray datasets, which usually contain many noise genes that degrade classifier performance and decrease classification accuracy, is a competitive research topic. Feature selection (FS) is one of the most practical ways of finding the optimal subset of genes that increases classification accuracy for diagnostic and prognostic prediction of tumors from microarray datasets. This means that more efficient FS methods are always needed, methods that select only an optimal or close-to-optimal subset of features to improve classification performance. In this paper, we propose a hybrid FS method for microarray data processing that combines an ensemble filter with an Improved Intelligent Water Drop (IIWD) algorithm as a wrapper by adding one of three local search (LS) algorithms, Tabu search (TS), Novel LS algorithm (NLSA), or Hill Climbing (HC), in each iteration of IWD, and by using a correlation coefficient filter as a heuristic undesirability (HUD) for next node selection in the original IWD algorithm. The effects of adding the three LS algorithms to the proposed IIWD algorithm were evaluated by comparing the performance of the proposed ensemble filter-IIWD-based wrapper without any LS algorithm (the PHFS-IWD FS method) against its performance when a specific LS algorithm (TS, NLSA, or HC) is added (the PHFS-IWDTS, PHFS-IWDNLSA, and PHFS-IWDHC FS methods, respectively). A Naïve Bayes (NB) classifier with five microarray datasets was used to evaluate and compare the proposed hybrid FS methods. Results show that using LS algorithms in each iteration of the IWD algorithm improves the F-score by an average of 5% compared with PHFS-IWD. Also, PHFS-IWDNLSA improves the F-score by an average of 4.15% over PHFS-IWDTS and 5.67% over PHFS-IWDHC, while PHFS-IWDTS outperformed PHFS-IWDHC by an average of 1.6%. On the other hand, the proposed hybrid FS methods improve accuracy by an average of 8.92% in three out of five datasets and decrease the number of genes by 58.5% in all five datasets compared with six of the most recent state-of-the-art FS methods.
Affiliation(s)
- Esra'a Alhenawi
- Software Engineering Department, Al-Ahliyya Amman University, Amman, Jordan; King Abdullah II School for Information Technology, The University of Jordan, Amman, Jordan
- Rizik Al-Sayyed
- King Abdullah II School for Information Technology, The University of Jordan, Amman, Jordan
- Amjad Hudaib
- King Abdullah II School for Information Technology, The University of Jordan, Amman, Jordan
- Seyedali Mirjalili
- Center for Artificial Intelligence Research and Optimization, Torrens University Australia, Fortitude Valley, Brisbane, 4006 QLD, Australia; University Research and Innovation Center, Obuda University, Budapest, Hungary
|
16
|
Awotunde JB, Ayo FE, Panigrahi R, Garg A, Bhoi AK, Barsocchi P. A Multi-level Random Forest Model-Based Intrusion Detection Using Fuzzy Inference System for Internet of Things Networks. INT J COMPUT INT SYS 2023. [DOI: 10.1007/s44196-023-00205-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/14/2023] Open
Abstract
Intrusion detection (ID) methods are security frameworks designed to safeguard network information systems. The strength of an intrusion detection method is dependent on the robustness of the feature selection method. This study developed a multi-level random forest algorithm for intrusion detection using a fuzzy inference system. The strengths of the filter and wrapper approaches are combined in this work to create a more advanced multi-level feature selection technique, which strengthens network security. The first stage of the multi-level feature selection is the filter method using a correlation-based feature selection to select essential features based on the multi-collinearity in the data. The correlation-based feature selection used a genetic search method to choose the best features from the feature set. The genetic search algorithm assesses the merits of each attribute and then delivers the characteristics with the highest fitness values for selection. A rule assessment has also been used to determine whether two feature subsets have the same fitness value, in which case the feature subset with the fewest features is returned. The second stage is a wrapper method based on the sequential forward selection method to further select top features based on the accuracy of the baseline classifier. The selected top features serve as input into the random forest algorithm for detecting intrusions. Finally, fuzzy logic was used to classify intrusions as either normal, low, medium, or high to reduce misclassification. When the developed intrusion method was compared to other existing models using the same dataset, the results revealed a higher accuracy, precision, sensitivity, specificity, and F1-score of 99.46%, 99.46%, 99.46%, 93.86%, and 99.46%, respectively. The classification of attacks using the fuzzy inference system also indicates that the developed method can correctly classify attacks with reduced misclassification. The use of a multi-level feature selection method to leverage the advantages of filter and wrapper feature selection methods and fuzzy logic for intrusion classification makes this study unique.
|
17
|
Gokhale M, Mohanty SK, Ojha A. GeneViT: Gene Vision Transformer with Improved DeepInsight for cancer classification. Comput Biol Med 2023; 155:106643. [PMID: 36803792 DOI: 10.1016/j.compbiomed.2023.106643] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/14/2022] [Revised: 01/03/2023] [Accepted: 02/05/2023] [Indexed: 02/09/2023]
Abstract
Analysis of gene expression data is crucial for disease prognosis and diagnosis. Gene expression data has high redundancy and noise that brings challenges in extracting disease information. Over the past decade, several conventional machine learning and deep learning models have been developed for classification of diseases using gene expressions. In recent years, vision transformer networks have shown promising performance in many fields due to their powerful attention mechanism that provides a better insight into the data characteristics. However, these network models have not been explored for gene expression analysis. In this paper, a method for classifying cancerous gene expression is presented that uses a Vision transformer. The proposed method first performs dimensionality reduction using a stacked autoencoder followed by an Improved DeepInsight algorithm that converts the data into image format. The data is then fed to the vision transformer for building the classification model. Performance of the proposed classification model is evaluated on ten benchmark datasets having binary classes or multiple classes. Its performance is also compared with nine existing classification models. The experimental results demonstrate that the proposed model outperforms existing methods. The t-SNE plots demonstrate the distinctive feature learning property of the model.
Affiliation(s)
- Madhuri Gokhale
- Department of Computer Science & Engineering, Jabalpur Engineering College, Jabalpur, 482001, India; Computer Science & Engineering, PDPM Indian Institute of Information Technology, Design and Manufacturing, Jabalpur, 482005, India
- Sraban Kumar Mohanty
- Computer Science & Engineering, PDPM Indian Institute of Information Technology, Design and Manufacturing, Jabalpur, 482005, India
- Aparajita Ojha
- Computer Science & Engineering, PDPM Indian Institute of Information Technology, Design and Manufacturing, Jabalpur, 482005, India
|
18
|
Alromema N, Syed AH, Khan T. A Hybrid Machine Learning Approach to Screen Optimal Predictors for the Classification of Primary Breast Tumors from Gene Expression Microarray Data. Diagnostics (Basel) 2023; 13:708. [PMID: 36832196 PMCID: PMC9955903 DOI: 10.3390/diagnostics13040708] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/05/2023] [Revised: 01/30/2023] [Accepted: 02/07/2023] [Indexed: 02/16/2023] Open
Abstract
The high dimensionality and sparsity of the microarray gene expression data make it challenging to analyze and screen the optimal subset of genes as predictors of breast cancer (BC). The authors in the present study propose a novel hybrid Feature Selection (FS) sequential framework involving minimum Redundancy-Maximum Relevance (mRMR), a two-tailed unpaired t-test, and meta-heuristics to screen the optimal set of gene biomarkers as predictors for BC. The proposed framework identified a set of three optimal gene biomarkers, namely MAPK1, APOBEC3B, and ENAH. In addition, state-of-the-art supervised Machine Learning (ML) algorithms, namely Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Neural Net (NN), Naïve Bayes (NB), Decision Tree (DT), eXtreme Gradient Boosting (XGBoost), and Logistic Regression (LR), were used to test the predictive capability of the selected gene biomarkers and to select the most effective breast cancer diagnostic model, the one with the highest values of the performance metrics. Our study found that the XGBoost-based model was the superior performer, with an accuracy of 0.976 ± 0.027, an F1-Score of 0.974 ± 0.030, and an AUC value of 0.961 ± 0.035 when tested on an independent test dataset. The classification system based on the screened gene biomarkers efficiently distinguishes primary breast tumors from normal breast samples.
Affiliation(s)
- Nashwan Alromema
- Department of Computer Science, Faculty of Computing and Information Technology Rabigh (FCITR), King Abdulaziz University, Jeddah 22254, Saudi Arabia
- Asif Hassan Syed
- Department of Computer Science, Faculty of Computing and Information Technology Rabigh (FCITR), King Abdulaziz University, Jeddah 22254, Saudi Arabia
- Tabrej Khan
- Department of Information Systems, Faculty of Computing and Information Technology Rabigh (FCITR), King Abdulaziz University, Jeddah 22254, Saudi Arabia
|
19
|
Hybrid Filter and Genetic Algorithm-Based Feature Selection for Improving Cancer Classification in High-Dimensional Microarray Data. Processes (Basel) 2023. [DOI: 10.3390/pr11020562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/16/2023] Open
Abstract
The advancements in intelligent systems have contributed tremendously to the fields of bioinformatics, health, and medicine. Intelligent classification and prediction techniques have been used to study microarray datasets, which store gene expression information, and can assist greatly in diagnosing chronic diseases such as cancer at an early stage, which is both important and challenging. However, the high dimensionality and noisy nature of microarray data lead to slow performance and low cancer classification accuracy when machine learning techniques are used. In this paper, a hybrid filter-genetic feature selection approach is proposed to address the high dimensionality of microarray datasets and ultimately enhance cancer classification precision. First, filter feature selection methods including information gain, information gain ratio, and Chi-squared are applied to select the most significant features of cancerous microarray datasets. Then, a genetic algorithm is employed to further optimize the selected features in order to improve the proposed method's capability for cancer classification. To test the proficiency of the proposed scheme, four cancerous microarray datasets were used in the study: breast, lung, central nervous system, and brain cancer datasets. The experimental results show that the proposed hybrid filter-genetic feature selection approach improved the performance of several common machine learning methods in terms of Accuracy, Recall, Precision, and F-measure.
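The abstract above names information gain as one of its filter criteria. As an illustration of that first stage only, the sketch below ranks features by information gain. It assumes expression values have already been discretized (here into "high" and "low"), and the gene names and labels are hypothetical; the genetic-algorithm stage is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting samples by a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical discretized expression levels for four samples.
sample_labels = ["tumor", "tumor", "normal", "normal"]
genes = {
    "gene_a": ["high", "high", "low", "low"],   # separates the classes perfectly
    "gene_b": ["high", "low", "high", "low"],   # carries no class information
}

gains = {name: information_gain(values, sample_labels) for name, values in genes.items()}
ranked = sorted(gains, key=gains.get, reverse=True)
print(ranked, gains)
```

In a filter-then-wrapper pipeline of the kind described, only the top-ranked features from a step like this would be passed on to the more expensive search stage.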
|
20
|
Arafa A, El-Fishawy N, Badawy M, Radad M. RN-Autoencoder: Reduced Noise Autoencoder for classifying imbalanced cancer genomic data. J Biol Eng 2023; 17:7. [PMID: 36717866 PMCID: PMC9887895 DOI: 10.1186/s13036-022-00319-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2022] [Accepted: 12/12/2022] [Indexed: 01/31/2023] Open
Abstract
BACKGROUND In the current genomic era, gene expression datasets have become one of the main tools utilized in cancer classification. Both the curse of dimensionality and class imbalance are inherent characteristics of these datasets. These characteristics have a negative impact on the performance of most classifiers when used to classify cancer using genomic datasets. RESULTS This paper introduces Reduced Noise-Autoencoder (RN-Autoencoder) for pre-processing imbalanced genomic datasets for precise cancer classification. Firstly, RN-Autoencoder solves the curse of dimensionality problem by utilizing the autoencoder for feature reduction, generating new extracted data with lower dimensionality. In the next stage, RN-Autoencoder passes the extracted data to the well-known Reduced Noise-Synthetic Minority Over Sampling Technique (RN-SMOTE), which efficiently solves the problem of class imbalance in the extracted data. RN-Autoencoder has been evaluated using different classifiers and various imbalanced datasets with different imbalance ratios. The results showed that classifier performance improved with RN-Autoencoder and exceeded the performance obtained with the original data and with the extracted data, by percentages that depend on the classifier, dataset, and evaluation metric. Also, the performance of RN-Autoencoder has been compared to the current state of the art, resulting in increases of up to 18.017, 19.183, 18.58 and 8.87% in test accuracy using the colon, leukemia, Diffuse Large B-Cell Lymphoma (DLBCL) and Wisconsin Diagnostic Breast Cancer (WDBC) datasets, respectively. CONCLUSION RN-Autoencoder is a model for cancer classification using imbalanced gene expression datasets. It utilizes the autoencoder to reduce the high dimensionality of the gene expression datasets and then handles the class imbalance using RN-SMOTE. RN-Autoencoder has been evaluated using many different classifiers and many different imbalanced datasets. The performance of many classifiers has improved and some have succeeded in classifying cancer with 100% performance in terms of all used metrics. In addition, RN-Autoencoder outperformed many recent works using the same datasets.
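The abstract above relies on SMOTE-style oversampling after dimensionality reduction. The sketch below shows only the interpolation step of plain SMOTE: a synthetic minority sample is drawn on the segment between a minority point and one of its nearest minority neighbours. It does not reproduce the noise-reduction stage of RN-SMOTE or the autoencoder, and the function name, parameters, and toy points are assumptions for illustration. It requires Python 3.8 or later for `math.dist`.

```python
import math
import random


def smote_samples(minority, n_new, k=2, seed=0):
    """Generate `n_new` synthetic points by plain SMOTE interpolation.

    minority: list of coordinate tuples from the minority class
    k:        number of nearest minority neighbours to choose from
    seed:     seed for the local random generator (reproducibility)
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x, excluding x itself.
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(p, x),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


# Hypothetical low-dimensional minority samples (e.g. autoencoder outputs).
minority_points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]
new_points = smote_samples(minority_points, n_new=6)
print(new_points)
```

Because every synthetic point is a convex combination of two existing minority points, the new samples stay inside the region the minority class already occupies, which is why noisy minority points are a concern for this family of methods.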
Affiliation(s)
- Ahmed Arafa
- Faculty of Electronic Engineering, Menoufia University, El-Gish Street, Box No. 32951, Menouf, Menoufia, Egypt
- Nawal El-Fishawy
- Faculty of Electronic Engineering, Menoufia University, El-Gish Street, Box No. 32951, Menouf, Menoufia, Egypt
- Mohammed Badawy
- Faculty of Electronic Engineering, Menoufia University, El-Gish Street, Box No. 32951, Menouf, Menoufia, Egypt
- Marwa Radad
- Faculty of Electronic Engineering, Menoufia University, El-Gish Street, Box No. 32951, Menouf, Menoufia, Egypt
|
21
|
Alharbi F, Vakanski A. Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review. Bioengineering (Basel) 2023; 10:173. [PMID: 36829667 PMCID: PMC9952758 DOI: 10.3390/bioengineering10020173] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 12/22/2022] [Revised: 01/24/2023] [Accepted: 01/26/2023] [Indexed: 01/31/2023] Open
Abstract
Cancer is a term that denotes a group of diseases caused by the abnormal growth of cells that can spread in different parts of the body. According to the World Health Organization (WHO), cancer is the second major cause of death after cardiovascular diseases. Gene expression can play a fundamental role in the early detection of cancer, as it is indicative of the biochemical processes in tissue and cells, as well as the genetic characteristics of an organism. Deoxyribonucleic acid (DNA) microarrays and ribonucleic acid (RNA)-sequencing methods for gene expression data allow quantifying the expression levels of genes and produce valuable data for computational analysis. This study reviews recent progress in gene expression analysis for cancer classification using machine learning methods. Both conventional and deep learning-based approaches are reviewed, with an emphasis on the application of deep learning models due to their comparative advantages for identifying gene patterns that are distinctive for various types of cancers. Relevant works that employ the most commonly used deep neural network architectures are covered, including multi-layer perceptrons, as well as convolutional, recurrent, graph, and transformer networks. This survey also presents an overview of the data collection methods for gene expression analysis and lists important datasets that are commonly used for supervised machine learning for this task. Furthermore, we review pertinent techniques for feature engineering and data preprocessing that are typically used to handle the high dimensionality of gene expression data, caused by a large number of genes present in data samples. The paper concludes with a discussion of future research directions for machine learning-based gene expression analysis for cancer classification.
|
22
|
A new ranking-based stability measure for feature selection algorithms. Soft Comput 2023. [DOI: 10.1007/s00500-022-07767-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2023]
|
23
|
Attallah O. MonDiaL-CAD: Monkeypox diagnosis via selected hybrid CNNs unified with feature selection and ensemble learning. Digit Health 2023; 9:20552076231180054. [PMID: 37312961 PMCID: PMC10259124 DOI: 10.1177/20552076231180054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Accepted: 05/18/2023] [Indexed: 06/15/2023] Open
Abstract
Objective The monkeypox virus has recently been evolving slowly, and there are fears that it will spread as COVID-19 did. Computer-aided diagnosis (CAD) based on deep learning approaches, especially convolutional neural networks (CNNs), can assist in the rapid determination of reported incidents. Current CADs were mostly based on an individual CNN. A few CADs employed multiple CNNs but did not investigate which combination of CNNs has a greater impact on performance. Furthermore, they relied only on the spatial information of deep features to train their models. This study aims to construct a CAD tool named "Monkey-CAD" that can address these limitations and automatically diagnose monkeypox rapidly and accurately. Methods Monkey-CAD extracts features from eight CNNs and then examines the best possible combination of deep features that influence classification. It employs the discrete wavelet transform (DWT) to merge features, which diminishes the size of the fused features and provides a time-frequency representation. The size of these deep features is then further reduced via an entropy-based feature selection approach. These reduced fused features are finally used to deliver a better representation of the input features and to feed three ensemble classifiers. Results Two freely accessible datasets, Monkeypox skin image (MSID) and Monkeypox skin lesion (MSLD), are employed in this study. Monkey-CAD could discriminate between cases with and without monkeypox, achieving an accuracy of 97.1% for MSID and 98.7% for MSLD. Conclusions Such promising results demonstrate that Monkey-CAD can be employed to assist health practitioners. They also verify that fusing deep features from selected CNNs can boost performance.
Affiliation(s)
- Omneya Attallah
- Department of Electronics and Communications Engineering, College of Engineering and Technology, Arab Academy for Science, Technology and Maritime Transport, Alexandria, Egypt
|
24
|
Braik M. Enhanced Ali Baba and the forty thieves algorithm for feature selection. Neural Comput Appl 2023; 35:6153-6184. [PMID: 36408290 PMCID: PMC9666985 DOI: 10.1007/s00521-022-08015-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Accepted: 10/26/2022] [Indexed: 11/16/2022]
Abstract
Feature selection (FS) aims to improve the classification rate of dataset models by selecting only a small set of appropriate features from the initial set of features. Consequently, a reliable optimization method is needed to deal with this problem. Traditional methods often fail to optimally reduce the high dimensionality of the feature space of complex datasets, which leads to weak classification models. Meta-heuristics can offer a favorable classification rate for high-dimensional datasets. Here, a binary version of a new human-based algorithm named Ali Baba and the Forty Thieves (AFT) was applied to tackle a pool of FS problems. Although AFT is an efficient meta-heuristic for optimizing many problems, it sometimes exhibits premature convergence and low search performance. These issues were mitigated by proposing three enhanced versions of AFT: (1) a Binary Multi-layered AFT (BMAFT), which uses hierarchical and distributed frameworks; (2) a Binary Elitist AFT (BEAFT), which uses an elitist learning strategy; and (3) a Binary Self-adaptive AFT (BSAFT), which uses an adapted tracking distance parameter. These versions, along with the basic Binary AFT (BAFT), were extensively assessed on twenty-four problems gathered from different repositories. The results showed that the proposed algorithms substantially enhance the performance of BAFT in terms of convergence speed and solution accuracy. Overall, BMAFT was the most competitive, providing the best results with excellent performance scores compared to other competing algorithms.
Affiliation(s)
- Malik Braik
- Department of Computer Science, Al-Balqa Applied University, Salt, Jordan
|
25
|
Pan X, Zhang G, Lin A, Guan X, Chen P, Ge Y, Chen X. An evaluation model for children's foot & ankle deformity severity using sparse multi-objective feature selection algorithm. Comput Biol Med 2022; 151:106229. [PMID: 36308897 DOI: 10.1016/j.compbiomed.2022.106229] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/12/2022] [Revised: 10/08/2022] [Accepted: 10/16/2022] [Indexed: 12/27/2022]
Abstract
Foot & ankle deformity is a chronic disease with high incidence and is best treated in childhood. However, current diagnostic procedures rely on doctors' consultation and empirical judgment and lack objective, quantitative evaluation methods, resulting in low screening rates. To solve this problem, this paper aims to construct an evaluation model for children's foot & ankle deformity through data mining and machine learning technologies. First, it proposes grading rules for the severity of children's foot & ankle deformity based on an analysis of existing quantitative indexes and expert experience. Then a 3D foot scanner is used to collect sample data comprising 30 foot structure indexes. Finally, an advanced sparse multi-objective evolutionary algorithm (sparse MO-FS) is presented for feature selection. The effectiveness of the proposed sparse MO-FS and its search efficiency are demonstrated by comparison with 8 feature selection methods and 7 search strategies. Using sparse MO-FS, foot length, arch index, ankle index, and hallux valgus index are selected, which not only simplifies the evaluation model but also improves the average classification accuracy of random forest to more than 98%.
Affiliation(s)
- Xiaotian Pan
- School of Information Management and Artificial Intelligence, Zhejiang University of Finance and Economics, Hangzhou 310018, China.
- Guodao Zhang
- School of Media and Design, Hangzhou Dianzi University, Hangzhou 310018, China.
- Aiju Lin
- College of International Education, Wenzhou University, Wenzhou 325035, China.
- Xiaochun Guan
- Department of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China.
- PingKuo Chen
- Great Bay University, Dongguan City 523000, China.
- Yisu Ge
- Key Laboratory of Intelligent Informatics for Safety & Emergency of Zhejiang Province, Wenzhou University, Wenzhou 325100, China.
- Xin Chen
- Orthopedics Department of The First Affiliated Hospital of Wenzhou Medical University, Wenzhou 325000, China.
|
26
|
Shaban WM. Insight into breast cancer detection: new hybrid feature selection method. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-08062-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2022]
Abstract
Breast cancer, the leading cause of death among women, is one of the most common cancers affecting women all over the world. Discovering breast cancer at an early stage is extremely important because it allows an appropriate treatment protocol to be selected and thus stops the development of cancer cells. In this paper, a new patient detection strategy is presented to identify patients with the disease earlier. The proposed strategy consists of two parts: a data preprocessing phase and a patient detection phase (PDP). The purpose of this study is to introduce a feature selection methodology for determining the most efficient and significant features for identifying breast cancer patients. This method is known as the new hybrid feature selection method (NHFSM). NHFSM is made up of two modules: a quick selection module that uses information gain, and a feature selection module that uses a hybrid of the bat algorithm and particle swarm optimization. Consequently, NHFSM is a hybrid method that combines the advantages of the bat algorithm and particle swarm optimization on top of a filter method to eliminate drawbacks such as being stuck in a local optimum and having unbalanced exploitation. The preprocessed data are then used during PDP to enable quick and accurate detection of patients. Based on experimental results, the proposed NHFSM improves the efficiency of patients' classification in comparison with state-of-the-art feature selection approaches by roughly 0.97, 0.76, 0.75, and 0.716 in terms of accuracy, precision, sensitivity/recall, and F-measure. In contrast, it has the lowest error rate value of 0.03.
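The information-gain filter that this abstract attributes to the quick selection module can be sketched as follows. This is a minimal illustration for discrete feature values, not the NHFSM implementation: the helper names `entropy`, `information_gain`, and `quick_select`, the toy data, and the choice of keeping the top `k` features are all assumptions made for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def quick_select(columns, labels, k):
    """Rank feature columns by information gain and keep the k best (assumed rule)."""
    ranked = sorted(columns, key=lambda name: information_gain(columns[name], labels),
                    reverse=True)
    return ranked[:k]

# Toy data: one feature that separates the classes and one that does not.
labels = [1, 1, 0, 0]
columns = {
    "informative": ["hi", "hi", "lo", "lo"],
    "noise": ["a", "b", "a", "b"],
}
selected = quick_select(columns, labels, k=1)
```

Continuous features would need to be discretized before this filter is applied; how the paper handles that is not stated in the abstract.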
|
27
|
Hybrid Feature Selection Method for Intrusion Detection Systems Based on an Improved Intelligent Water Drop Algorithm. CYBERNETICS AND INFORMATION TECHNOLOGIES 2022. [DOI: 10.2478/cait-2022-0040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Securing networks against attacks is a critical task and a competitive research area. One of the most popular security solutions is the Intrusion Detection System (IDS). Researchers have recently used machine learning to develop high-performance IDSs. One of the main challenges in developing an intelligent IDS is Feature Selection (FS). In this manuscript, a hybrid FS method for network IDSs is proposed, based on an ensemble filter and an improved Intelligent Water Drop (IWD) wrapper. The improved version of the IWD algorithm uses a local search algorithm as an extra operator to increase the exploitation capability of the basic IWD algorithm. Experimental results on three benchmark datasets, "UNSW-NB15", "NSL-KDD", and "KDDCUP99", demonstrate the effectiveness of the proposed model against some of the most recent IDS algorithms in the literature in terms of the "F-score", "accuracy", "FPR", "TPR", and "number of selected features" metrics.
|
28
|
Zanella L, Facco P, Bezzo F, Cimetta E. Feature Selection and Molecular Classification of Cancer Phenotypes: A Comparative Study. Int J Mol Sci 2022; 23:ijms23169087. [PMID: 36012350 PMCID: PMC9408964 DOI: 10.3390/ijms23169087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2022] [Revised: 08/09/2022] [Accepted: 08/11/2022] [Indexed: 11/16/2022] Open
Abstract
The classification of high dimensional gene expression data is key to the development of effective diagnostic and prognostic tools. Feature selection involves finding the best subset with the highest power in predicting class labels. Here, we conducted a comparative study focused on different combinations of feature selectors (Chi-Squared, mRMR, Relief-F, and Genetic Algorithms) and classification learning algorithms (Random Forests, PLS-DA, SVM, Regularized Logistic/Multinomial Regression, and kNN) to identify those with the best predictive capacity. The performance of each combination is evaluated through an empirical study on three benchmark cancer-related microarray datasets. Our results first suggest that the quality of the data relevant to the target classes is key for the successful classification of cancer phenotypes. We also proved that, for a given classification learning algorithm and dataset, all filters have a similar performance. Interestingly, filters achieve comparable or even better results with respect to the GA-based wrappers, while also being easier and faster to implement. Taken together, our findings suggest that simple, well-established feature selectors in combination with optimized classifiers guarantee good performances, with no need for complicated and computationally demanding methodologies.
Affiliation(s)
- Luca Zanella
- Department of Industrial Engineering (DII), University of Padova, 35131 Padova, Italy
- Pierantonio Facco
- Department of Industrial Engineering (DII), University of Padova, 35131 Padova, Italy
- Fabrizio Bezzo
- Department of Industrial Engineering (DII), University of Padova, 35131 Padova, Italy
- Elisa Cimetta
- Department of Industrial Engineering (DII), University of Padova, 35131 Padova, Italy
- Fondazione Istituto di Ricerca Pediatrica Città della Speranza (IRP), 35127 Padova, Italy
- Correspondence:
|
29
|
Performance Analysis of Ovarian Cancer Detection and Classification for Microarray Gene Data. BIOMED RESEARCH INTERNATIONAL 2022; 2022:6750457. [PMID: 35872866 PMCID: PMC9307352 DOI: 10.1155/2022/6750457] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/14/2022] [Accepted: 06/30/2022] [Indexed: 11/18/2022]
Abstract
Ovarian cancer is the most common gynecologic cancer after cervical and uterine cancer, and it is a severe concern for women. Abnormal cells form and spread throughout the body. Ovarian cancer microarray data can support diagnosis and prognosis. Typically, such data contain tens of thousands of genes. To reduce computational complexity, it is necessary to select the most critical genes or attributes in the entire dataset. Because microarray datasets have few samples and many features, classifier performance suffers. Dimensionality reduction measures are therefore essential to retain the genes relevant to disease classification. In this research, the ANOVA method is first used for gene selection, and then two clustering-based and three transform-based feature extraction methods, namely Fuzzy C Means, Softmax Discriminant Algorithm (SDA), Hilbert Transform, Fast Fourier Transform (FFT), and Discrete Cosine Transform (DCT), respectively, are used to further select relevant genes. Six classifiers then classify the features as normal or abnormal. The NLR classifier gives the highest accuracy, 92%, for SDA features, and KNN gives the lowest accuracy, 55%, for SDA, Hilbert, and DCT features. With correlation distance feature selection, the NLR classifier attains the lowest accuracy of 53%, and the GMM classifier obtains the highest accuracy of 88%.
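The ANOVA-based gene selection step named in this abstract can be illustrated with a one-way F statistic computed per gene. The sketch below is an assumption about how such a filter is typically set up, not the paper's code: the names `anova_f` and `rank_genes`, the toy expression values, and the top-k cutoff are hypothetical.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for one gene.

    groups is a list of per-class value lists, e.g. [normal_values, abnormal_values].
    Raises ZeroDivisionError if every class has zero within-class variance.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def rank_genes(expression, top_k):
    """Keep the top_k genes with the largest F statistic (assumed selection rule)."""
    scores = {gene: anova_f(groups) for gene, groups in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy expression values per gene: [normal samples, abnormal samples].
expression = {
    "gene_a": [[1, 2, 3], [7, 8, 9]],  # class means differ strongly
    "gene_b": [[1, 5, 9], [2, 5, 8]],  # class means are identical
}
top = rank_genes(expression, top_k=1)
```

A large F value means the between-class variation dominates the within-class variation, which is why such genes are kept before the later clustering- or transform-based feature extraction.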
|
30
|
Azadifar S, Rostami M, Berahmand K, Moradi P, Oussalah M. Graph-based relevancy-redundancy gene selection method for cancer diagnosis. Comput Biol Med 2022; 147:105766. [DOI: 10.1016/j.compbiomed.2022.105766] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 04/10/2022] [Revised: 06/12/2022] [Accepted: 06/18/2022] [Indexed: 11/26/2022]
|
31
|
Feature Subset Selection with Optimal Adaptive Neuro-Fuzzy Systems for Bioinformatics Gene Expression Classification. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:1698137. [PMID: 35607459 PMCID: PMC9124108 DOI: 10.1155/2022/1698137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2022] [Revised: 04/20/2022] [Accepted: 04/27/2022] [Indexed: 01/28/2023]
Abstract
Recently, bioinformatics and computational biology applications such as gene expression analysis, cellular restoration, medical image processing, protein structure examination, and medical data classification have used fuzzy systems to offer effective solutions and decisions. The latest developments in fuzzy systems combined with artificial intelligence techniques enable the design of effective microarray gene expression classification models. In this context, this study introduces a novel feature subset selection with optimal adaptive neuro-fuzzy inference system (FSS-OANFIS) for gene expression classification. The major aim of the FSS-OANFIS model is to detect and classify gene expression data. To accomplish this, the FSS-OANFIS model designs an improved grey wolf optimizer-based feature selection (IGWO-FS) model to derive an optimal subset of features. In addition, the OANFIS model is employed for gene classification, and the parameters of the ANFIS model are tuned using the coyote optimization algorithm (COA). The application of the IGWO-FS and COA techniques helps to achieve enhanced microarray gene expression classification outcomes. The experimental validation of the FSS-OANFIS model was performed using the Leukemia, Prostate, DLBCL Stanford, and Colon Cancer datasets. The proposed FSS-OANFIS model achieved a maximum classification accuracy of 89.47%.
|
32
|
Red Fox Optimizer with Data-Science-Enabled Microarray Gene Expression Classification Model. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12094172] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/13/2022]
Abstract
Microarray data examination is a relatively new technology that intends to determine the proper treatment for various diseases and a precise medical diagnosis by analyzing a massive number of genes in various experimental conditions. The conventional data classification techniques suffer from overfitting and the high dimensionality of gene expression data. Therefore, the feature (gene) selection approach plays a vital role in handling a high dimensionality of data. Data science concepts can be widely employed in several data classification problems, and they identify different class labels. In this aspect, we developed a novel red fox optimizer with deep-learning-enabled microarray gene expression classification (RFODL-MGEC) model. The presented RFODL-MGEC model aims to improve classification performance by selecting appropriate features. The RFODL-MGEC model uses a novel red fox optimizer (RFO)-based feature selection approach for deriving an optimal subset of features. Moreover, the RFODL-MGEC model involves a bidirectional cascaded deep neural network (BCDNN) for data classification. The parameters involved in the BCDNN technique were tuned using the chaos game optimization (CGO) algorithm. Comprehensive experiments on benchmark datasets indicated that the RFODL-MGEC model accomplished superior results for subtype classifications. Therefore, the RFODL-MGEC model was found to be effective for the identification of various classes for high-dimensional and small-scale microarray data.
|
33
|
Zhong P, Wei X, Li X, Wei X, Wu S, Huang W, Koidis A, Xu Z, Lei H. Untargeted metabolomics by liquid chromatography‐mass spectrometry for food authentication: A review. Compr Rev Food Sci Food Saf 2022; 21:2455-2488. [DOI: 10.1111/1541-4337.12938] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/19/2021] [Revised: 02/20/2022] [Accepted: 02/21/2022] [Indexed: 12/17/2022]
Affiliation(s)
- Peng Zhong
- Guangdong Provincial Key Laboratory of Food Quality and Safety / National–Local Joint Engineering Research Center for Precision Machining and Safety of Livestock and Poultry Products, College of Food Science South China Agricultural University Guangzhou 510642 China
- Xiaoqun Wei
- Guangdong Provincial Key Laboratory of Food Quality and Safety / National–Local Joint Engineering Research Center for Precision Machining and Safety of Livestock and Poultry Products, College of Food Science South China Agricultural University Guangzhou 510642 China
- Xiangmei Li
- Guangdong Provincial Key Laboratory of Food Quality and Safety / National–Local Joint Engineering Research Center for Precision Machining and Safety of Livestock and Poultry Products, College of Food Science South China Agricultural University Guangzhou 510642 China
- Xiaoyi Wei
- Guangdong Provincial Key Laboratory of Food Quality and Safety / National–Local Joint Engineering Research Center for Precision Machining and Safety of Livestock and Poultry Products, College of Food Science South China Agricultural University Guangzhou 510642 China
- Shaozong Wu
- Guangdong Provincial Key Laboratory of Food Quality and Safety / National–Local Joint Engineering Research Center for Precision Machining and Safety of Livestock and Poultry Products, College of Food Science South China Agricultural University Guangzhou 510642 China
- Weijuan Huang
- Guangdong Provincial Key Laboratory of Food Quality and Safety / National–Local Joint Engineering Research Center for Precision Machining and Safety of Livestock and Poultry Products, College of Food Science South China Agricultural University Guangzhou 510642 China
- Anastasios Koidis
- Institute for Global Food Security Queen's University Belfast Belfast UK
- Zhenlin Xu
- Guangdong Provincial Key Laboratory of Food Quality and Safety / National–Local Joint Engineering Research Center for Precision Machining and Safety of Livestock and Poultry Products, College of Food Science South China Agricultural University Guangzhou 510642 China
- Hongtao Lei
- Guangdong Provincial Key Laboratory of Food Quality and Safety / National–Local Joint Engineering Research Center for Precision Machining and Safety of Livestock and Poultry Products, College of Food Science South China Agricultural University Guangzhou 510642 China
- Guangdong Laboratory for Lingnan Modern Agriculture South China Agricultural University Guangzhou 510642 China
|
34
|
Tahmouresi A, Rashedi E, Yaghoobi MM, Rezaei M. Gene selection using pyramid gravitational search algorithm. PLoS One 2022; 17:e0265351. [PMID: 35290401 PMCID: PMC8923457 DOI: 10.1371/journal.pone.0265351] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2021] [Accepted: 02/28/2022] [Indexed: 11/24/2022] Open
Abstract
Genetics play a prominent role in the development and progression of malignant neoplasms. Identifying the relevant genes is a high-dimensional data processing problem. The pyramid gravitational search algorithm (PGSA), a hybrid method in which the number of genes is cyclically reduced, is proposed to overcome the curse of dimensionality. PGSA consists of two elements, a filter and a wrapper method (inspired by the gravitational search algorithm), and iterates through cycles. The genes selected in each cycle are passed on to the subsequent cycles to further reduce the dimension. PGSA tries to maximize classification accuracy using the most informative genes while reducing the number of genes. Results are reported on a multi-class microarray gene expression dataset for breast cancer. Several feature selection algorithms were implemented to allow a fair comparison. PGSA ranked first in terms of accuracy (84.5%) with 73 genes. To check whether the selected genes are meaningful in terms of patient survival and response to therapy, protein-protein interaction network analysis was applied to the genes. An interesting pattern emerged when examining the genetic network: the HSP90AA1, PTK2, and SRC genes were among the top-rated bottleneck genes, and the DNA damage, cell adhesion, and migration pathways are highly enriched in the network.
Affiliation(s)
- Esmat Rashedi
- Department of Electrical and Computer Engineering, Graduate University of Advanced Technology, Kerman, Iran
- * E-mail:
- Mohammad Mehdi Yaghoobi
- Department of Biotechnology, Institute of Science and High Technology and Environmental Sciences, Graduate University of Advanced Technology, Kerman, Iran
- Masoud Rezaei
- Faculty of Medicine, Kerman University of Medical Sciences, Kerman, Iran
|
35
|
Li F, Zhou Y, Zhang Y, Yin J, Qiu Y, Gao J, Zhu F. POSREG: proteomic signature discovered by simultaneously optimizing its reproducibility and generalizability. Brief Bioinform 2022; 23:6532538. [PMID: 35183059 DOI: 10.1093/bib/bbac040] [Citation(s) in RCA: 69] [Impact Index Per Article: 34.5] [Received: 11/30/2021] [Revised: 01/21/2022] [Accepted: 01/27/2022] [Indexed: 12/17/2022] Open
Abstract
Mass spectrometry-based proteomics has become indispensable in the current exploration of complex and dynamic biological processes. Instrument development has largely ensured the effective production of proteomic data, which necessitates commensurate advances in the statistical framework used to discover the optimal proteomic signature. The current framework mainly emphasizes the generalizability of the identified signature in predicting independent data but neglects the reproducibility among signatures identified from independently repeated trials on different sub-datasets. These problems have seriously restricted the wide application of proteomic techniques in molecular biology and related fields. It is therefore crucial to enable the generalizable and reproducible discovery of proteomic signatures, with a subsequent indication of phenotype association. However, no such tool has yet been developed and made available. Herein, an online tool, POSREG, was constructed to identify the optimal signature for a set of proteomic data. It works by (i) identifying proteomic signatures of good reproducibility and aggregating them into an ensemble feature ranking by ensemble learning, (ii) assessing the generalizability of the ensemble feature ranking to acquire the optimal signature, and (iii) indicating the phenotype association of the discovered signature. POSREG is unique in its capacity to discover a proteomic signature by simultaneously optimizing its reproducibility and generalizability. It is now accessible free of charge without any registration or login requirement at https://idrblab.org/posreg/.
Affiliation(s)
- Fengcheng Li
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Ying Zhou
- State Key Laboratory for Diagnosis and Treatment of Infectious Disease, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital, Zhejiang University, Hangzhou, Zhejiang 310000, China
- Ying Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Jiayi Yin
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Yunqing Qiu
- State Key Laboratory for Diagnosis and Treatment of Infectious Disease, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital, Zhejiang University, Hangzhou, Zhejiang 310000, China
- Jianqing Gao
- Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang, China
- Feng Zhu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
|
36
|
Xavier D, Floris C, Fabrice P, Angoulvant D, Mewton N, Roubille F, Pascal R, Marc F, Valérie M, Laurane C, Alain F, Gabriel G, Loïc B, Delphine MP. Post-infarct cardiac remodeling predictions with machine learning. Int J Cardiol 2022; 355:1-4. [DOI: 10.1016/j.ijcard.2022.02.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2021] [Revised: 02/04/2022] [Accepted: 02/07/2022] [Indexed: 11/05/2022]
|
37
|
Cao Y. Possible relationship between the somatic mutations and the formation of cancers. BIO WEB OF CONFERENCES 2022. [DOI: 10.1051/bioconf/20225501009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022] Open
Abstract
Cancer is one of the most life-threatening diseases and has been studied for more than 3 thousand years (the earliest record of cancer research dates to 1500 BC). However, there are still too few efficient treatments for cancer. This review begins by introducing cancer and somatic mutations through the hallmarks of cancer, followed by a discussion of a few types of mutations that may be potential targets for therapeutic treatments. Some potential targets related to those mutations are also listed, such as pRb proteins with its two subunits (p130 and p107), telomerase reverse transcriptase (TERT), the shelterin complex, and so on. The statement "cancer is caused by accumulation of somatic mutations" is supported by the positive correlation between cancer and age. In addition, some mutations that contribute to increased mutation frequencies have been shown to be factors in cancer, for example xeroderma pigmentosum, mutations in DNA mismatch repair (MMR), and BRCA1 and BRCA2 mutations. This overview of the relationship between cancer and those somatic mutations may point to potential directions for further cancer treatments.
|