1
|
Al-Shalif SA, Senan N, Saeed F, Ghaban W, Ibrahim N, Aamir M, Sharif W. A systematic literature review on meta-heuristic based feature selection techniques for text classification. PeerJ Comput Sci 2024; 10:e2084. [PMID: 38983195 PMCID: PMC11232610 DOI: 10.7717/peerj-cs.2084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2023] [Accepted: 05/03/2024] [Indexed: 07/11/2024]
Abstract
Feature selection (FS) is a critical step in many data science applications, especially text classification, as it involves selecting relevant and important features from an original feature set. This process can improve learning accuracy, shorten learning time, and simplify outcomes. In text classification, there are often many redundant and irrelevant features that degrade the performance of the applied classifiers, and various techniques have been suggested to tackle this problem, categorized as traditional techniques and meta-heuristic (MH) techniques. To discover the optimal subset of features, FS processes require a search strategy, and MH techniques use various strategies to strike a balance between exploration and exploitation. The goal of this research article is to systematically analyze the MH techniques used for FS between 2015 and 2022, focusing on 108 primary studies from three databases (Scopus, Science Direct, and Google Scholar) to identify the techniques used, as well as their strengths and weaknesses. The findings indicate that MH techniques are efficient and outperform traditional techniques, with the potential for further exploration of MH techniques such as Ringed Seal Search (RSS) to improve FS in several applications.
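The search over feature subsets described in this abstract can be illustrated with a minimal sketch. It uses simulated annealing purely as a simple stand-in metaheuristic (it is not taken from the review and is not the Ringed Seal Search), and `toy_fitness` is a hypothetical scoring function; in a real wrapper approach the fitness would typically be a classifier's validation score. Early on, the high temperature lets worse subsets through (exploration); as it cools, the search behaves like hill climbing (exploitation).

```python
import math
import random


def toy_fitness(mask, relevant=(0, 3, 5), penalty=0.1):
    """Hypothetical fitness: reward relevant features, penalise subset size."""
    hits = sum(1 for j in relevant if mask[j])
    return hits - penalty * sum(mask)


def select_features(fitness, n_features, iters=500, temp=1.0, cooling=0.99, seed=0):
    """Simulated-annealing search over boolean feature masks (illustrative only)."""
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in range(n_features)]
    current_fit = fitness(current)
    best, best_fit = current[:], current_fit
    for _ in range(iters):
        # Neighbour move: flip one randomly chosen feature in or out.
        candidate = current[:]
        j = rng.randrange(n_features)
        candidate[j] = not candidate[j]
        cand_fit = fitness(candidate)
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature cools.
        if cand_fit >= current_fit or rng.random() < math.exp((cand_fit - current_fit) / temp):
            current, current_fit = candidate, cand_fit
        if current_fit > best_fit:
            best, best_fit = current[:], current_fit
        temp *= cooling
    return best, best_fit


best_mask, best_score = select_features(toy_fitness, n_features=8)
print(best_mask, best_score)
```

The cooling rate and iteration count are arbitrary illustrative values; they control how long the search keeps exploring before it settles.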
Affiliation(s)
- Sarah Abdulkarem Al-Shalif
  - Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Parit Raja, Johor, Malaysia
- Norhalina Senan
  - Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Parit Raja, Johor, Malaysia
- Faisal Saeed
  - DAAI Research Group, Department of Computing and Data Science, School of Computing and Digital Technology, University of Birmingham, Birmingham, United Kingdom
- Wad Ghaban
  - Applied College, University of Tabuk, Tabuk, Saudi Arabia
- Noraini Ibrahim
  - Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Parit Raja, Johor, Malaysia
- Muhammad Aamir
  - School of Electronics, Computing and Mathematics, University of Derby, Derby, United Kingdom
- Wareesa Sharif
  - Faculty of Computing, The Islamia University of Bahawalpur, Bahawalpur, Pakistan
|
2
|
Nissar I, Alam S, Masood S, Kashif M. MOB-CBAM: A dual-channel attention-based deep learning generalizable model for breast cancer molecular subtypes prediction using mammograms. Computer Methods and Programs in Biomedicine 2024; 248:108121. [PMID: 38531147 DOI: 10.1016/j.cmpb.2024.108121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Revised: 02/15/2024] [Accepted: 03/06/2024] [Indexed: 03/28/2024]
Abstract
BACKGROUND AND OBJECTIVE: Deep learning models have emerged as a significant tool for generating efficient solutions to complex problems, including cancer detection, as they can analyze large amounts of data with high efficiency and performance. Recent medical studies highlight the significance of molecular subtype detection in breast cancer, aiding the development of personalized treatment plans, as different subtypes of cancer respond better to different therapies.
METHODS: In this work, we propose a novel lightweight dual-channel attention-based deep learning model, MOB-CBAM, that uses the MobileNet-V3 architecture as a backbone with a Convolutional Block Attention Module (CBAM) to make highly accurate and precise predictions about breast cancer. We used the CMMD mammogram dataset to evaluate the proposed model. Nine distinct data subsets were created from the original dataset to perform coarse- and fine-grained predictions, enabling the model to identify masses, calcifications, benign and malignant tumors, and molecular subtypes of cancer, including Luminal A, Luminal B, HER-2 Positive, and Triple Negative. The pipeline incorporates several image pre-processing techniques, including filtering, enhancement, and normalization, to enhance the model's generalization ability.
RESULTS: In identifying benign versus malignant tumors, i.e., coarse-grained classification, the MOB-CBAM model achieved 99% accuracy, with precision, recall, and F1-score values of 0.99 and an MCC of 0.98. In fine-grained classification, the model classified mass (benign/malignant) and calcification (benign/malignant) cases with an accuracy of 98%. We also cross-validated the proposed MOB-CBAM architecture on two other datasets, MIAS and CBIS-DDSM. On the MIAS dataset, an accuracy of 97% was reported for classifying benign, malignant, and normal images, while on the CBIS-DDSM dataset, an accuracy of 98% was achieved for classifying mass (benign/malignant) and calcification (benign/malignant) cases.
CONCLUSION: This study presents the lightweight MOB-CBAM, a novel deep learning framework, to address breast cancer diagnosis and subtype prediction. The model's incorporation of the CBAM improves the precision of its predictions. The extensive evaluation on the CMMD dataset and cross-validation on other datasets affirm the model's efficacy.
Affiliation(s)
- Iqra Nissar
  - Department of Computer Engineering, Jamia Millia Islamia (A Central University), New Delhi, 110025, India
- Shahzad Alam
  - Department of Computer Engineering, Jamia Millia Islamia (A Central University), New Delhi, 110025, India
- Sarfaraz Masood
  - Department of Computer Engineering, Jamia Millia Islamia (A Central University), New Delhi, 110025, India
- Mohammad Kashif
  - Department of Computer Engineering, Jamia Millia Islamia (A Central University), New Delhi, 110025, India
|
3
|
Sajjad Ahmed Nadeem M, Hammad Waseem M, Aziz W, Habib U, Masood A, Attique Khan M. Hybridizing Artificial Neural Networks Through Feature Selection Based Supervised Weight Initialization and Traditional Machine Learning Algorithms for Improved Colon Cancer Prediction. IEEE Access 2024; 12:97099-97114. [DOI: 10.1109/access.2024.3422317] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/25/2024]
Affiliation(s)
- Malik Sajjad Ahmed Nadeem
  - Department of Computer Science and Information Technology, The University of Azad Jammu and Kashmir, Muzaffarabad, Pakistan
- Muhammad Hammad Waseem
  - Department of Computer Science and Information Technology, The University of Azad Jammu and Kashmir, Muzaffarabad, Pakistan
- Wajid Aziz
  - Department of Computer Science and Information Technology, The University of Azad Jammu and Kashmir, Muzaffarabad, Pakistan
- Usman Habib
  - Software Engineering Department, FAST School of Computing, National University of Computer and Emerging Sciences, Islamabad, Pakistan
- Anum Masood
  - Department of Physics, Norwegian University of Science and Technology, Trondheim, Norway
- Muhammad Attique Khan
  - Department of Computer Science and Mathematics, Lebanese American University, Beirut, Lebanon
|
4
|
Munquad S, Das AB. DeepAutoGlioma: a deep learning autoencoder-based multi-omics data integration and classification tools for glioma subtyping. BioData Min 2023; 16:32. [PMID: 37968655 PMCID: PMC10652591 DOI: 10.1186/s13040-023-00349-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2023] [Accepted: 11/06/2023] [Indexed: 11/17/2023]
Abstract
BACKGROUND AND OBJECTIVE: The classification of glioma subtypes is essential for precision therapy. Due to the heterogeneity of gliomas, the subtype-specific molecular pattern can be captured by integrating and analyzing high-throughput omics data from different genomic layers. The development of a deep-learning framework enables the integration of multi-omics data to classify the glioma subtypes to support the clinical diagnosis.
RESULTS: Transcriptome and methylome data of glioma patients were preprocessed, and differentially expressed features from both datasets were identified. Subsequently, a Cox regression analysis determined genes and CpGs associated with survival. Gene set enrichment analysis was carried out to examine the biological significance of the features. Further, we identified CpG and gene pairs by mapping them in the promoter region of corresponding genes. The methylation and gene expression levels of these CpGs and genes were embedded in a lower-dimensional space with an autoencoder. Next, ANN and CNN were used to classify subtypes using the latent features from embedding space. CNN performs better than ANN for subtyping lower-grade gliomas (LGG) and glioblastoma multiforme (GBM). The subtyping accuracy of CNN was 98.03% (± 0.06) and 94.07% (± 0.01) in LGG and GBM, respectively. The precision of the models was 97.67% in LGG and 90.40% in GBM. The model sensitivity was 96.96% in LGG and 91.18% in GBM. Additionally, we observed the superior performance of CNN with external datasets. The genes and CpGs pairs used to develop the model showed better performance than the random CpGs-gene pairs, preprocessed data, and single omics data.
CONCLUSIONS: The current study showed that a novel feature selection and data integration strategy led to the development of DeepAutoGlioma, an effective framework for diagnosing glioma subtypes.
Affiliation(s)
- Sana Munquad
  - Department of Biotechnology, National Institute of Technology Warangal, Warangal, Telangana, 506004, India
- Asim Bikas Das
  - Department of Biotechnology, National Institute of Technology Warangal, Warangal, Telangana, 506004, India
|
5
|
Khatun R, Akter M, Islam MM, Uddin MA, Talukder MA, Kamruzzaman J, Azad AKM, Paul BK, Almoyad MAA, Aryal S, Moni MA. Cancer Classification Utilizing Voting Classifier with Ensemble Feature Selection Method and Transcriptomic Data. Genes (Basel) 2023; 14:1802. [PMID: 37761941 PMCID: PMC10530870 DOI: 10.3390/genes14091802] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 09/10/2023] [Accepted: 09/12/2023] [Indexed: 09/29/2023]
Abstract
Biomarker-based cancer identification and classification tools are widely used in bioinformatics and machine learning fields. However, the high dimensionality of microarray gene expression data poses a challenge for identifying important genes in cancer diagnosis. Many feature selection algorithms optimize cancer diagnosis by selecting optimal features. This article proposes an ensemble rank-based feature selection method (EFSM) and an ensemble weighted average voting classifier (VT) to overcome this challenge. The EFSM uses a ranking method that aggregates features from individual selection methods to efficiently discover the most relevant and useful features. The VT combines support vector machine, k-nearest neighbor, and decision tree algorithms to create an ensemble model. The proposed method was tested on three benchmark datasets and compared to existing built-in ensemble models. The results show that our model achieved higher accuracy, with 100% for leukaemia, 94.74% for colon cancer, and 94.34% for the 11-tumor dataset. This study concludes by identifying a subset of the most important cancer-causing genes and demonstrating their significance compared to the original data. The proposed approach surpasses existing strategies in accuracy and stability, significantly impacting the development of ML-based gene analysis. It detects vital genes with higher precision and stability than other existing methods.
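The two ideas named in this abstract, rank aggregation across feature selection methods and weighted voting across classifiers, can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' EFSM or VT: mean-rank aggregation is assumed as the combining rule, and the function names, scores, and weights are hypothetical.

```python
from collections import defaultdict


def aggregate_ranks(score_lists, top_k):
    """Average each feature's rank across methods and keep the top_k features.

    score_lists holds one list of per-feature scores per selection method;
    a higher score means a more relevant feature.
    """
    n_features = len(score_lists[0])
    total_rank = [0.0] * n_features
    for scores in score_lists:
        # Rank 0 goes to the highest-scoring feature for this method.
        order = sorted(range(n_features), key=lambda j: -scores[j])
        for rank, j in enumerate(order):
            total_rank[j] += rank
    mean_rank = [r / len(score_lists) for r in total_rank]
    # Ties are broken by the lower feature index.
    return sorted(range(n_features), key=lambda j: (mean_rank[j], j))[:top_k]


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight among the classifiers' votes."""
    tally = defaultdict(float)
    for label, weight in zip(predictions, weights):
        tally[label] += weight
    return max(sorted(tally), key=lambda label: tally[label])


# Hypothetical scores from three selection methods over four features.
scores = [
    [0.9, 0.1, 0.5, 0.3],
    [0.8, 0.2, 0.1, 0.6],
    [0.7, 0.3, 0.6, 0.4],
]
print(aggregate_ranks(scores, top_k=2))
# Hypothetical votes from three classifiers (e.g. SVM, k-NN, decision tree).
print(weighted_vote(["tumor", "normal", "normal"], [0.5, 0.3, 0.1]))
```

Averaging ranks rather than raw scores keeps methods with different score scales comparable, which is one common reason to aggregate at the rank level.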
Affiliation(s)
- Rabea Khatun
  - Department of Computer Science and Engineering, Green University of Bangladesh, Dhaka 1207, Bangladesh
- Maksuda Akter
  - Department of Computer Science and Engineering, Jagannath University, Dhaka 1100, Bangladesh
- Md. Manowarul Islam
  - Department of Computer Science and Engineering, Jagannath University, Dhaka 1100, Bangladesh
- Md. Ashraf Uddin
  - School of Information Technology, Deakin University, Waurn Ponds Campus, Geelong, VIC 3125, Australia
- Md. Alamin Talukder
  - Department of Computer Science and Engineering, Jagannath University, Dhaka 1100, Bangladesh
- Joarder Kamruzzaman
  - Centre for Smart Analytics, Federation University Australia, Ballarat, VIC 3842, Australia
- AKM Azad
  - Department of Mathematics and Statistics, College of Science, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh 11564, Saudi Arabia
- Bikash Kumar Paul
  - Department of Information and Communication Technology, Mawlana Bhashani Science and Technology University, Tangail 1902, Bangladesh
  - Department of Software Engineering, Daffodil International University (DIU), Dhaka 1342, Bangladesh
- Muhammad Ali Abdulllah Almoyad
  - Department of Basic Medical Sciences, College of Applied Medical Sciences in Khamis Mushyt King Khalid University, Abha 61412, Saudi Arabia
- Sunil Aryal
  - School of Information Technology, Deakin University, Waurn Ponds Campus, Geelong, VIC 3125, Australia
- Mohammad Ali Moni
  - Artificial Intelligence & Data Science, School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, The University of Queensland, St Lucia, QLD 4072, Australia
|
6
|
M S K, Rajaguru H, Nair AR. Evaluation and Exploration of Machine Learning and Convolutional Neural Network Classifiers in Detection of Lung Cancer from Microarray Gene-A Paradigm Shift. Bioengineering (Basel) 2023; 10:933. [PMID: 37627818 PMCID: PMC10451477 DOI: 10.3390/bioengineering10080933] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Revised: 08/03/2023] [Accepted: 08/04/2023] [Indexed: 08/27/2023]
Abstract
Microarray gene expression-based detection and classification of medical conditions have been prominent in research studies over the past few decades. However, extracting relevant data from high-volume microarray gene expression data with inherent nonlinearity and inseparable noise components raises significant challenges during data classification and disease detection. The dataset used for the research is the Lung Harvard 2 Dataset (LH2), which consists of 150 Adenocarcinoma subjects and 31 Mesothelioma subjects. The paper proposes a two-level strategy involving feature extraction and selection methods before the classification step. The feature extraction step utilizes the Short-Time Fourier Transform (STFT), and the feature selection step employs the Particle Swarm Optimization (PSO) and Harmony Search (HS) metaheuristic methods. The classifiers employed are Nonlinear Regression, Gaussian Mixture Model, Softmax Discriminant, Naive Bayes, SVM (Linear), SVM (Polynomial), and SVM (RBF). The classification results for the two-level extracted features are compared with those for the raw data, including a Convolutional Neural Network (CNN) methodology. Among the methods, STFT with PSO feature selection and the SVM (RBF) classifier produced the highest accuracy of 94.47%.
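The STFT feature extraction step mentioned in this abstract can be sketched with the standard library alone. This is a minimal illustration, not the authors' pipeline: it assumes a sample's expression values are treated as a 1-D sequence, uses a Hann window, and picks the frame length and hop size arbitrarily.

```python
import cmath
import math


def stft_magnitudes(signal, frame_len, hop):
    """Return one magnitude spectrum (bins 0..frame_len // 2) per Hann-windowed frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT of the windowed frame at frequency bin k.
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# A test sequence with exactly four cycles per 32-sample frame.
signal = [math.sin(2 * math.pi * 4 * n / 32) for n in range(64)]
frames = stft_magnitudes(signal, frame_len=32, hop=16)
print(len(frames), len(frames[0]))
```

The direct DFT here costs O(N^2) per frame; it is written out for clarity, and an FFT routine would normally replace it for long sequences.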
Affiliation(s)
- Karthika M S
  - Department of Information Technology, Bannari Amman Institute of Technology, Sathyamangalam 638401, India
- Harikumar Rajaguru
  - Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology, Sathyamangalam 638401, India
- Ajin R. Nair
  - Department of Electronics and Communication Engineering, Bannari Amman Institute of Technology, Sathyamangalam 638401, India
|
7
|
Vázquez-Blomquist D, Ramón AC, Rosales M, Pérez GV, Rosales A, Palenzuela D, Perera Y, Perea SE. Gene expression profiling unveils the temporal dynamics of CIGB-300-regulated transcriptome in AML cell lines. BMC Genomics 2023; 24:373. [PMID: 37400761 DOI: 10.1186/s12864-023-09472-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Accepted: 06/20/2023] [Indexed: 07/05/2023]
Abstract
BACKGROUND: Protein kinase CK2 activity is implicated in the pathogenesis of various hematological malignancies, such as Acute Myeloid Leukemia (AML), which remains challenging to treat. This kinase has emerged as an attractive molecular target in therapeutics. The antitumoral peptide CIGB-300 blocks CK2 phospho-acceptor sites on its substrates, but it also binds to the CK2α catalytic subunit. Previous proteomic and phosphoproteomic experiments showed molecular and cellular processes relevant to the peptide's action in diverse AML backgrounds, but earlier transcriptional-level events might also support the CIGB-300 anti-leukemic effect. Here we used a Clariom S HT assay for gene expression profiling to study the molecular events supporting the anti-leukemic effect of the CIGB-300 peptide on HL-60 and OCI-AML3 cell lines.
RESULTS: We found that 183 and 802 genes were significantly modulated in HL-60 cells at 30 min and 3 h of incubation with CIGB-300, respectively (p < 0.01 and |FC| >= 1.5), while 221 and 332 genes were modulated in OCI-AML3 cells. Importantly, functional enrichment analysis showed that genes and transcription factors related to apoptosis, cell cycle, leukocyte differentiation, signaling by cytokines/interleukins, and the NF-kB and TNF signaling pathways were significantly represented in the transcriptomic profiles of AML cells. The influence of CIGB-300 on these biological processes and pathways depends primarily on the cellular background and then on treatment duration. Of note, the impact of the peptide on NF-kB signaling was corroborated by the quantification of selected NF-kB target genes, as well as the measurement of p50 binding activity and soluble TNF-α induction. Quantification of CSF1/M-CSF and CDKN1A/P21 by qPCR supports peptide effects on differentiation and cell cycle.
CONCLUSIONS: We explored for the first time the temporal dynamics of the gene expression profile regulated by CIGB-300 which, along with the antiproliferative mechanism, can stimulate immune responses by increasing immunomodulatory cytokines. We provided fresh molecular clues concerning the antiproliferative effect of CIGB-300 in two relevant AML backgrounds.
Affiliation(s)
- Dania Vázquez-Blomquist
  - Pharmacogenomic Group, Department of System Biology, Biomedical Research Division, Center for Genetic Engineering & Biotechnology (CIGB), 10600, Havana, Cuba
- Ailyn C Ramón
  - Molecular Oncology Group, Department of Pharmaceuticals, Biomedical Research Division, CIGB, 10600, Havana, Cuba
- Mauro Rosales
  - Molecular Oncology Group, Department of Pharmaceuticals, Biomedical Research Division, CIGB, 10600, Havana, Cuba
  - Department of Animal and Human Biology, Faculty of Biology, University of Havana (UH), 10400, Havana, Cuba
- George V Pérez
  - Molecular Oncology Group, Department of Pharmaceuticals, Biomedical Research Division, CIGB, 10600, Havana, Cuba
- Ailenis Rosales
  - Department of Animal and Human Biology, Faculty of Biology, University of Havana (UH), 10400, Havana, Cuba
- Daniel Palenzuela
  - Pharmacogenomic Group, Department of System Biology, Biomedical Research Division, Center for Genetic Engineering & Biotechnology (CIGB), 10600, Havana, Cuba
- Yasser Perera
  - Molecular Oncology Group, Department of Pharmaceuticals, Biomedical Research Division, CIGB, 10600, Havana, Cuba
  - China-Cuba Biotechnology Joint Innovation Center (CCBJIC), Hunan Province, Yongzhou Zhong Gu Biotechnology Co., Ltd, Lengshuitan District, Yongzhou City, 425000, China
- Silvio E Perea
  - Molecular Oncology Group, Department of Pharmaceuticals, Biomedical Research Division, CIGB, 10600, Havana, Cuba
|
8
|
Zhang B, Shi H, Wang H. Machine Learning and AI in Cancer Prognosis, Prediction, and Treatment Selection: A Critical Approach. J Multidiscip Healthc 2023; 16:1779-1791. [PMID: 37398894 PMCID: PMC10312208 DOI: 10.2147/jmdh.s410301] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 03/15/2023] [Accepted: 06/12/2023] [Indexed: 07/04/2023]
Abstract
Cancer is a leading cause of morbidity and mortality worldwide. While progress has been made in the diagnosis, prognosis, and treatment of cancer patients, individualized and data-driven care remains a challenge. Artificial intelligence (AI), which is used to predict and automate many cancers, has emerged as a promising option for improving healthcare accuracy and patient outcomes. AI applications in oncology include risk assessment, early diagnosis, patient prognosis estimation, and treatment selection based on deep knowledge. Machine learning (ML), a subset of AI that enables computers to learn from training data, has been highly effective at predicting various types of cancer, including breast, brain, lung, liver, and prostate cancer. In fact, AI and ML have demonstrated greater accuracy in predicting cancer than clinicians. These technologies also have the potential to improve the diagnosis, prognosis, and quality of life of patients with various illnesses, not just cancer. Therefore, it is important to improve current AI and ML technologies and to develop new programs to benefit patients. This article examines the use of AI and ML algorithms in cancer prediction, including their current applications, limitations, and future prospects.
Affiliation(s)
- Bo Zhang
  - Jinling Institute of Science and Technology, Nanjing City, Jiangsu Province, People’s Republic of China
- Huiping Shi
  - Jinling Institute of Science and Technology, Nanjing City, Jiangsu Province, People’s Republic of China
- Hongtao Wang
  - School of Life Science, Tonghua Normal University, Tonghua City, Jilin Province, People’s Republic of China
|
9
|
Yang Y, Wu Y, Hou M, Luo J, Xie X. Solving Emden–Fowler Equations Using Improved Extreme Learning Machine Algorithm Based on Block Legendre Basis Neural Network. Neural Process Lett 2023. [DOI: 10.1007/s11063-023-11254-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/03/2023]
|
10
|
Gokhale M, Mohanty SK, Ojha A. GeneViT: Gene Vision Transformer with Improved DeepInsight for cancer classification. Comput Biol Med 2023; 155:106643. [PMID: 36803792 DOI: 10.1016/j.compbiomed.2023.106643] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 09/14/2022] [Revised: 01/03/2023] [Accepted: 02/05/2023] [Indexed: 02/09/2023]
Abstract
Analysis of gene expression data is crucial for disease prognosis and diagnosis. Gene expression data has high redundancy and noise, which makes extracting disease information challenging. Over the past decade, several conventional machine learning and deep learning models have been developed for classifying diseases using gene expression. In recent years, vision transformer networks have shown promising performance in many fields due to their powerful attention mechanism, which provides better insight into the data characteristics. However, these network models have not been explored for gene expression analysis. In this paper, a method for classifying cancerous gene expression is presented that uses a vision transformer. The proposed method first performs dimensionality reduction using a stacked autoencoder, followed by an Improved DeepInsight algorithm that converts the data into image format. The data is then fed to the vision transformer to build the classification model. The performance of the proposed classification model is evaluated on ten benchmark datasets with binary or multiple classes. Its performance is also compared with nine existing classification models. The experimental results demonstrate that the proposed model outperforms existing methods. The t-SNE plots demonstrate the distinctive feature learning property of the model.
Affiliation(s)
- Madhuri Gokhale
  - Department of Computer Science & Engineering, Jabalpur Engineering College, Jabalpur, 482001, India
  - Computer Science & Engineering, PDPM Indian Institute of Information Technology, Design and Manufacturing, Jabalpur, 482005, India
- Sraban Kumar Mohanty
  - Computer Science & Engineering, PDPM Indian Institute of Information Technology, Design and Manufacturing, Jabalpur, 482005, India
- Aparajita Ojha
  - Computer Science & Engineering, PDPM Indian Institute of Information Technology, Design and Manufacturing, Jabalpur, 482005, India
|
11
|
Ashraf MT, Hamid I, Nawaz Q, Ali H. Hybrid Approach using Extreme Gradient Boosting (XGBoost) and Evolutionary Algorithm for Cancer Classification. 2023 International Multi-Disciplinary Conference in Emerging Research Trends (IMCERT) 2023. [DOI: 10.1109/imcert57083.2023.10075236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Affiliation(s)
- Isma Hamid
  - National Textile University, Department of Computer Science, Faisalabad, Pakistan
- Qamar Nawaz
  - University of Agriculture, Department of Computer Science, Faisalabad, Pakistan
- Hamid Ali
  - National Textile University, Department of Computer Science, Faisalabad, Pakistan
|
12
|
Ramesh P, Karuppasamy R, Veerappapillai S. Machine learning driven drug repurposing strategy for identification of potential RET inhibitors against non-small cell lung cancer. Med Oncol 2023; 40:56. [PMID: 36542155 PMCID: PMC9769489 DOI: 10.1007/s12032-022-01924-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/16/2022] [Accepted: 12/06/2022] [Indexed: 12/24/2022]
Abstract
Non-small cell lung cancer (NSCLC) remains the leading cause of mortality and morbidity worldwide, accounting for about 85% of total lung cancer cases. The receptor REarranged during Transfection (RET) plays an important role through ligand-independent activation of the kinase domain, resulting in carcinogenesis. Presently, the treatment for RET-driven NSCLC is limited to multiple kinase inhibitors. This situation necessitates the discovery of novel and potent RET-specific inhibitors. Thus, we employed a high-throughput screening strategy to repurpose FDA-approved compounds from DrugBank, comprising 2509 molecules. It is worth noting that the initial screening was accomplished with the aid of an in-house machine learning model built using IC50 values for 2854 compounds obtained from the BindingDB repository. A total of 497 compounds (19%) were predicted as actives by our generated model. Subsequent in silico validation, including molecular docking, MMGBSA, and density functional theory analysis, resulted in the identification of two lead compounds, DB09313 and DB00471. The simulation study highlights the potency of DB00471 (Montelukast) as a potential RET inhibitor among the investigated compounds. Finally, the half-maximal inhibitory activity of montelukast predicted against RET-expressing LC-2/ad cell lines indicated significant anticancer activity. Collective analysis from our study highlights that montelukast could be a promising candidate for the management of RET-specific NSCLC.
Affiliation(s)
- Priyanka Ramesh
  - Department of Biotechnology, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, India (grid.412813.d; ISNI 0000 0001 0687 4946)
- Ramanathan Karuppasamy
  - Department of Biotechnology, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, India (grid.412813.d; ISNI 0000 0001 0687 4946)
- Shanthi Veerappapillai
  - Department of Biotechnology, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, India (grid.412813.d; ISNI 0000 0001 0687 4946)
|
13
|
Gupta S, Gupta MK, Shabaz M, Sharma A. Deep learning techniques for cancer classification using microarray gene expression data. Front Physiol 2022; 13:952709. [PMID: 36246115 PMCID: PMC9563992 DOI: 10.3389/fphys.2022.952709] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2022] [Accepted: 09/01/2022] [Indexed: 11/28/2022]
Abstract
Cancer is one of the top causes of death globally. Recently, microarray gene expression data has been used to aid the effective and early detection of cancer. The use of DNA microarray technology to uncover information from the expression levels of thousands of genes has enormous promise. The DNA microarray technique can determine the levels of thousands of genes simultaneously in a single experiment. The analysis of gene expression is critical in many disciplines of biological study to obtain the necessary information. This study analyses research studies focused on optimizing gene selection for cancer detection using artificial intelligence. One of the most challenging issues is figuring out how to extract meaningful information from massive databases. Deep learning architectures have performed efficiently in numerous sectors and are used to diagnose many other chronic diseases and to assist physicians in making medical decisions. In this study, we have evaluated the results of different optimizers on an RNA sequence dataset. The deep learning algorithm proposed in the study classifies five different forms of cancer: kidney renal clear cell carcinoma (KIRC), breast invasive carcinoma (BRCA), lung adenocarcinoma (LUAD), prostate adenocarcinoma (PRAD), and colon adenocarcinoma (COAD). The performance of different optimizers, namely Stochastic Gradient Descent (SGD), Root Mean Squared Propagation (RMSProp), Adaptive Gradient Optimizer (AdaGrad), and Adaptive Moment Estimation (Adam), was compared. The experimental results gathered on the dataset affirm that AdaGrad and Adam. Also, the performance analysis has been done using different learning rates and decay rates. This study discusses current advancements in deep learning-based gene expression data analysis using optimized feature selection methods.
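The four optimizers named in this abstract differ only in how they turn a gradient into a parameter update. The sketch below applies the textbook update rules to a hypothetical one-dimensional objective, f(x) = (x - 3)^2; it is not the study's network or data, and the learning rate and decay constants are commonly used defaults assumed here for illustration.

```python
import math


def minimize(method, steps=200, lr=0.1, x=0.0):
    """Minimise f(x) = (x - 3)^2 with one of four update rules; return the final x."""
    g2 = 0.0    # running sum (AdaGrad) or average (RMSProp, Adam) of squared gradients
    m = 0.0     # Adam first-moment (mean gradient) estimate
    eps = 1e-8  # guards against division by zero
    for t in range(1, steps + 1):
        g = 2.0 * (x - 3.0)  # gradient of the toy objective
        if method == "sgd":
            x -= lr * g
        elif method == "adagrad":
            g2 += g * g  # accumulates forever, so steps shrink over time
            x -= lr * g / (math.sqrt(g2) + eps)
        elif method == "rmsprop":
            g2 = 0.9 * g2 + 0.1 * g * g  # exponential moving average instead of a sum
            x -= lr * g / (math.sqrt(g2) + eps)
        elif method == "adam":
            m = 0.9 * m + 0.1 * g
            g2 = 0.999 * g2 + 0.001 * g * g
            m_hat = m / (1 - 0.9 ** t)     # bias correction for the first moment
            v_hat = g2 / (1 - 0.999 ** t)  # bias correction for the second moment
            x -= lr * m_hat / (math.sqrt(v_hat) + eps)
        else:
            raise ValueError(f"unknown method: {method}")
    return x


for name in ("sgd", "adagrad", "rmsprop", "adam"):
    print(name, minimize(name))
```

Varying `lr` and the decay constants in this sketch is a small-scale analogue of the learning-rate and decay-rate comparison the abstract mentions.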
Affiliation(s)
- Surbhi Gupta
- Department of Computer Science and Engineering Department, SMVDU, Jammu, India
- Model Institute of Engineering and Technology, Jammu, India
| | - Manoj K. Gupta
- Department of Computer Science and Engineering Department, SMVDU, Jammu, India
| | - Mohammad Shabaz
- Model Institute of Engineering and Technology, Jammu, India
- *Correspondence: Mohammad Shabaz
| | - Ashutosh Sharma
- School of Computer Science, University of Petroleum and Energy Studies, Dehradun, India
| |
|
14
|
Gokhale M, Mohanty SK, Ojha A. A stacked autoencoder based gene selection and cancer classification framework. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2022.103999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
|
15
|
Malakar S, Roy SD, Das S, Sen S, Velásquez JD, Sarkar R. Computer Based Diagnosis of Some Chronic Diseases: A Medical Journey of the Last Two Decades. ARCHIVES OF COMPUTATIONAL METHODS IN ENGINEERING : STATE OF THE ART REVIEWS 2022; 29:5525-5567. [PMID: 35729963 PMCID: PMC9199478 DOI: 10.1007/s11831-022-09776-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/16/2022] [Accepted: 05/22/2022] [Indexed: 06/15/2023]
Abstract
Disease prediction from diagnostic reports and pathological images using artificial intelligence (AI) and machine learning (ML) is one of the fastest emerging applications in recent days. Researchers are striving to achieve near-perfect results using advanced hardware technologies in amalgamation with AI and ML based approaches. As a result, a large number of AI and ML based methods are found in the literature. A systematic survey describing the state-of-the-art disease prediction methods, specifically chronic disease prediction algorithms, will provide a clear idea about the recent models developed in this field. This will also help the researchers to identify the research gaps present there. To this end, this paper looks over the approaches in the literature designed for predicting chronic diseases like Breast Cancer, Lung Cancer, Leukemia, Heart Disease, Diabetes, Chronic Kidney Disease and Liver Disease. The advantages and disadvantages of various techniques are thoroughly explained. This paper also presents a detailed performance comparison of different methods. Finally, it concludes the survey by highlighting some future research directions in this field that can be addressed through the forthcoming research attempts.
Affiliation(s)
- Samir Malakar
- Department of Computer Science, Asutosh College, Kolkata, India
| | - Soumya Deep Roy
- Department of Metallurgical and Material Engineering, Jadavpur University, Kolkata, India
| | - Soham Das
- Department of Metallurgical and Material Engineering, Jadavpur University, Kolkata, India
| | - Swaraj Sen
- Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
| | - Juan D. Velásquez
- Department of Industrial Engineering, University of Chile, Santiago, Chile
- Instituto Sistemas Complejos de Ingeniería (ISCI), Santiago, Chile
| | - Ram Sarkar
- Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
| |
|
16
|
Han F, Zhu S, Ling Q, Han H, Li H, Guo X, Cao J. Gene-CWGAN: a data enhancement method for gene expression profile based on improved CWGAN-GP. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07417-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/18/2022]
|
17
|
Prabhakar SK, Ryu S, Jeong IC, Won DO. A Dual Level Analysis with Evolutionary Computing and Swarm Models for Classification of Leukemia. BIOMED RESEARCH INTERNATIONAL 2022; 2022:2052061. [PMID: 35663047 PMCID: PMC9162867 DOI: 10.1155/2022/2052061] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/01/2021] [Revised: 03/17/2022] [Accepted: 03/28/2022] [Indexed: 11/17/2022]
Abstract
Cancer is one of the major causes of mortality in human beings, and it is essential for doctors to identify and treat a person suffering from it. Leukemia is a group of blood cancers that usually originates in the bone marrow and results in a very high number of abnormal cells. For the diagnosis of cancer, microarray data serves as an important clinical tool and a great aid to the entire medical community. The dimensionality of microarray data is very high, so the selection of suitable genes is an important step for improving data classification. Therefore, for the prediction and diagnosis of cancer, it is essential to select the most informative genes. In this work, Minimum Redundancy Maximum Relevance (MRMR), Signal to Noise Ratio (SNR), Multivariate Error Weight Uncorrelated Shrunken Centroid (EWUSC), and multivariate correlation-based feature selection (CFS) are chosen as initial feature selection techniques. Then, to select the most informative genes, five evolutionary optimization techniques are also incorporated: African Buffalo Optimization (ABO), Artificial Bee Colony Optimization (ABCO), Cockroach Swarm Optimization (CSO), Imperialist Competitive Optimization (ICO), and Social Spider Optimization (SSO). Finally, the optimized values are fed through the classification process; the best results are obtained when multivariate CFS with SSO is classified with a Probabilistic Neural Network (PNN), giving a classification accuracy of 95.70%.
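Of the initial filters named in this abstract, the signal-to-noise ratio is the simplest to write down. The sketch below assumes one common two-class definition, |mean_a - mean_b| / (std_a + std_b) per gene, which may differ from the exact variant used in the cited work. The function names and the list-of-rows data layout are hypothetical.

```python
from statistics import mean, stdev


def snr_scores(expr, labels):
    """Score each gene by a two-class signal-to-noise ratio.

    expr:   list of samples, each a list of expression values (one per gene)
    labels: list of 0/1 class labels, one per sample
    Assumes at least two samples per class so that stdev is defined.
    """
    n_genes = len(expr[0])
    scores = []
    for g in range(n_genes):
        class_a = [row[g] for row, y in zip(expr, labels) if y == 0]
        class_b = [row[g] for row, y in zip(expr, labels) if y == 1]
        spread = stdev(class_a) + stdev(class_b)
        # A gene with zero spread in both classes gets a score of 0 here
        # (an illustrative convention, not part of the definition).
        scores.append(abs(mean(class_a) - mean(class_b)) / spread if spread > 0 else 0.0)
    return scores


def top_genes(expr, labels, k):
    """Return the indices of the k genes with the highest SNR score."""
    scores = snr_scores(expr, labels)
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)[:k]
```

A filter of this kind only ranks genes one at a time. In the pipeline described above, such a ranking is the first stage, and a swarm or evolutionary search then refines the shortlisted genes.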
Affiliation(s)
- Sunil Kumar Prabhakar
- Department of Artificial Intelligence Convergence, Hallym University, Chuncheon, 24252 Gangwon, Republic of Korea
| | - Semin Ryu
- Department of Artificial Intelligence Convergence, Hallym University, Chuncheon, 24252 Gangwon, Republic of Korea
| | - In cheol Jeong
- Department of Artificial Intelligence Convergence, Hallym University, Chuncheon, 24252 Gangwon, Republic of Korea
| | - Dong-Ok Won
- Department of Artificial Intelligence Convergence, Hallym University, Chuncheon, 24252 Gangwon, Republic of Korea
| |
|
18
|
Hilal AM, Malibari AA, Obayya M, Alzahrani JS, Alamgeer M, Mohamed A, Motwakel A, Yaseen I, Hamza MA, Zamani AS. Feature Subset Selection with Optimal Adaptive Neuro-Fuzzy Systems for Bioinformatics Gene Expression Classification. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:1698137. [PMID: 35607459 PMCID: PMC9124108 DOI: 10.1155/2022/1698137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2022] [Revised: 04/20/2022] [Accepted: 04/27/2022] [Indexed: 01/28/2023]
Abstract
Recently, bioinformatics and computational biology-enabled applications such as gene expression analysis, cellular restoration, medical image processing, protein structure examination, and medical data classification have used fuzzy systems to offer effective solutions and decisions. The latest developments in fuzzy systems combined with artificial intelligence techniques make it possible to design effective microarray gene expression classification models. In this context, this study introduces a novel feature subset selection with optimal adaptive neuro-fuzzy inference system (FSS-OANFIS) for gene expression classification. The major aim of the FSS-OANFIS model is to detect and classify gene expression data. To accomplish this, the FSS-OANFIS model designs an improved grey wolf optimizer-based feature selection (IGWO-FS) model to derive an optimal subset of features. In addition, the OANFIS model is employed for gene classification, and the parameters of the ANFIS model are tuned using the coyote optimization algorithm (COA). The application of the IGWO-FS and COA techniques helps to achieve enhanced microarray gene expression classification outcomes. The experimental validation of the FSS-OANFIS model has been performed using the Leukemia, Prostate, DLBCL Stanford, and Colon Cancer datasets. The proposed FSS-OANFIS model has resulted in a maximum classification accuracy of 89.47%.
Affiliation(s)
- Anwer Mustafa Hilal
- Department of Computer and Self Development, Preparatory Year Deanship, Prince Sattam Bin Abdulaziz University, AlKharj, Saudi Arabia
| | - Areej A. Malibari
- Department of Industrial and Systems Engineering, College of Engineering, Princess Nourah Bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
| | - Marwa Obayya
- Department of Biomedical Engineering, College of Engineering, Princess Nourah Bint Abdulrahman University, P.O.Box 84428, Riyadh 11671, Saudi Arabia
| | - Jaber S. Alzahrani
- Department of Industrial Engineering, College of Engineering Alqunfudah, Umm Al-Qura University, Mecca, Saudi Arabia
| | - Mohammad Alamgeer
- Department of Information Systems, College of Science & Art Mahayil, King Khalid University, Abha, Saudi Arabia
| | - Abdullah Mohamed
- Research Centre, Future University, Egypt, New Cairo 11845, Egypt
| | - Abdelwahed Motwakel
- Department of Computer and Self Development, Preparatory Year Deanship, Prince Sattam Bin Abdulaziz University, AlKharj, Saudi Arabia
| | - Ishfaq Yaseen
- Department of Computer and Self Development, Preparatory Year Deanship, Prince Sattam Bin Abdulaziz University, AlKharj, Saudi Arabia
| | - Manar Ahmed Hamza
- Department of Computer and Self Development, Preparatory Year Deanship, Prince Sattam Bin Abdulaziz University, AlKharj, Saudi Arabia
| | - Abu Sarwar Zamani
- Department of Computer and Self Development, Preparatory Year Deanship, Prince Sattam Bin Abdulaziz University, AlKharj, Saudi Arabia
| |
|
19
|
Aziz RM. Application of nature inspired soft computing techniques for gene selection: a novel frame work for classification of cancer. Soft comput 2022. [DOI: 10.1007/s00500-022-07032-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/15/2023]
|
20
|
Gelman A, Furman E, Kalinina N, Malinin S, Furman G, Sheludko V, Sokolovsky V. Computer-Aided Detection of Respiratory Sounds in Bronchial Asthma Patients Based on Machine Learning Method. Sovrem Tekhnologii Med 2022; 14:45-51. [PMID: 37181833 PMCID: PMC10171063 DOI: 10.17691/stm2022.14.5.05] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2022] [Indexed: 05/16/2023] Open
Abstract
The aim of the study is to develop a method for the detection of pathological respiratory sounds caused by bronchial asthma with the aid of machine learning techniques. Materials and Methods: To build and train neural networks, we used the records of respiratory sounds of bronchial asthma patients at different stages of the disease (n=951), aged from several months to 47 years, and of healthy volunteers (n=167). The sounds were recorded during calm breathing at four points: at the oral cavity, above the trachea, on the chest (second intercostal space on the right side), and at a point on the back. Results: The method developed for computer-aided detection of respiratory sounds makes it possible to detect sounds typical of bronchial asthma in 89.4% of cases, with 89.3% sensitivity and 86.0% specificity, regardless of the sex and age of the patients, the stage of the disease, and the point of sound recording.
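The accuracy, sensitivity, and specificity figures quoted in this abstract are all derived from the four cells of a binary confusion matrix. A minimal sketch of that arithmetic follows. The function name is hypothetical, and the counts in the usage note are made up for illustration, not data from the study.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity and specificity from confusion counts.

    tp: diseased cases correctly flagged     fn: diseased cases missed
    tn: healthy cases correctly cleared      fp: healthy cases wrongly flagged
    """
    sensitivity = tp / (tp + fn)          # share of diseased cases detected
    specificity = tn / (tn + fp)          # share of healthy cases cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity
```

For example, with the illustrative counts `diagnostic_metrics(tp=90, fn=10, tn=40, fp=10)`, sensitivity is 0.9 and specificity is 0.8, while accuracy sits between them, weighted by the class sizes. This is why the three numbers need to be read together when classes are imbalanced.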
Affiliation(s)
- A. Gelman
- Laboratory Engineer, Department of Physics; Ben-Gurion University of the Negev, P.O.B. 653, Beer-Sheva, 8410501, Israel
| | - E.G. Furman
- Professor, Corresponding Member of Russian Academy of Sciences, Head of Faculty and Hospital Pediatrics Department; Perm State Medical University named after Academician E.A. Wagner, 26 Petropavlovskaya St., Perm, 614990, Russia
- Corresponding author: Evgeny G. Furman
| | - N.M. Kalinina
- Resident; Perm State Medical University named after Academician E.A. Wagner, 26 Petropavlovskaya St., Perm, 614990, Russia
| | - S.V. Malinin
- Researcher; Perm State Medical University named after Academician E.A. Wagner, 26 Petropavlovskaya St., Perm, 614990, Russia
| | - G.B. Furman
- Professor, Department of Physics; Ben-Gurion University of the Negev, P.O.B. 653, Beer-Sheva, 8410501, Israel
| | - V.S. Sheludko
- Leading Researcher, Central Scientific Research Laboratory; Perm State Medical University named after Academician E.A. Wagner, 26 Petropavlovskaya St., Perm, 614990, Russia
| | - V.L. Sokolovsky
- Professor, Department of Physics; Ben-Gurion University of the Negev, P.O.B. 653, Beer-Sheva, 8410501, Israel
| |
|
21
|
Ramesh P, Veerappapillai S. Prediction of Micronucleus Assay Outcome Using In Vivo Activity Data and Molecular Structure Features. Appl Biochem Biotechnol 2021; 193:4018-4034. [PMID: 34669110 DOI: 10.1007/s12010-021-03720-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/15/2021] [Accepted: 10/08/2021] [Indexed: 11/28/2022]
Abstract
The in vivo micronucleus assay is a widely used genotoxicity test for determining the extent of chromosomal aberrations caused by chemicals in human beings, and it plays a significant role in the drug discovery paradigm. To reduce the uncertainties and expenses of in vivo experiments, we set out to develop novel machine learning-based tools to predict the toxicity of compounds with high precision. A total of 372 compounds with known toxicity information were retrieved from the PubChem Bioassay database and the literature. The fingerprints and descriptors of the compounds were generated using PaDEL and ChemSAR, respectively, for the analysis. The performance of the models was assessed using three tiers of evaluation: fivefold cross-validation, tenfold cross-validation, and validation on an external dataset. Further, structural alerts causing genotoxicity of the compounds were identified using the SARpy method. Of note, the fingerprint-based random forest model built in our analysis demonstrated the highest accuracy, about 0.97, during tenfold cross-validation. In essence, our study highlights that structural alerts such as chlorocyclohexane and trimethylamine are likely to be the leading cause of toxicity in humans. Indeed, we believe that the random forest model generated in this study is appropriate for reducing the number of test animals and should be considered in the future for the good practice of animal welfare.
Affiliation(s)
- Priyanka Ramesh
- Department of Biotechnology, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, India
| | - Shanthi Veerappapillai
- Department of Biotechnology, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, India.
| |
|
22
|
Asad E, Mollah AF. Biomarker Identification From Gene Expression Based on Symmetrical Uncertainty. INTERNATIONAL JOURNAL OF INTELLIGENT INFORMATION TECHNOLOGIES 2021. [DOI: 10.4018/ijiit.289966] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
In this paper, we present an effective information-theoretic feature selection method, Symmetrical Uncertainty, to classify gene expression microarray data and detect biomarkers from it. Here, Information Gain and Symmetrical Uncertainty contribute to ranking the features. Based on the computed values of Symmetrical Uncertainty, features were sorted from the most informative to the least informative. Then, the top features from the sorted list are passed to Random Forest, Logistic Regression and other well-known classifiers with leave-one-out cross-validation to construct the best classification model(s) and accordingly select the most important genes from microarray datasets. The results obtained in terms of classification accuracy, running time, root mean square error and other parameters computed on the Leukemia and Colon cancer datasets demonstrate the effectiveness of the proposed approach. The proposed method is much faster than many other wrapper or ensemble methods.
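Symmetrical Uncertainty is usually defined as SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), where H is Shannon entropy and IG is the information gain (mutual information) between a feature X and the class Y. The sketch below implements that standard definition for discrete values. It assumes the expression values have already been discretized, and the function names and the ranking helper are hypothetical rather than taken from the cited paper.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), ranging from 0 to 1."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant; no information to share
    info_gain = h_x + h_y - entropy(list(zip(x, y)))  # mutual information
    return 2.0 * info_gain / (h_x + h_y)


def rank_features(columns, labels):
    """Return feature indices sorted from most to least informative by SU."""
    scores = [symmetrical_uncertainty(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
```

The normalization by H(X) + H(Y) is what distinguishes SU from raw information gain: it offsets the bias of information gain toward features with many distinct values, which matters when discretized genes end up with different numbers of bins.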
|
23
|
Cancer data classification by quantum-inspired immune clone optimization-based optimal feature selection using gene expression data: deep learning approach. DATA TECHNOLOGIES AND APPLICATIONS 2021. [DOI: 10.1108/dta-05-2020-0109] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/22/2023]
Abstract
Purpose: Gene selection is considered a fundamental process in the bioinformatics field. The existing methodologies pertaining to cancer classification are mostly clinically based, and their diagnostic capability is limited. Nowadays, significant problems of cancer diagnosis are solved by the utilization of gene expression data. Researchers have been introducing many possibilities to diagnose cancer appropriately and effectively. This paper aims to develop cancer data classification using gene expression data. Design/methodology/approach: The proposed classification model involves three main phases: (1) feature extraction, (2) optimal feature selection and (3) classification. Initially, five benchmark gene expression datasets are collected. From the collected gene expression data, feature extraction is performed. To reduce the length of the feature vectors, optimal feature selection is performed, for which a new meta-heuristic algorithm termed the quantum-inspired immune clone optimization algorithm (QICO) is used. Once the relevant features are selected, classification is performed by a deep learning model called a recurrent neural network (RNN). Finally, the experimental analysis reveals that the proposed QICO-based feature selection model outperforms the other heuristic-based feature selection methods and that the optimized RNN outperforms the other machine learning methods. Findings: The proposed QICO-RNN achieves the best outcomes at every learning percentage. At a learning percentage of 85, the accuracy of the proposed QICO-RNN was 3.2% better than RNN, 4.3% better than RF, 3.8% better than NB and 2.1% better than KNN for Dataset 1. For Dataset 2, at a learning percentage of 35, the accuracy of the proposed QICO-RNN was 13.3% better than RNN, 8.9% better than RF and 14.8% better than NB and KNN.
Hence, the developed QICO algorithm performs well in accurately classifying cancer data using gene expression data. Originality/value: This paper introduces a new optimal feature selection model using QICO and QICO-based RNN for effective classification of cancer data using gene expression data. This is the first work that utilizes an optimal feature selection model using QICO and QICO-RNN for effective classification of cancer data using gene expression data.
|
24
|
Tumor Nonimmune-Microenvironment-Related Gene Expression Signature Predicts Brain Metastasis in Lung Adenocarcinoma Patients after Surgery: A Machine Learning Approach Using Gene Expression Profiling. Cancers (Basel) 2021; 13:cancers13174468. [PMID: 34503278 PMCID: PMC8430997 DOI: 10.3390/cancers13174468] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/01/2021] [Revised: 08/30/2021] [Accepted: 09/02/2021] [Indexed: 12/26/2022] Open
Abstract
Simple Summary: It is important to be able to predict brain metastasis in lung adenocarcinoma patients; however, research in this area is still lacking. Much of the previous work on tumor microenvironments in lung adenocarcinoma with brain metastasis concerns the tumor immune microenvironment. The importance of the tumor nonimmune microenvironment (extracellular matrix (ECM), epithelial–mesenchymal transition (EMT) feature, and angiogenesis) has been overlooked with regard to brain metastasis. We evaluated tumor nonimmune-microenvironment-related gene expression signatures that could predict brain metastasis after the surgical resection of lung adenocarcinoma using a machine learning approach. We identified a tumor nonimmune-microenvironment-related 17-gene expression signature, and this signature showed high brain metastasis predictive power in four machine learning classifiers. The immunohistochemical expression of the top three genes of the 17-gene expression signature yielded similar results to NanoString tests. Our tumor nonimmune-microenvironment-related gene expression signatures are important biological markers that can predict brain metastasis and provide patient-specific treatment options. Abstract: Using a machine learning approach with a gene expression profile, we discovered a tumor nonimmune-microenvironment-related gene expression signature, including extracellular matrix (ECM) remodeling, epithelial–mesenchymal transition (EMT), and angiogenesis, that could predict brain metastasis (BM) after the surgical resection of 64 lung adenocarcinomas (LUAD). Gene expression profiling identified a tumor nonimmune-microenvironment-related 17-gene expression signature that significantly correlated with BM. Of the 17 genes, 11 were ECM-remodeling-related genes. 
The 17-gene expression signature showed high BM predictive power in four machine learning classifiers (areas under the receiver operating characteristic curve = 0.845 for naïve Bayes, 0.849 for support vector machine, 0.858 for random forest, and 0.839 for neural network). Subgroup analysis revealed that the BM predictive power of the 17-gene signature was higher in the early-stage LUAD than in the late-stage LUAD. Pathway enrichment analysis showed that the upregulated differentially expressed genes were mainly enriched in the ECM–receptor interaction pathway. The immunohistochemical expression of the top three genes of the 17-gene expression signature yielded similar results to NanoString tests. The tumor nonimmune-microenvironment-related gene expression signatures found in this study are important biological markers that can predict BM and provide patient-specific treatment options.
|
25
|
Mohammed M, Mwambi H, Mboya IB, Elbashir MK, Omolo B. A stacking ensemble deep learning approach to cancer type classification based on TCGA data. Sci Rep 2021; 11:15626. [PMID: 34341396 PMCID: PMC8329290 DOI: 10.1038/s41598-021-95128-x] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 11/25/2020] [Accepted: 07/19/2021] [Indexed: 12/13/2022] Open
Abstract
Cancer tumor classification based on morphological characteristics alone has been shown to have serious limitations. Breast, lung, colorectal, thyroid, and ovarian cancers are the most commonly diagnosed cancers among women. Precise classification of cancers into their types is considered a vital problem for cancer diagnosis and therapy. In this paper, we proposed a stacking ensemble deep learning model based on a one-dimensional convolutional neural network (1D-CNN) to perform a multi-class classification of the five common cancers among women based on RNASeq data. The RNASeq gene expression data was downloaded from the Pan-Cancer Atlas using the GDCquery function of the TCGAbiolinks package in the R software. We used the least absolute shrinkage and selection operator (LASSO) as the feature selection method. We compared the results of the proposed model, with and without LASSO, with the results of the single 1D-CNN and of machine learning methods including support vector machines with radial basis function, linear, and polynomial kernels; artificial neural networks; k-nearest neighbors; and bagging trees. The results show that the proposed model, with and without LASSO, performs better than the other classifiers. Also, the results show that the machine learning methods (SVM-R, SVM-L, SVM-P, ANN, KNN, and bagging trees) perform better with under-sampling than with over-sampling techniques. This is supported by the statistical significance test of accuracy, where the p-values for the differences between SVM-R and SVM-P, SVM-R and ANN, and SVM-R and KNN are found to be p = 0.003, p < 0.001, and p < 0.001, respectively. Also, SVM-L differed significantly from ANN (p = 0.009). Moreover, SVM-P and ANN, and SVM-P and KNN, are found to be significantly different, with p < 0.001 and p < 0.001, respectively. In addition, ANN and bagging trees, and ANN and KNN, were found to be significantly different, with p < 0.001 and p = 0.004, respectively. 
Thus, the proposed model can help in the early detection and diagnosis of cancer in women, and hence aid in designing early treatment strategies to improve survival.
Affiliation(s)
- Mohanad Mohammed
- School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, Private Bag X01, Scottsville, 3209, South Africa.
| | - Henry Mwambi
- School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, Private Bag X01, Scottsville, 3209, South Africa
| | - Innocent B Mboya
- School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, Private Bag X01, Scottsville, 3209, South Africa
- Department of Epidemiology and Biostatistics, Kilimanjaro Christian Medical University College (KCMUCo), P. O. Box 2240, Moshi, Tanzania
| | - Murtada K Elbashir
- College of Computer and Information Sciences, Jouf University, Sakaka, 72441, Saudi Arabia
- Faculty of Mathematical and Computer Sciences, University of Gezira, Wad Madani, 11123, Sudan
| | - Bernard Omolo
- School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, Private Bag X01, Scottsville, 3209, South Africa
- Division of Mathematics and Computer Science, University of South Carolina-Upstate, 800 University Way, Spartanburg, USA
- School of Public Health, Faculty of Health Sciences, University of Witwatersrand, Johannesburg, South Africa
| |
|
26
|
Schaack D, Weigand MA, Uhle F. Comparison of machine-learning methodologies for accurate diagnosis of sepsis using microarray gene expression data. PLoS One 2021; 16:e0251800. [PMID: 33999966 PMCID: PMC8128240 DOI: 10.1371/journal.pone.0251800] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/09/2021] [Accepted: 05/04/2021] [Indexed: 11/27/2022] Open
Abstract
We investigate the feasibility of molecular-level sample classification of sepsis using microarray gene expression data merged by in silico meta-analysis. Publicly available data series were extracted from NCBI Gene Expression Omnibus and EMBL-EBI ArrayExpress to create a comprehensive meta-analysis microarray expression set (meta-expression set). Measurements had to be obtained via microarray-technique from whole blood samples of adult or pediatric patients with sepsis diagnosed based on international consensus definition immediately after admission to the intensive care unit. We aggregate trauma patients, systemic inflammatory response syndrome (SIRS) patients, and healthy controls in a non-septic entity. Differential expression (DE) analysis is compared with machine-learning-based solutions like decision tree (DT), random forest (RF), support vector machine (SVM), and deep-learning neural networks (DNNs). We evaluated classifier training and discrimination performance in 100 independent iterations. To test diagnostic resilience, we gradually degraded expression data in multiple levels. Clustering of expression values based on DE genes results in partial identification of sepsis samples. In contrast, RF, SVM, and DNN provide excellent diagnostic performance measured in terms of accuracy and area under the curve (>0.96 and >0.99, respectively). We prove DNNs as the most resilient methodology, virtually unaffected by targeted removal of DE genes. By surpassing most other published solutions, the presented approach substantially augments current diagnostic capability in intensive care medicine.
Affiliation(s)
- Dominik Schaack
- Department of Anesthesiology, Heidelberg University Hospital, Heidelberg, Germany
| | - Markus A. Weigand
- Department of Anesthesiology, Heidelberg University Hospital, Heidelberg, Germany
| | - Florian Uhle
- Department of Anesthesiology, Heidelberg University Hospital, Heidelberg, Germany
| |
|
27
|
Wang Y, Zhou M, Zou Q, Xu L. Machine learning for phytopathology: from the molecular scale towards the network scale. Brief Bioinform 2021; 22:6204793. [PMID: 33787847 DOI: 10.1093/bib/bbab037] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/15/2020] [Revised: 01/09/2021] [Accepted: 01/26/2021] [Indexed: 01/16/2023] Open
Abstract
With the increasing volume of high-throughput sequencing data from a variety of omics techniques in the field of plant-pathogen interactions, sorting, retrieving, processing and visualizing biological information have become a great challenge. Within the explosion of data, machine learning offers powerful tools to process these complex omics data by various algorithms, such as Bayesian reasoning, support vector machine and random forest. Here, we introduce the basic frameworks of machine learning in dissecting plant-pathogen interactions and discuss the applications and advances of machine learning in plant-pathogen interactions from molecular to network biology, including the prediction of pathogen effectors, plant disease resistance protein monitoring and the discovery of protein-protein networks. The aim of this review is to provide a summary of advances in plant defense and pathogen infection and to indicate the important developments of machine learning in phytopathology.
Affiliation(s)
- Yansu Wang
- Postdoctoral Innovation Practice Base, Shenzhen Polytechnic, China
| | | | - Quan Zou
- University of Electronic Science and Technology of China
| | - Lei Xu
- Shenzhen Polytechnic, China
| |
|
28
|
Deep Learning Feature Extraction Approach for Hematopoietic Cancer Subtype Classification. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2021; 18:ijerph18042197. [PMID: 33672300 PMCID: PMC7926954 DOI: 10.3390/ijerph18042197] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 12/30/2020] [Revised: 02/15/2021] [Accepted: 02/19/2021] [Indexed: 11/30/2022]
Abstract
Hematopoietic cancer is a malignant transformation in immune system cells. Hematopoietic cancer is characterized by the cells that are expressed, so it is usually difficult to distinguish its heterogeneities in the hematopoiesis process. Traditional approaches for cancer subtyping use statistical techniques. Furthermore, because small samples are prone to overfitting, a minor cancer subtype does not provide enough sample material for building a classification model. Therefore, we propose not only to build a classification model for five major subtypes using two kinds of losses, namely reconstruction loss and classification loss, but also to extract suitable features using a deep autoencoder. To address the data imbalance problem, we apply an oversampling algorithm, the synthetic minority oversampling technique (SMOTE). To validate our proposed autoencoder-based feature extraction approach for hematopoietic cancer subtype classification, we compared it with other traditional feature selection algorithms (principal component analysis, non-negative matrix factorization) and classification algorithms with the SMOTE oversampling approach. Additionally, we used the Shapley Additive exPlanations (SHAP) interpretation technique in our model to explain the important gene/protein for hematopoietic cancer subtype classification. Furthermore, we compared five widely used classification algorithms, including logistic regression, random forest, k-nearest neighbor, artificial neural network and support vector machine.
The autoencoder-based feature extraction approaches performed well. The best result came from the support vector machine with SMOTE oversampling, using both focal loss and reconstruction loss as the loss function for the autoencoder (AE) feature selection approach; it produced 97.01% accuracy, 92.60% recall, 99.52% specificity, 93.54% F1-measure, 97.87% G-mean and 95.46% index of balanced accuracy as subtype classification performance measures.
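As a reading aid for the measures listed in this abstract, the following minimal sketch computes accuracy, recall, specificity, F1-measure and G-mean from binary confusion counts. It is not the authors' code: the function name `binary_metrics` and the counts are hypothetical, and the paper's five-subtype setting would presumably need per-class (one-vs-rest) counts averaged across classes, which is an assumption not shown here.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return common classification measures from binary confusion counts."""
    recall = tp / (tp + fn)              # sensitivity, true positive rate
    specificity = tn / (tn + fp)         # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    g_mean = math.sqrt(recall * specificity)  # geometric mean of the two rates
    return {
        "accuracy": accuracy,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "g_mean": g_mean,
    }


# Hypothetical counts, chosen only for illustration (not taken from the paper).
m = binary_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

G-mean rewards a classifier only when both recall and specificity are high, which is why it is often reported alongside accuracy on imbalanced data.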
|
29
|
Liu T, Huang J, Liao T, Pu R, Liu S, Peng Y. A Hybrid Deep Learning Model for Predicting Molecular Subtypes of Human Breast Cancer Using Multimodal Data. Ing Rech Biomed 2021. [DOI: 10.1016/j.irbm.2020.12.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
|
30
|
Ramesh P, Veerappapillai S, Karuppasamy R. Gene expression profiling of corona virus microarray datasets to identify crucial targets in COVID-19 patients. GENE REPORTS 2020; 22:100980. [PMID: 33263093 PMCID: PMC7691848 DOI: 10.1016/j.genrep.2020.100980] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/18/2020] [Revised: 10/03/2020] [Accepted: 11/23/2020] [Indexed: 12/23/2022]
Abstract
The current outbreak of coronavirus disease (COVID-19) has affected millions of people and caused devastating mortality worldwide. Moreover, cytokine storm has become an important cause of the rising mortality. However, efforts to develop drugs, vaccines and treatments have been hindered by a poor understanding of the host's defense mechanism and by the development of cytokine storm against this viral infection. Thus, a deeper understanding of the mechanism behind the immune dysregulation and cytokine storm development might give us clues for the clinical management of severe cases. Hence, we performed differential gene expression analysis together with protein-protein interaction and Gene Ontology (GO) studies on Severe Acute Respiratory Syndrome coronavirus (SARS-CoV) data sets, GSE1739 and GSE33267, to learn more about the host immune response to the pathogenic coronavirus, which could in turn help reduce mortality. A total of 79 differentially-expressed genes (DEGs) were identified in our data set using the filters such as P-value and log2 fold change values of less than 0.05 and 1.5 respectively. Further, network analysis and GO studies showed differential expression of two hub genes, namely ELANE and LTF, which could induce higher levels of pro-inflammatory cytokines in the lungs. We are certain that differential expression of ELANE and LTF results in an excessive inflammatory reaction known as the cytokine storm, ultimately leading to death. Therefore, these key driver genes of cytokine storm appear to be potential therapeutic targets for combating the Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2) infection, ultimately resulting in reduced mortality. Indeed, this predictive view may open new insights for designing an immune intervention for COVID-19 in the near future, resulting in the mitigation of the mortality rate.
Affiliation(s)
- Priyanka Ramesh
- Department of Biotechnology, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, India
- Shanthi Veerappapillai
- Department of Biotechnology, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, India
- Ramanathan Karuppasamy
- Department of Biotechnology, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, Tamil Nadu, India
|
31
|
Gu H, Xu X, Qin P, Wang J. FI-Net: Identification of Cancer Driver Genes by Using Functional Impact Prediction Neural Network. Front Genet 2020; 11:564839. [PMID: 33244318 PMCID: PMC7683798 DOI: 10.3389/fgene.2020.564839] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2020] [Accepted: 09/30/2020] [Indexed: 12/24/2022] Open
Abstract
Identification of driver genes, whose mutations cause the development of tumors, is crucial for the improvement of cancer research and precision medicine. To overcome the problem that the traditional frequency-based methods cannot detect lowly recurrently mutated driver genes, researchers have focused on the functional impact of gene mutations and proposed the function-based methods. However, most of the function-based methods estimate the distribution of the null model through the non-parametric method, which is sensitive to sample size. Besides, such methods could probably lead to underselection or overselection results. In this study, we proposed a method to identify driver genes by using functional impact prediction neural network (FI-net). An artificial neural network as a parametric model was constructed to estimate the functional impact scores for genes, in which multi-omics features were used as the multivariate inputs. Then the estimation of the background distribution and the identification of driver genes were conducted in each cluster obtained by the hierarchical clustering algorithm. We applied FI-net and 22 other state-of-the-art methods to 31 datasets from The Cancer Genome Atlas project. According to the comprehensive evaluation criterion, FI-net was powerful among various datasets and outperformed the other methods in terms of the overlap fraction with Cancer Gene Census and Network of Cancer Genes database, and the consensus in predictions among methods. Furthermore, the results illustrated that FI-net can identify known and potential novel driver genes.
Affiliation(s)
- Hong Gu
- Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, China
- Xiaolu Xu
- Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, China
- Pan Qin
- Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, China
- Jia Wang
- Department of Breast Surgery, Institute of Breast Disease, Second Hospital of Dalian Medical University, Dalian, China
|
32
|
Value-Added Carp Products: Multi-Class Evaluation of Crisp Grass Carp by Machine Learning-Based Analysis of Blood Indexes. Foods 2020; 9:foods9111615. [PMID: 33172118 PMCID: PMC7694760 DOI: 10.3390/foods9111615] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2020] [Revised: 10/31/2020] [Accepted: 11/04/2020] [Indexed: 11/27/2022] Open
Abstract
Crisp grass carp products from China are becoming more prevalent in the worldwide fish market because muscle hardness is the primary desirable characteristic for consumer satisfaction of fish fillet products. Unfortunately, current instrumental methods to evaluate muscle hardness are expensive, time-consuming, and wasteful. This study sought to develop classification models for differentiating the muscle hardness of crisp grass carp on the basis of blood analysis. Out of the total 264 grass carp samples, 12 outliers from crisp grass carp group were removed based on muscle hardness (<9 N), and the remaining 252 samples were used for the analysis of seven blood indexes including hydrogen peroxide (H2O2), glucose 6-phosphate dehydrogenase (G6PD), malondialdehyde (MDA), glutathione (GSH/GSSH), red blood cells (RBC), platelet count (PLT), and lymphocytes (LY). Furthermore, six machine learning models were applied to predict the muscle hardness of grass carp based on the training (152) and testing (100) datasets obtained from the blood analysis: random forest (RF), naïve Bayes (NB), gradient boosting decision tree (GBDT), support vector machine (SVM), partial least squares regression (PLSR), and artificial neural network (ANN). The RF model exhibited the best prediction performance with a classification accuracy of 100%, specificity of 93.08%, and sensitivity of 100% for discriminating crisp grass carp muscle hardness, followed by the NB model (93.75% accuracy, 91.83% specificity, and 94% sensitivity), whereas the ANN model had the lowest prediction performance (85.42% accuracy, 81.05% specificity, and 85% sensitivity). These machine learning methods provided objective, cheap, fast, and reliable classification for in vivo crisp grass carp and also prove useful for muscle quality evaluation of other freshwater fish.
|
33
|
Mboya IB, Mahande MJ, Mohammed M, Obure J, Mwambi HG. Prediction of perinatal death using machine learning models: a birth registry-based cohort study in northern Tanzania. BMJ Open 2020; 10:e040132. [PMID: 33077570 PMCID: PMC7574940 DOI: 10.1136/bmjopen-2020-040132] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 12/12/2022] Open
Abstract
OBJECTIVE We aimed to determine the key predictors of perinatal deaths using machine learning models compared with the logistic regression model. DESIGN A secondary data analysis using the Kilimanjaro Christian Medical Centre (KCMC) Medical Birth Registry cohort from 2000 to 2015. We assessed the discriminative ability of models using the area under the receiver operating characteristics curve (AUC) and the net benefit using decision curve analysis. SETTING The KCMC is a zonal referral hospital located in Moshi Municipality, Kilimanjaro region, Northern Tanzania. The Medical Birth Registry is within the hospital grounds at the Reproductive and Child Health Centre. PARTICIPANTS Singleton deliveries (n=42 319) with complete records from 2000 to 2015. PRIMARY OUTCOME MEASURES Perinatal death (composite of stillbirths and early neonatal deaths). These outcomes were only captured before mothers were discharged from the hospital. RESULTS The proportion of perinatal deaths was 3.7%. There were no statistically significant differences in the predictive performance of four machine learning models except for bagging, which had a significantly lower performance (AUC 0.76, 95% CI 0.74 to 0.79, p=0.006) compared with the logistic regression model (AUC 0.78, 95% CI 0.76 to 0.81). However, in the decision curve analysis, the machine learning models had a higher net benefit (ie, the correct classification of perinatal deaths considering a trade-off between false-negatives and false-positives) over the logistic regression model across a range of threshold probability values. CONCLUSIONS In this cohort, there was no significant difference in the prediction of perinatal deaths between machine learning and logistic regression models, except for bagging. The machine learning models had a higher net benefit, as their predictive ability of perinatal death was considerably superior to that of the logistic regression model. The machine learning models, as demonstrated by our study, can be used to improve the prediction of perinatal deaths and triage for women at risk.
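The net benefit mentioned in this abstract is commonly defined in decision curve analysis as TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The sketch below illustrates that standard formula only; it is not the study's code, and the function name `net_benefit`, the labels and the predicted probabilities are all hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating everyone whose predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)


# Hypothetical outcomes (1 = event) and model risks, for illustration only.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.4, 0.6, 0.1, 0.7, 0.05, 0.3, 0.15, 0.5]

nb_model = net_benefit(y_true, y_prob, threshold=0.2)
nb_treat_all = net_benefit(y_true, [1.0] * len(y_true), threshold=0.2)
print(f"model: {nb_model:.3f}, treat all: {nb_treat_all:.3f}")
```

A decision curve repeats this calculation over a range of thresholds and compares each model against the "treat all" and "treat none" (net benefit 0) strategies.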
Affiliation(s)
- Innocent B Mboya
- School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, KwaZulu-Natal, South Africa
- Department of Epidemiology and Biostatistics, Institute of Public Health, Kilimanjaro Christian Medical University College, Moshi, Tanzania
- Michael J Mahande
- Department of Epidemiology and Biostatistics, Institute of Public Health, Kilimanjaro Christian Medical University College, Moshi, Tanzania
- Mohanad Mohammed
- School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, KwaZulu-Natal, South Africa
- Joseph Obure
- Department of Obstetrics and Gynecology, Kilimanjaro Christian Medical Center, Moshi, Tanzania
- Henry G Mwambi
- School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, KwaZulu-Natal, South Africa
|
34
|
Mallick PK, Mohapatra SK, Chae GS, Mohanty MN. Convergent learning-based model for leukemia classification from gene expression. PERSONAL AND UBIQUITOUS COMPUTING 2020; 27:1103-1110. [PMID: 33100943 PMCID: PMC7567412 DOI: 10.1007/s00779-020-01467-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/28/2020] [Accepted: 09/28/2020] [Indexed: 05/05/2023]
Abstract
Microarray data analysis has been a major and challenging field of research in recent years. Machine learning-based automated classification of gene data is essential for diagnosing gene-related malfunctions and diseases. As the size of the data is very large, it is essential to design a suitable classifier that can process a huge amount of data. Deep learning is one of the advanced machine learning techniques for mitigating these types of problems. Owing to its larger number of hidden layers, it can easily handle large amounts of data. We present a classification method to understand the convergence of training a deep neural network (DNN). The assumptions are that the inputs do not degenerate, that the network is over-parameterized, and that the number of hidden neurons is sufficiently large. In this work, we use a DNN for classifying gene expression data. The dataset used in the work contains the bone marrow expressions of 72 leukemia patients. A five-layer DNN classifier is designed for classifying acute lymphocyte (ALL) and acute myelocytic (AML) samples. The network is trained with 80% of the data, and the remaining 20% is used for validation. The proposed DNN classifier provides a satisfactory result compared with other classifiers. The two types of leukemia are classified with 98.2% accuracy, 96.59% sensitivity, and 97.9% specificity. Such computer-aided analyses of genes can also be helpful to genetics and virology researchers in the future.
Affiliation(s)
- Pradeep Kumar Mallick
- School of Computer Engineering, KIIT (Deemed to be University), Bhubaneswar, Odisha India
- Saumendra Kumar Mohapatra
- Department of Computer Science and Engineering, ITER, Siksha ‘O’ Anusandhan (Deemed to be University), Bhubaneswar, Odisha India
- Gyoo-Soo Chae
- Division of Information and Communication Engineering, Baekseok University, Cheonan, 330-704 South Korea
- Mihir Narayan Mohanty
- Department of Electronics and Communication Engineering, ITER, Siksha ‘O’ Anusandhan (Deemed to be University), Bhubaneswar, Odisha India
|
35
|
An Amalgamated Approach to Bilevel Feature Selection Techniques Utilizing Soft Computing Methods for Classifying Colon Cancer. BIOMED RESEARCH INTERNATIONAL 2020; 2020:8427574. [PMID: 33102596 PMCID: PMC7578727 DOI: 10.1155/2020/8427574] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/14/2020] [Revised: 09/17/2020] [Accepted: 09/22/2020] [Indexed: 12/20/2022]
Abstract
Colon cancer is one of the deadliest diseases affecting the large intestine. Older adults are typically affected by colon cancer, though it can happen at any age. It generally starts as a small benign growth of cells that forms on the inside of the colon and later develops into cancer. Colon cancer is caused by the propagation of somatic alterations that affect gene expression. DNA microarray technology provides a standardized format for assessing the expression levels of thousands of genes. Tumors of various anatomical regions can be distinguished by the patterns of gene expression in microarray data. As microarray data are too large to process because of the curse of dimensionality, an amalgamated approach utilizing bilevel feature selection techniques is proposed in this paper. In the first level, the genes or features are dimensionally reduced with the help of the Multivariate Minimum Redundancy–Maximum Relevance (MRMR) technique. Then, in the second level, six optimization techniques are utilized for selecting the best genes or features before proceeding to the classification process. The optimization techniques considered in this work are Invasive Weed Optimization (IWO), Teaching Learning-Based Optimization (TLBO), League Championship Optimization (LCO), Beetle Antennae Search Optimization (BASO), Crow Search Optimization (CSO), and Fruit Fly Optimization (FFO). Finally, the selected features are classified with five suitable classifiers; the best result, a classification accuracy of 99.16%, is obtained when IWO is utilized with MRMR and classification is performed with Quadratic Discriminant Analysis (QDA).
|
36
|
Khan F, Khan M, Iqbal N, Khan S, Muhammad Khan D, Khan A, Wei DQ. Prediction of Recombination Spots Using Novel Hybrid Feature Extraction Method via Deep Learning Approach. Front Genet 2020; 11:539227. [PMID: 33093842 PMCID: PMC7527634 DOI: 10.3389/fgene.2020.539227] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2020] [Accepted: 08/13/2020] [Indexed: 01/20/2023] Open
Abstract
Meiotic recombination is the driving force of evolutionary development and an important source of genetic variation. Meiotic recombination does not take place randomly in a chromosome but occurs in certain regions of the chromosome. Regions of a chromosome with a higher rate of meiotic recombination events are considered hotspots, and regions where the frequency of recombination events is lower are called coldspots. Prediction of meiotic recombination spots provides useful information about the basic functionality of inheritance and genome diversity. This study proposes an intelligent computational predictor called iRSpots-DNN for the identification of recombination spots. The proposed predictor is based on a novel feature extraction method and an optimized deep neural network (DNN). The DNN was employed as a classification engine, whereas the novel feature extraction method was developed to extract meaningful features for the identification of hotspots and coldspots across the yeast genome. Unlike previous algorithms, the proposed feature extraction avoids bias among different selected features and preserves the sequence discriminant properties along with the sequence-structure information simultaneously. This study also considered other effective classifiers, namely support vector machine (SVM), K-nearest neighbor (KNN), and random forest (RF), to predict recombination spots. Experimental results on a benchmark dataset with 10-fold cross-validation showed that iRSpots-DNN achieved the highest accuracy, i.e., 95.81%. Additionally, the performance of the proposed iRSpots-DNN is significantly better than that of the existing predictors on a benchmark dataset. The relevant benchmark dataset and source code are freely available at: https://github.com/Fatima-Khan12/iRspot_DNN/tree/master/iRspot_DNN.
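For readers unfamiliar with the 10-fold cross-validation mentioned in this abstract, the sketch below shows one plain way to partition sample indices into ten folds so that every sample is tested exactly once. It is an illustration only and is not taken from the linked repository; the function name `k_fold_indices`, the seed and the sample count of 25 are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]  # k near-equal, disjoint folds
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


# Hypothetical dataset size, chosen only for illustration.
splits = list(k_fold_indices(25, k=10))
for fold_number, (train, test) in enumerate(splits, start=1):
    print(f"fold {fold_number}: {len(train)} train, {len(test)} test")
```

The reported accuracy in such a protocol is typically the average over the ten held-out folds; whether the study also stratified folds by class is not stated in the abstract.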
Affiliation(s)
- Fatima Khan
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Mukhtaj Khan
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Nadeem Iqbal
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Salman Khan
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Dost Muhammad Khan
- Department of Statistics, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Abbas Khan
- Department of Bioinformatics and Biological Statistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Dong-Qing Wei
- Department of Bioinformatics and Biological Statistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint Laboratory of International Cooperation in Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Ministry of Education, Shanghai, China
- Peng Cheng Laboratory, Shenzhen, China
|
37
|
Nguyen TTH, Nguyen PV, Tran QV, Vo NX, Vo TQ. Cancer classification from microarray data for genomic disorder research using optimal discriminant independent component analysis and kernel extreme learning machine. INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN BIOMEDICAL ENGINEERING 2020; 36:e3372. [PMID: 32453470 DOI: 10.1002/cnm.3372] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/10/2020] [Revised: 05/08/2020] [Accepted: 05/13/2020] [Indexed: 06/11/2023]
Abstract
One of the challenging tasks in the medical field is the investigation of genomic disorders and their classification from microarray datasets. Microarray dataset reorganization and classification are complex and expensive in the biomedical research area because of the large number of features in the microarray dataset. In this paper, we construct a hybrid feature selection method combining the t test, Fisher ratio, and Bayesian logistic regression to select genes and reduce the time cost. The top-ranked features are then selected via the best hybrid rank method. Thereafter, the features are extracted using the modified firefly optimization-based discriminant independent component analysis (MF-DICA). In particular, the modified firefly optimization algorithm is capable of improving the search efficiency of DICA. From the high-dimensional microarray dataset, MF-DICA is used to obtain the best features within the entire search space. The kernel extreme learning machine classifies the gene features according to the most relevant class. Experimentally, six datasets, namely Leukemia, Diffuse Large B-cell Lymphoma, Lung cancer, Breast cancer, Prostate tumor, and Colon, are chosen to evaluate the performance of the proposed approaches. Finally, the experimental data demonstrate that the proposed method is well suited to classifying microarray data.
Affiliation(s)
- Tram Thi Huyen Nguyen
- Department of Pharmacy, Ear - Nose - Throat Hospital in Ho Chi Minh city, Ho Chi Minh City, Vietnam
- Pol Van Nguyen
- Department of Economic and Administrative Pharmacy, Faculty of Pharmacy, Pham Ngoc Thach University of Medicine, Ho Chi Minh City, Vietnam
- Quang Vinh Tran
- Department of Economic and Administrative Pharmacy, Faculty of Pharmacy, Pham Ngoc Thach University of Medicine, Ho Chi Minh City, Vietnam
- Nam Xuan Vo
- Department of Economic and Administrative Pharmacy, Faculty of Pharmacy, Ton Duc Thang University, Ho Chi Minh City, Vietnam
- Trung Quang Vo
- Department of Economic and Administrative Pharmacy, Faculty of Pharmacy, Pham Ngoc Thach University of Medicine, Ho Chi Minh City, Vietnam
|
38
|
Wilentzik Müller R, Gat-Viks I. Exploring Neural Networks and Related Visualization Techniques in Gene Expression Data. Front Genet 2020; 11:402. [PMID: 32499810 PMCID: PMC7243731 DOI: 10.3389/fgene.2020.00402] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2019] [Accepted: 03/30/2020] [Indexed: 12/04/2022] Open
Abstract
Over the past decade, neural networks have become one of the cutting-edge methods in various research fields, excelling particularly in complex classification problems. In this paper, we make two main contributions: first, we conduct a methodological study of neural network modeling for classifying biological traits based on structured gene expression data. Then, we suggest an innovative approach for utilizing deep learning visualization techniques in order to reveal the specific genes important for the correct classification of each trait within the trained models. Our data suggest that this approach has great potential for becoming a standard feature importance tool used in complex medical research problems, and that it can further be generalized to various structured data classification problems outside the biological domain.
Affiliation(s)
- Roni Wilentzik Müller
- School of Molecular Cell Biology & Biotechnology, Tel Aviv University, Tel Aviv, Israel
- Irit Gat-Viks
- School of Molecular Cell Biology & Biotechnology, Tel Aviv University, Tel Aviv, Israel
|
39
|
A new incomplete pattern belief classification method with multiple estimations based on KNN. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2020.106175] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
|
40
|
Menaga D, Revathi S. AN EMPIRICAL STUDY OF CANCER CLASSIFICATION TECHNIQUES BASED ON THE NEURAL NETWORKS. BIOMEDICAL ENGINEERING: APPLICATIONS, BASIS AND COMMUNICATIONS 2020. [DOI: 10.4015/s1016237220500131] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/09/2022]
Abstract
Cancer is one of the most common dreadful diseases prevailing worldwide, and patients with cancer are rescued only when the cancer is detected at a very early stage. Early detection is essential because, by the fourth stage, the chance of survival is limited. The symptoms of cancers are rigorous, and therefore, all the symptoms should be studied properly before the diagnosis. Thus, an automatic prediction system is necessary for classifying the tumor, i.e. malignant or benign tumor. Over the past few years, work on cancer classification has increased rapidly, but there is no general technique to find novel cancer classes (class discovery) or to assign tumors to known classes. Accordingly, this survey analyzes distinct cancer classification techniques. This review article provides a detailed review of 50 research papers presenting cancer classification techniques, such as deep learning-based techniques, neural network-based techniques, and hybrid techniques. Moreover, an elaborate analysis and discussion are provided based on the year of publication, utilized datasets, accuracy range, evaluation metrics, implementation tool, and adopted classification methods. Finally, the research gaps and issues of various cancer classification schemes are presented to guide researchers towards future work.
Affiliation(s)
- D. Menaga
- B.S. Abdur Rahman Crescent Institute of Science and Technology, Seethakathi Estate G.S.T Main Road Vandalur, Chennai, Tamil Nadu 600048, India
- S. Revathi
- B.S. Abdur Rahman Crescent Institute of Science and Technology, Seethakathi Estate G.S.T Main Road Vandalur, Chennai, Tamil Nadu 600048, India
|
41
|
Abstract
Leukemia is a fatal disease that threatens the lives of many patients. Early detection can effectively improve its rate of remission. This paper proposes two automated classification models based on blood microscopic images to detect leukemia by employing transfer learning, rather than traditional approaches that have several disadvantages. In the first model, blood microscopic images are pre-processed; then, features are extracted by a pre-trained deep convolutional neural network named AlexNet, and classification is performed by several well-known classifiers. In the second model, after pre-processing the images, AlexNet is fine-tuned for both feature extraction and classification. Experiments conducted on a dataset of 2820 images confirmed that the second model performs better than the first, achieving 100% classification accuracy.
|
42
|
Han X, Li D, Liu P, Wang L. Feature selection by recursive binary gravitational search algorithm optimization for cancer classification. Soft comput 2020. [DOI: 10.1007/s00500-019-04203-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]
|
43
|
Predicting bipolar disorder and schizophrenia based on non-overlapping genetic phenotypes using deep neural network. EVOLUTIONARY INTELLIGENCE 2020. [DOI: 10.1007/s12065-019-00346-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]
|
44
|
A new optimal gene selection approach for cancer classification using enhanced Jaya-based forest optimization algorithm. Neural Comput Appl 2019. [DOI: 10.1007/s00521-019-04355-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]
|
45
|
Microarray Filtering-Based Fuzzy C-Means Clustering and Classification in Genomic Signal Processing. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING 2019. [DOI: 10.1007/s13369-019-03945-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/04/2023]
|
46
|
Convolutional Neural Networks Approach for Solar Reconstruction in SCAO Configurations. SENSORS 2019; 19:s19102233. [PMID: 31091820 PMCID: PMC6567355 DOI: 10.3390/s19102233] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/23/2019] [Revised: 05/06/2019] [Accepted: 05/08/2019] [Indexed: 12/03/2022]
Abstract
Correcting atmospheric turbulence effects in light with Adaptive Optics is necessary, since turbulence produces aberrations in the wavefront of astronomical objects observed with telescopes from Earth. These corrections are classically performed with reconstruction algorithms; among them, neural networks have shown good results. In solar observation, the use of Adaptive Optics differs from nocturnal operations, which makes correcting image aberrations a challenge. In this work, a convolutional approach is given to address this issue, considering SCAO configurations. A reconstruction algorithm, “Shack-Hartmann reconstruction with deep learning on solar–prototype” (proto-HELIOS), is presented to correct fixed solar images, achieving an average precision of 85.39% in the reconstruction. Additionally, the results encourage further work with these techniques to achieve a reconstruction technique for all regions of the sun.
|
47
|
Yang Y, Hou M, Sun H, Zhang T, Weng F, Luo J. Neural network algorithm based on Legendre improved extreme learning machine for solving elliptic partial differential equations. Soft comput 2019. [DOI: 10.1007/s00500-019-03944-1] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]
|
48
|
A survey of neural network-based cancer prediction models from microarray data. Artif Intell Med 2019; 97:204-214. [PMID: 30797633 DOI: 10.1016/j.artmed.2019.01.006] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Received: 10/01/2017] [Revised: 10/22/2018] [Accepted: 01/27/2019] [Indexed: 12/17/2022]
Abstract
Neural networks are powerful tools used widely for building cancer prediction models from microarray data. We review the most recently proposed models to highlight the roles of neural networks in predicting cancer from gene expression data. We identified articles published between 2013 and 2018 in scientific databases using keywords such as cancer classification, cancer analysis, cancer prediction, cancer clustering and microarray data. Analyzing the studies reveals that neural network methods have been used either for filtering (data engineering) the gene expressions in a step prior to prediction; for predicting the existence of cancer, the cancer type or the survivability risk; or for clustering unlabeled samples. This paper also discusses some practical issues that can be considered when building a neural network-based cancer prediction model. Results indicate that the functionality of the neural network determines its general architecture. However, the decision on the number of hidden layers, neurons, hyperparameters and learning algorithm is made using trial-and-error techniques.
|
49
|
Banjar H, Adelson D, Brown F, Chaudhri N. Intelligent Techniques Using Molecular Data Analysis in Leukaemia: An Opportunity for Personalized Medicine Support System. BIOMED RESEARCH INTERNATIONAL 2017; 2017:3587309. [PMID: 28812013 PMCID: PMC5547708 DOI: 10.1155/2017/3587309] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 04/05/2017] [Revised: 06/12/2017] [Accepted: 06/15/2017] [Indexed: 12/05/2022]
Abstract
The use of intelligent techniques in medicine has brought a ray of hope in terms of treating leukaemia patients. Personalized treatment uses a patient's genetic profile to select a mode of treatment. This process makes use of molecular technology and machine learning to determine the most suitable approach to treating a leukaemia patient. Until now, no reviews have been published from a computational perspective concerning the development of intelligent techniques for personalized medicine in leukaemia patients using molecular data analysis. This review studies the published empirical research on personalized medicine in leukaemia and synthesizes findings across studies related to intelligent techniques in leukaemia, with specific attention to particular categories of these studies, to help identify opportunities for further research into personalized medicine support systems in chronic myeloid leukaemia. A systematic search was carried out to identify studies using intelligent techniques in leukaemia and to categorize these studies based on leukaemia type and on the task, data source, and purpose of the studies. Most studies used molecular data analysis for personalized medicine, but future advancement for leukaemia patients requires molecular models that use advanced machine-learning methods to automate decision-making in treatment management and to deliver supportive medical information to the patient in clinical practice.
Affiliation(s)
- Haneen Banjar
- School of Computer Science, University of Adelaide, Adelaide, SA, Australia
- Department of Computer Science, King Abdulaziz University, Jeddah, Saudi Arabia
- David Adelson
- School of Molecular and Biomedical Science, University of Adelaide, Adelaide, SA, Australia
- Fred Brown
- School of Computer Science, University of Adelaide, Adelaide, SA, Australia
- Naeem Chaudhri
- Oncology Centre, Section of Hematology, HSCT, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
|