1
Wang J, Zhang Z, Wang Y. Utilizing Feature Selection Techniques for AI-Driven Tumor Subtype Classification: Enhancing Precision in Cancer Diagnostics. Biomolecules 2025; 15:81. [PMID: 39858475 PMCID: PMC11763904 DOI: 10.3390/biom15010081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2024] [Revised: 01/02/2025] [Accepted: 01/07/2025] [Indexed: 01/27/2025]
Abstract
Cancer's heterogeneity presents significant challenges in accurate diagnosis and effective treatment, including the complexity of identifying tumor subtypes and their diverse biological behaviors. This review examines how feature selection techniques address these challenges by improving the interpretability and performance of machine learning (ML) models in high-dimensional datasets. Feature selection methods (such as filter, wrapper, and embedded techniques) play a critical role in enhancing the precision of cancer diagnostics by identifying relevant biomarkers. The integration of multi-omics data and ML algorithms facilitates a more comprehensive understanding of tumor heterogeneity, advancing both diagnostics and personalized therapies. However, challenges such as ensuring data quality, mitigating overfitting, and addressing scalability remain critical limitations of these methods. Artificial intelligence (AI)-powered feature selection offers promising solutions to these issues by automating and refining the feature extraction process. This review highlights the transformative potential of these approaches while emphasizing future directions, including the incorporation of deep learning (DL) models and integrative multi-omics strategies for more robust and reproducible findings.
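To make the filter family mentioned in this abstract concrete, the following is a minimal sketch of a filter-style ranking for a two-class problem. It is not taken from the review: the function name `filter_rank`, the separation score, and the toy data are all illustrative assumptions.

```python
from statistics import mean, pstdev

def filter_rank(X, y, k):
    """Rank features by a simple two-class separation score and keep the top k.

    X: list of samples (each a list of floats); y: list of 0/1 labels.
    The score |mean_1 - mean_0| / (std_0 + std_1) is an illustrative choice,
    not a method prescribed by the review.
    """
    scores = []
    for j in range(len(X[0])):
        class0 = [row[j] for row, label in zip(X, y) if label == 0]
        class1 = [row[j] for row, label in zip(X, y) if label == 1]
        spread = pstdev(class0) + pstdev(class1) + 1e-9  # avoid division by zero
        scores.append((abs(mean(class1) - mean(class0)) / spread, j))
    scores.sort(key=lambda item: -item[0])  # highest score first
    return [j for _, j in scores[:k]]

# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0, 0.2], [1.1, 4.0, 0.1], [3.0, 5.5, 0.3], [3.2, 4.5, 0.2]]
y = [0, 0, 1, 1]
top_features = filter_rank(X, y, k=2)
```

A filter of this kind scores each feature independently of any classifier, which is what distinguishes it from wrapper and embedded approaches.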
Affiliation(s)
- Jihan Wang
- Yan’an Medical College of Yan’an University, Yan’an 716000, China
- Zhengxiang Zhang
- Yan’an Medical College of Yan’an University, Yan’an 716000, China
- Yangyang Wang
- School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710129, China
2
Khan Z, Ali A, Aldahmani S. Feature selection via robust weighted score for high dimensional binary class-imbalanced gene expression data. Heliyon 2024; 10:e38547. [PMID: 39398002 PMCID: PMC11471177 DOI: 10.1016/j.heliyon.2024.e38547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2024] [Revised: 09/24/2024] [Accepted: 09/25/2024] [Indexed: 10/15/2024]
Abstract
In this paper, a robust weighted score for unbalanced data (ROWSU) is proposed for selecting the most discriminative features for high-dimensional gene expression binary classification with a class-imbalance problem. The method addresses one of the most challenging problems of highly skewed class distributions in gene expression datasets that adversely affect the performance of classification algorithms. First, the training dataset is balanced by synthetically generating data points from minority class observations. Second, a minimum subset of genes is selected using a greedy search approach. Third, a novel weighted robust score, where the weights are computed by support vectors, is introduced to obtain a refined set of genes. The highest scoring genes based on this approach are combined with the minimum subset of genes selected by the greedy search approach to form the final set of genes. The novel method ensures the selection of the most discriminative genes, even in the presence of skewed class distribution, thereby improving the performance of the classifiers. The performance of the proposed ROWSU method is evaluated on 7 gene expression datasets. Classification accuracy, sensitivity and F1-score are used as performance metrics to compare the proposed ROWSU algorithm with several other state-of-the-art methods. Boxplots and stability plots are also constructed for a better understanding of the results. The results show that the proposed method outperforms the existing feature selection procedures based on classification performance from k nearest neighbors (kNN) and random forest (RF) classifiers.
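The first step described above, balancing the training set with synthetic minority points, can be sketched as follows. The abstract does not say how ROWSU generates the points, so the interpolation between two randomly chosen minority observations used here is an assumption, as are the function name `oversample_minority` and the toy data.

```python
import random

def oversample_minority(X, y, minority_label, seed=0):
    """Add synthetic minority samples until both classes have equal size.

    Each synthetic point lies on the segment between two randomly chosen
    minority observations (an assumed scheme, not necessarily the one
    used by ROWSU). Assumes a binary problem with a non-empty minority.
    """
    rng = random.Random(seed)
    minority = [row for row, label in zip(X, y) if label == minority_label]
    n_needed = (len(y) - len(minority)) - len(minority)
    X_bal, y_bal = [list(row) for row in X], list(y)
    for _ in range(max(0, n_needed)):
        a, b = rng.choice(minority), rng.choice(minority)
        t = rng.random()  # interpolation weight in [0, 1)
        X_bal.append([u + t * (v - u) for u, v in zip(a, b)])
        y_bal.append(minority_label)
    return X_bal, y_bal

# Hypothetical data: four majority samples (label 0), two minority (label 1).
X = [[0.0, 0.0], [1.0, 1.0], [0.2, 0.1], [0.9, 0.8], [5.0, 5.0], [6.0, 7.0]]
y = [0, 0, 0, 0, 1, 1]
X_bal, y_bal = oversample_minority(X, y, minority_label=1)
```

Balancing only the training split, as the abstract specifies, keeps synthetic points out of the data used to evaluate the classifier.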
Affiliation(s)
- Zardad Khan
- Department of Statistics and Business Analytics, United Arab Emirates University, Al Ain, United Arab Emirates
- Amjad Ali
- Department of Statistics and Business Analytics, United Arab Emirates University, Al Ain, United Arab Emirates
- Saeed Aldahmani
- Department of Statistics and Business Analytics, United Arab Emirates University, Al Ain, United Arab Emirates
3
Li J, Xiang S, Song X. Screening Nonlinear miRNA Features of Breast Cancer by Using Ensemble Regularized Polynomial Logistic Regression. J Comput Biol 2024; 31:670-690. [PMID: 39017171 DOI: 10.1089/cmb.2023.0289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/18/2024]
Abstract
Differentiating breast cancer subtypes based on miRNA data helps doctors provide more personalized treatment plans for patients. This paper explored the interaction between miRNA pairs and developed a novel ensemble regularized polynomial logistic regression method for screening nonlinear features of breast cancer. Three different types of second-order polynomial logistic regression with elastic net penalty (SOPLR-EN), in which each type contains 10 identical models, were integrated to determine the most suitable sample set for feature screening by using a bootstrap sampling strategy. A single feature and 39 nonlinear features were obtained by screening features that appeared at least 15 times in 30 integrations and were involved in the classification of at least 4 subtypes. The second-order polynomial logistic regression with ridge penalty (SOPLR-R) built on the screened feature set achieved 82.30% classification accuracy for distinguishing breast cancer subtypes, surpassing the performance of six other methods. Further, 11 nonlinear miRNA biomarkers were identified, and their significant relevance to breast cancer was illustrated through six types of biological analysis.
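Two ideas in this abstract lend themselves to a short sketch: expanding a sample into second-order (pairwise interaction) terms, and keeping only features that are selected often enough across repeated model fits. The code below is an illustration under assumptions: the helper names are hypothetical, squared terms are omitted, and the per-run selections are supplied by hand rather than produced by a penalized logistic regression.

```python
from collections import Counter
from itertools import combinations

def second_order_terms(x):
    """Map a sample to its linear terms plus all pairwise products x_i * x_j.

    Keys are index tuples: (i,) for a single feature, (i, j) for a pair.
    """
    terms = {(i,): value for i, value in enumerate(x)}
    for i, j in combinations(range(len(x)), 2):
        terms[(i, j)] = x[i] * x[j]
    return terms

def screen_by_frequency(selections, min_count):
    """Keep features chosen in at least `min_count` of the repeated runs."""
    counts = Counter(f for chosen in selections for f in set(chosen))
    return sorted(f for f, c in counts.items() if c >= min_count)

# Hypothetical example: three runs, keep features selected at least twice.
expanded = second_order_terms([2.0, 3.0, 4.0])
runs = [[(0,), (0, 1)], [(0, 1)], [(0, 1), (2,)]]
stable = screen_by_frequency(runs, min_count=2)
```

In the paper's setting the corresponding threshold is 15 appearances out of 30 integrations.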
Affiliation(s)
- Juntao Li
- College of Mathematics and Information Science, Henan Normal University, Xinxiang, China
- Henan Engineering Laboratory for Big Data Statistical Analysis and Optimal Control, Xinxiang, China
- Shan Xiang
- College of Mathematics and Information Science, Henan Normal University, Xinxiang, China
- Henan Engineering Laboratory for Big Data Statistical Analysis and Optimal Control, Xinxiang, China
- Xuekun Song
- College of Information Technology, Henan University of Chinese Medicine, Zhengzhou, China
4
Tabrizi-Nezhadi P, MotieGhader H, Maleki M, Sahin S, Nematzadeh S, Torkamanian-Afshar M. Application of Protein-Protein Interaction Network Analysis in Order to Identify Cervical Cancer miRNA and mRNA Biomarkers. ScientificWorldJournal 2023; 2023:6626279. [PMID: 37746664 PMCID: PMC10513823 DOI: 10.1155/2023/6626279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Revised: 08/28/2023] [Accepted: 09/04/2023] [Indexed: 09/26/2023]
Abstract
Cervical cancer (CC) is one of the world's most common and severe cancers. This cancer includes two histological types: squamous cell carcinoma (SCC) and adenocarcinoma (ADC). The current study aims at identifying novel potential candidate mRNA and miRNA biomarkers for SCC based on a protein-protein interaction (PPI) and miRNA-mRNA network analysis. The current project utilized a transcriptome profile for normal and SCC samples. First, the PPI network was constructed for the 1335 DEGs, and then, a significant gene module was extracted from the PPI network. Next, a list of miRNAs targeting the module's genes was collected from the experimentally validated databases, and a miRNA-mRNA regulatory network was formed. After network analysis, four driver genes (MCM2, MCM10, POLA1, and TONSL) were selected from the module's genes and introduced as potential candidate biomarkers for SCC. In addition, two hub miRNAs, miR-193b-3p and miR-615-3p, were selected from the miRNA-mRNA regulatory network and reported as possible candidate biomarkers. In summary, six RNA-based biomarkers, comprising four genes (MCM2, MCM10, POLA1, and TONSL) and two miRNAs (miR-193b-3p and miR-615-3p), are proposed as potential candidate biomarkers for CC.
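Hub selection in an interaction network can be illustrated with a simple degree ranking. The abstract does not state which network measure the authors used to pick driver genes and hub miRNAs, so degree is an assumption here, and the node names and edges are hypothetical placeholders rather than real interactions.

```python
from collections import defaultdict

def degree_ranking(edges):
    """Rank nodes of an undirected network by number of distinct neighbors.

    Ties are broken alphabetically so the result is deterministic.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbors[u].add(v)
            neighbors[v].add(u)
    return sorted(neighbors, key=lambda node: (-len(neighbors[node]), node))

# Hypothetical network: node "A" interacts with three others and ranks first.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
ranking = degree_ranking(edges)
```

The top-ranked nodes from such a ranking would be the hub candidates.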
Affiliation(s)
- Habib MotieGhader
- Department of Biology, Tabriz Branch, Islamic Azad University, Tabriz, Iran
- Department of Health Ecosystem, Medical Faculty, Nisantasi University, Istanbul, Turkey
- Masoud Maleki
- Department of Biology, Tabriz Branch, Islamic Azad University, Tabriz, Iran
- Soner Sahin
- Department of Health Ecosystem, Medical Faculty, Nisantasi University, Istanbul, Turkey
- Sajjad Nematzadeh
- Software Engineering Department, Engineering Faculty, Topkapi University, Istanbul, Turkey
- Mahsa Torkamanian-Afshar
- Department of Computer Engineering, Faculty of Engineering and Architecture, Nisantasi University, Istanbul, Turkey
5
Mohamed TIA, Ezugwu AE, Fonou-Dombeu JV, Ikotun AM, Mohammed M. A bio-inspired convolution neural network architecture for automatic breast cancer detection and classification using RNA-Seq gene expression data. Sci Rep 2023; 13:14644. [PMID: 37670037 PMCID: PMC10480180 DOI: 10.1038/s41598-023-41731-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/19/2023] [Accepted: 08/30/2023] [Indexed: 09/07/2023]
Abstract
Breast cancer is considered one of the significant health challenges and ranks among the most prevalent and dangerous cancer types affecting women globally. Early breast cancer detection and diagnosis are crucial for effective treatment and personalized therapy. Early detection and diagnosis can help patients and physicians discover new treatment options, provide a more suitable quality of life, and ensure increased survival rates. Breast cancer detection using gene expression involves many complexities, such as the issue of dimensionality and the complexity of the gene expression data. This paper proposes a bio-inspired CNN model for breast cancer detection using gene expression data downloaded from The Cancer Genome Atlas (TCGA). The data contains 1208 clinical samples of 19,948 genes with 113 normal and 1095 cancerous samples. In the proposed model, Array-Array Intensity Correlation (AAIC) is used at the pre-processing stage for outlier removal, followed by a normalization process to avoid biases in the expression measures. Filtration is used for gene reduction using a threshold value of 0.25. Thereafter the pre-processed gene expression dataset was converted into images, which were later converted to grayscale to meet the requirements of the model. The model also uses a hybrid of the CNN architecture with a metaheuristic algorithm, namely the Ebola Optimization Search Algorithm (EOSA), to enhance the detection of breast cancer. The traditional CNN and five hybrid algorithms were compared with the classification result of the proposed model. The competing hybrid algorithms include the Whale Optimization Algorithm (WOA-CNN), the Genetic Algorithm (GA-CNN), the Satin Bowerbird Optimization (SBO-CNN), the Life Choice-Based Optimization (LCBO-CNN), and the Multi-Verse Optimizer (MVO-CNN).
The results show that the proposed model determined the classes with high-performance measurements with an accuracy of 98.3%, a precision of 99%, a recall of 99%, an f1-score of 99%, a kappa of 90.3%, a specificity of 92.8%, and a sensitivity of 98.9% for the cancerous class. The results suggest that the proposed method has the potential to be a reliable and precise approach to breast cancer detection, which is crucial for early diagnosis and personalized therapy.
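The metrics reported above all derive from a binary confusion matrix. The sketch below shows the standard definitions, including Cohen's kappa, on hypothetical counts; the numbers are not the paper's data, and the function name `binary_metrics` is illustrative.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute common classification metrics from confusion-matrix counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # also called sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy, "precision": precision, "recall": recall,
        "specificity": specificity, "f1": f1, "kappa": kappa,
    }

# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=90, fp=5, fn=10, tn=95)
```

With strongly imbalanced classes, such as the 113 normal versus 1095 cancerous samples described here, kappa and specificity are usually more informative than accuracy alone, because chance agreement is already high.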
Affiliation(s)
- Tehnan I A Mohamed
- School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, KwaZulu-Natal, South Africa.
- Absalom E Ezugwu
- Unit for Data Science and Computing, North-West University, Potchefstroom, South Africa.
- Jean Vincent Fonou-Dombeu
- School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, KwaZulu-Natal, South Africa
- Abiodun M Ikotun
- School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, KwaZulu-Natal, South Africa
- Mohanad Mohammed
- School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, King Edward Avenue, Pietermaritzburg Campus, Pietermaritzburg, 3201, KwaZulu-Natal, South Africa
6
Neagu AN, Whitham D, Bruno P, Morrissiey H, Darie CA, Darie CC. Omics-Based Investigations of Breast Cancer. Molecules 2023; 28:4768. [PMID: 37375323 DOI: 10.3390/molecules28124768] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 05/19/2023] [Revised: 06/08/2023] [Accepted: 06/12/2023] [Indexed: 06/29/2023]
Abstract
Breast cancer (BC) is characterized by an extensive genotypic and phenotypic heterogeneity. In-depth investigations into the molecular bases of BC phenotypes, carcinogenesis, progression, and metastasis are necessary for accurate diagnoses, prognoses, and therapy assessments in predictive, precision, and personalized oncology. This review discusses both classic as well as several novel omics fields that are involved or should be used in modern BC investigations, which may be integrated as a holistic term, onco-breastomics. Rapid and recent advances in molecular profiling strategies and analytical techniques based on high-throughput sequencing and mass spectrometry (MS) development have generated large-scale multi-omics datasets, mainly emerging from the three "big omics", based on the central dogma of molecular biology: genomics, transcriptomics, and proteomics. Metabolomics-based approaches also reflect the dynamic response of BC cells to genetic modifications. Interactomics promotes a holistic view in BC research by constructing and characterizing protein-protein interaction (PPI) networks that provide a novel hypothesis for the pathophysiological processes involved in BC progression and subtyping. The emergence of new omics- and epiomics-based multidimensional approaches provides opportunities to gain insights into BC heterogeneity and its underlying mechanisms. The three main epiomics fields (epigenomics, epitranscriptomics, and epiproteomics) are focused on the epigenetic DNA changes, RNA modifications, and posttranslational modifications (PTMs) affecting protein functions for an in-depth understanding of cancer cell proliferation, migration, and invasion. Novel omics fields, such as epichaperomics or epimetabolomics, could investigate the modifications in the interactome induced by stressors and provide PPI changes, as well as in metabolites, as drivers of BC-causing phenotypes.
Over the last years, several proteomics-derived omics, such as matrisomics, exosomics, secretomics, kinomics, phosphoproteomics, or immunomics, provided valuable data for a deep understanding of dysregulated pathways in BC cells and their tumor microenvironment (TME) or tumor immune microenvironment (TIME). Most of these omics datasets are still assessed individually using distinct approaches and do not generate the desired and expected global-integrative knowledge with applications in clinical diagnostics. However, several hyphenated omics approaches, such as proteo-genomics, proteo-transcriptomics, and phosphoproteomics-exosomics are useful for the identification of putative BC biomarkers and therapeutic targets. To develop non-invasive diagnostic tests and to discover new biomarkers for BC, classic and novel omics-based strategies allow for significant advances in blood/plasma-based omics. Salivaomics, urinomics, and milkomics appear as integrative omics that may develop a high potential for early and non-invasive diagnoses in BC. Thus, the analysis of the tumor circulome is considered a novel frontier in liquid biopsy. Omics-based investigations have applications in BC modeling, as well as accurate BC classification and subtype characterization. The future in omics-based investigations of BC may also focus on multi-omics single-cell analyses.
Affiliation(s)
- Anca-Narcisa Neagu
- Laboratory of Animal Histology, Faculty of Biology, "Alexandru Ioan Cuza" University of Iasi, Carol I Bvd, No. 20A, 700505 Iasi, Romania
- Danielle Whitham
- Biochemistry & Proteomics Laboratories, Department of Chemistry and Biomolecular Science, Clarkson University, 8 Clarkson Avenue, Potsdam, NY 13699, USA
- Pathea Bruno
- Biochemistry & Proteomics Laboratories, Department of Chemistry and Biomolecular Science, Clarkson University, 8 Clarkson Avenue, Potsdam, NY 13699, USA
- Hailey Morrissiey
- Biochemistry & Proteomics Laboratories, Department of Chemistry and Biomolecular Science, Clarkson University, 8 Clarkson Avenue, Potsdam, NY 13699, USA
- Celeste A Darie
- Biochemistry & Proteomics Laboratories, Department of Chemistry and Biomolecular Science, Clarkson University, 8 Clarkson Avenue, Potsdam, NY 13699, USA
- Costel C Darie
- Biochemistry & Proteomics Laboratories, Department of Chemistry and Biomolecular Science, Clarkson University, 8 Clarkson Avenue, Potsdam, NY 13699, USA
7
Daneshvar NHN, Masoudi-Sobhanzadeh Y, Omidi Y. A voting-based machine learning approach for classifying biological and clinical datasets. BMC Bioinformatics 2023; 24:140. [PMID: 37041456 PMCID: PMC10088226 DOI: 10.1186/s12859-023-05274-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2022] [Accepted: 04/05/2023] [Indexed: 04/13/2023]
Abstract
BACKGROUND Different machine learning techniques have been proposed to classify a wide range of biological/clinical data. Given the practicability of these approaches, various software packages have also been designed and developed. However, the existing methods suffer from several limitations such as overfitting on a specific dataset, ignoring the feature selection concept in the preprocessing step, and losing their performance on large-size datasets. To tackle the mentioned restrictions, in this study, we introduced a machine learning framework consisting of two main steps. First, our previously suggested optimization algorithm (Trader) was extended to select a near-optimal subset of features/genes. Second, a voting-based framework was proposed to classify the biological/clinical data with high accuracy. To evaluate the efficiency of the proposed method, it was applied to 13 biological/clinical datasets, and the outcomes were comprehensively compared with the prior methods. RESULTS The results demonstrated that the Trader algorithm could select a near-optimal subset of features with a significance level of p-value < 0.01 relative to the compared algorithms. Additionally, on the large-size datasets, the proposed machine learning framework improved prior studies by ~10% in terms of the mean values associated with fivefold cross-validation of accuracy, precision, recall, specificity, and F-measure. CONCLUSION Based on the obtained results, it can be concluded that a proper configuration of efficient algorithms and methods can increase the prediction power of machine learning approaches and help researchers in designing practical diagnosis health care systems and offering effective treatment plans.
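The second step, voting-based classification, can be sketched as a plain majority vote over the labels predicted by several classifiers. The abstract does not specify the voting rule or the base classifiers, so the hard-voting scheme, the tie-breaking rule, and the function name `majority_vote` below are assumptions.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    predictions: list of lists, one inner list per classifier, all the
    same length. Ties go to the smallest label so the result is
    deterministic (an arbitrary choice for this sketch).
    """
    voted = []
    for labels in zip(*predictions):  # labels for one sample across classifiers
        counts = Counter(labels)
        top = max(counts.values())
        voted.append(min(label for label, c in counts.items() if c == top))
    return voted

# Hypothetical outputs of three classifiers on three samples.
predictions = [[1, 0, 0], [1, 1, 0], [0, 1, 0]]
combined = majority_vote(predictions)
```

An odd number of voters avoids ties in the binary case, so the tie-breaking rule only matters for even-sized ensembles or multi-class problems.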
Affiliation(s)
- Yosef Masoudi-Sobhanzadeh
- Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran.
- Faculty of Advanced Medical Sciences, Tabriz University of Medical Sciences, Tabriz, Iran.
- Yadollah Omidi
- Department of Pharmaceutical Sciences, College of Pharmacy, Nova Southeastern University, Florida, 33328, USA.
8
Cancer MiRNA biomarker classification based on Improved Generative Adversarial Network optimized with Mayfly Optimization Algorithm. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2022.103545] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/21/2022]
9
Lin S, Lin Y, Wu K, Wang Y, Feng Z, Duan M, Liu S, Fan Y, Huang L, Zhou F. FeCO3, constructing the network biomarkers using the inter-feature correlation coefficients and its application in detecting high-order breast cancer biomarkers. Curr Bioinform 2022. [DOI: 10.2174/1574893617666220124123303] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Aims:
This study aims to formulate the inter-feature correlation as the engineered features.
Background:
Modern biotechnologies tend to generate a huge number of characteristics of a sample, while an OMIC dataset usually has only a few dozen or a few hundred samples due to the high costs of generating the OMIC data. Many bio-OMIC studies have therefore assumed inter-feature independence and selected features with a high phenotype association.
Objective:
However, many features are closely associated with each other due to their physical or functional interactions, which may be utilized as a new view of features.
Method:
This study proposed a feature engineering algorithm based on the correlation coefficients (FeCO3) by utilizing the correlations between a given sample and a few reference samples. A comprehensive evaluation was carried out for the proposed FeCO3 network features using 24 bio-OMIC datasets.
Result:
The experimental data suggested that the newly calculated FeCO3 network features tended to achieve better classification performances than the original features, using the same popular feature selection and classification algorithms. The FeCO3 network features were also consistently supported by the literature. FeCO3 was utilized to investigate the high-order engineered biomarkers of breast cancer, and detected the PBX2 gene (Pre-B-Cell Leukemia Transcription Factor 2) as one of the candidate breast cancer biomarkers. Although the two methylated residues cg14851325 (Pvalue=8.06e-2) and cg16602460 (Pvalue=1.19e-1) within PBX2 did not have statistically significant association with breast cancers, the high-order inter-feature correlations showed a significant association with breast cancers.
Conclusion:
The proposed FeCO3 network features calculated the high-order inter-feature correlations as novel features, and may facilitate the investigations of complex diseases from this new perspective. The source code is available in FigShare at 10.6084/m9.figshare.13550051 or the web site http://www.healthinformaticslab.org/supp/ .
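The core idea stated in the Method section, using the correlations between a given sample and a few reference samples as new features, can be sketched as follows. This is not the authors' FeCO3 implementation (their source code is linked above); the Pearson coefficient, the function names, and the toy vectors are assumptions made for illustration.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length vectors with nonzero variance."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def correlation_features(sample, references):
    """Engineer one feature per reference: its correlation with the sample."""
    return [pearson(sample, ref) for ref in references]

# Hypothetical profiles: the first reference rises with the sample,
# the second falls as the sample rises.
sample = [1.0, 2.0, 3.0, 4.0]
references = [[2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]]
features = correlation_features(sample, references)
```

Each sample is thereby re-described by as many features as there are reference samples, regardless of how many original features it had.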
Affiliation(s)
- Shenggeng Lin
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Yuqi Lin
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Kexin Wu
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Yueying Wang
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun, Jilin Province, China
- Zixuan Feng
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Meiyu Duan
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Shuai Liu
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Yusi Fan
- College of Software, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Lan Huang
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
- Fengfeng Zhou
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China
10
MotieGhader H, Safavi E, Rezapour A, Amoodizaj FF, Iranifam RA. Drug repurposing for coronavirus (SARS-CoV-2) based on gene co-expression network analysis. Sci Rep 2021; 11:21872. [PMID: 34750486 PMCID: PMC8576023 DOI: 10.1038/s41598-021-01410-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/14/2021] [Accepted: 10/28/2021] [Indexed: 02/06/2023]
Abstract
Severe acute respiratory syndrome (SARS) is a highly contagious viral respiratory illness. This illness is caused by a coronavirus known as SARS-associated coronavirus (SARS-CoV). SARS was first detected in Asia in late February 2003. The genome of this virus is very similar to that of SARS-CoV-2. Therefore, the study of SARS-CoV disease and the identification of effective drugs to treat this disease can provide new clues for the treatment of SARS-CoV-2. This study aimed to discover novel potential drugs for SARS-CoV disease in order to treat SARS-CoV-2 disease based on a novel systems biology approach. To this end, gene co-expression network analysis was applied. First, the gene co-expression network was reconstructed for 1441 genes, and then two gene modules were discovered as significant modules. Next, a list of miRNAs and transcription factors that target gene co-expression modules' genes were gathered from the valid databases, and two sub-networks formed of transcription factors and miRNAs were established. Afterward, the list of the drugs targeting obtained sub-networks' genes was retrieved from the DGIDb database, and two drug-gene and drug-TF interaction networks were reconstructed. Finally, after conducting different network analyses, we proposed five drugs, including FLUOROURACIL, CISPLATIN, SIROLIMUS, CYCLOPHOSPHAMIDE, and METHYLDOPA, as candidate drugs for SARS-CoV-2 coronavirus treatment. Moreover, ten miRNAs including miR-193b, miR-192, miR-215, miR-34a, miR-16, miR-16, miR-92a, miR-30a, miR-7, and miR-26b were found to be significant miRNAs in treating SARS-CoV-2 coronavirus.
Affiliation(s)
- Habib MotieGhader
- Department of Basic Sciences, Biotechnology Research Center, Tabriz Branch, Islamic Azad University, Tabriz, Iran.
- Department of Biology, Tabriz Branch, Islamic Azad University, Tabriz, Iran.
- Esmaeil Safavi
- Department of Basic Sciences, Biotechnology Research Center, Tabriz Branch, Islamic Azad University, Tabriz, Iran
- Department of Basic Sciences, Faculty of Veterinary Medicine, Tabriz Branch, Islamic Azad University, Tabriz, Iran
- Ali Rezapour
- Department of Animal Science, Faculty of Agriculture, Tabriz Branch, Islamic Azad University, Tabriz, Iran
- Fatemeh Firouzi Amoodizaj
- Department of Basic Sciences, Biotechnology Research Center, Tabriz Branch, Islamic Azad University, Tabriz, Iran
- Roya Asl Iranifam
- Department of Basic Sciences, Biotechnology Research Center, Tabriz Branch, Islamic Azad University, Tabriz, Iran
11
Soleimani Zakeri NS, Pashazadeh S, MotieGhader H. Drug Repurposing for Alzheimer's Disease Based on Protein-Protein Interaction Network. BIOMED RESEARCH INTERNATIONAL 2021; 2021:1280237. [PMID: 34692825 PMCID: PMC8531773 DOI: 10.1155/2021/1280237] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 05/30/2021] [Revised: 09/06/2021] [Accepted: 09/19/2021] [Indexed: 12/15/2022]
Abstract
Alzheimer's disease (AD) is known as a critical neurodegenerative disorder. It worsens as symptoms concerning dementia grow severe over the years. Due to the globalization of Alzheimer's disease, its prevention and treatment are vital. This study proposes a method to extract substantial gene complexes and then introduces potential drugs in Alzheimer's disease. To this end, a protein-protein interaction (PPI) network was utilized to extract five meaningful gene complexes functionally interconnected. An enrichment analysis to introduce the most important biological processes and pathways was accomplished on the obtained genes. The next step is extracting the drugs related to AD and introducing some new drugs which may be helpful for this disease. Finally, a complete network including all the genes associated with each gene complex group and genes' target drug was illustrated. For validating the proposed potential drugs, Connectivity Map (CMAP) analysis was accomplished to determine target genes that are up- or downregulated by proposed drugs. Medical studies and publications were analyzed thoroughly to introduce AD-related drugs. This analysis proves the accuracy of the proposed method in this study. Then, new drugs were introduced that can be experimentally examined as future work. Raloxifene and gentian violet are two new drugs, which have not been introduced as AD-related drugs in previous scientific and medical studies, recommended by the method of this study. Besides the primary goal, five bipartite networks representing the genes of each group and their target miRNAs were constructed to introduce target miRNAs.
Affiliation(s)
- Negar Sadat Soleimani Zakeri
- Department of Computer Engineering, Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran
- Saeid Pashazadeh
- Department of Information Technology, Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran
- Habib MotieGhader
- Department of Computer Engineering, Gowgan Educational Center, Tabriz Branch, Islamic Azad University, Tabriz, Iran
12
|
RIFS2D: A two-dimensional version of a randomly restarted incremental feature selection algorithm with an application for detecting low-ranked biomarkers. Comput Biol Med 2021; 133:104405. [PMID: 33930763 DOI: 10.1016/j.compbiomed.2021.104405] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 02/03/2021] [Revised: 04/13/2021] [Accepted: 04/13/2021] [Indexed: 12/20/2022]
Abstract
The era of big data introduces both opportunities and challenges for biomedical researchers. One of the inherent difficulties in the biomedical research field is to recruit large cohorts of samples, while high-throughput biotechnologies may produce thousands or even millions of features for each sample. Researchers tend to evaluate the individual correlation of each feature with the class label and use the incremental feature selection (IFS) strategy to select the top-ranked features with the best prediction performance. Recent experimental data showed that a subset of continuously ranked features randomly restarted from a low-ranked feature (an RIFS block) may outperform the subset of top-ranked features. This study proposed a feature selection algorithm, RIFS2D, that integrates multiple RIFS blocks. A comprehensive comparative experiment was conducted with the IFS, RIFS and existing feature selection algorithms and demonstrated that a subset of low-ranked features may also achieve promising prediction performance. This study suggested that a prediction model with promising performance may be trained on low-ranked features, even when top-ranked features do not achieve satisfactory prediction performance. Further comparative experiments were conducted between RIFS2D and t-tests for the detection of early-stage breast cancer. The data showed that the RIFS2D-recommended features achieved better prediction accuracy and were targeted by more drugs than the t-test top-ranked features.
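The IFS and RIFS-block ideas described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the ranking criterion (absolute gap between class means), the nearest-centroid scorer evaluated on the training data, the toy dataset, and every function name below are assumptions chosen to keep the example self-contained. The integration of multiple blocks that defines RIFS2D is not attempted here.

```python
import random
from statistics import mean

def rank_features(rows, labels):
    """Rank feature indices by the absolute gap between class means (largest first)."""
    def gap(f):
        pos = [r[f] for r, y in zip(rows, labels) if y == 1]
        neg = [r[f] for r, y in zip(rows, labels) if y == 0]
        return abs(mean(pos) - mean(neg))
    return sorted(range(len(rows[0])), key=lambda f: -gap(f))

def centroid_accuracy(rows, labels, features):
    """Training accuracy of a nearest-centroid rule restricted to `features`."""
    centroids = {}
    for cls in (0, 1):
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = [mean(r[f] for r in members) for f in features]
    hits = 0
    for r, y in zip(rows, labels):
        dist = {c: sum((r[f] - m) ** 2 for f, m in zip(features, cen))
                for c, cen in centroids.items()}
        if min(dist, key=dist.get) == y:
            hits += 1
    return hits / len(rows)

def incremental_block(rows, labels, ranking, start):
    """Grow a block of consecutively ranked features from `start`; keep the best one."""
    best_block, best_acc = [], -1.0
    for end in range(start + 1, len(ranking) + 1):
        block = ranking[start:end]
        acc = centroid_accuracy(rows, labels, block)
        if acc > best_acc:
            best_block, best_acc = block, acc
    return best_block, best_acc

def random_restart_search(rows, labels, ranking, n_restarts, seed=0):
    """Plain IFS (start 0) plus blocks restarted from randomly chosen lower ranks."""
    rng = random.Random(seed)
    extra = rng.sample(range(1, len(ranking)), min(n_restarts, len(ranking) - 1))
    results = [incremental_block(rows, labels, ranking, s) for s in [0] + extra]
    return max(results, key=lambda result: result[1])

# Hypothetical toy data: feature 0 has the largest mean gap but one outlier,
# while features 1 and 2 are weak alone and separate the classes together.
rows = [(0, 1, 0), (0, 2, 1), (9, 0, -1), (6, 0, 1), (6, 1, 2), (6, -1, 0)]
labels = [0, 0, 0, 1, 1, 1]

ranking = rank_features(rows, labels)
print("IFS from the top:", incremental_block(rows, labels, ranking, 0))
print("Best with restarts:", random_restart_search(rows, labels, ranking, n_restarts=2))
```

On this toy data the block restarted at the second-ranked feature classifies every sample correctly, while the block grown from the top-ranked feature does not. A realistic evaluation would replace the training accuracy with cross-validated performance.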
|
13
|
Adhami M, Sadeghi B, Rezapour A, Haghdoost AA, MotieGhader H. Repurposing novel therapeutic candidate drugs for coronavirus disease-19 based on protein-protein interaction network analysis. BMC Biotechnol 2021; 21:22. [PMID: 33711981 PMCID: PMC7952507 DOI: 10.1186/s12896-021-00680-z] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 09/18/2020] [Accepted: 02/24/2021] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND: The coronavirus disease-19 (COVID-19) emerged in Wuhan, China and rapidly spread worldwide. Researchers are trying to find a way to treat this disease as soon as possible. The present study aimed to identify the genes involved in COVID-19 and find a new drug target therapy. Currently, there are no effective drugs targeting SARS-CoV-2, and meanwhile, drug discovery approaches are time-consuming and costly. To address this challenge, this study utilized a network-based drug repurposing strategy to rapidly identify potential drugs targeting SARS-CoV-2. To this end, seven potential drugs were proposed for COVID-19 treatment using protein-protein interaction (PPI) network analysis. First, 524 proteins in humans that have interaction with the SARS-CoV-2 virus were collected, and then the PPI network was reconstructed for these collected proteins. Next, the target miRNAs of the mentioned module genes were separately obtained from the miRWalk 2.0 database because of the important role of miRNAs in biological processes and were reported as an important clue for future analysis. Finally, the list of the drugs targeting module genes was obtained from the DGIDb database, and the drug-gene network was separately reconstructed for the obtained protein modules. RESULTS: Based on the network analysis of the PPI network, seven clusters of proteins were specified as the complexes of proteins which are more associated with the SARS-CoV-2 virus. Moreover, seven therapeutic candidate drugs were identified to control gene regulation in COVID-19. PACLITAXEL, as the most potent therapeutic candidate drug and previously mentioned as a therapy for COVID-19, had four gene targets in two different modules. The other six candidate drugs, namely, BORTEZOMIB, CARBOPLATIN, CRIZOTINIB, CYTARABINE, DAUNORUBICIN, and VORINOSTAT, some of which were previously discovered to be efficient against COVID-19, had three gene targets in different modules. Eventually, CARBOPLATIN, CRIZOTINIB, and CYTARABINE drugs were found as novel potential drugs to be investigated as a therapy for COVID-19. CONCLUSIONS: Our computational strategy for predicting repurposable candidate drugs against COVID-19 provides efficacious and rapid results for therapeutic purposes. However, further experimental analysis and testing such as clinical applicability, toxicity, and experimental validations are required to reach a more accurate and improved treatment. Our proposed complexes of proteins and associated miRNAs, along with discovered candidate drugs might be a starting point for further analysis by other researchers in this urgency of the COVID-19 pandemic.
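The final step described above, counting how many module genes each drug targets and in how many modules, can be sketched as follows. This is only an illustration under stated assumptions: the function name `rank_drugs`, the module and gene identifiers, and the drug names are hypothetical placeholders, not data from the study or from DGIdb.

```python
def rank_drugs(modules, drug_targets):
    """Rank drugs by the number of module genes they target.

    modules:      {module name: set of gene symbols in that protein module}
    drug_targets: {drug name: set of gene symbols the drug interacts with}
    Returns (drug, number of targeted module genes, number of modules hit),
    sorted by targeted genes, then modules hit, then drug name.
    """
    summary = []
    for drug, targets in drug_targets.items():
        # Keep only the modules in which this drug has at least one target.
        hits = {name: targets & genes
                for name, genes in modules.items() if targets & genes}
        n_genes = len(set().union(*hits.values()))
        summary.append((drug, n_genes, len(hits)))
    return sorted(summary, key=lambda row: (-row[1], -row[2], row[0]))

# Hypothetical placeholder data, for illustration only.
modules = {"M1": {"G1", "G2", "G3"}, "M2": {"G4", "G5"}}
drug_targets = {
    "drug_x": {"G1", "G2", "G4", "G5"},
    "drug_y": {"G1", "G9"},
    "drug_z": {"G3", "G4", "G5"},
}

for drug, n_genes, n_modules in rank_drugs(modules, drug_targets):
    print(f"{drug}: {n_genes} gene targets in {n_modules} module(s)")
```

Genes outside every module (such as the placeholder `G9`) do not contribute to a drug's count, which mirrors restricting the drug-gene network to the obtained protein modules.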
Affiliation(s)
- Masoumeh Adhami
  - Pathology and Stem Cell Research Center, Kerman University of Medical Sciences, Kerman, Iran
- Balal Sadeghi
  - Food Hygiene and Public Health Department, Faculty of Veterinary Medicine, Shahid Bahonar University of Kerman, Kerman, Iran
- Ali Rezapour
  - Department of Agriculture, Tabriz Branch, Islamic Azad University, Tabriz, Iran
- Ali Akbar Haghdoost
  - Modeling in Health Research Center, Institute for Futures Studies in Health, Kerman University of Medical Sciences, Kerman, Iran
- Habib MotieGhader
  - Department of Basic Sciences, Biotechnology Research Center, Tabriz Branch, Islamic Azad University, Tabriz, Iran
  - Department of Computer Engineering, Gowgan Educational Center, Tabriz Branch, Islamic Azad University, Tabriz, Iran
|
14
|
Masoudi-Sobhanzadeh Y, Motieghader H, Omidi Y, Masoudi-Nejad A. A machine learning method based on the genetic and world competitive contests algorithms for selecting genes or features in biological applications. Sci Rep 2021; 11:3349. [PMID: 33558580 PMCID: PMC7870651 DOI: 10.1038/s41598-021-82796-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/09/2020] [Accepted: 01/25/2021] [Indexed: 01/30/2023] Open
Abstract
Gene/feature selection is an essential preprocessing step for creating models using machine learning techniques. It also plays a critical role in different biological applications such as the identification of biomarkers. Although many feature/gene selection algorithms and methods have been introduced, they may suffer from problems such as parameter tuning or low level of performance. To tackle such limitations, in this study, a universal wrapper approach is introduced based on our introduced optimization algorithm and the genetic algorithm (GA). In the proposed approach, candidate solutions have variable lengths, and a support vector machine scores them. To show the usefulness of the method, thirteen classification and regression-based datasets with different properties were chosen from various biological scopes, including drug discovery, cancer diagnostics, clinical applications, etc. Our findings confirmed that the proposed method outperforms most of the other currently used approaches and can also free the users from difficulties related to the tuning of various parameters. As a result, users may optimize their biological applications such as obtaining a biomarker diagnostic kit with the minimum number of genes and maximum separability power.
Affiliation(s)
- Yosef Masoudi-Sobhanzadeh
  - Research Center for Pharmaceutical Nanotechnology, Biomedicine Institute, Tabriz University of Medical Sciences, Tabriz, Iran (GRID: grid.412888.f; ISNI: 0000 0001 2174 8913)
- Habib Motieghader
  - Department of Bioinformatics, Biotechnology Research Center, Tabriz Branch, Islamic Azad University, Tabriz, Iran (GRID: grid.459617.8; ISNI: 0000 0004 0494 2783)
  - Department of Basic Sciences, Gowgan Educational Center, Tabriz Branch, Islamic Azad University, Tabriz, Iran (GRID: grid.459617.8; ISNI: 0000 0004 0494 2783)
- Yadollah Omidi
  - Department of Pharmaceutical Sciences, College of Pharmacy, Nova Southeastern University, Fort Lauderdale, Florida 33328, USA (GRID: grid.261241.2; ISNI: 0000 0001 2168 8324)
- Ali Masoudi-Nejad
  - Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran (GRID: grid.46072.37; ISNI: 0000 0004 0612 7950)
|
15
|
Hamraz M, Gul N, Raza M, Khan DM, Khalil U, Zubair S, Khan Z. Robust proportional overlapping analysis for feature selection in binary classification within functional genomic experiments. PeerJ Comput Sci 2021; 7:e562. [PMID: 34141889 PMCID: PMC8176540 DOI: 10.7717/peerj-cs.562] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/10/2020] [Accepted: 05/04/2021] [Indexed: 05/10/2023]
Abstract
In this paper, a novel feature selection method called Robust Proportional Overlapping Score (RPOS), for microarray gene expression datasets has been proposed, by utilizing the robust measure of dispersion, i.e., Median Absolute Deviation (MAD). This method robustly identifies the most discriminative genes by considering the overlapping scores of the gene expression values for binary class problems. Genes with a high degree of overlap between classes are discarded and the ones that discriminate between the classes are selected. The results of the proposed method are compared with five state-of-the-art gene selection methods based on classification error, Brier score, and sensitivity, by considering eleven gene expression datasets. Classification of observations for different sets of selected genes by the proposed method is carried out by three different classifiers, i.e., random forest, k-nearest neighbors (k-NN), and support vector machine (SVM). Box-plots and stability scores of the results are also shown in this paper. The results reveal that in most of the cases the proposed method outperforms the other methods.
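The idea of scoring genes by how much their expression values overlap between two classes, using the Median Absolute Deviation as a robust spread, can be sketched as below. This is a simplified stand-in and not the published RPOS formula: the per-class interval `median ± MAD`, the normalisation by the combined span, the function names, and the toy expression values are all assumptions made for illustration.

```python
from statistics import median

def mad(values):
    """Median Absolute Deviation: median distance of the values from their median."""
    med = median(values)
    return median(abs(v - med) for v in values)

def robust_interval(values):
    """Assumed core interval of a class: median minus/plus one MAD."""
    med, spread = median(values), mad(values)
    return med - spread, med + spread

def overlap_score(class_a, class_b):
    """Fraction of the combined span shared by the two class intervals (0 = no overlap)."""
    lo_a, hi_a = robust_interval(class_a)
    lo_b, hi_b = robust_interval(class_b)
    shared = max(0.0, min(hi_a, hi_b) - max(lo_a, lo_b))
    span = max(hi_a, hi_b) - min(lo_a, lo_b)
    # Two identical single-point intervals are treated as fully overlapping.
    return shared / span if span > 0 else 1.0

# Hypothetical expression values per gene: (class A samples, class B samples).
genes = {
    "gene_1": ([1, 2, 3, 4, 100], [10, 11, 12, 13, 14]),  # outlier in class A
    "gene_2": ([5, 6, 7, 8, 9], [6, 7, 8, 9, 10]),
}

scores = {name: overlap_score(a, b) for name, (a, b) in genes.items()}
selected = sorted(scores, key=scores.get)  # least overlapping genes first
print(scores, selected)
```

Because the interval is built from the median and MAD, the single extreme value in `gene_1` does not widen its class interval, so the gene is still scored as non-overlapping. A mean and standard deviation would be pulled towards the outlier.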
Affiliation(s)
- Muhammad Hamraz
  - Department of Statistics, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Naz Gul
  - Department of Statistics, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Mushtaq Raza
  - Department of Computer Sciences, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Dost Muhammad Khan
  - Department of Statistics, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Umair Khalil
  - Department of Statistics, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Seema Zubair
  - Department of Mathematics, Statistics and Computer Science, University of Agriculture Peshawar, Peshawar, Pakistan
- Zardad Khan
  - Department of Statistics, Abdul Wali Khan University Mardan, Mardan, Pakistan
|
16
|
World competitive contest-based artificial neural network: A new class-specific method for classification of clinical and biological datasets. Genomics 2020; 113:541-552. [PMID: 32991962 PMCID: PMC7521912 DOI: 10.1016/j.ygeno.2020.09.047] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 06/15/2020] [Revised: 09/05/2020] [Accepted: 09/22/2020] [Indexed: 12/26/2022]
Abstract
Many data mining methods have been proposed to generate computer-aided diagnostic systems, which may determine diseases in their early stages by categorizing the data into some proper classes. Considering the importance of a suitable classifier, the present study aims to introduce an efficient approach based on the World Competitive Contests (WCC) algorithm as well as a multi-layer perceptron artificial neural network (ANN). Unlike previously introduced methods, each of which develops a universal model for all the different data classes, our proposed approach generates a specific model for each individual class of data. The experimental results show that the proposed method (ANNWCC), which can be applied to both balanced and unbalanced datasets, yields an average five-fold cross-validation accuracy of more than 76% (without applying feature selection methods) and 90% (with applying feature selection methods) on the 13 clinical and biological datasets. The findings also indicate that under different conditions, our proposed method can produce better results in comparison to some state-of-the-art meta-heuristic algorithms and methods in terms of various statistical and classification measurements. To classify the clinical and biological data, a multi-layer ANN and the WCC algorithm were combined. It was shown that developing a specific model for each individual class of data may yield better results compared with creating a universal model for all of the existing data classes. Besides, some efficient algorithms proved to be essential to generate acceptable biological results, and the methods' performance was found to be enhanced by fuzzifying or normalizing the biological data.
- We combined multi-layer artificial neural networks and world competitive contests algorithms to classify biological datasets.
- The proposed method has been investigated on 13 clinical datasets with different properties.
- Efficient models may yield better classification models and health diagnostic systems.
- Feature selection methods can improve the performance of a model in separating case and control samples.
|