Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Dashtban M, Balafar M, Suravajhala P. Gene selection for tumor classification using a novel bio-inspired multi-objective approach. Genomics 2017;110:10-17. [PMID: 28780377 DOI: 10.1016/j.ygeno.2017.07.010] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2017] [Revised: 07/12/2017] [Accepted: 07/30/2017] [Indexed: 12/21/2022]

Number

Cited by Other Article(s)

Li M, Cao R, Zhao Y, Li Y, Deng S. Population characteristic exploitation-based multi-orientation multi-objective gene selection for microarray data classification. Comput Biol Med 2024;170:108089. [PMID: 38330824 DOI: 10.1016/j.compbiomed.2024.108089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2023] [Revised: 01/23/2024] [Accepted: 01/27/2024] [Indexed: 02/10/2024]

Abstract

Gene selection is a process of selecting discriminative genes from microarray data that helps to diagnose and classify cancer samples effectively. Swarm intelligence evolution-based gene selection algorithms can never circumvent the problem that the population is prone to local optima in the process of gene selection. To tackle this challenge, previous research has focused primarily on two aspects: mitigating premature convergence to local optima and escaping from local optima. In contrast to these strategies, this paper introduces a novel perspective by adopting reverse thinking, where the issue of local optima is seen as an opportunity rather than an obstacle. Building on this foundation, we propose MOMOGS-PCE, a novel gene selection approach that effectively exploits the advantageous characteristics of populations trapped in local optima to uncover global optimal solutions. Specifically, MOMOGS-PCE employs a novel population initialization strategy, which involves the initialization of multiple populations that explore diverse orientations to foster distinct population characteristics. The subsequent step involved the utilization of an enhanced NSGA-II algorithm to amplify the advantageous characteristics exhibited by the population. Finally, a novel exchange strategy is proposed to facilitate the transfer of characteristics between populations that have reached near maturity in evolution, thereby promoting further population evolution and enhancing the search for more optimal gene subsets. The experimental results demonstrated that MOMOGS-PCE exhibited significant advantages in comprehensive indicators compared with six competitive multi-objective gene selection algorithms. It is confirmed that the "reverse-thinking" approach not only avoids local optima but also leverages it to uncover superior gene subsets for cancer diagnosis.

Nekouie N, Romoozi M, Esmaeili M. A New Evolutionary Ensemble Learning of Multimodal Feature Selection from Microarray Data. Neural Process Lett 2023. [DOI: 10.1007/s11063-023-11159-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/15/2023]

Elitist random swapped particle swarm optimization embedded with variable k-nearest neighbour classification: a new PSO variant applied to gene identification. Soft comput 2022. [DOI: 10.1007/s00500-022-07515-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/10/2022]

Vahmiyan M, Kheirabadi M, Akbari E. Feature selection methods in microarray gene expression data: a systematic mapping study. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07661-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/07/2022]

Quantitative Detection of Gastrointestinal Tumor Markers Using a Machine Learning Algorithm and Multicolor Quantum Dot Biosensor. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022;2022:9022821. [PMID: 36093502 PMCID: PMC9458379 DOI: 10.1155/2022/9022821] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/04/2022] [Revised: 07/27/2022] [Accepted: 08/02/2022] [Indexed: 11/17/2022]

Abstract

This work explored the application value of gastrointestinal tumor markers based on a gene feature selection model using the principal component analysis (PCA) algorithm and a multicolor quantum dots (QDs) immunobiosensor in the detection of gastrointestinal tumors. Based on the PCA method, the neighborhood rough set algorithm was introduced to improve it, and the tumor gene feature selection model (OPCA) was established to analyze its classification precision and accuracy. Four kinds of coupled biosensors were fabricated based on QDs, namely, 525 nm CdSe/ZnS QDs-carbohydrate antigen 125 (QDs525-CA125 McAb), 605 nm CdSe/ZnS QDs-cancer antigen 19-9 (QDs605-CA19-9 McAb), 645 nm CdSe/ZnS QDs-anticancer embryonic antigen (QDs645-CEA McAb), and 565 nm CdSe/ZnS QDs-anti-alpha-fetoprotein (QDs565-AFP McAb). The quantum dot-antibody conjugates were identified and quantified by fluorescence spectroscopy and ultraviolet absorption spectroscopy. The results showed that the classification precision of the OPCA model in colon tumor and gastric cancer datasets was 99.52% and 99.03%, respectively, and the classification accuracy was 94.86% and 94.2%, respectively, which were significantly higher than those of other algorithms. The fluorescence values of AFP McAb, CEA McAb, CA19-9 McAb, and CA125 McAb reached the maximum when the conjugation concentrations were 25 µg/mL, 20 µg/mL, 30 µg/mL, and 30 µg/mL, respectively. The highest recovery rate of AFP was 98.51%, and its fluorescence intensity was 35.78 ± 2.99, which was significantly higher than that of other antigens (P < 0.001). In summary, the OPCA model based on the PCA algorithm can obtain fewer feature gene sets and improve the accuracy of sample classification. Intelligent immunobiosensors based on machine learning algorithms and QDs have potential application value in gastrointestinal gene feature selection and tumor marker detection, which provides a new idea for clinical diagnosis of gastrointestinal tumors.

Efficient Diagnosis of Autism with Optimized Machine Learning Models: An Experimental Analysis on Genetic and Personal Characteristic Datasets. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12083812] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]

Abstract

Early diagnosis of autism is extremely beneficial for patients. Traditional diagnosis approaches have been unable to diagnose autism in a fast and accurate way; rather, there are multiple factors that can be related to identifying the autism disorder. The gene expression (GE) of individuals may be one of these factors, in addition to personal and behavioral characteristics (PBC). Machine learning (ML) based on PBC and GE data analytics emphasizes the need to develop accurate prediction models. The quality of prediction relies on the accuracy of the ML model. To improve the accuracy of prediction, optimized feature selection algorithms are applied to solve the high dimensionality problem of the datasets used. Comparing different optimized feature selection methods using bio-inspired algorithms over different types of data can allow for the most accurate model to be identified. Therefore, in this paper, we investigated enhancing the classification process of autism spectrum disorder using 16 proposed optimized ML models (GWO-NB, GWO-SVM, GWO-KNN, GWO-DT, FPA-NB, FPA-KNN, FPA-SVM, FPA-DT, BA-NB, BA-SVM, BA-KNN, BA-DT, ABC-NB, ABC-SVM, ABC-KNN, and ABC-DT). Four bio-inspired algorithms, namely Gray Wolf Optimization (GWO), Flower Pollination Algorithm (FPA), Bat Algorithm (BA), and Artificial Bee Colony (ABC), were employed for optimizing the wrapper feature selection method in order to select the most informative features and to increase the accuracy of the classification models. Five evaluation metrics were used to evaluate the performance of the proposed models: accuracy, F1 score, precision, recall, and area under the curve (AUC). The obtained results demonstrated that the proposed models achieved a good performance as expected, with accuracies of 99.66% and 99.34% obtained by the GWO-SVM model on the PBC and GE datasets, respectively.

Adaptive feature selection framework for DNA methylation-based age prediction. Soft comput 2022. [DOI: 10.1007/s00500-022-06844-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]

Yan C, Li M, Ma J, Liao Y, Luo H, Wang J, Luo J. A Novel Feature Selection Method Based on MRMR and Enhanced Flower Pollination Algorithm for High Dimensional Biomedical Data. Curr Bioinform 2022. [DOI: 10.2174/1574893616666210624130124] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]

Abstract

Background: The massive amount of biomedical data accumulated in the past decades can be utilized for diagnosing disease. Objective: However, the high dimensionality, small sample sizes, and irrelevant features of data often have a negative influence on the accuracy and speed of disease prediction. Some existing machine learning models cannot capture the patterns on these datasets accurately without utilizing feature selection. Methods: Filter and wrapper are two prevailing feature selection methods. The filter method is fast but has low prediction accuracy, while the latter can obtain high accuracy but has a formidable computation cost. Given the drawbacks of using filter or wrapper individually, a novel feature selection method, called MRMR-EFPATS, is proposed, which hybridizes the filter method Minimum Redundancy Maximum Relevance (MRMR) and a wrapper method based on an improved Flower Pollination Algorithm (FPA). First, MRMR is employed to rank and screen out some important features quickly. These features are further chosen for individual populations following the wrapper method for faster convergence and less computational time. Then, due to its efficiency and flexibility, FPA is adopted to further discover an optimal feature subset. Result: FPA still has some drawbacks, such as a slow convergence rate, inadequacy in terms of searching new solutions, and a tendency to be trapped in local optima. In our work, an elite strategy is adopted to improve the convergence speed of the FPA. Tabu search and Adaptive Gaussian Mutation are employed to improve the search capability of FPA and escape from local optima. Here, the KNN classifier with 5-fold CV is utilized to evaluate the classification accuracy. Conclusion: Extensive experimental results on six public high dimensional biomedical datasets show that the proposed MRMR-EFPATS has achieved superior performance compared to other state-of-the-art methods.

Multi-objective feature selection based on quasi-oppositional based Jaya algorithm for microarray data. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2021.107804] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]

Elastic Correlation Adjusted Regression (ECAR) scores for high dimensional variable importance measuring. Sci Rep 2021;11:23354. [PMID: 34857823 PMCID: PMC8640025 DOI: 10.1038/s41598-021-02706-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2021] [Accepted: 11/22/2021] [Indexed: 11/08/2022] Open

Comparison Analysis of Gene Expression Profiles Proximity Metrics. Symmetry (Basel) 2021. [DOI: 10.3390/sym13101812] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open

Abstract

The problems of gene regulatory network (GRN) reconstruction and the creation of disease diagnostic effective systems based on genes expression data are some of the current directions of modern bioinformatics. In this manuscript, we present the results of the research focused on the evaluation of the effectiveness of the most used metrics to estimate the gene expression profiles’ proximity, which can be used to extract the groups of informative gene expression profiles while taking into account the states of the investigated samples. Symmetry is very important in the field of both genes’ and/or proteins’ interaction since it undergirds essentially all interactions between molecular components in the GRN and extraction of gene expression profiles, which allows us to identify how the investigated biological objects (disease, state of patients, etc.) contribute to the further reconstruction of GRN in terms of both the symmetry and understanding the mechanism of molecular element interaction in a biological organism. Within the framework of our research, we have investigated the following metrics: Mutual information maximization (MIM) using various methods of Shannon entropy calculation, Pearson’s χ² test and correlation distance. The accuracy of the investigated samples classification was used as the main quality criterion to evaluate the appropriate metric effectiveness. The random forest classifier (RF) was used during the simulation process. The research results have shown that results of the use of various methods of Shannon entropy within the framework of the MIM metric disagree with each other. As a result, we have proposed the modified mutual information maximization (MMIM) proximity metric based on the joint use of various methods of Shannon entropy calculation and the Harrington desirability function. The results of the simulation have also shown that the correlation proximity metric is less effective in comparison to both the MMIM metric and Pearson’s χ² test. Finally, we propose the hybrid proximity metric (HPM) that considers both the MMIM metric and Pearson’s χ² test. The proposed metric was investigated within the framework of one-cluster structure effectiveness evaluation. To our mind, the main benefit of the proposed HPM is in increasing the objectivity of mutually similar gene expression profiles extraction due to the joint use of the various effective proximity metrics that can contradict each other when they are used alone.

A novel bio-inspired hybrid multi-filter wrapper gene selection method with ensemble classifier for microarray data. Neural Comput Appl 2021;35:11531-11561. [PMID: 34539088 PMCID: PMC8435304 DOI: 10.1007/s00521-021-06459-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2020] [Accepted: 08/26/2021] [Indexed: 01/04/2023]

Abstract

Microarray technology is known as one of the most important tools for collecting DNA expression data. This technology allows researchers to investigate and examine types of diseases and their origins. However, microarray data are often associated with a small sample size, a significant number of genes, imbalanced data, etc., making classification models inefficient. Thus, a new hybrid solution based on a multi-filter and adaptive chaotic multi-objective forest optimization algorithm (AC-MOFOA) is presented to solve the gene selection problem and construct the Ensemble Classifier. In the proposed solution, a multi-filter model (i.e., ensemble filter) is proposed as a preprocessing step to reduce the dataset's dimensions, using a combination of five filter methods to remove redundant and irrelevant genes. Accordingly, the results of the five filter methods are combined using a voting-based function. Additionally, the results of the proposed multi-filter indicate that it has good capability in reducing the gene subset size and selecting relevant genes. Then, an AC-MOFOA based on the concepts of non-dominated sorting, crowding distance, chaos theory, and adaptive operators is presented. AC-MOFOA, as a wrapper method, aims to reduce dataset dimensions, optimize KELM, and increase the accuracy of the classification simultaneously. Next, in this method, an ensemble classifier model is presented using AC-MOFOA results to classify microarray data. The performance of the proposed algorithm was evaluated on nine public microarray datasets, and its results were compared in terms of the number of selected genes, classification efficiency, execution time, time complexity, hypervolume indicator, and spacing metric with five hybrid multi-objective methods and three hybrid single-objective methods. According to the results, the proposed hybrid method could increase the accuracy of the KELM in most datasets by reducing the dataset's dimensions and achieve similar or superior performance compared to other multi-objective methods. Furthermore, the proposed Ensemble Classifier model could provide better classification accuracy and generalizability in seven of nine microarray datasets compared to conventional ensemble methods. Moreover, the comparison results of the Ensemble Classifier model with three state-of-the-art ensemble generation methods indicate its competitive performance, in which the proposed ensemble model achieved better results in five of nine datasets.

Gumaei A, Sammouda R, Al-Rakhami M, AlSalman H, El-Zaart A. Feature selection with ensemble learning for prostate cancer diagnosis from microarray gene expression. Health Informatics J 2021;27:1460458221989402. [PMID: 33570011 DOI: 10.1177/1460458221989402] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]

Kaneko H. Examining variable selection methods for the predictive performance of regression models and the proportion of selected variables and selected random variables. Heliyon 2021;7:e07356. [PMID: 34195450 PMCID: PMC8237311 DOI: 10.1016/j.heliyon.2021.e07356] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2021] [Revised: 05/02/2021] [Accepted: 06/16/2021] [Indexed: 11/24/2022] Open

Dashtban M, Li W. Predicting non-attendance in hospital outpatient appointments using deep learning approach. Health Syst (Basingstoke) 2021;11:189-210. [PMID: 36147556 PMCID: PMC9487947 DOI: 10.1080/20476965.2021.1924085] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022] Open

Pashaei E, Pashaei E. Gene selection using hybrid dragonfly black hole algorithm: A case study on RNA-seq COVID-19 data. Anal Biochem 2021;627:114242. [PMID: 33974890 DOI: 10.1016/j.ab.2021.114242] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2020] [Revised: 04/12/2021] [Accepted: 05/02/2021] [Indexed: 11/18/2022]

Mirsadeghi L, Haji Hosseini R, Banaei-Moghaddam AM, Kavousi K. EARN: an ensemble machine learning algorithm to predict driver genes in metastatic breast cancer. BMC Med Genomics 2021;14:122. [PMID: 33962648 PMCID: PMC8105935 DOI: 10.1186/s12920-021-00974-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/17/2020] [Accepted: 04/27/2021] [Indexed: 12/27/2022] Open

Abstract

BACKGROUND

Today, there are a lot of markers on the prognosis and diagnosis of complex diseases such as primary breast cancer. However, our understanding of the drivers that influence cancer aggression is limited.

METHODS

In this work, we study somatic mutation data consisting of 450 metastatic breast tumor samples from cBio Cancer Genomics Portal. We use four software tools to extract features from this data. Then, an ensemble classifier (EC) learning algorithm called EARN (Ensemble of Artificial Neural Network, Random Forest, and non-linear Support Vector Machine) is proposed to evaluate plausible driver genes for metastatic breast cancer (MBCA). The decision-making strategy for the proposed ensemble machine is based on the aggregation of the predicted scores obtained from individual learning classifiers to prioritize Homo sapiens genes annotated as protein-coding from NCBI.

RESULTS

This study is an attempt to focus on the findings in several aspects of MBCA prognosis and diagnosis. First, drivers and passengers predicted by SVM, ANN, RF, and EARN are introduced. Second, biological inferences of predictions are discussed based on gene set enrichment analysis. Third, statistical validation and comparison of all learning methods are performed by some evaluation metrics. Finally, the pathway enrichment analysis (PEA) using the ReactomeFIViz tool (FDR < 0.03) for the top 100 genes predicted by EARN leads us to propose a new gene set panel for MBCA. It includes HDAC3, ABAT, GRIN1, PLCB1, and KPNA2 as well as NCOR1, TBL1XR1, SIRT4, KRAS, CACNA1E, PRKCG, GPS2, SIN3A, ACTB, KDM6B, and PRMT1. Furthermore, we compare results for MBCA to other outputs regarding 983 primary tumor samples of breast invasive carcinoma (BRCA) obtained from the Cancer Genome Atlas (TCGA). The comparison between outputs shows that ROC-AUC reaches 99.24% using EARN for MBCA and 99.79% for BRCA. This statistical result is better than three individual classifiers in each case.

CONCLUSIONS

This research using an integrative approach assists precision oncologists to design compact targeted panels that eliminate the need for whole-genome/exome sequencing. The schematic representation of the proposed model is presented as the Graphic abstract.

Zhang G, Xue Z, Yan C, Wang J, Luo H. A Novel Biomarker Identification Approach for Gastric Cancer Using Gene Expression and DNA Methylation Dataset. Front Genet 2021;12:644378. [PMID: 33868380 PMCID: PMC8044773 DOI: 10.3389/fgene.2021.644378] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2020] [Accepted: 02/16/2021] [Indexed: 01/09/2023] Open

Hameed SS, Hassan WH, Latiff LA, Muhammadsharif FF. A comparative study of nature-inspired metaheuristic algorithms using a three-phase hybrid approach for gene selection and classification in high-dimensional cancer datasets. Soft comput 2021. [DOI: 10.1007/s00500-021-05726-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022]

Li H, Ding L, Hong X, Chen Y, Liao R, Wang T, Meng S, Jiang Z, Liu D. Integrative genomic expression analysis reveals stable differences between lung cancer and systemic sclerosis. BMC Cancer 2021;21:259. [PMID: 33691643 PMCID: PMC7944918 DOI: 10.1186/s12885-021-07959-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2021] [Accepted: 02/23/2021] [Indexed: 12/09/2022] Open

Abstract

BACKGROUND

The incidence and mortality of lung cancer are the highest among all cancers. Patients with systemic sclerosis show a four-fold greater risk of lung cancer than the general population. However, the underlying mechanism remains poorly understood.

METHODS

The expression profiles of 355 peripheral blood samples were integratedly analyzed, including 70 cases of lung cancer, 61 cases of systemic sclerosis, and 224 healthy controls. After data normalization and cleaning, differentially expressed genes (DEGs) between disease and control were obtained and deeply analyzed by bioinformatics methods. The gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis were performed online by DAVID and KOBAS. The protein-protein interaction (PPI) networks were constructed from the STRING database.

RESULTS

From a total of 14,191 human genes, 299 and 1644 genes were identified as DEGs in systemic sclerosis and lung cancer, respectively. Among them, 64 DEGs were overlapping, including 36 co-upregulated, 10 co-downregulated, and 18 counter-regulated DEGs. Functional and enrichment analysis showed that the two diseases had common changes in immune-related genes. The expression of innate immune response and response to virus-related genes increased significantly, while the expression of negative regulation of cell cycle-related genes decreased notably. In contrast, the expression of mitophagy regulation, chromatin binding and fatty acid metabolism-related genes showed distinct trends.

CONCLUSIONS

Stable differences and similarities between systemic sclerosis and lung cancer were revealed. In peripheral blood, enhanced innate immunity and weakened negative regulation of cell cycle may be the common mechanisms of the two diseases, which may be associated with the high risk of lung cancer in systemic sclerosis patients. On the other hand, the counter-regulated DEGs can be used as novel biomarkers of pulmonary diseases. In addition, fat metabolism-related DEGs were considered to be associated with clinical blood lipid data.

Lai CM, Huang HP. A gene selection algorithm using simplified swarm optimization with multi-filter ensemble technique. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2020.106994] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/17/2023]

Alharthi AM, Lee MH, Algamal ZY. Gene selection and classification of microarray gene expression data based on a new adaptive L1-norm elastic net penalty. INFORMATICS IN MEDICINE UNLOCKED 2021. [DOI: 10.1016/j.imu.2021.100622] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022] Open

Hamraz M, Gul N, Raza M, Khan DM, Khalil U, Zubair S, Khan Z. Robust proportional overlapping analysis for feature selection in binary classification within functional genomic experiments. PeerJ Comput Sci 2021;7:e562. [PMID: 34141889 PMCID: PMC8176540 DOI: 10.7717/peerj-cs.562] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2020] [Accepted: 05/04/2021] [Indexed: 05/10/2023]

Mahendran N, Durai Raj Vincent PM, Srinivasan K, Chang CY. Machine Learning Based Computational Gene Selection Models: A Survey, Performance Evaluation, Open Issues, and Future Research Directions. Front Genet 2020;11:603808. [PMID: 33362861 PMCID: PMC7758324 DOI: 10.3389/fgene.2020.603808] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2020] [Accepted: 10/29/2020] [Indexed: 12/20/2022] Open

García-Mendoza CV, Gambino OJ, Villarreal-Cervantes MG, Calvo H. Evolutionary Optimization of Ensemble Learning to Determine Sentiment Polarity in an Unbalanced Multiclass Corpus. ENTROPY 2020;22:e22091020. [PMID: 33286789 PMCID: PMC7597113 DOI: 10.3390/e22091020] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/07/2020] [Revised: 09/10/2020] [Accepted: 09/10/2020] [Indexed: 11/16/2022]

A survey on single and multi omics data mining methods in cancer data classification. J Biomed Inform 2020;107:103466. [DOI: 10.1016/j.jbi.2020.103466] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2019] [Revised: 05/01/2020] [Accepted: 05/31/2020] [Indexed: 01/09/2023]

Akramifard H, Balafar M, Razavi S, Ramli AR. Emphasis Learning, Features Repetition in Width Instead of Length to Improve Classification Performance: Case Study-Alzheimer's Disease Diagnosis. SENSORS (BASEL, SWITZERLAND) 2020;20:E941. [PMID: 32050715 PMCID: PMC7039233 DOI: 10.3390/s20030941] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/26/2019] [Revised: 10/28/2019] [Accepted: 10/28/2019] [Indexed: 01/21/2023]

Abstract

In the past decade, many studies have been conducted to advance computer-aided systems for Alzheimer's disease (AD) diagnosis. Most recently developed systems concentrate on extracting and combining features from MRI, PET, and CSF. For the most part, they have obtained very high performance. However, improving the performance of a classification problem is complicated, specifically when the model's accuracy or other performance measurements are higher than 90%. In this study, a novel methodology is proposed to address this problem, specifically in Alzheimer's disease diagnosis classification. This methodology is the first of its kind in the literature, based on the notion of replication on the feature space instead of the traditional sample space. Briefly, the main steps of the proposed method include extracting, embedding, and exploring the best subset of features. For feature extraction, we adopt VBM-SPM; for embedding features, a concatenation strategy is used on the features to ultimately create one feature vector for each subject. Principal component analysis is applied to extract new features, forming a low-dimensional compact space. A novel process is applied by replicating selected components, assessing the classification model, and repeating the replication until performance divergence or convergence. The proposed method aims to find the most significant features and the highest-performing model at the same time, to classify normal subjects from AD and mild cognitive impairment (MCI) patients. In each epoch, a small subset of candidate features is assessed by a support vector machine (SVM) classifier. This repeating procedure is continued until the highest performance is achieved. Experimental results reveal the highest performance reported in the literature for this specific classification problem. We obtained a model with accuracies of 98.81%, 81.61%, and 81.40% for AD vs. normal control (NC), MCI vs. NC, and AD vs. MCI classification, respectively.
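The abstract does not spell out the replication procedure, so the following is only a minimal sketch of what repeating features "in width" could mean: selected components are appended again to each subject's feature vector, which increases their weight in any distance-based comparison. The helper names `replicate_features` and `squared_distance` are hypothetical, and a plain squared Euclidean distance stands in for the SVM used in the paper.

```python
def replicate_features(vector, indices, copies):
    """Return a new vector with the chosen features appended `copies` extra times.

    Hypothetical helper: illustrates replication in the feature space (width),
    as opposed to duplicating whole samples (length).
    """
    extra = [vector[i] for i in indices] * copies
    return list(vector) + extra


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Two toy subjects with two components each (made-up values).
subject_a = [1.0, 5.0]
subject_b = [2.0, 5.5]

base = squared_distance(subject_a, subject_b)

# Replicate component 0 twice more in both subjects.
wide_a = replicate_features(subject_a, [0], 2)
wide_b = replicate_features(subject_b, [0], 2)
widened = squared_distance(wide_a, wide_b)

print(base, widened)
```

In this toy case the replicated component contributes three times to the distance instead of once, which is one way a classifier could be pushed to emphasize it.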

Al-Betar MA, Alomari OA, Abu-Romman SM. A TRIZ-inspired bat algorithm for gene selection in cancer classification. Genomics 2020;112:114-126. [DOI: 10.1016/j.ygeno.2019.09.015] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2019] [Revised: 09/05/2019] [Accepted: 09/17/2019] [Indexed: 10/25/2022]

MapReduce-Based Parallel Genetic Algorithm for CpG-Site Selection in Age Prediction. Genes (Basel) 2019;10:genes10120969. [PMID: 31775313 PMCID: PMC6947642 DOI: 10.3390/genes10120969] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2019] [Revised: 11/12/2019] [Accepted: 11/15/2019] [Indexed: 11/23/2022]

Abstract

Genomic biomarkers such as DNA methylation (DNAm) are employed for age prediction. In recent years, several studies have suggested an association between changes in DNAm and human age. The high-dimensional nature of this type of data significantly increases the execution time of modeling algorithms. To mitigate this problem, we propose a two-stage parallel algorithm for selection of age-related CpG-sites. The algorithm first attempts to cluster the data into similar age ranges. In the next stage, a parallel genetic algorithm (PGA), based on the MapReduce paradigm (MR-based PGA), is used for selecting age-related features of each individual age range. In the proposed method, the execution of the algorithm for each age range (data parallel), the evaluation of chromosomes (task parallel) and the calculation of the fitness function (data parallel) are performed using a novel parallel framework. In this paper, we consider 16 different healthy DNAm datasets that are related to the human blood tissue and that contain the relevant age information. These datasets are combined into a single unioned set, which is in turn randomly divided into train and test sets with a ratio of 7:3. We build a Gradient Boosting Regressor (GBR) model on the selected CpG-sites from the train set. To evaluate the model accuracy, we compared our results with state-of-the-art approaches that used these datasets, and observed that our method performs better on the unseen test dataset with a Mean Absolute Deviation (MAD) of 3.62 years, and a correlation (R²) of 95.96% between age and DNAm. In the train data, the MAD and R² are 1.27 years and 99.27%, respectively. Finally, we evaluate our method in terms of the effect of parallelization on computation time. The algorithm without parallelization requires 4123 min to complete, whereas the parallelized execution on 3 computing machines with 32 processing cores each takes a total of only 58 min. This shows that our proposed algorithm is both efficient and scalable.
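As a quick check on the timing figures quoted in the abstract, the speedup and a rough parallel efficiency can be worked out as below. This is only a back-of-the-envelope sketch: it assumes the non-parallel run used a single core, which the abstract does not state, so the efficiency figure in particular is indicative.

```python
# Figures taken from the abstract.
serial_minutes = 4123      # run time without parallelization
parallel_minutes = 58      # run time with parallelization
machines = 3
cores_per_machine = 32

total_cores = machines * cores_per_machine

# Speedup: how many times faster the parallel run is.
speedup = serial_minutes / parallel_minutes

# Efficiency: speedup per core, assuming a single-core serial baseline
# (an assumption, not something the abstract confirms).
efficiency = speedup / total_cores

print(f"speedup: {speedup:.1f}x on {total_cores} cores, "
      f"efficiency: {efficiency:.2f}")
```

Under that assumption the reported times correspond to a speedup of about 71 times on 96 cores, or roughly 0.74 of ideal linear scaling.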

Bir-Jmel A, Douiri SM, Elbernoussi S. Gene Selection via a New Hybrid Ant Colony Optimization Algorithm for Cancer Classification in High-Dimensional Data. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2019;2019:7828590. [PMID: 31737086 PMCID: PMC6815598 DOI: 10.1155/2019/7828590] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/04/2019] [Revised: 08/14/2019] [Accepted: 09/09/2019] [Indexed: 11/18/2022]

Sharma A, Rani R. C-HMOSHSSA: Gene selection for cancer classification using multi-objective meta-heuristic and machine learning methods. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2019;178:219-235. [PMID: 31416551 DOI: 10.1016/j.cmpb.2019.06.029] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/10/2019] [Revised: 06/24/2019] [Accepted: 06/27/2019] [Indexed: 05/21/2023]