Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Piao Y, Piao M, Park K, Ryu KH. An ensemble correlation-based gene selection algorithm for cancer classification with gene expression data. Bioinformatics 2012;28:3306-15. [PMID: 23060613 DOI: 10.1093/bioinformatics/bts602] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Cited by Other Article(s)

Genç M. Penalized logistic regression with prior information for microarray gene expression classification. Int J Biostat 2024;20:107-122. [PMID: 36427223 DOI: 10.1515/ijb-2022-0025] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2022] [Accepted: 11/07/2022] [Indexed: 02/17/2024]

Zhang W, Kenney T, Ho LST. Evolutionary shift detection with ensemble variable selection. BMC Ecol Evol 2024;24:11. [PMID: 38245667 PMCID: PMC10800078 DOI: 10.1186/s12862-024-02201-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2023] [Accepted: 01/10/2024] [Indexed: 01/22/2024] Open

Park S, Kim JH, Cha YK, Chung MJ, Woo JH, Park S. Application of Machine Learning Algorithm in Predicting Axillary Lymph Node Metastasis from Breast Cancer on Preoperative Chest CT. Diagnostics (Basel) 2023;13:2953. [PMID: 37761320 PMCID: PMC10528867 DOI: 10.3390/diagnostics13182953] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2023] [Revised: 09/05/2023] [Accepted: 09/13/2023] [Indexed: 09/29/2023] Open

A Survey on Feature Selection Techniques Based on Filtering Methods for Cyber Attack Detection. INFORMATION 2023. [DOI: 10.3390/info14030191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/19/2023] Open

Liu B, Zhai J, Wang W, Liu T, Liu C, Zhu X, Wang Q, Tian W, Zhang F. Identification of Tumor Microenvironment and DNA Methylation-Related Prognostic Signature for Predicting Clinical Outcomes and Therapeutic Responses in Cervical Cancer. Front Mol Biosci 2022;9:872932. [PMID: 35517856 PMCID: PMC9061945 DOI: 10.3389/fmolb.2022.872932] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2022] [Accepted: 03/17/2022] [Indexed: 01/14/2023] Open

Abstract

Background: Tumor microenvironment (TME) has been reported to have a strong association with tumor progression and therapeutic outcome, and epigenetic modifications such as DNA methylation can affect TMB and play an indispensable role in tumorigenesis. However, the potential mechanisms of TME and DNA methylation remain unclear in cervical cancer (CC).

Methods: The immune and stromal scores of TME were generated by the ESTIMATE algorithm for CC patients in The Cancer Genome Atlas (TCGA) database. The TME and DNA methylation-related genes were identified by the integrative analysis of DNA promoter methylation and gene expression. The least absolute shrinkage and selection operator (LASSO) Cox regression was performed 1,000 times to further identify a nine-gene TME and DNA methylation-related prognostic signature. The signature was further validated in a Gene Expression Omnibus (GEO) dataset. Then, the identified signature was integrated with the International Federation of Gynecology and Obstetrics (FIGO) stage to establish a composite prognostic nomogram.

Results: CC patients with high immunity levels have better survival than those with low immunity levels. Both in the training and validation datasets, the risk score of the signature was an independent prognostic factor. The composite nomogram showed higher accuracy of prognosis and greater net benefits than the FIGO stage and the signature. The high-risk group had a significantly higher fraction of genome altered than the low-risk group. Eleven genes were significantly different in mutation frequencies between the high- and low-risk groups. Interestingly, patients with mutant TTN had better overall survival (OS) than those with wild type. Patients in the low-risk group had significantly higher tumor mutational burden (TMB) than those in the high-risk group. Taken together, the results of TMB, immunophenoscore (IPS), and tumor immune dysfunction and exclusion (TIDE) score suggested that patients in the low-risk group may have greater immunotherapy benefits. Finally, four drugs (panobinostat, lenvatinib, everolimus, and temsirolimus) were found to have potential therapeutic implications for patients with a high-risk score.

Conclusions: Our findings highlight that the TME and DNA methylation-related prognostic signature can accurately predict the prognosis of CC and may be important for stratified management of patients and precision targeted therapy.

Ai H. GSEA-SDBE: A gene selection method for breast cancer classification based on GSEA and analyzing differences in performance metrics. PLoS One 2022;17:e0263171. [PMID: 35472078 PMCID: PMC9041804 DOI: 10.1371/journal.pone.0263171] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2021] [Accepted: 01/13/2022] [Indexed: 12/20/2022] Open

Abstract

MOTIVATION

Selecting the most relevant genes for sample classification is a common process in gene expression studies. Moreover, determining the smallest set of relevant genes that can achieve the required classification performance is particularly important in diagnosing cancer and improving treatment.

RESULTS

In this study, I propose a novel method to eliminate irrelevant and redundant genes, and thus determine the smallest set of relevant genes for breast cancer diagnosis. The method is based on random forest models, gene set enrichment analysis (GSEA), and my developed Sort Difference Backward Elimination (SDBE) algorithm; hence, the method is named GSEA-SDBE. Using this method, genes are filtered according to their importance following random forest training and GSEA is used to select genes by core enrichment of Kyoto Encyclopedia of Genes and Genomes pathways that are strongly related to breast cancer. Subsequently, the SDBE algorithm is applied to eliminate redundant genes and identify the most relevant genes for breast cancer diagnosis. In the SDBE algorithm, the differences in the Matthews correlation coefficients (MCCs) of performing random forest models are computed before and after the deletion of each gene to indicate the degree of redundancy of the corresponding deleted gene on the remaining genes during backward elimination. Next, the obtained MCC difference list is divided into two parts from a set position and each part is respectively sorted. By continuously iterating and changing the set position, the most relevant genes are stably assembled on the left side of the gene list, facilitating their identification, and the redundant genes are gathered on the right side of the gene list for easy elimination. A cross-comparison of the SDBE algorithm was performed by respectively computing differences between MCCs and ROC_AUC_score and then respectively using 10-fold classification models, e.g., random forest (RF), support vector machine (SVM), k-nearest neighbor (KNN), extreme gradient boosting (XGBoost), and extremely randomized trees (ExtraTrees). Finally, the classification performance of the proposed method was compared with that of three advanced algorithms for five cancer datasets. 
Results showed that analyzing MCC differences and using random forest models was the optimal solution for the SDBE algorithm. Accordingly, three consistently relevant genes (i.e., VEGFD, TSLP, and PKMYT1) were selected for the diagnosis of breast cancer. The performance metrics (MCC and ROC_AUC_score, respectively) of the random forest models based on 10-fold verification reached 95.28% and 98.75%. In addition, survival analysis showed that VEGFD and TSLP could be used to predict the prognosis of patients with breast cancer. Moreover, the proposed method significantly outperformed the other methods tested as it allowed selecting a smaller number of genes while maintaining the required classification accuracy.
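
The MCC difference that the abstract describes (model performance before and after deleting a gene) can be illustrated with a minimal sketch. This is not the author's SDBE implementation: the function names `mcc` and `deletion_differences` are hypothetical, the `score` callable stands in for whatever cross-validated model evaluation is used, and the sorting and set-position iteration of SDBE are not reproduced here.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom

def deletion_differences(genes, score):
    """Score change caused by deleting each gene once from the full list.

    `score` is a hypothetical callable mapping a list of genes to a
    performance value (for example, the MCC of a trained classifier).
    A difference near zero suggests the deleted gene was redundant
    given the genes that remain.
    """
    baseline = score(genes)
    return {g: baseline - score([h for h in genes if h != g]) for g in genes}

# Toy usage: performance depends only on whether gene "A" is present.
toy_score = lambda gs: 1.0 if "A" in gs else 0.5
diffs = deletion_differences(["A", "B"], toy_score)
```

In this toy run, deleting "B" changes nothing, so it would be a candidate for elimination, while deleting "A" costs performance.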

Asad E, Mollah AF. Biomarker Identification From Gene Expression Based on Symmetrical Uncertainty. INTERNATIONAL JOURNAL OF INTELLIGENT INFORMATION TECHNOLOGIES 2021. [DOI: 10.4018/ijiit.289966] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

Yang X, Wu W, Xin X, Su L, Xue L. Adaptive factorization rank selection-based NMF and its application in tumor recognition. INT J MACH LEARN CYB 2021. [DOI: 10.1007/s13042-021-01353-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]

Framework for the Ensemble of Feature Selection Methods. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app11178122] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]

Abstract

Feature selection (FS) has attracted the attention of many researchers in the last few years due to the increasing sizes of datasets, which contain hundreds or thousands of columns (features). Typically, not all columns represent relevant values. Consequently, the noise or irrelevant columns could confuse the algorithms, leading to a weak performance of machine learning models. Different FS algorithms have been proposed to analyze highly dimensional datasets and determine their subsets of relevant features to overcome this problem. However, very often, FS algorithms are biased by the data. Thus, methods for ensemble feature selection (EFS) algorithms have become an alternative to integrate the advantages of single FS algorithms and compensate for their disadvantages. The objective of this research is to propose a conceptual and implementation framework to understand the main concepts and relationships in the process of aggregating FS algorithms and to demonstrate how to address FS on datasets with high dimensionality. The proposed conceptual framework is validated by deriving an implementation framework, which incorporates a set of Python packages with functionalities to support the assembly of feature selection algorithms. The performance of the implementation framework was demonstrated in several experiments discovering relevant features in the Sonar, SPECTF, and WDBC datasets. The experiments contrasted the accuracy of two machine learning classifiers (decision tree and logistic regression), trained with subsets of features generated either by single FS algorithms or the set of features selected by the ensemble feature selection framework. We observed that for the three datasets used (Sonar, SPECTF, and WDBC), the highest precision percentages (86.95%, 74.73%, and 93.85%, respectively) were obtained when the classifiers were trained with the subset of features generated by our framework.
Additionally, the stability of the feature sets generated using our ensemble method was evaluated. The results showed that the method achieved perfect stability for the three datasets used in the evaluation.
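
One common way to aggregate the outputs of several FS algorithms is rank aggregation. The abstract does not say which aggregation rule the framework uses, so the Borda count below is an assumption chosen purely for illustration, and `borda_aggregate` is a hypothetical name.

```python
from collections import defaultdict

def borda_aggregate(rankings, top_k):
    """Combine several feature rankings with a Borda count.

    `rankings` is a list of lists, each ordering feature names from most
    to least relevant according to one FS algorithm. A feature earns more
    points the earlier it appears; ties are broken alphabetically so the
    result is deterministic.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            scores[feature] += n - position
    ordered = sorted(scores, key=lambda f: (-scores[f], f))
    return ordered[:top_k]

# Toy usage: three selectors that mostly agree on feature "a".
selected = borda_aggregate(
    [["a", "b", "c"], ["b", "a", "c"], ["a", "c", "b"]],
    top_k=2,
)
```

A feature that one selector ranks highly by chance is outvoted by the others, which is the kind of bias compensation the abstract attributes to ensemble approaches.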

Gomes J, Kong J, Kurc T, Melo ACMA, Ferreira R, Saltz JH, Teodoro G. Building robust pathology image analyses with uncertainty quantification. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2021;208:106291. [PMID: 34333205 DOI: 10.1016/j.cmpb.2021.106291] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/12/2021] [Accepted: 07/09/2021] [Indexed: 06/13/2023]

Ge C, Luo L, Zhang J, Meng X, Chen Y. FRL: An Integrative Feature Selection Algorithm Based on the Fisher Score, Recursive Feature Elimination, and Logistic Regression to Identify Potential Genomic Biomarkers. BIOMED RESEARCH INTERNATIONAL 2021;2021:4312850. [PMID: 34235216 PMCID: PMC8218915 DOI: 10.1155/2021/4312850] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/22/2021] [Accepted: 05/21/2021] [Indexed: 01/06/2023]

Abstract

Accurate screening on cancer biomarkers contributes to health assessment, drug screening, and targeted therapy for precision medicine. The rapid development of high-throughput sequencing technology has identified abundant genomic biomarkers, but most of them are limited to single-cancer analysis. Based on the combination of Fisher score, Recursive feature elimination, and Logistic regression (FRL), this paper proposes an integrative feature selection algorithm named FRL to explore potential cancer genomic biomarkers on cancer subsets. Fisher score is initially used to calculate the weights of genes to rapidly reduce the dimension. Recursive feature elimination and Logistic regression are then jointly employed to extract the optimal subset. Compared to the current differential expression analysis tool GEO2R based on the Limma algorithm, FRL has greater classification precision than Limma. Compared with five traditional feature selection algorithms, FRL exhibits excellent performance on accuracy (ACC) and F1-score and greatly improves computational efficiency. On high-noise datasets such as esophageal cancer, the ACC of FRL is 30% superior to the average ACC achieved with other traditional algorithms. As biomarkers found in multiple studies are more reliable and reproducible, and reveal stronger association on potential clinical value than single analysis, through literature review and spatial analyses of gene functional enrichment and functional pathways, we conduct cluster analysis on 10 diverse cancers with high mortality and form a potential biomarker module comprising 19 genes. All genes in this module can serve as potential biomarkers to provide more information on the overall oncogenesis mechanism for the detection of diverse early cancers and assist in targeted anticancer therapies for further developments in precision medicine.
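
The first FRL stage, weighting genes by Fisher score, can be sketched as follows. This uses one widely used definition of the Fisher score (between-class scatter divided by within-class scatter for a single feature). Whether FRL uses exactly this variant is not stated in the abstract, and the function name is hypothetical.

```python
from statistics import mean, pvariance

def fisher_score(values, labels):
    """Fisher score of one gene: between-class over within-class scatter.

    `values` holds the gene's expression per sample and `labels` the
    class of each sample. Higher scores mean the class means are far
    apart relative to the spread inside each class.
    """
    overall = mean(values)
    between = 0.0
    within = 0.0
    for cls in set(labels):
        group = [v for v, y in zip(values, labels) if y == cls]
        between += len(group) * (mean(group) - overall) ** 2
        within += len(group) * pvariance(group)
    if within == 0:
        # No spread inside any class: perfectly separating or constant gene.
        return float("inf") if between > 0 else 0.0
    return between / within

# Toy usage: a gene that separates two classes well, and one that does not.
informative = fisher_score([1, 2, 3, 7, 8, 9], [0, 0, 0, 1, 1, 1])
uninformative = fisher_score([1, 9, 1, 9], [0, 0, 1, 1])
```

Genes would then be sorted by this score and only the top-ranked ones passed on to the recursive feature elimination and logistic regression stage.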

Fuzzy measure with regularization for gene selection and cancer prediction. INT J MACH LEARN CYB 2021. [DOI: 10.1007/s13042-021-01319-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/30/2023]

Nazari E, Farzin AH, Aghemiri M, Avan A, Tara M, Tabesh H. Deep Learning for Acute Myeloid Leukemia Diagnosis. J Med Life 2020;13:382-387. [PMID: 33072212 PMCID: PMC7550141 DOI: 10.25122/jml-2019-0090] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open

A novel dictionary learning method based on total least squares approach with application in high dimensional biological data. ADV DATA ANAL CLASSI 2020. [DOI: 10.1007/s11634-020-00417-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]

Comprehensive relative importance analysis and its applications to high dimensional gene expression data analysis. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.106120] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]

Li M, Zhang C, Zhou L, Li S, Cao YJ, Wang L, Xiang R, Shi Y, Piao Y. Identification and validation of novel DNA methylation markers for early diagnosis of lung adenocarcinoma. Mol Oncol 2020;14:2744-2758. [PMID: 32688456 PMCID: PMC7607165 DOI: 10.1002/1878-0261.12767] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2020] [Revised: 06/07/2020] [Accepted: 07/16/2020] [Indexed: 12/15/2022] Open

Abdulrauf Sharifai G, Zainol Z. Feature Selection for High-Dimensional and Imbalanced Biomedical Data Based on Robust Correlation Based Redundancy and Binary Grasshopper Optimization Algorithm. Genes (Basel) 2020;11:genes11070717. [PMID: 32605144 PMCID: PMC7397300 DOI: 10.3390/genes11070717] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2019] [Revised: 12/19/2019] [Accepted: 01/07/2020] [Indexed: 11/16/2022] Open

SGL-SVM: A novel method for tumor classification via support vector machine with sparse group Lasso. J Theor Biol 2020;486:110098. [DOI: 10.1016/j.jtbi.2019.110098] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2019] [Revised: 11/27/2019] [Accepted: 11/28/2019] [Indexed: 02/07/2023]

Xu W, Xu M, Wang L, Zhou W, Xiang R, Shi Y, Zhang Y, Piao Y. Integrative analysis of DNA methylation and gene expression identified cervical cancer-specific diagnostic biomarkers. Signal Transduct Target Ther 2019;4:55. [PMID: 31871774 PMCID: PMC6908647 DOI: 10.1038/s41392-019-0081-6] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2019] [Revised: 04/25/2019] [Accepted: 05/10/2019] [Indexed: 12/24/2022] Open

Zhao Q, Zhang Y. Ensemble Method of Feature Selection and Reverse Construction of Gene Logical Network Based on Information Entropy. INT J PATTERN RECOGN 2019. [DOI: 10.1142/s0218001420590041] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]

Symmetrical Uncertainty-Based Feature Subset Generation and Ensemble Learning for Electricity Customer Classification. Symmetry (Basel) 2019. [DOI: 10.3390/sym11040498] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open

Novel tumor suppressor SPRYD4 inhibits tumor progression in hepatocellular carcinoma by inducing apoptotic cell death. Cell Oncol (Dordr) 2018;42:55-66. [PMID: 30238408 DOI: 10.1007/s13402-018-0407-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 08/29/2018] [Indexed: 02/06/2023] Open

Abstract

BACKGROUND

Hepatocellular carcinoma (HCC) is one of the leading causes of cancer-associated deaths worldwide. Although recent studies have proposed different biomarkers for HCC progression and therapy resistance, a better understanding of the molecular mechanisms underlying HCC progression and recurrence, as well as the identification of molecular markers with a higher diagnostic accuracy, are necessary for the development of more effective clinical management strategies. Here, we aimed to identify novel players in HCC progression.

METHODS

SPRYD4 mRNA and protein expression analyses were carried out on a normal liver-derived cell line (HL-7702) and four HCC-derived cell lines (HepG2, SMMC7721, Huh-7, BEL-7402) using qRT-PCR and Western blotting, respectively. Cell proliferation Cell Counting Kit-8 (CCK-8) assays, protein expression analyses for apoptosis markers using Western blotting, and Caspase-Glo 3/7 apoptosis assays were carried out on the four HCC-derived cell lines. Expression comparison, functional annotation, gene set enrichment, correlation and survival analyses were carried out on patient data retrieved from the NCBI Gene module, the NCBI GEO database and the TCGA database.

RESULTS

Through a meta-analysis we found that the expression of SPRYD4 was downregulated in primary HCC tissues compared to non-tumor tissues. We also found that the expression of SPRYD4 was downregulated in HCC-derived cells compared to normal liver-derived cells. Subsequently, we found that the expression of SPRYD4 was inversely correlated with a gene signature associated with HCC cell proliferation. Exogenous SPRYD4 expression was found to inhibit HCC cell proliferation by inducing apoptotic cell death. We also found that SPRYD4 expression was associated with a good prognosis and that its expression became downregulated when HCCs progressed towards more aggressive stages and higher grades. Finally, we found that SPRYD4 expression may serve as a biomarker for a good overall and relapse-free survival in HCC patients.

CONCLUSIONS

Our data indicate that a decreased SPRYD4 expression may serve as an independent predictor for a poor prognosis in patients with HCC and that increased SPRYD4 expression may reduce HCC growth and progression through the induction of apoptotic cell death, thereby providing a potential therapeutic target.

Xia CQ, Han K, Qi Y, Zhang Y, Yu DJ. A Self-Training Subspace Clustering Algorithm under Low-Rank Representation for Cancer Classification on Gene Expression Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018;15:1315-1324. [PMID: 28600258 PMCID: PMC5986621 DOI: 10.1109/tcbb.2017.2712607] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]

A novel effective diagnosis model based on optimized least squares support machine for gene microarray. Appl Soft Comput 2018. [DOI: 10.1016/j.asoc.2018.02.009] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]

Evaluation measures for cluster ensembles based on a fuzzy generalized Rand index. Appl Soft Comput 2017. [DOI: 10.1016/j.asoc.2017.03.030] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022]

Piao Y, Piao M, Ryu KH. Multiclass cancer classification using a feature subset-based ensemble from microRNA expression profiles. Comput Biol Med 2017;80:39-44. [DOI: 10.1016/j.compbiomed.2016.11.008] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2016] [Revised: 11/15/2016] [Accepted: 11/20/2016] [Indexed: 11/24/2022]

Salem H, Attiya G, El-Fishawy N. Classification of human cancer diseases by gene expression profiles. Appl Soft Comput 2017. [DOI: 10.1016/j.asoc.2016.11.026] [Citation(s) in RCA: 64] [Impact Index Per Article: 9.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023]

Neumann U, Riemenschneider M, Sowa JP, Baars T, Kälsch J, Canbay A, Heider D. Compensation of feature selection biases accompanied with improved predictive performance for binary classification by using a novel ensemble feature selection approach. BioData Min 2016;9:36. [PMID: 27891179 PMCID: PMC5116216 DOI: 10.1186/s13040-016-0114-4] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2016] [Accepted: 10/27/2016] [Indexed: 11/10/2022] Open

Devi Arockia Vanitha C, Devaraj D, Venkatesulu M. Multiclass cancer diagnosis in microarray gene expression profile using mutual information and Support Vector Machine. INTELL DATA ANAL 2016. [DOI: 10.3233/ida-150203] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023]

Izadi F, Zarrini HN, Kiani G, Jelodar NB. A comparative analytical assay of gene regulatory networks inferred using microarray and RNA-seq datasets. Bioinformation 2016;12:340-346. [PMID: 28293077 PMCID: PMC5320930 DOI: 10.6026/97320630012340] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2016] [Revised: 08/05/2016] [Accepted: 08/06/2016] [Indexed: 01/16/2023] Open

Sun S, Peng Q, Zhang X. Global feature selection from microarray data using Lagrange multipliers. Knowl Based Syst 2016. [DOI: 10.1016/j.knosys.2016.07.035] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]

Chen H, Zhang Y, Gutman I. A kernel-based clustering method for gene selection with gene expression data. J Biomed Inform 2016;62:12-20. [DOI: 10.1016/j.jbi.2016.05.007] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2015] [Revised: 05/08/2016] [Accepted: 05/19/2016] [Indexed: 12/21/2022]

Nayyeri M, Sharifi Noghabi H. Cancer classification by correntropy-based sparse compact incremental learning machine. GENE REPORTS 2016. [DOI: 10.1016/j.genrep.2016.01.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]

Mohammadi M, Sharifi Noghabi H, Abed Hodtani G, Rajabi Mashhadi H. Robust and stable gene selection via Maximum–Minimum Correntropy Criterion. Genomics 2016;107:83-87. [DOI: 10.1016/j.ygeno.2015.12.006] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2015] [Revised: 12/13/2015] [Accepted: 12/23/2015] [Indexed: 11/17/2022]

Mishra S, Mishra D. Enhanced gene ranking approaches using modified trace ratio algorithm for gene expression data. INFORMATICS IN MEDICINE UNLOCKED 2016. [DOI: 10.1016/j.imu.2016.09.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open

Liao B, Jiang Y, Liang W, Peng L, Peng L, Hanyurwimfura D, Li Z, Chen M. On Efficient Feature Ranking Methods for High-Throughput Data Analysis. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2015;12:1374-1384. [PMID: 26684461 DOI: 10.1109/tcbb.2015.2415790] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]

Li P, Piao Y, Shon HS, Ryu KH. Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data. BMC Bioinformatics 2015;16:347. [PMID: 26511205 PMCID: PMC4625728 DOI: 10.1186/s12859-015-0778-7] [Citation(s) in RCA: 103] [Impact Index Per Article: 11.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2015] [Accepted: 10/14/2015] [Indexed: 01/08/2023] Open

Abstract

Background

Recently, rapid improvements in technology and decrease in sequencing costs have made RNA-Seq a widely used technique to quantify gene expression levels. Various normalization approaches have been proposed, owing to the importance of normalization in the analysis of RNA-Seq data. A comparison of recently proposed normalization methods is required to generate suitable guidelines for the selection of the most appropriate approach for future experiments.

Results

In this paper, we compared eight non-abundance (RC, UQ, Med, TMM, DESeq, Q, RPKM, and ERPKM) and two abundance estimation normalization methods (RSEM and Sailfish). The experiments were based on real Illumina high-throughput RNA-Seq of 35- and 76-nucleotide sequences produced in the MAQC project and simulation reads. Reads were mapped to the human genome obtained from the UCSC Genome Browser Database. For precise evaluation, we investigated Spearman correlation between the normalization results from RNA-Seq and MAQC qRT-PCR values for 996 genes. Based on this work, we showed that out of the eight non-abundance estimation normalization methods, RC, UQ, Med, TMM, DESeq, and Q gave similar normalization results for all data sets. For RNA-Seq of a 35-nucleotide sequence, RPKM showed the highest correlation results, but for RNA-Seq of a 76-nucleotide sequence, it showed lower correlation than the other methods. ERPKM did not improve on the results of RPKM. Between the two abundance estimation normalization methods, for RNA-Seq of a 35-nucleotide sequence, higher correlation was obtained with Sailfish than with RSEM, which was better than without using abundance estimation methods. However, for RNA-Seq of a 76-nucleotide sequence, the results achieved by RSEM were similar to those obtained without applying abundance estimation methods, and were much better than with Sailfish. Furthermore, we found that adding a poly-A tail increased alignment numbers, but did not improve normalization results.
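
The evaluation metric used above, Spearman correlation between normalized RNA-Seq values and qRT-PCR values, is the Pearson correlation of the two rank vectors. The sketch below is a plain-Python illustration with hypothetical function names, not the authors' pipeline; tied values receive their average rank.

```python
import math

def average_ranks(values):
    """1-based ranks of `values`, with ties sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one input is constant
    return cov / (sx * sy)

# Toy usage: expression values that rise together on very different scales.
rho = spearman([1, 2, 3, 4], [10, 100, 1000, 10000])
```

Because only ranks matter, the metric is insensitive to the scale differences between normalized read counts and qRT-PCR measurements.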

Conclusion

Spearman correlation analysis revealed that RC, UQ, Med, TMM, DESeq, and Q did not noticeably improve gene expression normalization, regardless of read length. Other normalization methods were more efficient when alignment accuracy was low; Sailfish with RPKM gave the best normalization results. When alignment accuracy was high, RC was sufficient for gene expression calculation. We also suggest ignoring the poly-A tail during differential gene expression analysis.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0778-7) contains supplementary material, which is available to authorized users.

Sachnev V, Saraswathi S, Niaz R, Kloczkowski A, Suresh S. Multi-class BCGA-ELM based classifier that identifies biomarkers associated with hallmarks of cancer. BMC Bioinformatics 2015;16:166. [PMID: 25986937 PMCID: PMC4448565 DOI: 10.1186/s12859-015-0565-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2015] [Accepted: 03/31/2015] [Indexed: 12/05/2022] Open

Abstract

Background

Traditional cancer treatments have centered on cytotoxic drugs and general purpose chemotherapy that may not be tailored to treat specific cancers. Identification of molecular markers that are related to different types of cancers might lead to discovery of drugs that are patient and disease specific. This study aims to use microarray gene expression cancer data to identify biomarkers that are indicative of different types of cancers. Our aim is to provide a multi-class cancer classifier that can simultaneously differentiate between cancers and identify type-specific biomarkers, through the application of the Binary Coded Genetic Algorithm (BCGA) and a neural network based Extreme Learning Machine (ELM) algorithm.

Results

BCGA and ELM are combined and used to select a subset of genes that are present in the Global Cancer Mapping (GCM) data set. This set of candidate genes contains over 52 biomarkers that are related to multiple cancers, according to the literature. They include APOA1, VEGFC, YWHAZ, B2M, EIF2S1, CCR9 and many other genes that have been associated with the hallmarks of cancer. BCGA-ELM is tested on several cancer data sets and the results are compared to other classification methods. BCGA-ELM is comparable to or exceeds other algorithms in terms of accuracy. We were also able to show that over 50% of genes selected by BCGA-ELM on GCM data are cancer-related biomarkers.

Conclusions

We were able to simultaneously differentiate between 14 different types of cancers, using only 92 genes, to achieve a multi-class classification accuracy of 95.4% which is between 21.6% and 38% higher than other results in the literature for multi-class cancer classification. Our findings suggest that computational algorithms such as BCGA-ELM can facilitate biomarker-driven integrated cancer research that can lead to a detailed understanding of the complexities of cancer.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0565-5) contains supplementary material, which is available to authorized users.

Yang L, Ainali C, Kittas A, Nestle FO, Papageorgiou LG, Tsoka S. Pathway-level disease data mining through hyper-box principles. Math Biosci 2015;260:25-34. [DOI: 10.1016/j.mbs.2014.09.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/12/2014] [Revised: 09/11/2014] [Accepted: 09/13/2014] [Indexed: 01/16/2023]

Liao B, Jiang Y, Liang W, Zhu W, Cai L, Cao Z. Gene Selection Using Locality Sensitive Laplacian Score. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2014;11:1146-1156. [PMID: 26357051 DOI: 10.1109/tcbb.2014.2328334] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]

Sun S, Peng Q, Shakoor A. A kernel-based multivariate feature selection method for microarray data classification. PLoS One 2014;9:e102541. [PMID: 25048512 PMCID: PMC4105478 DOI: 10.1371/journal.pone.0102541] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2014] [Accepted: 06/20/2014] [Indexed: 11/19/2022] Open