51
Resting-State Functional Network Scale Effects and Statistical Significance-Based Feature Selection in Machine Learning Classification. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2019; 2019:9108108. [PMID: 31781290 PMCID: PMC6875180 DOI: 10.1155/2019/9108108] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 04/18/2019] [Revised: 08/04/2019] [Accepted: 09/06/2019] [Indexed: 12/17/2022]
Abstract
In recent years, topological features of functional brain networks have been widely used as classification features. Previous studies have found that differences in network node scale, caused by different parcellation definitions, significantly affect the structure of the constructed network and its topological properties. However, it remains unknown how differences in network scale affect classification accuracy, the performance of classification features, and the effectiveness of P value-based feature selection in machine learning classification. This study used five parcellation scales, with 90, 256, 497, 1003, and 1501 nodes. Three local properties of resting-state functional brain networks were selected (degree, betweenness centrality, and nodal efficiency), and support vector machines were used to construct classifiers to identify patients with major depressive disorder. We analyzed the impact of the five scales on classification accuracy. In addition, we compared the effectiveness and redundancy of the features obtained from the different parcellation scales. Finally, we evaluated traditional statistical significance (P value) as a feature selection criterion. The results showed that feature effectiveness was similar across scales; in other words, parcellations with more regions did not provide more effective discriminative features. Nevertheless, parcellations with more regions did provide a greater number of discriminative features, which improved classification accuracy. However, because the brain regions in finer parcellations lie close together, the redundancy of their features was also greater. The traditional P value feature selection strategy is feasible at all scales, but our analysis showed that the traditional P < 0.05 threshold was too strict for feature selection.
This study provides an important reference for the selection of network scales when applying topological properties of brain networks to machine learning methods.
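The P value filter evaluated in this study can be sketched as a per-feature two-group test followed by a threshold. This minimal sketch uses only the Python standard library. It assumes Welch's t statistic with a normal approximation to its null distribution; that is a simplification, since the abstract does not state which test was used. It also exposes `alpha`, so that thresholds looser than 0.05 can be compared. The data are made up.

```python
import math
from statistics import mean, variance


def welch_p_value(a, b):
    """Approximate two-sided p-value for a difference in group means.

    Computes Welch's t statistic and, as a simplifying assumption, refers it
    to a standard normal distribution instead of Student's t. This is only
    reasonable for fairly large groups; use an exact t-test in practice.
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0.0:
        # Neither group varies: identical means give p = 1, otherwise p = 0.
        return 1.0 if diff == 0 else 0.0
    t = diff / se
    # P(|Z| >= |t|) for Z ~ N(0, 1).
    return math.erfc(abs(t) / math.sqrt(2.0))


def select_by_p_value(group_a, group_b, alpha=0.05):
    """Indices of features whose between-group p-value is below alpha.

    group_a, group_b: lists of subjects; each subject is a list of feature
    values (for example, one nodal-degree value per network node).
    """
    selected = []
    for j in range(len(group_a[0])):
        a = [subject[j] for subject in group_a]
        b = [subject[j] for subject in group_b]
        if welch_p_value(a, b) < alpha:
            selected.append(j)
    return selected


# Toy data: feature 0 separates the groups, feature 1 does not.
patients = [[1.0, 5.0], [1.1, 6.0], [0.9, 5.0], [1.0, 6.0], [1.05, 5.5]]
controls = [[3.0, 5.0], [3.1, 6.0], [2.9, 5.0], [3.0, 6.0], [3.05, 5.5]]
print(select_by_p_value(patients, controls))             # [0]
print(select_by_p_value(patients, controls, alpha=0.2))  # [0]
```

In a real pipeline, each feature column would hold one network metric at one parcellation scale. Raising `alpha` above 0.05 admits more features, which is the trade-off the study examines.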
52
Alanni R, Hou J, Azzawi H, Xiang Y. Deep gene selection method to select genes from microarray datasets for cancer classification. BMC Bioinformatics 2019; 20:608. [PMID: 31775613 PMCID: PMC6880643 DOI: 10.1186/s12859-019-3161-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/06/2019] [Accepted: 10/15/2019] [Indexed: 12/15/2022] [Open access]
Abstract
Background: Microarray datasets consist of complex, high-dimensional samples and genes, and the number of samples is generally much smaller than the number of genes. Because of this imbalance, gene selection is a demanding task in microarray expression data analysis. Results: The gene set selected by the deep gene selection (DGS) method showed superior performance in cancer classification. DGS substantially reduces the number of genes in the original microarray datasets. Experimental comparisons with other representative and state-of-the-art gene selection methods also showed that DGS achieved the best performance in terms of the number of selected genes, classification accuracy, and computational cost. Conclusions: We provide an efficient gene selection algorithm that selects relevant genes that are significantly sensitive to the samples' classes. Using the few discriminative genes it selects, and at lower computational cost, the proposed algorithm achieved high prediction accuracy on several public microarray datasets, which in turn verifies the efficiency and effectiveness of the proposed gene selection method.
Affiliation(s)
- Russul Alanni
  - School of Information Technology, Deakin University, Geelong, Victoria, Australia
- Jingyu Hou
  - School of Information Technology, Deakin University, Geelong, Victoria, Australia
- Hasseeb Azzawi
  - School of Information Technology, Deakin University, Geelong, Victoria, Australia
- Yong Xiang
  - School of Information Technology, Deakin University, Geelong, Victoria, Australia
53
Paul A, Sil J. Identification of Differentially Expressed Genes to Establish New Biomarker for Cancer Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019; 16:1970-1985. [PMID: 29994718 DOI: 10.1109/tcbb.2018.2837095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
The goal of the human genome project is to integrate genetic information into different clinical therapies. To achieve this goal, various computational algorithms have been devised to identify biomarker genes, the causes of complex diseases. However, most methods developed so far using DNA microarray data fall short in interpreting biological findings and are less accurate in disease prediction. In this paper, we propose two parameters, risk_factor and confusion_factor, to identify genes that are biologically significant for cancer development. First, we evaluate the risk_factor of each gene; genes with a nonzero risk_factor lead to misclassification of the data and are therefore removed. Next, we calculate the confusion_factor of the remaining genes, which measures how much a gene confuses prediction because the samples in the cancer and normal classes lie close together. We apply the nondominated sorting genetic algorithm (NSGA-II) to select maximally uncorrelated genes that are differentially expressed in the cancer class and have minimum confusion_factor. The proposed Gene Selection Explore (GSE) algorithm is compared with well-established feature selection algorithms on 10 microarray datasets with respect to sensitivity, specificity, and accuracy. The identified genes appear in KEGG pathways and are of biological importance.
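NSGA-II ranks candidate solutions by Pareto dominance before it applies crowding distance and genetic operators. The sketch below shows only that nondominated-sorting step, using a simple peeling loop rather than the fast sort from the original NSGA-II description. It assumes each candidate is scored by two objectives to minimise: its confusion_factor and its correlation with other selected genes. The objective values are hypothetical, and crowding distance, crossover and mutation are omitted.

```python
def dominates(p, q):
    """True if objective vector p Pareto-dominates q (every objective minimised)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def nondominated_fronts(objectives):
    """Split candidates into successive Pareto fronts, as in NSGA-II ranking.

    objectives: one tuple per candidate, for example
    (confusion_factor, mean absolute correlation with other selected genes).
    Returns a list of fronts, each a list of candidate indices.
    """
    remaining = list(range(len(objectives)))
    fronts = []
    while remaining:
        # A candidate is on the current front if nothing left dominates it.
        front = [
            i for i in remaining
            if not any(dominates(objectives[j], objectives[i])
                       for j in remaining if j != i)
        ]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts


# Hypothetical candidates: (confusion_factor, correlation), both minimised.
candidates = [(0.1, 0.9), (0.2, 0.2), (0.3, 0.3), (0.9, 0.1), (0.2, 0.5)]
print(nondominated_fronts(candidates))  # [[0, 1, 3], [2, 4]]
```

Candidate 2 is dominated by candidate 1, which is better on both objectives, so it drops to the second front.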
54
Su R, Wu H, Xu B, Liu X, Wei L. Developing a Multi-Dose Computational Model for Drug-Induced Hepatotoxicity Prediction Based on Toxicogenomics Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019; 16:1231-1239. [PMID: 30040651 DOI: 10.1109/tcbb.2018.2858756] [Citation(s) in RCA: 85] [Impact Index Per Article: 17.0] [Indexed: 06/08/2023]
Abstract
Drug-induced hepatotoxicity may cause acute and chronic liver disease and is a major concern for patient safety. It is also one of the main reasons drugs are withdrawn from the market. Toxicogenomics data have been widely used in hepatotoxicity prediction. In this study, we propose a multi-dose computational model to predict drug-induced hepatotoxicity based on gene expression and toxicity data. The dose/concentration information after drug treatment is fully exploited through the dose-response curve, giving a more informative representation of the dose-response relationship. We also propose a new feature selection method, named MEMO, an important component of our multi-dose model, to deal with the high-dimensional toxicogenomics data. We validated the proposed model using TG-GATEs, a large database that records toxicogenomics data from multiple views. The experimental results show that drug-induced hepatotoxicity can be predicted with high accuracy and efficiency using the proposed model.
55
Liu H, Hu QV, He L. Term-Based Personalization for Feature Selection in Clinical Handover Form Auto-Filling. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019; 16:1219-1230. [PMID: 30296238 DOI: 10.1109/tcbb.2018.2874237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Feature learning and selection have been widely applied in many research areas because of their good performance and low complexity. Traditional methods usually apply the same feature set to all terms, so performance can suffer when wrong features introduce noisy information for a given term. In this paper, we propose a term-based personalization approach that finds the best features for each term. First, the features are given as input, so that we can focus on selection strategies. Second, the importance of each feature subset to a given term is evaluated by a term-feature probabilistic relevance model. Because evaluating all possible feature subsets is computationally intensive, we present a feature searching method that generates candidate feature subsets for each term. Finally, we obtain the personalized feature set for each term as a subset of all features. Experiments on the NICTA Synthetic Nursing Handover dataset show that our approach is promising and effective.
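The abstract does not describe its feature searching method in detail. A common way to avoid scoring all 2^n subsets is greedy forward search: add the single feature that most improves a term's score, and stop when nothing helps. In the sketch below, `additive_score` and the feature-group names are hypothetical stand-ins for the paper's term-feature probabilistic relevance model, which is not reproduced here.

```python
def greedy_feature_search(features, score, max_size=None):
    """Greedy forward search for a feature subset that maximises score(subset).

    Instead of scoring all 2**n subsets, each step adds the single feature that
    most improves the score, and the search stops when no addition helps.
    """
    selected, best = [], score([])
    remaining = list(features)
    while remaining and (max_size is None or len(selected) < max_size):
        candidate = max(remaining, key=lambda f: score(selected + [f]))
        candidate_score = score(selected + [candidate])
        if candidate_score <= best:
            break  # no remaining feature improves this term's score
        selected.append(candidate)
        remaining.remove(candidate)
        best = candidate_score
    return selected, best


def additive_score(relevance, penalty):
    """Hypothetical stand-in for a term-feature relevance model."""
    return lambda subset: sum(relevance[f] for f in subset) - penalty * len(subset)


# Hypothetical per-term relevance values for four feature groups.
relevance = {"pos_tag": 0.1, "word_shape": 0.4,
             "dictionary_match": 0.7, "context_window": 0.3}
chosen, value = greedy_feature_search(relevance, additive_score(relevance, 0.25))
print(chosen)  # ['dictionary_match', 'word_shape', 'context_window']
```

Running the search once per term, each with its own `score`, yields a personalized subset per term.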
56
Alsmadi I, Gan KH. Review of short-text classification. INTERNATIONAL JOURNAL OF WEB INFORMATION SYSTEMS 2019. [DOI: 10.1108/ijwis-12-2017-0083] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 11/17/2022]
Abstract
Purpose: Rapid developments in social networks and their use in everyday life have caused an explosion in the number of short electronic documents. Classifying these documents by content therefore matters in many applications, and assigning them to relevant classes according to their text content is of interest for many practical reasons. Short-text classification is an essential step in many applications, such as spam filtering, sentiment analysis, Twitter personalization, customer reviews, and many other social-network applications. Reviews of short text and its applications are limited. This paper therefore discusses the characteristics of short text and the challenges and difficulties of classifying it. The paper introduces, in principle, all stages of classification, the techniques used in each stage, and the possible development trend in each stage.
Design/methodology/approach: The paper reviews the main aspects of short-text classification and is structured according to the stages of the classification task.
Findings: This paper discusses related issues and approaches to these problems. Further research could address the challenges of short texts and avoid poor classification accuracy. Low performance can be addressed with optimized solutions, such as genetic algorithms, which are powerful in enhancing the quality of selected features. Soft-computing solutions such as fuzzy logic make short-text problems a promising area of research.
Originality/value: A powerful short-text classification method can significantly improve the efficiency of many applications. Current solutions still perform poorly, implying the need for improvement.
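The review mentions genetic algorithms as a way to improve the quality of selected features. The following is a generic minimal sketch of that idea, not a method from the review. Each individual is a bitmask over features, and `toy_fitness` is a hypothetical stand-in for what would normally be the cross-validated accuracy of a short-text classifier on the selected features. Population size, generations and mutation rate are arbitrary assumptions.

```python
import random


def ga_feature_selection(n_features, fitness, pop_size=20, generations=30,
                         mutation_rate=0.1, seed=0):
    """Minimal genetic algorithm over feature bitmasks (1 = feature kept).

    Tournament selection, one-point crossover and bit-flip mutation, with
    elitism: the best mask of each generation is copied unchanged into the
    next, so the best fitness never decreases. Requires n_features >= 2.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]

    def tournament():
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [max(population, key=fitness)[:]]  # elitism
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randint(1, n_features - 1)
            child = p1[:cut] + p2[cut:]
            children.append([1 - g if rng.random() < mutation_rate else g
                             for g in child])
        population = children
    best = max(population, key=fitness)
    return best, fitness(best)


def toy_fitness(mask):
    """Hypothetical fitness: reward features 0 and 2, penalise all others."""
    useful = {0, 2}
    return sum(1.0 if i in useful else -0.5 for i, bit in enumerate(mask) if bit)


best_mask, best_fit = ga_feature_selection(6, toy_fitness)
print(best_mask, best_fit)
```

Elitism makes the best fitness monotone across generations. This does not guarantee finding the global optimum.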
57
Alanni R, Hou J, Azzawi H, Xiang Y. Cancer adjuvant chemotherapy prediction model for non‐small cell lung cancer. IET Syst Biol 2019; 13:129-135. [DOI: 10.1049/iet-syb.2018.5060] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 01/21/2023] [Open access]
Affiliation(s)
- Russul Alanni
  - School of Information Technology, Deakin University, Burwood, Australia
- Jingyu Hou
  - School of Information Technology, Deakin University, Burwood, Australia
- Hasseeb Azzawi
  - School of Information Technology, Deakin University, Burwood, Australia
- Yong Xiang
  - School of Information Technology, Deakin University, Burwood, Australia
58
Abstract
Many biological and medical datasets have numerous features. Feature selection is a data preprocessing step that can remove noise from the data and save computing time when a dataset has several hundred thousand or more features. Another goal of feature selection is to improve classification accuracy in machine learning tasks. Minimum Redundancy Maximum Relevance (mRMR) is a well-known feature selection algorithm that selects features by calculating the redundancy between features and the relevance between features and the class vector. mRMR uses mutual information to measure both redundancy and relevance. In this research, we propose a method to improve the performance of mRMR feature selection: we use Pearson's correlation coefficient as the measure of redundancy and the R-value as the measure of relevance. To compare the original mRMR with the proposed method, features were selected from various datasets using both methods, and a classification test was then performed, with classification accuracy as the performance measure. In many cases, the proposed method achieved higher accuracy than the original mRMR.
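The greedy mRMR loop with Pearson-based redundancy can be sketched in pure Python. The abstract does not define its R-value, so this sketch assumes absolute Pearson correlation with the class labels as the relevance measure. It also uses the difference form (relevance minus mean redundancy), one of the standard mRMR criteria. The three toy features are made up so that the second is a near-duplicate of the first.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def mrmr_pearson(columns, labels, k):
    """Greedy mRMR: repeatedly pick the feature maximising relevance - redundancy.

    relevance(j)  = |pearson(feature j, labels)|   (assumed stand-in for the R-value)
    redundancy(j) = mean |pearson(feature j, s)| over already selected features s
    """
    relevance = [abs(pearson(col, labels)) for col in columns]
    selected, remaining = [], list(range(len(columns)))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = sum(abs(pearson(columns[j], columns[s])) for s in selected)
        return relevance[j] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


labels = [0, 0, 0, 1, 1, 1]
f0 = [0.0, -2.0, -1.0, 2.0, 0.0, 1.0]    # most relevant
f1 = [0.2, -2.2, -1.0, 2.2, -0.2, 1.0]   # near-duplicate of f0
f2 = [0.2, 0.2, -3.4, 2.2, 2.2, -1.4]    # less relevant but complementary
print(mrmr_pearson([f0, f1, f2], labels, k=2))  # [0, 2]
```

Ranking by relevance alone would pick features 0 and 1. The redundancy term instead favours feature 2, which carries information that feature 0 lacks.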
59
Shukla AK, Singh P, Vardhan M. A hybrid framework for optimal feature subset selection. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2019. [DOI: 10.3233/jifs-169936] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 11/15/2022]
Affiliation(s)
- Alok Kumar Shukla
  - Department of Computer Science & Engineering, NIT Raipur, Chhattisgarh (C.G), India
- Pradeep Singh
  - Department of Computer Science & Engineering, NIT Raipur, Chhattisgarh (C.G), India
- Manu Vardhan
  - Department of Computer Science & Engineering, NIT Raipur, Chhattisgarh (C.G), India
60
Deng L, Sui Y, Zhang J. XGBPRH: Prediction of Binding Hot Spots at Protein-RNA Interfaces Utilizing Extreme Gradient Boosting. Genes (Basel) 2019; 10:genes10030242. [PMID: 30901953 PMCID: PMC6471955 DOI: 10.3390/genes10030242] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 01/16/2019] [Revised: 03/14/2019] [Accepted: 03/15/2019] [Indexed: 01/24/2023] [Open access]
Abstract
Hot spot residues in protein-RNA complexes are vitally important for investigating the underlying molecular recognition mechanism. Accurately identifying protein-RNA binding hot spots is critical for drug design and protein engineering. Although some progress has been made using various available features and a series of machine learning approaches, these methods are still in their infancy. In this paper, we present a new computational method named XGBPRH, which is based on the eXtreme Gradient Boosting (XGBoost) algorithm and can effectively predict hot spot residues in protein-RNA interfaces using an optimal set of properties. First, we download 47 protein-RNA complexes and calculate a total of 156 sequence, structure, exposure, and network features. Next, we adopt a two-step feature selection algorithm to extract an optimal combination of 6 features from these 156 features. Compared with state-of-the-art approaches, XGBPRH achieves better performance, with an area under the ROC curve (AUC) of 0.817 and an F1-score of 0.802 on the independent test set. We also apply XGBPRH to two case studies. The results demonstrate that the method can effectively identify novel energy hot spots.
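XGBPRH is evaluated by AUC and F1-score. As a reminder of how those two numbers are obtained from a classifier's output, the sketch below computes both in plain Python on made-up labels and scores, not the paper's data. The 0.5 decision threshold used to turn scores into predictions is an assumption.

```python
def f1_score(y_true, y_pred):
    """F1 = 2PR / (P + R) for binary labels (1 = hot spot residue)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as one half (the Mann-Whitney formulation of the ROC area).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.1, 0.8, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(round(f1_score(y_true, y_pred), 3), round(roc_auc(y_true, scores), 3))  # 0.667 0.889
```

AUC uses the raw scores and is threshold-free, whereas F1 depends on the chosen cut-off. That is why papers often report both.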
Affiliation(s)
- Lei Deng
  - School of Computer Science and Engineering, Central South University, Changsha 410075, China
- Yuanchao Sui
  - School of Computer Science and Engineering, Central South University, Changsha 410075, China
- Jingpu Zhang
  - School of Computer and Data Science, Henan University of Urban Construction, Pingdingshan 467000, China
61
Bhola A, Singh S. Visualisation and Modelling of High-Dimensional Cancerous Gene Expression Dataset. JOURNAL OF INFORMATION & KNOWLEDGE MANAGEMENT 2019. [DOI: 10.1142/s0219649219500011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
As the number of dimensions of a cancerous gene expression dataset increases, its complexity and the risk of misinterpretation increase, and it becomes harder to visualise for further analysis. Dimensionality reduction, visualisation and modelling of such datasets therefore become challenging. In this paper, a framework is developed that helps to understand, visualise and model high-dimensional cancerous gene expression datasets in lower dimensions, which may help reveal cancer mechanisms and support diagnosis. First, the cancerous gene expression datasets are preprocessed to make them complete, precise and efficient, and principal component analysis is applied for dimensionality reduction and visualisation. Regression is then used to model the cancerous gene expression datasets so that the type of association (linear or nonlinear) and the directions between gene profiles can be estimated. To assess the performance of the framework, three publicly available cancerous gene expression datasets are used: breast (GEO Acc. No. GDS5076), lung (GEO Acc. No. GDS5040) and prostate (GEO Acc. No. GDS5072). Cross-validation is used to validate the regression results. The results revealed that a linear approach should be used for the prostate cancer dataset and a nonlinear approach for the breast and lung cancer datasets when finding associations between gene pairs.
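The framework applies principal component analysis before regression. The following minimal pure-Python sketch of the PCA step assumes samples in rows and genes in columns. It finds the leading component by power iteration on the sample covariance matrix and projects each sample onto it. A real analysis would use a linear-algebra library and keep further components (at least two for a 2-D plot), which this sketch omits.

```python
import math


def first_principal_component(rows, iterations=100):
    """Leading principal component of rows (samples x genes) by power iteration.

    Returns (unit direction, variance along it, per-sample scores). Assumes the
    data are not constant and the all-ones start vector is not orthogonal to
    the leading eigenvector.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]

    def times_cov(v):
        return [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]

    vec = [1.0] * d
    for _ in range(iterations):
        w = times_cov(vec)
        norm = math.sqrt(sum(x * x for x in w))
        vec = [x / norm for x in w]
    variance = sum(a * b for a, b in zip(vec, times_cov(vec)))  # Rayleigh quotient
    scores = [sum(c[j] * vec[j] for j in range(d)) for c in centred]
    return vec, variance, scores


# Toy data lying on the line y = 2x: the first component points along (1, 2).
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction, var, scores = first_principal_component(rows)
print(direction, var)
```

The returned `scores` are the 1-D coordinates one would plot. Regression between gene profiles would then be fitted and cross-validated separately.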
Affiliation(s)
- Abhishek Bhola
  - Department of Computer Science and Engineering, Punjab Engineering College (Deemed to be University), Sector 12, Chandigarh 160012, India
- Shailendra Singh
  - Department of Computer Science and Engineering, Punjab Engineering College (Deemed to be University), Sector 12, Chandigarh 160012, India
62
Zhao D, Liu H, Zheng Y, He Y, Lu D, Lyu C. Whale optimized mixed kernel function of support vector machine for colorectal cancer diagnosis. J Biomed Inform 2019; 92:103124. [PMID: 30796977 DOI: 10.1016/j.jbi.2019.103124] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/17/2018] [Revised: 01/15/2019] [Accepted: 02/04/2019] [Indexed: 12/17/2022]
Abstract
The microarray technique is a prevalent method for the classification and prediction of colorectal cancer (CRC). Nevertheless, microarray data suffer from the curse of dimensionality when feature genes of the disease are selected from imbalanced samples, which leads to low prediction accuracy. It is therefore important to build models that avoid these problems and predict CRC more accurately. In this paper, we use an ensemble model to classify samples into healthy and CRC groups and improve prediction performance. The proposed model is composed of three functional modules. The first module removes redundant genes: the main feature genes are selected using the minimum redundancy maximum relevance (mRMR) method, which reduces the dimensionality of the features and thereby improves prediction. The second module addresses the problem of imbalanced data using the hybrid sampling algorithm RUSBoost. The third module focuses on optimizing the classification algorithm. We use a mixed kernel function (MKF)-based support vector machine (SVM) model to classify an unknown sample as a healthy individual or a CRC patient, and the Whale Optimization Algorithm (WOA) is then applied to find the optimal parameters of the proposed MKF-SVM. The final results show that the proposed model achieves higher G-means than other comparable models. These results show that the RUSBoost-wrapped WOA + MKF-SVM model can improve the predictive performance for colorectal cancer on imbalanced data.
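The abstract does not say which kernels MKF-SVM mixes. A common construction, assumed here, is a convex combination of a local RBF kernel and a global polynomial kernel. The defaults for `weight`, `gamma`, `degree` and `coef0` are hypothetical stand-ins for the parameters WOA would tune. The sketch also computes the G-mean used to report results. Unlike plain accuracy, the G-mean collapses to zero when a classifier ignores the minority class.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel: a local kernel that decays with distance."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def poly_kernel(x, z, degree=2, coef0=1.0):
    """Polynomial kernel: a global kernel driven by the inner product."""
    return (sum(a * b for a, b in zip(x, z)) + coef0) ** degree


def mixed_kernel(x, z, weight=0.7, gamma=0.5, degree=2, coef0=1.0):
    """Convex combination of RBF and polynomial kernels (0 <= weight <= 1).

    A non-negative weighted sum of valid kernels is itself a valid kernel.
    """
    return (weight * rbf_kernel(x, z, gamma)
            + (1 - weight) * poly_kernel(x, z, degree, coef0))


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (CRC = 1) and specificity (healthy = 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return math.sqrt((tp / (tp + fn)) * (tn / (tn + fp)))


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
print(g_mean(y_true, [0] * 10))  # 0.0: 60% accuracy, but no CRC case detected
print(mixed_kernel([1.0, 2.0], [1.0, 2.0]))
```

In a full pipeline, `mixed_kernel` would be supplied to an SVM solver as a custom kernel. An optimizer would then search over its parameters using a validation metric such as `g_mean`.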
Affiliation(s)
- Dandan Zhao
  - School of Information Science and Engineering, Shandong Normal University, Jinan City, China; Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan City, China
- Hong Liu
  - School of Information Science and Engineering, Shandong Normal University, Jinan City, China; Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan City, China
- Yuanjie Zheng
  - School of Information Science and Engineering, Shandong Normal University, Jinan City, China; Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan City, China
- Yanlin He
  - School of Information Science and Engineering, Shandong Normal University, Jinan City, China; Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan City, China
- Dianjie Lu
  - School of Information Science and Engineering, Shandong Normal University, Jinan City, China; Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan City, China
- Chen Lyu
  - School of Information Science and Engineering, Shandong Normal University, Jinan City, China; Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan City, China
63
Bakhshandeh S, Azmi R, Teshnehlab M. Symmetric uncertainty class-feature association map for feature selection in microarray dataset. INT J MACH LEARN CYB 2019. [DOI: 10.1007/s13042-019-00932-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 01/11/2023]
64
Dasgupta S, Goldberg Y, Kosorok MR. FEATURE ELIMINATION IN KERNEL MACHINES IN MODERATELY HIGH DIMENSIONS. Ann Stat 2019; 47:497-526. [PMID: 30559548 DOI: 10.1214/18-aos1696] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 01/13/2023]
Abstract
We develop an approach for feature elimination in statistical learning with kernel machines, based on recursive elimination of features. We present theoretical properties of this method and show that it is uniformly consistent in finding the correct feature space under certain generalized assumptions. We present a few case studies to show that the assumptions are met in most practical situations and present simulation results to demonstrate performance of the proposed approach.
65
Abstract
The advent of DNA microarray datasets has stimulated a new line of research in both bioinformatics and machine learning. This type of data is used to collect information from tissue and cell samples about gene expression differences that could be useful for disease diagnosis or for distinguishing specific types of tumor. Microarray data classification is a difficult challenge for machine learning researchers because of the high number of features and the small sample sizes. This chapter reviews the microarray databases most frequently used in the literature. We also make the interested reader aware of problematic data characteristics in this domain, such as data imbalance, data complexity, and so-called dataset shift.
66
Perscheid C, Grasnick B, Uflacker M. Integrative Gene Selection on Gene Expression Data: Providing Biological Context to Traditional Approaches. J Integr Bioinform 2018; 16:/j/jib.ahead-of-print/jib-2018-0064/jib-2018-0064.xml. [PMID: 30785707 PMCID: PMC6798862 DOI: 10.1515/jib-2018-0064] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 08/21/2018] [Accepted: 11/12/2018] [Indexed: 12/30/2022] [Open access]
Abstract
Advances in high-throughput RNA-sequencing techniques enable researchers to analyze the complete gene activity in particular cells. From such analyses, researchers can identify disease-specific expression profiles, thus understand complex diseases like cancer, and eventually develop effective measures for diagnosis and treatment. The high dimensionality of gene expression data poses challenges for computational analysis, which are addressed by gene selection. Traditional gene selection approaches base their findings on statistical analyses of the actual expression levels, which has several drawbacks when it comes to accurately identifying the underlying biological processes. Integrative approaches, in turn, include curated information on biological processes from external knowledge bases during gene selection, which promises better interpretability and improved predictive performance. Our work compares the performance of traditional and integrative gene selection approaches. Moreover, we propose a straightforward approach to integrating external knowledge with traditional gene selection approaches. We introduce a framework that enables automatic external knowledge integration, gene selection, and evaluation. Evaluation results show our framework to be a useful evaluation tool and show that integrating external knowledge improves overall analysis results.
Affiliation(s)
- Cindy Perscheid
  - Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
- Bastien Grasnick
  - Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
- Matthias Uflacker
  - Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
67
Yang F, Yang X, Teo SK, Lee G, Zhong L, Tan RS, Su Y. Multi-dimensional proprio-proximus machine learning for assessment of myocardial infarction. Comput Med Imaging Graph 2018; 70:63-72. [DOI: 10.1016/j.compmedimag.2018.09.007] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 12/27/2017] [Revised: 08/13/2018] [Accepted: 09/21/2018] [Indexed: 10/28/2022]
68
Jafarpisheh N, Teshnehlab M. Cancers classification based on deep neural networks and emotional learning approach. IET Syst Biol 2018; 12:258-263. [PMID: 30472689 PMCID: PMC8687421 DOI: 10.1049/iet-syb.2018.5002] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 12/01/2022] [Open access]
Abstract
Numerous factors contribute to causing cancer, so cancer classification cannot rely on physicians' judgement alone, and intelligent algorithms that assist physicians are indispensable. The authors were therefore motivated to propose a novel algorithm to classify three cancer datasets: colon, ALL‐AML, and leukaemia. The proposed algorithm is based on a deep neural network and an emotional learning process. First, they reduced the number of features by applying principal component analysis. Then, they used a deep neural network for feature extraction. Next, they implemented different classifiers: multi‐layer perceptron, support vector machine (SVM), decision tree, and Gaussian mixture model (GMM). Finally, because unpredictable events and uncertainties are unavoidable in the real world, especially in systems biology, the robustness of the model against uncertainty is important. They therefore added Gaussian noise to the input features of the first encoder for each dataset and then applied the stacked denoising method. Experimental results showed that emotional learning generally increased accuracy. SVM achieved the highest accuracy: 91.66%, 92.27%, and 96.56% for colon, ALL‐AML, and leukaemia, respectively. GMM gave the lowest accuracy; its best accuracy was 60%.
Affiliation(s)
- Noushin Jafarpisheh
  - Department of Electrical Engineering, K.N. Toosi University of Technology, Tehran, Iran
- Mohammad Teshnehlab
  - Department of Electrical Engineering, K.N. Toosi University of Technology, Tehran, Iran
69
Dao FY, Lv H, Wang F, Feng CQ, Ding H, Chen W, Lin H. Identify origin of replication in Saccharomyces cerevisiae using two-step feature selection technique. Bioinformatics 2018; 35:2075-2083. [DOI: 10.1093/bioinformatics/bty943] [Citation(s) in RCA: 147] [Impact Index Per Article: 24.5] [Received: 07/30/2018] [Revised: 11/06/2018] [Accepted: 11/13/2018] [Indexed: 02/07/2023] [Open access]
Affiliation(s)
- Fu-Ying Dao
  - Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Hao Lv
  - Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Fang Wang
  - Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Chao-Qin Feng
  - Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Hui Ding
  - Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Wei Chen
  - Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
  - Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Hao Lin
  - Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
70
Wu HC, Wei XG, Chan SC. Novel Consensus Gene Selection Criteria for Distributed GPU Partial Least Squares-Based Gene Microarray Analysis in Diffused Large B Cell Lymphoma (DLBCL) and Related Findings. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:2039-2052. [PMID: 28991749 DOI: 10.1109/tcbb.2017.2760827] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 06/07/2023]
Abstract
This paper proposes novel consensus gene selection criteria for partial least squares-based gene microarray analysis. The criteria quantify the consistency and distinctiveness of differential gene expression across different double cross-validations (CV) or randomizations, in terms of occurrence and randomization p-values. In this way, they can identify a more comprehensive set of genes associated with the underlying disease. A distributed GPU implementation is also proposed to accelerate gene selection, achieving a speedup of about 8-11 times on the microarray datasets considered. Simulation results on various cancer gene microarray datasets show that the proposed approach achieves classification accuracy highly comparable to that of many conventional approaches. Furthermore, enrichment analysis of the genes selected for the Diffused Large B Cell Lymphoma (DLBCL) and Prostate Cancer datasets shows that only the proposed approach identifies gene lists enriched in different pathways with significant p-values. In contrast, sufficient statistical significance cannot be found for conventional SVM-RFE and the t-test. Its reliability in identifying genes and establishing the statistical significance of the findings makes the proposed approach an attractive alternative for cancer-related research based on gene expression profiling or similar data.
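The occurrence part of the consensus criteria can be sketched as follows. Count how often each gene is selected across cross-validation folds or randomizations, and keep the genes above a frequency threshold. The 0.8 threshold and the gene IDs are hypothetical. The paper's randomization p-values, which would require rerunning selection on label-permuted data, are not reproduced.

```python
from collections import Counter


def consensus_genes(selections, min_fraction=0.8):
    """Genes selected in at least min_fraction of the resampling runs.

    selections: one collection of selected gene IDs per CV fold or randomisation.
    Returns (sorted consensus genes, occurrence fraction of every gene seen).
    """
    # set(run) guards against a gene being listed twice within one run.
    counts = Counter(g for run in selections for g in set(run))
    n_runs = len(selections)
    freq = {g: c / n_runs for g, c in counts.items()}
    consensus = sorted(g for g, f in freq.items() if f >= min_fraction)
    return consensus, freq


# Hypothetical gene lists from five resampling runs.
runs = [
    {"gene_A", "gene_B", "gene_C"},
    {"gene_A", "gene_B", "gene_D"},
    {"gene_A", "gene_B", "gene_C"},
    {"gene_A", "gene_C", "gene_D"},
    {"gene_A", "gene_B", "gene_E"},
]
genes, freq = consensus_genes(runs)
print(genes)  # ['gene_A', 'gene_B']
```

Lowering `min_fraction` trades stability for a longer gene list. With 0.5, for example, `gene_C` (selected in 3 of 5 runs) is also kept.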
71
Prasad Y, Biswas K, Hanmandlu M. A recursive PSO scheme for gene selection in microarray data. Appl Soft Comput 2018. [DOI: 10.1016/j.asoc.2018.06.019] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Indexed: 10/28/2022]
72
Li Z, Xie W, Liu T. Efficient feature selection and classification for microarray data. PLoS One 2018; 13:e0202167. [PMID: 30125332 PMCID: PMC6101392 DOI: 10.1371/journal.pone.0202167] [Citation(s) in RCA: 50] [Impact Index Per Article: 8.3] [Received: 06/23/2017] [Accepted: 07/30/2018] [Indexed: 11/19/2022]
Abstract
Feature selection and classification are the main topics in microarray data analysis. Although many feature selection methods have been proposed and developed in this field, SVM-RFE (Support Vector Machine based on Recursive Feature Elimination) has proved to be one of the best; it ranks the features (genes) by training a support vector machine classification model and selects key genes in combination with a recursive feature elimination strategy. The principal drawback of SVM-RFE is its high computational cost. To overcome this limitation, we introduce a more efficient implementation of linear support vector machines, improve the recursive feature elimination strategy, and then combine the two to select informative genes. In addition, we propose a simple resampling method to preprocess the datasets, which balances the information distribution across the different kinds of samples and makes the classification results more credible. Moreover, the applicability of four common classifiers is also studied in this paper. Extensive experiments are conducted on six of the most frequently used microarray datasets in this field, and the results show that the proposed methods not only greatly reduce the time consumption but also obtain comparable classification performance.
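The following Python sketch gives a rough picture of the SVM-RFE loop described above. It trains a linear SVM, discards the feature with the smallest squared weight, and repeats. It relies on a simple hand-written Pegasos-style subgradient solver (the hypothetical helper `train_linear_svm`), which fits no bias term and therefore assumes roughly centred features. It is not the faster implementation proposed by Li et al.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Pegasos-style stochastic subgradient descent for a linear SVM.

    X: list of equal-length feature vectors; y: labels in {-1, +1}.
    No bias term is fitted, so features are assumed roughly centred.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 shrinkage step
            if margin < 1.0:  # hinge-loss subgradient step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def svm_rfe(X, y, n_keep=1, **svm_kwargs):
    """Recursive feature elimination driven by linear-SVM weights.

    Returns (surviving feature indices, indices in elimination order).
    """
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y, **svm_kwargs)
        # Ranking criterion w_j^2: drop the feature with the smallest weight.
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return remaining, eliminated
```

This sketch removes one feature per round, which is the textbook form and retrains the SVM once per feature. Removing a fixed fraction of features per round is a common way to reduce the running time that the abstract identifies as the main drawback.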
Affiliation(s)
- Zifa Li
- Department of Computer Science and Technology, Huaqiao University, Xiamen, Fujian, China
- Weibo Xie
- Department of Computer Science and Technology, Huaqiao University, Xiamen, Fujian, China
- Tao Liu
- Department of Computer Science and Technology, Huaqiao University, Xiamen, Fujian, China
73
74
Sahran S, Albashish D, Abdullah A, Shukor NA, Hayati Md Pauzi S. Absolute cosine-based SVM-RFE feature selection method for prostate histopathological grading. Artif Intell Med 2018; 87:78-90. [PMID: 29680688 DOI: 10.1016/j.artmed.2018.04.002] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 04/29/2017] [Revised: 04/02/2018] [Accepted: 04/07/2018] [Indexed: 01/09/2023]
Abstract
OBJECTIVE Feature selection (FS) methods are widely used in grading and diagnosing prostate histopathological images. In this context, FS is based on the texture features obtained from the lumen, nuclei, cytoplasm and stroma, all of which are important tissue components. However, it is difficult to represent the high-dimensional textures of these tissue components. To solve this problem, we propose a new FS method that enables the selection of features with minimal redundancy in the tissue components. METHODOLOGY We categorise tissue images based on the texture of individual tissue components via the construction of a single classifier and also construct an ensemble learning model by merging the values obtained by each classifier. Another issue that arises is overfitting due to the high-dimensional texture of individual tissue components. We propose a new FS method, SVM-RFE(AC), that integrates a Support Vector Machine-Recursive Feature Elimination (SVM-RFE) embedded procedure with an absolute cosine (AC) filter method to prevent redundancy in the selected features of the SVM-RFE and an unoptimised classifier in the AC. RESULTS We conducted experiments on H&E histopathological prostate and colon cancer images with respect to three prostate classifications, namely benign vs. grade 3, benign vs. grade 4 and grade 3 vs. grade 4. The colon benchmark dataset requires a distinction between grades 1 and 2, which are the most difficult cases to distinguish in the colon domain. The results obtained by both the single and ensemble classification models (which use the product rule as their merging method) confirm that the proposed SVM-RFE(AC) is superior to the other SVM and SVM-RFE-based methods. CONCLUSION We developed an FS method based on SVM-RFE and AC and successfully showed that its use enabled the identification of the most crucial texture feature of each tissue component. Thus, it makes possible the distinction between multiple Gleason grades (e.g. grade 3 vs. grade 4), and its performance is far superior to other reported FS methods.
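The Python sketch below shows one plausible reading of the absolute cosine (AC) redundancy filter. It walks a feature ranking (for example, one produced by SVM-RFE) and keeps a feature only if its absolute cosine similarity to every feature already kept stays below a threshold. The threshold value, the function names and the way the filter is combined with the ranking are all assumptions for illustration. The published SVM-RFE(AC) procedure may differ.

```python
import math


def abs_cosine(u, v):
    """Absolute cosine similarity between two feature vectors (0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return abs(dot) / (nu * nv)


def ac_filter(columns, ranking, threshold=0.9):
    """Keep ranked features whose |cos| with every kept feature is below threshold.

    columns: one vector per feature (values across samples), i.e. feature-major.
    ranking: feature indices ordered from most to least important.
    """
    kept = []
    for j in ranking:
        if all(abs_cosine(columns[j], columns[k]) < threshold for k in kept):
            kept.append(j)
    return kept
```

Because the absolute value is taken, a feature that is a sign-flipped copy of a kept feature is treated as just as redundant as an exact copy.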
Affiliation(s)
- Shahnorbanun Sahran
- Pattern Recognition Research Group, Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, University Kebangsaan Malaysia, 43600 Bangi, Malaysia.
- Dheeb Albashish
- Computer Science Department, Prince Abdullah Bin Ghazi Faculty of Information Technology, Al-Balqa Applied University, Jordan.
- Azizi Abdullah
- Pattern Recognition Research Group, Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, University Kebangsaan Malaysia, 43600 Bangi, Malaysia.
- Nordashima Abd Shukor
- Department of Pathology, University Kebangsaan Malaysia Medical Center, 56000 Batu 9 Cheras, Malaysia.
- Suria Hayati Md Pauzi
- Department of Pathology, University Kebangsaan Malaysia Medical Center, 56000 Batu 9 Cheras, Malaysia.
75
Pal JK, Ray SS, Cho SB, Pal SK. Fuzzy-Rough Entropy Measure and Histogram Based Patient Selection for miRNA Ranking in Cancer. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:659-672. [PMID: 27831888 DOI: 10.1109/tcbb.2016.2623605] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/06/2023]
Abstract
MicroRNAs (miRNAs) are known to be important indicators of cancer. The presence of cancer can be detected by identifying the responsible miRNAs. A fuzzy-rough entropy measure (FREM) is developed that can rank the miRNAs and thereby identify the relevant ones. FREM is used to determine the relevance of a miRNA in terms of separability between normal and cancer classes. While computing the FREM for a miRNA, fuzziness takes care of the overlap between normal and cancer expressions, whereas the rough lower approximation determines their class sizes. MiRNAs are sorted by relevance (i.e., the capability of class separation), and a percentage of the top-ranked ones is selected. FREM is also used to determine the redundancy between two miRNAs, and the redundant ones are removed from the selected set as necessary. A histogram-based patient selection method is also developed, which can reduce the number of patients to be dealt with during the computation of FREM while compromising very little on the performance of the selected miRNAs for most of the data sets. The superiority of FREM over some existing methods is demonstrated extensively on six data sets in terms of sensitivity, specificity, and score. While for these data sets the score of the miRNAs selected by our method varies from 0.70 to 0.91 using SVM, the corresponding results vary from 0.37 to 0.90 for some other methods. Moreover, all the selected miRNAs are corroborated by the findings of biological investigations or pathway analysis tools. The source code of FREM is available at http://www.jayanta.droppages.com/FREM.html.
76
Statistical approach for selection of biologically informative genes. Gene 2018; 655:71-83. [PMID: 29458166 DOI: 10.1016/j.gene.2018.02.044] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 05/24/2017] [Revised: 11/26/2017] [Accepted: 02/14/2018] [Indexed: 11/23/2022]
Abstract
Selection of informative genes from high-dimensional gene expression data has emerged as an important research area in genomics. Many of the gene selection techniques proposed so far are based on either relevancy or redundancy measures. Further, the performance of these techniques has been judged by the post-selection classification accuracy of a classifier trained on the selected genes. This performance metric may be statistically sound but may not be biologically relevant. A statistical approach, Boot-MRMR, was proposed based on a composite measure of maximum relevance and minimum redundancy, which is both statistically sound and biologically relevant for informative gene selection. For comparative evaluation of the proposed approach, we developed two biologically sufficient criteria: Gene Set Enrichment with QTL (GSEQ) and a biological similarity score based on Gene Ontology (GO). Further, a systematic and rigorous evaluation of the proposed technique against 12 existing gene selection techniques was carried out using five gene expression datasets. This evaluation was based on a broad spectrum of statistically sound (e.g. subject classification) and biologically relevant (based on QTL and GO) criteria under a multiple criteria decision-making framework. The performance analysis showed that the proposed technique selects informative genes that are more biologically relevant. The proposed technique is also quite competitive with the existing techniques with respect to subject classification and computational time. Our results also showed that, under the multiple criteria decision-making setup, the proposed technique is the best of the available alternatives for informative gene selection. Based on the proposed approach, an R package, BootMRMR, has been developed and is available at https://cran.r-project.org/web/packages/BootMRMR. This study will provide a practical guide to selecting statistical techniques for identifying informative genes from high-dimensional expression data for breeding and systems biology studies.
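A greedy selector can illustrate the maximum-relevance, minimum-redundancy idea behind Boot-MRMR. The Python sketch below is a simplified, hypothetical variant with three rules:

- Relevance is the absolute Pearson correlation between a feature and the class label.
- Redundancy is the mean absolute correlation with the features already selected.
- Each candidate is scored by relevance minus redundancy.

It omits the bootstrap component and is not the algorithm of the BootMRMR package.

```python
import math


def pearson(u, v):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0.0 or sv == 0.0:
        return 0.0
    return cov / (su * sv)


def mrmr(X, y, k):
    """Greedy mRMR (difference form): pick k feature indices from rows X and labels y."""
    cols = [list(c) for c in zip(*X)]  # transpose to one list per feature
    k = min(k, len(cols))
    relevance = [abs(pearson(c, y)) for c in cols]
    selected = [max(range(len(cols)), key=lambda j: relevance[j])]
    while len(selected) < k:
        best, best_score = None, -math.inf
        for j in range(len(cols)):
            if j in selected:
                continue
            redundancy = sum(abs(pearson(cols[j], cols[s])) for s in selected) / len(selected)
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

With a feature and its exact rescaled copy in the pool, the copy has redundancy 1 and loses to a weaker but less correlated feature. Avoiding such duplicates is the behaviour the relevance-redundancy trade-off is meant to produce.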
77
Shukla AK, Singh P, Vardhan M. A hybrid gene selection method for microarray recognition. Biocybern Biomed Eng 2018. [DOI: 10.1016/j.bbe.2018.08.004] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Indexed: 12/12/2022]
78
Selecting Feature Subsets Based on SVM-RFE and the Overlapping Ratio with Applications in Bioinformatics. Molecules 2017; 23:molecules23010052. [PMID: 29278382 PMCID: PMC5943966 DOI: 10.3390/molecules23010052] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Received: 11/10/2017] [Revised: 12/15/2017] [Accepted: 12/16/2017] [Indexed: 11/29/2022]
Abstract
Feature selection is an important topic in bioinformatics. Defining informative features from complex high-dimensional biological data is critical in disease study, drug development, etc. Support vector machine-recursive feature elimination (SVM-RFE) is an efficient feature selection technique that has shown its power in many applications. It ranks the features according to the recursive feature deletion sequence based on SVM. In this study, we propose a method, SVM-RFE-OA, which combines the classification accuracy rate and the average overlapping ratio of the samples to determine the number of features to be selected from the feature rank of SVM-RFE. Meanwhile, to measure the feature weights more accurately, we propose a modified SVM-RFE-OA (M-SVM-RFE-OA) algorithm that temporarily screens out the samples lying in a heavily overlapping area in each iteration. Experiments on eight public biological datasets show that the discriminative ability of a feature subset can be measured more accurately by combining the classification accuracy rate with the average overlapping degree of the samples than by using the classification accuracy rate alone, and that shielding the samples in the overlapping area made the calculation of the feature weights more stable and accurate. The methods proposed in this study can also be used with other RFE techniques to define potential biomarkers from big biological data.
79
Lai C, Guo S, Cheng L, Wang W. A Comparative Study of Feature Selection Methods for the Discriminative Analysis of Temporal Lobe Epilepsy. Front Neurol 2017; 8:633. [PMID: 29375459 PMCID: PMC5770628 DOI: 10.3389/fneur.2017.00633] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 10/25/2016] [Accepted: 11/13/2017] [Indexed: 01/09/2023]
Abstract
It is crucial to differentiate patients with temporal lobe epilepsy (TLE) from the healthy population and to determine the abnormal brain regions in TLE. Cortical features and their changes can reveal the unique anatomical patterns of brain regions in structural magnetic resonance (MR) images. In this study, structural MR images from 41 patients with left TLE, 34 patients with right TLE, and 58 normal controls (NC) were acquired, and four kinds of cortical measures, namely cortical thickness, cortical surface area, gray matter volume (GMV), and mean curvature, were explored for discriminative analysis. Three feature selection methods, namely independent-sample t-test filtering, the sparse-constrained dimensionality reduction model (SCDRM), and support vector machine-recursive feature elimination (SVM-RFE), were investigated to extract dominant features among the compared groups for classification with a support vector machine (SVM) classifier. The results showed that SVM-RFE achieved the highest performance (most classifications with more than 84% accuracy), followed by the SCDRM and the t-test. In particular, the surface area and GMV exhibited prominent discriminative ability, and the performance of the SVM improved significantly when the four cortical measures were combined. Additionally, the dominant regions with higher classification weights were mainly located in the temporal and frontal lobes, including the entorhinal cortex, rostral middle frontal, parahippocampal cortex, superior frontal, insula, and cuneus. This study concluded that the cortical features provided effective information for recognizing abnormal anatomical patterns and that the proposed methods have the potential to improve the clinical diagnosis of TLE.
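A t-test filter of the kind compared above scores each feature independently and keeps the highest-scoring ones. The Python sketch below ranks features by the absolute Welch t statistic between two groups. The Welch unequal-variance form, ranking by |t| rather than by P value, and the function names are all assumptions. If P values are needed for a threshold, SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` returns both the statistic and the two-sided P value.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic (unequal variances assumed)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def t_test_filter(X, y, k):
    """Rank features by |t| between class 1 and class 0; return (top-k indices, scores)."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        a = [row[j] for row, label in zip(X, y) if label == 1]
        b = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(welch_t(a, b)))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores
```

Because each feature is scored in isolation, a filter like this cannot detect redundancy between features. That limitation is one reason wrapper methods such as SVM-RFE can outperform it.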
Affiliation(s)
- Chunren Lai
- Department of Biomedical Engineering, South China University of Technology, Guangzhou, China; Department of Radiation Oncology, The People's Hospital of Gaozhou, Gaozhou, China
- Shengwen Guo
- Department of Biomedical Engineering, South China University of Technology, Guangzhou, China
- Lina Cheng
- Medical Imaging Center, Guangdong 999 Brain Hospital, Guangzhou, China
- Wensheng Wang
- Medical Imaging Center, Guangdong 999 Brain Hospital, Guangzhou, China
80
Abd Elaziz ME. Simultaneous feature extraction and selection of microarray data using fuzzy-rough based multiobjective nonnegative matrix factorization. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2017. [DOI: 10.3233/jifs-17954] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
81
Pal JK, Ray SS, Pal SK. Fuzzy mutual information based grouping and new fitness function for PSO in selection of miRNAs in cancer. Comput Biol Med 2017; 89:540-548. [PMID: 28844466 DOI: 10.1016/j.compbiomed.2017.08.013] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 06/30/2016] [Revised: 07/17/2017] [Accepted: 08/11/2017] [Indexed: 01/17/2023]
Abstract
MicroRNAs (miRNAs) are important regulators of cell division and are also responsible for cancer development. Among the discovered miRNAs, not all are important for cancer detection. In this regard, a fuzzy mutual information (FMI) based grouping and miRNA selection method (FMIGS) is developed to identify the miRNAs responsible for a particular cancer. First, the miRNAs are ranked and divided into several groups. Then the most important group is selected from the generated groups. Both steps, namely ranking the miRNAs and selecting the most relevant group of miRNAs, are performed using FMI. The number of groups is determined automatically by the grouping method. After the selection process, redundant miRNAs are removed from the selected set as the user requires. As part of the investigation, we also propose an FMI-based particle swarm optimization (PSO) method for selecting relevant miRNAs, where FMI is used as the fitness function to evaluate the particles. The effectiveness of FMIGS and FMI-based PSO is tested on five data sets, and their efficiency in selecting relevant miRNAs is demonstrated. The superior performance of FMIGS over some existing methods is established, and the biological significance of the selected miRNAs is supported by the findings of biological investigations and publicly available pathway analysis tools. The source code related to our investigation is available at http://www.jayanta.droppages.com/FMIGS.html.
Affiliation(s)
- Jayanta Kumar Pal
- Center for Soft Computing Research, Indian Statistical Institute, Kolkata, India.
- Shubhra Sankar Ray
- Center for Soft Computing Research & Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India.
- Sankar K Pal
- Center for Soft Computing Research & Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India.
82
Improved detection of DNA-binding proteins via compression technology on PSSM information. PLoS One 2017; 12:e0185587. [PMID: 28961273 PMCID: PMC5621689 DOI: 10.1371/journal.pone.0185587] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Received: 06/14/2017] [Accepted: 09/17/2017] [Indexed: 12/04/2022]
Abstract
Since the importance of DNA-binding proteins in multiple biomolecular functions has been recognized, an increasing number of researchers are attempting to identify DNA-binding proteins. In recent years, machine learning methods have become increasingly compelling as protein sequence data have soared, because of their favorable speed and accuracy. In this paper, we extract three features from the protein sequence, namely NMBAC (Normalized Moreau-Broto Autocorrelation), PSSM-DWT (Position-Specific Scoring Matrix-Discrete Wavelet Transform), and PSSM-DCT (Position-Specific Scoring Matrix-Discrete Cosine Transform). We also apply a feature selection algorithm to these feature vectors. These features are then used to train an SVM (support vector machine) classifier to predict DNA-binding proteins. We evaluate the performance of our approach on three datasets, namely PDB1075, PDB594 and PDB186. The PDB1075 and PDB594 datasets are used for the jackknife test, and the PDB186 dataset is used for the independent test. Our method achieves the best accuracy in the jackknife test, from 79.20% to 86.23% and from 80.5% to 86.20% on the PDB1075 and PDB594 datasets, respectively. In the independent test, the accuracy of our method reaches 76.3%. The independent-test performance also suggests that our method can be used effectively for DNA-binding protein prediction. The data and source code are at https://doi.org/10.6084/m9.figshare.5104084.
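One way to sketch the PSSM-DCT idea is to compress each column of an L x 20 position-specific scoring matrix with a discrete cosine transform and keep only the first few low-frequency coefficients. This yields a fixed-length vector whatever the sequence length. The Python code below assumes an orthonormal DCT-II applied along the sequence axis, with zero-padding for short sequences. The exact normalization and number of retained coefficients used by the authors are not stated here, so treat those choices, and the function names, as hypothetical.

```python
import math


def dct_ii(x):
    """Orthonormal DCT-II of a 1-D sequence (direct O(N^2) form, for clarity)."""
    n_len = len(x)
    coeffs = []
    for k in range(n_len):
        s = sum(x[n] * math.cos(math.pi * (n + 0.5) * k / n_len) for n in range(n_len))
        scale = math.sqrt(1.0 / n_len) if k == 0 else math.sqrt(2.0 / n_len)
        coeffs.append(scale * s)
    return coeffs


def pssm_dct_features(pssm, n_coeffs):
    """Fixed-length vector: first n_coeffs DCT coefficients of each PSSM column."""
    features = []
    for c in range(len(pssm[0])):          # one column per amino acid (20 in a PSSM)
        column = [row[c] for row in pssm]  # scores along the sequence positions
        coeffs = dct_ii(column)[:n_coeffs]
        coeffs += [0.0] * (n_coeffs - len(coeffs))  # zero-pad sequences shorter than n_coeffs
        features.extend(coeffs)
    return features
```

The orthonormal scaling preserves energy: the sum of squared coefficients equals the sum of squared inputs. Keeping only the leading coefficients therefore retains the smooth, low-frequency trend of each column and discards position-level detail.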
83
Kugiumtzis D, Koutlis C, Tsimpiris A, Kimiskidis VK. Dynamics of Epileptiform Discharges Induced by Transcranial Magnetic Stimulation in Genetic Generalized Epilepsy. Int J Neural Syst 2017; 27:1750037. [DOI: 10.1142/s012906571750037x] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 11/18/2022]
Abstract
Objective: In patients with Genetic Generalized Epilepsy (GGE), transcranial magnetic stimulation (TMS) can induce epileptiform discharges (EDs) of varying duration. We hypothesized that (a) the ED duration is determined by the dynamic states of critical network nodes (brain areas) in the early post-TMS period, and (b) brain connectivity changes before, during and after the ED, as well as within the ED. Methods: EEG recordings from two GGE patients were analyzed. For hypothesis (a), the characteristics of the brain dynamics at the early ED stage are measured with univariate and multivariate EEG measures, and the dependence of the ED duration on these measures is evaluated. For hypothesis (b), effective connectivity measures are combined with network indices to quantify the brain network characteristics and identify changes in brain connectivity. Results: A number of measures, combined with specific channels and computed on the first EEG segment post-TMS, correlate with the ED duration. In addition, brain connectivity is altered from pre-ED to ED and post-ED, and statistically significant changes were also detected across stages within the ED. Conclusion: ED duration is not purely stochastic but depends on the dynamics of the post-TMS brain state. The brain network dynamics are significantly altered in the course of EDs.
Affiliation(s)
- Dimitris Kugiumtzis
- Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece
- Christos Koutlis
- Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece
- Alkiviadis Tsimpiris
- Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece
- Vasilios K. Kimiskidis
- Laboratory of Clinical Neurophysiology, Medical School, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece
84
Feature clustering based support vector machine recursive feature elimination for gene selection. APPL INTELL 2017. [DOI: 10.1007/s10489-017-0992-2] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Indexed: 10/19/2022]
85
Paul A, Sil J, Mukhopadhyay CD. Gene selection for designing optimal fuzzy rule base classifier by estimating missing value. Appl Soft Comput 2017. [DOI: 10.1016/j.asoc.2017.01.046] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 01/15/2023]
86
Al-Anni R, Hou J, Abdu-Aljabar RD, Xiang Y. Prediction of NSCLC recurrence from microarray data with GEP. IET Syst Biol 2017; 11:77-85. [PMID: 28518058 DOI: 10.1049/iet-syb.2016.0033] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 01/22/2023]
Abstract
Lung cancer is one of the deadliest diseases in the world. Non-small cell lung cancer (NSCLC) is the most common and dangerous type of lung cancer. Although NSCLC is preventable, and curable in some cases if diagnosed at an early stage, the vast majority of patients are diagnosed very late. Furthermore, NSCLC usually recurs some time after treatment. Therefore, it is of paramount importance to predict NSCLC recurrence so that specific and suitable treatments can be sought. However, conventional methods of predicting cancer recurrence rely solely on histopathology data, and their predictions are unreliable in many cases. Microarray gene expression (GE) technology provides a promising and reliable way to predict NSCLC recurrence by analysing the GE of sample cells. This study proposes a new model based on gene expression programming (GEP) that uses microarray datasets for NSCLC recurrence prediction. To this end, the authors also propose a hybrid method to rank and select relevant prognostic genes related to NSCLC recurrence prediction. The proposed model was evaluated on real NSCLC microarray datasets and compared with other representational models. The results demonstrated the effectiveness of the proposed model.
Affiliation(s)
- Russul Al-Anni
- School of Information Technology, Deakin University, Victoria, Australia.
- Jingyu Hou
- School of Information Technology, Deakin University, Victoria, Australia
- Yong Xiang
- School of Information Technology, Deakin University, Victoria, Australia
87
Du W, Cao Z, Song T, Li Y, Liang Y. A feature selection method based on multiple kernel learning with expression profiles of different types. BioData Min 2017; 10:4. [PMID: 28184251 PMCID: PMC5288949 DOI: 10.1186/s13040-017-0124-x] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 11/05/2015] [Accepted: 01/11/2017] [Indexed: 11/28/2022]
Abstract
Background With the development of high-throughput technology, researchers can acquire large amounts of expression data of different types from several public databases. Because most of these data have a small number of samples and hundreds or thousands of features, extracting informative features from expression data effectively and robustly with feature selection techniques is challenging and crucial. So far, many feature selection approaches have been proposed and applied to analyse expression data of different types. However, most of these methods are limited to measuring performance on a single type of expression data by classification accuracy or error rate. Results In this article, we propose a hybrid feature selection method based on Multiple Kernel Learning (MKL) and evaluate its performance on expression datasets of different types. First, the relevance between features and the sample classes is measured using the optimizing function of MKL. In this step, an iterative gradient descent process is used to optimize both the parameters of the Support Vector Machine (SVM) and the kernel confidence. Then, a set of relevant features is selected by sorting the features by their optimizing-function values. Furthermore, we apply an embedded scheme of forward selection to detect compact feature subsets within the relevant feature set. Conclusions We not only compare the classification accuracy with other methods, but also compare the stability, similarity and consistency of different algorithms. The proposed method shows a satisfactory feature selection capability for analysing expression datasets of different types under different performance measurements. Electronic supplementary material The online version of this article (doi:10.1186/s13040-017-0124-x) contains supplementary material, which is available to authorized users.
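The embedded forward-selection step can be pictured as a greedy loop. At each round it adds the candidate feature that most improves a subset score, and it stops when no candidate helps. In the Python sketch below, the caller supplies the scoring function as a placeholder; in practice it might be, say, cross-validated SVM accuracy on the subset. The names and the stopping rule are assumptions, not the MKL-based procedure of Du et al.

```python
def forward_selection(candidates, score, tol=0.0):
    """Greedy forward selection.

    candidates: feature identifiers to choose from (e.g. a pre-filtered relevant set).
    score: callable mapping a list of features to a number (higher is better).
    Stops when the best addition improves the score by no more than tol.
    """
    selected = []
    best_score = score(selected)
    remaining = list(candidates)
    while remaining:
        # Evaluate every one-feature extension of the current subset.
        trials = [(score(selected + [f]), f) for f in remaining]
        new_score, best_f = max(trials, key=lambda t: t[0])
        if new_score <= best_score + tol:
            break
        selected.append(best_f)
        remaining.remove(best_f)
        best_score = new_score
    return selected, best_score
```

Each round re-scores every remaining candidate, so the number of score evaluations grows roughly quadratically with the size of the candidate set. This cost is a practical reason to run the search only over a pre-filtered relevant set, as the abstract describes.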
Affiliation(s)
- Wei Du
- College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun, 130012 China
- Zhongbo Cao
- College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun, 130012 China; School of Management Science and Information Engineering, Jilin University of Finance and Economics, Changchun, 130012 China
- Tianci Song
- College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun, 130012 China
- Ying Li
- College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun, 130012 China
- Yanchun Liang
- College of Computer Science and Technology, Key Laboratory of Symbol Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun, 130012 China; Zhuhai Laboratory of Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Zhuhai College of Jilin University, Zhuhai, 519041 China
88
89
Kimiskidis VK, Tsimpiris A, Ryvlin P, Kalviainen R, Koutroumanidis M, Valentin A, Laskaris N, Kugiumtzis D. TMS combined with EEG in genetic generalized epilepsy: A phase II diagnostic accuracy study. Clin Neurophysiol 2017; 128:367-381. [DOI: 10.1016/j.clinph.2016.11.013] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 06/17/2016] [Revised: 09/09/2016] [Accepted: 11/12/2016] [Indexed: 02/05/2023]
90
An improved social spider optimization algorithm based on rough sets for solving minimum number attribute reduction problem. Neural Comput Appl 2017. [DOI: 10.1007/s00521-016-2804-8] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Indexed: 11/27/2022]
91
A Meta-Review of Feature Selection Techniques in the Context of Microarray Data. BIOINFORMATICS AND BIOMEDICAL ENGINEERING 2017. [DOI: 10.1007/978-3-319-56148-6_3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 12/18/2022]
92
Ebrahimpour MK, Eftekhari M. Ensemble of feature selection methods: A hesitant fuzzy sets approach. Appl Soft Comput 2017. [DOI: 10.1016/j.asoc.2016.11.021] [Citation(s) in RCA: 68] [Impact Index Per Article: 9.7] [Indexed: 11/24/2022]
93
Classification of Gene Expression Data Using Multiobjective Differential Evolution. ENERGIES 2016. [DOI: 10.3390/en9121061] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/17/2022]
94
Differentially Coexpressed Disease Gene Identification Based on Gene Coexpression Network. BIOMED RESEARCH INTERNATIONAL 2016; 2016:3962761. [PMID: 28042568 PMCID: PMC5155124 DOI: 10.1155/2016/3962761] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/23/2016] [Accepted: 10/26/2016] [Indexed: 11/17/2022]
Abstract
Screening disease-related genes by analyzing gene expression data has become a popular theme. Traditional disease-related gene selection methods typically focus on identifying differentially expressed genes between case samples and a control group. These traditional methods may not fully consider the changes in interactions between genes at different cell states or the dynamics of gene expression levels during disease progression. However, to understand the mechanism of disease, it is important to explore the dynamic changes of interactions between genes in biological networks at different cell states. In this study, we designed a novel framework to identify disease-related genes and developed a differentially coexpressed disease-related gene identification method based on gene coexpression network (DCGN) to screen differentially coexpressed genes. We first constructed phase-specific gene coexpression networks using time-series gene expression data and defined the concept of differential coexpression of genes in a coexpression network. Then, we designed two metrics to measure gene differential coexpression according to the change in local topological structure between different phase-specific networks. Finally, we conducted a meta-analysis of gene differential coexpression based on the rank-product method. Experimental results on real-world gene expression data sets demonstrated the feasibility and effectiveness of DCGN and its superior performance over other popular disease-related gene selection methods.
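The rank-product meta-analysis step combines per-comparison rankings into one score per gene. Suppose gene g has rank r_{g,i} in comparison i (1 = strongest evidence) across k comparisons. Its rank product is the geometric mean (r_{g,1} ... r_{g,k})^{1/k}, and smaller values indicate more consistent top ranking. The Python sketch below computes it in log space. Treating each metric or phase pair as one ranking, and breaking ties by gene index, are assumptions for illustration rather than details taken from the DCGN paper.

```python
import math


def ranks_descending(scores):
    """Rank 1 for the highest score; ties are broken by gene index (stable sort)."""
    order = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    ranks = [0] * len(scores)
    for r, g in enumerate(order, start=1):
        ranks[g] = r
    return ranks


def rank_product(score_lists):
    """Geometric mean of each gene's ranks across k score lists (smaller = better)."""
    k = len(score_lists)
    rank_lists = [ranks_descending(s) for s in score_lists]
    n_genes = len(score_lists[0])
    # Summing logs avoids overflow when k or the number of genes is large.
    return [math.exp(sum(math.log(r[g]) for r in rank_lists) / k) for g in range(n_genes)]
```

A gene ranked first in one comparison and last in another ends up in the middle. The geometric mean therefore rewards consistency across comparisons rather than a single extreme score.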
95
Tiwari P, Prasanna P, Wolansky L, Pinho M, Cohen M, Nayate AP, Gupta A, Singh G, Hatanpaa KJ, Sloan A, Rogers L, Madabhushi A. Computer-Extracted Texture Features to Distinguish Cerebral Radionecrosis from Recurrent Brain Tumors on Multiparametric MRI: A Feasibility Study. AJNR Am J Neuroradiol 2016; 37:2231-2236. [PMID: 27633806 DOI: 10.3174/ajnr.a4931] [Citation(s) in RCA: 77] [Impact Index Per Article: 9.6] [Received: 04/12/2016] [Accepted: 07/16/2016] [Indexed: 11/07/2022]
Abstract
BACKGROUND AND PURPOSE: Despite the availability of advanced imaging, noninvasively distinguishing radiation necrosis from recurrent brain tumors is a major challenge in neuro-oncology. Our aim was to determine the feasibility of radiomic (computer-extracted texture) features for differentiating radiation necrosis from recurrent brain tumors on routine MR imaging (gadolinium T1WI, T2WI, FLAIR).
MATERIALS AND METHODS: We retrospectively studied brain tumor MR imaging performed 9 months or later after radiochemotherapy at 2 institutions. Fifty-eight patient studies were analyzed, comprising a training cohort (n = 43) from one institution and an independent test cohort (n = 15) from the other, with surgical histologic findings confirmed by an experienced neuropathologist at each institution. Brain lesions on MR imaging were manually annotated by an expert neuroradiologist. A set of radiomic features was extracted for every lesion on each MR imaging sequence: gadolinium T1WI, T2WI, and FLAIR. Feature selection was used to identify the top 5 most discriminating features for every MR imaging sequence on the training cohort. These features were then evaluated on the test cohort by a support vector machine classifier. The classification performance was compared against diagnostic reads by 2 expert neuroradiologists who had access to the same MR imaging sequences (gadolinium T1WI, T2WI, and FLAIR) as the classifier.
RESULTS: On the training cohort, the area under the receiver operating characteristic curve was highest for FLAIR: 0.79 (95% CI, 0.77-0.81) for the primary subgroup (n = 22) and 0.79 (95% CI, 0.75-0.83) for the metastatic subgroup (n = 21). Of the 15 studies in the holdout cohort, the support vector machine classifier identified 12 correctly, while neuroradiologist 1 correctly diagnosed 7 and neuroradiologist 2 correctly diagnosed 8.
CONCLUSIONS: Our preliminary results suggest that radiomic features may provide complementary diagnostic information on routine MR imaging sequences that may improve the distinction of radiation necrosis from recurrence for both primary and metastatic brain tumors.
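The abstract describes selecting the top 5 most discriminating features per sequence on the training cohort and reporting the area under the ROC curve, but it does not name the selection criterion. As a sketch under that gap, the code below assumes a simple univariate filter (absolute Welch t statistic between classes) and computes AUC in its standard pairwise (Mann-Whitney) form. The helper names `welch_t`, `top_k_features`, and `auc` are hypothetical, and the support vector machine step is omitted; this is not the authors' pipeline.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two samples (each needs at least 2 values)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0

def top_k_features(X, y, k=5):
    """Indices of the k feature columns with the largest |t| between classes 1 and 0.

    Assumption: |Welch t| stands in for the unspecified selection criterion.
    """
    ranked = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        ranked.append((abs(welch_t(pos, neg)), j))
    # Largest score first; ties broken by column index for determinism.
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return [j for _, j in ranked[:k]]

def auc(scores, labels):
    """ROC AUC as the fraction of (positive, negative) pairs ordered correctly; ties count 0.5."""
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

To respect the training/test split described above, `top_k_features` would be fit on the training cohort only, and the same column indices would then be applied unchanged to the independent test cohort before classification.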
Affiliation(s)
- Department of Biomedical Engineering, Case Western Reserve University, Cleveland, Ohio (P Tiwari, P Prasanna, G Singh, A Madabhushi)
- University Hospitals Case Medical Center, Cleveland, Ohio (L Wolansky, M Cohen, A P Nayate, A Gupta, A Sloan, L Rogers)
- University of Texas Southwestern Medical Center, Dallas, Texas (M Pinho, K J Hatanpaa)
96
Ang JC, Mirzal A, Haron H, Hamed HNA. Supervised, Unsupervised, and Semi-Supervised Feature Selection: A Review on Gene Selection. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2016; 13:971-989. [PMID: 26390495 DOI: 10.1109/tcbb.2015.2478454] [Citation(s) in RCA: 186] [Impact Index Per Article: 23.3] [Indexed: 05/04/2023]
Abstract
Recently, feature selection and dimensionality reduction have become fundamental tools for many data mining tasks, especially for processing high-dimensional data such as gene expression microarray data. Gene expression microarray data comprise up to hundreds of thousands of features but relatively small sample sizes. Because learning algorithms usually do not work well with this kind of data, reducing the data dimensionality becomes a challenge. A large number of gene selection methods have been applied to select a subset of relevant features for model construction and to achieve better cancer classification performance. This paper presents the basic taxonomy of feature selection and reviews state-of-the-art gene selection methods, grouping the literature into three categories: supervised, unsupervised, and semi-supervised. A comparison of experimental results on the top 5 representative gene expression datasets indicates that the classification accuracy of unsupervised and semi-supervised feature selection is competitive with that of supervised feature selection.
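The supervised/unsupervised distinction in this taxonomy can be made concrete with two common filter criteria. The sketch below is an illustration chosen here, not a method taken from the review: the Fisher score (supervised, uses class labels) and plain feature variance (unsupervised, ignores labels). The helper names `fisher_scores`, `variance_scores`, and `select_top` are hypothetical, and semi-supervised criteria are not shown.

```python
from statistics import mean, pvariance

def variance_scores(X):
    """Unsupervised filter: score each feature column by its variance; labels are never used."""
    return [pvariance(col) for col in zip(*X)]

def fisher_scores(X, y):
    """Supervised filter: Fisher score = between-class scatter / within-class scatter per feature."""
    classes = sorted(set(y))
    scores = []
    for col in zip(*X):
        overall = mean(col)
        between = within = 0.0
        for c in classes:
            vals = [v for v, label in zip(col, y) if label == c]
            between += len(vals) * (mean(vals) - overall) ** 2
            within += len(vals) * pvariance(vals)
        if within > 0:
            scores.append(between / within)
        else:
            # Zero within-class spread: perfectly separating if the class means differ.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores

def select_top(scores, k):
    """Indices of the k highest-scoring features, ties broken by index."""
    return sorted(range(len(scores)), key=lambda j: (-scores[j], j))[:k]
```

On a toy matrix where one feature separates the classes with little spread and another varies widely but independently of the labels, the variance filter prefers the noisy feature while the Fisher score prefers the discriminative one. That contrast is the practical reason label information can matter, even though the review reports that unsupervised methods remain competitive on real datasets.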
97
Vafaee Sharbaf F, Mosafer S, Moattar MH. A hybrid gene selection approach for microarray data classification using cellular learning automata and ant colony optimization. Genomics 2016; 107:231-8. [DOI: 10.1016/j.ygeno.2016.05.001] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Received: 01/02/2016] [Revised: 04/20/2016] [Accepted: 05/01/2016] [Indexed: 10/21/2022]
98
Spetale FE, Bulacio P, Guillaume S, Murillo J, Tapia E. A spectral envelope approach towards effective SVM-RFE on infrared data. Pattern Recognit Lett 2016. [DOI: 10.1016/j.patrec.2015.12.007] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 10/22/2022]
99
Mundra PA, Rajapakse JC. Gene and sample selection using T-score with sample selection. J Biomed Inform 2016; 59:31-41. [DOI: 10.1016/j.jbi.2015.11.003] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 11/10/2014] [Revised: 10/13/2015] [Accepted: 11/04/2015] [Indexed: 10/22/2022]
100
Mollaee M, Moattar MH. A novel feature extraction approach based on ensemble feature selection and modified discriminant independent component analysis for microarray data classification. Biocybern Biomed Eng 2016. [DOI: 10.1016/j.bbe.2016.05.001] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Indexed: 11/16/2022]