1
Kunkyab T, Mou B, Jirasek A, Haston C, Andrews J, Thomas S, Hyde D. Radiomic analysis for early differentiation of lung cancer recurrence from fibrosis in patients treated with lung stereotactic ablative radiotherapy. Phys Med Biol 2023; 68:165015. [PMID: 37164024 DOI: 10.1088/1361-6560/acd431] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/02/2022] [Accepted: 05/10/2023] [Indexed: 05/12/2023]
Abstract
Objective. The development of radiation-induced fibrosis after stereotactic ablative radiotherapy (SABR) can obscure follow-up images and delay detection of a local recurrence in early-stage lung cancer patients. The objective of this study was to develop a radiomics model for computer-assisted detection of local recurrence and fibrosis at an earlier timepoint (<1 year) after SABR treatment. Approach. This retrospective clinical study included CT images (n = 107) of 66 patients treated with SABR. A z-score normalization technique was used for radiomic feature standardization across scanner protocols. The training set for the radiomics model consisted of CT images (66 patients; 22 recurrences and 44 fibrosis) obtained at 24 months (median) follow-up. The test set included CT images of 41 patients acquired at 5-12 months follow-up. Combinations of four widely used machine learning techniques (support vector machines, gradient boosting, random forests (RF), and logistic regression) and feature selection methods (Relief feature scoring, maximum relevance minimum redundancy, mutual information maximization, forward feature selection, and LASSO) were investigated. Pyradiomics was used to extract 106 radiomic features from the CT images for feature selection and classification. Main results. An RF + LASSO model scored the highest in terms of AUC (0.87) and obtained a sensitivity of 75% and a specificity of 88% in identifying a local recurrence in the test set. In the training set, 86% accuracy was achieved using five-fold cross-validation. DeLong's test indicated that the AUC achieved by the RF + LASSO model is significantly better than that of 11 other machine learning models presented here. The top three radiomic features, interquartile range (first order), Cluster Prominence (GLCM), and Autocorrelation (GLCM), were revealed as differentiating a recurrence from fibrosis with this model. Significance. The radiomics model selected, out of multiple machine learning and feature selection algorithms, was able to differentiate a recurrence from fibrosis in earlier follow-up CT images with high specificity and satisfactory sensitivity.
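The z-score standardization mentioned in the abstract can be sketched as follows. This is a minimal illustration, assuming each feature column is standardized over all images in the set; the abstract does not say whether the statistics were computed per scanner protocol, and the function name `zscore_columns` is hypothetical.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each feature column to zero mean and unit variance.

    rows: list of feature vectors, one per image (assumed layout).
    Constant columns (zero standard deviation) are mapped to 0.0.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        [(value - m) / s if s > 0 else 0.0
         for value, m, s in zip(row, means, stds)]
        for row in rows
    ]
```

In practice the means and standard deviations would be estimated on the training images only and then reused for the test images, so that no test-set information leaks into the model.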
Affiliation(s)
- Tenzin Kunkyab
  - Department of Physics, University of British Columbia Okanagan, Kelowna, British Columbia, Canada
- Andrew Jirasek
  - Department of Physics, University of British Columbia Okanagan, Kelowna, British Columbia, Canada
- Christina Haston
  - Department of Physics, University of British Columbia Okanagan, Kelowna, British Columbia, Canada
- Jeff Andrews
  - Department of Statistics, University of British Columbia Okanagan, Kelowna, British Columbia, Canada
- Derek Hyde
  - Department of Physics, University of British Columbia Okanagan, Kelowna, British Columbia, Canada
  - BC Cancer-Kelowna, Canada
2
Brancato V, Brancati N, Esposito G, La Rosa M, Cavaliere C, Allarà C, Romeo V, De Pietro G, Salvatore M, Aiello M, Sangiovanni M. A Two-Step Feature Selection Radiomic Approach to Predict Molecular Outcomes in Breast Cancer. SENSORS (BASEL, SWITZERLAND) 2023; 23:1552. [PMID: 36772592 PMCID: PMC9921618 DOI: 10.3390/s23031552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Revised: 01/13/2023] [Accepted: 01/24/2023] [Indexed: 06/18/2023]
Abstract
Breast Cancer (BC) is the most common cancer among women worldwide and is characterized by intra- and inter-tumor heterogeneity that strongly contributes towards its poor prognosis. The Estrogen Receptor (ER), Progesterone Receptor (PR), Human Epidermal Growth Factor Receptor 2 (HER2), and Ki67 antigen are the most examined markers depicting BC heterogeneity and have been shown to have a strong impact on BC prognosis. Radiomics can noninvasively predict BC heterogeneity through the quantitative evaluation of medical images, such as Magnetic Resonance Imaging (MRI), which has become increasingly important in the detection and characterization of BC. However, the lack of comprehensive BC datasets in terms of molecular outcomes and MRI modalities, and the absence of a general methodology to build and compare feature selection approaches and predictive models, limit the routine use of radiomics in BC clinical practice. In this work, a new radiomic approach based on a two-step feature selection process was proposed to build predictors for ER, PR, HER2, and Ki67 markers. An in-house dataset was used, containing 92 multiparametric MRIs of patients with histologically proven BC and all four relevant biomarkers available. Thousands of radiomic features were extracted from post-contrast and subtracted Dynamic Contrast-Enhanced (DCE) MRI images, Apparent Diffusion Coefficient (ADC) maps, and T2-weighted (T2) images. The two-step feature selection approach was used to identify significant radiomic features and then to build the final prediction models. They showed remarkable results in terms of F1-score for all the biomarkers: 84%, 63%, 90%, and 72% for ER, HER2, Ki67, and PR, respectively. When possible, the models were validated on the TCGA/TCIA Breast Cancer dataset, returning promising results (F1-score = 88% for the ER+/ER- classification task). The developed approach efficiently characterized BC heterogeneity according to the examined molecular biomarkers.
Affiliation(s)
- Valentina Brancato
  - IRCCS SYNLAB SDN, Istituto di Ricerca Diagnostica e Nucleare, Via E. Gianturco 113, 80143 Naples, Italy
- Nadia Brancati
  - Institute for High Performance Computing and Networking, National Research Council of Italy (ICAR-CNR), Via P. Castellino 111, 80131 Naples, Italy
- Giusy Esposito
  - Bio Check Up S.r.l., Via Riviera di Chiaia 9a, 80122 Naples, Italy
  - Department of Advanced Biomedical Sciences, University of Naples Federico II, 80131 Naples, Italy
- Massimo La Rosa
  - Institute for High Performance Computing and Networking, National Research Council of Italy (ICAR-CNR), Via P. Castellino 111, 80131 Naples, Italy
- Carlo Cavaliere
  - IRCCS SYNLAB SDN, Istituto di Ricerca Diagnostica e Nucleare, Via E. Gianturco 113, 80143 Naples, Italy
- Ciro Allarà
  - Bio Check Up S.r.l., Via Riviera di Chiaia 9a, 80122 Naples, Italy
- Valeria Romeo
  - Department of Advanced Biomedical Sciences, University of Naples Federico II, 80131 Naples, Italy
- Giuseppe De Pietro
  - Institute for High Performance Computing and Networking, National Research Council of Italy (ICAR-CNR), Via P. Castellino 111, 80131 Naples, Italy
- Marco Salvatore
  - IRCCS SYNLAB SDN, Istituto di Ricerca Diagnostica e Nucleare, Via E. Gianturco 113, 80143 Naples, Italy
- Marco Aiello
  - IRCCS SYNLAB SDN, Istituto di Ricerca Diagnostica e Nucleare, Via E. Gianturco 113, 80143 Naples, Italy
- Mara Sangiovanni
  - Institute for High Performance Computing and Networking, National Research Council of Italy (ICAR-CNR), Via P. Castellino 111, 80131 Naples, Italy
3
Nassiri Z, Omranpour H. Learning the transfer function in binary metaheuristic algorithm for feature selection in classification problems. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07869-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/10/2022]
4
Jia C, Zhang M, Fan C, Li F, Song J. Formator: Predicting Lysine Formylation Sites Based on the Most Distant Undersampling and Safe-Level Synthetic Minority Oversampling. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:1937-1945. [PMID: 31804942 DOI: 10.1109/tcbb.2019.2957758] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 06/10/2023]
Abstract
Lysine formylation is a reversible type of protein post-translational modification and has been found to be involved in a myriad of biological processes, including modulation of chromatin conformation and gene expression in histones and other nuclear proteins. Accurate identification of lysine formylation sites is essential for elucidating the underlying molecular mechanisms of formylation. Traditional experimental methods are time-consuming and expensive. As such, it is desirable and necessary to develop computational methods for accurate prediction of formylation sites. In this study, we propose a novel predictor, termed Formator, for identifying lysine formylation sites from sequence information. Formator is developed using the ensemble learning (EL) strategy based on four individual support vector machine classifiers via a voting system. Moreover, the most distant undersampling and Safe-Level-SMOTE oversampling techniques were integrated to deal with the data imbalance problem of the training dataset. Four effective feature extraction methods, namely bi-profile Bayes (BPB), k-nearest neighbor (KNN), amino acid physicochemical properties (AAindex), and composition and transition (CTD), were employed to encode the surrounding sequence features of potential formylation sites. Extensive empirical studies show that Formator achieved accuracies of 87.24% and 74.96% on the jackknife test and the independent test, respectively. Performance comparison results on the independent test indicate that Formator outperforms the existing prediction tool, LFPred, suggesting that it has great potential to serve as a useful tool in identifying novel lysine formylation sites and facilitating hypothesis-driven experimental efforts.
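The voting step over several base classifiers can be illustrated with a small sketch. It assumes binary 0/1 outputs and a simple majority rule in which ties are resolved as negative; the abstract does not state how Formator combines or breaks ties among its four SVMs, so both function names and the tie rule are hypothetical.

```python
def majority_vote(votes):
    """Return 1 if more than half of the binary votes are positive, else 0.

    Ties (for example 2 of 4) are resolved as 0 here, which is an assumption.
    """
    return 1 if sum(votes) * 2 > len(votes) else 0


def ensemble_predict(classifiers, sample):
    """Apply each base classifier (a callable returning 0 or 1) and vote."""
    return majority_vote([clf(sample) for clf in classifiers])
```

Each base classifier would typically be trained on a different feature encoding of the same sequence window, so the vote combines complementary views of the site.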
5
Khan ZU, Pi D. DeepSSPred: A Deep Learning Based Sulfenylation Site Predictor Via a Novel nSegmented Optimize Federated Feature Encoder. Protein Pept Lett 2021; 28:708-721. [PMID: 33267753 DOI: 10.2174/0929866527666201202103411] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/03/2020] [Revised: 10/14/2020] [Accepted: 10/18/2020] [Indexed: 11/22/2022]
Abstract
BACKGROUND S-sulfenylation (S-sulphenylation, or sulfenic acid) of proteins is a special kind of post-translational modification that plays an important role in various physiological and pathological processes such as cytokine signaling, transcriptional regulation, and apoptosis. Given this significance, and to complement existing wet-lab methods, several computational models have been developed for predicting cysteine sulfenylation sites. However, the performance of these models was not satisfactory due to inefficient feature schemes, severe imbalance issues, and the lack of an intelligent learning engine. OBJECTIVE In this study, our motivation is to establish a strong and novel computational predictor for discriminating sulfenylation from non-sulfenylation sites. METHODS We report an innovative bioinformatics feature encoding tool, named DeepSSPred, in which the encoded features are obtained via an nSegmented hybrid feature, and the resampling technique called synthetic minority oversampling is then employed to cope with the severe imbalance between SC-sites (minority class) and non-SC sites (majority class). A state-of-the-art 2D convolutional neural network was employed, with a rigorous 10-fold jackknife cross-validation technique for model validation and authentication. RESULTS The proposed framework, with a strong discrete representation of the feature space, a machine learning engine, and an unbiased representation of the underlying training data, yielded an excellent model that outperforms all existing established studies. The proposed approach is 6% higher in terms of MCC than the best existing method. On an independent dataset, the best existing study failed to provide sufficient details. The model obtained an increase of 7.5% in accuracy, 1.22% in Sn, 12.91% in Sp, and 13.12% in MCC on the training data, and 12.13% in ACC, 27.25% in Sn, 2.25% in Sp, and 30.37% in MCC on an independent dataset, in comparison with the second-best method. These empirical analyses show the superior performance of the proposed model on both the training and independent datasets in comparison with existing studies. CONCLUSION In this research, we have developed a novel sequence-based automated predictor for SC-sites, called DeepSSPred. The empirical simulation outcomes with a training dataset and an independent validation dataset have revealed the efficacy of the proposed theoretical model. The good performance of DeepSSPred is due to several reasons, such as novel discriminative feature encoding schemes, the SMOTE technique, and careful construction of the prediction model through the tuned 2D-CNN classifier. We believe that our research work will provide potential insight into further prediction of S-sulfenylation characteristics and functionalities. Thus, we hope that our predictor will be significantly helpful for large-scale discrimination of unknown SC-sites in particular and for designing new pharmaceutical drugs in general.
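The core idea of synthetic minority oversampling (SMOTE) is to create a new minority-class sample by interpolating between an existing minority sample and one of its nearest minority-class neighbours. The sketch below shows that single step in plain Python; the function name `smote_sample`, the default `k`, and the squared Euclidean distance are illustrative assumptions, not details taken from DeepSSPred.

```python
import random


def smote_sample(minority, k=2, rng=None):
    """Generate one synthetic minority sample by SMOTE-style interpolation.

    minority: list of feature vectors (at least two) from the minority class.
    k: number of nearest minority neighbours to choose from (assumed value).
    """
    rng = rng or random.Random()
    base = rng.choice(minority)
    # Rank the other minority points by squared Euclidean distance to base.
    others = [p for p in minority if p is not base]
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
    neighbor = rng.choice(others[:k])
    gap = rng.random()  # position along the segment from base to neighbor
    return [a + gap * (b - a) for a, b in zip(base, neighbor)]
```

Calling the function repeatedly until the two classes are balanced gives the oversampled training set; resampling is normally applied to training folds only, never to the held-out data.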
Affiliation(s)
- Zaheer Ullah Khan
  - College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
- Dechang Pi
  - College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China
6
Yang YS, Qiu YJ, Zheng GH, Gong HP, Ge YQ, Zhang YF, Feng F, Wang YT. High resolution MRI-based radiomic nomogram in predicting perineural invasion in rectal cancer. Cancer Imaging 2021; 21:40. [PMID: 34039436 PMCID: PMC8157664 DOI: 10.1186/s40644-021-00408-4] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 02/25/2020] [Accepted: 05/12/2021] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND To establish and validate a high-resolution magnetic resonance imaging (HRMRI)-based radiomic nomogram for prediction of preoperative perineural invasion (PNI) of rectal cancer (RC). METHODS Our retrospective study included 140 subjects with RC (99 in the training cohort and 41 in the validation cohort) who underwent a preoperative HRMRI scan between December 2016 and December 2019. All subjects underwent radical surgery, and then PNI status was evaluated by a qualified pathologist. A total of 396 radiomic features were extracted from oblique axial T2 weighted images, and optimal features were selected to construct a radiomic signature. A combined nomogram was established by incorporating the radiomic signature, HRMRI findings, and clinical risk factors selected by using multivariable logistic regression. RESULTS The predictive nomogram of PNI included a radiomic signature, and MRI-reported tumor stage (mT-stage). Clinical risk factors failed to increase the predictive value. Favorable discrimination was achieved between PNI-positive and PNI-negative groups using the radiomic nomogram. The area under the curve (AUC) was 0.81 (95% confidence interval [CI], 0.71-0.91) in the training cohort and 0.75 (95% CI, 0.58-0.92) in the validation cohort. Moreover, our result highlighted that the radiomic nomogram was clinically beneficial, as evidenced by a decision curve analysis. CONCLUSIONS HRMRI-based radiomic nomogram could be helpful in the prediction of preoperative PNI in RC patients.
Affiliation(s)
- Yan-Song Yang
  - Department of Radiology, Affiliated Tumor Hospital of Nantong University, Nantong, 226001, Jiangsu Province, China
  - Department of Nuclear Medicine, The Third Affiliated Hospital of Soochow University, No.185, Juqian Street, Changzhou, 213003, Jiangsu Province, China
- Yong-Juan Qiu
  - Department of Nuclear Medicine, The Third Affiliated Hospital of Soochow University, No.185, Juqian Street, Changzhou, 213003, Jiangsu Province, China
- Gui-Hua Zheng
  - Department of Pathology, Affiliated Tumor Hospital of Nantong University, Nantong, 226001, Jiangsu Province, China
- Hai-Peng Gong
  - Department of Nuclear Medicine, The Third Affiliated Hospital of Soochow University, No.185, Juqian Street, Changzhou, 213003, Jiangsu Province, China
- Yi-Fei Zhang
  - Department of Nuclear Medicine, The Third Affiliated Hospital of Soochow University, No.185, Juqian Street, Changzhou, 213003, Jiangsu Province, China
- Feng Feng
  - Department of Radiology, Affiliated Tumor Hospital of Nantong University, Nantong, 226001, Jiangsu Province, China
- Yue-Tao Wang
  - Department of Nuclear Medicine, The Third Affiliated Hospital of Soochow University, No.185, Juqian Street, Changzhou, 213003, Jiangsu Province, China
7
Yang L, Gao H, Wu K, Zhang H, Li C, Tang L. Identification of Cancerlectins By Using Cascade Linear Discriminant Analysis and Optimal g-gap Tripeptide Composition. Curr Bioinform 2020. [DOI: 10.2174/1574893614666190730103156] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 01/01/2023]
Abstract
Background:
Lectins are a diverse group of glycoproteins or glycoconjugate proteins that can be extracted from plants, invertebrates and higher animals. Cancerlectins are a kind of lectin that play a key role in the process of tumor cells interacting with each other and are being employed as therapeutic agents. A full understanding of cancerlectins is significant because it provides a tool for the future direction of cancer therapy.
Objective:
To develop an accurate, practically useful, and time-saving tool to identify cancerlectins. A novel sequence-based method is proposed along with a corresponding webserver to access the proposed tool.
Methods:
First, protein features were extracted using a new feature construction method termed g-gap tripeptide composition. A proposed cascade linear discriminant analysis (Cascade LDA) is then used to alleviate the high-dimensionality problem, with Analysis of Variance (ANOVA) as the feature importance criterion. Finally, a Support Vector Machine (SVM) is used as the classifier to identify cancerlectins.
Results:
The proposed method achieved an accuracy of 91.34% with a sensitivity of 89.89%, a specificity of 92.48%, and a Matthews correlation coefficient of 0.8318 based on only 13 fusion features in jackknife cross-validation, which is superior to other published methods in this domain.
Conclusion:
In this study, a new method based only on the primary structure of proteins is proposed, and experimental results show that it could be a promising tool to identify cancerlectins. An open-access webserver is made available in this work to facilitate other related works.
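The four figures reported in the Results (accuracy, sensitivity, specificity, and Matthews correlation coefficient) all derive from the binary confusion matrix. A minimal sketch of those standard formulas follows; the function name `binary_metrics` is hypothetical, and any counts passed to it are illustrative rather than taken from the paper.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion counts."""
    sensitivity = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }
```

Unlike accuracy, the MCC stays informative when the two classes differ greatly in size, which is why it is commonly reported alongside sensitivity and specificity in this literature.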
Affiliation(s)
- Liangwei Yang
  - Center for Informational Biology, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Hui Gao
  - Center for Informational Biology, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China
- Keyu Wu
  - Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Haotian Zhang
  - Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Changyu Li
  - Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Lixia Tang
  - Center for Informational Biology, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
8
Predictions of Apoptosis Proteins by Integrating Different Features Based on Improving Pseudo-Position-Specific Scoring Matrix. BIOMED RESEARCH INTERNATIONAL 2020; 2020:4071508. [PMID: 32420339 PMCID: PMC7201498 DOI: 10.1155/2020/4071508] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/05/2019] [Accepted: 12/19/2019] [Indexed: 11/25/2022]
Abstract
Apoptosis proteins are strongly related to many diseases and play an indispensable role in maintaining the dynamic balance between cell death and division in vivo. Obtaining localization information on apoptosis proteins is necessary for understanding their function. To date, few researchers have focused on the problem of apoptosis data imbalance before classification, although this imbalance makes misclassification more likely. Therefore, in this work, we introduce a method to resolve this problem and to enhance prediction accuracy. Firstly, the features of the protein sequence are captured by combining the Improving Pseudo-Position-Specific Scoring Matrix (IM-Psepssm) with the Bidirectional Correlation Coefficient (Bid-CC) algorithm from the position-specific scoring matrix. Secondly, different feature fusion and resampling strategies are used to reduce the impact of imbalance on apoptosis protein datasets. Finally, the feature vector is fed to a Support Vector Machine (SVM) to train the classification model, and the prediction accuracy is evaluated by jackknife cross-validation tests. The experimental results indicate that, under the same feature vector, adopting resampling methods remarkably improves many key indicators relative to the method without resampling for predicting the localization of apoptosis proteins in the ZD98, ZW225, and CL317 databases. Additionally, we present new user-friendly local software for readers to apply; the codes and software can be freely accessed at https://github.com/ruanxiaoli/Im-Psepssm.
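The jackknife test used for evaluation is, in this literature, leave-one-out cross-validation: each sample is held out once, the model is built on the rest, and the held-out sample is predicted. The sketch below shows the loop. To stay self-contained it swaps the paper's SVM for a 1-nearest-neighbour classifier, which is purely an illustrative substitution, and both function names are hypothetical.

```python
def nearest_neighbor_predict(train_x, train_y, query):
    """Predict the label of the closest training point (stand-in for the SVM)."""
    best = min(
        range(len(train_x)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(train_x[j], query)),
    )
    return train_y[best]


def jackknife_accuracy(xs, ys):
    """Leave-one-out accuracy: hold out each sample once and predict it."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_neighbor_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)
```

When resampling is part of the pipeline, it would be applied inside the loop to the training portion only, so that the held-out sample never influences the synthetic data.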
9
Qian L, Wen Y, Han G. Identification of Cancerlectins Using Support Vector Machines With Fusion of G-Gap Dipeptide. Front Genet 2020; 11:275. [PMID: 32318092 PMCID: PMC7147460 DOI: 10.3389/fgene.2020.00275] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/25/2019] [Accepted: 03/06/2020] [Indexed: 12/13/2022] Open
Abstract
Cancerlectins play an important role in the initiation, survival, growth, metastasis, and spread of cancer. Studying the function of cancerlectins is therefore of great significance because it can help to identify tumor markers and support tumor prevention, treatment, and prognosis. However, the many studies in this area have generated a large amount of protein data, and traditional prediction methods have been unable to meet the needs of analysis. Developing powerful computational models based on these data to discriminate cancerlectins from non-cancerlectins on a large scale has been treated as one of the most important topics. In this study, we developed a feature extraction method to identify cancerlectins based on fusion of g-gap dipeptides. The analysis of variance was used to select the optimal feature set and a support vector machine was used to classify the data. The rigorous nested 10-fold cross-validation results demonstrated that our method obtained a prediction accuracy of 83.91% and a sensitivity of 83.15%. At the same time, in order to evaluate the performance of the classification model constructed in this work, we constructed a new data set. The prediction accuracy on the new data set reaches 83.3%. Experimental results show that the performance of our method is better than the state-of-the-art methods.
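A g-gap dipeptide encoding counts pairs of residues separated by g intervening positions and normalizes by the number of such pairs in the sequence, giving a 400-dimensional vector for the 20 standard amino acids. The sketch below follows that commonly used definition; the abstract does not spell out the exact encoding or which g values were fused, so the function names and the example g values are assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 dipeptides


def g_gap_dipeptide_composition(sequence, g):
    """Frequency of each residue pair separated by g intervening residues."""
    total = len(sequence) - g - 1  # number of (i, i + g + 1) position pairs
    if total <= 0:
        raise ValueError("sequence too short for this gap")
    counts = dict.fromkeys(PAIRS, 0)
    for i in range(total):
        pair = sequence[i] + sequence[i + g + 1]
        if pair in counts:  # pairs with non-standard residues are skipped
            counts[pair] += 1
    return [counts[p] / total for p in PAIRS]


def fused_g_gap_features(sequence, gaps=(0, 1, 2)):
    """Concatenate the encodings for several gap values (assumed fusion)."""
    features = []
    for g in gaps:
        features.extend(g_gap_dipeptide_composition(sequence, g))
    return features
```

With g = 0 this reduces to the ordinary adjacent dipeptide composition; a feature selection step such as the analysis of variance named in the abstract would then prune the fused vector.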
Affiliation(s)
- Lili Qian
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, China
- Yaping Wen
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, China
- Guosheng Han
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, China
10
Masoudi-Sobhanzadeh Y, Motieghader H, Masoudi-Nejad A. FeatureSelect: a software for feature selection based on machine learning approaches. BMC Bioinformatics 2019; 20:170. [PMID: 30943889 PMCID: PMC6446290 DOI: 10.1186/s12859-019-2754-0] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 12/06/2018] [Accepted: 03/19/2019] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND Feature selection, as a preprocessing stage, is a challenging problem in various sciences such as biology, engineering, computer science, and other fields. For this purpose, some studies have introduced software tools such as WEKA. However, these tools are based on filter methods, which have lower performance relative to wrapper methods. In this paper, we address this limitation and introduce a software application called FeatureSelect. In addition to filter methods, FeatureSelect consists of optimisation algorithms and three types of learners. It provides a user-friendly and straightforward method of feature selection for use in any kind of research, and can easily be applied to any type of balanced and unbalanced data based on several score functions like accuracy, sensitivity, specificity, etc. RESULTS In addition to our previously introduced optimisation algorithm (WCC), a total of 10 efficient, well-known and recently developed algorithms have been implemented in FeatureSelect. We applied our software to a range of different datasets and evaluated the performance of its algorithms. The results show that the performance of the algorithms varies across datasets, but WCC, LCA, FOA, and LA are more suitable than the others overall. The results also show that wrapper methods are better than filter methods. CONCLUSIONS FeatureSelect is a feature or gene selection software application which is based on wrapper methods. Furthermore, it includes some popular filter methods and generates various comparison diagrams and statistical measurements. It is available from GitHub ( https://github.com/LBBSoft/FeatureSelect ) and is free open source software under an MIT license.
Affiliation(s)
- Yosef Masoudi-Sobhanzadeh
  - Laboratory of system Biology and Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Habib Motieghader
  - Laboratory of system Biology and Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
- Ali Masoudi-Nejad
  - Laboratory of system Biology and Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran