1
|
Matrouk AY, Mohammad H, Daoud S, Taha MO. Discovery of New HER2 Inhibitors via Computational Docking, Pharmacophore Modeling, and Machine Learning. Mol Inform 2025; 44:e202400336. [PMID: 39976334 DOI: 10.1002/minf.202400336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2024] [Revised: 01/28/2025] [Accepted: 02/06/2025] [Indexed: 02/21/2025]
Abstract
The human epidermal growth factor receptor 2 (HER2) is a critical oncogene implicated in the development of various aggressive cancers, particularly breast cancer. Discovering novel HER2 inhibitors is crucial for expanding therapeutic options for HER2-related malignancies. In this study, we present a computational workflow that focuses on generating pharmacophores derived from docked poses of a selected list of 15 diverse, potent HER2 inhibitors, utilizing flexible docking. The resulting pharmacophores, along with other physicochemical molecular descriptors, were then evaluated in a machine learning-quantitative structure-activity relationship (ML-QSAR) analysis against 1,272 HER2 inhibitors. Several machine learning methods were assessed, and a genetic function algorithm (GFA) was employed for feature selection. Ultimately, GFA combined with Bagging and J48Graft classifiers produced the best self-consistent and predictive models. These models highlighted the significance of two pharmacophores, Hypo_1 and Hypo_2, in distinguishing potent from less active inhibitors. The successful ML-QSAR models and their associated pharmacophores were used to screen the National Cancer Institute (NCI) database for novel HER2 inhibitors. Three promising anti-HER2 leads were identified, with the top-performing lead demonstrating an experimental anti-HER2 IC50 value of 3.85 μM. Notably, the three inhibitors exhibited distinct chemical scaffolds compared to existing HER2 inhibitors, as indicated by principal component analysis.
Affiliation(s)
- Aseel Yasin Matrouk
- Department of Pharmaceutical Sciences, Faculty of Pharmacy, University of Jordan, Amman, 11942, Jordan
| | - Haneen Mohammad
- Department of Pharmaceutical Sciences, Faculty of Pharmacy, University of Jordan, Amman, 11942, Jordan
| | - Safa Daoud
- Department of Pharmaceutical Chemistry and Pharmacognosy, Faculty of Pharmacy, Applied Sciences Private University, Amman, Jordan
| | - Mutasem Omar Taha
- Department of Pharmaceutical Sciences, Faculty of Pharmacy, University of Jordan, Amman, 11942, Jordan
| |
|
2
|
Trabelsi A, Elouedi Z, Lefevre E. An ensemble classifier through rough set reducts for handling data with evidential attributes. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2023.01.091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/21/2023]
|
3
|
Lu X, Li J, Zhu Z, Yuan Y, Chen G, He K. Predicting miRNA-Disease Associations via Combining Probability Matrix Feature Decomposition With Neighbor Learning. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:3160-3170. [PMID: 34260356 DOI: 10.1109/tcbb.2021.3097037] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 06/13/2023]
Abstract
Predicting the associations of miRNAs and diseases may uncover the causation of various diseases. Many methods are emerging to tackle sparse and unbalanced disease-related miRNA prediction. Here, we propose a Probabilistic matrix decomposition combined with neighbor learning to identify MiRNA-Disease Associations utilizing heterogeneous data (PMDA). First, we build similarity networks for diseases and miRNAs, respectively, by integrating semantic information and functional interactions. Second, we construct a neighbor learning model in which the neighbor information of an individual miRNA or disease is utilized to enhance the association relationship and tackle the sparsity problem. Third, we predict the potential associations between miRNAs and diseases via probability matrix decomposition. The experimental results show that PMDA is superior to five other methods on sparse and unbalanced data. The case study shows that the new miRNA-disease interactions predicted by PMDA are effective and that the performance of PMDA is superior to that of other methods.
|
4
|
Vahmiyan M, Kheirabadi M, Akbari E. Feature selection methods in microarray gene expression data: a systematic mapping study. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07661-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/07/2022]
|
5
|
Exploiting activity cliffs for building pharmacophore models and comparison with other pharmacophore generation methods: sphingosine kinase 1 as case study. J Comput Aided Mol Des 2022; 36:39-62. [PMID: 35059939 DOI: 10.1007/s10822-021-00435-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/14/2021] [Accepted: 11/24/2021] [Indexed: 12/20/2022]
|
6
|
Hatmal MM, Alshaer W, Mahmoud IS, Al-Hatamleh MAI, Al-Ameer HJ, Abuyaman O, Zihlif M, Mohamud R, Darras M, Al Shhab M, Abu-Raideh R, Ismail H, Al-Hamadi A, Abdelhay A. Investigating the association of CD36 gene polymorphisms (rs1761667 and rs1527483) with T2DM and dyslipidemia: Statistical analysis, machine learning based prediction, and meta-analysis. PLoS One 2021; 16:e0257857. [PMID: 34648514 PMCID: PMC8516279 DOI: 10.1371/journal.pone.0257857] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/14/2021] [Accepted: 09/11/2021] [Indexed: 12/15/2022] Open
Abstract
CD36 (cluster of differentiation 36) is a membrane protein involved in lipid metabolism and has been linked to pathological conditions associated with metabolic disorders, such as diabetes and dyslipidemia. A case-control study was conducted and included 177 patients with type-2 diabetes mellitus (T2DM) and 173 control subjects to study the involvement of CD36 gene rs1761667 (G>A) and rs1527483 (C>T) polymorphisms in the pathogenesis of T2DM and dyslipidemia in the Jordanian population. Lipid profile, blood sugar, gender and age were measured and recorded. Also, genotyping analysis for both polymorphisms was performed. Following statistical analysis, 10 different neural networks and machine learning (ML) tools were used to predict subjects with diabetes or dyslipidemia. Towards further understanding of the role of CD36 protein and gene in T2DM and dyslipidemia, a protein-protein interaction network and meta-analysis were carried out. For both polymorphisms, the genotypic frequencies were not significantly different between the two groups (p > 0.05). On the other hand, some ML tools like multilayer perceptron gave high prediction accuracy (≥ 0.75) and Cohen's kappa (κ) (≥ 0.5). Interestingly, with the K-star tool, the accuracy and Cohen's κ values were enhanced by including the genotyping results as inputs (0.73 and 0.46, respectively, compared to 0.67 and 0.34 without including them). This study confirmed, for the first time, that there is no association between CD36 polymorphisms and T2DM or dyslipidemia in the Jordanian population. Prediction of T2DM and dyslipidemia, using these extensive ML tools and based on such input data, is a promising approach for developing diagnostic and prognostic prediction models for a wide spectrum of diseases, especially based on large medical databases.
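The abstract above reports both accuracy and Cohen's κ. As a rough illustration of how the two relate, the sketch below computes them from paired label lists using the standard definition κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name `accuracy_and_kappa` and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def accuracy_and_kappa(y_true, y_pred):
    """Return (accuracy, Cohen's kappa) for two equal-length label sequences."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("label sequences must be non-empty and of equal length")

    # Observed agreement p_o is the plain classification accuracy.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Chance agreement p_e from the marginal label frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    if expected == 1.0:
        # Both sequences contain a single identical label: kappa is undefined.
        return observed, float("nan")

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical toy labels (1 = case, 0 = control), not data from the study.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 1]
toy_accuracy, toy_kappa = accuracy_and_kappa(toy_true, toy_pred)
```

On balanced labels such as these, chance agreement is 0.5, which is why κ sits well below the raw accuracy; this is one reason the two metrics are usually reported together.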
Affiliation(s)
- Ma’mon M. Hatmal
- Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, Zarqa, Jordan
- * E-mail:
| | - Walhan Alshaer
- Cell Therapy Centre, The University of Jordan, Amman, Jordan
| | - Ismail S. Mahmoud
- Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, Zarqa, Jordan
| | - Mohammad A. I. Al-Hatamleh
- Department of Immunology, School of Medical Sciences, Universiti Sains Malaysia, Kubang Kerian, Kelantan, Malaysia
| | - Hamzeh J. Al-Ameer
- Department of Biology and Biotechnology, American University of Madaba, Madaba, Jordan
- Department of Pharmacology, Faculty of Medicine, The University of Jordan, Amman, Jordan
| | - Omar Abuyaman
- Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, Zarqa, Jordan
| | - Malek Zihlif
- Department of Pharmacology, Faculty of Medicine, The University of Jordan, Amman, Jordan
| | - Rohimah Mohamud
- Department of Immunology, School of Medical Sciences, Universiti Sains Malaysia, Kubang Kerian, Kelantan, Malaysia
| | - Mais Darras
- Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, Zarqa, Jordan
| | - Mohammad Al Shhab
- Department of Pharmacology, Faculty of Medicine, The University of Jordan, Amman, Jordan
| | - Rand Abu-Raideh
- Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, Zarqa, Jordan
| | - Hilweh Ismail
- Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, Zarqa, Jordan
| | - Ali Al-Hamadi
- Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, Zarqa, Jordan
| | - Ali Abdelhay
- Department of Pharmacology, Faculty of Medicine, The University of Jordan, Amman, Jordan
| |
|
7
|
Hatmal MM, Abuyaman O, Taha M. Docking-generated multiple ligand poses for bootstrapping bioactivity classifying Machine Learning: Repurposing covalent inhibitors for COVID-19-related TMPRSS2 as case study. Comput Struct Biotechnol J 2021; 19:4790-4824. [PMID: 34426763 PMCID: PMC8373588 DOI: 10.1016/j.csbj.2021.08.023] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/22/2021] [Revised: 08/03/2021] [Accepted: 08/16/2021] [Indexed: 01/10/2023] Open
Abstract
In the present work we introduce the use of multiple docked poses for bootstrapping machine learning-based QSAR modelling. Ligand-receptor contact fingerprints are implemented as descriptor variables. We implemented this method for the discovery of potential inhibitors of the serine protease enzyme TMPRSS2, which is involved in the infectivity of coronaviruses. Several machine learners were scanned; however, XGBoost, support vector machines (SVM) and random forests (RF) performed best, with testing set accuracies reaching 90%. Three potential hits were identified upon using the method to scan known untested FDA-approved drugs against TMPRSS2. Subsequent molecular dynamics simulation and covalent docking supported the results of the new computational approach.
Affiliation(s)
- Ma'mon M. Hatmal
- Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, PO Box 330127, Zarqa 13133, Jordan
| | - Omar Abuyaman
- Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, PO Box 330127, Zarqa 13133, Jordan
| | - Mutasem Taha
- Department of Pharmaceutical Sciences, Faculty of Pharmacy, University of Jordan, Amman 11942, Jordan
| |
|
8
|
Qu C, Zhang L, Li J, Deng F, Tang Y, Zeng X, Peng X. Improving feature selection performance for classification of gene expression data using Harris Hawks optimizer with variable neighborhood learning. Brief Bioinform 2021; 22:6238587. [PMID: 33876181 DOI: 10.1093/bib/bbab097] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 12/29/2020] [Revised: 02/28/2021] [Accepted: 03/03/2021] [Indexed: 11/14/2022] Open
Abstract
Gene expression profiling has played a significant role in the identification and classification of tumor molecules. In gene expression data, only a few feature genes are closely related to tumors. It is a challenging task to select highly discriminative feature genes, and existing methods fail to deal with this problem efficiently. This article proposes a novel metaheuristic approach for gene feature extraction, called variable neighborhood learning Harris Hawks optimizer (VNLHHO). First, the F-score is used for a primary selection of the genes in gene expression data to narrow down the selection range of the feature genes. Subsequently, a variable neighborhood learning strategy is constructed to balance the global exploration and local exploitation of the Harris Hawks optimization. Finally, mutation operations are employed to increase the diversity of the population, so as to prevent the algorithm from falling into a local optimum. In addition, a novel activation function is used to convert the continuous solution of the VNLHHO into binary values, and a naive Bayesian classifier is utilized as a fitness function to select feature genes that can help classify biological tissues of binary and multi-class cancers. An experiment is conducted on gene expression profile data of eight types of tumors. The results show that the classification accuracy of the VNLHHO is greater than 96.128% for tumors in the colon, nervous system and lungs and 100% for the rest. We compare seven other algorithms and demonstrate the superiority of the VNLHHO in terms of the classification accuracy, fitness value and AUC value in feature selection for gene expression data.
Affiliation(s)
- Chiwen Qu
- College of Mathematics and Statistics, Hunan Normal University, China
| | - Lupeng Zhang
- Department of Pathology and Pathophysiology, Jishou University School of Medicine, Jishou University, China
| | - Jinlong Li
- Department of Pathology and Pathophysiology, Jishou University School of Medicine, Jishou University, China
| | - Fang Deng
- Department of Epidemiology and Health Statistics, Xiangya Public Health School, Central South University, China
| | - Yifan Tang
- Department of Pathology and Pathophysiology, Hunan Normal University School of Medicine, Hunan Normal University, China
| | - Xiaomin Zeng
- Department of Epidemiology and Health Statistics, Xiangya Public Health School, Central South University, China
| | - Xiaoning Peng
- Department of Pathology and Pathophysiology, Hunan Normal University School of Medicine, Hunan Normal University, China
| |
|
9
|
Jaya Ant lion optimization-driven Deep recurrent neural network for cancer classification using gene expression data. Med Biol Eng Comput 2021; 59:1005-1021. [PMID: 33851321 DOI: 10.1007/s11517-021-02350-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/26/2020] [Accepted: 03/17/2021] [Indexed: 10/21/2022]
Abstract
Cancer is one of the deadly diseases prevailing worldwide and the patients with cancer are rescued only when the cancer is detected at the very early stage. Early detection of cancer is essential as, in the final stage, the chance of survival is limited. The symptoms of cancers are rigorous and therefore, all the symptoms should be studied properly before the diagnosis. Thus, an automatic prediction system is necessary for classifying cancer as malignant or benign. Hence, this paper introduces the novel strategy based on the JayaAnt lion optimization-based Deep recurrent neural network (JayaALO-based DeepRNN) for cancer classification. The steps followed in the developed model are data normalization, data transformation, feature dimension detection, and classification. The first step is data normalization. The goal of data normalization is to eliminate data redundancy and to mitigate the storage of objects in a relational database that maintains the same information in several places. After that, the data transformation is carried out based on log transformation that generates the patterns using more interpretable and helps fulfill the supposition, and to reduce skew. Also, the non-negative matrix factorization is employed for reducing the feature dimension. Finally, the proposed JayaALO-based DeepRNN method effectively classifies cancer based on the reduced dimension features to produce a satisfactory result. Thus, the resulted output of the proposed JayaALO-based DeepRNN is employed for cancer classification. The proposed JayaALO-based DeepRNN showed improved results with maximal accuracy of 95.97%, maximal sensitivity of 95.95%, and maximal specificity of 96.96%. The goal of this research is to devise the cancer classification strategy using the proposed JayaALO-based DeepRNN. It is required to detect the cancer at an early stage to prevent the destruction caused to the other organs. 
The developed model involves four phases to perform the cancer classification, namely data normalization, data transformation, feature dimension detection, and classification. Initially, the input images are gathered and are adapted to perform data normalization. The normalized data is fed to the data transformation, which is performed using log transformation. The transformed data is fed to feature dimension reduction, which is performed using non-negative matrix factorization. The reduced features are then employed in DeepRNN for cancer classification. The training of DeepRNN is done using the proposed JayaALO, which is designed by combining ALO and the Jaya algorithm. The block diagram of the proposed cancer classification approach using JayaALO-based DeepRNN is given below.
|
10
|
Alizadehsani R, Roshanzamir M, Hussain S, Khosravi A, Koohestani A, Zangooei MH, Abdar M, Beykikhoshk A, Shoeibi A, Zare A, Panahiazar M, Nahavandi S, Srinivasan D, Atiya AF, Acharya UR. Handling of uncertainty in medical data using machine learning and probability theory techniques: a review of 30 years (1991-2020). ANNALS OF OPERATIONS RESEARCH 2021; 339:1-42. [PMID: 33776178 PMCID: PMC7982279 DOI: 10.1007/s10479-021-04006-2] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Accepted: 02/23/2021] [Indexed: 05/17/2023]
Abstract
Understanding the data and reaching accurate conclusions are of paramount importance in the present era of big data. Machine learning and probability theory methods have been widely used for this purpose in various fields. One critically important yet less explored aspect is capturing and analyzing uncertainties in the data and model. Proper quantification of uncertainty helps to provide valuable information to obtain accurate diagnosis. This paper reviewed related studies conducted in the last 30 years (from 1991 to 2020) in handling uncertainties in medical data using probability theory and machine learning techniques. Medical data is more prone to uncertainty due to the presence of noise in the data. So, it is very important to have clean medical data without any noise to get accurate diagnosis. The sources of noise in the medical data need to be known to address this issue. Based on the medical data obtained by the physician, diagnosis of disease, and treatment plan are prescribed. Hence, the uncertainty is growing in healthcare and there is limited knowledge to address these problems. Our findings indicate that there are few challenges to be addressed in handling the uncertainty in medical raw data and new models. In this work, we have summarized various methods employed to overcome this problem. Nowadays, various novel deep learning techniques have been proposed to deal with such uncertainties and improve the performance in decision making.
Affiliation(s)
- Roohallah Alizadehsani
- Institute for Intelligent Systems Research and Innovations (IISRI), Deakin University, Geelong, Australia
| | - Mohamad Roshanzamir
- Department of Computer Engineering, Faculty of Engineering, Fasa University, 74617-81189 Fasa, Iran
| | - Sadiq Hussain
- System Administrator, Dibrugarh University, Dibrugarh, Assam 786004 India
| | - Abbas Khosravi
- Institute for Intelligent Systems Research and Innovations (IISRI), Deakin University, Geelong, Australia
| | - Afsaneh Koohestani
- Institute for Intelligent Systems Research and Innovations (IISRI), Deakin University, Geelong, Australia
| | | | - Moloud Abdar
- Institute for Intelligent Systems Research and Innovations (IISRI), Deakin University, Geelong, Australia
| | - Adham Beykikhoshk
- Applied Artificial Intelligence Institute, Deakin University, Geelong, Australia
| | - Afshin Shoeibi
- Computer Engineering Department, Ferdowsi University of Mashhad, Mashhad, Iran
- Faculty of Electrical and Computer Engineering, Biomedical Data Acquisition Lab, K. N. Toosi University of Technology, Tehran, Iran
| | - Assef Zare
- Faculty of Electrical Engineering, Gonabad Branch, Islamic Azad University, Gonabad, Iran
| | - Maryam Panahiazar
- Institute for Computational Health Sciences, University of California, San Francisco, USA
| | - Saeid Nahavandi
- Institute for Intelligent Systems Research and Innovations (IISRI), Deakin University, Geelong, Australia
| | - Dipti Srinivasan
- Dept. of Electrical and Computer Engineering, National University of Singapore, Singapore, 117576 Singapore
| | - Amir F. Atiya
- Department of Computer Engineering, Faculty of Engineering, Cairo University, Cairo, 12613 Egypt
| | - U. Rajendra Acharya
- Department of Electronics and Computer Engineering, Ngee Ann Polytechnic, Singapore, Singapore
- Department of Biomedical Engineering, School of Science and Technology, Singapore University of Social Sciences, Singapore, Singapore
- Department of Bioinformatics and Medical Engineering, Asia University, Taichung, Taiwan
| |
|
11
|
Li Z, Fan J, Ren Y, Tang L. A novel feature extraction approach based on neighborhood rough set and PCA for migraine rs-fMRI. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS 2020. [DOI: 10.3233/jifs-179661] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/15/2022]
Affiliation(s)
- Zhanhui Li
- College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, China
| | - Jiancong Fan
- College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, China
- Provincial Key Lab. for Information Technology of Wisdom Mining of Shandong Province, Shandong University of Science and Technology, Qingdao, China
| | - Yande Ren
- The Affiliated Hospital of Qingdao University, Qingdao, China
| | - Leiyu Tang
- College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, China
| |
|
12
|
Wu H, Yang R, Fu Q, Chen J, Lu W, Li H. Research on predicting 2D-HP protein folding using reinforcement learning with full state space. BMC Bioinformatics 2019; 20:685. [PMID: 31874607 PMCID: PMC6929271 DOI: 10.1186/s12859-019-3259-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Protein structure prediction has always been an important issue in bioinformatics. Prediction of the two-dimensional structure of proteins based on the hydrophobic polarity model is a typical non-deterministic polynomial hard problem. Currently reported hydrophobic polarity model optimization methods, greedy method, brute-force method, and genetic algorithm usually cannot converge robustly to the lowest energy conformations. Reinforcement learning with the advantages of continuous Markov optimal decision-making and maximizing global cumulative return is especially suitable for solving global optimization problems of biological sequences. RESULTS In this study, we proposed a novel hydrophobic polarity model optimization method derived from reinforcement learning which structured the full state space, and designed an energy-based reward function and a rigid overlap detection rule. To validate the performance, sixteen sequences were selected from the classical data set. The results indicated that reinforcement learning with full states successfully converged to the lowest energy conformations against all sequences, while the reinforcement learning with partial states folded 50% sequences to the lowest energy conformations. Reinforcement learning with full states hits the lowest energy on an average 5 times, which is 40 and 100% higher than the three and zero hit by the greedy algorithm and reinforcement learning with partial states respectively in the last 100 episodes. CONCLUSIONS Our results indicate that reinforcement learning with full states is a powerful method for predicting two-dimensional hydrophobic-polarity protein structure. It has obvious competitive advantages compared with greedy algorithm and reinforcement learning with partial states.
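For readers unfamiliar with the hydrophobic-polarity (HP) model mentioned above, the following sketch scores a conformation on a 2D square lattice using the conventional HP energy: minus one for every pair of H residues that are lattice neighbours but not consecutive in the chain. It also rejects overlapping conformations. This is the textbook convention and is only an assumption about how the cited paper defines its energy-based reward and overlap rule; the name `hp_energy` is hypothetical.

```python
def hp_energy(sequence, coords):
    """Energy of a 2D HP-model conformation.

    sequence: string of 'H' and 'P' residues.
    coords:   list of (x, y) lattice positions, one per residue, in chain order.
    """
    if len(sequence) != len(coords):
        raise ValueError("need exactly one coordinate per residue")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation overlaps itself")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must be lattice neighbours")

    index_at = {pos: i for i, pos in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each neighbouring pair is counted once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbour)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts


# A four-residue chain folded into a square brings its two H ends into contact.
folded = hp_energy("HPPH", [(0, 0), (1, 0), (1, 1), (0, 1)])
extended = hp_energy("HPPH", [(0, 0), (1, 0), (2, 0), (3, 0)])
```

A search method, whether greedy, genetic, or reinforcement-learning based, would then look for the self-avoiding walk that minimises this value.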
Affiliation(s)
- Hongjie Wu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
| | - Ru Yang
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
| | - Qiming Fu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
- Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, 215009, China
| | - Jianping Chen
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
- Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, 215009, China
| | - Weizhong Lu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
| | - Haiou Li
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
| |
|
13
|
Abstract
Background Cost-sensitive algorithms are an effective strategy for solving imbalanced classification problems. However, the misclassification costs are usually determined empirically based on user expertise, which leads to unstable performance of cost-sensitive classification. Therefore, an efficient and accurate method is needed to calculate the optimal cost weights. Results In this paper, two approaches are proposed to search for the optimal cost weights, targeting the highest weighted classification accuracy (WCA). One is a grid search over the cost weights and the other is function fitting. The two approaches are then compared. In experiments, we classify imbalanced gene expression data using an extreme learning machine to test the cost weights obtained by the two approaches. Conclusions Comprehensive experimental results show that the function fitting method is generally more efficient and can find the optimal cost weights with acceptable WCA.
|
14
|
Chen M, Zhang Y, Li Z, Li A, Liu W, Liu L, Chen Z. A Novel Gene Selection Algorithm based on Sparse Representation and Minimum-redundancy Maximum-relevancy of Maximum Compatibility Center. CURR PROTEOMICS 2019. [DOI: 10.2174/1570164616666190123144020] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/22/2022]
Abstract
Background:
Tumor classification is important for accurate diagnosis and personalized
treatment and has recently received great attention. Analysis of gene expression profiles has shown relevant
biological significance and has thus become a research hotspot and a new challenge for bio-data
mining. Among existing methods, some algorithms identify a small number of genes but at high time
complexity, while others have low time complexity but unsatisfactory classification
accuracy. This article proposes a new extraction method for gene expression profiles.
Methods:
In this paper, we propose a classification method for tumor subtypes based on the Minimum-
Redundancy Maximum-Relevancy (MRMR) of maximum compatibility center. First, we performed a
fuzzy clustering of gene expression profiles based on the compatibility relation. Next, we used the
sparse representation coefficient to assess the importance of the gene for the category, extracted the
top-ranked genes, and removed the uncorrelated genes. Finally, the MRMR search strategy was used to
select the characteristic gene, reject the redundant gene, and obtain the final subset of characteristic
genes.
Results:
Our method and four others were tested on four different datasets to verify its effectiveness.
Results show that the classification accuracy and standard deviation of our method are better than
those of other methods.
Conclusion:
Our proposed method is robust, adaptable, and superior in classification. This method can
help us discover the susceptibility genes associated with complex diseases and understand the interactions
between these genes. Our technique provides a new way of thinking and is important for understanding
the pathogenesis of complex diseases and for their prevention, diagnosis, and treatment.
Affiliation(s)
- Min Chen
- School of Computer Science and Technology, Hunan Institute of Technology, 421002 Hengyang, China
| | - Yi Zhang
- School of Information Science and Engineering, Guilin University of Technology, 541004 Guilin, China
| | - Zejun Li
- School of Computer Science and Technology, Hunan Institute of Technology, 421002 Hengyang, China
| | - Ang Li
- School of Computer Science and Technology, Hunan Institute of Technology, 421002 Hengyang, China
| | - Wenhua Liu
- School of Computer Science and Technology, Hunan Institute of Technology, 421002 Hengyang, China
| | - Liubin Liu
- Cloud Collaboration Technology Group, Cisco System Inc., 95035 Milpitas, CA, United States
| | - Zheng Chen
- School of Computer Science and Technology, Hunan Institute of Technology, 421002 Hengyang, China
| |
|
15
|
Ye M, Wang W, Yao C, Fan R, Wang P. Gene Selection Method for Microarray Data Classification Using Particle Swarm Optimization and Neighborhood Rough Set. Curr Bioinform 2019. [DOI: 10.2174/1574893614666190204150918] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 11/22/2022]
Abstract
Background:
Mining knowledge from microarray data is one of the popular research
topics in biomedical informatics. Gene selection is a significant research trend in biomedical data
mining, since the accuracy of tumor identification heavily relies on the genes biologically relevant
to the identified problems.
Objective:
In order to select a small subset of informative genes from numerous genes for tumor
identification, various computational intelligence methods were presented. However, due to the
high data dimensions, small sample size, and the inherent noise available, many computational
methods confront challenges in selecting a small gene subset.
Methods:
In our study, we propose a novel algorithm PSONRS_KNN for gene selection based on
the particle swarm optimization (PSO) algorithm along with the neighborhood rough set (NRS) reduction
model and the K-nearest neighborhood (KNN) classifier.
Results:
First, the top-ranked candidate genes are obtained by the GainRatioAttributeEval preselection
algorithm in WEKA. Then, the minimum possible meaningful set of genes is selected by
combining PSO with NRS and KNN classifier.
Conclusion:
Experimental results on five microarray gene expression datasets demonstrate that the
performance of the proposed method is better than existing state-of-the-art methods in terms of
classification accuracy and the number of selected genes.
Affiliation(s)
- Mingquan Ye: School of Medical Information, Wannan Medical College, Wuhu 241002, China
- Weiwei Wang: School of Medical Information, Wannan Medical College, Wuhu 241002, China
- Chuanwen Yao: School of Medical Information, Wannan Medical College, Wuhu 241002, China
- Rong Fan: School of Medical Information, Wannan Medical College, Wuhu 241002, China
- Peipei Wang: School of Medical Information, Wannan Medical College, Wuhu 241002, China
|
16
|
Akbar S, Hayat M, Kabir M, Iqbal M. iAFP-gap-SMOTE: An Efficient Feature Extraction Scheme Gapped Dipeptide Composition is Coupled with an Oversampling Technique for Identification of Antifreeze Proteins. LETT ORG CHEM 2019. [DOI: 10.2174/1570178615666180816101653] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Abstract
Antifreeze proteins (AFPs) play distinct roles in maintaining the homeostasis of living organisms and protect their cells and bodies from freezing in extremely cold conditions. Owing to the high diversity of protein sequences and structures, discriminating AFPs from non-AFPs through experimental approaches is expensive and lengthy. It is therefore highly desirable to have a computationally intelligent, high-throughput model that identifies AFPs quickly and accurately. In this work, a new predictor called “iAFP-gap-SMOTE” is proposed for the identification of AFPs. Protein sequences are represented using three numerical feature extraction schemes, namely Split Amino Acid Composition, G-gap Dipeptide Composition, and Reduced Amino Acid Alphabet Composition. On an imbalanced dataset, a classifier is usually biased towards the majority class. The Synthetic Minority Over-sampling Technique (SMOTE) is therefore employed to increase the number of minority-class instances and control this bias. A 10-fold cross-validation test is applied to assess the success rate of the “iAFP-gap-SMOTE” model. In the empirical investigation, the “iAFP-gap-SMOTE” model obtained 95.02% accuracy. The comparison suggested that the accuracy of the “iAFP-gap-SMOTE” model is higher than that of existing techniques in the literature. The proposed “iAFP-gap-SMOTE” model may be helpful for the research community and academia.
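The oversampling step named in this abstract can be illustrated with a minimal sketch of the general SMOTE idea: each synthetic sample is an interpolation between a minority-class sample and one of its nearest minority-class neighbours. The function name `smote`, its parameters, and the toy data are assumptions made for illustration; this is not the authors' implementation or feature pipeline.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Illustrative sketch only: `minority` is a list of numeric feature
    vectors (at least two), and `k` is the number of nearest minority
    neighbours considered for each base sample.
    """
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample (base excluded)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy minority class: the four corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=5)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region the minority class already occupies.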
Affiliation(s)
- Shahid Akbar: Department of Computer Science, Abdul Wali Khan University, Mardan, KP 23200, Pakistan
- Maqsood Hayat: Department of Computer Science, Abdul Wali Khan University, Mardan, KP 23200, Pakistan
- Muhammad Kabir: Department of Computer Science, Abdul Wali Khan University, Mardan, KP 23200, Pakistan
- Muhammad Iqbal: Department of Computer Science, Abdul Wali Khan University, Mardan, KP 23200, Pakistan
|
17
|
Chen Y, Qin N, Li W, Xu F. Granule structures, distances and measures in neighborhood systems. Knowl Based Syst 2019. [DOI: 10.1016/j.knosys.2018.11.032] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]
|
18
|
Iraji MS. Combining predictors for multi-layer architecture of adaptive fuzzy inference system. COGN SYST RES 2019. [DOI: 10.1016/j.cogsys.2018.05.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/14/2022]
|
19
|
A Novel Classification and Identification Scheme of Emitter Signals Based on Ward's Clustering and Probabilistic Neural Networks with Correlation Analysis. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2018; 2018:1458962. [PMID: 30532768 PMCID: PMC6247724 DOI: 10.1155/2018/1458962] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/09/2018] [Revised: 08/22/2018] [Accepted: 09/13/2018] [Indexed: 11/18/2022]
Abstract
The rapid development of modern communication technology makes the identification of emitter signals more complicated. Based on Ward's clustering and probabilistic neural networks method with correlation analysis, an ensemble identification algorithm for mixed emitter signals is proposed in this paper. The algorithm mainly consists of two parts, one is the classification of signals and the other is the identification of signals. First, self-adaptive filtering and Fourier transform are used to obtain the frequency spectrum of the signals. Then, the Ward clustering method and some clustering validity indexes are used to determine the range of the optimal number of clusters. In order to narrow this scope and find the optimal number of classifications, a sufficient number of samples are selected in the vicinity of each class center to train probabilistic neural networks, which correspond to different number of classifications. Then, the classifier of the optimal probabilistic neural network is obtained by calculating the maximum value of classification validity index. Finally, the identification accuracy of the classifier is improved effectively by using the method of Bivariable correlation analysis. Simulation results also illustrate that the proposed algorithms can accurately identify the pulse emitter signals.
|
20
|
Alsalem MA, Zaidan AA, Zaidan BB, Hashim M, Albahri OS, Albahri AS, Hadi A, Mohammed KI. Systematic Review of an Automated Multiclass Detection and Classification System for Acute Leukaemia in Terms of Evaluation and Benchmarking, Open Challenges, Issues and Methodological Aspects. J Med Syst 2018; 42:204. [PMID: 30232632 DOI: 10.1007/s10916-018-1064-9] [Citation(s) in RCA: 56] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/18/2018] [Accepted: 09/06/2018] [Indexed: 10/28/2022]
Abstract
This study aims to systematically review prior research on the evaluation and benchmarking of automated acute leukaemia classification tasks. The review depends on three reliable search engines: ScienceDirect, Web of Science and IEEE Xplore. A research taxonomy developed for the review considers a wide perspective for automated detection and classification of acute leukaemia research and reflects the usage trends in the evaluation criteria in this field. The developed taxonomy consists of three main research directions in this domain. The taxonomy involves two phases. The first phase includes all three research directions. The second one demonstrates all the criteria used for evaluating acute leukaemia classification. The final set of studies includes 83 investigations, most of which focused on enhancing the accuracy and performance of detection and classification through proposed methods or systems. Few efforts were made to undertake the evaluation issues. According to the final set of articles, three groups of articles represented the main research directions in this domain: 56 articles highlighted the proposed methods, 22 articles involved proposals for system development and 5 papers centred on evaluation and comparison. The other taxonomy side included 16 main and sub-evaluation and benchmarking criteria. This review highlights three serious issues in the evaluation and benchmarking of multiclass classification of acute leukaemia, namely, conflicting criteria, evaluation criteria and criteria importance. It also determines the weakness of benchmarking tools. To solve these issues, multicriteria decision-making (MCDM) analysis techniques were proposed as effective recommended solutions in the methodological aspect. This methodological aspect involves a proposed decision support system based on MCDM for evaluation and benchmarking to select suitable multiclass classification models for acute leukaemia. 
The said support system is examined and has three sequential phases. Phase One presents the identification procedure and process for establishing a decision matrix based on a crossover of evaluation criteria and acute leukaemia multiclass classification models. Phase Two describes the decision matrix development for the selection of acute leukaemia classification models based on the integrated Best and worst method (BWM) and VIKOR. Phase Three entails the validation of the proposed system.
Affiliation(s)
- M A Alsalem: Department of Computing, Universiti Pendidikan Sultan Idris, Tanjong Malim, Perak, Malaysia
- A A Zaidan: Department of Computing, Universiti Pendidikan Sultan Idris, Tanjong Malim, Perak, Malaysia
- B B Zaidan: Department of Computing, Universiti Pendidikan Sultan Idris, Tanjong Malim, Perak, Malaysia
- M Hashim: Department of Computing, Universiti Pendidikan Sultan Idris, Tanjong Malim, Perak, Malaysia
- O S Albahri: Department of Computing, Universiti Pendidikan Sultan Idris, Tanjong Malim, Perak, Malaysia
- A S Albahri: Department of Computing, Universiti Pendidikan Sultan Idris, Tanjong Malim, Perak, Malaysia
- Ali Hadi: Department of Computing, Universiti Pendidikan Sultan Idris, Tanjong Malim, Perak, Malaysia
- K I Mohammed: Department of Computing, Universiti Pendidikan Sultan Idris, Tanjong Malim, Perak, Malaysia
|
21
|
A two-stage sparse logistic regression for optimal gene selection in high-dimensional microarray data classification. ADV DATA ANAL CLASSI 2018. [DOI: 10.1007/s11634-018-0334-1] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]
|
22
|
Fan X, Zhao W, Wang C, Huang Y. Attribute reduction based on max-decision neighborhood rough set model. Knowl Based Syst 2018. [DOI: 10.1016/j.knosys.2018.03.015] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
|
23
|
Li W, Liao B, Zhu W, Chen M, Li Z, Wei X, Peng L, Huang G, Cai L, Chen H. Fisher Discrimination Regularized Robust Coding Based on a Local Center for Tumor Classification. Sci Rep 2018; 8:9152. [PMID: 29904059 PMCID: PMC6002553 DOI: 10.1038/s41598-018-27364-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2018] [Accepted: 05/31/2018] [Indexed: 11/29/2022] Open
Abstract
Tumor classification is crucial to the clinical diagnosis and proper treatment of cancers. In recent years, sparse representation-based classifier (SRC) has been proposed for tumor classification. The employed dictionary plays an important role in sparse representation-based or sparse coding-based classification. However, sparse representation-based tumor classification models have not used the employed dictionary, thereby limiting their performance. Furthermore, this sparse representation model assumes that the coding residual follows a Gaussian or Laplacian distribution, which may not effectively describe the coding residual in practical tumor classification. In the present study, we formulated a novel effective cancer classification technique, namely, Fisher discrimination regularized robust coding (FDRRC), by combining the Fisher discrimination dictionary learning method with the regularized robust coding (RRC) model, which searches for a maximum a posteriori solution to coding problems by assuming that the coding residual and representation coefficient are independent and identically distributed. The proposed FDRRC model is extensively evaluated on various tumor datasets and shows superior performance compared with various state-of-the-art tumor classification methods in a variety of classification tasks.
Affiliation(s)
- Weibiao Li: College of Information Science and Engineering, Hunan University, Changsha, Hunan, 410082, China
- Bo Liao: College of Information Science and Engineering, Hunan University, Changsha, Hunan, 410082, China
- Wen Zhu: College of Information Science and Engineering, Hunan University, Changsha, Hunan, 410082, China
- Min Chen: College of Information Science and Engineering, Hunan University, Changsha, Hunan, 410082, China
- Zejun Li: College of Information Science and Engineering, Hunan University, Changsha, Hunan, 410082, China
- Xiaohui Wei: College of Information Science and Engineering, Hunan University, Changsha, Hunan, 410082, China
- Lihong Peng: Hunan University of Technology, Zhu Zhou, Hunan, 412007, China
- Guohua Huang: College of Information Science and Engineering, Hunan University, Changsha, Hunan, 410082, China
- Lijun Cai: College of Information Science and Engineering, Hunan University, Changsha, Hunan, 410082, China
- HaoWen Chen: College of Information Science and Engineering, Hunan University, Changsha, Hunan, 410082, China
|
24
|
Algamal ZY, Alhamzawi R, Mohammad Ali HT. Gene selection for microarray gene expression classification using Bayesian Lasso quantile regression. Comput Biol Med 2018; 97:145-152. [DOI: 10.1016/j.compbiomed.2018.04.018] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2018] [Revised: 04/22/2018] [Accepted: 04/22/2018] [Indexed: 01/01/2023]
|
25
|
Alsalem MA, Zaidan AA, Zaidan BB, Hashim M, Madhloom HT, Azeez ND, Alsyisuf S. A review of the automated detection and classification of acute leukaemia: Coherent taxonomy, datasets, validation and performance measurements, motivation, open challenges and recommendations. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2018; 158:93-112. [PMID: 29544792 DOI: 10.1016/j.cmpb.2018.02.005] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/04/2017] [Revised: 01/19/2018] [Accepted: 02/02/2018] [Indexed: 06/08/2023]
Abstract
CONTEXT: Acute leukaemia diagnosis is a field requiring automated solutions, tools and methods and the ability to facilitate early detection and even prediction. Many studies have focused on the automatic detection and classification of acute leukaemia and its subtypes to enable highly accurate diagnosis.
OBJECTIVE: This study aimed to review and analyse literature related to the detection and classification of acute leukaemia. To improve understanding of the field's contextual aspects, the factors considered in the published studies were motivation, open challenges that confronted researchers and recommendations presented to researchers to enhance this vital research area.
METHODS: We systematically searched all articles about the classification and detection of acute leukaemia, as well as their evaluation and benchmarking, in three main databases: ScienceDirect, Web of Science and IEEE Xplore, from 2007 to 2017. These indices were considered sufficiently extensive to encompass our field of literature.
RESULTS: Based on our inclusion and exclusion criteria, 89 articles were selected. Most studies (58/89) focused on the methods or algorithms of acute leukaemia classification, a number of papers (22/89) covered the developed systems for the detection or diagnosis of acute leukaemia and few papers (5/89) presented evaluation and comparative studies. The smallest portion (4/89) of articles comprised reviews and surveys.
DISCUSSION: Acute leukaemia diagnosis, which is a field requiring automated solutions, tools and methods, entails the ability to facilitate early detection or even prediction. Many studies have been performed on the automatic detection and classification of acute leukaemia and its subtypes to promote accurate diagnosis.
CONCLUSIONS: Research areas on medical-image classification vary, but they are all equally vital. We expect this systematic review to help emphasise current research opportunities and thus extend and create additional research fields.
Affiliation(s)
- M A Alsalem: Department of Computing, Faculty of Arts, Computing and Creative Industry, Universiti Pendidikan Sultan Idris, Malaysia
- A A Zaidan: Department of Computing, Faculty of Arts, Computing and Creative Industry, Universiti Pendidikan Sultan Idris, Malaysia
- B B Zaidan: Department of Computing, Faculty of Arts, Computing and Creative Industry, Universiti Pendidikan Sultan Idris, Malaysia
- M Hashim: Department of Computing, Faculty of Arts, Computing and Creative Industry, Universiti Pendidikan Sultan Idris, Malaysia
- H T Madhloom: Department of Computing, Faculty of Arts, Computing and Creative Industry, Universiti Pendidikan Sultan Idris, Malaysia
- N D Azeez: Department of Computing, Faculty of Arts, Computing and Creative Industry, Universiti Pendidikan Sultan Idris, Malaysia
- S Alsyisuf: Faculty of Information Science and Engineering, Management and Science University, Shah Alam, Malaysia
|
26
|
Guo Z, Xin Y, Zhao Y. Cancer classification using entropy analysis in fractional Fourier domain of gene expression profile. BIOTECHNOL BIOTEC EQ 2017. [DOI: 10.1080/13102818.2017.1413596] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022] Open
Affiliation(s)
- ZhiPeng Guo: Department of Biomedical Engineering, School of Life Science, Beijing Institute of Technology, Beijing, P.R. China
- Yi Xin: Department of Biomedical Engineering, School of Life Science, Beijing Institute of Technology, Beijing, P.R. China
- YiZhang Zhao: Department of Biomedical Engineering, School of Life Science, Beijing Institute of Technology, Beijing, P.R. China
|
27
|
Meng J, Zhang J, Luan YS, He XY, Li LS, Zhu YF. Parallel gene selection and dynamic ensemble pruning based on Affinity Propagation. Comput Biol Med 2017; 87:8-21. [PMID: 28544912 DOI: 10.1016/j.compbiomed.2017.05.016] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2017] [Revised: 05/13/2017] [Accepted: 05/13/2017] [Indexed: 12/01/2022]
Abstract
Gene selection and sample classification based on gene expression data are important research areas in bioinformatics. Selecting important genes closely related to classification is a challenging task due to high dimensionality and small sample size of microarray data. Extended rough set based on neighborhood has been successfully applied to gene selection, as it can select attributes without redundancy and deal with numerical attributes directly. However, the computation of approximations in rough set is extremely time consuming. In this paper, in order to accelerate the process of gene selection, a parallel computation method is proposed to calculate approximations of intersection neighborhood rough set. Furthermore, a novel dynamic ensemble pruning approach based on Affinity Propagation clustering and dynamic pruning framework is proposed to reduce memory usage and computational cost. Experimental results on three Arabidopsis thaliana biotic and abiotic stress response datasets demonstrate that the proposed method can obtain better classification performance than ensemble method with gene pre-selection.
Affiliation(s)
- Jun Meng: School of Computer Science and Technology, Dalian University of Technology, Dalian 116023, China
- Jing Zhang: School of Computer Science and Technology, Dalian University of Technology, Dalian 116023, China
- Yu-Shi Luan: School of Life Science and Biotechnology, Dalian University of Technology, Dalian 116023, China
- Xin-Yu He: School of Computer Science and Technology, Dalian University of Technology, Dalian 116023, China
- Li-Shuang Li: School of Computer Science and Technology, Dalian University of Technology, Dalian 116023, China
- Yuan-Feng Zhu: BorderX Lab Inc, Silicon Valley, California, 94086, USA
|
28
|
Gene selection for tumor classification using neighborhood rough sets and entropy measures. J Biomed Inform 2017; 67:59-68. [PMID: 28215562 DOI: 10.1016/j.jbi.2017.02.007] [Citation(s) in RCA: 69] [Impact Index Per Article: 8.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2016] [Revised: 01/25/2017] [Accepted: 02/09/2017] [Indexed: 01/04/2023]
Abstract
With the development of bioinformatics, tumor classification from gene expression data has become an important technology for cancer diagnosis. Since gene expression data often contain thousands of genes and a small number of samples, gene selection becomes a key step for tumor classification. Attribute reduction of rough sets has been successfully applied to gene selection, as it is data-driven and requires no additional information. However, the traditional rough set method deals with discrete data only. Gene expression data containing real-valued or noisy data are usually discretized in a preprocessing step, which may result in poor classification accuracy. In this paper, we propose a novel gene selection method based on the neighborhood rough set model, which can deal with real-valued data while maintaining the original gene classification information. Moreover, this paper introduces an entropy measure under the framework of neighborhood rough sets for tackling the uncertainty and noise of gene expression data. Using this measure can lead to the discovery of compact gene subsets. Finally, a gene selection algorithm is designed based on neighborhood granules and the entropy measure. Experiments on two gene expression datasets show that the proposed gene selection method is effective for improving the accuracy of tumor classification.
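To make the neighborhood rough set idea concrete, the sketch below computes the commonly used neighborhood dependency of the class labels on a gene subset: the fraction of samples whose delta-neighborhood, measured only on the chosen genes, contains a single class. It works directly on real-valued data, with no discretization. This is an assumed, simplified illustration and not the entropy measure proposed in the paper; the function name, the Euclidean distance, and the toy data are all hypothetical choices.

```python
import math


def neighborhood_dependency(X, y, features, delta):
    """Fraction of samples whose delta-neighborhood is class-pure.

    X: list of real-valued feature vectors, y: class labels,
    features: indices of the genes used to measure distance,
    delta: neighborhood radius (an assumed tuning parameter).
    """
    def dist(a, b):
        return math.sqrt(sum((a[f] - b[f]) ** 2 for f in features))

    consistent = 0
    for i, xi in enumerate(X):
        # A sample is in the positive region if every neighbor shares its class.
        if all(y[j] == y[i] for j, xj in enumerate(X) if dist(xi, xj) <= delta):
            consistent += 1
    return consistent / len(X)


# Toy data: gene 0 separates the two classes, gene 1 is constant.
X = [[0.1, 0.5], [0.2, 0.5], [0.8, 0.5], [0.9, 0.5]]
y = [0, 0, 1, 1]
dep_gene0 = neighborhood_dependency(X, y, features=[0], delta=0.2)
dep_gene1 = neighborhood_dependency(X, y, features=[1], delta=0.2)
```

On this toy data the informative gene yields a dependency of 1.0 and the constant gene yields 0.0, which is the kind of contrast a reduction algorithm uses to rank genes.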
|
29
|
Zhu P, Hu Q, Han Y, Zhang C, Du Y. Combining neighborhood separable subspaces for classification via sparsity regularized optimization. Inf Sci (N Y) 2016. [DOI: 10.1016/j.ins.2016.08.004] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
|
30
|
Zhao H, Wang P, Hu Q. Cost-sensitive feature selection based on adaptive neighborhood granularity with multi-level confidence. Inf Sci (N Y) 2016. [DOI: 10.1016/j.ins.2016.05.025] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
|
31
|
Das DK, Chakraborty C, Bhattacharya PS. Automated Screening Methodology for Asthma Diagnosis that Ensembles Clinical and Spirometric Information. J Med Biol Eng 2016. [DOI: 10.1007/s40846-016-0137-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
|
32
|
Liu Y, Xie H, Wang L, Tan K. Hyperspectral band selection based on a variable precision neighborhood rough set. APPLIED OPTICS 2016; 55:462-472. [PMID: 26835918 DOI: 10.1364/ao.55.000462] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
Band selection is a well-known approach for reducing dimensionality in hyperspectral images. We propose a band-selection method based on the variable precision neighborhood rough set theory to select informative bands from hyperspectral images. A decision-making information system was established by hyperspectral data derived from soybean samples between 400 and 1000 nm wavelengths. The dependency was used to evaluate band significance. The optimal band subset was selected by a forward greedy search algorithm. After adjusting appropriate threshold values, stable optimized results were obtained. To assess the effectiveness of the proposed band-selection technique, two classification models were constructed. The experimental results showed that admitting inclusion errors could improve classification performance, including band selection and generalization ability.
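The procedure summarized in this abstract (dependency as band significance, forward greedy search, tolerated inclusion errors) can be sketched as follows. The definitions here are assumptions made for illustration, not the paper's exact formulation: a sample counts toward the dependency when at least a fraction `beta` of its delta-neighborhood shares its class, and the search stops when adding a band no longer raises the dependency. All names, default thresholds, and the toy data are hypothetical.

```python
import math


def vp_dependency(X, y, bands, delta, beta):
    """Fraction of samples whose delta-neighborhood over `bands` is at
    least `beta` pure in the sample's own class (beta < 1 admits some
    inclusion error)."""
    if not bands:
        return 0.0

    def dist(a, b):
        return math.sqrt(sum((a[k] - b[k]) ** 2 for k in bands))

    hits = 0
    for i, xi in enumerate(X):
        neigh = [j for j, xj in enumerate(X) if dist(xi, xj) <= delta]
        same = sum(1 for j in neigh if y[j] == y[i])
        if same / len(neigh) >= beta:  # neigh always contains i itself
            hits += 1
    return hits / len(X)


def forward_greedy(X, y, delta=0.15, beta=0.9, eps=1e-9):
    """Add the band with the largest dependency until no band improves it."""
    remaining = list(range(len(X[0])))
    selected, best = [], 0.0
    while remaining:
        scored = [(vp_dependency(X, y, selected + [b], delta, beta), b)
                  for b in remaining]
        # Highest score wins; ties go to the lowest band index.
        score, band = max(scored, key=lambda t: (t[0], -t[1]))
        if score - best <= eps:
            break
        selected.append(band)
        remaining.remove(band)
        best = score
    return selected, best


# Toy spectra with three bands; only band 1 separates the classes.
X = [[0.5, 0.1, 0.3], [0.5, 0.2, 0.9], [0.5, 0.8, 0.3], [0.5, 0.9, 0.9]]
y = [0, 0, 1, 1]
selected, score = forward_greedy(X, y)
```

Lowering `beta` below 1 is what lets noisy neighborhoods still count as consistent; the values of `delta` and `beta` would need tuning on real hyperspectral data.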
|
33
|
Bonilla-Huerta E, Hernández-Montiel A, Caporal RM, López MA. Hybrid Framework Using Multiple-Filters and an Embedded Approach for an Efficient Selection and Classification of Microarray Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2016; 13:12-26. [PMID: 26336138 DOI: 10.1109/tcbb.2015.2474384] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
A hybrid framework composed of two stages for gene selection and classification of DNA microarray data is proposed. At the first stage, five traditional statistical methods are combined for preliminary gene selection (Multiple Fusion Filter). Then, different relevant gene subsets are selected by using an embedded Genetic Algorithm (GA), Tabu Search (TS), and Support Vector Machine (SVM). A gene subset, consisting of the most relevant genes, is obtained from this process, by analyzing the frequency of each gene in the different gene subsets. Finally, the most frequent genes are evaluated by the embedded approach to obtain a final relevant small gene subset with high performance. The proposed method is tested in four DNA microarray datasets. From simulation study, it is observed that the proposed approach works better than other methods reported in the literature.
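The frequency analysis step mentioned in this abstract can be illustrated with a short sketch: given several candidate gene subsets, count how often each gene appears and keep those that recur. The function name, the `min_count` cutoff, and the gene identifiers are hypothetical; the paper's GA/TS/SVM search that produces the subsets is not reproduced here.

```python
from collections import Counter


def most_frequent_genes(subsets, min_count):
    """Return the genes that occur in at least `min_count` subsets, sorted by name."""
    # set(s) ensures a gene is counted at most once per subset.
    counts = Counter(gene for s in subsets for gene in set(s))
    return sorted(g for g, c in counts.items() if c >= min_count)


# Hypothetical subsets produced by repeated runs of a selection procedure.
subsets = [["g1", "g2", "g3"], ["g2", "g3", "g7"], ["g3", "g9"]]
frequent = most_frequent_genes(subsets, min_count=2)
```

The retained genes would then be re-evaluated by the embedded classifier to obtain the final small subset.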
|
34
|
Hsieh SY, Chou YC. A Faster cDNA Microarray Gene Expression Data Classifier for Diagnosing Diseases. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2016; 13:43-54. [PMID: 26336139 DOI: 10.1109/tcbb.2015.2474389] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
Profiling cancer molecules has several advantages; however, using microarray technology in routine clinical diagnostics is challenging for physicians. The classification of microarray data has two main limitations: 1) the data set is unreliable for building classifiers; and 2) the classifiers exhibit poor performance. Current microarray classification algorithms typically yield a high rate of false-positive cases, which is unacceptable in diagnostic applications. Numerous algorithms have been developed to detect false-positive cases; however, they require considerable computation time. To address this problem, this study enhanced a previously proposed gene expression graph (GEG)-based classifier to shorten the computation time. The modified classifier filters genes by using an edge weight to determine their significance, thereby facilitating accurate comparison and classification. This study experimentally compared the proposed classifier with a GEG-based classifier by using real data and benchmark tests. The results show that the proposed classifier is faster at detecting false positives.
|
35
|
Mollaee M, Moattar MH. A novel feature extraction approach based on ensemble feature selection and modified discriminant independent component analysis for microarray data classification. Biocybern Biomed Eng 2016. [DOI: 10.1016/j.bbe.2016.05.001] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
|
36
|
Meng J, Zhang J, Luan Y. Gene Selection Integrated with Biological Knowledge for Plant Stress Response Using Neighborhood System and Rough Set Theory. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2015; 12:433-444. [PMID: 26357229 DOI: 10.1109/tcbb.2014.2361329] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
Mining knowledge from gene expression data is a hot research topic in bioinformatics. Gene selection and sample classification are significant research trends, owing to the large number of genes and the small number of samples in gene expression data. Rough set theory has been successfully applied to gene selection, as it can select attributes without redundancy. To improve the interpretability of the selected genes, some researchers have introduced biological knowledge. In this paper, we first employ a neighborhood system to deal directly with the new information table formed by integrating gene expression data with biological knowledge, which can simultaneously present the information from multiple perspectives and does not weaken the information of individual genes for selection and classification. Then, we give a novel framework for gene selection and propose a significant gene selection method based on this framework by employing a reduction algorithm from rough set theory. The proposed method is applied to the analysis of plant stress response. Experimental results on three data sets show that the proposed method is effective, as it can select significant gene subsets without redundancy and achieve high classification accuracy. Biological analysis of the results shows that the interpretability is good.
|
37
|
Yang L, Ainali C, Kittas A, Nestle FO, Papageorgiou LG, Tsoka S. Pathway-level disease data mining through hyper-box principles. Math Biosci 2015; 260:25-34. [DOI: 10.1016/j.mbs.2014.09.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/12/2014] [Revised: 09/11/2014] [Accepted: 09/13/2014] [Indexed: 01/16/2023]
|
38
|
Wang SL, Sun L, Fang J. Molecular cancer classification using a meta-sample-based regularized robust coding method. BMC Bioinformatics 2014; 15 Suppl 15:S2. [PMID: 25473795 PMCID: PMC4271561 DOI: 10.1186/1471-2105-15-s15-s2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
MOTIVATION: Previous studies have demonstrated that machine learning based molecular cancer classification using gene expression profiling (GEP) data is promising for the clinical diagnosis and treatment of cancer. Novel classification methods with high efficiency and prediction accuracy are still needed to deal with the high dimensionality and small sample size of typical GEP data. Recently, the sparse representation (SR) method has been successfully applied to cancer classification. Nevertheless, its efficiency needs to be improved when analyzing large-scale GEP data.
RESULTS: In this paper we present the meta-sample-based regularized robust coding classification (MRRCC), a novel, effective cancer classification technique that combines the idea of the meta-sample-based cluster method with the regularized robust coding (RRC) method. It assumes that the coding residual and the coding coefficient are respectively independent and identically distributed. Similar to meta-sample-based SR classification (MSRC), MRRCC extracts a set of meta-samples from the training samples, and then encodes a testing sample as a sparse linear combination of these meta-samples. The representation fidelity is measured by the l2-norm or l1-norm of the coding residual.
CONCLUSIONS: Extensive experiments on publicly available GEP datasets demonstrate that the proposed method is more efficient, while its prediction accuracy is equivalent to that of existing MSRC-based methods and better than that of other state-of-the-art dimension reduction based methods.
Collapse
|
39
|
Gene selection using rough set based on neighborhood for the analysis of plant stress response. Appl Soft Comput 2014. [DOI: 10.1016/j.asoc.2014.09.013] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
|
40
|
Lotfi E, Keshavarz A. Gene expression microarray classification using PCA–BEL. Comput Biol Med 2014; 54:180-7. [DOI: 10.1016/j.compbiomed.2014.09.008] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2014] [Revised: 09/13/2014] [Accepted: 09/16/2014] [Indexed: 01/15/2023]
|
41
|
Zhu P, Hu Q. Adaptive neighborhood granularity selection and combination based on margin distribution optimization. Inf Sci (N Y) 2013. [DOI: 10.1016/j.ins.2013.06.012] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]
|
42
|
Liang Y, Liu C, Luan XZ, Leung KS, Chan TM, Xu ZB, Zhang H. Sparse logistic regression with a L1/2 penalty for gene selection in cancer classification. BMC Bioinformatics 2013; 14:198. [PMID: 23777239 PMCID: PMC3718705 DOI: 10.1186/1471-2105-14-198] [Citation(s) in RCA: 78] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2012] [Accepted: 05/30/2013] [Indexed: 11/21/2022] Open
Abstract
Background: Microarray technology is widely used in cancer diagnosis. Successfully identifying gene biomarkers will significantly help to classify different cancer types and improve the prediction accuracy. The regularization approach is one of the effective methods for gene selection in microarray data, which generally contain a large number of genes and a small number of samples. In recent years, various approaches have been developed for gene selection of microarray data. Generally, they are divided into three categories: filter, wrapper and embedded methods. Regularization methods are an important embedded technique and perform both continuous shrinkage and automatic gene selection simultaneously. Recently, there has been growing interest in applying regularization techniques to gene selection. The most popular regularization technique is Lasso (L1), and many L1 type regularization terms have been proposed in recent years. Theoretically, Lq type regularization with a lower value of q leads to sparser solutions. Moreover, the L1/2 regularization can be taken as a representative of Lq (0 < q < 1) regularizations and has been shown to have many attractive properties. Results: In this work, we investigate a sparse logistic regression with the L1/2 penalty for gene selection in cancer classification problems, and propose a coordinate descent algorithm with a new univariate half thresholding operator to solve the L1/2 penalized logistic regression. Experimental results on artificial and microarray data demonstrate the effectiveness of our proposed approach compared with other regularization methods. In particular, for 4 publicly available gene expression datasets, the L1/2 regularization method achieved its success using only about 2 to 14 predictors (genes), compared to about 6 to 38 genes for ordinary L1 and elastic net regularization approaches.
Conclusions: From our evaluations, it is clear that the sparse logistic regression with the L1/2 penalty achieves higher classification accuracy than ordinary L1 and elastic net regularization approaches, while selecting fewer but informative genes. This is an important consideration for screening and diagnostic applications, where the goal is often to develop an accurate test using as few features as possible in order to control cost. Therefore, the sparse logistic regression with the L1/2 penalty is an effective technique for gene selection in real classification problems.
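To illustrate why an L1/2 penalty tends to keep fewer coefficients than L1, the following is a minimal sketch of the two univariate thresholding steps that a coordinate descent solver would apply to each coefficient. It assumes the scalar objective (y - x)^2 + lam*|y|^p with p = 1 (soft thresholding) and p = 1/2 (the closed-form half thresholding commonly used in the L1/2 literature). The function names are hypothetical, and the "new" operator proposed in the paper may differ in its threshold constant, so this should not be read as the authors' exact operator.

```python
import math

def soft_threshold(x, lam):
    """Minimizer of (y - x)^2 + lam*|y| (the L1 case): shrink towards zero by lam/2."""
    return math.copysign(max(abs(x) - lam / 2.0, 0.0), x)

def half_threshold(x, lam):
    """Minimizer of (y - x)^2 + lam*|y|^(1/2), closed-form half thresholding.

    Inputs at or below the threshold are set exactly to zero; larger inputs
    are shrunk only slightly, which is what yields sparser models than L1.
    """
    threshold = (54.0 ** (1.0 / 3.0) / 4.0) * lam ** (2.0 / 3.0)
    if abs(x) <= threshold:
        return 0.0
    phi = math.acos((lam / 8.0) * (abs(x) / 3.0) ** -1.5)
    return (2.0 / 3.0) * x * (1.0 + math.cos(2.0 * math.pi / 3.0 - 2.0 * phi / 3.0))

if __name__ == "__main__":
    for x in (0.5, 0.9, 1.0, 2.0, 10.0):
        print(x, soft_threshold(x, 1.0), half_threshold(x, 1.0))
```

With lam = 1, the half thresholding operator zeroes every input up to roughly 0.945, whereas soft thresholding only zeroes inputs up to 0.5; for large inputs the half thresholding result stays much closer to the input than the soft thresholding result does.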
Affiliation(s)
- Yong Liang
- Faculty of Information Technology & State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Macau, China.
|
43
|
Wang SL, Fang Y, Fang J. Diagnostic prediction of complex diseases using phase-only correlation based on virtual sample template. BMC Bioinformatics 2013; 14 Suppl 8:S11. [PMID: 23815677 PMCID: PMC3654928 DOI: 10.1186/1471-2105-14-s8-s11] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022] Open
Abstract
MOTIVATION: Complex diseases induce perturbations to interaction and regulation networks in living systems, resulting in dynamic equilibrium states that differ between diseases and from normal states. Thus identifying gene expression patterns corresponding to different equilibrium states is of great benefit to the diagnosis and treatment of complex diseases. However, it remains a major challenge to deal with the high dimensionality and small size of available complex disease gene expression datasets currently used for discovering gene expression patterns. RESULTS: Here we present a phase-only correlation (POC) based classification method for recognizing the type of complex diseases. First, a virtual sample template is constructed for each subclass by averaging all samples of each subclass in a training dataset. Then the label of a test sample is determined by measuring the similarity between the test sample and each template. This novel method can detect the similarity of overall patterns emerging from the differentially expressed genes or proteins while ignoring small mismatches. CONCLUSIONS: The experimental results obtained on seven publicly available complex disease datasets including microarray and protein array data demonstrate that the proposed POC-based disease classification method is effective and robust for diagnosing complex diseases with regard to the number of initially selected features, and its recognition accuracy is better than or comparable to other state-of-the-art machine learning methods. In addition, the proposed method does not require parameter tuning and data scaling, which can effectively reduce the occurrence of over-fitting and bias.
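The two steps described in the abstract above (average each subclass into a virtual template, then label a test sample by its similarity to each template) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it treats each expression profile as a 1-D signal, uses a naive discrete Fourier transform, and takes the peak of the phase-only correlation as the similarity score. The function names and the toy data are hypothetical.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (inverse includes the 1/n factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out

def poc_peak(a, b, eps=1e-12):
    """Peak of the phase-only correlation between two equal-length signals."""
    fa, fb = dft(a), dft(b)
    cross = [p * q.conjugate() for p, q in zip(fa, fb)]
    # Keep only the phase of the cross spectrum; drop bins with no energy.
    phase = [c / abs(c) if abs(c) > eps else 0j for c in cross]
    return max(v.real for v in dft(phase, inverse=True))

def build_templates(samples_by_class):
    """Virtual sample template per class: the feature-wise mean of its samples."""
    return {
        label: [sum(col) / len(col) for col in zip(*rows)]
        for label, rows in samples_by_class.items()
    }

def classify(sample, templates):
    """Assign the label whose template has the highest POC peak with the sample."""
    return max(templates, key=lambda label: poc_peak(sample, templates[label]))

# Toy training data (made-up values, eight features per sample).
train = {
    "A": [[1.0, 5.0, 2.0, 8.0, 3.0, 1.0, 0.0, 4.0],
          [1.2, 4.8, 2.1, 7.9, 3.0, 1.1, 0.2, 3.9]],
    "B": [[7.0, 1.0, 6.0, 0.0, 5.0, 2.0, 8.0, 1.0],
          [6.8, 1.1, 6.2, 0.1, 4.9, 2.1, 7.9, 0.9]],
}
templates = build_templates(train)

if __name__ == "__main__":
    print(classify(train["A"][0], templates))
```

Because only the phase of the cross spectrum is kept, the score does not depend on the overall scale of the sample, which is consistent with the abstract's remark that the method does not require data scaling.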
Affiliation(s)
- Shu-Lin Wang
- Applied Bioinformatics Laboratory, the University of Kansas, 2034 Becker Drive, Lawrence, KS 66047, USA
- Yaping Fang
- Applied Bioinformatics Laboratory, the University of Kansas, 2034 Becker Drive, Lawrence, KS 66047, USA
- Jianwen Fang
- Applied Bioinformatics Laboratory, the University of Kansas, 2034 Becker Drive, Lawrence, KS 66047, USA
|
44
|
Wu MY, Dai DQ, Shi Y, Yan H, Zhang XF. Biomarker identification and cancer classification based on microarray data using Laplace naive Bayes model with mean shrinkage. IEEE/ACM Trans Comput Biol Bioinform 2012; 9:1649-1662. [PMID: 22868679 DOI: 10.1109/tcbb.2012.105] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/01/2023]
Abstract
Biomarker identification and cancer classification are two closely related problems. In gene expression data sets, the correlation between genes can be high when they share the same biological pathway. Moreover, the gene expression data sets may contain outliers due to either chemical or electrical reasons. A good gene selection method should take group effects into account and be robust to outliers. In this paper, we propose a Laplace naive Bayes model with mean shrinkage (LNB-MS). The Laplace distribution, instead of the normal distribution, is used as the conditional distribution of the samples because it is less sensitive to outliers and has been applied in many fields. The key technique is the L1 penalty imposed on the mean of each class to achieve automatic feature selection. The objective function of the proposed model is a piecewise linear function with respect to the mean of each class, so its optimal value can be found simply by evaluating it at the breakpoints. An efficient algorithm is designed to estimate the parameters in the model. A new strategy that uses the number of selected features to control the regularization parameter is introduced. Experimental results on simulated data sets and 17 publicly available cancer data sets attest to the accuracy, sparsity, efficiency, and robustness of the proposed algorithm. Many biomarkers identified with our method have been verified in biochemical or biomedical research. The analysis of biological and functional correlation of the genes based on Gene Ontology (GO) terms shows that the proposed method guarantees the selection of highly correlated genes simultaneously.
Affiliation(s)
- Meng-Yun Wu
- Center for Computer Vision and Department of Mathematics, Sun Yat-Sen University, Guangzhou 510275, China.
|
45
|
Wang SL, Li XL, Fang J. Finding minimum gene subsets with heuristic breadth-first search algorithm for robust tumor classification. BMC Bioinformatics 2012; 13:178. [PMID: 22830977 PMCID: PMC3465202 DOI: 10.1186/1471-2105-13-178] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2011] [Accepted: 05/18/2012] [Indexed: 01/03/2023] Open
Abstract
Background: Previous studies on tumor classification based on gene expression profiles suggest that gene selection plays a key role in improving the classification performance. Moreover, finding important tumor-related genes with the highest accuracy is a very important task because these genes might serve as tumor biomarkers, which is of great benefit to not only tumor molecular diagnosis but also drug development. Results: This paper proposes a novel gene selection method with rich biomedical meaning based on the Heuristic Breadth-first Search Algorithm (HBSA) to find as many optimal gene subsets as possible. Due to the curse of dimensionality, this type of method could suffer from over-fitting and selection bias problems. To address these potential problems, an HBSA-based ensemble classifier is constructed using a majority voting strategy from individual classifiers built on the selected gene subsets, and a novel HBSA-based gene ranking method is designed to find important tumor-related genes by measuring the significance of genes using their occurrence frequencies in the selected gene subsets. The experimental results on nine tumor datasets including three pairs of cross-platform datasets indicate that the proposed method can not only obtain better generalization performance but also find many important tumor-related genes. Conclusions: It is found that the frequencies of the selected genes follow a power-law distribution, indicating that only a few top-ranked genes can be used as potential diagnosis biomarkers. Moreover, the top-ranked genes leading to very high prediction accuracy are closely related to specific tumor subtypes and even hub genes. Compared with other related methods, the proposed method can achieve higher prediction accuracy with fewer genes. Moreover, these findings are further justified by analyzing the top-ranked genes in the context of individual gene function, biological pathway, and protein-protein interaction network.
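The two safeguards named in the abstract above, ranking genes by how often they occur across the selected subsets and combining per-subset classifiers by majority vote, can be sketched in a few lines. This is only an illustration of those two ideas under simple assumptions: the search that produces the subsets (HBSA itself) is not shown, the per-subset classifiers are represented only by their predicted labels, and the function names and gene labels are hypothetical.

```python
from collections import Counter

def rank_genes(gene_subsets):
    """Rank genes by the number of selected subsets in which they occur.

    Ties are broken alphabetically so that the ranking is deterministic.
    """
    counts = Counter(gene for subset in gene_subsets for gene in set(subset))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

def majority_vote(predicted_labels):
    """Return the most frequent label among the individual classifiers.

    On a tie, the label that was seen first wins.
    """
    return Counter(predicted_labels).most_common(1)[0][0]

if __name__ == "__main__":
    # Placeholder subsets, standing in for the output of a subset search.
    subsets = [
        ["gene_a", "gene_b"],
        ["gene_a", "gene_c"],
        ["gene_a", "gene_b", "gene_d"],
    ]
    print(rank_genes(subsets))
    print(majority_vote(["tumor", "normal", "tumor"]))
```

Counting each gene at most once per subset keeps a gene from being over-weighted if a subset happens to list it twice.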
Affiliation(s)
- Shu-Lin Wang
- Applied Bioinformatics Laboratory, University of Kansas, 2034 Becker Drive, Lawrence, KS 66047, USA
|
46
|
Fernández-Navarro F, Hervás-Martínez C, Ruiz R, Riquelme JC. Evolutionary Generalized Radial Basis Function neural networks for improving prediction accuracy in gene classification using feature selection. Appl Soft Comput 2012. [DOI: 10.1016/j.asoc.2012.01.008] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
|
47
|
Cornero A, Acquaviva M, Fardin P, Versteeg R, Schramm A, Eva A, Bosco MC, Blengio F, Barzaghi S, Varesio L. Design of a multi-signature ensemble classifier predicting neuroblastoma patients' outcome. BMC Bioinformatics 2012; 13 Suppl 4:S13. [PMID: 22536959 PMCID: PMC3314564 DOI: 10.1186/1471-2105-13-s4-s13] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
Background: Neuroblastoma is the most common pediatric solid tumor of the sympathetic nervous system. Development of improved predictive tools for patient stratification is a crucial requirement for neuroblastoma therapy. Several studies utilized gene expression-based signatures to stratify neuroblastoma patients and demonstrated a clear advantage of adding genomic analysis to risk assessment. There is little overlap among signatures, and merging their prognostic potential would be advantageous. Here, we describe a new strategy to merge published neuroblastoma-related gene signatures into a single, highly accurate, Multi-Signature Ensemble (MuSE)-classifier of neuroblastoma (NB) patients' outcome. Methods: Gene expression profiles of 182 neuroblastoma tumors, subdivided into three independent datasets, were used in the various phases of development and validation of the NB-MuSE-classifier. Thirty-three signatures were evaluated for patients' outcome prediction using 22 classification algorithms each, generating 726 classifiers and prediction results. The best-performing algorithm for each signature was selected and validated on an independent dataset, and the 20 signatures performing with an accuracy ≥ 80% were retained. Results: We combined the 20 predictions associated with the corresponding signatures through the selection of the best performing algorithm into a single outcome predictor. The best performance was obtained by the Decision Table algorithm, which produced the NB-MuSE-classifier characterized by an external validation accuracy of 94%. Kaplan-Meier curves and log-rank test demonstrated that patients with good and poor outcome prediction by the NB-MuSE-classifier have a significantly different survival (p < 0.0001). Survival curves constructed on subgroups of patients divided on the basis of known prognostic markers suggested an excellent stratification of localized and stage 4s tumors, but more data are needed to prove this point.
Conclusions: The NB-MuSE-classifier is based on an ensemble approach that merges twenty heterogeneous, neuroblastoma-related gene signatures to blend their discriminating power, rather than numeric values, into a single, highly accurate patients' outcome predictor. The novelty of our approach derives from the way the gene expression signatures are integrated: each is optimally associated with a single paradigm, and these are ultimately integrated into a single classifier. This model can be exported to other types of cancer and to diseases for which dedicated databases exist.
Affiliation(s)
- Andrea Cornero
- Laboratory of Molecular Biology, G. Gaslini Institute, Genoa 16147, Italy
|
48
|
A fuzzy intelligent approach to the classification problem in gene expression data analysis. Knowl Based Syst 2012. [DOI: 10.1016/j.knosys.2011.10.012] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]
|
49
|
Zheng CH, Zhang L, Ng TY, Shiu SCK, Huang DS. Metasample-based sparse representation for tumor classification. IEEE/ACM Trans Comput Biol Bioinform 2011; 8:1273-1282. [PMID: 21282864 DOI: 10.1109/tcbb.2011.20] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]
Abstract
Reliable and accurate identification of the type of tumor is crucial to the proper treatment of cancers. In recent years, it has been shown that sparse representation (SR) by l1-norm minimization is robust to noise, outliers and even incomplete measurements, and SR has been successfully used for classification. This paper presents a new SR-based method for tumor classification using gene expression data. A set of metasamples is extracted from the training samples, and then an input testing sample is represented as a linear combination of these metasamples by the l1-regularized least squares method. Classification is achieved by using a discriminating function defined on the representation coefficients. Since l1-norm minimization leads to a sparse solution, the proposed method is called metasample-based SR classification (MSRC). Extensive experiments on publicly available gene expression data sets show that MSRC is efficient for tumor classification, achieving higher accuracy than many existing representative schemes.
Affiliation(s)
- Chun-Hou Zheng
- College of Information and Communication Technology, Qufu Normal University, Rizhao, Shandong 276826, China.
|
50
|
Chuang LY, Yang CH, Wu KC, Yang CH. A hybrid feature selection method for DNA microarray data. Comput Biol Med 2011; 41:228-37. [DOI: 10.1016/j.compbiomed.2011.02.004] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2010] [Revised: 01/01/2011] [Accepted: 02/08/2011] [Indexed: 12/27/2022]
|