1
|
A reliable method for colorectal cancer prediction based on feature selection and support vector machine. Med Biol Eng Comput 2018; 57:901-912. [PMID: 30478811 DOI: 10.1007/s11517-018-1930-0] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 12/16/2017] [Accepted: 11/17/2018] [Indexed: 02/07/2023]
Abstract
Colorectal cancer (CRC) is a common cancer responsible for approximately 600,000 deaths per year worldwide. It is therefore important to identify the related factors and to detect the cancer accurately, yet timely and accurate prediction of the disease remains challenging. In this study, we build an integrated model based on logistic regression (LR) and support vector machine (SVM) to classify samples as cancer or normal. From a range of candidate factors (the individual's location, age, gender, and BMI, and the cancer's tumor type, tumor grade, and DNA), we use logistic regression to select the most significant factors (p < 0.05) as the main features. With these features, a grid-search SVM model is designed using different kernel types (linear, radial basis function (RBF), sigmoid, and polynomial). The logistic regression results indicate that Firmicutes (AUC 0.918), Bacteroidetes (AUC 0.856), body mass index (BMI) (AUC 0.777), and age (AUC 0.710), as well as their combination (AUC 0.942), are effective for CRC detection. The best kernel type is RBF, which achieves an accuracy of 90.1% when k = 5 and 91.2% when k = 10. This study provides a new method for colorectal cancer prediction based on independent risk factors.
Graphical abstract: Flow chart depicting the method adopted in the study. LR (logistic regression) and the ROC curve are used to select independent features as input to the SVM. SVM kernel selection aims to find the best kernel function for classification by comparing the linear, RBF, sigmoid, and polynomial kernel types, and the result shows that the best kernel is RBF. The classification performance of the LR + RF, LR + NB, LR + KNN, and LR + ANNs models is compared with that of LR + SVM. After these steps, cancer and healthy individuals can be classified, and the best model is selected.
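The per-feature AUC values quoted in this abstract can be read as the probability that a randomly chosen cancer sample scores higher on a feature than a randomly chosen normal sample. The sketch below illustrates that definition in plain Python. It is not the authors' code: the function name `auc_score` and the toy values are hypothetical and serve only to show the calculation.

```python
def auc_score(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: feature values (higher is assumed to indicate cancer).
    labels: 1 for cancer, 0 for normal.
    Ties between a positive and a negative sample count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
feature = [0.9, 0.8, 0.35, 0.7, 0.3, 0.2]
label = [1, 1, 1, 0, 0, 0]
auc = auc_score(feature, label)  # 8 of the 9 positive/negative pairs are ordered correctly
```

An AUC of 0.5 corresponds to a feature that carries no ranking information, and 1.0 to a feature that separates the two groups perfectly.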
|
2
|
Li S, Shi J, Gao H, Yuan Y, Chen Q, Zhao Z, Wang X, Li B, Ming L, Zhong J, Zhou P, He H, Tao B, Li S. Identification of a gene signature associated with radiotherapy and prognosis in gliomas. Oncotarget 2017; 8:88974-88987. [PMID: 29179492 PMCID: PMC5687662 DOI: 10.18632/oncotarget.21634] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 03/29/2017] [Accepted: 08/06/2017] [Indexed: 02/06/2023] Open
Abstract
Glioma is one of the most common primary brain tumors and has a poor prognosis. Although radiotherapy is an important treatment for gliomas, its efficacy is still limited by the frequent occurrence of radioresistance, and the underlying molecular mechanism is unclear. Here, we performed data mining on four glioma expression datasets, which were divided into a training set and a validation set. Radiotherapy-induced differentially expressed genes and prognosis-associated genes were screened using different classifiers. Kaplan-Meier curves along with the two-sided log-rank (Mantel-Cox) test were used to evaluate overall survival. We found that the gene expression profiles of gliomas differed markedly between patients who received radiotherapy and those who did not. A 20-gene signature associated with radiotherapy was identified. Furthermore, a novel 5-gene signature (HOXC10, LOC101928747, CYB561D2, RPL36A and RPS4XP2) was derived from the 20-gene signature as an independent predictor of glioma patients’ prognosis. These findings provide new insight into the molecular mechanism of radioresistance in gliomas. The 5-gene signature might represent a therapeutic target for gliomas.
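The Kaplan-Meier curves mentioned in this abstract estimate overall survival as a running product: at each time where a death is observed, the current survival estimate is multiplied by the fraction of at-risk patients who survive that time. The following is a minimal sketch of the estimator in plain Python. The function name `kaplan_meier` and the follow-up data are hypothetical and do not come from the paper.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up duration for each patient.
    events: 1 if death was observed at that time, 0 if the patient was censored.
    Returns a list of (time, survival_probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after time t.
        n_at_risk -= removed
    return curve


# Hypothetical follow-up data (months), not taken from the study.
times = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
```

Censored patients never lower the curve directly. They only shrink the risk set used at later death times, which is what distinguishes this estimator from a simple proportion of survivors.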
Affiliation(s)
- Shu Li: Department of Pathophysiology, Wannan Medical College, Wuhu 241002, China; Department of Neurosurgery, Xinhua Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai 200092, China
- Juanhong Shi: Department of Pathology Neurosurgery, Xinhua Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai 200092, China
- Hongliang Gao: Department of Pathophysiology, Wannan Medical College, Wuhu 241002, China; Department of Neurosurgery, Xinhua Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai 200092, China
- Yan Yuan: Department of Neurosurgery, Xinhua Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai 200092, China
- Qi Chen: Department of Anesthesiology, Xinhua Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai 200092, China
- Zhenyu Zhao: Department of Neurosurgery, Xinhua Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai 200092, China
- Xiaoqiang Wang: Department of Neurosurgery, Xinhua Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai 200092, China
- Bin Li: Department of Neurosurgery, Xinhua Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai 200092, China
- LinZhao Ming: Department of Neurosurgery, Xinhua Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai 200092, China
- Jun Zhong: Department of Neurosurgery, Xinhua Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai 200092, China
- Ping Zhou: Department of Neurosurgery, Xinhua Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai 200092, China
- Hua He: Department of Neurosurgery, Changzheng Hospital, The Second Hospital Affiliated with The Second Military Medical University, Shanghai 200092, China
- Bangbao Tao: Department of Neurosurgery, Xinhua Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai 200092, China
- Shiting Li: Department of Neurosurgery, Xinhua Hospital, School of Medicine, Shanghai Jiaotong University, Shanghai 200092, China
|
3
|
Integrating Domain Specific Knowledge and Network Analysis to Predict Drug Sensitivity of Cancer Cell Lines. PLoS One 2016; 11:e0162173. [PMID: 27607242 PMCID: PMC5015856 DOI: 10.1371/journal.pone.0162173] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 12/04/2015] [Accepted: 08/18/2016] [Indexed: 12/20/2022] Open
Abstract
One of the fundamental challenges in cancer studies is that the varying molecular characteristics of different tumor types may lead to resistance to certain drugs. As a result, the same drug can lead to significantly different results in different types of cancer, which emphasizes the need for individualized medicine. Individual prediction of drug response has great potential to improve clinical outcomes and reduce the financial costs associated with prescribing chemotherapy drugs to which the patient's tumor might be resistant. In this paper, we develop a network-based classifier (NBC) method for predicting the sensitivity of cell lines to anticancer drugs from transcriptome data. In the literature, this strategy has been used for predicting cancer types. Here, we extend it to estimate the sensitivity of cells from different tumor types to various anticancer drugs. Furthermore, we incorporate domain-specific knowledge, such as an apoptotic gene list and clinical dose information, into our method to impart biological significance to the prediction. Our experimental results suggest that the NBC method outperforms existing classifiers in estimating the sensitivity of cell lines to different drugs.
|
4
|
Gabere MN, Hussein MA, Aziz MA. Filtered selection coupled with support vector machines generate a functionally relevant prediction model for colorectal cancer. Onco Targets Ther 2016; 9:3313-25. [PMID: 27330311 PMCID: PMC4898422 DOI: 10.2147/ott.s98910] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 12/18/2022] Open
Abstract
Purpose: There has been considerable interest in using whole-genome expression profiles for the classification of colorectal cancer (CRC). The selection of important features is a crucial step before training a classifier.
Methods: In this study, we built a model that uses a support vector machine (SVM) to classify cancer and normal samples using Affymetrix exon microarray data obtained from 90 samples of 48 patients diagnosed with CRC. From the 22,011 genes, we selected the 20, 30, 50, 100, 200, 300, and 500 genes most relevant to CRC using the minimum-redundancy–maximum-relevance (mRMR) technique. With these gene sets, an SVM model was designed using four different kernel types (linear, polynomial, radial basis function [RBF], and sigmoid).
Results: The best model, which used 30 genes and the RBF kernel, outperformed the other combinations; it had an accuracy of 84% for both tenfold and leave-one-out cross-validation in discriminating the cancer samples from the normal samples. With this 30-gene set from mRMR, six classifiers were trained using random forest (RF), Bayes net (BN), multilayer perceptron (MLP), naïve Bayes (NB), reduced error pruning tree (REPT), and SVM. Two hybrids, mRMR + SVM and mRMR + BN, were the best models when tested on other datasets, achieving prediction accuracies of 95.27% and 91.99%, respectively, compared with the other mRMR hybrid models (mRMR + RF, mRMR + NB, mRMR + REPT, and mRMR + MLP). Ingenuity pathway analysis was used to analyze the functions of the 30 genes selected for this model and their potential association with CRC: CDH3, CEACAM7, CLDN1, IL8, IL6R, MMP1, MMP7, and TGFB1 were predicted to be CRC biomarkers.
Conclusion: This model could be used to further develop a diagnostic tool for predicting CRC based on gene expression data from patient samples.
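The mRMR step described in this abstract greedily adds the feature whose relevance to the class label is highest once its average redundancy with the already selected features is subtracted. The sketch below illustrates one common form of that criterion (mutual information difference) in plain Python. It assumes the features have already been discretized, which the abstract does not specify, and the names `mutual_information`, `mrmr_select`, `gene_a`, `gene_b`, and `gene_c` are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(x)
    count_x, count_y, count_xy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log2(c * n / (count_x[a] * count_y[b]))
        for (a, b), c in count_xy.items()
    )


def mrmr_select(features, labels, k):
    """Greedy mRMR: maximize relevance minus mean redundancy at each step.

    features: dict mapping a feature name to its list of discrete values.
    labels:   class label per sample.
    """
    relevance = {name: mutual_information(col, labels) for name, col in features.items()}
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical discretized expression values (1 = high, 0 = low), not study data.
features = {
    "gene_a": [0, 0, 1, 1, 0, 0, 1, 1],
    "gene_b": [0, 0, 1, 1, 0, 0, 1, 1],  # exact duplicate of gene_a
    "gene_c": [0, 1, 0, 1, 0, 1, 0, 1],  # independent of gene_a
}
labels = [0, 0, 0, 1, 0, 0, 0, 1]  # "cancer" only when gene_a and gene_c are both high
chosen = mrmr_select(features, labels, k=2)
```

In this toy case all three genes are equally relevant on their own, but the duplicate `gene_b` is fully redundant with `gene_a`, so the redundancy penalty leaves `gene_a` and `gene_c` as the selected pair.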
Affiliation(s)
- Musa Nur Gabere: Department of Bioinformatics, King Abdullah International Medical Research Center/King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia
- Mohamed Aly Hussein: Department of Bioinformatics, King Abdullah International Medical Research Center/King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia
- Mohammad Azhar Aziz: Colorectal Cancer Research Program, Department of Medical Genomics, King Abdullah International Medical Research Center, Riyadh, Saudi Arabia
|