1
Li G, Zhou X, Liu J, Chen Y, Zhang H, Chen Y, Liu J, Jiang H, Yang J, Nie S. Comparison of three data mining models for prediction of advanced schistosomiasis prognosis in the Hubei province. PLoS Negl Trop Dis 2018; 12:e0006262. [PMID: 29447165 PMCID: PMC5831639 DOI: 10.1371/journal.pntd.0006262] [Received: 09/06/2017] [Revised: 02/28/2018] [Accepted: 01/23/2018]
Abstract
BACKGROUND To better assist medical professionals, this study aimed to develop and compare the performance of three models, a multivariate logistic regression (LR) model, an artificial neural network (ANN) model, and a decision tree (DT) model, for predicting the prognosis of patients with advanced schistosomiasis residing in Hubei province.
METHODOLOGY/PRINCIPAL FINDINGS Schistosomiasis surveillance data were collected from a previous study based on a Hubei population sample including 4136 advanced schistosomiasis cases. The predictive models used LR, ANN, and DT methods. From each of the three groups, 70% of the cases (2896 cases) were used as training data for the predictive models. The remaining 30% of the cases (1240 cases) were used as validation groups for performance comparisons among the three models. Prediction performance was evaluated using the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and accuracy. Univariate analysis indicated that 16 risk factors were significantly associated with a patient's prognosis. In the training group, the mean AUC was 0.8276 for LR, 0.9267 for ANN, and 0.8229 for DT. In the validation group, the mean AUC was 0.8349 for LR, 0.8318 for ANN, and 0.8148 for DT. The three models yielded similar results in terms of accuracy, sensitivity, and specificity.
CONCLUSIONS/SIGNIFICANCE Predictive models for advanced schistosomiasis prognosis using LR, ANN, and DT proved to be effective approaches on our dataset. The ANN model outperformed the LR and DT models in terms of AUC.
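The abstract above compares models by AUC. As a reminder of what that number measures, here is a minimal sketch that computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `auc` and the toy scores are illustrative assumptions, not taken from the study.

```python
def auc(scores, labels):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    scores: predicted risk per case; labels: 1 = event, 0 = no event.
    Hypothetical helper for illustration; not the authors' implementation.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data (made up): three of the four positive/negative pairs are ordered correctly.
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

The quadratic loop is kept for clarity; for datasets of the size described above, a rank-based formulation would be the more usual choice.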
Affiliation(s)
- Guo Li
  - Department of Epidemiology and Health Statistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
  - Hubei Provincial Center for Disease Control and Prevention, Wuhan, Hubei, China
- Xiaorong Zhou
  - Hubei Provincial Center for Disease Control and Prevention, Wuhan, Hubei, China
- Jianbing Liu
  - Hubei Provincial Center for Disease Control and Prevention, Wuhan, Hubei, China
- Yuanqi Chen
  - Department of Mathematics, Wuhan University, Wuhan, Hubei, China
- Hengtao Zhang
  - Department of Mathematics, Wuhan University, Wuhan, Hubei, China
- Yanyan Chen
  - Hubei Provincial Center for Disease Control and Prevention, Wuhan, Hubei, China
- Jianhua Liu
  - Yichang Center for Disease Control and Prevention, Yichang, Hubei, China
- Hongbo Jiang
  - Department of Epidemiology and Biostatistics, School of Public Health, Guangdong Pharmaceutical University, Guangzhou, China
- Junjing Yang
  - Hubei Provincial Center for Disease Control and Prevention, Wuhan, Hubei, China
- Shaofa Nie
  - Department of Epidemiology and Health Statistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China
2
Pourhoseingholi MA, Kheirian S, Zali MR. Comparison of Basic and Ensemble Data Mining Methods in Predicting 5-Year Survival of Colorectal Cancer Patients. Acta Inform Med 2017; 25:254-258. [PMID: 29284916 PMCID: PMC5723205 DOI: 10.5455/aim.2017.25.254-258] [Received: 10/07/2017] [Accepted: 11/11/2017]
Abstract
INTRODUCTION Colorectal cancer (CRC) is one of the most common malignancies and causes of cancer mortality worldwide. Given the importance of predicting the survival of CRC patients and the growing use of data mining methods, this study aims to compare the performance of models for predicting 5-year survival of CRC patients using a variety of basic and ensemble data mining methods.
METHODS The CRC dataset from the Shahid Beheshti University of Medical Sciences Research Center for Gastroenterology and Liver Diseases was used for prediction and for a comparative study of the base and ensemble data mining techniques. Feature selection methods were used to select predictor attributes for classification. The WEKA toolkit and MedCalc software were used to create and compare the models, respectively.
RESULTS The predictive performance of the developed models was high overall (all greater than 90%). The performance of ensemble models was higher than that of basic classifiers, and the best result was achieved by the ensemble voting model in terms of area under the ROC curve (AUC = 0.96).
CONCLUSION AUC comparison of the models showed that the ensemble voting method significantly outperformed all models except two, Random Forest (RF) and Bayesian Network (BN), given the overlapping 95% confidence intervals. This result may indicate high predictive power of these two methods, along with ensemble voting, for predicting 5-year survival of CRC patients.
Affiliation(s)
- Mohamad Amin Pourhoseingholi
  - Gastroenterology and Liver Diseases Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Sedigheh Kheirian
  - Department of Health Informatics Technology and Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mohammad Reza Zali
  - Basic and Molecular Epidemiology of Gastrointestinal Disorders Research Center, Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran
3
Huy NT, Thao NTH, Ha TTN, Lan NTP, Nga PTT, Thuy TT, Tuan HM, Nga CTP, Tuong VV, Dat TV, Huong VTQ, Karbwang J, Hirayama K. Development of clinical decision rules to predict recurrent shock in dengue. Crit Care 2013; 17:R280. [PMID: 24295509 PMCID: PMC4057383 DOI: 10.1186/cc13135] [Received: 07/12/2013] [Accepted: 11/01/2013]
Abstract
INTRODUCTION Mortality from dengue infection is mostly due to shock. Among dengue patients with shock, approximately 30% have recurrent shock that requires a treatment change. Here, we report the development of a clinical rule for use during a patient's first shock episode to predict a recurrent shock episode.
METHODS The study was conducted at the Center for Preventive Medicine in Vinh Long province and the Children's Hospital No. 2 in Ho Chi Minh City, Vietnam. We included 444 dengue patients with shock, 126 of whom had recurrent shock (28%). Univariate and multivariate analyses and a preprocessing method were used to evaluate and select from 14 clinical and laboratory signs recorded at shock onset. Five variables (admission day, purpura/ecchymosis, ascites/pleural effusion, blood platelet count and pulse pressure) were finally used to train and validate a logistic regression model with a 10-fold validation strategy repeated 10 times.
RESULTS Shorter admission day (fewer days prior to admission), purpura/ecchymosis, ascites/pleural effusion, low platelet count and narrow pulse pressure were independently associated with recurrent shock. Our logistic prediction model was capable of predicting recurrent shock when compared to the null method (P < 0.05) and was not outperformed by other prediction models. Our final scoring rule provided relatively good accuracy (AUC, 0.73; sensitivity and specificity, 68%). Score points derived from the logistic prediction model showed identical accuracy, with an AUC of 0.73. Using a cutoff value greater than -154.5, our simple scoring rule showed a sensitivity of 68.3% and a specificity of 68.2%.
CONCLUSIONS Our simple clinical rule is not intended to replace clinical judgment, but to help clinicians predict recurrent shock during a patient's first dengue shock episode.
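The validation scheme in the abstract above (10 folds, repeated 10 times) can be sketched as an index generator. This assumes that "10-fold validation" refers to standard k-fold cross-validation with a fresh shuffle per repetition; the function name `repeated_kfold`, the seed, and the unstratified splitting are illustrative assumptions, and the abstract does not say how the authors partitioned their data.

```python
import random


def repeated_kfold(n_samples, k=10, repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Hypothetical helper: each repetition reshuffles the indices and cuts
    them into k nearly equal folds; every fold serves once as the test set.
    """
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most 1
        for f in range(k):
            test_idx = folds[f]
            train_idx = [i for j, fold in enumerate(folds) if j != f for i in fold]
            yield r, f, train_idx, test_idx


# 444 is the cohort size reported in the abstract; a model would be fit on
# train_idx and scored on test_idx in each of the 100 splits.
splits = list(repeated_kfold(444))
print(len(splits))
```

Averaging the per-split metric over all splits gives a more stable estimate than a single 10-fold pass, which is the usual motivation for repeating the procedure.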
4
Gao P, Zhou X, Wang ZN, Song YX, Tong LL, Xu YY, Yue ZY, Xu HM. Which is a more accurate predictor in colorectal survival analysis? Nine data mining algorithms vs. the TNM staging system. PLoS One 2012; 7:e42015. [PMID: 22848691 PMCID: PMC3404978 DOI: 10.1371/journal.pone.0042015] [Received: 03/16/2012] [Accepted: 06/29/2012]
Abstract
OBJECTIVE Over the past decades, many studies have used data mining technology to predict the 5-year survival rate of colorectal cancer, but few reports have compared multiple data mining algorithms to the TNM classification of malignant tumors (TNM) staging system using a dataset in which the training and testing data were from different sources. Here we compared nine data mining algorithms to the TNM staging system for colorectal survival analysis.
METHODS Two different datasets were used: 1) the National Cancer Institute's Surveillance, Epidemiology, and End Results dataset; and 2) the dataset from a single Chinese institution. An optimization and prediction system based on nine data mining algorithms as well as two variable selection methods was implemented. The TNM staging system was based on the 7th edition of the American Joint Committee on Cancer TNM staging system.
RESULTS When the training and testing data were from the same source, all algorithms had slight advantages over the TNM staging system in predictive accuracy. When the data were from different sources, only four algorithms (logistic regression, general regression neural network, Bayesian networks, and Naïve Bayes) had slight advantages over the TNM staging system. Also, there were no significant differences among the algorithms (p>0.05).
CONCLUSIONS The TNM staging system is simple and practical at present, and data mining methods are not accurate enough to replace the TNM staging system for colorectal cancer survival prediction. Furthermore, there were no significant differences in predictive accuracy among the algorithms when the data were from different sources. Building a larger dataset that includes more variables may be important for improving predictive accuracy.
Affiliation(s)
- Peng Gao
  - Department of Surgical Oncology and General Surgery, The First Hospital of China Medical University, Shenyang, P.R. China
- Xin Zhou
  - Department of Gynecology and Obstetrics, Shengjing Hospital of China Medical University, Shenyang, P.R. China
- Zhen-ning Wang
  - Department of Surgical Oncology and General Surgery, The First Hospital of China Medical University, Shenyang, P.R. China
- Yong-xi Song
  - Department of Surgical Oncology and General Surgery, The First Hospital of China Medical University, Shenyang, P.R. China
- Lin-lin Tong
  - Department of Surgical Oncology and General Surgery, The First Hospital of China Medical University, Shenyang, P.R. China
- Ying-ying Xu
  - Department of Surgical Oncology and General Surgery, The First Hospital of China Medical University, Shenyang, P.R. China
- Zhen-yu Yue
  - Department of Surgical Oncology and General Surgery, The First Hospital of China Medical University, Shenyang, P.R. China
- Hui-mian Xu
  - Department of Surgical Oncology and General Surgery, The First Hospital of China Medical University, Shenyang, P.R. China
5
Epithelial-mesenchymal transition biomarkers and support vector machine guided model in preoperatively predicting regional lymph node metastasis for rectal cancer. Br J Cancer 2012; 106:1735-41. [PMID: 22538975 PMCID: PMC3364123 DOI: 10.1038/bjc.2012.82]
Abstract
Background: Current imaging modalities are inadequate for preoperatively predicting regional lymph node metastasis (RLNM) status in rectal cancer (RC). Here, we designed a support vector machine (SVM) model to address this issue by integrating epithelial–mesenchymal-transition (EMT)-related biomarkers with clinicopathological variables.
Methods: Using tissue microarrays and immunohistochemistry, the expression of EMT-related biomarkers was measured in 193 RC patients, of whom 74 were assigned to the training set to select robust variables for designing the SVM model. The predictive value of the SVM model was validated in the testing set (119 patients).
Results: In the training set, eight variables, including six EMT-related biomarkers and two clinicopathological variables, were selected to build the SVM model. In the testing set, we identified 63 patients at high risk of RLNM and 56 patients at low risk. The sensitivity, specificity and overall accuracy of the SVM in predicting RLNM were 68.3%, 81.1% and 72.3%, respectively. Importantly, multivariate logistic regression analysis showed that the SVM model was an independent predictor of RLNM status (odds ratio, 11.536; 95% confidence interval, 4.113–32.361; P<0.0001).
Conclusion: Our SVM-based model displayed moderately strong predictive power in defining the RLNM status of RC patients, providing an important approach for selecting the RLNM high-risk subgroup for neoadjuvant chemoradiotherapy.
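The testing-set figures in the abstract above (119 patients, 63 classified as high risk, sensitivity 68.3%, specificity 81.1%, accuracy 72.3%) are related through a 2x2 confusion table. The sketch below is a consistency check, not data from the paper: it assumes "high risk" means a positive prediction and that the percentages were rounded to one decimal, then searches for integer tables that reproduce them. The function name `consistent_tables` is hypothetical.

```python
def consistent_tables(n_total, n_pred_pos, sens_pct, spec_pct, acc_pct):
    """Return all (TP, FN, FP, TN) tables whose rounded metrics match.

    Assumption: metrics are percentages rounded to one decimal place.
    """
    matches = []
    n_pred_neg = n_total - n_pred_pos
    for tp in range(n_pred_pos + 1):
        fp = n_pred_pos - tp              # predicted positive, truly negative
        for fn in range(n_pred_neg + 1):
            tn = n_pred_neg - fn          # predicted negative, truly negative
            pos, neg = tp + fn, fp + tn
            if pos == 0 or neg == 0:
                continue                  # sensitivity or specificity undefined
            sens = round(100 * tp / pos, 1)
            spec = round(100 * tn / neg, 1)
            acc = round(100 * (tp + tn) / n_total, 1)
            if (sens, spec, acc) == (sens_pct, spec_pct, acc_pct):
                matches.append((tp, fn, fp, tn))
    return matches


# Figures as reported in the abstract.
print(consistent_tables(119, 63, 68.3, 81.1, 72.3))
```

Under these assumptions the search returns a single table, TP = 56, FN = 26, FP = 7, TN = 30, which would correspond to 82 patients with RLNM and 37 without in the testing set; the abstract itself does not state these counts.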
6
Molecular prognostic prediction for locally advanced nasopharyngeal carcinoma by support vector machine integrated approach. PLoS One 2012; 7:e31989. [PMID: 22427815 PMCID: PMC3302890 DOI: 10.1371/journal.pone.0031989] [Received: 09/15/2011] [Accepted: 01/19/2012]
Abstract
BACKGROUND Accurate prognostication of locally advanced nasopharyngeal carcinoma (NPC) would help tailor therapy for patients. Here, we addressed this issue by developing a mathematical algorithm based on the support vector machine (SVM) that integrates the expression levels of multiple biomarkers.
METHODOLOGY/PRINCIPAL FINDINGS Ninety-seven locally advanced NPC patients in a randomized controlled trial (RCT) with 5-year follow-up were studied, with 48 cases serving as the training set and 49 cases as the testing set for the SVM models. We designed the SVM models by selecting variables from 38 tissue molecular biomarkers, which represent 6 tumorigenesis signaling pathways, and 3 EBV-related serological biomarkers. We designed 3 SVM models to refine the prognosis of NPC with 5-year follow-up. The SVM1 model displayed high predictive sensitivity (sensitivity and specificity were 88.0% and 81.9%, respectively) by integrating the expression of 7 molecular biomarkers. The SVM2 model showed high predictive specificity (sensitivity and specificity were 84.0% and 94.5%, respectively) by grouping the expression levels of 12 molecular biomarkers and 3 EBV-related serological biomarkers. The SVM3 model, constructed by combining SVM1 with SVM2, displayed a high predictive capacity (sensitivity and specificity were 88.0% and 90.3%, respectively). We found that the 3 SVM models had strong power in classifying prognosis. Moreover, Cox multivariate regression analysis confirmed that all 3 SVM models were significant independent prognostic models for overall survival in the testing set and in all patients.
CONCLUSIONS/SIGNIFICANCE Our SVM prognostic models designed in the RCT displayed strong power in refining patient prognosis for locally advanced NPC, potentially directing future targeted therapy against the related signaling pathways.