1
|
Lee TF, Lee SH, Tseng CD, Lin CH, Chiu CM, Lin GZ, Yang J, Chang L, Chiu YH, Su CT, Yeh SA. Using machine learning algorithm to analyse the hypothyroidism complications caused by radiotherapy in patients with head and neck cancer. Sci Rep 2023; 13:19185. [PMID: 37932394 PMCID: PMC10628223 DOI: 10.1038/s41598-023-46509-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Accepted: 11/01/2023] [Indexed: 11/08/2023] Open
Abstract
Machine learning algorithms were used to analyze the odds and predictors of complications of thyroid damage after radiation therapy in patients with head and neck cancer. This study used decision tree (DT), random forest (RF), and support vector machine (SVM) algorithms to evaluate predictors for the data of 137 head and neck cancer patients. Candidate factors included gender, age, thyroid volume, minimum dose, average dose, maximum dose, number of treatments, and relative volume of the organ receiving X dose (X: 10, 20, 30, 40, 50, 60 Gy). The algorithm was optimized according to these factors and tenfold cross-validation to analyze the state of thyroid damage and select the predictors of thyroid dysfunction. The importance of the predictors identified by the three machine learning algorithms was ranked: the top five predictors were age, thyroid volume, average dose, V50 and V60. Of these, age and volume were negatively correlated with thyroid damage, indicating that the greater the age and thyroid volume, the lower the risk of thyroid damage; the average dose, V50 and V60 were positively correlated with thyroid damage, indicating that the larger the average dose, V50 and V60, the higher the risk of thyroid damage. The RF algorithm was most accurate in predicting the probability of thyroid damage among the three algorithms optimized using the above factors. The Area under the receiver operating characteristic curve (AUC) was 0.827 and the accuracy (ACC) was 0.824. This study found that five predictors (age, thyroid volume, mean dose, V50 and V60) are important factors affecting the chance that patients with head and neck cancer who received radiation therapy will develop hypothyroidism. Using these factors as the prediction basis of the algorithm and using RF to predict the occurrence of hypothyroidism had the highest ACC, which was 82.4%. This algorithm is quite helpful in predicting the probability of radiotherapy complications. 
It also provides references for assisting medical decision-making in the future.
Affiliation(s)
- Tsair-Fwu Lee
  - Medical Physics and Informatics Laboratory of Electronic Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, 80778, Taiwan
  - Department of Electronic Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, 80778, Taiwan
  - Department of Medical Imaging and Radiological Sciences, Kaohsiung Medical University, Kaohsiung, 80708, Taiwan
  - PhD Program in Biomedical Engineering, Kaohsiung Medical University, Kaohsiung, 80708, Taiwan
  - School of Dentistry, College of Dental Medicine, Kaohsiung Medical University, Kaohsiung, 80708, Taiwan
- Shen-Hao Lee
  - Medical Physics and Informatics Laboratory of Electronic Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, 80778, Taiwan
  - Department of Electronic Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, 80778, Taiwan
- Chin-Dar Tseng
  - Medical Physics and Informatics Laboratory of Electronic Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, 80778, Taiwan
  - Department of Electronic Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, 80778, Taiwan
- Chih-Hsueh Lin
  - Department of Electronic Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, 80778, Taiwan
  - PhD Program in Biomedical Engineering, Kaohsiung Medical University, Kaohsiung, 80708, Taiwan
- Chi-Min Chiu
  - Medical Physics and Informatics Laboratory of Electronic Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, 80778, Taiwan
  - Department of Electronic Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, 80778, Taiwan
- Guang-Zhi Lin
  - Medical Physics and Informatics Laboratory of Electronic Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, 80778, Taiwan
  - Department of Electronic Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, 80778, Taiwan
  - Department of Tactical Control Air Traffic Control & Meteorology, Air Force Institute of Technology, Kaohsiung, 82047, Taiwan
- Jack Yang
  - Department of Radiation Oncology, RWJ Medical School, Long Branch, NJ, USA
  - Department of Radiation Oncology, Monmouth Medical Center, RWJBH Medical School, Long Branch, NJ, USA
- Liyun Chang
  - Department of Medical Imaging and Radiological Sciences, I-Shou University, Kaohsiung, 82445, Taiwan
- Yu-Hao Chiu
  - Medical Physics and Informatics Laboratory of Electronic Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, 80778, Taiwan
  - Department of Electronic Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, 80778, Taiwan
- Chun-Ting Su
  - Department of Medical Imaging and Radiological Sciences, I-Shou University, Kaohsiung, 82445, Taiwan
  - Department of Radiation Oncology, E-DA Hospital, Kaohsiung, 82445, Taiwan
- Shyh-An Yeh
  - Medical Physics and Informatics Laboratory of Electronic Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, 80778, Taiwan
  - Department of Electronic Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, 80778, Taiwan
  - Department of Medical Imaging and Radiological Sciences, I-Shou University, Kaohsiung, 82445, Taiwan
  - Department of Radiation Oncology, E-DA Hospital, Kaohsiung, 82445, Taiwan
|
2
|
|
3
|
|
4
|
|
5
|
Khalili-Damghani K, Abdi F, Abolmakarem S. Hybrid soft computing approach based on clustering, rule mining, and decision tree analysis for customer segmentation problem: Real case of customer-centric industries. Appl Soft Comput 2018. [DOI: 10.1016/j.asoc.2018.09.001] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Indexed: 12/16/2022]
|
6
|
A DBN-based resampling SVM ensemble learning paradigm for credit classification with imbalanced data. Appl Soft Comput 2018. [DOI: 10.1016/j.asoc.2018.04.049] [Citation(s) in RCA: 92] [Impact Index Per Article: 15.3] [Indexed: 11/17/2022]
|
7
|
A multi-objective evolutionary approach to training set selection for support vector machine. Knowl Based Syst 2018. [DOI: 10.1016/j.knosys.2018.02.022] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Indexed: 11/19/2022]
|
8
|
Tan SC, Wang S, Watada J. A self-adaptive class-imbalance TSK neural network with applications to semiconductor defects detection. Inf Sci (N Y) 2018. [DOI: 10.1016/j.ins.2017.10.040] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 10/18/2022]
|
9
|
Fernández A, Carmona CJ, José del Jesus M, Herrera F. A Pareto-based Ensemble with Feature and Instance Selection for Learning from Multi-Class Imbalanced Datasets. Int J Neural Syst 2017. [DOI: 10.1142/s0129065717500289] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Indexed: 11/18/2022]
Abstract
Imbalanced classification is related to those problems that have an uneven distribution among classes. In addition to the former, when instances are located into the overlapped areas, the correct modeling of the problem becomes harder. Current solutions for both issues are often focused on the binary case study, as multi-class datasets require an additional effort to be addressed. In this research, we overcome these problems by carrying out a combination between feature and instance selections. Feature selection will allow simplifying the overlapping areas easing the generation of rules to distinguish among the classes. Selection of instances from all classes will address the imbalance itself by finding the most appropriate class distribution for the learning task, as well as possibly removing noise and difficult borderline examples. For the sake of obtaining an optimal joint set of features and instances, we embedded the searching for both parameters in a Multi-Objective Evolutionary Algorithm, using the C4.5 decision tree as baseline classifier in this wrapper approach. The multi-objective scheme allows taking a double advantage: the search space becomes broader, and we may provide a set of different solutions in order to build an ensemble of classifiers. This proposal has been contrasted versus several state-of-the-art solutions on imbalanced classification showing excellent results in both binary and multi-class problems.
Affiliation(s)
- Alberto Fernández
  - Department of Computer Science and Artificial Intelligence, University of Granada, Granada 18071, Spain
- Cristobal José Carmona
  - Department of Civil Engineering, University of Burgos, Burgos 09006, Spain
  - Leicester School of Pharmacy, De Montfort University, Leicester, LE1 9BH, UK
- Francisco Herrera
  - Department of Computer Science and Artificial Intelligence, University of Granada, Granada 18071, Spain
  - Faculty of Computing and Information Technology, North Jeddah, King Abdulaziz University (KAU), Jeddah 80200, Saudi Arabia
|
10
|
Ibarguren I, Lasarguren A, Pérez JM, Muguerza J, Gurrutxaga I, Arbelaitz O. BFPART: Best-First PART. Inf Sci (N Y) 2016. [DOI: 10.1016/j.ins.2016.07.023] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
|
11
|
Verbiest N, Derrac J, Cornelis C, García S, Herrera F. Evolutionary wrapper approaches for training set selection as preprocessing mechanism for support vector machines: Experimental evaluation and support vector analysis. Appl Soft Comput 2016. [DOI: 10.1016/j.asoc.2015.09.006] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Indexed: 10/23/2022]
|
12
|
Tan SC, Watada J, Ibrahim Z, Khalid M. Evolutionary fuzzy ARTMAP neural networks for classification of semiconductor defects. IEEE Trans Neural Netw Learn Syst 2015; 26:933-950. [PMID: 25014967 DOI: 10.1109/tnnls.2014.2329097] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 06/03/2023]
Abstract
Wafer defect detection using an intelligent system is an approach of quality improvement in semiconductor manufacturing that aims to enhance its process stability, increase production capacity, and improve yields. Occasionally, only few records that indicate defective units are available and they are classified as a minority group in a large database. Such a situation leads to an imbalanced data set problem, wherein it engenders a great challenge to deal with by applying machine-learning techniques for obtaining effective solution. In addition, the database may comprise overlapping samples of different classes. This paper introduces two models of evolutionary fuzzy ARTMAP (FAM) neural networks to deal with the imbalanced data set problems in a semiconductor manufacturing operations. In particular, both the FAM models and hybrid genetic algorithms are integrated in the proposed evolutionary artificial neural networks (EANNs) to classify an imbalanced data set. In addition, one of the proposed EANNs incorporates a facility to learn overlapping samples of different classes from the imbalanced data environment. The classification results of the proposed evolutionary FAM neural networks are presented, compared, and analyzed using several classification metrics. The outcomes positively indicate the effectiveness of the proposed networks in handling classification problems with imbalanced data sets.
|
13
|
|
14
|
RBM-SMOTE: Restricted Boltzmann Machines for Synthetic Minority Oversampling Technique. Intelligent Information and Database Systems 2015. [DOI: 10.1007/978-3-319-15702-3_37] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/05/2022]
|
15
|
|
16
|
Farquad M, Ravi V, Raju SB. Churn prediction using comprehensible support vector machine: An analytical CRM application. Appl Soft Comput 2014. [DOI: 10.1016/j.asoc.2014.01.031] [Citation(s) in RCA: 94] [Impact Index Per Article: 9.4] [Indexed: 11/28/2022]
|
17
|
Gonzalez-Abril L, Nuñez H, Angulo C, Velasco F. GSVM: An SVM for handling imbalanced accuracy between classes in bi-classification problems. Appl Soft Comput 2014. [DOI: 10.1016/j.asoc.2013.12.013] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Indexed: 11/25/2022]
|
18
|
López V, Triguero I, Carmona CJ, García S, Herrera F. Addressing imbalanced classification with instance generation techniques: IPADE-ID. Neurocomputing 2014. [DOI: 10.1016/j.neucom.2013.01.050] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.0] [Indexed: 11/28/2022]
|
19
|
Park M, Kim H, Kim SK. Knowledge Discovery in a Community Data Set: Malnutrition among the Elderly. Healthc Inform Res 2014; 20:30-8. [PMID: 24627816 PMCID: PMC3950263 DOI: 10.4258/hir.2014.20.1.30] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 10/31/2013] [Revised: 01/16/2014] [Accepted: 01/20/2014] [Indexed: 12/04/2022] Open
Abstract
Objectives: The purpose of this study was to design a prediction model that explains the characteristics of elderly adults at risk of malnutrition. Methods: Data were obtained from a large data set, 2008 Korean Elderly Survey, in which the data of 15,146 subjects were entered. With nutritional status a target variable, the input variables included the demographic and socioeconomic status of participants. The data were analyzed by using the SPSS Clementine 12.0 program's feature selection node to select meaningful variables. Results: Among the C5.0, C&R Tree, QUEST, and CHAID models, the highest predictability was reported by C&R Tree with the accuracy rate of 77.1%. The presence of more than two comorbidities, living alone status, having severe difficulty in daily activities, and lower perceived economic status were identified as risk factors of malnutrition in elderly. Conclusions: A reliable decision support model was designed to provide accurate information regarding the characteristics of elderly individuals with malnutrition. The findings demonstrated the good feasibility of data mining when used for a large community data set and its value in assisting health professionals and local decision makers to come up with effective strategies for achieving public health goals.
Affiliation(s)
- Myonghwa Park
  - College of Nursing, Chungnam National University, Daejeon, Korea
- Hyeyoung Kim
  - Department of Nursing, Catholic Sangji College, Andong, Korea
- Sun Kyung Kim
  - College of Nursing, Chungnam National University, Daejeon, Korea
|
20
|
Boosted SVM for extracting rules from imbalanced data in application to prediction of the post-operative life expectancy in the lung cancer patients. Appl Soft Comput 2014. [DOI: 10.1016/j.asoc.2013.07.016] [Citation(s) in RCA: 89] [Impact Index Per Article: 8.9] [Indexed: 11/19/2022]
|
21
|
Enhancement of artificial neural network learning using centripetal accelerated particle swarm optimization for medical diseases diagnosis. Soft Comput 2013. [DOI: 10.1007/s00500-013-1198-0] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Indexed: 11/27/2022]
|
22
|
Fazzolari M, Giglio B, Alcalá R, Marcelloni F, Herrera F. A study on the application of instance selection techniques in genetic fuzzy rule-based classification systems: Accuracy-complexity trade-off. Knowl Based Syst 2013. [DOI: 10.1016/j.knosys.2013.07.011] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 11/29/2022]
|
23
|
López V, Fernández A, García S, Palade V, Herrera F. An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics. Inf Sci (N Y) 2013. [DOI: 10.1016/j.ins.2013.07.007] [Citation(s) in RCA: 932] [Impact Index Per Article: 84.7] [Indexed: 11/15/2022]
|
24
|
García-Pedrajas N, de Haro-García A, Pérez-Rodríguez J. A scalable memetic algorithm for simultaneous instance and feature selection. Evol Comput 2013; 22:1-45. [PMID: 23544367 DOI: 10.1162/evco_a_00102] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 06/02/2023]
Abstract
Instance selection is becoming increasingly relevant due to the huge amount of data that is constantly produced in many fields of research. At the same time, most of the recent pattern recognition problems involve highly complex datasets with a large number of possible explanatory variables. For many reasons, this abundance of variables significantly harms classification or recognition tasks. There are efficiency issues, too, because the speed of many classification algorithms is largely improved when the complexity of the data is reduced. One of the approaches to address problems that have too many features or instances is feature or instance selection, respectively. Although most methods address instance and feature selection separately, both problems are interwoven, and benefits are expected from facing these two tasks jointly. This paper proposes a new memetic algorithm for dealing with many instances and many features simultaneously by performing joint instance and feature selection. The proposed method performs four different local search procedures with the aim of obtaining the most relevant subsets of instances and features to perform an accurate classification. A new fitness function is also proposed that enforces instance selection but avoids putting too much pressure on removing features. We prove experimentally that this fitness function improves the results in terms of testing error. Regarding the scalability of the method, an extension of the stratification approach is developed for simultaneous instance and feature selection. This extension allows the application of the proposed algorithm to large datasets. An extensive comparison using 55 medium to large datasets from the UCI Machine Learning Repository shows the usefulness of our method. Additionally, the method is applied to 30 large problems, with very good results. The accuracy of the method for class-imbalanced problems in a set of 40 datasets is shown. 
The usefulness of the method is also tested using decision trees and support vector machines as classification methods.
Affiliation(s)
- Nicolás García-Pedrajas
  - Department of Computing and Numerical Analysis, University of Cordoba, Córdoba, 14014, Spain
|
25
|
García-Pedrajas N, Perez-Rodríguez J, de Haro-García A. OligoIS: Scalable Instance Selection for Class-Imbalanced Data Sets. IEEE Trans Cybern 2013; 43:332-346. [PMID: 22868583 DOI: 10.1109/tsmcb.2012.2206381] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 06/01/2023]
Abstract
In current research, an enormous amount of information is constantly being produced, which poses a challenge for data mining algorithms. Many of the problems in extremely active research areas, such as bioinformatics, security and intrusion detection, or text mining, share the following two features: large data sets and class-imbalanced distribution of samples. Although many methods have been proposed for dealing with class-imbalanced data sets, most of these methods are not scalable to the very large data sets common to those research fields. In this paper, we propose a new approach to dealing with the class-imbalance problem that is scalable to data sets with many millions of instances and hundreds of features. This proposal is based on the divide-and-conquer principle combined with application of the selection process to balanced subsets of the whole data set. This divide-and-conquer principle allows the execution of the algorithm in linear time. Furthermore, the proposed method is easy to implement using a parallel environment and can work without loading the whole data set into memory. Using 40 class-imbalanced medium-sized data sets, we will demonstrate our method's ability to improve the results of state-of-the-art instance selection methods for class-imbalanced data sets. Using three very large data sets, we will show the scalability of our proposal to millions of instances and hundreds of features.
|
26
|
Overlapping, Rare Examples and Class Decomposition in Learning Classifiers from Imbalanced Data. Emerging Paradigms in Machine Learning 2013. [DOI: 10.1007/978-3-642-28699-5_11] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Indexed: 12/07/2022]
|
27
|
Arif F, Suryana N, Hussin B. Cascade Quality Prediction Method Using Multiple PCA+ID3 for Multi-Stage Manufacturing System. IERI Procedia 2013. [DOI: 10.1016/j.ieri.2013.11.029] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 10/25/2022]
|
28
|
The quest for the optimal class distribution: an approach for enhancing the effectiveness of learning via resampling methods for imbalanced data sets. Prog Artif Intell 2012. [DOI: 10.1007/s13748-012-0034-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 11/26/2022]
|
29
|
García-Pedrajas N, Pérez-Rodríguez J. Multi-selection of instances: A straightforward way to improve evolutionary instance selection. Appl Soft Comput 2012. [DOI: 10.1016/j.asoc.2012.06.013] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 10/28/2022]
|
30
|
|
31
|
Galar M, Fernandez A, Barrenechea E, Bustince H, Herrera F. A Review on Ensembles for the Class Imbalance Problem: Bagging-, Boosting-, and Hybrid-Based Approaches. IEEE Trans Syst Man Cybern C Appl Rev 2012. [DOI: 10.1109/tsmcc.2011.2161285] [Citation(s) in RCA: 1533] [Impact Index Per Article: 127.8] [Indexed: 11/08/2022]
|
32
|
Cruz Díaz NP, Maña López MJ, Vázquez JM, Álvarez VP. A machine-learning approach to negation and speculation detection in clinical texts. J Am Soc Inf Sci Technol 2012. [DOI: 10.1002/asi.22679] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Indexed: 11/06/2022]
Affiliation(s)
- Noa P. Cruz Díaz
  - Department of Information Technology, University of Huelva, Huelva, Spain
|
33
|
Son CS, Jang BK, Seo ST, Kim MS, Kim YN. A hybrid decision support model to discover informative knowledge in diagnosing acute appendicitis. BMC Med Inform Decis Mak 2012; 12:17. [PMID: 22410346 PMCID: PMC3314559 DOI: 10.1186/1472-6947-12-17] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 09/19/2011] [Accepted: 03/13/2012] [Indexed: 12/29/2022] Open
Abstract
Background: The aim of this study is to develop a simple and reliable hybrid decision support model by combining statistical analysis and decision tree algorithms to ensure high accuracy of early diagnosis in patients with suspected acute appendicitis and to identify useful decision rules. Methods: We enrolled 326 patients who attended an emergency medical center complaining mainly of acute abdominal pain. Statistical analysis approaches were used as a feature selection process in the design of decision support models, including the Chi-square test, Fisher's exact test, the Mann-Whitney U-test (p < 0.01), and Wald forward logistic regression (entry and removal criteria of 0.01 and 0.05, or 0.05 and 0.10, respectively). The final decision support models were constructed using the C5.0 decision tree algorithm of Clementine 12.0 after pre-processing. Results: Of 55 variables, two subsets were found to be indispensable for early diagnostic knowledge discovery in acute appendicitis. The two subsets were as follows: (1) lymphocytes, urine glucose, total bilirubin, total amylase, chloride, red blood cell, neutrophils, eosinophils, white blood cell, complaints, basophils, glucose, monocytes, activated partial thromboplastin time, urine ketone, and direct bilirubin in the univariate analysis-based model; and (2) neutrophils, complaints, total bilirubin, urine glucose, and lipase in the multivariate analysis-based model. The experimental results showed that the model with univariate analysis (80.2%, 82.4%, 78.3%, 76.8%, 83.5%, and 80.3%) outperformed models using multivariate analysis (71.6%, 69.3%, 73.7%, 69.7%, 73.3%, and 71.5% with entry and removal criteria of 0.01 and 0.05; 73.5%, 66.0%, 80.0%, 74.3%, 72.9%, and 73.0% with entry and removal criteria of 0.05 and 0.10) in terms of accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and area under ROC curve, during a 10-fold cross validation. A statistically significant difference was detected in the pairwise comparison of ROC curves (p < 0.01, 95% CI, 3.13-14.5; p < 0.05, 95% CI, 1.54-13.1). The larger induced decision model was more effective for identifying acute appendicitis in patients with acute abdominal pain, whereas the smaller induced decision tree was less accurate with the test data. Conclusions: The decision model developed in this study can be applied as an aid in the initial decision making of clinicians to increase vigilance in cases of suspected acute appendicitis.
Affiliation(s)
- Chang Sik Son
  - Department of Medical Informatics, School of Medicine, Keimyung University, 2800 Dalgubeoldaero, Dalseo-Gu, Daegu, Republic of Korea
|
34
|
|
35
|
|
36
|
|
37
|
Albisua I, Arbelaitz O, Gurrutxaga I, Muguerza J, Pérez JM. C4.5 Consolidation Process: An Alternative to Intelligent Oversampling Methods in Class Imbalance Problems. Advances in Artificial Intelligence 2011. [DOI: 10.1007/978-3-642-25274-7_8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 02/03/2023]
|
38
|
Addressing the Classification with Imbalanced Data: Open Problems and New Challenges on Class Distribution. Lecture Notes in Computer Science 2011. [DOI: 10.1007/978-3-642-21219-2_1] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Indexed: 12/31/2022]
|
39
|
Addressing data complexity for imbalanced data sets: analysis of SMOTE-based oversampling and evolutionary undersampling. Soft Comput 2010. [DOI: 10.1007/s00500-010-0625-8] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.9] [Indexed: 10/19/2022]
|