1
Liu ZW, Chen G, Dong CF, Qiu WR, Zhang SH. Intelligent assistant diagnosis for pediatric inguinal hernia based on a multilayer and unbalanced classification model. Front Physiol 2023; 14:1105891. [PMID: 36998990 PMCID: PMC10043203 DOI: 10.3389/fphys.2023.1105891] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Accepted: 02/27/2023] [Indexed: 03/17/2023]
Abstract
As one of the most common diseases in pediatric surgery, an inguinal hernia is usually diagnosed by medical experts based on clinical data collected from magnetic resonance imaging (MRI), computed tomography (CT), or B-ultrasound. The parameters of blood routine examination, such as white blood cell count and platelet count, are often used as diagnostic indicators of intestinal necrosis. Based on the medical numerical data on blood routine examination parameters and liver and kidney function parameters, this paper used a machine learning algorithm to assist the diagnosis of intestinal necrosis in children with inguinal hernia before operation. In this work, we used clinical data consisting of 3,807 children with inguinal hernia symptoms and 170 children with intestinal necrosis and perforation caused by the disease. Three different models were constructed according to the blood routine examination and liver and kidney function. Some missing values were replaced by using the RIN-3M (median, mean, or mode region random interpolation) method according to the actual necessity, and ensemble learning based on the voting principle was used to deal with the imbalanced datasets. The model trained after feature selection yielded satisfactory results with an accuracy of 86.43%, sensitivity of 84.34%, specificity of 96.89%, and AUC value of 0.91. Therefore, the proposed methods may be a potential approach to auxiliary diagnosis of inguinal hernia in children.
Affiliation(s)
- Zhi-Wen Liu
  - Department of General Surgery, Jiangxi Provincial Children’s Hospital, Nanchang, China
- Gang Chen
  - Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China
- Chao-Fan Dong
  - Department of General Surgery, Jingdezhen No. 1 People’s Hospital, Jingdezhen, China
- Wang-Ren Qiu
  - Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China
- Shou-Hua Zhang
  - Department of General Surgery, Jiangxi Provincial Children’s Hospital, Nanchang, China
- *Correspondence: Wang-Ren Qiu; Shou-Hua Zhang
2
Prediction of Important Factors for Bleeding in Liver Cirrhosis Disease Using Ensemble Data Mining Approach. MATHEMATICS 2020. [DOI: 10.3390/math8111887] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/12/2022]
Abstract
The main motivation for the study presented in this paper was the fact that, owing to improved solutions for predicting the risk of bleeding and thus faster and more accurate diagnosis of complications in cirrhotic patients, mortality of cirrhosis patients caused by variceal bleeding fell at the turn of the 21st century. For this reason, additional research in this field is needed. The objective of this paper is to develop a prediction model that determines the most important factors for bleeding in liver cirrhosis, which is useful for diagnosis and future treatment of patients. To achieve this goal, the authors proposed an ensemble data mining methodology, as the most modern in the field of prediction, that integrates in a new way the two most commonly used techniques in prediction: classification preceded by attribute number reduction, and multiple logistic regression for calibration. The method was evaluated in a study that analyzed the occurrence of variceal bleeding in 96 patients from the Clinical Center of Nis, Serbia, using 29 data items ranging from clinical data to color Doppler. The results showed that the proposed method, with such a large number and variety of data, demonstrates better characteristics than each individual technique integrated into it.
3
Rešetar J, Pfeifer D, Mišigoj-Duraković M, Sorić M, Gajdoš Kljusurić J, Šatalić Z. Eveningness in Energy Intake among Adolescents with Implication on Anthropometric Indicators of Nutritional Status: The CRO-PALS Longitudinal Study. Nutrients 2020; 12:nu12061710. [PMID: 32517370 PMCID: PMC7352272 DOI: 10.3390/nu12061710] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/18/2020] [Revised: 06/03/2020] [Accepted: 06/04/2020] [Indexed: 12/19/2022]
Abstract
Shifting of energy intake towards a later time in the day is associated with an increased risk of obesity in adults. However, there is a lack of data for adolescents. The aim of this study was to investigate adolescents' eveningness in energy intake (EV) and its association with anthropometric indicators of nutritional status. This investigation was based on results from the Croatian physical activity in adolescence longitudinal study (CRO-PALS). The cohort included 607 adolescents (50.25% females and 49.75% males) who were assessed at the age of 15/16 and 18/19. A single multi-pass 24-h recall was used as a dietary assessment method, while anthropometric indicators of nutritional status included body mass index (BMI), waist to hip ratio (WHR) and the sum of four skinfolds. The School Health Action, Planning and Evaluation System (SHAPES) questionnaire was used to assess active daily energy expenditure and sedentary behaviors. EV was significantly higher at 18/19 years compared to 15/16 years in the whole population (p < 0.01) and among male adolescents (p < 0.01), but not among female adolescents (p > 0.05). Although a significant correlation between EV and WHR was found in females at the age of 15/16 (p < 0.01), the results of this study suggest that EV has no or a minor effect on anthropometric indicators of nutritional status in adolescence.
Affiliation(s)
- Josip Rešetar
  - Faculty of Food Technology and Biotechnology, University of Zagreb, Pierottijeva 6, 10000 Zagreb, Croatia
- Danijela Pfeifer
  - Faculty of Food Technology and Biotechnology, University of Zagreb, Pierottijeva 6, 10000 Zagreb, Croatia
- Marjeta Mišigoj-Duraković
  - Faculty of Kinesiology, University of Zagreb, Horvaćanski zavoj 15, 10000 Zagreb, Croatia
- Maroje Sorić
  - Faculty of Kinesiology, University of Zagreb, Horvaćanski zavoj 15, 10000 Zagreb, Croatia
  - Faculty of Sport, University of Ljubljana, Gortanova ulica 22, 1000 Ljubljana, Slovenia
- Jasenka Gajdoš Kljusurić
  - Faculty of Food Technology and Biotechnology, University of Zagreb, Pierottijeva 6, 10000 Zagreb, Croatia
  - Correspondence: Tel.: +385-14-605-025
- Zvonimir Šatalić
  - Faculty of Food Technology and Biotechnology, University of Zagreb, Pierottijeva 6, 10000 Zagreb, Croatia
4
Grochowska E, Kinal A, Sobek Z, Siatkowski I, Bednarczyk M. Field study on the factors affecting egg weight loss, early embryonic mortality, hatchability, and chick mortality with the use of classification tree technique. Poult Sci 2019; 98:3626-3636. [DOI: 10.3382/ps/pez180] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 08/14/2018] [Accepted: 03/16/2019] [Indexed: 11/20/2022]
5
Zhang B, Shweikh Y, Khawaja AP, Gallacher J, Bauermeister S, Foster PJ. Associations with Corneal Hysteresis in a Population Cohort: Results from 96 010 UK Biobank Participants. Ophthalmology 2019; 126:1500-1510. [PMID: 31471087 DOI: 10.1016/j.ophtha.2019.06.029] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 03/28/2019] [Revised: 06/07/2019] [Accepted: 06/28/2019] [Indexed: 11/16/2022]
Abstract
PURPOSE To describe the distribution of corneal hysteresis (CH) in a large cohort and explore its associated factors and possible clinical applications. DESIGN Cross-sectional study within the UK Biobank, a large cohort study in the United Kingdom. PARTICIPANTS We analyzed CH data from 93 345 eligible participants in the UK Biobank cohort, aged 40 to 69 years. METHODS All analyses were performed using left eye data. Linear regression models were used to evaluate associations between CH and demographic, lifestyle, ocular, and systemic variables. Piecewise logistic regression models were used to explore the relationship between self-reported glaucoma and CH. MAIN OUTCOME MEASURES Corneal hysteresis (mmHg). RESULTS The mean CH was 10.6 mmHg (10.4 mmHg in male and 10.8 mmHg in female participants). After adjusting for covariables, CH was significantly negatively associated with male sex, age, black ethnicity, self-reported glaucoma, diastolic blood pressure, and height. Corneal hysteresis was significantly positively associated with smoking, hyperopia, diabetes, systemic lupus erythematosus (SLE), greater deprivation (Townsend index), and Goldmann-correlated intraocular pressure (IOPg). Self-reported glaucoma and CH were significantly associated when CH was less than 10.1 mmHg (odds ratio, 0.86; 95% confidence interval, 0.79-0.94 per mmHg CH increase) after adjusting for covariables. When CH exceeded 10.1 mmHg, there was no significant association between CH and self-reported glaucoma. CONCLUSIONS In our analyses, CH was significantly associated with factors including age, sex, and ethnicity, which should be taken into account when interpreting CH values. In our cohort, lower CH was significantly associated with a higher prevalence of self-reported glaucoma when CH was less than 10.1 mmHg. Corneal hysteresis may serve as a biomarker aiding glaucoma case detection.
Affiliation(s)
- Bing Zhang
  - Department of Psychiatry, University of Oxford, Oxford, United Kingdom
- Yusrah Shweikh
  - National Institute for Health Research Biomedical Research Centre, Moorfields Eye Hospital NHS Foundation Trust, London, United Kingdom; UCL Institute of Ophthalmology, London, United Kingdom
- Anthony P Khawaja
  - National Institute for Health Research Biomedical Research Centre, Moorfields Eye Hospital NHS Foundation Trust, London, United Kingdom; UCL Institute of Ophthalmology, London, United Kingdom
- John Gallacher
  - Department of Psychiatry, University of Oxford, Oxford, United Kingdom
- Paul J Foster
  - National Institute for Health Research Biomedical Research Centre, Moorfields Eye Hospital NHS Foundation Trust, London, United Kingdom; UCL Institute of Ophthalmology, London, United Kingdom.
6
Fritz BA, Chen Y, Murray-Torres TM, Gregory S, Ben Abdallah A, Kronzer A, McKinnon SL, Budelier T, Helsten DL, Wildes TS, Sharma A, Avidan MS. Using machine learning techniques to develop forecasting algorithms for postoperative complications: protocol for a retrospective study. BMJ Open 2018; 8:e020124. [PMID: 29643160 PMCID: PMC5898287 DOI: 10.1136/bmjopen-2017-020124] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Indexed: 11/24/2022]
Abstract
INTRODUCTION Mortality and morbidity following surgery are pressing public health concerns in the USA. Traditional prediction models for postoperative adverse outcomes demonstrate good discrimination at the population level, but the ability to forecast an individual patient's trajectory in real time remains poor. We propose to apply machine learning techniques to perioperative time-series data to develop algorithms for predicting adverse perioperative outcomes. METHODS AND ANALYSIS This study will include all adult patients who had surgery at our tertiary care hospital over a 4-year period. Patient history, laboratory values, minute-by-minute intraoperative vital signs and medications administered will be extracted from the electronic medical record. Outcomes will include in-hospital mortality, postoperative acute kidney injury and postoperative respiratory failure. Forecasting algorithms for each of these outcomes will be constructed using density-based logistic regression after employing a Nadaraya-Watson kernel density estimator. Time-series variables will be analysed using first and second-order feature extraction, shapelet methods and convolutional neural networks. The algorithms will be validated through measurement of precision and recall. ETHICS AND DISSEMINATION This study has been approved by the Human Research Protection Office at Washington University in St Louis. The successful development of these forecasting algorithms will allow perioperative healthcare clinicians to predict more accurately an individual patient's risk for specific adverse perioperative outcomes in real time. Knowledge of a patient's dynamic risk profile may allow clinicians to make targeted changes in the care plan that will alter the patient's outcome trajectory. This hypothesis will be tested in a future randomised controlled trial.
Affiliation(s)
- Bradley A Fritz
  - Department of Anesthesiology, Washington University in St Louis, St Louis, Missouri, USA
- Yixin Chen
  - Department of Computer Science and Engineering, Washington University in St Louis, St Louis, Missouri, USA
- Teresa M Murray-Torres
  - Department of Anesthesiology, Washington University in St Louis, St Louis, Missouri, USA
- Stephen Gregory
  - Department of Anesthesiology, Washington University in St Louis, St Louis, Missouri, USA
- Arbi Ben Abdallah
  - Department of Anesthesiology, Washington University in St Louis, St Louis, Missouri, USA
- Alex Kronzer
  - Department of Anesthesiology, Washington University in St Louis, St Louis, Missouri, USA
- Sherry Lynn McKinnon
  - Department of Anesthesiology, Washington University in St Louis, St Louis, Missouri, USA
- Thaddeus Budelier
  - Department of Anesthesiology, Washington University in St Louis, St Louis, Missouri, USA
- Daniel L Helsten
  - Department of Anesthesiology, Washington University in St Louis, St Louis, Missouri, USA
- Troy S Wildes
  - Department of Anesthesiology, Washington University in St Louis, St Louis, Missouri, USA
- Anshuman Sharma
  - Department of Anesthesiology, Washington University in St Louis, St Louis, Missouri, USA
- Michael Simon Avidan
  - Department of Anesthesiology, Washington University in St Louis, St Louis, Missouri, USA
7
Arevalillo JM, Sztein MB, Kotloff KL, Levine MM, Simon JK. Identification of immune correlates of protection in Shigella infection by application of machine learning. J Biomed Inform 2017; 74:1-9. [PMID: 28802838 DOI: 10.1016/j.jbi.2017.08.005] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 02/21/2017] [Revised: 07/10/2017] [Accepted: 08/08/2017] [Indexed: 11/25/2022]
Abstract
BACKGROUND Immunologic correlates of protection are important in vaccine development because they give insight into mechanisms of protection, assist in the identification of promising vaccine candidates, and serve as endpoints in bridging clinical vaccine studies. Our goal is the development of a methodology to identify immunologic correlates of protection using the Shigella challenge as a model. METHODS The proposed methodology utilizes the Random Forests (RF) machine learning algorithm as well as Classification and Regression Trees (CART) to detect immune markers that predict protection, identify interactions between variables, and define optimal cutoffs. Logistic regression modeling is applied to estimate the probability of protection and the confidence interval (CI) for such a probability is computed by bootstrapping the logistic regression models. RESULTS The results demonstrate that the combination of Classification and Regression Trees and Random Forests complements the standard logistic regression and uncovers subtle immune interactions. Specific levels of immunoglobulin IgG antibody in blood on the day of challenge predicted protection in 75% (95% CI 67-86). Of those subjects that did not have blood IgG at or above a defined threshold, 100% were protected if they had IgA antibody secreting cells above a defined threshold. Comparison with the results obtained by applying only logistic regression modeling with standard Akaike Information Criterion for model selection shows the usefulness of the proposed method. CONCLUSION Given the complexity of the immune system, the use of machine learning methods may enhance traditional statistical approaches. When applied together, they offer a novel way to quantify important immune correlates of protection that may help the development of vaccines.
Affiliation(s)
- Jorge M Arevalillo
  - Department of Statistics and Operational Research, University Nacional Educación a Distancia, Paseo Senda del Rey 9, 28040 Madrid, Spain.
- Marcelo B Sztein
  - Center for Vaccine Development, Departments of Pediatrics and Medicine, University of Maryland School of Medicine, Baltimore, MD 21201, USA.
- Karen L Kotloff
  - Center for Vaccine Development, Departments of Pediatrics and Medicine, University of Maryland School of Medicine, Baltimore, MD 21201, USA.
- Myron M Levine
  - Center for Vaccine Development, Departments of Pediatrics and Medicine, University of Maryland School of Medicine, Baltimore, MD 21201, USA.
8
Liu R, Yue Y, Jiang H, Lu J, Wu A, Geng D, Wang J, Lu J, Li S, Tang H, Lu X, Zhang K, Liu T, Yuan Y, Wang Q. A risk prediction model for post-stroke depression in Chinese stroke survivors based on clinical and socio-psychological features. Oncotarget 2017; 8:62891-62899. [PMID: 28968957 PMCID: PMC5609889 DOI: 10.18632/oncotarget.16907] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 11/15/2016] [Accepted: 03/14/2017] [Indexed: 01/22/2023]
Abstract
Background Post-stroke depression (PSD) is a frequent complication that worsens rehabilitation outcomes and patient quality of life. This study developed a risk prediction model for PSD based on patient clinical and socio-psychological features for the early detection of high-risk PSD patients. Results Risk predictors included a history of brain cerebral infarction (odds ratio [OR], 3.84; 95% confidence interval [CI], 2.22-6.70; P < 0.0001) and four socio-psychological factors including Eysenck Personality Questionnaire with Neuroticism/Stability (OR, 1.18; 95% CI, 1.12-1.20; P < 0.0001), life event scale (OR, 0.99; 95% CI, 0.98-0.99; P = 0.0007), 20-item Toronto Alexithymia Scale (OR, 1.06; 95% CI, 1.02-1.10; P = 0.002) and Social Support Rating Scale (OR, 0.91; 95% CI, 0.87-0.90; P < 0.001) in the logistic model. In addition, 11 rules were generated in the tree model. The area under the ROC curve and the accuracy for the tree model were 0.85 and 0.86, respectively. Methods This study recruited 562 stroke patients in China who were assessed for demographic data, medical history, vascular risk factors, functional status post-stroke, and socio-psychological factors. Multivariate backward logistic regression was used to extract risk factors for depression at 1 month after stroke. We converted the logistic model to a visible tree model using the decision tree method. Receiver operating characteristic (ROC) analysis was used to evaluate the performance of the model. Conclusion This study provided an effective risk model for PSD and indicated that the socio-psychological factors were important risk factors of PSD.
Affiliation(s)
- Rui Liu
  - School of Information Science and Engineering, Southeast University, Nanjing, China
- Yingying Yue
  - Department of Psychosomatics and Psychiatry, Zhongda Hospital, School of Medicine, Southeast University, Nanjing, China
- Haitang Jiang
  - Department of Psychosomatics and Psychiatry, Zhongda Hospital, School of Medicine, Southeast University, Nanjing, China
- Jian Lu
  - School of Information Science and Engineering, Southeast University, Nanjing, China
- Aiqin Wu
  - Department of Psychosomatics, The Affiliated First Hospital of Suzhou University, Suzhou, China
- Deqin Geng
  - Department of Neurology, Affiliated Hospital of Xuzhou Medical College, Xuzhou, China
- Jun Wang
  - Department of Neurology, Nanjing First Hospital, Nanjing, China
- Jianxin Lu
  - Department of Neurology, Gaochun People's Hospital, Nanjing, China
- Shenghua Li
  - Department of Neurology, Jiangning Nanjing Hospital, Nanjing, China
- Hua Tang
  - Department of Psychiatry, Huai'an No.3 People's Hospital, Huai'an, China
- Xuesong Lu
  - Department of Rehabilitation, Affiliated Zhongda Hospital of Southeast University, Nanjing, China
- Kezhong Zhang
  - Department of Neurology, The First Affiliated Hospital of Nanjing Medical University, Nanjing, China
- Tian Liu
  - The Key Laboratory of Biomedical Information Engineering of Ministry of Education, Institute of Biomedical Engineering, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
- Yonggui Yuan
  - Department of Psychosomatics and Psychiatry, Zhongda Hospital, School of Medicine, Southeast University, Nanjing, China
- Qiao Wang
  - School of Information Science and Engineering, Southeast University, Nanjing, China
9
Pal C, Okabe T, Kulothungan V, Sangolla N, Manoharan J, Stewart W, Combest J. Factors Influencing Specificity and Sensitivity of Injury Severity Prediction (ISP) Algorithm for AACN. ACTA ACUST UNITED AC 2016. [DOI: 10.20485/jsaeijae.7.1_15] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/03/2022]
10
Peek N, Combi C, Marin R, Bellazzi R. Thirty years of artificial intelligence in medicine (AIME) conferences: A review of research themes. Artif Intell Med 2015; 65:61-73. [DOI: 10.1016/j.artmed.2015.07.003] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Received: 01/28/2015] [Revised: 07/17/2015] [Accepted: 07/17/2015] [Indexed: 10/23/2022]
11
Ishfaq R, Raja U. Bridging the Healthcare Access Divide: A Strategic Planning Model for Rural Telemedicine Network. DECISION SCIENCES 2015. [DOI: 10.1111/deci.12165] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 01/18/2023]
Affiliation(s)
- Rafay Ishfaq
  - Department of Aviation and Supply Chain Management; Harbert College of Business, Auburn University; Auburn AL 36849 U.S.A
- Uzma Raja
  - Department of Information Systems; Statistics and Management Science, Culverhouse College of Commerce, The University of Alabama; Tuscaloosa AL 35487 U.S.A
12
Grochowska E, Piwczyński D, Portolano B, Mroczkowski S. Analysis of the influence of the PrP genotype on the litter size in Polish sheep using classification trees and logistic regression. Livest Sci 2014. [DOI: 10.1016/j.livsci.2013.11.008] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 10/26/2022]
13
Leslie WD, Lix LM. Comparison between various fracture risk assessment tools. Osteoporos Int 2014; 25:1-21. [PMID: 23797847 DOI: 10.1007/s00198-013-2409-3] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.1] [Received: 04/24/2013] [Accepted: 05/24/2013] [Indexed: 11/28/2022]
Abstract
The suboptimal performance of bone mineral density as the sole predictor of fracture risk and treatment decision making has led to the development of risk prediction algorithms that estimate fracture probability using multiple risk factors for fracture, such as demographic and physical characteristics, personal and family history, other health conditions, and medication use. We review theoretical aspects for developing and validating risk assessment tools, and illustrate how these principles apply to the best studied fracture probability tools: the World Health Organization FRAX®, the Garvan Fracture Risk Calculator, and the QResearch Database's QFractureScores. Model development should follow a systematic and rigorous methodology around variable selection, model fit evaluation, performance evaluation, and internal and external validation. Consideration must always be given to how risk prediction tools are integrated into clinical practice guidelines to support better clinical decision making and improved patient outcomes. Accurate fracture risk assessment can guide clinicians and individuals in understanding the risk of having an osteoporosis-related fracture and inform their decision making to mitigate these risks.
14
Jarvis SW, Kovacs C, Badriyah T, Briggs J, Mohammed MA, Meredith P, Schmidt PE, Featherstone PI, Prytherch DR, Smith GB. Development and validation of a decision tree early warning score based on routine laboratory test results for the discrimination of hospital mortality in emergency medical admissions. Resuscitation 2013; 84:1494-9. [DOI: 10.1016/j.resuscitation.2013.05.018] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 03/18/2013] [Revised: 05/11/2013] [Accepted: 05/24/2013] [Indexed: 11/24/2022]
15
Azimi M, Kamrani A, Smadi H. Statistics-Based Prediction Analysis for Head and Neck Cancer Tumor Deformation. JOURNAL OF HEALTHCARE ENGINEERING 2012. [DOI: 10.1260/2040-2295.3.4.571] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/17/2023]
16
Piwczyński D, Sitkowska B, Wiśniewska E. Application of classification trees and logistic regression to determine factors responsible for lamb mortality. Small Rumin Res 2012. [DOI: 10.1016/j.smallrumres.2011.09.014] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 11/16/2022]
17
Venermo M, Biancari F, Arvela E, Korhonen M, Söderström M, Halmesmäki K, Albäck A, Lepäntalo M. The role of chronic kidney disease as a predictor of outcome after revascularisation of the ulcerated diabetic foot. Diabetologia 2011; 54:2971-7. [PMID: 21845468 DOI: 10.1007/s00125-011-2279-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 03/15/2011] [Accepted: 07/27/2011] [Indexed: 12/29/2022]
Abstract
AIMS/HYPOTHESIS The aim of the study was to stratify the risk of diabetic patients with leg ulcer or gangrene undergoing infrainguinal revascularisation for critical limb ischaemia. METHODS The study cohort included 732 revascularisation procedures performed in 597 diabetic patients with ulcer or gangrene. Logistic regression and CART analysis were used for identification of predictors of 1-year outcome. RESULTS Logistic regression showed that chronic kidney disease (CKD) class (OR 1.38, 95% CI 1.16, 1.65) was an independent predictor of 1-year leg salvage (area under the receiver operating characteristic [ROC] curve 0.60, 95% CI 0.54, 0.65). The terminal nodes of the CART for 1-year leg salvage were CKD classes 4-5, the level (infrapopliteal vs femoropopliteal revascularisation), type of revascularisation (bypass surgery vs percutaneous transluminal angioplasty) and gangrene (area under the ROC curve 0.62, 95% CI 0.57, 0.68). Logistic regression showed that pulmonary disease (OR 1.76, 95% CI 1.11, 2.78), CKD class (OR 1.43, 95% CI 1.24, 1.65), foot gangrene (OR 1.76, 95% CI 1.21, 2.60) and patient age (OR 1.02, 95% CI 1.01, 1.04) were independent predictors of 1-year amputation-free survival (area under the ROC curve 0.65, 95% CI 0.60, 0.69). The terminal nodes of the CART for 1-year amputation-free survival were CKD classes 3-5, patient's age of ≥ 75 years and foot gangrene (area under the ROC curve 0.64, 95% CI 0.60, 0.68). CONCLUSIONS/INTERPRETATION CKD is a formidable risk factor for poor intermediate outcome after infrainguinal revascularisation in diabetic patients with foot ulcer or gangrene. CART analysis indicates that foot gangrene is also a significant risk factor for adverse outcome.
Affiliation(s)
- M Venermo
  - Department of Vascular Surgery, Helsinki University Central Hospital, PO Box 340, 00029 HUS Helsinki, Finland.
18
Biancari F, Myllylä M, Porela P, Laitio T, Kuttila K, Satta J, Lepojärvi M, Juvonen T, Airaksinen JKE. Postoperative stroke in patients on oral anticoagulation undergoing coronary artery bypass surgery. SCAND CARDIOVASC J 2011; 45:360-8. [PMID: 21615240 DOI: 10.3109/14017431.2011.585403] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 12/29/2022]
Abstract
OBJECTIVE Patients on long-term warfarin treatment have an inherent high risk of stroke and here we aimed to identify the determinants of postoperative stroke after coronary artery bypass grafting (CABG) in these patients. METHODS A consecutive series of 270 patients on long-term warfarin treatment who underwent isolated CABG in two university hospitals was assessed by logistic regression as well as classification and regression tree (CART) analysis. RESULTS Postoperative stroke occurred in 10 patients during in-hospital stay (3.7%). Logistic regression showed that CHADS(2) > 2 (p = 0.036), recent thrombolysis (p < 0.0001) and history of deep vein thrombosis (p = 0.025) were independent predictors of postoperative stroke (area under the ROC curve 0.77). CART analysis showed that CHADS(2) > 2, history of stroke/TIA, no preoperative use of aspirin and preoperative use of low molecular weight heparins were associated with an increased risk of stroke (area under the ROC curve of 0.77). CONCLUSIONS Both CART and logistic regression analyses showed that the patient characteristics included in CHADS(2) score are important also in the prediction of postoperative stroke risk. Preoperative antiplatelet treatment may be beneficial in the high risk patients and the preoperative bridging with low molecular weight heparins may even be harmful in this respect.
Affiliation(s)
- Fausto Biancari
  - Department of Surgery, Oulu University Hospital, Oulu, Finland.
19
Wulsin DF, Gupta JR, Mani R, Blanco JA, Litt B. Modeling electroencephalography waveforms with semi-supervised deep belief nets: fast classification and anomaly measurement. J Neural Eng 2011; 8:036015. [PMID: 21525569 DOI: 10.1088/1741-2560/8/3/036015] [Citation(s) in RCA: 126] [Impact Index Per Article: 9.7] [Indexed: 11/12/2022]
Abstract
Clinical electroencephalography (EEG) records vast amounts of human complex data yet is still reviewed primarily by human readers. Deep belief nets (DBNs) are a relatively new type of multi-layer neural network commonly tested on two-dimensional image data but are rarely applied to time-series data such as EEG. We apply DBNs in a semi-supervised paradigm to model EEG waveforms for classification and anomaly detection. DBN performance was comparable to standard classifiers on our EEG dataset, and classification time was found to be 1.7-103.7 times faster than the other high-performing classifiers. We demonstrate how the unsupervised step of DBN learning produces an autoencoder that can naturally be used in anomaly measurement. We compare the use of raw, unprocessed data (a rarity in automated physiological waveform analysis) with hand-chosen features and find that raw data produce comparable classification and better anomaly measurement performance. These results indicate that DBNs and raw data inputs may be more effective for online automated EEG waveform recognition than other common techniques.
Affiliation(s)
- D F Wulsin
- Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, USA.
|
20
|
Gustafsson MG, Wallman M, Wickenberg Bolin U, Göransson H, Fryknäs M, Andersson CR, Isaksson A. Improving Bayesian credibility intervals for classifier error rates using maximum entropy empirical priors. Artif Intell Med 2010; 49:93-104. [DOI: 10.1016/j.artmed.2010.02.004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 06/18/2009] [Revised: 12/07/2009] [Accepted: 02/16/2010] [Indexed: 10/19/2022]
|
21
|
Trujillano J, Badia M, Serviá L, March J, Rodriguez-Pozo A. Stratification of the severity of critically ill patients with classification trees. BMC Med Res Methodol 2009; 9:83. [PMID: 20003229 PMCID: PMC2797013 DOI: 10.1186/1471-2288-9-83] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Received: 02/22/2009] [Accepted: 12/09/2009] [Indexed: 11/27/2022] Open
Abstract
Background Development of three classification trees (CT) based on the CART (Classification and Regression Trees), CHAID (Chi-Square Automatic Interaction Detection) and C4.5 methodologies for the calculation of probability of hospital mortality; the comparison of the results with the APACHE II, SAPS II and MPM II-24 scores, and with a model based on multiple logistic regression (LR). Methods Retrospective study of 2864 patients. Random partition (70:30) into a Development Set (DS) n = 1808 and Validation Set (VS) n = 808. Their properties of discrimination are compared with the ROC curve (AUC CI 95%), Percent of correct classification (PCC CI 95%); and the calibration with the Calibration Curve and the Standardized Mortality Ratio (SMR CI 95%). Results CTs are produced with a different selection of variables and decision rules: CART (5 variables and 8 decision rules), CHAID (7 variables and 15 rules) and C4.5 (6 variables and 10 rules). The common variables were: inotropic therapy, Glasgow, age, (A-a)O2 gradient and antecedent of chronic illness. In VS: all the models achieved acceptable discrimination with AUC above 0.7. CT: CART (0.75(0.71-0.81)), CHAID (0.76(0.72-0.79)) and C4.5 (0.76(0.73-0.80)). PCC: CART (72(69-75)), CHAID (72(69-75)) and C4.5 (76(73-79)). Calibration (SMR) better in the CT: CART (1.04(0.95-1.31)), CHAID (1.06(0.97-1.15)) and C4.5 (1.08(0.98-1.16)). Conclusion With different methodologies of CTs, trees are generated with different selection of variables and decision rules. The CTs are easy to interpret, and they stratify the risk of hospital mortality. The CTs should be taken into account for the classification of the prognosis of critically ill patients.
Affiliation(s)
- Javier Trujillano
- Intensive Care Unit, Hospital Universitario Arnau de Vilanova, IRBLLEIDA, Lleida (25198), Spain.
|
22
|
Meyfroidt G, Güiza F, Ramon J, Bruynooghe M. Machine learning techniques to examine large patient databases. Best Pract Res Clin Anaesthesiol 2009; 23:127-43. [PMID: 19449621 DOI: 10.1016/j.bpa.2008.09.003] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.2] [Indexed: 10/21/2022]
Abstract
Computerization in healthcare in general, and in the operating room (OR) and intensive care unit (ICU) in particular, is on the rise. This leads to large patient databases, with specific properties. Machine learning techniques are able to examine and to extract knowledge from large databases in an automatic way. Although the number of potential applications for these techniques in medicine is large, few medical doctors are familiar with their methodology, advantages and pitfalls. A general overview of machine learning techniques, with a more detailed discussion of some of these algorithms, is presented in this review.
Affiliation(s)
- Geert Meyfroidt
- Department of Intensive Care Medicine, UZ Leuven--Campus Gasthuisberg, Catholic University of Leuven, Herestraat 49, 3000 Leuven, Belgium.
|
23
|
Huang ML, Chen HY. Glaucoma Classification Model Based on GDx VCC Measured Parameters by Decision Tree. J Med Syst 2009; 34:1141-7. [DOI: 10.1007/s10916-009-9333-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 03/26/2009] [Accepted: 06/11/2009] [Indexed: 11/28/2022]
|
24
|
de Toledo P, Rios PM, Ledezma A, Sanchis A, Alen JF, Lagares A. Predicting the outcome of patients with subarachnoid hemorrhage using machine learning techniques. IEEE Trans Inf Technol Biomed 2009; 13:794-801. [PMID: 19369161 DOI: 10.1109/titb.2009.2020434] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.2] [Indexed: 11/10/2022]
Abstract
BACKGROUND Outcome prediction for subarachnoid hemorrhage (SAH) helps guide care and compare global management strategies. Logistic regression models for outcome prediction may be cumbersome to apply in clinical practice. OBJECTIVE To use machine learning techniques to build a model of outcome prediction that makes the knowledge discovered from the data explicit and communicable to domain experts. MATERIAL AND METHODS A derivation cohort (n = 441) of nonselected SAH cases was analyzed using different classification algorithms to generate decision trees and decision rules. Algorithms used were C4.5, fast decision tree learner, partial decision trees, repeated incremental pruning to produce error reduction, nearest neighbor with generalization, and ripple down rule learner. Outcome was dichotomized into favorable [Glasgow outcome scale (GOS) = I-II] and poor (GOS = III-V). An independent cohort (n = 193) was used for validation. An exploratory questionnaire was given to potential users (specialist doctors) to gather their opinion on the classifier and its usability in clinical routine. RESULTS The best classifier was obtained with the C4.5 algorithm. It uses only two attributes [World Federation of Neurological Surgeons (WFNS) and Fisher's scale] and leads to a simple decision tree. The accuracy of the classifier [area under the ROC curve (AUC) = 0.84; confidence interval (CI) = 0.80-0.88] is similar to that obtained by a logistic regression model (AUC = 0.86; CI = 0.83-0.89) derived from the same data and is considered better fit for clinical use.
Affiliation(s)
- Paula de Toledo
- Control, Learning, and Systems Optimization Group, Universidad Carlos III de Madrid, Madrid 28040, Spain.
|
25
|
Green ST, Small MJ, Casman EA. Determinants of national diarrheal disease burden. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2009; 43:993-999. [PMID: 19320148 DOI: 10.1021/es8023226] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 05/27/2023]
Abstract
Diarrheal illness is a leading cause of child mortality in developing nations. Previous longitudinal studies have attempted to identify the factors that contribute to child mortality, but few have examined the determinants of diarrheal illness at a country level. Here we demonstrate the use of Classification and Regression Trees (CART) to predict diarrheal illness from a 192-country data set of country-level attributes and compare the performance of CART with a linear regression model. The CART model identifies improvements in rural sanitation as the most important spending priority for reducing diarrheal illness. We estimate that reducing unmet rural sanitation need worldwide by 65% would save the equivalent of 1.2 million lives annually.
Affiliation(s)
- Sean T Green
- Engineering and Public Policy, Carnegie Mellon University, Baker Hall 129, Pittsburgh, PA 15213, USA.
|
26
|
|
27
|
Toma T, Abu-Hanna A, Bosman RJ. Discovery and integration of univariate patterns from daily individual organ-failure scores for intensive care mortality prediction. Artif Intell Med 2008; 43:47-60. [PMID: 18394871 DOI: 10.1016/j.artmed.2008.01.002] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 06/19/2007] [Revised: 01/10/2008] [Accepted: 01/20/2008] [Indexed: 11/27/2022]
Abstract
OBJECTIVES The current established mortality predictive models in the intensive care rely only on patient information gathered within the first 24 hours of admission. Recent research demonstrated the added prognostic value residing in the sequential organ-failure assessment (SOFA) score which quantifies on each day the cumulative patient organ derangement. The objective of this paper is to develop and study predictive models that also incorporate univariate patterns of the six individual organ systems underlying the SOFA score. A model for a given day d predicts the probability of in-hospital mortality. MATERIALS AND METHODS We use the logistic framework to combine a summary statistic of the historic SOFA information for a patient together with selected dummy variables indicating the occurrence of univariate frequent temporal patterns of individual organ system functioning. We demonstrate the application of our method to a large real-life data set from an intensive care unit (ICU) in a teaching hospital. Model performance is tested in terms of the AUC and the Brier score. RESULTS An algorithm for categorization, discovery, and selection of univariate patterns of individual organ scores and the induction of predictive models. The case-study resulted in six daily models corresponding to days 2-7. Their AUC ranged between 0.715 and 0.794 and the Brier scores between 0.161 and 0.216. Models using only admission data but recalibrated for days 2-7 generated AUC ranging between 0.643 and 0.761 and Brier scores ranged between 0.175 and 0.230. CONCLUSIONS The results show that temporal organ-failure episodes improve predictions' quality in terms of both discrimination and calibration. In addition, they enhance the interpretability of models. Our approach should be applicable to many other medical domains where severity scores and sub-scores are collected.
Affiliation(s)
- Tudor Toma
- Academic Medical Center, Universiteit van Amsterdam, Department of Medical Informatics, P.O. Box 22700, 1100 DE Amsterdam, The Netherlands.
|
28
|
Toma T, Abu-Hanna A, Bosman RJ. Discovery and inclusion of SOFA score episodes in mortality prediction. J Biomed Inform 2007; 40:649-60. [PMID: 17485242 DOI: 10.1016/j.jbi.2007.03.007] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.2] [Received: 09/19/2006] [Revised: 02/19/2007] [Accepted: 03/09/2007] [Indexed: 01/31/2023]
Abstract
Predicting the survival status of Intensive Care patients at the end of their hospital stay is useful for various clinical and organizational tasks. Current models for predicting mortality use logistic regression models that rely solely on data collected during the first 24 h of patient admission. These models do not exploit information contained in daily organ failure scores which nowadays are being routinely collected in many Intensive Care Units. We propose a novel method for mortality prediction that, in addition to admission-related data, takes advantage of daily data as well. The method is characterized by the data-driven discovery of temporal patterns, called episodes, of the organ failure scores and by embedding them in the familiar logistic regression framework for prediction. Our method results in a set of D logistic regression models, one for each of the first D days of Intensive Care Unit stay. A model for day d ≤ D is trained on the patient subpopulation that stayed at least d days in the Intensive Care Unit and predicts the probability of death at the end of hospital stay for such patients. We implemented our method, with a specific form of episodes, called aligned episodes, on a large dataset of Intensive Care Unit patients for the first 5 days of stay (D = 5) in the unit. We compared our models with ones that were developed on the same patient subpopulations but which did not use the episodes. The new models show improved performance on each of the five days. They also provide insight into the effect of the various selected episodes on mortality.
Affiliation(s)
- Tudor Toma
- Department of Medical Informatics, Academic Medical Center, Universiteit van Amsterdam, P.O. Box 22700, 1100 DE Amsterdam, The Netherlands.
|
29
|
Discovery and Integration of Organ-Failure Episodes in Mortality Prediction. Artif Intell Med 2007. [DOI: 10.1007/978-3-540-73599-1_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
|
30
|
Akdag B, Fenkci S, Degirmencioglu S, Rota S, Sermez Y, Camdeviren H. Determination of risk factors for hypertension through the classification tree method. Adv Ther 2006; 23:885-92. [PMID: 17276957 DOI: 10.1007/bf02850210] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 10/22/2022]
Abstract
Most current statistical strategies for determining risk factors for hypertension (HT) among certain populations have proved inconclusive. In this study, the classification tree method, which is more practical and easy to understand than other statistical methods, was used to determine the risk for HT among outpatients in a clinic in Denizli province, western Turkey, between January 2002 and July 2004. The effects of 14 risk factors (body mass index, waist-to-hip ratio, age, serum total cholesterol, serum triglycerides, sex, HT in first-degree relatives, diabetes mellitus, smoking, stress factors, alcohol consumption, dyslipidemia in first-degree relatives, dyslipidemia [previously diagnosed], and saturated fat consumption) on HT were evaluated in this population. In all, 1761 adults at the outpatient clinic were recruited for lipid and HT measurements. The classification tree method revealed 7 main risk factors (body mass index, waist-to-hip ratio, sex, serum triglycerides, serum total cholesterol, HT in first-degree relatives, and saturated fat consumption) for HT. The findings of the present study suggest that the classification tree is a valuable statistical method for evaluating multiple risk factors for HT.
Affiliation(s)
- Beyza Akdag
- Department of Biostatistics, Pamukkale University, Denizli, Turkey
|
31
|
Magazzù D, Comelli M, Marinoni A. Are car drivers holding a motorcycle licence less responsible for motorcycle-car crash occurrence? A non-parametric approach. ACCIDENT; ANALYSIS AND PREVENTION 2006; 38:365-70. [PMID: 16368068 DOI: 10.1016/j.aap.2005.10.007] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 03/31/2005] [Revised: 09/22/2005] [Accepted: 10/18/2005] [Indexed: 05/05/2023]
Abstract
The purpose of this work is to evaluate the effect of a specific motorcycle licence, held by car drivers, on responsibility for motorcycle-car crashes. The data were provided by a multicentric case-control study (MAIDS) regarding the risk of crash and serious injuries of motorcyclists. A non-parametric method, classification and regression tree (CART), was used to accomplish the objective, and then compared to standard unconditional logistic regression. Drivers owning a motorcycle licence turned out to be less responsible for motorcycle-car crashes than drivers who do not have one; both types of analysis are consistent with this result. It is reasonable to assume that car drivers who hold a motorcycle licence have acquired more ability in riding and controlling two-wheeled vehicles than drivers without a licence, and this may help them in predicting motorcycle manoeuvres.
Affiliation(s)
- Domenico Magazzù
- Department of Applied Health Sciences, University of Pavia, 21 Bassi Avenue, 27100 Pavia, Italy.
|
32
|
Delen D, Walker G, Kadam A. Predicting breast cancer survivability: a comparison of three data mining methods. Artif Intell Med 2005; 34:113-27. [PMID: 15894176 DOI: 10.1016/j.artmed.2004.07.002] [Citation(s) in RCA: 314] [Impact Index Per Article: 16.5] [Received: 01/13/2004] [Revised: 06/30/2004] [Accepted: 07/15/2004] [Indexed: 12/12/2022]
Abstract
OBJECTIVE The prediction of breast cancer survivability has been a challenging research problem for many researchers. Since the early dates of the related research, much advancement has been recorded in several related fields. For instance, thanks to innovative biomedical technologies, better explanatory prognostic factors are being measured and recorded; thanks to low cost computer hardware and software technologies, high volume better quality data is being collected and stored automatically; and finally thanks to better analytical methods, those voluminous data are being processed effectively and efficiently. Therefore, the main objective of this manuscript is to report on a research project where we took advantage of those available technological advancements to develop prediction models for breast cancer survivability. METHODS AND MATERIAL We used two popular data mining algorithms (artificial neural networks and decision trees) along with a most commonly used statistical method (logistic regression) to develop the prediction models using a large dataset (more than 200,000 cases). We also used 10-fold cross-validation methods to measure the unbiased estimate of the three prediction models for performance comparison purposes. RESULTS The results indicated that the decision tree (C5) is the best predictor with 93.6% accuracy on the holdout sample (this prediction accuracy is better than any reported in the literature), artificial neural networks came out to be the second with 91.2% accuracy and the logistic regression models came out to be the worst of the three with 89.2% accuracy. CONCLUSION The comparative study of multiple prediction models for breast cancer survivability using a large dataset along with a 10-fold cross-validation provided us with an insight into the relative prediction ability of different data mining methods. Using sensitivity analysis on neural network models provided us with the prioritized importance of the prognostic factors used in the study.
Affiliation(s)
- Dursun Delen
- Department of Management Science and Information Systems, Oklahoma State University, 700 North Greenwood Avenue, Tulsa, OK 74106, USA.
|
33
|
Neumann A, Holstein J, Le Gall JR, Lepage E. Measuring performance in health care: case-mix adjustment by boosted decision trees. Artif Intell Med 2004; 32:97-113. [PMID: 15364094 DOI: 10.1016/j.artmed.2004.06.001] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Received: 10/08/2003] [Revised: 06/11/2004] [Accepted: 06/16/2004] [Indexed: 11/26/2022]
Abstract
OBJECTIVE The purpose of this paper is to investigate the suitability of boosted decision trees for the case-mix adjustment involved in comparing the performance of various health care entities. METHODS First, we present logistic regression, decision trees, and boosted decision trees in a unified framework. Second, we study in detail their application for two common performance indicators, the mortality rate in intensive care and the rate of potentially avoidable hospital readmissions. RESULTS For both examples the technique of boosting decision trees outperformed standard prognostic models, in particular linear logistic regression models, with regard to predictive power. On the other hand, boosting decision trees was computationally demanding and the resulting models were rather complex and needed additional tools for interpretation. CONCLUSION Boosting decision trees represents a powerful tool for case-mix adjustment in health care performance measurement. Depending on the specific priorities set in each context, the gain in predictive power might compensate for the inconvenience in the use of boosted decision trees.
Affiliation(s)
- Anke Neumann
- Assistance Publique--Hôpitaux de Paris, Direction de la Politique Médicale, 3 Av Victoria, F-75184 Paris Cedex 04, France.
|