Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Ramezankhani A, Pournik O, Shahrabi J, Khalili D, Azizi F, Hadaegh F. Applying decision tree for identification of a low risk population for type 2 diabetes. Tehran Lipid and Glucose Study. Diabetes Res Clin Pract 2014;105:391-8. [PMID: 25085758 DOI: 10.1016/j.diabres.2014.07.003] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 06/30/2013] [Revised: 04/15/2014] [Accepted: 07/05/2014] [Indexed: 01/06/2023]


Cited by Other Article(s)

Wang S, Bao C, Pei D. Application of Data Mining Technology in the Screening for Gallbladder Stones: A Cross-Sectional Retrospective Study of Chinese Adults. Yonsei Med J 2024;65:210-216. [PMID: 38515358 PMCID: PMC10973557 DOI: 10.3349/ymj.2023.0246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Revised: 08/21/2023] [Accepted: 11/07/2023] [Indexed: 03/23/2024] Open

Poudineh M, Mansoori A, Sadooghi Rad E, Hosseini ZS, Salmani Izadi F, Hoseinpour M, Mahmoudi Zo M, Ghoflchi S, Tanbakuchi D, Nazar E, Ferns G, Effati S, Esmaily H, Ghayour-Mobarhan M. Platelet distribution widths and white blood cell are associated with cardiovascular diseases: data mining approaches. Acta Cardiol 2023;78:1033-1044. [PMID: 37694924 DOI: 10.1080/00015385.2023.2246199] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/28/2022] [Revised: 06/12/2023] [Accepted: 08/03/2023] [Indexed: 09/12/2023]

Affiliation(s)

Mohadeseh Poudineh: Student Research Committee, School of Medicine, Zanjan University of Medical Sciences, Zanjan, Iran
Amin Mansoori: International UNESCO center for Health-Related Basic Sciences and Human Nutrition, Mashhad University of Medical Sciences, Mashhad, Iran; Department of Biostatistics, School of Health, Mashhad University of Medical Sciences, Mashhad, Iran
Elias Sadooghi Rad: Student Research Committee, School of Medicine, Birjand University of Medical Sciences, Birjand, Iran
Zeinab Sadat Hosseini: Faculty of Medicine, Islamic Azad University of Mashhad, Mashhad, Iran
Faezeh Salmani Izadi: Student Research Committee, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
Mahdieh Hoseinpour: Student Research Committee, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
Mostafa Mahmoudi Zo: Student Research Committee, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
Sahar Ghoflchi: International UNESCO center for Health-Related Basic Sciences and Human Nutrition, Mashhad University of Medical Sciences, Mashhad, Iran
Davoud Tanbakuchi: Department of Biostatistics, School of Health, Mashhad University of Medical Sciences, Mashhad, Iran
Eisa Nazar: International UNESCO center for Health-Related Basic Sciences and Human Nutrition, Mashhad University of Medical Sciences, Mashhad, Iran; Department of Biostatistics, School of Health, Mashhad University of Medical Sciences, Mashhad, Iran; Department of Applied Mathematics, Ferdowsi University of Mashhad, Mashhad, Iran
Gordon Ferns: Brighton and Sussex Medical School, Division of Medical Education, Brighton, United Kingdom
Sohrab Effati: Department of Applied Mathematics, Ferdowsi University of Mashhad, Mashhad, Iran
Habibollah Esmaily: Department of Biostatistics, School of Health, Mashhad University of Medical Sciences, Mashhad, Iran; Social Determinants of Health Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
Majid Ghayour-Mobarhan: International UNESCO center for Health-Related Basic Sciences and Human Nutrition, Mashhad University of Medical Sciences, Mashhad, Iran


Mohsen F, Al-Absi HRH, Yousri NA, El Hajj N, Shah Z. A scoping review of artificial intelligence-based methods for diabetes risk prediction. NPJ Digit Med 2023;6:197. [PMID: 37880301 PMCID: PMC10600138 DOI: 10.1038/s41746-023-00933-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2023] [Accepted: 09/25/2023] [Indexed: 10/27/2023] Open

Mistry S, Riches NO, Gouripeddi R, Facelli JC. Environmental exposures in machine learning and data mining approaches to diabetes etiology: A scoping review. Artif Intell Med 2023;135:102461. [PMID: 36628796 PMCID: PMC9834645 DOI: 10.1016/j.artmed.2022.102461] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/05/2022] [Revised: 10/06/2022] [Accepted: 11/23/2022] [Indexed: 12/03/2022]

Abstract

BACKGROUND

Environmental exposures are implicated in diabetes etiology, but are poorly understood due to disease heterogeneity, complexity of exposures, and analytical challenges. Machine learning and data mining are artificial intelligence methods that can address these limitations. Despite their increasing adoption in research on diabetes etiology and prediction, the types of methods and exposures analyzed have not been thoroughly reviewed.

OBJECTIVE

We aimed to review articles that implemented machine learning and data mining methods to understand environmental exposures in diabetes etiology and disease prediction.

METHODS

On September 19, 2022, we queried the PubMed and Scopus databases for machine learning and data mining studies that used environmental exposures to understand diabetes etiology. Exposures were classified into specific external, general external, or internal exposures. We reviewed machine learning and data mining methods and characterized the scope of environmental exposures studied in the etiology of general diabetes, type 1 diabetes, type 2 diabetes, and other types of diabetes.

RESULTS

We identified 44 articles for inclusion. Specific external exposures were the most common exposures studied, and supervised models were the most common methods used. Well-established specific external exposures of low physical activity, high cholesterol, and high triglycerides were predictive of general diabetes, type 2 diabetes, and prediabetes, while novel metabolic and gut microbiome biomarkers were implicated in type 1 diabetes.

DISCUSSION

The use of machine learning and data mining methods to elucidate environmental triggers of diabetes was largely limited to well-established risk factors identified using easily explainable and interpretable models. Future studies should seek to leverage machine learning and data mining to explore the temporality and co-occurrence of multiple exposures and further evaluate the role of general external and internal exposures in diabetes etiology.


Olusanya MO, Ogunsakin RE, Ghai M, Adeleke MA. Accuracy of Machine Learning Classification Models for the Prediction of Type 2 Diabetes Mellitus: A Systematic Survey and Meta-Analysis Approach. International Journal of Environmental Research and Public Health 2022;19:ijerph192114280. [PMID: 36361161 PMCID: PMC9655196 DOI: 10.3390/ijerph192114280] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/27/2022] [Revised: 10/22/2022] [Accepted: 10/25/2022] [Indexed: 05/13/2023]

Kushwaha S, Srivastava R, Jain R, Sagar V, Aggarwal AK, Bhadada SK, Khanna P. Harnessing machine learning models for non-invasive pre-diabetes screening in children and adolescents. Computer Methods and Programs in Biomedicine 2022;226:107180. [PMID: 36279639 DOI: 10.1016/j.cmpb.2022.107180] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/24/2022] [Revised: 10/02/2022] [Accepted: 10/06/2022] [Indexed: 06/16/2023]

Abstract

BACKGROUND AND OBJECTIVES

Pre-diabetes has been identified as an intermediate diagnosis and a sign of a relatively high chance of developing diabetes in the future. Diabetes has become one of the most frequent chronic disorders in children and adolescents around the world; therefore, predicting the onset of pre-diabetes allows a person at risk to make efforts to avoid or restrict disease progression. This research aims to create and implement a cross-validated machine learning model that can predict pre-diabetes using non-invasive methods.

METHODS

We analysed a nationally representative dataset of children and adolescents (5-19 years) to develop a machine learning model for non-invasive pre-diabetes screening. Based on HbA1c levels, the data (n = 26,567) were segregated into normal (n = 23,777) and pre-diabetes (n = 2790). We considered eight features, six hyperparameter-tuned machine learning models, and different metrics for model evaluation. The final model was selected based on the area under the receiver operating characteristic curve (AUC), Cohen's kappa, and cross-validation score. The selected model was integrated into the screening tool for automated pre-diabetes prediction.
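The class sizes above are heavily imbalanced, which is presumably one reason a chance-corrected metric such as Cohen's kappa is reported alongside AUC. The following is a minimal sketch, not the authors' code, that uses only the two class counts from the abstract. The function name `cohens_kappa` and the "always predict normal" baseline are illustrative assumptions.

```python
def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between labels and predictions, corrected for chance.

    Raises ZeroDivisionError in the degenerate case where expected agreement is 1.
    """
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    # Observed agreement: fraction of cases where prediction equals the label.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Expected agreement if predictions were independent of the labels.
    expected = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    return (observed - expected) / (1 - expected)


# Class counts taken from the abstract; the trivial classifier is hypothetical.
y_true = ["normal"] * 23777 + ["pre-diabetes"] * 2790
y_pred = ["normal"] * len(y_true)  # baseline that never flags pre-diabetes

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
kappa = cohens_kappa(y_true, y_pred)
print(round(accuracy, 3), kappa)
```

The baseline reaches an accuracy of roughly 0.895 while its kappa is 0, which illustrates why accuracy alone would be a weak selection criterion for this dataset.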

RESULTS

The XGBoost classifier, including all eight features, was the best model. The 10-fold cross-validation score was highest for the XGBoost model (90.13%) and lowest for the support vector machine (61.17%). The AUC was highest for RF (0.970), followed by GB (0.968), XGB (0.959), ETC (0.918), DT (0.908), and SVM (0.574) models. The XGB model was used to develop the screening tool.

CONCLUSION

We have developed and deployed a machine learning model for automated real-time pre-diabetes screening. The screening tool can be used on computers and can be packaged as software for easy use. Detecting pre-diabetes in the pediatric age group may help prevent its progression. Machine learning can also show great competence in determining important features in pre-diabetes.


Ramezankhani A, Habibi-Moeini AS, Zadeh SST, Azizi F, Hadaegh F. Effect of family history of diabetes and obesity status on lifetime risk of type 2 diabetes in the Iranian population. J Glob Health 2022;12:04068. [PMID: 35939397 PMCID: PMC9359461 DOI: 10.7189/jogh.12.04068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open

Ji W, Xue M, Zhang Y, Yao H, Wang Y. A Machine Learning Based Framework to Identify and Classify Non-alcoholic Fatty Liver Disease in a Large-Scale Population. Front Public Health 2022;10:846118. [PMID: 35444985 PMCID: PMC9013842 DOI: 10.3389/fpubh.2022.846118] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/30/2021] [Accepted: 02/23/2022] [Indexed: 12/12/2022] Open

Vehi J, Mujahid O, Contreras I. Aim and Diabetes. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]

Yang T, Zhao B, Pei D. Estimation of the Prevalence of Nonalcoholic Fatty Liver Disease in an Adult Population in Northern China Using the Data Mining Approach. Diabetes Metab Syndr Obes 2021;14:3437-3445. [PMID: 34349537 PMCID: PMC8326527 DOI: 10.2147/dmso.s320808] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2021] [Accepted: 07/15/2021] [Indexed: 11/29/2022] Open

Abstract

BACKGROUND

Nonalcoholic fatty liver disease (NAFLD) is the commonest form of chronic liver disease worldwide and its prevalence is rapidly increasing. Screening and early diagnosis of high-risk groups are important for the prevention and treatment of NAFLD; however, traditional imaging examinations are expensive and difficult to perform on a large scale. This study aimed to develop a simple and reliable predictive model based on the risk factors for NAFLD using a decision tree algorithm for the diagnosis of NAFLD and reduction of healthcare costs.

METHODS

This retrospective cross-sectional study included 22,819 participants who underwent annual health examinations between January 2019 and December 2019 at the Physical Examination Center of Shengjing Hospital of China Medical University. After rigorous data screening, data from 9190 participants were retained in the final dataset for use in the J48 decision tree algorithm for the construction of predictive models. Approximately 66% of these participants (n=6065) were randomly assigned to the training dataset for the construction of the decision tree, while 34% (n=3125) were assigned to the test dataset to evaluate the performance of the decision tree.
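The random 66%/34% assignment described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the split was drawn, and the function name `split_indices`, the fixed seed, and the use of row indices in place of participant records are hypothetical.

```python
import random


def split_indices(n_total, train_fraction, seed=0):
    """Shuffle the indices 0..n_total-1 and cut them into train and test lists."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_train = round(n_total * train_fraction)
    return indices[:n_train], indices[n_train:]


# 9190 retained participants and a 66% training share, as stated in the abstract.
train_idx, test_idx = split_indices(9190, 0.66)
print(len(train_idx), len(test_idx))
```

With these inputs the split sizes come out as 6065 and 3125, matching the counts reported in the abstract.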

RESULTS

The results showed that the J48 decision tree classifier exhibited good performance (accuracy=0.830, precision=0.837, recall=0.830, F-measure=0.830, and area under the curve=0.905). The decision tree structure revealed waist circumference as the most significant attribute, followed by triglyceride levels, systolic blood pressure, sex, age, and total cholesterol level.
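The reported recall equals the reported accuracy (both 0.830), which is what one would see if the per-class values were averaged with class-size weights, as tools such as Weka do in their summary row. The abstract does not state this, so the sketch below is only one possible reading. The function name `weighted_metrics` and the confusion-matrix counts are hypothetical and not taken from the study.

```python
def weighted_metrics(tp, fn, fp, tn):
    """Class-weighted precision, recall and F-measure for a binary confusion matrix.

    tp, fn: actual positives predicted positive / negative
    fp, tn: actual negatives predicted positive / negative
    Assumes every class is both present and predicted at least once.
    """
    n_pos, n_neg = tp + fn, fp + tn
    total = n_pos + n_neg

    def per_class(hit, missed, false_alarm):
        precision = hit / (hit + false_alarm)
        recall = hit / (hit + missed)
        f_measure = 2 * precision * recall / (precision + recall)
        return precision, recall, f_measure

    p_pos, r_pos, f_pos = per_class(tp, fn, fp)  # positive class
    p_neg, r_neg, f_neg = per_class(tn, fp, fn)  # negative class
    w_pos, w_neg = n_pos / total, n_neg / total  # class-size weights
    return {
        "accuracy": (tp + tn) / total,
        "precision": w_pos * p_pos + w_neg * p_neg,
        "recall": w_pos * r_pos + w_neg * r_neg,
        "f_measure": w_pos * f_pos + w_neg * f_neg,
    }


# Hypothetical counts for illustration only.
m = weighted_metrics(tp=800, fn=200, fp=300, tn=1700)
print(m)
```

Under class-size weighting, recall reduces algebraically to overall accuracy, while precision and F-measure can differ from it.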

CONCLUSION

Our study suggests that a decision tree analysis can be used to screen high-risk individuals for NAFLD. The key attributes in the tree structure can further contribute to the prevention of NAFLD by suggesting implementable targeted community interventions, which can help improve the outcome of NAFLD and reduce the burden on the healthcare system.


A multiple combined method for rebalancing medical data with class imbalances. Comput Biol Med 2021;134:104527. [PMID: 34091384 DOI: 10.1016/j.compbiomed.2021.104527] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/02/2021] [Revised: 05/24/2021] [Accepted: 05/25/2021] [Indexed: 11/24/2022]

Liu SQ, Ma XB, Song WM, Li YF, Li N, Wang LN, Liu JY, Tao NN, Li SJ, Xu TT, Zhang QY, An QQ, Liang B, Li HC. Using a risk model for probability of cancer in pulmonary nodules. Thorac Cancer 2021;12:1881-1889. [PMID: 33973725 PMCID: PMC8201526 DOI: 10.1111/1759-7714.13991] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/25/2021] [Revised: 04/19/2021] [Indexed: 12/24/2022] Open

Affiliation(s)

Si-Qi Liu: Department of Respiratory and Critical Care Medicine, Shandong Provincial Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China; Cheeloo College of Medicine, Shandong University, Jinan, China
Xiao-Bin Ma: Department of Respiratory and Critical Care Medicine, Shandong Provincial Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China
Wan-Mei Song: Department of Respiratory and Critical Care Medicine, Shandong Provincial Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China; Cheeloo College of Medicine, Shandong University, Jinan, China
Yi-Fan Li: Department of Respiratory and Critical Care Medicine, Shandong Provincial Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China
Ning Li: Shandong Medical Imaging Research Institute, Cheeloo College of Medicine, Shandong University, Jinan, China
Li-Na Wang: Department of Medical Imaging, Shandong Provincial Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China
Jin-Yue Liu: Department of Intensive Care Unit, Shandong Provincial Third Hospital, Jinan, China
Ning-Ning Tao: Department of Respiratory and Critical Care Medicine, Beijing Hospital, Beijing, China; Graduate School of Peking Union Medical College, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
Shi-Jin Li: Department of Respiratory and Critical Care Medicine, Shandong Provincial Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China; Cheeloo College of Medicine, Shandong University, Jinan, China
Ting-Ting Xu: Department of Respiratory and Critical Care Medicine, Shandong Provincial Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China
Qian-Yun Zhang: Department of Respiratory and Critical Care Medicine, Shandong Provincial Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China; Cheeloo College of Medicine, Shandong University, Jinan, China
Qi-Qi An: Department of Respiratory and Critical Care Medicine, Shandong Provincial Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China; Cheeloo College of Medicine, Shandong University, Jinan, China
Bin Liang: Department of Respiratory and Critical Care Medicine, Shandong Provincial Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China
Huai-Chen Li: Department of Respiratory and Critical Care Medicine, Shandong Provincial Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China; College of Traditional Chinese Medicine, Shandong University of Traditional Chinese Medicine, Jinan, China


Saberi-Karimian M, Khorasanchi Z, Ghazizadeh H, Tayefi M, Saffar S, Ferns GA, Ghayour-Mobarhan M. Potential value and impact of data mining and machine learning in clinical diagnostics. Crit Rev Clin Lab Sci 2021;58:275-296. [PMID: 33739235 DOI: 10.1080/10408363.2020.1857681] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Indexed: 12/14/2022]

Abstract

Data mining involves the use of mathematical sciences, statistics, artificial intelligence, and machine learning to determine the relationships between variables from a large sample of data. It has previously been shown that data mining can improve the prediction and diagnostic precision of type 2 diabetes mellitus. A few studies have applied machine learning to assess hypertension and metabolic syndrome-related biomarkers, as well as refine the assessment of cardiovascular disease risk. Machine learning methods have also been applied to assess new biomarkers and survival outcomes in patients with renal diseases to predict the development of chronic kidney disease, disease progression, and renal graft survival. In the latter, random forest methods were found to be the best for the prediction of chronic kidney disease. Some studies have investigated the prognosis of nonalcoholic fatty liver disease and acute liver failure, as well as therapy response prediction in patients with viral disorders, using decision tree models. Machine learning techniques, such as Sparse High-Order Interaction Model with Rejection Option, have been used for diagnosing Alzheimer's disease. Data mining techniques have also been applied to identify the risk factors for serious mental illness, such as depression and dementia, and help to diagnose and predict the quality of life of such patients. In relation to child health, some studies have determined the best algorithms for predicting obesity and malnutrition. Machine learning has determined the important risk factors for preterm birth and low birth weight. Published studies of patients with cancer and bacterial diseases are limited and should perhaps be addressed more comprehensively in future studies. 
Herein, we provide an in-depth review of studies in which biochemical biomarker data were analyzed using machine learning methods to assess the risk of several common diseases, in order to summarize the potential applications of data mining methods in clinical diagnosis. Data mining techniques are now increasingly applied to clinical diagnostics, and they have the potential to support this field.


Knowledge discovery in open data for epidemic disease prediction. Health Policy and Technology 2021. [DOI: 10.1016/j.hlpt.2021.01.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/19/2022]

Contador Pachón S, Botella Serrano M, Garnica Alcázar O, Velasco Cabo JM, Aramendi Zurimendi A, Rodríguez Martínez R, Maqueda Villaizán E, Hidalgo Pérez JI. Identification of blood glucose patterns in patients with type 1 diabetes using continuous glucose monitoring and clustering technique. Endocrinol Diab Nutr 2021;68:170-174. [PMID: 34167696 DOI: 10.1016/j.endien.2021.06.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2019] [Accepted: 12/24/2019] [Indexed: 06/13/2023]

Contador Pachón S, Botella Serrano M, Garnica Alcázar O, Velasco Cabo JM, Aramendi Zurimendi A, Rodríguez Martínez R, Maqueda Villaizán E, Hidalgo Pérez JI. Identificación de patrones de glucemia en pacientes con diabetes tipo 1 mediante monitorización continua de glucosa y técnicas de clusterización. Endocrinol Diab Nutr 2021;68:170-174. [DOI: 10.1016/j.endinu.2019.12.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2019] [Revised: 12/22/2019] [Accepted: 12/24/2019] [Indexed: 10/24/2022]

Aim and Diabetes. Artif Intell Med 2021. [DOI: 10.1007/978-3-030-58080-3_158-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]

Amirabadizadeh A, Nakhaee S, Mehrpour O. Risk assessment of elevated blood lead concentrations in the adult population using a decision tree approach. Drug Chem Toxicol 2020;45:878-885. [DOI: 10.1080/01480545.2020.1783286] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 10/24/2022]

Zhang L, Wang Y, Niu M, Wang C, Wang Z. Machine learning for characterizing risk of type 2 diabetes mellitus in a rural Chinese population: the Henan Rural Cohort Study. Sci Rep 2020;10:4406. [PMID: 32157171 PMCID: PMC7064542 DOI: 10.1038/s41598-020-61123-x] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 09/11/2019] [Accepted: 02/19/2020] [Indexed: 01/19/2023] Open

Abstract

With the development of data mining, machine learning offers opportunities to improve discrimination by analyzing complex interactions among large numbers of variables. To test the ability of machine learning algorithms to predict the risk of type 2 diabetes mellitus (T2DM) in a rural Chinese population, we focused on a total of 36,652 eligible participants from the Henan Rural Cohort Study. Risk assessment models for T2DM were developed using six machine learning algorithms: logistic regression (LR), classification and regression tree (CART), artificial neural networks (ANN), support vector machine (SVM), random forest (RF) and gradient boosting machine (GBM). Model performance was measured by the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value, negative predictive value and area under the precision-recall curve. The importance of variables was identified based on each classifier and the Shapley additive explanations approach. Using all available variables, all models for predicting risk of T2DM demonstrated strong predictive performance, with AUCs ranging from 0.811 to 0.872 using laboratory data and from 0.767 to 0.817 without laboratory data. Among them, the GBM model performed best (AUC: 0.872 with laboratory data and 0.817 without laboratory data). Performance plateaued once 30 variables had been introduced into each model, except for the CART model. Among the top-10 variables across all methods were sweet flavor, urine glucose, age, heart rate, creatinine, waist circumference, uric acid, pulse pressure, insulin, and hypertension. New important risk factors (urinary indicators, sweet flavor) were not found by previous risk prediction methods, but were identified by machine learning in our study. These results show that machine learning methods are competent in predicting risk of T2DM, leading to greater insights into disease risk factors with no a priori assumption of causality.
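The AUC values quoted above have a simple rank interpretation: the probability that a randomly chosen case receives a higher risk score than a randomly chosen non-case. The following is a minimal sketch of that definition, not the study's evaluation code. The function name `auc_by_pairs` and the scores are hypothetical.

```python
def auc_by_pairs(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Runs in O(len(pos) * len(neg)) time,
    so it is meant for illustration rather than for large cohorts.
    """
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical predicted risks for three cases and four non-cases.
cases = [0.9, 0.8, 0.4]
non_cases = [0.7, 0.3, 0.4, 0.1]
print(auc_by_pairs(cases, non_cases))  # 10.5 of 12 pairs ranked correctly -> 0.875
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives some context for the 0.767 to 0.872 range reported in the abstract.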


Vallée A, Cinaud A, Protogerou A, Zhang Y, Topouchian J, Safar ME, Blacher J. Arterial Stiffness and Coronary Ischemia: New Aspects and Paradigms. Curr Hypertens Rep 2020;22:5. [PMID: 31925555 DOI: 10.1007/s11906-019-1006-z] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Indexed: 12/18/2022]

Xue M, Su Y, Li C, Wang S, Yao H. Identification of Potential Type II Diabetes in a Large-Scale Chinese Population Using a Systematic Machine Learning Framework. J Diabetes Res 2020;2020:6873891. [PMID: 33029536 PMCID: PMC7532405 DOI: 10.1155/2020/6873891] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/12/2020] [Revised: 08/01/2020] [Accepted: 09/02/2020] [Indexed: 12/19/2022] Open

Abstract

BACKGROUND

An estimated 425 million people globally have diabetes, accounting for 12% of the world's health expenditures, and the number continues to grow, placing a huge burden on the healthcare system, especially in remote, underserved areas.

METHODS

A total of 584,168 adult subjects who participated in the national physical examination were enrolled in this study. The risk factors for type II diabetes mellitus (T2DM) were identified by p values and odds ratios, using logistic regression (LR) based on variables from physical measurements and a questionnaire. Using the risk factors selected by LR, we applied a decision tree, a random forest, AdaBoost with a decision tree (AdaBoost), and an extreme gradient boosting decision tree (XGBoost) to identify individuals with T2DM, compared the performance of the four machine learning classifiers, and used the best-performing classifier to output variable importance scores for T2DM.

RESULTS

The results indicated that XGBoost had the best performance (accuracy = 0.906, precision = 0.910, recall = 0.902, F-1 = 0.906, and AUC = 0.968). The degree of variables' importance scores in XGBoost showed that BMI was the most significant feature, followed by age, waist circumference, systolic pressure, ethnicity, smoking amount, fatty liver, hypertension, physical activity, drinking status, dietary ratio (meat to vegetables), drink amount, smoking status, and diet habit (oil loving).

CONCLUSIONS

We proposed a classifier based on LR-XGBoost that uses fourteen easily obtained, noninvasive patient variables as predictors to identify potential cases of T2DM. The classifier can accurately screen for the risk of diabetes in the early phase, and the variable importance scores give clues for preventing the occurrence of diabetes.


Pei D, Yang T, Zhang C. Estimation of Diabetes in a High-Risk Adult Chinese Population Using J48 Decision Tree Model. Diabetes Metab Syndr Obes 2020;13:4621-4630. [PMID: 33273837 PMCID: PMC7705272 DOI: 10.2147/dmso.s279329] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/28/2020] [Accepted: 10/27/2020] [Indexed: 12/31/2022] Open

Exploring the Important Attributes of Human Immunodeficiency Virus and Generating Decision Rules. Symmetry (Basel) 2020. [DOI: 10.3390/sym12010067] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022] Open

Vallée A, Safar ME, Blacher J. Application of a decision tree to establish factors associated with a nomogram of aortic stiffness. J Clin Hypertens (Greenwich) 2019;21:1484-1492. [PMID: 31479194 DOI: 10.1111/jch.13662] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 04/12/2019] [Revised: 05/20/2019] [Accepted: 05/28/2019] [Indexed: 11/29/2022]

Gonoodi K, Tayefi M, Bahrami A, Amirabadi Zadeh A, Ferns GA, Mohammadi F, Eslami S, Ghayour Mobarhan M. Determinants of the magnitude of response to vitamin D supplementation in adolescent girls identified using a decision tree algorithm. Biofactors 2019;45:795-802. [PMID: 31355993 DOI: 10.1002/biof.1540] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/17/2019] [Accepted: 06/13/2019] [Indexed: 12/23/2022]

Vallée A, Petruescu L, Kretz S, Safar ME, Blacher J. Added Value of Aortic Pulse Wave Velocity Index in a Predictive Diagnosis Decision Tree of Coronary Heart Disease. Am J Hypertens 2019;32:375-383. [PMID: 30624553 DOI: 10.1093/ajh/hpz004] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 11/16/2018] [Revised: 01/01/2019] [Accepted: 01/08/2019] [Indexed: 11/12/2022] Open

Pei D, Gong Y, Kang H, Zhang C, Guo Q. Accurate and rapid screening model for potential diabetes mellitus. BMC Med Inform Decis Mak 2019;19:41. [PMID: 30866905 PMCID: PMC6416888 DOI: 10.1186/s12911-019-0790-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 02/28/2018] [Accepted: 03/03/2019] [Indexed: 11/26/2022] Open

Abstract

Background

Prediction or early diagnosis of diabetes is crucial for populations with high risk of diabetes.

Methods

In this study, we assessed the ability of five popular classifiers (J48, AdaBoostM1, SMO, Bayes Net, and Naïve Bayes) to identify individuals with diabetes based on nine non-invasive and easily obtained clinical features: age, gender, body mass index (BMI), hypertension, history of cardiovascular disease or stroke, family history of diabetes, physical activity, work stress, and salty food preference. A total of 4205 data entries were obtained from annual physical examination reports for adults at the Shengjing Hospital of China Medical University during January–April 2017. Weka data mining software was used to identify the best algorithm for diabetes classification.

Results

The results indicate that decision tree classifier J48 has the best performance (accuracy = 0.9503, precision = 0.950, recall = 0.950, F-measure = 0.948, and AUC = 0.964). The decision tree structure shows that age is the most significant feature, followed by family history of diabetes, work stress, BMI, salty food preference, physical activity, hypertension, gender, and history of cardiovascular disease or stroke.
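
The metrics named above can all be derived from a binary confusion matrix. The following is a minimal Python sketch of those definitions, not the authors' Weka workflow; the function name and the example counts are hypothetical and are not taken from the study. Weka's summary output also reports class-weighted averages, which this single-class sketch does not attempt to reproduce.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall and F-measure for the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts for illustration only (not data from the study).
metrics = classification_metrics(tp=80, fp=20, fn=20, tn=880)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

AUC is omitted because it requires ranked prediction scores rather than a single confusion matrix.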

Conclusions

Our study shows that decision tree analyses can be applied to screen individuals for early diabetes risk without the need for invasive tests. This procedure will be particularly useful in developing regions with high epidemiological risk and poor socioeconomic status, and enable clinical practitioners to rapidly screen patients for increased risk of diabetes. The key features in the tree structure could further facilitate diabetes prevention through targeted community interventions, which can potentially improve early diabetes diagnosis and reduce burdens on the healthcare system.

Pei D, Zhang C, Quan Y, Guo Q. Identification of Potential Type II Diabetes in a Chinese Population with a Sensitive Decision Tree Approach. J Diabetes Res 2019;2019:4248218. [PMID: 30805372 PMCID: PMC6362481 DOI: 10.1155/2019/4248218] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/19/2018] [Revised: 11/20/2018] [Accepted: 12/18/2018] [Indexed: 12/17/2022] Open

Abstract

BACKGROUND

Diabetes mellitus is a chronic disease with a steadily increasing prevalence. Because of its chronic course and devastating complications, the disorder can easily impose a financial burden. Early diagnosis of diabetes remains one of the major challenges facing medical providers, and satisfactory screening tools or methods are still required, especially population- or community-based ones.

METHODS

This is a retrospective cross-sectional study involving 15,323 subjects who underwent the annual check-up in the Department of Family Medicine of Shengjing Hospital of China Medical University from January 2017 to June 2017. With a strict data filtration, 10,436 records from the eligible participants were utilized to develop a prediction model using the J48 decision tree algorithm. Nine variables, including age, gender, body mass index (BMI), hypertension, history of cardiovascular disease or stroke, family history of diabetes, physical activity, work-related stress, and salty food preference, were considered.

RESULTS

The accuracy, precision, recall, and area under the receiver operating characteristic curve (AUC) value for identifying potential diabetes were 94.2%, 94.0%, 94.2%, and 94.8%, respectively. The structure of the decision tree shows that age is the most significant feature. Among participants aged ≤ 49, 5497 (97%) were identified as nondiabetic, whereas among those aged > 49, 771 (50%) were identified as nondiabetic. In the subgroup with 34 < age ≤ 49 and BMI ≥ 25, 89 (92%) of the 97 individuals with a positive family history of diabetes were identified as diabetic, whereas 576 (58%) of those without a family history of diabetes were identified as nondiabetic. Work-related stress was also associated with diabetes: among individuals with 34 < age ≤ 49, BMI ≥ 25, and no family history of diabetes, 22 (51%) of those with high work-related stress were identified as nondiabetic, compared with 349 (88%) of those with low or moderate work-related stress.

CONCLUSIONS

We proposed a decision tree classifier that uses nine easily obtained, noninvasive patient features as predictor variables to identify potential cases of diabetes. The classifier indicates that decision tree analysis can be successfully applied to screen for diabetes, which will support clinical practitioners in rapid diabetes identification. The model provides a means to target diabetes prevention, which could reduce the burden on the health system through effective case management.

Ramezankhani A, Harati H, Bozorgmanesh M, Tohidi M, Khalili D, Azizi F, Hadaegh F. Diabetes Mellitus: Findings from 20 Years of the Tehran Lipid and Glucose Study. Int J Endocrinol Metab 2018;16:e84784. [PMID: 30584445 PMCID: PMC6289292 DOI: 10.5812/ijem.84784] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/01/2018] [Revised: 10/02/2018] [Accepted: 10/07/2018] [Indexed: 12/17/2022] Open

Open data mining for Taiwan's dengue epidemic. Acta Trop 2018;183:1-7. [PMID: 29549012 DOI: 10.1016/j.actatropica.2018.03.017] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2017] [Revised: 02/19/2018] [Accepted: 03/10/2018] [Indexed: 11/22/2022]

Noshad S, Afarideh M, Heidari B, Mechanick JI, Esteghamati A. Diabetes Care in Iran: Where We Stand and Where We Are Headed. Ann Glob Health 2015;81:839-50. [PMID: 27108151 DOI: 10.1016/j.aogh.2015.10.003] [Citation(s) in RCA: 57] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022] Open

Ramezankhani A, Tohidi M, Azizi F, Hadaegh F. Application of survival tree analysis for exploration of potential interactions between predictors of incident chronic kidney disease: a 15-year follow-up study. J Transl Med 2017;15:240. [PMID: 29183386 PMCID: PMC5706148 DOI: 10.1186/s12967-017-1346-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/18/2017] [Accepted: 11/14/2017] [Indexed: 12/23/2022] Open

Abstract

Background

Chronic kidney disease (CKD) is a growing public health challenge worldwide. Various studies have investigated risk factors for incident CKD; however, very few have examined interactions between these risk factors. To clarify the potential interactions between risk factors for CKD, we performed survival tree analysis.

Methods

A total of 8238 participants (46.1% men) aged > 20 years without CKD at baseline [(1999–2001) and (2002–2005)], were followed until 2014. The first occurrence of CKD, defined as the estimated glomerular filtration rate (eGFR) < 60 ml/min/1.73 m², was set as the main outcome. Multivariable Cox proportional hazard (Cox PH) regression was used to identify significant independent predictors of CKD; moreover, survival tree analysis was performed to gain further insight into the potential interactions between predictors.

Results

The crude incidence rates of CKD were 20.2 and 35.2 per 1000 person-years in men and women, respectively. The Cox PH identified the main effect of significant predictors of CKD incidence in men and women. In addition, using a limited number of predictors, survival trees identified 12 and 10 subgroups among men and women, respectively, with different survival probability. Accordingly, a group of men with eGFR > 74 ml/min/1.73 m², age ≤ 46 years, low level of physical activity, waist circumference ≤ 100 cm and FPG ≤ 4.7 mmol/l had the lowest risk of CKD incidence; while men with eGFR ≤ 63.4 ml/min/1.73 m², age > 50 years had the highest risk for CKD compared to men in the lowest risk group [hazard ratio (HR), 70.68 (34.57–144.52)]. Also, a group of women aged ≤ 45 years and eGFR > 83.5 ml/min/1.73 m² had the lowest risk; while women with age > 48 years and eGFR ≤ 69 ml/min/1.73 m² had the highest risk compared to low risk group [HR 27.25 (19.88–37.34)].
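
A crude incidence rate of the kind reported above is the number of new cases divided by the accumulated person-time at risk, scaled here to 1000 person-years. The sketch below illustrates that arithmetic only; the function name is hypothetical, and the event count and person-years are invented round numbers, since the abstract does not report them.

```python
def crude_incidence_rate(new_cases: int, person_years: float, per: float = 1000.0) -> float:
    """Return new cases per `per` person-years of follow-up."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return new_cases / person_years * per


# Hypothetical inputs for illustration only (not data from the study).
rate = crude_incidence_rate(new_cases=202, person_years=10_000)
print(f"{rate:.1f} per 1000 person-years")
```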

Conclusion

In this post hoc analysis, we identified the independent predictors of CKD using Cox PH; furthermore, by applying survival tree analysis, we identified several homogeneous subgroups with different risks of incident CKD. Our study suggests that the two methods can be used together to provide new insights for intervention programs and to improve clinical decision making.

Varanka-Ruuska T, Rautio N, Lehtiniemi H, Miettunen J, Keinänen-Kiukaanniemi S, Sebert S, Ala-Mursula L. The association of unemployment with glucose metabolism: a systematic review and meta-analysis. Int J Public Health 2017;63:435-446. [PMID: 29170882 DOI: 10.1007/s00038-017-1040-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/12/2017] [Revised: 09/12/2017] [Accepted: 09/16/2017] [Indexed: 10/18/2022] Open

Evaluating of associated risk factors of metabolic syndrome by using decision tree. ACTA ACUST UNITED AC 2017. [DOI: 10.1007/s00580-017-2580-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/29/2023]

Ramezankhani A, Bagherzadeh-Khiabani F, Khalili D, Azizi F, Hadaegh F. A new look at risk patterns related to coronary heart disease incidence using survival tree analysis: 12 Years Longitudinal Study. Sci Rep 2017;7:3237. [PMID: 28607472 PMCID: PMC5468345 DOI: 10.1038/s41598-017-03577-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2017] [Accepted: 04/21/2017] [Indexed: 12/25/2022] Open

Olivera AR, Roesler V, Iochpe C, Schmidt MI, Vigo Á, Barreto SM, Duncan BB. Comparison of machine-learning algorithms to build a predictive model for detecting undiagnosed diabetes - ELSA-Brasil: accuracy study. SAO PAULO MED J 2017;135:234-246. [PMID: 28746659 PMCID: PMC10019841 DOI: 10.1590/1516-3180.2016.0309010217] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/19/2017] [Accepted: 02/01/2017] [Indexed: 01/23/2023] Open

Tayefi M, Tajfard M, Saffar S, Hanachi P, Amirabadizadeh AR, Esmaeily H, Taghipour A, Ferns GA, Moohebati M, Ghayour-Mobarhan M. hs-CRP is strongly associated with coronary heart disease (CHD): A data mining approach using decision tree algorithm. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2017;141:105-109. [PMID: 28241960 DOI: 10.1016/j.cmpb.2017.02.001] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/30/2016] [Revised: 01/25/2017] [Accepted: 02/02/2017] [Indexed: 06/06/2023]

Affiliation(s)

Maryam Tayefi: Metabolic Syndrome Research Center, School of Medicine, Mashhad University of Medical Sciences, 99199-91766 Mashhad, Iran; Department of New Sciences and Technologies, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
Mohammad Tajfard: Department of Health Education and Health Promotion, School of Health, Management and Social Determinants of Health Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
Sara Saffar: Neurogenic Inflammation Research Center, Department of New Sciences and Technologies, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
Parichehr Hanachi: Department of Biology, Biochemistry Unit, Alzahra University, Tehran, Iran
Ali Reza Amirabadizadeh: Medical Toxicology and Drug Abuse Research Center (MTDRC), Birjand University of Medical Sciences, Birjand, Iran
Habibollah Esmaeily: Department of Biostatistics and Epidemiology, School of Health, Management and Social Determinants of Health Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
Ali Taghipour: Department of Biostatistics and Epidemiology, School of Health, Management and Social Determinants of Health Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
Gordon A Ferns: Division of Medical Education, Brighton & Sussex Medical School, Falmer, Brighton, Sussex BN1 9PH, UK
Mohsen Moohebati: Cardiovascular Research Center, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
Majid Ghayour-Mobarhan: Metabolic Syndrome Research Center, School of Medicine, Mashhad University of Medical Sciences, 99199-91766 Mashhad, Iran; Department of New Sciences and Technologies, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran; Cardiovascular Research Center, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran

Fei Y, Gao K, Hu J, Tu J, Li WQ, Wang W, Zong GQ. Predicting the incidence of portosplenomesenteric vein thrombosis in patients with acute pancreatitis using classification and regression tree algorithm. J Crit Care 2017;39:124-130. [PMID: 28254727 DOI: 10.1016/j.jcrc.2017.02.019] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2017] [Revised: 02/03/2017] [Accepted: 02/05/2017] [Indexed: 02/07/2023]

Tayefi M, Esmaeili H, Saberi Karimian M, Amirabadi Zadeh A, Ebrahimi M, Safarian M, Nematy M, Parizadeh SMR, Ferns GA, Ghayour-Mobarhan M. The application of a decision tree to establish the parameters associated with hypertension. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2017;139:83-91. [PMID: 28187897 DOI: 10.1016/j.cmpb.2016.10.020] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/17/2016] [Revised: 09/13/2016] [Accepted: 10/18/2016] [Indexed: 06/06/2023]

Moon M, Lee SK. Applying of Decision Tree Analysis to Risk Factors Associated with Pressure Ulcers in Long-Term Care Facilities. Healthc Inform Res 2017;23:43-52. [PMID: 28261530 PMCID: PMC5334131 DOI: 10.4258/hir.2017.23.1.43] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/02/2017] [Revised: 01/24/2017] [Accepted: 01/24/2017] [Indexed: 11/23/2022] Open

Ramezankhani A, Hadavandi E, Pournik O, Shahrabi J, Azizi F, Hadaegh F. Decision tree-based modelling for identification of potential interactions between type 2 diabetes risk factors: a decade follow-up in a Middle East prospective cohort study. BMJ Open 2016;6:e013336. [PMID: 27909038 PMCID: PMC5168628 DOI: 10.1136/bmjopen-2016-013336] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/07/2016] [Revised: 09/09/2016] [Accepted: 10/03/2016] [Indexed: 12/18/2022] Open

Abstract

OBJECTIVE

The current study used the decision tree (DT) method to develop prediction models for the incidence of type 2 diabetes (T2D) and to explore interactions between predictor variables in those models.

DESIGN

Prospective cohort study.

SETTING

Tehran Lipid and Glucose Study (TLGS).

METHODS

A total of 6647 participants (43.4% men) aged >20 years, without T2D at baseline (1999-2001 and 2002-2005), were followed until 2012. Two series of models (with and without 2-hour postchallenge plasma glucose (2h-PCPG)) were developed using three types of DT algorithms. The performance of the models was assessed using sensitivity, specificity, area under the ROC curve (AUC), geometric mean (G-Mean) and F-Measure.
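
The abstract does not define G-Mean. Assuming the usual definition from the imbalanced-classification literature (the square root of sensitivity times specificity), it can be sketched as follows. The function name and the counts are hypothetical and are not taken from the study.

```python
import math


def sensitivity_specificity_gmean(tp: int, fn: int, tn: int, fp: int):
    """Return (sensitivity, specificity, G-Mean) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    # Assumed definition: geometric mean of the two rates.
    g_mean = math.sqrt(sensitivity * specificity)
    return sensitivity, specificity, g_mean


# Hypothetical counts for illustration only (not data from the study).
sens, spec, g = sensitivity_specificity_gmean(tp=78, fn=22, tn=72, fp=28)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} G-Mean={g:.3f}")
```

Unlike accuracy, G-Mean drops sharply when either class is poorly recognised, which is one reason it is often preferred when new cases are much rarer than non-cases.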

PRIMARY OUTCOME MEASURE

The primary outcome was T2D, defined as fasting plasma glucose (FPG) ≥7 mmol/L, 2h-PCPG ≥11.1 mmol/L, or use of antidiabetic medication.
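
The outcome definition above is a simple disjunction of three criteria, which can be written as a predicate. This is an illustrative sketch only: the function and parameter names are hypothetical, and treating a missing glucose value as not meeting that criterion is an assumption rather than something the abstract states.

```python
from typing import Optional


def meets_t2d_definition(
    fpg_mmol_l: Optional[float],
    pcpg_2h_mmol_l: Optional[float],
    on_antidiabetic_medication: bool,
) -> bool:
    """Apply the abstract's three criteria; any one of them is sufficient."""
    if on_antidiabetic_medication:
        return True
    # Assumption: a missing measurement (None) does not satisfy its criterion.
    if fpg_mmol_l is not None and fpg_mmol_l >= 7.0:
        return True
    if pcpg_2h_mmol_l is not None and pcpg_2h_mmol_l >= 11.1:
        return True
    return False


# Illustrative calls with made-up values.
print(meets_t2d_definition(7.0, 6.5, False))
print(meets_t2d_definition(5.4, 8.0, False))
```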

RESULTS

During a median follow-up of 9.5 years, 729 new cases of T2D were identified. The Quick Unbiased Efficient Statistical Tree (QUEST) algorithm had the highest sensitivity and G-Mean among all the models for men and women. The models that included 2h-PCPG had sensitivity and G-Mean of (78% and 0.75) and (78% and 0.78) for men and women, respectively. Both models achieved good discrimination power with AUC above 0.78. FPG, 2h-PCPG, waist-to-height ratio (WHtR) and mean arterial blood pressure (MAP) were the most important factors for the incidence of T2D in both genders. Among men, those with an FPG≤4.9 mmol/L and 2h-PCPG≤7.7 mmol/L had the lowest risk, and those with an FPG>5.3 mmol/L and 2h-PCPG>4.4 mmol/L had the highest risk for T2D incidence. In women, those with an FPG≤5.2 mmol/L and WHtR≤0.55 had the lowest risk, and those with an FPG>5.2 mmol/L and WHtR>0.56 had the highest risk for T2D incidence.

CONCLUSIONS

Our study emphasises the utility of DT for exploring interactions between predictor variables.

Ramezankhani A, Pournik O, Shahrabi J, Azizi F, Hadaegh F. An application of association rule mining to extract risk pattern for type 2 diabetes using tehran lipid and glucose study database. Int J Endocrinol Metab 2015;13:e25389. [PMID: 25926855 PMCID: PMC4393501 DOI: 10.5812/ijem.25389] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/16/2014] [Revised: 12/17/2014] [Accepted: 12/27/2014] [Indexed: 01/14/2023] Open