1
Wang S, Bao C, Pei D. Application of Data Mining Technology in the Screening for Gallbladder Stones: A Cross-Sectional Retrospective Study of Chinese Adults. Yonsei Med J 2024; 65:210-216. [PMID: 38515358 PMCID: PMC10973557 DOI: 10.3349/ymj.2023.0246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Revised: 08/21/2023] [Accepted: 11/07/2023] [Indexed: 03/23/2024] Open
Abstract
PURPOSE The purpose of this study was to use data mining methods to establish a simple and reliable predictive model based on the risk factors related to gallbladder stones (GS) to assist in their diagnosis and reduce medical costs. MATERIALS AND METHODS This was a retrospective cross-sectional study. A total of 4215 participants underwent annual health examinations between January 2019 and December 2019 at the Physical Examination Center of Shengjing Hospital Affiliated to China Medical University. After rigorous data screening, the records of 2105 examinees were included for the construction of J48, multilayer perceptron (MLP), Bayes Net, and Naïve Bayes models. Ten-fold cross-validation was used to verify the recognition models and determine the best classification algorithm for GS. RESULTS The performance of these models was evaluated using accuracy, precision, recall, F-measure, and area under the receiver operating characteristic curve. Comparison of the F-measure for each algorithm revealed that the F-measure values for MLP and J48 (0.867 and 0.858, respectively) were not significantly different from each other (p>0.05), although they were significantly higher than the F-measure values for Bayes Net and Naïve Bayes (0.824 and 0.831, respectively; p<0.05). CONCLUSION The results of this study showed that the MLP and J48 algorithms are effective at screening individuals for the risk of GS. The key attributes identified by data mining can further promote the prevention of GS through targeted community intervention, improve the outcome of GS, and reduce the burden on the medical system.
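The evaluation procedure named in this abstract (ten-fold cross-validation scored by F-measure) can be sketched as follows. This is a minimal illustration only, not the authors' pipeline: the function names, the random seed, and the confusion-matrix counts are hypothetical, and only the record count of 2105 comes from the abstract.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# 2105 records as in the abstract; each record is tested exactly once.
splits = list(k_fold_indices(2105, k=10))
test_sizes = [len(test_idx) for _, test_idx in splits]

# Hypothetical counts for one fold, used only to exercise f_measure.
example_f = f_measure(tp=80, fp=20, fn=20)
```

In practice a classifier would be trained on each `train_idx` and scored on the matching `test_idx`, and the per-fold F-measures averaged before the algorithms are compared.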
Affiliation(s)
- Shuang Wang
  - Department of Health Management, Shengjing Hospital of China Medical University, Shenyang, China
- Chenhui Bao
  - Department of General Surgery, Shengjing Hospital of China Medical University, Shenyang, China
- Dongmei Pei
  - Department of Health Management, Shengjing Hospital of China Medical University, Shenyang, China
2
Poudineh M, Mansoori A, Sadooghi Rad E, Hosseini ZS, Salmani Izadi F, Hoseinpour M, Mahmoudi Zo M, Ghoflchi S, Tanbakuchi D, Nazar E, Ferns G, Effati S, Esmaily H, Ghayour-Mobarhan M. Platelet distribution widths and white blood cell are associated with cardiovascular diseases: data mining approaches. Acta Cardiol 2023; 78:1033-1044. [PMID: 37694924 DOI: 10.1080/00015385.2023.2246199] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/28/2022] [Revised: 06/12/2023] [Accepted: 08/03/2023] [Indexed: 09/12/2023]
Abstract
OBJECTIVE To investigate the association between cardiovascular diseases (CVDs) and haematologic factors in a cohort of Iranian adults. METHOD A prospective study was designed for a total population of 9,704 participants aged 35 to 65 years. Haematologic factors and demographic characteristics (such as gender, age, and smoking status) were recorded for all participants. The association between haematologic factors and CVDs was assessed through logistic regression (LR) analysis, decision tree (DT), and bootstrap forest (BF). RESULTS Almost all of the included factors were significantly associated with CVD (p<.001). Among the included factors, age, white blood cell (WBC), and platelet distribution width (PDW) had the strongest correlation with the development of CVD. In terms of per-unit odds ratios (ORs), WBC was the most remarkable risk factor for CVD (OR: 1.22; 95% CI: 1.18, 1.27). Age was also associated with an increase in the odds of CVD occurrence (OR: 1.12; 95% CI: 1.11, 1.13). Moreover, males had higher odds of developing CVD than females (OR: 1.39; 95% CI: 1.22, 1.58). In the DT model, age was the best classifier factor in CVD development, followed by WBC and PDW. Furthermore, based on the BF algorithm, the most crucial factors correlated with CVD were age, WBC, PDW, sex, and smoking status. CONCLUSION The results obtained from the LR, DT, and BF models confirmed that age, WBC, and PDW are the most crucial factors for the development of CVD.
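As a worked illustration of how per-unit odds ratios from a logistic regression are read, the sketch below compounds the reported OR for age (1.12) over ten units and shows why an OR is not the same as "times more likely". It assumes the usual log-linear logistic model and that the age OR is per year; the helper names and the 10% baseline probability are hypothetical and do not come from the study.

```python
import math


def odds_ratio_for_change(or_per_unit, delta):
    """Odds ratio implied by a change of `delta` units in a predictor,
    assuming the log-odds are linear in that predictor."""
    return math.exp(delta * math.log(or_per_unit))


def probability_from_odds(odds):
    """Convert odds back to a probability."""
    return odds / (1 + odds)


# Reported OR for age is 1.12 per unit; over 10 units the ORs multiply.
age_or_10 = odds_ratio_for_change(1.12, 10)  # about 3.1

# Hypothetical 10% baseline probability for females (not from the abstract).
baseline_p = 0.10
baseline_odds = baseline_p / (1 - baseline_p)
male_p = probability_from_odds(baseline_odds * 1.39)

# The ratio of probabilities is smaller than the odds ratio of 1.39.
relative_risk = male_p / baseline_p
```

Under that hypothetical baseline the relative risk is below 1.39, which is why the OR for sex is better read as "higher odds" than as a multiple of the probability.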
Affiliation(s)
- Mohadeseh Poudineh
  - Student Research Committee, School of Medicine, Zanjan University of Medical Sciences, Zanjan, Iran
- Amin Mansoori
  - International UNESCO center for Health-Related Basic Sciences and Human Nutrition, Mashhad University of Medical Sciences, Mashhad, Iran
  - Department of Biostatistics, School of Health, Mashhad University of Medical Sciences, Mashhad, Iran
- Elias Sadooghi Rad
  - Student Research Committee, School of Medicine, Birjand University of Medical Sciences, Birjand, Iran
- Faezeh Salmani Izadi
  - Student Research Committee, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Mahdieh Hoseinpour
  - Student Research Committee, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Mostafa Mahmoudi Zo
  - Student Research Committee, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Sahar Ghoflchi
  - International UNESCO center for Health-Related Basic Sciences and Human Nutrition, Mashhad University of Medical Sciences, Mashhad, Iran
- Davoud Tanbakuchi
  - Department of Biostatistics, School of Health, Mashhad University of Medical Sciences, Mashhad, Iran
- Eisa Nazar
  - International UNESCO center for Health-Related Basic Sciences and Human Nutrition, Mashhad University of Medical Sciences, Mashhad, Iran
  - Department of Biostatistics, School of Health, Mashhad University of Medical Sciences, Mashhad, Iran
  - Department of Applied Mathematics, Ferdowsi University of Mashhad, Mashhad, Iran
- Gordon Ferns
  - Brighton and Sussex Medical School, Division of Medical Education, Brighton, United Kingdom
- Sohrab Effati
  - Department of Applied Mathematics, Ferdowsi University of Mashhad, Mashhad, Iran
- Habibollah Esmaily
  - Department of Biostatistics, School of Health, Mashhad University of Medical Sciences, Mashhad, Iran
  - Social Determinants of Health Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
- Majid Ghayour-Mobarhan
  - International UNESCO center for Health-Related Basic Sciences and Human Nutrition, Mashhad University of Medical Sciences, Mashhad, Iran
3
Mohsen F, Al-Absi HRH, Yousri NA, El Hajj N, Shah Z. A scoping review of artificial intelligence-based methods for diabetes risk prediction. NPJ Digit Med 2023; 6:197. [PMID: 37880301 PMCID: PMC10600138 DOI: 10.1038/s41746-023-00933-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2023] [Accepted: 09/25/2023] [Indexed: 10/27/2023] Open
Abstract
The increasing prevalence of type 2 diabetes mellitus (T2DM) and its associated health complications highlight the need to develop predictive models for early diagnosis and intervention. While many artificial intelligence (AI) models for T2DM risk prediction have emerged, a comprehensive review of their advancements and challenges is currently lacking. This scoping review maps out the existing literature on AI-based models for T2DM prediction, adhering to the PRISMA extension for Scoping Reviews guidelines. A systematic search of longitudinal studies was conducted across four databases, including PubMed, Scopus, IEEE-Xplore, and Google Scholar. Forty studies that met our inclusion criteria were reviewed. Classical machine learning (ML) models dominated these studies, with electronic health records (EHR) being the predominant data modality, followed by multi-omics, while medical imaging was the least utilized. Most studies employed unimodal AI models, with only ten adopting multimodal approaches. Both unimodal and multimodal models showed promising results, with the latter being superior. Almost all studies performed internal validation, but only five conducted external validation. Most studies utilized the area under the curve (AUC) for discrimination measures. Notably, only five studies provided insights into the calibration of their models. Half of the studies used interpretability methods to identify key risk predictors revealed by their models. Although a minority highlighted novel risk predictors, the majority reported commonly known ones. Our review provides valuable insights into the current state and limitations of AI-based models for T2DM prediction and highlights the challenges associated with their development and clinical integration.
Affiliation(s)
- Farida Mohsen
  - College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, 34110, Doha, Qatar
- Hamada R H Al-Absi
  - College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, 34110, Doha, Qatar
- Noha A Yousri
  - Genetic Medicine, Weill Cornell Medicine-Qatar, Qatar Foundation, Doha, Qatar
  - College of Health and Life Sciences, Hamad Bin Khalifa University, Qatar Foundation, 34110, Doha, Qatar
  - Computer and Systems Engineering, Alexandria University, Alexandria, Egypt
- Nady El Hajj
  - College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, 34110, Doha, Qatar
  - College of Health and Life Sciences, Hamad Bin Khalifa University, Qatar Foundation, 34110, Doha, Qatar
- Zubair Shah
  - College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, 34110, Doha, Qatar
4
Mistry S, Riches NO, Gouripeddi R, Facelli JC. Environmental exposures in machine learning and data mining approaches to diabetes etiology: A scoping review. Artif Intell Med 2023; 135:102461. [PMID: 36628796 PMCID: PMC9834645 DOI: 10.1016/j.artmed.2022.102461] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/05/2022] [Revised: 10/06/2022] [Accepted: 11/23/2022] [Indexed: 12/03/2022]
Abstract
BACKGROUND Environmental exposures are implicated in diabetes etiology, but are poorly understood due to disease heterogeneity, complexity of exposures, and analytical challenges. Machine learning and data mining are artificial intelligence methods that can address these limitations. Despite their increasing adoption in etiology and prediction of diabetes research, the types of methods and exposures analyzed have not been thoroughly reviewed. OBJECTIVE We aimed to review articles that implemented machine learning and data mining methods to understand environmental exposures in diabetes etiology and disease prediction. METHODS We queried PubMed and Scopus databases for machine learning and data mining studies that used environmental exposures to understand diabetes etiology on September 19th, 2022. Exposures were classified into specific external, general external, or internal exposures. We reviewed machine learning and data mining methods and characterized the scope of environmental exposures studied in the etiology of general diabetes, type 1 diabetes, type 2 diabetes, and other types of diabetes. RESULTS We identified 44 articles for inclusion. Specific external exposures were the most common exposures studied, and supervised models were the most common methods used. Well-established specific external exposures of low physical activity, high cholesterol, and high triglycerides were predictive of general diabetes, type 2 diabetes, and prediabetes, while novel metabolic and gut microbiome biomarkers were implicated in type 1 diabetes. DISCUSSION The use of machine learning and data mining methods to elucidate environmental triggers of diabetes was largely limited to well-established risk factors identified using easily explainable and interpretable models. 
Future studies should seek to leverage machine learning and data mining to explore the temporality and co-occurrence of multiple exposures and further evaluate the role of general external and internal exposures in diabetes etiology.
Affiliation(s)
- Sejal Mistry
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
  - Center of Excellence for Exposure Health Informatics, University of Utah, Salt Lake City, UT, USA
- Naomi O Riches
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
  - Center of Excellence for Exposure Health Informatics, University of Utah, Salt Lake City, UT, USA
  - Department of Obstetrics and Gynecology, University of Utah School of Medicine, Salt Lake City, UT, USA
- Ramkiran Gouripeddi
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
  - Center of Excellence for Exposure Health Informatics, University of Utah, Salt Lake City, UT, USA
  - Clinical and Translational Science Institute, University of Utah, Salt Lake City, UT, USA
- Julio C Facelli
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
  - Center of Excellence for Exposure Health Informatics, University of Utah, Salt Lake City, UT, USA
  - Clinical and Translational Science Institute, University of Utah, Salt Lake City, UT, USA
5
Olusanya MO, Ogunsakin RE, Ghai M, Adeleke MA. Accuracy of Machine Learning Classification Models for the Prediction of Type 2 Diabetes Mellitus: A Systematic Survey and Meta-Analysis Approach. Int J Environ Res Public Health 2022; 19:14280. [PMID: 36361161 PMCID: PMC9655196 DOI: 10.3390/ijerph192114280] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/27/2022] [Revised: 10/22/2022] [Accepted: 10/25/2022] [Indexed: 05/13/2023]
Abstract
Soft-computing and statistical learning models have gained substantial momentum in predicting type 2 diabetes mellitus (T2DM) disease. This paper reviews recent soft-computing and statistical learning models in T2DM using a meta-analysis approach. We searched for papers using soft-computing and statistical learning models focused on T2DM published between 2010 and 2021 on three different search engines. Of 1215 studies identified, 34 with 136952 patients met our inclusion criteria. The pooled algorithm's performance was able to predict T2DM with an overall accuracy of 0.86 (95% confidence interval [CI] of [0.82, 0.89]). The classification of diabetes prediction was significantly greater in models with a screening and diagnosis (pooled proportion [95% CI] = 0.91 [0.74, 0.97]) when compared to models with nephropathy (pooled proportion = 0.48 [0.76, 0.89] to 0.88 [0.83, 0.91]). For the prediction of T2DM, the decision trees (DT) models had a pooled accuracy of 0.88 [95% CI: 0.82, 0.92], and the neural network (NN) models had a pooled accuracy of 0.85 [95% CI: 0.79, 0.89]. Meta-regression did not provide any statistically significant findings for the heterogeneous accuracy in studies with different diabetes predictions, sample sizes, and impact factors. Additionally, ML models showed high accuracy for the prediction of T2DM. The predictive accuracy of ML algorithms in T2DM is promising, mainly through DT and NN models. However, there is heterogeneity among ML models. We compared the results and models and concluded that this evidence might help clinicians interpret data and implement optimum models for their dataset for T2DM prediction.
Affiliation(s)
- Micheal O. Olusanya
  - Department of Computer Science and Information Technology, Sol Plaatje University, Kimberley 8300, South Africa
- Ropo Ebenezer Ogunsakin
  - Biostatistics Unit, Discipline of Public Health Medicine, School of Nursing & Public Health, College of Health Sciences, University of KwaZulu-Natal, Durban 4000, South Africa
- Meenu Ghai
  - Discipline of Genetics, School of Life Sciences, University of KwaZulu-Natal, Durban 4000, South Africa
- Matthew Adekunle Adeleke
  - Discipline of Genetics, School of Life Sciences, University of KwaZulu-Natal, Durban 4000, South Africa
6
Kushwaha S, Srivastava R, Jain R, Sagar V, Aggarwal AK, Bhadada SK, Khanna P. Harnessing machine learning models for non-invasive pre-diabetes screening in children and adolescents. Comput Methods Programs Biomed 2022; 226:107180. [PMID: 36279639 DOI: 10.1016/j.cmpb.2022.107180] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/24/2022] [Revised: 10/02/2022] [Accepted: 10/06/2022] [Indexed: 06/16/2023]
Abstract
BACKGROUND AND OBJECTIVES Pre-diabetes has been identified as an intermediate diagnosis and a sign of a relatively high chance of developing diabetes in the future. Diabetes has become one of the most frequent chronic disorders in children and adolescents around the world; therefore, predicting the onset of pre-diabetes allows a person at risk to make efforts to avoid or restrict disease progression. This research aims to create and implement a cross-validated machine learning model that can predict pre-diabetes using non-invasive methods. METHODS We analysed a nationally representative dataset of children and adolescents (5-19 years) to develop a machine learning model for non-invasive pre-diabetes screening. Based on HbA1c levels, the data (n = 26,567) were segregated into normal (n = 23,777) and pre-diabetes (n = 2790). We considered eight features, six hyper-tuned machine learning models, and different metrics for model evaluation. The final model was selected based on the area under the receiver operating characteristic curve (AUC), Cohen's kappa, and cross-validation score. The selected model was integrated into the screening tool for automated pre-diabetes prediction. RESULTS The XGBoost (XGB) classifier, including all eight features, was the best model. The 10-fold cross-validation score was highest for the XGB model (90.13%) and lowest for the support vector machine (SVM; 61.17%). The AUC was highest for RF (0.970), followed by GB (0.968), XGB (0.959), ETC (0.918), DT (0.908), and SVM (0.574) models. The XGB model was used to develop the screening tool. CONCLUSION We have developed and deployed a machine learning model for automated real-time pre-diabetes screening. The screening tool can be used on computers and can be turned into software for easy use. Detecting pre-diabetes in the pediatric age group may help prevent its progression. Machine learning can also show great competence in determining important features in pre-diabetes.
Affiliation(s)
- Savitesh Kushwaha
  - Department of Community Medicine and School of Public Health, Postgraduate Institute of Medical Education and Research, Chandigarh 160012, India
- Rachana Srivastava
  - Department of Community Medicine and School of Public Health, Postgraduate Institute of Medical Education and Research, Chandigarh 160012, India
- Rachita Jain
  - Department of Community Medicine and School of Public Health, Postgraduate Institute of Medical Education and Research, Chandigarh 160012, India
- Vivek Sagar
  - Department of Community Medicine and School of Public Health, Postgraduate Institute of Medical Education and Research, Chandigarh 160012, India
- Arun Kumar Aggarwal
  - Department of Community Medicine and School of Public Health, Postgraduate Institute of Medical Education and Research, Chandigarh 160012, India
- Sanjay Kumar Bhadada
  - Department of Endocrinology, Postgraduate Institute of Medical Education and Research, Chandigarh 160012, India
- Poonam Khanna
  - Department of Community Medicine and School of Public Health, Postgraduate Institute of Medical Education and Research, Chandigarh 160012, India
7
Ramezankhani A, Habibi-Moeini AS, Zadeh SST, Azizi F, Hadaegh F. Effect of family history of diabetes and obesity status on lifetime risk of type 2 diabetes in the Iranian population. J Glob Health 2022; 12:04068. [PMID: 35939397 PMCID: PMC9359461 DOI: 10.7189/jogh.12.04068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open
Abstract
Background Data are scarce on the lifetime risk of diabetes in countries of the Middle East and North Africa region. We estimated the lifetime risk of type 2 diabetes among Iranian adults at ages 20 and 40 years, and its variation by family history of diabetes and body mass index (BMI). Methods Data from 8435 diabetes-free participants in the Tehran Lipid and Glucose Study were used in this analysis. We estimated the lifetime risk of diabetes stratified by sex, and quantified the impact of family history of diabetes and BMI status on the lifetime risks, singly and jointly. Results At age 20 years, the overall lifetime risk of diabetes was 57.8% (95% CI = 54.0%-61.8%) for men and 61.3% (57.2%-65.4%) for women. Family history of diabetes and increased BMI each, on its own, increased the lifetime risk of diabetes in both sexes. Moreover, the simultaneous presence of family history of diabetes and overweight/obesity increased the lifetime risk of diabetes in both sexes: at age 20 years, the lifetime risk in obese men with a positive family history of diabetes was about 54% higher than in normal-weight men without a family history of diabetes; the corresponding value for women was 42%. Also, normal-weight men without a family history of diabetes lived 24 years longer free of diabetes than obese men with a family history of diabetes. In women, the corresponding value was 20 years. Conclusions Our study shows an alarming lifetime risk of diabetes across the strata of BMI, which emphasizes the need for more effective interventions to reduce incidence, particularly among individuals with a positive family history of diabetes.
Affiliation(s)
- Azra Ramezankhani
  - Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Ali Siamak Habibi-Moeini
  - Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Seyed Saeed Tamehri Zadeh
  - Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Fereidoun Azizi
  - Endocrine Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Farzad Hadaegh
  - Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
8
Ji W, Xue M, Zhang Y, Yao H, Wang Y. A Machine Learning Based Framework to Identify and Classify Non-alcoholic Fatty Liver Disease in a Large-Scale Population. Front Public Health 2022; 10:846118. [PMID: 35444985 PMCID: PMC9013842 DOI: 10.3389/fpubh.2022.846118] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/30/2021] [Accepted: 02/23/2022] [Indexed: 12/12/2022] Open
Abstract
Non-alcoholic fatty liver disease (NAFLD) is a common and serious health problem worldwide that lacks efficient medical treatment. We aimed to develop and validate machine learning (ML) models that could be used for accurate screening of large numbers of people. This study included 304,145 adults who took part in the national physical examination and used their questionnaire and physical measurement parameters as the models' candidate covariates. The least absolute shrinkage and selection operator (LASSO) was used for feature selection from the candidate covariates; four ML algorithms were then used to build the screening model for NAFLD, and the best-performing classifier was used to output the importance score of each covariate in NAFLD. Among the four ML algorithms, XGBoost had the best performance (accuracy = 0.880, precision = 0.801, recall = 0.894, F-1 = 0.882, and AUC = 0.951), and the importance ranking of covariates was, in order, BMI, age, waist circumference, gender, type 2 diabetes, gallbladder disease, smoking, hypertension, dietary status, physical activity, preference for oily food, and preference for salty food. ML classifiers could help medical agencies achieve the early identification and classification of NAFLD, which is particularly useful in economically disadvantaged areas, and the covariates' importance scores will be helpful for the prevention and treatment of NAFLD.
Affiliation(s)
- Weidong Ji
  - Department of Medical Information, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Mingyue Xue
  - Hospital of Traditional Chinese Medicine Affiliated to the Fourth Clinical Medical College of Xinjiang Medical University, Urumqi, China
- Yushan Zhang
  - Department of Maternal and Child Health, School of Public Health, Sun Yat-sen University, Guangzhou, China
- Hua Yao
  - Center of Health Management, The First Affiliated Hospital of Xinjiang Medical University, Urumqi, China
- Yushan Wang
  - Center of Health Management, The First Affiliated Hospital of Xinjiang Medical University, Urumqi, China
  - *Correspondence: Yushan Wang
9
Vehi J, Mujahid O, Contreras I. AIM and Diabetes. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
10
Yang T, Zhao B, Pei D. Estimation of the Prevalence of Nonalcoholic Fatty Liver Disease in an Adult Population in Northern China Using the Data Mining Approach. Diabetes Metab Syndr Obes 2021; 14:3437-3445. [PMID: 34349537 PMCID: PMC8326527 DOI: 10.2147/dmso.s320808] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2021] [Accepted: 07/15/2021] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND Nonalcoholic fatty liver disease (NAFLD) is the commonest form of chronic liver disease worldwide and its prevalence is rapidly increasing. Screening and early diagnosis of high-risk groups are important for the prevention and treatment of NAFLD; however, traditional imaging examinations are expensive and difficult to perform on a large scale. This study aimed to develop a simple and reliable predictive model based on the risk factors for NAFLD using a decision tree algorithm for the diagnosis of NAFLD and reduction of healthcare costs. METHODS This retrospective cross-sectional study included 22,819 participants who underwent annual health examinations between January 2019 and December 2019 at Physical Examination Center in Shengjing Hospital of China Medical University. After rigorous data screening, data of 9190 participants were retained in the final dataset for use in the J48 decision tree algorithm for the construction of predictive models. Approximately 66% of these patients (n=6065) were randomly assigned to the training dataset for the construction of the decision tree, while 34% of the patients (n=3125) were assigned to the test dataset to evaluate the performance of the decision tree. RESULTS The results showed that the J48 decision tree classifier exhibited good performance (accuracy=0.830, precision=0.837, recall=0.830, F-measure=0.830, and area under the curve=0.905). The decision tree structure revealed waist circumference as the most significant attribute, followed by triglyceride levels, systolic blood pressure, sex, age, and total cholesterol level. CONCLUSION Our study suggests that a decision tree analysis can be used to screen high-risk individuals for NAFLD. The key attributes in the tree structure can further contribute to the prevention of NAFLD by suggesting implementable targeted community interventions, which can help improve the outcome of NAFLD and reduce the burden on the healthcare system.
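The 66%/34% random hold-out split described in the methods can be sketched as follows. The abstract does not say how the split was implemented, so the function name, the seed, and the choice to truncate the training size are assumptions; only the totals (9190 records, 6065 for training, 3125 for testing) come from the abstract.

```python
import random


def train_test_split_indices(n_samples, train_fraction=0.66, seed=0):
    """Randomly partition record indices into training and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Truncate rather than round: 9190 * 0.66 = 6065.4 gives 6065 training records.
    n_train = int(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = train_test_split_indices(9190)
```

With truncation, the sizes match the counts reported in the abstract; a tree would be fitted on `train_idx` and its accuracy, precision, recall, F-measure, and AUC computed on `test_idx` only.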
Affiliation(s)
- TengFei Yang
  - Department of Health Management, Shengjing Hospital of China Medical University, Shenyang, People’s Republic of China
- Bo Zhao
  - Department of Pulmonary and Critical Care Medicine, Shengjing Hospital of China Medical University, Shenyang, People’s Republic of China
- Dongmei Pei
  - Department of Health Management, Shengjing Hospital of China Medical University, Shenyang, People’s Republic of China
11
A multiple combined method for rebalancing medical data with class imbalances. Comput Biol Med 2021; 134:104527. [PMID: 34091384 DOI: 10.1016/j.compbiomed.2021.104527] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/02/2021] [Revised: 05/24/2021] [Accepted: 05/25/2021] [Indexed: 11/24/2022]
Abstract
Most classification algorithms assume that classes are balanced. However, datasets with class imbalances are everywhere. The classes of real medical datasets are imbalanced, which severely affects identification models and can even sacrifice the classification accuracy of the minority class, even though it is the most influential and representative. Decisions in the medical field are often irreversible: the tolerance for misjudgment is relatively low, and errors may cause irreparable harm to patients. Therefore, this study proposes a multiple combined method to rebalance medical data featuring class imbalances. The combined methods include (1) resampling methods (synthetic minority oversampling technique [SMOTE] and undersampling [US]), (2) particle swarm optimization (PSO), and (3) MetaCost. This study conducted two experiments with nine medical datasets to verify the proposed method and compare it with the listed methods. A decision tree is used to generate decision rules so that the research results are easy to understand. The results show that (1) the proposed method with ensemble learning can improve the area under a receiver operating characteristic curve (AUC), recall, precision, and F1 metrics; (2) MetaCost can increase sensitivity; (3) SMOTE can effectively enhance AUC; (4) US can improve sensitivity, F1, and misclassification costs in data with a high class imbalance ratio; and (5) PSO-based attribute selection can increase sensitivity and reduce data dimension. Finally, we suggest that for datasets with an imbalance ratio >9, the US results must be used to make the decision. When the imbalance ratio is <9, the decision-maker can consider the results of SMOTE and US together to identify the best decision.
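The closing recommendation of this abstract (a cut-off of 9 on the imbalance ratio) can be written as a small decision helper. This is a sketch under stated assumptions: the imbalance ratio is taken to be the majority-class count divided by the minority-class count, the function names are hypothetical, and the boundary case of a ratio exactly equal to 9, which the abstract does not cover, is grouped with the lower branch.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (assumed definition)."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes to compute an imbalance ratio")
    return max(counts.values()) / min(counts.values())


def suggested_resampling(labels, threshold=9.0):
    """Return the resampling results the abstract suggests consulting."""
    ratio = imbalance_ratio(labels)
    if ratio > threshold:
        # Highly imbalanced data: rely on the undersampling (US) results.
        return ["US"]
    # Ratio below the cut-off: consider SMOTE and US together.
    # A ratio exactly at the threshold is not covered by the abstract;
    # placing it here is an assumption.
    return ["SMOTE", "US"]


# Hypothetical label vectors: 95 vs 5 (ratio 19) and 80 vs 20 (ratio 4).
highly_imbalanced = [0] * 95 + [1] * 5
mildly_imbalanced = [0] * 80 + [1] * 20
```

For example, `suggested_resampling(highly_imbalanced)` returns `["US"]`, while `suggested_resampling(mildly_imbalanced)` returns `["SMOTE", "US"]`.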
12
Liu SQ, Ma XB, Song WM, Li YF, Li N, Wang LN, Liu JY, Tao NN, Li SJ, Xu TT, Zhang QY, An QQ, Liang B, Li HC. Using a risk model for probability of cancer in pulmonary nodules. Thorac Cancer 2021; 12:1881-1889. [PMID: 33973725 PMCID: PMC8201526 DOI: 10.1111/1759-7714.13991] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/25/2021] [Revised: 04/19/2021] [Indexed: 12/24/2022] Open
Abstract
Background Considering the high morbidity and mortality of lung cancer and the high incidence of pulmonary nodules, clearly distinguishing benign from malignant lung nodules at an early stage is of great significance. However, determining which kinds of lung nodule are more likely to be malignant remains a problem worldwide. Methods A total of 480 patients with pulmonary nodule data were collected from Shandong, China. We assessed the clinical characteristics and computed tomography (CT) imaging features of pulmonary nodules in patients who had undergone video‐assisted thoracoscopic surgery (VATS) lobectomy from 2013 to 2018. Preliminary selection of features was based on a statistical analysis using SPSS. We used WEKA to assess the machine learning models using its multiple algorithms and selected the best decision tree model using its optimization algorithm. Results The combination of decision tree and logistic regression optimized the decision tree without affecting its AUC. The decision tree structure showed that lobulation was the most important feature, followed by spiculation, vessel convergence sign, nodule type, satellite nodule, nodule size and age of patient. Conclusions Our study shows that decision tree analyses can be applied to screen individuals for early lung cancer with CT. Our decision tree provides a new way to help clinicians establish a logical diagnosis by a stepwise progression method, but it still needs to be validated in prospective trials in a larger patient population.
Affiliation(s)
- Si-Qi Liu
- Department of Respiratory and Critical Care Medicine, Shandong Provincial Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China.,Cheeloo College of Medicine, Shandong University, Jinan, China
- Xiao-Bin Ma
- Department of Respiratory and Critical Care Medicine, Shandong Provincial Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China
- Wan-Mei Song
- Department of Respiratory and Critical Care Medicine, Shandong Provincial Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China.,Cheeloo College of Medicine, Shandong University, Jinan, China
- Yi-Fan Li
- Department of Respiratory and Critical Care Medicine, Shandong Provincial Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China
- Ning Li
- Shandong Medical Imaging Research Institute, Cheeloo College of Medicine, Shandong University, Jinan, China
- Li-Na Wang
- Department of Medical Imaging, Shandong Provincial Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China
- Jin-Yue Liu
- Department of Intensive Care Unit, Shandong Provincial Third Hospital, Jinan, China
- Ning-Ning Tao
- Department of Respiratory and Critical Care Medicine, Beijing Hospital, Beijing, China.,Graduate School of Peking Union Medical College, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Shi-Jin Li
- Department of Respiratory and Critical Care Medicine, Shandong Provincial Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China.,Cheeloo College of Medicine, Shandong University, Jinan, China
- Ting-Ting Xu
- Department of Respiratory and Critical Care Medicine, Shandong Provincial Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China
- Qian-Yun Zhang
- Department of Respiratory and Critical Care Medicine, Shandong Provincial Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China.,Cheeloo College of Medicine, Shandong University, Jinan, China
- Qi-Qi An
- Department of Respiratory and Critical Care Medicine, Shandong Provincial Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China.,Cheeloo College of Medicine, Shandong University, Jinan, China
- Bin Liang
- Department of Respiratory and Critical Care Medicine, Shandong Provincial Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China
- Huai-Chen Li
- Department of Respiratory and Critical Care Medicine, Shandong Provincial Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China.,College of Traditional Chinese Medicine, Shandong University of Traditional Chinese Medicine, Jinan, China
|
13
|
Saberi-Karimian M, Khorasanchi Z, Ghazizadeh H, Tayefi M, Saffar S, Ferns GA, Ghayour-Mobarhan M. Potential value and impact of data mining and machine learning in clinical diagnostics. Crit Rev Clin Lab Sci 2021; 58:275-296. [PMID: 33739235 DOI: 10.1080/10408363.2020.1857681] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Indexed: 12/14/2022]
Abstract
Data mining involves the use of mathematical sciences, statistics, artificial intelligence, and machine learning to determine the relationships between variables from a large sample of data. It has previously been shown that data mining can improve the prediction and diagnostic precision of type 2 diabetes mellitus. A few studies have applied machine learning to assess hypertension and metabolic syndrome-related biomarkers, as well as refine the assessment of cardiovascular disease risk. Machine learning methods have also been applied to assess new biomarkers and survival outcomes in patients with renal diseases to predict the development of chronic kidney disease, disease progression, and renal graft survival. In the latter, random forest methods were found to be the best for the prediction of chronic kidney disease. Some studies have investigated the prognosis of nonalcoholic fatty liver disease and acute liver failure, as well as therapy response prediction in patients with viral disorders, using decision tree models. Machine learning techniques, such as Sparse High-Order Interaction Model with Rejection Option, have been used for diagnosing Alzheimer's disease. Data mining techniques have also been applied to identify the risk factors for serious mental illness, such as depression and dementia, and help to diagnose and predict the quality of life of such patients. In relation to child health, some studies have determined the best algorithms for predicting obesity and malnutrition. Machine learning has determined the important risk factors for preterm birth and low birth weight. Published studies of patients with cancer and bacterial diseases are limited and should perhaps be addressed more comprehensively in future studies. 
Herein, we provide an in-depth review of studies in which biochemical biomarker data were analyzed using machine learning methods to assess the risk of several common diseases, in order to summarize the potential applications of data mining methods in clinical diagnosis. Data mining techniques have now been increasingly applied to clinical diagnostics, and they have the potential to support this field.
Affiliation(s)
- Maryam Saberi-Karimian
- International UNESCO Center for Health Related Basic Sciences and Human Nutrition, Mashhad University of Medical Sciences, Mashhad, Iran.,Student Research Committee, Mashhad University of Medical Sciences, Mashhad, Iran
- Zahra Khorasanchi
- Department of Nutrition, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Hamideh Ghazizadeh
- International UNESCO Center for Health Related Basic Sciences and Human Nutrition, Mashhad University of Medical Sciences, Mashhad, Iran.,Student Research Committee, Mashhad University of Medical Sciences, Mashhad, Iran
- Maryam Tayefi
- Norwegian Center for e-health Research, University Hospital of North Norway, Tromsø, Norway
- Sara Saffar
- International UNESCO Center for Health Related Basic Sciences and Human Nutrition, Mashhad University of Medical Sciences, Mashhad, Iran
- Gordon A Ferns
- Division of Medical Education, Brighton and Sussex Medical School, Falmer, UK
- Majid Ghayour-Mobarhan
- International UNESCO Center for Health Related Basic Sciences and Human Nutrition, Mashhad University of Medical Sciences, Mashhad, Iran
|
14
|
|
15
|
Contador Pachón S, Botella Serrano M, Garnica Alcázar O, Velasco Cabo JM, Aramendi Zurimendi A, Rodríguez Martínez R, Maqueda Villaizán E, Hidalgo Pérez JI. Identification of blood glucose patterns in patients with type 1 diabetes using continuous glucose monitoring and clustering technique. ENDOCRINOL DIAB NUTR 2021; 68:170-174. [PMID: 34167696 DOI: 10.1016/j.endien.2021.06.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2019] [Accepted: 12/24/2019] [Indexed: 06/13/2023]
Abstract
OBJECTIVE To show that statistical techniques allow for obtaining a reduced number of four-hour glucose profiles that can identify any glucose behavior in patients with type 1 diabetes mellitus. PATIENTS AND METHODS A retrospective study of 10 patients with type 1 diabetes mellitus was conducted using data collected by continuous glucose monitoring. A data mining technique based on decision trees called CHAID (Chi-square Automatic Interaction Detection) was used to classify glucose profiles into groups using two decision criteria: (1) the day of the week and (2) the time slot, with each day divided into six sections of four hours each. Clustering was performed according to the glucose levels recorded using the statistically significant differences found. RESULTS Significant differences (P-value <.05) and dependencies were seen between the glucose profiles classified depending on the independent variables 'day of the week' and 'time slot'. The relationships found were different for each patient, showing the need for individualized studies. CONCLUSIONS The results obtained will facilitate mathematical modeling of glucose, and can be used to develop an individualized classifier for each patient that categorizes glucose profiles based on the day of the week and time slot variables. Using this classifier, it will be possible to predict a patient's glucose levels given the day of the week and the time slot, leading to more precise models. Healthcare professionals will also be able to improve patient habits and therapies.
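The two decision criteria in this abstract (day of the week, and one of six four-hour slots) can be derived from a reading's timestamp. The sketch below is a hypothetical illustration, not the authors' procedure: it assumes the first slot starts at midnight, which the abstract does not state, and the function name is made up for the example.

```python
from datetime import datetime


def glucose_profile_key(timestamp: datetime) -> tuple[int, int]:
    """Map a reading's timestamp to (day_of_week, time_slot).

    day_of_week: 0 = Monday ... 6 = Sunday (Python's convention).
    time_slot:   0..5, where slot k covers hours [4k, 4k + 4);
                 starting the first slot at midnight is an assumption.
    """
    return timestamp.weekday(), timestamp.hour // 4


# Illustrative timestamp only: a Monday at 09:30.
reading_time = datetime(2021, 6, 14, 9, 30)
print(glucose_profile_key(reading_time))  # (0, 2)
```

Grouping readings by this key gives at most 7 x 6 = 42 cells per patient, which is the kind of input a tree-based method such as CHAID could then merge into a smaller set of profiles.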
Affiliation(s)
- Marta Botella Serrano
- Servicio de Endocrinología y Nutrición, Hospital Universitario Príncipe de Asturias, Alcalá de Henares, Madrid, Spain
- Oscar Garnica Alcázar
- Departamento de Arquitectura de Computadores y Automática, Universidad Complutense de Madrid, Madrid, Spain
- José Manuel Velasco Cabo
- Departamento de Arquitectura de Computadores y Automática, Universidad Complutense de Madrid, Madrid, Spain
- Aranzanzu Aramendi Zurimendi
- Servicio de Endocrinología y Nutrición, Hospital Universitario Príncipe de Asturias, Alcalá de Henares, Madrid, Spain
- Remedios Rodríguez Martínez
- Servicio de Endocrinología y Nutrición, Hospital Universitario Príncipe de Asturias, Alcalá de Henares, Madrid, Spain
- José Ignacio Hidalgo Pérez
- Departamento de Arquitectura de Computadores y Automática, Universidad Complutense de Madrid, Madrid, Spain.
|
16
|
Contador Pachón S, Botella Serrano M, Garnica Alcázar O, Velasco Cabo JM, Aramendi Zurimendi A, Rodríguez Martínez R, Maqueda Villaizán E, Hidalgo Pérez JI. Identificación de patrones de glucemia en pacientes con diabetes tipo 1 mediante monitorización continua de glucosa y técnicas de clusterización. ENDOCRINOL DIAB NUTR 2021; 68:170-174. [DOI: 10.1016/j.endinu.2019.12.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2019] [Revised: 12/22/2019] [Accepted: 12/24/2019] [Indexed: 10/24/2022]
|
17
|
Aim and Diabetes. Artif Intell Med 2021. [DOI: 10.1007/978-3-030-58080-3_158-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|
18
|
Amirabadizadeh A, Nakhaee S, Mehrpour O. Risk assessment of elevated blood lead concentrations in the adult population using a decision tree approach. Drug Chem Toxicol 2020; 45:878-885. [DOI: 10.1080/01480545.2020.1783286] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 10/24/2022]
Affiliation(s)
- Alireza Amirabadizadeh
- Medical Toxicology and Drug Abuse Research Center (MTDRC), Birjand University of Medical Sciences, Birjand, Iran
- Samaneh Nakhaee
- Medical Toxicology and Drug Abuse Research Center (MTDRC), Birjand University of Medical Sciences, Birjand, Iran
- Omid Mehrpour
- Medical Toxicology and Drug Abuse Research Center (MTDRC), Birjand University of Medical Sciences, Birjand, Iran
- Rocky Mountain Poison and Drug Safety, Denver Health and Hospital Authority, Denver, CO, USA
|
19
|
Zhang L, Wang Y, Niu M, Wang C, Wang Z. Machine learning for characterizing risk of type 2 diabetes mellitus in a rural Chinese population: the Henan Rural Cohort Study. Sci Rep 2020; 10:4406. [PMID: 32157171 PMCID: PMC7064542 DOI: 10.1038/s41598-020-61123-x] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 09/11/2019] [Accepted: 02/19/2020] [Indexed: 01/19/2023] Open
Abstract
With the development of data mining, machine learning offers opportunities to improve discrimination by analyzing complex interactions among massive variables. To test the ability of machine learning algorithms to predict the risk of type 2 diabetes mellitus (T2DM) in a rural Chinese population, we focused on a total of 36,652 eligible participants from the Henan Rural Cohort Study. Risk assessment models for T2DM were developed using six machine learning algorithms: logistic regression (LR), classification and regression tree (CART), artificial neural networks (ANN), support vector machine (SVM), random forest (RF) and gradient boosting machine (GBM). Model performance was measured by the area under the receiver operating characteristic curve, sensitivity, specificity, positive predictive value, negative predictive value and area under the precision-recall curve. The importance of variables was identified based on each classifier and the Shapley additive explanations approach. Using all available variables, all models for predicting risk of T2DM demonstrated strong predictive performance, with AUCs ranging between 0.811 and 0.872 using laboratory data and from 0.767 to 0.817 without laboratory data. Among them, the GBM model performed best (AUC: 0.872 with laboratory data and 0.817 without laboratory data). The performance of the models plateaued once 30 variables had been introduced into each model, except for the CART model. Among the top-10 variables across all methods were sweet flavor, urine glucose, age, heart rate, creatinine, waist circumference, uric acid, pulse pressure, insulin, and hypertension. New important risk factors (urinary indicators, sweet flavor) were not found in previous risk prediction methods, but were determined by machine learning in our study. Overall, machine learning methods showed competence in predicting risk of T2DM, leading to greater insights into disease risk factors with no a priori assumption of causality.
Affiliation(s)
- Liying Zhang
- School of Information Engineering, Zhengzhou University, Zhengzhou, Henan, P.R. China
- Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan, P.R. China
- Yikang Wang
- Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan, P.R. China
- Miaomiao Niu
- Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan, P.R. China
- Chongjian Wang
- Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan, P.R. China
- Zhenfei Wang
- School of Information Engineering, Zhengzhou University, Zhengzhou, Henan, P.R. China.
|
20
|
Vallée A, Cinaud A, Protogerou A, Zhang Y, Topouchian J, Safar ME, Blacher J. Arterial Stiffness and Coronary Ischemia: New Aspects and Paradigms. Curr Hypertens Rep 2020; 22:5. [PMID: 31925555 DOI: 10.1007/s11906-019-1006-z] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Indexed: 12/18/2022]
Abstract
PURPOSE OF REVIEW Aortic stiffness (AS) is widely associated with hypertension and considered a major predictor of coronary heart disease (CHD). AS is measured using carotid-femoral pulse wave velocity (PWV), particularly when this parameter is combined with an index involving age, gender, heart rate, and mean blood pressure. The present review focuses on the value of measuring PWV and calculating an individual PWV index for the prediction of CHD, in addition to the use of new nonlinear statistical models that yield results with very high levels of accuracy. RECENT FINDINGS The PWV index may thus constitute a substantial marker of large-artery prediction and damage in CHD and may also be used in models of the cerebrovascular and renal circulations. PWV index determinations are particularly relevant to consider in angiographic CHD decisions and in the presence of vulnerable plaques with high cardiovascular risk. Owing to the variability in symptoms and clinical characteristics of patients, together with some imperfections in results, there is no simple, adequate diagnostic approach that improves CHD prediction, as so defined, in usual clinical practice. Recent work on "artificial intelligence" involving "decision tree" models and "artificial neural networks" has made it possible to determine consistent pathways that introduce predictive medicine and yield efficient algorithmic classification models for coronary prediction.
Affiliation(s)
- Alexandre Vallée
- Diagnosis and Therapeutic Center, Hypertension and Cardiovascular Prevention Unit, Hôtel-Dieu Hospital, Paris-Descartes University, AP-HP, Paris, France.
- Alexandre Cinaud
- Diagnosis and Therapeutic Center, Hypertension and Cardiovascular Prevention Unit, Hôtel-Dieu Hospital, Paris-Descartes University, AP-HP, Paris, France
- Athanase Protogerou
- Cardiovascular Prevention and Research Unit, Department of Pathophysiology, National and Kapodistrian University of Athens, Athens, Greece
- Yi Zhang
- Department of Cardiology, Shanghai Tenth People's Hospital, Tongji University School of Medicine, Shanghai, China
- Jirar Topouchian
- Diagnosis and Therapeutic Center, Hypertension and Cardiovascular Prevention Unit, Hôtel-Dieu Hospital, Paris-Descartes University, AP-HP, Paris, France
- Michel E Safar
- Diagnosis and Therapeutic Center, Hypertension and Cardiovascular Prevention Unit, Hôtel-Dieu Hospital, Paris-Descartes University, AP-HP, Paris, France
- Jacques Blacher
- Diagnosis and Therapeutic Center, Hypertension and Cardiovascular Prevention Unit, Hôtel-Dieu Hospital, Paris-Descartes University, AP-HP, Paris, France
|
21
|
Xue M, Su Y, Li C, Wang S, Yao H. Identification of Potential Type II Diabetes in a Large-Scale Chinese Population Using a Systematic Machine Learning Framework. J Diabetes Res 2020; 2020:6873891. [PMID: 33029536 PMCID: PMC7532405 DOI: 10.1155/2020/6873891] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/12/2020] [Revised: 08/01/2020] [Accepted: 09/02/2020] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND An estimated 425 million people globally have diabetes, accounting for 12% of the world's health expenditures, and the number continues to grow, placing a huge burden on the healthcare system, especially in remote, underserved areas. METHODS A total of 584,168 adult subjects who participated in the national physical examination were enrolled in this study. The risk factors for type II diabetes mellitus (T2DM) were identified by p values and odds ratios, using logistic regression (LR) based on variables from physical measurement and a questionnaire. Combined with the risk factors selected by LR, we used a decision tree, a random forest, AdaBoost with a decision tree (AdaBoost), and an extreme gradient boosting decision tree (XGBoost) to identify individuals with T2DM, compared the performance of the four machine learning classifiers, and used the best-performing classifier to output variable importance scores for T2DM. RESULTS The results indicated that XGBoost had the best performance (accuracy = 0.906, precision = 0.910, recall = 0.902, F-1 = 0.906, and AUC = 0.968). The variable importance scores in XGBoost showed that BMI was the most significant feature, followed by age, waist circumference, systolic pressure, ethnicity, smoking amount, fatty liver, hypertension, physical activity, drinking status, dietary ratio (meat to vegetables), drink amount, smoking status, and diet habit (oil loving). CONCLUSIONS We proposed a classifier based on LR-XGBoost which uses fourteen easily obtained, noninvasive patient variables as predictors to identify potential incidents of T2DM. The classifier can accurately screen for the risk of diabetes in the early phase, and the variable importance scores give a clue for preventing diabetes occurrence.
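The F-1 value reported in this abstract can be checked against the reported precision and recall, since F1 is their harmonic mean. The sketch below uses only the standard F1 formula; the function name is hypothetical and the inputs are the rounded figures quoted in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for XGBoost in the abstract.
print(round(f1_score(0.910, 0.902), 3))  # 0.906
```

The result agrees with the reported F-1 of 0.906 to three decimal places.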
Affiliation(s)
- Mingyue Xue
- Hospital of Traditional Chinese Medicine Affiliated to the Fourth Clinical Medical College of Xinjiang Medical University, Urumqi, China
- College of Public Health, Xinjiang Medical University, Urumqi, China
- Yinxia Su
- College of Public Health, Xinjiang Medical University, Urumqi, China
- Chen Li
- The First Affiliated Hospital of Xinjiang Medical University, Urumqi, China
- Shuxia Wang
- Center of Health Management, The First Affiliated Hospital, Xinjiang Medical University, Urumqi, China
- Hua Yao
- Center of Health Management, The First Affiliated Hospital, Xinjiang Medical University, Urumqi, China
|
22
|
Pei D, Yang T, Zhang C. Estimation of Diabetes in a High-Risk Adult Chinese Population Using J48 Decision Tree Model. Diabetes Metab Syndr Obes 2020; 13:4621-4630. [PMID: 33273837 PMCID: PMC7705272 DOI: 10.2147/dmso.s279329] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/28/2020] [Accepted: 10/27/2020] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND Predicting and diagnosing diabetes early is critical in a population at high risk of diabetes, one of the most devastating diseases globally. Conventional blood tests are recommended for screening suspected patients; however, these tests can have adverse health effects and are expensive. The goal of this study was to establish a simple and reliable predictive model based on the risk factors associated with diabetes using a decision tree algorithm. METHODS This was a retrospective cross-sectional study. A total of 10,436 participants who had a health check-up from January 2017 to July 2017 were recruited. With appropriate data mining approaches, 3454 participants remained in the final dataset for further analysis. Seventy percent of these participants (2420 cases) were randomly allocated to the training dataset for construction of the decision tree, and the remaining 30% (1034 cases) to the testing dataset for evaluation of its performance. The cost-sensitive J48 algorithm was used to develop the decision tree model. RESULTS Utilizing all the key features of the dataset, consisting of 14 input variables and two output variables, the constructed decision tree model identified several key factors that are closely linked to the development of diabetes and are also modifiable. Furthermore, our model achieved a classification accuracy of 90.3% with a precision of 89.7% and a recall of 90.3%. CONCLUSION By applying simple and cost-effective classification rules, our decision tree model estimates the development of diabetes in a high-risk adult Chinese population with strong potential for implementation in diabetes management.
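The random 70/30 allocation described in the methods can be sketched as a shuffle followed by a cut at the training-set size. This is a generic illustration, not the authors' tooling: the function name and the seed are hypothetical, and the integer IDs stand in for participant records. Only the dataset sizes (3454, 2420, 1034) come from the abstract.

```python
import random


def split_train_test(records, n_train, seed=0):
    """Shuffle a copy of the records and cut it into training and testing sets.

    The seed is a hypothetical choice made only to keep the example repeatable.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder participant IDs; sizes follow the abstract (3454 = 2420 + 1034).
participants = list(range(3454))
train, test = split_train_test(participants, n_train=2420)
print(len(train), len(test))  # 2420 1034
```

Passing the training count directly, rather than a 0.7 fraction, reproduces the reported sizes exactly, since 70% of 3454 is not a whole number.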
Affiliation(s)
- Dongmei Pei
- Department of Health Management, Shengjing Hospital of China Medical University, Shenyang, People’s Republic of China
- Correspondence: Dongmei Pei, Department of Health Management, Shengjing Hospital of China Medical University, No. 36, Sanhao Street, Heping District, Shenyang 110004, People’s Republic of China
- Tengfei Yang
- Department of Health Management, Shengjing Hospital of China Medical University, Shenyang, People’s Republic of China
- Chengpu Zhang
- Department of Health Management, Shengjing Hospital of China Medical University, Shenyang, People’s Republic of China
|
23
|
Exploring the Important Attributes of Human Immunodeficiency Virus and Generating Decision Rules. Symmetry (Basel) 2020. [DOI: 10.3390/sym12010067] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022] Open
Abstract
Acquired Immunodeficiency Syndrome (AIDS) is the most severe phase of Human Immunodeficiency Virus (HIV) infection. Living with HIV results in a weakened immune system, with AIDS being the final stage of HIV and puzzling the world. The current medical environment remains unable to effectively cure AIDS, with treatment depending on long-term antiretroviral therapy (ART). To effectively treat and prevent HIV, it is important to elucidate the key factors of HIV propagation. This study proposes a rough set classifier based on adding recency (R) (i.e., the last physician visit), frequency (F) (i.e., the frequency of medical visits), and monetary (M) (i.e., medication adherence) attributes and integrated attribute selection methods to generate discriminatory rules and find the core attributes of HIV. The collected data consist of 1308 HIV infection records from Taiwan. From the experimental results, the frequency of CD4+ cells in the peripheral blood is able to determine patient medication, treatment willingness, and HIV infection stages, because HIV patients are less likely to be willing to receive long-term ART. Furthermore, drug abuse is found to be the greatest cause of HIV infection. These results show that the additional RFM attributes can improve classification accuracy, with the core attributes being M, R, plasma viral load (PVL) and age. Hence, we suggest that clinical physicians use these core attributes to understand the HIV infection stages.
|
24
|
Vallée A, Safar ME, Blacher J. Application of a decision tree to establish factors associated with a nomogram of aortic stiffness. J Clin Hypertens (Greenwich) 2019; 21:1484-1492. [PMID: 31479194 DOI: 10.1111/jch.13662] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 04/12/2019] [Revised: 05/20/2019] [Accepted: 05/28/2019] [Indexed: 11/29/2022]
Abstract
Aortic stiffness is a marker of vascular aging and may reflect the occurrence of cardiovascular (CV) diseases. Aortic pulse wave velocity (PWV), a marker of aortic stiffness, can be measured by applanation tonometry. A nomogram of aortic stiffness was evaluated by calculating the PWV index. Theoretical PWV can be calculated according to age, gender, mean blood pressure, and heart rate, which allows an individual PWV index to be formed [(measured PWV - theoretical PWV)/theoretical PWV]. The purpose of the present cross-sectional study was to investigate the determinants of the PWV index by applying a decision tree. A cross-sectional study was conducted from 2012 to 2017, and 597 individuals were included. A training decision tree was constructed based on seventy percent of these subjects (N = 428). The remaining 30% (N = 169) were used as the testing dataset to evaluate the performance of the decision trees. The input variables for the models were clinical and biochemical parameters. The input variables retained in the model were diabetes, tobacco status, carotid plaque, albuminuria, C-reactive protein, total cholesterol, BMI, and previous CV diseases. For the validation decision model, the sensitivity, specificity, and accuracy values for identifying the related risk factors of PWV index were 70%, 78%, and 0.73. Since the determinants of the PWV index were all well-accepted CV risk factors, a nomogram of aortic stiffness could be considered an integrator of CV risk factors over their duration of exposure and could be used to develop future programs for CV risk assessment and reduction strategies.
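The PWV index defined in this abstract is a relative deviation of the measured value from the theoretical one. The sketch below implements only that bracketed formula; the function name is hypothetical, the example values are illustrative rather than study data, and the calculation of theoretical PWV from age, gender, mean blood pressure, and heart rate is not given in the abstract and is therefore not attempted.

```python
def pwv_index(measured_pwv: float, theoretical_pwv: float) -> float:
    """(measured PWV - theoretical PWV) / theoretical PWV, as quoted in the abstract.

    Both inputs must be in the same unit; theoretical PWV must be positive.
    """
    if theoretical_pwv <= 0:
        raise ValueError("theoretical PWV must be positive")
    return (measured_pwv - theoretical_pwv) / theoretical_pwv


# Illustrative values only: measured PWV 10% above the theoretical value.
print(pwv_index(11.0, 10.0))  # 0.1
```

A positive index means the measured PWV exceeds the value expected for the individual's age, gender, blood pressure, and heart rate; a negative index means it is lower.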
Affiliation(s)
- Alexandre Vallée
- Diagnosis and Therapeutic Center, Hypertension and Cardiovascular Prevention Unit, Hôtel-Dieu Hospital, AP-HP, Paris-Descartes University, Paris, France
- Michel E Safar
- Diagnosis and Therapeutic Center, Hypertension and Cardiovascular Prevention Unit, Hôtel-Dieu Hospital, AP-HP, Paris-Descartes University, Paris, France
- Jacques Blacher
- Diagnosis and Therapeutic Center, Hypertension and Cardiovascular Prevention Unit, Hôtel-Dieu Hospital, AP-HP, Paris-Descartes University, Paris, France
|
25
|
Gonoodi K, Tayefi M, Bahrami A, Amirabadi Zadeh A, Ferns GA, Mohammadi F, Eslami S, Ghayour Mobarhan M. Determinants of the magnitude of response to vitamin D supplementation in adolescent girls identified using a decision tree algorithm. Biofactors 2019; 45:795-802. [PMID: 31355993 DOI: 10.1002/biof.1540] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/17/2019] [Accepted: 06/13/2019] [Indexed: 12/23/2022]
Abstract
Vitamin D (VitD) supplementation is an inexpensive and effective approach for improving VitD insufficiency/deficiency. However, the response to supplementation, measured as the increase in serum 25(OH)D level, varies between individuals. In this study, we assessed the factors associated with the response to VitD supplementation using a decision-tree algorithm. Serum VitD levels pre- and post-VitD supplementation were used as the determinant of responsiveness. The model was validated by constructing a receiver operating characteristic curve. Serum VitD level at baseline was at the apex of the tree in our model, followed by serum low-density lipoprotein cholesterol and triglyceride, age, waist-hip ratio, and high-density lipoprotein cholesterol. Our model identified these determinants of responsiveness to VitD supplementation with a sensitivity, specificity, and accuracy of 59.4%, 75.8%, and 69.3%, respectively. The decision tree model appears to be a relatively accurate, specific, and sensitive approach for identifying the factors associated with response to VitD supplementation.
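Sensitivity, specificity, and accuracy, as reported for this model, are all derived from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function name is hypothetical and the counts are invented for illustration, since the abstract does not report the underlying confusion matrix.

```python
def diagnostic_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among actual positives
        "specificity": tn / (tn + fp),  # true negatives among actual negatives
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Invented counts for illustration only, not taken from the study.
metrics = diagnostic_metrics(tp=30, fn=20, fp=10, tn=40)
print(metrics["sensitivity"], metrics["specificity"], metrics["accuracy"])  # 0.6 0.8 0.7
```

Because accuracy weights the two classes by their sizes, it always lies between sensitivity and specificity, which is consistent with the reported 59.4%, 75.8%, and 69.3%.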
Affiliation(s)
- Kayhan Gonoodi
- Department of Nutrition, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Maryam Tayefi
- Norwegian Center for e-health Research, University hospital of North Norway, Tromsø, Norway
- Clinical Research Unit, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Afsane Bahrami
- Cellular and Molecular Research Center, Birjand University of Medical Sciences, Birjand, Iran
| | - Alireza Amirabadi Zadeh
- Medical Toxicology and Drug Abuse Research Center (MTDRC), Birjand University of Medical Sciences, Birjand, Iran
| | - Gordon A Ferns
- Brighton & Sussex Medical School, Division of Medical Education, Sussex, UK
| | - Farzaneh Mohammadi
- Department of Nutrition, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Saeid Eslami
- Pharmaceutical Research Center, Mashhad University of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Department of Medical Informatics, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Majid Ghayour Mobarhan
- Department of Nutrition, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
|
26
|
Vallée A, Petruescu L, Kretz S, Safar ME, Blacher J. Added Value of Aortic Pulse Wave Velocity Index in a Predictive Diagnosis Decision Tree of Coronary Heart Disease. Am J Hypertens 2019; 32:375-383. [PMID: 30624553 DOI: 10.1093/ajh/hpz004] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 11/16/2018] [Revised: 01/01/2019] [Accepted: 01/08/2019] [Indexed: 11/12/2022] Open
Abstract
BACKGROUND Coronary heart disease (CHD) is among the main causes of death in the world. Individual study of cardiovascular risk is an important way to predict CHD risk. The aim of this study was to evaluate the added role of the aortic pulse wave velocity (PWV) index in the prediction of CHD risk. METHODS A cross-sectional study was conducted from December 2012 to September 2017; 530 patients were included: 99 CHD, 338 non-CHD patients, and 93 nonhypertensives, nondiabetics and non-CHD subjects, whose theoretical PWV were calculated. Theoretical PWV was calculated according to age, blood pressure, gender, and heart rate. The results were expressed as an index ((measured PWV - theoretical PWV)/theoretical PWV) for each patient. The differences observed, the differential diagnostic performance, and the quantification of the added value of diagnostic performance of PWV index were tested using logistic regression, comparisons between receiver operating characteristic (ROC) curves, and decision tree nonlinear methodology. RESULTS PWV index (P = 0.006), carotid plaque (P = 0.005), and dyslipidemia (P = 0.04) were the independent modulators of CHD diagnosis. PWV index appears to be the highest specific classifier (81%) compared to carotid plaque (75%) and dyslipidemia (78%). For the decision tree, sensitivity, specificity, and area under the ROC curve for CHD diagnosis were 62%, 83%, and 0.87, respectively. CONCLUSIONS PWV index yielded added value to CHD by assessment of combined classifiers with clinical determinants and decision tree construction and significantly increased the specificity of the differential diagnostic performances of the common risk factors of CHD in daily clinical practice.
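The PWV index defined in the Methods is a relative deviation, (measured PWV - theoretical PWV) / theoretical PWV. A minimal sketch of that arithmetic follows; the function name and the example values are assumptions for illustration, and the abstract's model for theoretical PWV (based on age, blood pressure, gender, and heart rate) is not reproduced here.

```python
def pwv_index(measured_pwv: float, theoretical_pwv: float) -> float:
    """Relative deviation of measured PWV from theoretical PWV.

    Returns (measured - theoretical) / theoretical, so 0.0 means the
    measurement matches the theoretical value and 0.2 means 20% above it.
    """
    if theoretical_pwv <= 0:
        raise ValueError("theoretical PWV must be positive")
    return (measured_pwv - theoretical_pwv) / theoretical_pwv


# Hypothetical values in m/s, for illustration only.
index = pwv_index(measured_pwv=12.0, theoretical_pwv=10.0)
print(f"PWV index: {index:+.2f}")
```

Because the index is normalized by the theoretical value, it is dimensionless and comparable across patients whose expected PWV differs.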
Affiliation(s)
- Alexandre Vallée
- Paris-Descartes University, Assistance Publique-Hôpitaux de Paris (AP-HP), Paris, France
- Diagnosis and Therapeutic Center, Hypertension and Cardiovascular Prevention Unit, Hôtel-Dieu Hospital, Assistance Publique-Hôpitaux de Paris (AP-HP), Paris, France
| | - Laura Petruescu
- Paris-Descartes University, Assistance Publique-Hôpitaux de Paris (AP-HP), Paris, France
- Diagnosis and Therapeutic Center, Hypertension and Cardiovascular Prevention Unit, Hôtel-Dieu Hospital, Assistance Publique-Hôpitaux de Paris (AP-HP), Paris, France
| | - Sandrine Kretz
- Paris-Descartes University, Assistance Publique-Hôpitaux de Paris (AP-HP), Paris, France
- Diagnosis and Therapeutic Center, Hypertension and Cardiovascular Prevention Unit, Hôtel-Dieu Hospital, Assistance Publique-Hôpitaux de Paris (AP-HP), Paris, France
| | - Michel E Safar
- Paris-Descartes University, Assistance Publique-Hôpitaux de Paris (AP-HP), Paris, France
- Diagnosis and Therapeutic Center, Hypertension and Cardiovascular Prevention Unit, Hôtel-Dieu Hospital, Assistance Publique-Hôpitaux de Paris (AP-HP), Paris, France
| | - Jacques Blacher
- Paris-Descartes University, Assistance Publique-Hôpitaux de Paris (AP-HP), Paris, France
- Diagnosis and Therapeutic Center, Hypertension and Cardiovascular Prevention Unit, Hôtel-Dieu Hospital, Assistance Publique-Hôpitaux de Paris (AP-HP), Paris, France
|
27
|
Pei D, Gong Y, Kang H, Zhang C, Guo Q. Accurate and rapid screening model for potential diabetes mellitus. BMC Med Inform Decis Mak 2019; 19:41. [PMID: 30866905 PMCID: PMC6416888 DOI: 10.1186/s12911-019-0790-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 02/28/2018] [Accepted: 03/03/2019] [Indexed: 11/26/2022] Open
Abstract
Background Prediction or early diagnosis of diabetes is crucial for populations with high risk of diabetes. Methods In this study, we assessed the ability of five popular classifiers (J48, AdaboostM1, SMO, Bayes Net, and Naïve Bayes) to identify individuals with diabetes based on nine non-invasive and easily obtained clinical features, including age, gender, body mass index (BMI), hypertension, history of cardiovascular disease or stroke, family history of diabetes, physical activity, work stress, and salty food preference. A total of 4205 data entries were obtained from annual physical examination reports for adults in the Shengjing Hospital of China Medical University during January–April 2017. Weka data mining software was used to identify the best algorithm for diabetes classification. Results The results indicate that decision tree classifier J48 has the best performance (accuracy = 0.9503, precision = 0.950, recall = 0.950, F-measure = 0.948, and AUC = 0.964). The decision tree structure shows that age is the most significant feature, followed by family history of diabetes, work stress, BMI, salty food preference, physical activity, hypertension, gender, and history of cardiovascular disease or stroke. Conclusions Our study shows that decision tree analyses can be applied to screen individuals for early diabetes risk without the need for invasive tests. This procedure will be particularly useful in developing regions with high epidemiological risk and poor socioeconomic status, and enable clinical practitioners to rapidly screen patients for increased risk of diabetes. The key features in the tree structure could further facilitate diabetes prevention through targeted community interventions, which can potentially improve early diabetes diagnosis and reduce burdens on the healthcare system.
Affiliation(s)
- Dongmei Pei
- Department of Family Medicine, Shengjing Hospital, China Medical University, Shenyang, Liaoning, China
| | - Yang Gong
- University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - Hong Kang
- University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - Chengpu Zhang
- Department of Family Medicine, Shengjing Hospital, China Medical University, Shenyang, Liaoning, China
| | - Qiyong Guo
- Department of Radiology, Shengjing Hospital, China Medical University, Shenyang, Liaoning, China
|
28
|
Pei D, Zhang C, Quan Y, Guo Q. Identification of Potential Type II Diabetes in a Chinese Population with a Sensitive Decision Tree Approach. J Diabetes Res 2019; 2019:4248218. [PMID: 30805372 PMCID: PMC6362481 DOI: 10.1155/2019/4248218] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 07/19/2018] [Revised: 11/20/2018] [Accepted: 12/18/2018] [Indexed: 12/17/2022] Open
Abstract
BACKGROUND Diabetes mellitus is a chronic disease with a steadily increasing prevalence. Because of its chronic course combined with devastating complications, this disorder can easily impose a financial burden. The early diagnosis of diabetes remains one of the major challenges facing medical providers, and satisfactory screening tools or methods are still required, especially population- or community-based tools. METHODS This is a retrospective cross-sectional study involving 15,323 subjects who underwent the annual check-up in the Department of Family Medicine of Shengjing Hospital of China Medical University from January 2017 to June 2017. After strict data filtration, 10,436 records from the eligible participants were used to develop a prediction model using the J48 decision tree algorithm. Nine variables, including age, gender, body mass index (BMI), hypertension, history of cardiovascular disease or stroke, family history of diabetes, physical activity, work-related stress, and salty food preference, were considered. RESULTS The accuracy, precision, recall, and area under the receiver operating characteristic curve (AUC) value for identifying potential diabetes were 94.2%, 94.0%, 94.2%, and 94.8%, respectively. The structure of the decision tree shows that age is the most significant feature. Among participants with age ≤ 49, 5497 (97%) were identified as nondiabetic, while among those with age > 49, 771 (50%) were identified as nondiabetic. In the subgroup with 34 < age ≤ 49 and BMI ≥ 25, 89 (92%) of the 97 individuals with a positive family history of diabetes were identified as diabetic, whereas 576 (58%) of the individuals without a family history of diabetes were identified as nondiabetic. Work-related stress was identified as being associated with diabetes. In individuals with 34 < age ≤ 49 and BMI ≥ 25 and without a family history of diabetes, 22 (51%) of the individuals with high work-related stress were identified as nondiabetic, while 349 (88%) of the individuals with low or moderate work-related stress were identified as nondiabetic. CONCLUSIONS We proposed a decision tree classifier that uses nine easily obtained, noninvasive patient features as predictor variables to identify potential cases of diabetes. The classifier indicates that a decision tree analysis can be successfully applied to screen for diabetes, which will support clinical practitioners in rapid diabetes identification. The model provides a means to target the prevention of diabetes, which could reduce the burden on the health system through effective case management.
Affiliation(s)
- Dongmei Pei
- Department of Family Medicine, Shengjing Hospital, China Medical University, Shenyang, Liaoning, China
| | - Chengpu Zhang
- Department of Family Medicine, Shengjing Hospital, China Medical University, Shenyang, Liaoning, China
| | - Yu Quan
- Department of Informatics, Shengjing Hospital, China Medical University, Shenyang, Liaoning, China
| | - Qiyong Guo
- Department of Radiology, Shengjing Hospital, China Medical University, Shenyang, Liaoning, China
|
29
|
Ramezankhani A, Harati H, Bozorgmanesh M, Tohidi M, Khalili D, Azizi F, Hadaegh F. Diabetes Mellitus: Findings from 20 Years of the Tehran Lipid and Glucose Study. Int J Endocrinol Metab 2018; 16:e84784. [PMID: 30584445 PMCID: PMC6289292 DOI: 10.5812/ijem.84784] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 09/01/2018] [Revised: 10/02/2018] [Accepted: 10/07/2018] [Indexed: 12/17/2022] Open
Abstract
CONTEXT We summarized findings from the Tehran Lipid and Glucose Study (TLGS) about different aspects of type 2 diabetes (T2D) over the span of nearly 2 decades. EVIDENCE ACQUISITION A review was undertaken to retrieve papers related to all aspects of T2D from the earliest date available up to January 30, 2018. RESULTS An annual crude incidence rate of 10 per 1000 person-years of follow-up was found for T2D in adult participants. Overall incidence rate of pre-diabetes/T2D was 36.3 per 1000 person-years or about 1% each year among youth. Diabetes was associated with increased risk of CVD [hazard ratio (HR): 1.86, 95% confidence interval (95% CI): 1.57 - 2.27] and mortality [HR: 2.56; 95% CI: 2.08 - 3.16] in the total population. Compared with non-diabetic men and women, their diabetic counterparts survived 1.4 and 0.7 years less, respectively, during 15 years of follow-up. Wrist circumference, hyperinsulinaemia, 25-hydroxy vitamin D and increase in alanine aminotransferase provided incremental prognostic information beyond the traditional risk factors for incident T2D in adults. Using decision tree algorithms, a number of high risk groups were found for incident T2D. A probability of 84% was found for incidence of T2D among a group of men with fasting plasma glucose (FPG) > 5.3 mmol/L and waist to height ratio (WHtR) > 0.56, and women with FPG > 5.2 mmol/L and WHtR > 0.56. CONCLUSIONS Original TLGS studies have contributed greatly to clarifying important evidence regarding the epidemiology and risk factors for T2D in the Iranian population.
Affiliation(s)
- Azra Ramezankhani
- Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | - Hadi Harati
- Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | - Mohammadreza Bozorgmanesh
- Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | - Maryam Tohidi
- Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | - Davood Khalili
- Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Department of Biostatistics and Epidemiology, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | - Fereidoun Azizi
- Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Endocrine Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | - Farzad Hadaegh
- Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Corresponding Author: Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran. Tel: +98-2122435200, E-mail:
|
30
|
Open data mining for Taiwan's dengue epidemic. Acta Trop 2018; 183:1-7. [PMID: 29549012 DOI: 10.1016/j.actatropica.2018.03.017] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 10/29/2017] [Revised: 02/19/2018] [Accepted: 03/10/2018] [Indexed: 11/22/2022]
Abstract
By using a quantitative approach, this study examines the applicability of data mining technique to discover knowledge from open data related to Taiwan's dengue epidemic. We compare results when Google trend data are included or excluded. Data sources are government open data, climate data, and Google trend data. Research findings from analysis of 70,914 cases are obtained. Location and time (month) in open data show the highest classification power followed by climate variables (temperature and humidity), whereas gender and age show the lowest values. Both prediction accuracy and simplicity decrease when Google trends are considered (respectively 0.94 and 0.37, compared to 0.96 and 0.46). The article demonstrates the value of open data mining in the context of public health care.
|
31
|
Noshad S, Afarideh M, Heidari B, Mechanick JI, Esteghamati A. Diabetes Care in Iran: Where We Stand and Where We Are Headed. Ann Glob Health 2018; 81:839-50. [PMID: 27108151 DOI: 10.1016/j.aogh.2015.10.003] [Citation(s) in RCA: 57] [Impact Index Per Article: 9.5] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND The prevalence of diabetes has steadily increased in Iran from the time of the first published nationally representative survey in 1999 and despite efforts and strategies to reduce disease burden. OBJECTIVES The aim of the present review was to describe the current status of diabetes care in Iran. METHODS A selective review of the relevant literature, focusing on properly conducted studies, describing past and present diabetes care strategies, policies, and outcomes in Iran was performed. FINDINGS The quality of diabetes care has gradually improved as suggested by a reduction in the proportion of undiagnosed patients and an increase in affordability of diabetes medications. The National Program for Prevention and Control of Diabetes has proven successful at identifying high-risk individuals, particularly in rural and remote-access areas. Unfortunately, the rising tide of diabetes is outpacing these efforts by a considerable margin. CONCLUSIONS Substantial opportunities and challenges in the areas of prevention, diagnosis, and management of diabetes exist in Iran that need to be addressed to further improve the quality of care and clinical outcomes.
Affiliation(s)
- Sina Noshad
- Endocrinology and Metabolism Research Center, Vali-Asr Hospital, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
| | - Mohsen Afarideh
- Endocrinology and Metabolism Research Center, Vali-Asr Hospital, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
| | - Behnam Heidari
- Endocrinology and Metabolism Research Center, Vali-Asr Hospital, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran
| | - Jeffrey I Mechanick
- Division of Endocrinology, Diabetes, and Bone Disease, Icahn School of Medicine at Mount Sinai, New York, New York
| | - Alireza Esteghamati
- Endocrinology and Metabolism Research Center, Vali-Asr Hospital, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran.
|
32
|
Ramezankhani A, Tohidi M, Azizi F, Hadaegh F. Application of survival tree analysis for exploration of potential interactions between predictors of incident chronic kidney disease: a 15-year follow-up study. J Transl Med 2017; 15:240. [PMID: 29183386 PMCID: PMC5706148 DOI: 10.1186/s12967-017-1346-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 07/18/2017] [Accepted: 11/14/2017] [Indexed: 12/23/2022] Open
Abstract
Background Chronic kidney disease (CKD) is a growing public health challenge worldwide. Various studies have investigated risk factors of incident CKD; however, very few studies have examined interactions between these risk factors. In an attempt to clarify the potential interactions between risk factors of CKD, we performed survival tree analysis. Methods A total of 8238 participants (46.1% men) aged > 20 years without CKD at baseline [(1999–2001) and (2002–2005)] were followed until 2014. The first occurrence of CKD, defined as an estimated glomerular filtration rate (eGFR) < 60 ml/min/1.73 m2, was set as the main outcome. Multivariable Cox proportional hazard (Cox PH) regression was used to identify significant independent predictors of CKD; moreover, survival tree analysis was performed to gain further insight into the potential interactions between predictors. Results The crude incidence rates of CKD were 20.2 and 35.2 per 1000 person-years in men and women, respectively. The Cox PH identified the main effect of significant predictors of CKD incidence in men and women. In addition, using a limited number of predictors, survival trees identified 12 and 10 subgroups among men and women, respectively, with different survival probabilities. Accordingly, a group of men with eGFR > 74 ml/min/1.73 m2, age ≤ 46 years, low level of physical activity, waist circumference ≤ 100 cm and FPG ≤ 4.7 mmol/l had the lowest risk of CKD incidence, while men with eGFR ≤ 63.4 ml/min/1.73 m2 and age > 50 years had the highest risk for CKD compared to men in the lowest risk group [hazard ratio (HR), 70.68 (34.57–144.52)]. Also, a group of women aged ≤ 45 years with eGFR > 83.5 ml/min/1.73 m2 had the lowest risk, while women with age > 48 years and eGFR ≤ 69 ml/min/1.73 m2 had the highest risk compared to the low risk group [HR 27.25 (19.88–37.34)]. Conclusion In this post hoc analysis, we found the independent predictors of CKD using Cox PH; furthermore, by applying survival tree analysis we identified several homogeneous subgroups with different risks for incidence of CKD. Our study suggests that the two methods can be used simultaneously to provide new insights for intervention programs and improve clinical decision making.
Affiliation(s)
- Azra Ramezankhani
- Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Floor 3th, Number 24, Yemen Street, Shahid Chamran Highway, P.O. Box: 19395-4763, Tehran, Iran
| | - Maryam Tohidi
- Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Floor 3th, Number 24, Yemen Street, Shahid Chamran Highway, P.O. Box: 19395-4763, Tehran, Iran
| | - Fereidoun Azizi
- Endocrine Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | - Farzad Hadaegh
- Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Floor 3th, Number 24, Yemen Street, Shahid Chamran Highway, P.O. Box: 19395-4763, Tehran, Iran.
|
33
|
Varanka-Ruuska T, Rautio N, Lehtiniemi H, Miettunen J, Keinänen-Kiukaanniemi S, Sebert S, Ala-Mursula L. The association of unemployment with glucose metabolism: a systematic review and meta-analysis. Int J Public Health 2017; 63:435-446. [PMID: 29170882 DOI: 10.1007/s00038-017-1040-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 05/12/2017] [Revised: 09/12/2017] [Accepted: 09/16/2017] [Indexed: 10/18/2022] Open
Abstract
OBJECTIVES Unemployment has been linked with poor health. We hypothesized that being unemployed is associated with disorders of glucose metabolism and performed a systematic review and meta-analysis of the literature to ascertain the relationship. METHODS We searched the Scopus, Medline Ovid and Web of Science databases for population-based original studies from the past 20 years. Random effects meta-analyses were used to estimate odds ratios (OR) with 95% confidence intervals (CI) for prediabetes and type 2 diabetes among the unemployed as compared to those employed, separately for men and women when possible. RESULTS Out of 981 articles found, 12 articles were included in the systematic review and eight articles in the meta-analyses. Unemployment was associated with 1.6-fold odds for prediabetes (OR 1.58; 95% CI 1.07-2.35), and 1.7-fold odds for type 2 diabetes (OR 1.72; 95% CI 1.14-2.58) in the total sample. The corresponding associations for type 2 diabetes were also found when stratified for men (OR 1.53; 95% CI 1.47-1.60) and women (OR 1.60; 95% CI 1.33-1.92). CONCLUSIONS Unemployment is associated with prediabetes and type 2 diabetes, global concerns of public health with potential for prevention.
Affiliation(s)
- Tuulia Varanka-Ruuska
- Center for Life Course Health Research, University of Oulu, P.O. Box 5000, 90014, Oulu, Finland; Kallio Primary Health Care Unit, Kirkkotie 4, 84100, Ylivieska, Finland
| | - Nina Rautio
- Center for Life Course Health Research, University of Oulu, P.O. Box 5000, 90014, Oulu, Finland; Unit of Primary Health Care, Oulu University Hospital, P.O. Box 20, 90029 OYS, Oulu, Finland
| | - Heli Lehtiniemi
- Center for Life Course Health Research, University of Oulu, P.O. Box 5000, 90014, Oulu, Finland
| | - Jouko Miettunen
- Center for Life Course Health Research, University of Oulu, P.O. Box 5000, 90014, Oulu, Finland; Medical Research Center Oulu, Oulu University Hospital and University of Oulu, P.O. Box 5000, 90014, Oulu, Finland
| | - Sirkka Keinänen-Kiukaanniemi
- Center for Life Course Health Research, University of Oulu, P.O. Box 5000, 90014, Oulu, Finland; Unit of Primary Health Care, Oulu University Hospital, P.O. Box 20, 90029 OYS, Oulu, Finland; Medical Research Center Oulu, Oulu University Hospital and University of Oulu, P.O. Box 5000, 90014, Oulu, Finland; Health Center of Oulu, P.O. Box 27, 90015, Oulu, Finland
| | - Sylvain Sebert
- Center for Life Course Health Research, University of Oulu, P.O. Box 5000, 90014, Oulu, Finland; Biocenter Oulu, University of Oulu, P.O. Box 5000, 90014, Oulu, Finland; Department of Genomics of Complex Diseases, Imperial College London, South Kensington Campus, London, SW7, UK
| | - Leena Ala-Mursula
- Center for Life Course Health Research, University of Oulu, P.O. Box 5000, 90014, Oulu, Finland
|
34
|
Evaluating of associated risk factors of metabolic syndrome by using decision tree. ACTA ACUST UNITED AC 2017. [DOI: 10.1007/s00580-017-2580-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 01/29/2023]
|
35
|
Ramezankhani A, Bagherzadeh-Khiabani F, Khalili D, Azizi F, Hadaegh F. A new look at risk patterns related to coronary heart disease incidence using survival tree analysis: 12 Years Longitudinal Study. Sci Rep 2017; 7:3237. [PMID: 28607472 PMCID: PMC5468345 DOI: 10.1038/s41598-017-03577-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 02/27/2017] [Accepted: 04/21/2017] [Indexed: 12/25/2022] Open
Abstract
We identified risk patterns associated with incident coronary heart disease (CHD) using survival trees, and compared the performance of survival trees versus Cox proportional hazards (Cox PH) models in a cohort of Iranian adults. Data on 8,279 participants (3,741 men) aged ≥30 yr were used for analysis. Survival trees identified seven subgroups with different risk patterns using four predictors (age, non-HDL-C, fasting plasma glucose [FPG], and family history of diabetes) in women and five predictors (age, systolic blood pressure [SBP], non-HDL-C, FPG, and family history of CVD) in men. Additional risk factors were identified by Cox models, which included: family history of CVD and waist circumference (in both genders); hip circumference, former smoking and using aspirin among men; diastolic blood pressure and lipid lowering drug among women. Survival trees and multivariate Cox models yielded comparable performance, as measured by integrated Brier score (IBS) and Harrell’s C-index on validation datasets; however, survival trees produced more parsimonious models with a minimum number of well recognized risk factors of CHD incidence, and identified important interactions between these factors, which have important implications for intervention programs and can improve clinical decision making.
Affiliation(s)
- Azra Ramezankhani
- Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Science, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | - Farideh Bagherzadeh-Khiabani
- Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Science, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | - Davood Khalili
- Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Science, Shahid Beheshti University of Medical Sciences, Tehran, Iran; Department of Epidemiology, School of Public Health, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | - Fereidoun Azizi
- Endocrine Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | - Farzad Hadaegh
- Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Science, Shahid Beheshti University of Medical Sciences, Tehran, Iran.
|
36
|
Olivera AR, Roesler V, Iochpe C, Schmidt MI, Vigo Á, Barreto SM, Duncan BB. Comparison of machine-learning algorithms to build a predictive model for detecting undiagnosed diabetes - ELSA-Brasil: accuracy study. Sao Paulo Med J 2017; 135:234-246. [PMID: 28746659 PMCID: PMC10019841 DOI: 10.1590/1516-3180.2016.0309010217] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 01/19/2017] [Accepted: 02/01/2017] [Indexed: 01/23/2023] Open
Abstract
CONTEXT AND OBJECTIVE: Type 2 diabetes is a chronic disease associated with a wide range of serious health complications that have a major impact on overall health. The aims here were to develop and validate predictive models for detecting undiagnosed diabetes using data from the Longitudinal Study of Adult Health (ELSA-Brasil) and to compare the performance of different machine-learning algorithms in this task. DESIGN AND SETTING: Comparison of machine-learning algorithms to develop predictive models using data from ELSA-Brasil. METHODS: After selecting a subset of 27 candidate variables from the literature, models were built and validated in four sequential steps: (i) parameter tuning with tenfold cross-validation, repeated three times; (ii) automatic variable selection using forward selection, a wrapper strategy with four different machine-learning algorithms and tenfold cross-validation (repeated three times), to evaluate each subset of variables; (iii) error estimation of model parameters with tenfold cross-validation, repeated ten times; and (iv) generalization testing on an independent dataset. The models were created with the following machine-learning algorithms: logistic regression, artificial neural network, naïve Bayes, K-nearest neighbor and random forest. RESULTS: The best models were created using artificial neural networks and logistic regression. These achieved mean areas under the curve of, respectively, 75.24% and 74.98% in the error estimation step and 74.17% and 74.41% in the generalization testing step. CONCLUSION: Most of the predictive models produced similar results, and demonstrated the feasibility of identifying individuals with the highest probability of having undiagnosed diabetes through easily obtained clinical data.
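The repeated tenfold cross-validation described in the Methods can be sketched with the standard library alone. This is an illustrative splitter under assumed names (`repeated_kfold`, `n_samples`, `seed`), not the authors' pipeline; it only produces the train/test index splits that a model would then be fitted and scored on.

```python
import random


def repeated_kfold(n_samples: int, k: int = 10, repeats: int = 3, seed: int = 0):
    """Yield (repeat, fold, train_indices, test_indices) for repeated k-fold CV."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    rng = random.Random(seed)
    for rep in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh random partition for every repetition
        folds = [indices[i::k] for i in range(k)]  # k near-equal, disjoint folds
        for fold_id, test_idx in enumerate(folds):
            # Training set is every fold except the held-out one.
            train_idx = [i for j, fold in enumerate(folds) if j != fold_id for i in fold]
            yield rep, fold_id, train_idx, test_idx


# Tenfold cross-validation repeated three times on 25 hypothetical records.
splits = list(repeated_kfold(n_samples=25, k=10, repeats=3))
print(len(splits))  # one entry per (repeat, fold) pair
```

Averaging a metric such as AUC over all splits gives the kind of mean estimate reported in the Results; repeating the partition reduces the dependence of that estimate on any single random split.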
Affiliation(s)
- André Rodrigues Olivera
  - MSc. IT Analyst, Postgraduate Computing Program, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre (RS), Brazil.
- Valter Roesler
  - PhD. Professor, Postgraduate Computing Program, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre (RS), Brazil.
- Cirano Iochpe
  - PhD. Professor, Postgraduate Computing Program, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre (RS), Brazil.
- Maria Inês Schmidt
  - PhD. Professor, Postgraduate Epidemiology Program and Hospital de Clínicas, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre (RS), Brazil.
- Álvaro Vigo
  - PhD. Professor, Postgraduate Epidemiology Program, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre (RS), Brazil.
- Sandhi Maria Barreto
  - PhD. Professor, Department of Social and Preventive Medicine & Postgraduate Program in Public Health, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte (MG), Brazil.
- Bruce Bartholow Duncan
  - PhD. Professor, Postgraduate Epidemiology Program and Hospital de Clínicas, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre (RS), Brazil.
|
37
|
Tayefi M, Tajfard M, Saffar S, Hanachi P, Amirabadizadeh AR, Esmaeily H, Taghipour A, Ferns GA, Moohebati M, Ghayour-Mobarhan M. hs-CRP is strongly associated with coronary heart disease (CHD): A data mining approach using decision tree algorithm. Computer Methods and Programs in Biomedicine 2017; 141:105-109. [PMID: 28241960 DOI: 10.1016/j.cmpb.2017.02.001] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Received: 10/30/2016] [Revised: 01/25/2017] [Accepted: 02/02/2017] [Indexed: 06/06/2023]
Abstract
BACKGROUND AND AIMS Coronary heart disease (CHD) is an important public health problem globally. Algorithms incorporating the assessment of clinical biomarkers together with several established traditional risk factors can help clinicians to predict CHD and support clinical decision making with respect to interventions. Decision tree (DT) is a data mining model for extracting hidden knowledge from large databases. We aimed to establish a predictive model for coronary heart disease using a decision tree algorithm. METHODS Here we used a dataset of 2346 individuals including 1159 healthy participants and 1187 participants who had undergone coronary angiography (405 participants with negative angiography and 782 participants with positive angiography). We entered 10 of a total of 12 variables into the DT algorithm (including age, sex, FBG, TG, hs-CRP, TC, HDL, LDL, SBP and DBP). RESULTS Our model could identify the associated risk factors of CHD with a sensitivity, specificity, and accuracy of 96%, 87%, and 94%, respectively. Serum hs-CRP level was at the top of the tree in our model, followed by FBG, gender and age. CONCLUSION Our model appears to be an accurate, specific and sensitive model for identifying the presence of CHD, but will require validation in prospective studies.
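Sensitivity, specificity, and accuracy of the kind reported in the RESULTS are all derived from a 2x2 confusion matrix. The sketch below shows the standard definitions; the function name `screening_metrics` and the counts are hypothetical and are not taken from the study.

```python
def screening_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # true positives among diseased
    specificity = tn / (tn + fp)                 # true negatives among healthy
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # all correct calls
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
    }

# Hypothetical counts chosen only to make the arithmetic easy to follow.
m = screening_metrics(tp=90, fp=20, tn=80, fn=10)
print(m)  # sensitivity 0.9, specificity 0.8, accuracy 0.85
```

Accuracy always falls between sensitivity and specificity, weighted by how many diseased and healthy participants are in the evaluated sample.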
Affiliation(s)
- Maryam Tayefi
  - Metabolic Syndrome Research Center, School of Medicine, Mashhad University of Medical Sciences, 99199-91766 Mashhad, Iran; Department of New Sciences and Technologies, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Mohammad Tajfard
  - Department of Health Education and Health Promotion, School of Health, Management and Social Determinants of Health Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
- Sara Saffar
  - Neurogenic Inflammation Research Center, Department of New Sciences and Technologies, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Parichehr Hanachi
  - Department of Biology, Biochemistry Unit, Alzahra University, Tehran, Iran
- Ali Reza Amirabadizadeh
  - Medical Toxicology and Drug Abuse Research Center (MTDRC), Birjand University of Medical Sciences, Birjand, Iran
- Habibollah Esmaeily
  - Department of Biostatistics and Epidemiology, School of Health, Management and Social Determinants of Health Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
- Ali Taghipour
  - Department of Biostatistics and Epidemiology, School of Health, Management and Social Determinants of Health Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
- Gordon A Ferns
  - Division of Medical Education, Brighton & Sussex Medical School, Falmer, Brighton, Sussex BN1 9PH, UK
- Mohsen Moohebati
  - Cardiovascular Research Center, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran.
- Majid Ghayour-Mobarhan
  - Metabolic Syndrome Research Center, School of Medicine, Mashhad University of Medical Sciences, 99199-91766 Mashhad, Iran; Department of New Sciences and Technologies, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran; Cardiovascular Research Center, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran.
|
38
|
Fei Y, Gao K, Hu J, Tu J, Li WQ, Wang W, Zong GQ. Predicting the incidence of portosplenomesenteric vein thrombosis in patients with acute pancreatitis using classification and regression tree algorithm. J Crit Care 2017; 39:124-130. [PMID: 28254727 DOI: 10.1016/j.jcrc.2017.02.019] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 01/12/2017] [Revised: 02/03/2017] [Accepted: 02/05/2017] [Indexed: 02/07/2023]
Abstract
BACKGROUND AND OBJECTIVE The accurate prediction of portosplenomesenteric vein thrombosis (PVT) in patients with acute pancreatitis (AP) is very important but may also be difficult because of our insufficient understanding of the characteristics of AP-induced PVT. The purpose of this study is to design a decision tree model that identifies critical factors associated with PVT using the classification and regression tree (CART) algorithm. METHODS The analysis included 353 patients with AP who were admitted between January 2011 and December 2015. A CART model and a logistic regression model were each applied to the same 50% of the sample to develop the predictive training models, and these models were tested on the remaining 50%. Statistical indexes were used to evaluate the value of the prediction in the 2 models. RESULTS The predicted sensitivity, specificity, positive predictive value, negative predictive value, and accuracy by CART for PVT were 78.0%, 87.2%, 64.0%, 93.2%, and 85.2%, respectively. There were significant differences between the CART and logistic regression models in these parameters (P<.05). When the CART model was used to identify PVT, the area under receiver operating characteristic curve was 0.803, which demonstrated better overall properties than the logistic regression model (area under the curve=0.696) (95% confidence interval, 0.603-0.812). CONCLUSION The CART model based on serum amylase, d-dimer, Acute Physiology and Chronic Health Evaluation II, and prothrombin time is more likely to predict the occurrence of PVT induced by AP.
Affiliation(s)
- Yang Fei
  - Surgical Intensive Care Unit (SICU), Department of General Surgery, Jinling Hospital, Medical School of Nanjing University, No. 305 Zhongshan E Rd, Nanjing, 210002, China
- Kun Gao
  - Surgical Intensive Care Unit (SICU), Department of General Surgery, Jinling Hospital, Medical School of Nanjing University, No. 305 Zhongshan E Rd, Nanjing, 210002, China
- Jian Hu
  - School of Mechanical Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China
- Jianfeng Tu
  - Surgical Intensive Care Unit (SICU), Department of General Surgery, Jinling Hospital, Medical School of Nanjing University, No. 305 Zhongshan E Rd, Nanjing, 210002, China
- Wei-Qin Li
  - Surgical Intensive Care Unit (SICU), Department of General Surgery, Jinling Hospital, Medical School of Nanjing University, No. 305 Zhongshan E Rd, Nanjing, 210002, China.
- Wei Wang
  - Department of General Surgery, Bayi Hospital Affiliated Nanjing University of Chinese Medicine/the 81st hospital of P.L.A., Nanjing, 210002, China
- Guang-Quan Zong
  - Department of General Surgery, Bayi Hospital Affiliated Nanjing University of Chinese Medicine/the 81st hospital of P.L.A., Nanjing, 210002, China
|
39
|
Tayefi M, Esmaeili H, Saberi Karimian M, Amirabadi Zadeh A, Ebrahimi M, Safarian M, Nematy M, Parizadeh SMR, Ferns GA, Ghayour-Mobarhan M. The application of a decision tree to establish the parameters associated with hypertension. Computer Methods and Programs in Biomedicine 2017; 139:83-91. [PMID: 28187897 DOI: 10.1016/j.cmpb.2016.10.020] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Received: 04/17/2016] [Revised: 09/13/2016] [Accepted: 10/18/2016] [Indexed: 06/06/2023]
Abstract
INTRODUCTION Hypertension is an important risk factor for cardiovascular disease (CVD). The goal of this study was to establish the factors associated with hypertension by using a decision-tree algorithm as a supervised classification method of data mining. METHODS Data from a cross-sectional study were used in this study. A total of 9078 subjects who met the inclusion criteria were recruited. 70% of these subjects (6358 cases) were randomly allocated to the training dataset to construct the decision tree. The remaining 30% (2720 cases) were used as the testing dataset to evaluate the performance of the decision tree. Two models were evaluated in this study. In model I, age, gender, body mass index, marital status, level of education, occupation status, depression and anxiety status, physical activity level, smoking status, LDL, TG, TC, FBG, uric acid and hs-CRP were considered as input variables, and in model II, age, gender, WBC, RBC, HGB, HCT, MCV, MCH, PLT, RDW and PDW were considered as input variables. The validation of the model was assessed by constructing a receiver operating characteristic (ROC) curve. RESULTS The prevalence of hypertension was 32% in our population. For the decision-tree model I, the accuracy, sensitivity, specificity and area under the ROC curve (AUC) value for identifying the related risk factors of hypertension were 73%, 63%, 77% and 0.72, respectively. The corresponding values for model II were 70%, 61%, 74% and 0.68, respectively. CONCLUSION We have developed a decision tree model to identify the risk factors associated with hypertension that may be used to develop programs for hypertension management.
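As a rough consistency check on the figures reported in the RESULTS, accuracy can be written as a prevalence-weighted average of sensitivity and specificity. The worked example below assumes that the testing dataset has the same 32% prevalence as the full sample, which the abstract does not state; the symbols Se, Sp and π are introduced here for illustration only.

```latex
\[
\mathrm{Acc} = \mathrm{Se}\,\pi + \mathrm{Sp}\,(1-\pi)
\]
\[
\text{Model I: } 0.63 \times 0.32 + 0.77 \times 0.68 = 0.2016 + 0.5236 = 0.7252 \approx 0.73
\]
\[
\text{Model II: } 0.61 \times 0.32 + 0.74 \times 0.68 = 0.1952 + 0.5032 = 0.6984 \approx 0.70
\]
```

Under that assumption, both values agree with the reported accuracies of 73% and 70%.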
Affiliation(s)
- Maryam Tayefi
  - Biochemistry and Nutrition Research Center, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Habibollah Esmaeili
  - Biochemistry and Nutrition Research Center, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran; Department of Biostatistics, School of Health, Mashhad University of Medical Sciences, Mashhad, Iran
- Maryam Saberi Karimian
  - Student Research Committee, Department of Modern Sciences and Technologies, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Alireza Amirabadi Zadeh
  - Department of Biostatistics, School of Health, Mashhad University of Medical Sciences, Mashhad, Iran
- Mahmoud Ebrahimi
  - Cardiovascular Research Center, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Mohammad Safarian
  - Department of Nutrition Research Center, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Mohsen Nematy
  - Department of Nutrition Research Center, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Seyed Mohammad Reza Parizadeh
  - Biochemistry and Nutrition Research Center, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Gordon A Ferns
  - Brighton & Sussex Medical School, Division of Medical Education, Falmer, Brighton, Sussex BN1 9PH, UK
- Majid Ghayour-Mobarhan
  - Biochemistry and Nutrition Research Center, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran; Cardiovascular Research Center, School of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran.
|
40
|
Moon M, Lee SK. Applying of Decision Tree Analysis to Risk Factors Associated with Pressure Ulcers in Long-Term Care Facilities. Healthc Inform Res 2017; 23:43-52. [PMID: 28261530 PMCID: PMC5334131 DOI: 10.4258/hir.2017.23.1.43] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 01/02/2017] [Revised: 01/24/2017] [Accepted: 01/24/2017] [Indexed: 11/23/2022]
Abstract
OBJECTIVES The purpose of this study was to use decision tree analysis to explore the factors associated with pressure ulcers (PUs) among elderly people admitted to Korean long-term care facilities. METHODS The data were extracted from the 2014 National Inpatient Sample (NIS) data of the Health Insurance Review and Assessment Service (HIRA). A MapReduce-based program was implemented to join and filter 5 tables of the NIS. The outcome predicted by the decision tree model was the prevalence of PUs as defined by the Korean Standard Classification of Disease-7 (KCD-7; code L89*). Using R 3.3.1, a decision tree was generated with the finalized 15,856 cases and 830 variables. RESULTS The decision tree displayed 15 subgroups with 8 variables, showing 0.804 accuracy, 0.820 sensitivity, and 0.787 specificity. The most significant primary predictor of PUs was length of stay less than 0.5 day. Other predictors were the presence of an infectious wound dressing, followed by having diagnoses numbering less than 3.5 and the presence of a simple dressing. Among diagnoses, "injuries to the hip and thigh" was the top predictor, ranking 5th overall. Total hospital cost exceeding 2,200,000 Korean won (US $2,000) rounded out the top 7. CONCLUSIONS These results support previous studies that showed length of stay, comorbidity, and total hospital cost were associated with PUs. Moreover, wound dressings were commonly used to treat PUs. They also show that machine learning, such as a decision tree, could effectively predict PUs using big data.
Affiliation(s)
- Mikyung Moon
  - College of Nursing, the Research Institute of Nursing Science, Kyungpook National University, Daegu, Korea
|
41
|
Ramezankhani A, Hadavandi E, Pournik O, Shahrabi J, Azizi F, Hadaegh F. Decision tree-based modelling for identification of potential interactions between type 2 diabetes risk factors: a decade follow-up in a Middle East prospective cohort study. BMJ Open 2016; 6:e013336. [PMID: 27909038 PMCID: PMC5168628 DOI: 10.1136/bmjopen-2016-013336] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 07/07/2016] [Revised: 09/09/2016] [Accepted: 10/03/2016] [Indexed: 12/18/2022]
Abstract
OBJECTIVE This study used the decision tree (DT) method to develop different prediction models for the incidence of type 2 diabetes (T2D) and to explore interactions between predictor variables in those models. DESIGN Prospective cohort study. SETTING Tehran Lipid and Glucose Study (TLGS). METHODS A total of 6647 participants (43.4% men) aged >20 years, without T2D at baseline (1999-2001 and 2002-2005), were followed until 2012. Two series of models (with and without 2-hour postchallenge plasma glucose (2h-PCPG)) were developed using 3 types of DT algorithms. The performances of the models were assessed using sensitivity, specificity, area under the ROC curve (AUC), geometric mean (G-Mean) and F-Measure. PRIMARY OUTCOME MEASURE T2D was the primary outcome, defined as fasting plasma glucose (FPG) ≥7 mmol/L, 2h-PCPG ≥11.1 mmol/L, or use of antidiabetic medication. RESULTS During a median follow-up of 9.5 years, 729 new cases of T2D were identified. The Quick Unbiased Efficient Statistical Tree (QUEST) algorithm had the highest sensitivity and G-Mean among all the models for men and women. The models that included 2h-PCPG had sensitivity and G-Mean of (78% and 0.75%) and (78% and 0.78%) for men and women, respectively. Both models achieved good discrimination power with AUC above 0.78. FPG, 2h-PCPG, waist-to-height ratio (WHtR) and mean arterial blood pressure (MAP) were the most important factors for the incidence of T2D in both genders. Among men, those with an FPG≤4.9 mmol/L and 2h-PCPG≤7.7 mmol/L had the lowest risk, and those with an FPG>5.3 mmol/L and 2h-PCPG>4.4 mmol/L had the highest risk for T2D incidence. In women, those with an FPG≤5.2 mmol/L and WHtR≤0.55 had the lowest risk, and those with an FPG>5.2 mmol/L and WHtR>0.56 had the highest risk for T2D incidence. CONCLUSIONS Our study emphasises the utility of DT for exploring interactions between predictor variables.
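The G-Mean and F-Measure used to assess the models have standard definitions, sketched below for reference. The function names and input values are hypothetical and are not the study's data; G-Mean is taken here as the geometric mean of sensitivity and specificity, and F-Measure as the weighted harmonic mean of precision and recall.

```python
import math

def g_mean(sensitivity, specificity):
    """Geometric mean of sensitivity and specificity (both as fractions in [0, 1])."""
    return math.sqrt(sensitivity * specificity)

def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the usual harmonic mean of precision and recall."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical inputs chosen for illustration only.
print(round(g_mean(0.80, 0.70), 3))     # 0.748
print(round(f_measure(0.50, 0.80), 3))  # 0.615
```

Both measures penalise a model that does well on one class at the expense of the other, which is why they are often preferred to plain accuracy when the outcome is uncommon.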
Affiliation(s)
- Azra Ramezankhani
  - Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Science, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Esmaeil Hadavandi
  - Industrial Engineering Department, Amirkabir University of Technology, Tehran, Iran
  - Department of Industrial Engineering, Birjand University of Technology, Birjand, Iran
- Omid Pournik
  - Department of Community Medicine, School of Medicine, Iran University of Medical Sciences, Tehran, Iran
- Jamal Shahrabi
  - Industrial Engineering Department, Amirkabir University of Technology, Tehran, Iran
- Fereidoun Azizi
  - Endocrine Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Farzad Hadaegh
  - Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Science, Shahid Beheshti University of Medical Sciences, Tehran, Iran
|
42
|
Ramezankhani A, Pournik O, Shahrabi J, Azizi F, Hadaegh F. An application of association rule mining to extract risk pattern for type 2 diabetes using Tehran Lipid and Glucose Study database. Int J Endocrinol Metab 2015; 13:e25389. [PMID: 25926855 PMCID: PMC4393501 DOI: 10.5812/ijem.25389] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 11/16/2014] [Revised: 12/17/2014] [Accepted: 12/27/2014] [Indexed: 01/14/2023]
Abstract
BACKGROUND Type 2 diabetes, a common and serious global health concern, had an estimated worldwide prevalence of 366 million in 2011, which is expected to rise to 552 million people by 2030 unless urgent action is taken. OBJECTIVES The aim of this study was to identify risk patterns for type 2 diabetes incidence using association rule mining (ARM). PATIENTS AND METHODS A population of 6647 individuals without diabetes, aged ≥ 20 years at inclusion, was followed for 10-12 years to analyze risk patterns for diabetes occurrence. Study variables included demographic and anthropometric characteristics, smoking status, medical and drug history and laboratory measures. RESULTS In the case of women, the results showed that impaired fasting glucose (IFG) and impaired glucose tolerance (IGT), in combination with body mass index (BMI) ≥ 30 kg/m², family history of diabetes, wrist circumference > 16.5 cm and waist-to-height ratio ≥ 0.5, can increase the risk for developing diabetes. For men, a combination of IGT, IFG, length of stay in the city (> 40 years), central obesity, total cholesterol to high density lipoprotein ratio ≥ 5.3, low physical activity, chronic kidney disease and wrist circumference > 18.5 cm was identified as a risk pattern for diabetes occurrence. CONCLUSIONS Our study showed that ARM is a useful approach in determining which combinations of variables or predictors occur together frequently in people who will develop diabetes. ARM focuses on joint exposure to different combinations of risk factors, and not on the predictors alone.
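Association rule mining ranks rules of the form "risk-factor combination → outcome" by measures such as support, confidence, and lift. The sketch below computes these three measures for a single rule; the function name `rule_stats` and the eight toy records are hypothetical and only borrow the IFG/IGT labels from the abstract.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(antecedent <= t for t in transactions)    # records with all antecedent items
    n_c = sum(consequent <= t for t in transactions)    # records with the consequent
    n_ac = sum((antecedent | consequent) <= t for t in transactions)
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift

# Toy records (sets of items per person); not data from the study.
records = [
    {"IFG", "IGT", "diabetes"},
    {"IFG", "IGT", "diabetes"},
    {"IFG", "IGT"},
    {"IFG"},
    {"diabetes"},
    set(),
    set(),
    set(),
]
support, confidence, lift = rule_stats(records, {"IFG", "IGT"}, {"diabetes"})
print(support, round(confidence, 3), round(lift, 3))  # 0.25 0.667 1.778
```

A lift above 1 indicates that the outcome is more frequent among people carrying the full combination than in the sample overall, which is the sense in which such rules describe joint exposure.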
Affiliation(s)
- Azra Ramezankhani
  - Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, IR Iran
- Omid Pournik
  - Department of Community Medicine, School of Medicine, Iran University of Medical Sciences, Tehran, IR Iran
- Jamal Shahrabi
  - Department of Industrial Engineering, Amirkabir University of Technology, Tehran, IR Iran
- Fereidoun Azizi
  - Endocrine Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, IR Iran
- Farzad Hadaegh
  - Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, IR Iran
  - Corresponding author: Farzad Hadaegh, Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, IR Iran. Tel: +98-2122409301, Fax: +98-2122402463, E-mail:
|