1
Zeng J, Huai M, Ge W, Yang Z, Pan X. Development and validation of diagnosis model for inflammatory bowel diseases based on a serologic biomarker panel: A decision tree model study. Arab J Gastroenterol 2024:S1687-1979(24)00061-3. [PMID: 39069425 DOI: 10.1016/j.ajg.2024.05.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2023] [Accepted: 05/31/2024] [Indexed: 07/30/2024]
Abstract
BACKGROUND AND STUDY AIMS An increasing amount of experimental data is now available on newly discovered biomarkers in inflammatory bowel diseases (IBD), but the role of these biomarkers is often questionable owing to their limited sensitivity. This study therefore aimed to build a diagnostic tool that incorporates a panel of serum biomarkers into a computational algorithm to identify patients with IBD and to differentiate those with Crohn's disease (CD) from those with ulcerative colitis (UC). PATIENTS AND METHODS We studied sera from 192 CD patients, 118 UC patients, 60 non-IBD controls and 60 healthy controls. Indirect immunofluorescence (IIF) assays were used to measure several serum biomarkers previously associated with IBD, and a decision tree algorithm was used to construct the diagnostic model. Model performance was evaluated by prediction accuracy, precision, AUC and the Matthews correlation coefficient (MCC). Cohorts from the Inflammatory Bowel Disease Multi-omics Database (IBDMDB) were used as an external validation set. RESULTS Prediction rates were determined and compared for decision tree models built with the C5.0, C&RT, QUEST and CHAID algorithms. The C5.0 and CHAID algorithms, which ranked highest for prediction rate in the IBD vs. non-IBD model and the CD vs. UC model, respectively, were used for the final pattern analysis. The final decision tree model achieved higher classification accuracy than the approach based on conservative marker combinations (sensitivity 75.0% vs. 79.5% and specificity 93.8% vs. 78.3% for differentiating IBD from non-IBD; sensitivity 84.3% vs. 73.4% and specificity 92.5% vs. 54.9% for differentiating CD from UC). The model's prediction consistency was 93% (28/30) in the external validation set.
CONCLUSION The serum-biomarker-based decision tree approach used in this study has been shown to be a valid and useful means of identifying IBD and differentiating CD from UC.
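MCC, one of the performance metrics listed above, is computed from the four cells of a binary confusion matrix as MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The Python sketch below is illustrative only: the counts are hypothetical rather than taken from the study, and returning 0 when the denominator is zero is a common convention, not something the abstract specifies.

```python
import math


def matthews_corrcoef(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: MCC is undefined when any marginal total is zero; report 0.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts for illustration only.
print(round(matthews_corrcoef(tp=90, fp=10, fn=10, tn=90), 2))  # 0.8
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), and it uses all four cells, which is why it is often preferred to accuracy when class sizes differ, as they do here (310 IBD sera vs. 120 controls).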
Affiliation(s)
- Junxiang Zeng
- Department of Clinical Laboratory, Xinhua Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China; Institute of Artificial Intelligence Medicine, Shanghai Academy of Experimental Medicine, Shanghai, China
- Manxiu Huai
- Department of Gastroenterology, Xinhua Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Wensong Ge
- Department of Gastroenterology, Xinhua Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Zhigang Yang
- Department of Gastroenterology Surgery, Xinhua Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Xiujun Pan
- Department of Clinical Laboratory, Xinhua Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
2
Seyedtabib M, Kamyari N. Predicting polypharmacy in half a million adults in the Iranian population: comparison of machine learning algorithms. BMC Med Inform Decis Mak 2023; 23:84. [PMID: 37147615 PMCID: PMC10161984 DOI: 10.1186/s12911-023-02177-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2022] [Accepted: 04/21/2023] [Indexed: 05/07/2023]
Abstract
BACKGROUND Polypharmacy (PP) is increasingly common in Iran and contributes to the substantial burden of drug-related morbidity, increasing the potential for drug interactions and potentially inappropriate medications. Machine learning (ML) algorithms can be employed as an alternative approach to predicting PP. Our study therefore aimed to compare several ML algorithms for predicting PP from health insurance claims data and to choose the best-performing algorithm as a predictive tool for decision-making. METHODS This population-based cross-sectional study was performed between April 2021 and March 2022. After feature selection, information on 550 thousand patients was obtained from the National Center for Health Insurance Research (NCHIR). Several ML algorithms were then trained to predict PP. Finally, model performance was assessed using metrics derived from the confusion matrix. RESULTS The study sample comprised 554,133 adults with a median (IQR) age of 51 years (40-62), nested within 27 cities in Khuzestan province, Iran. Most of the patients were female (62.5%), married (63.5%), and employed (83.2%) during the last year. The prevalence of PP in the whole population was about 36.0%. After feature selection, the number of prescriptions, insurance coverage for prescription drugs, and hypertension were found to be the top three of the 23 features. Experimental results showed that Random Forest (RF) performed better than the other ML algorithms, with recall, specificity, accuracy, precision and F1-score of 63.92%, 89.92%, 79.99%, 63.92% and 63.92%, respectively. CONCLUSION ML provides a reasonable level of accuracy in predicting polypharmacy. Prediction models based on ML, especially the RF algorithm, outperformed other methods for predicting PP in Iranian people in terms of the performance criteria.
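The confusion-matrix metrics reported above follow standard definitions. A minimal Python sketch, using hypothetical counts rather than the study's data and assuming every denominator is non-zero:

```python
def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts.

    Assumes every denominator below is non-zero.
    """
    recall = tp / (tp + fn)            # sensitivity, true positive rate
    specificity = tn / (tn + fp)       # true negative rate
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)         # positive predictive value
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"recall": recall, "specificity": specificity,
            "accuracy": accuracy, "precision": precision, "f1": f1}


# Hypothetical counts for illustration only.
for name, value in confusion_metrics(tp=30, fp=10, fn=20, tn=40).items():
    print(f"{name}: {value:.3f}")
```

Because F1 is the harmonic mean of precision and recall, it equals both of them whenever the two are equal, which is consistent with the identical 63.92% values reported for RF.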
Affiliation(s)
- Maryam Seyedtabib
- Department of Biostatistics and Epidemiology, School of Health, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran
- Naser Kamyari
- Department of Biostatistics and Epidemiology, School of Health, Abadan University of Medical Sciences, Abadan, Iran
3
Luo WM, Su JY, Xu T, Fang ZZ. Prevalence of Diabetic Retinopathy and Use of Common Oral Hypoglycemic Agents Increase the Risk of Diabetic Nephropathy-A Cross-Sectional Study in Patients with Type 2 Diabetes. Int J Environ Res Public Health 2023; 20:4623. [PMID: 36901633 PMCID: PMC10001907 DOI: 10.3390/ijerph20054623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2022] [Revised: 02/08/2023] [Accepted: 03/01/2023] [Indexed: 06/18/2023]
Abstract
OBJECTIVE This study investigated the effect of amino acid metabolism on the risk of diabetic nephropathy under different diabetic retinopathy conditions and with the use of different oral hypoglycemic agents. METHODS This study retrieved data on 1031 patients with type 2 diabetes from the First Affiliated Hospital of Liaoning Medical University in Jinzhou, Liaoning Province, China. We conducted a Spearman correlation analysis between diabetic retinopathy and the amino acids that affect the prevalence of diabetic nephropathy. Logistic regression was used to analyze changes in amino acid metabolism under different diabetic retinopathy conditions. Finally, the additive interaction between different drugs and diabetic retinopathy was explored. RESULTS The protective effect of some amino acids on the risk of developing diabetic nephropathy was masked in diabetic retinopathy. Additionally, the additive effect of combining different drugs on the risk of diabetic nephropathy was greater than that of any one drug alone. CONCLUSIONS Patients with diabetic retinopathy have a higher risk of developing diabetic nephropathy than the general type 2 diabetes population. The use of oral hypoglycemic agents can also increase the risk of diabetic nephropathy.
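The abstract does not name the measure of additive interaction it used. One widely used option is the relative excess risk due to interaction (RERI), computed from odds ratios against a common reference group with neither exposure: RERI = OR11 - OR10 - OR01 + 1, where RERI > 0 suggests a super-additive (synergistic) effect. The sketch also computes the attributable proportion (AP) and synergy index (S). The odds ratios are hypothetical, and treating odds ratios as approximate relative risks is an assumption that holds best when the outcome is rare.

```python
def additive_interaction(or_a: float, or_b: float, or_ab: float) -> dict:
    """Additive-interaction measures from odds ratios vs. the doubly unexposed group.

    or_a  -- exposure A only (e.g., a given drug), without exposure B
    or_b  -- exposure B only (e.g., diabetic retinopathy), without exposure A
    or_ab -- both exposures together
    """
    reri = or_ab - or_a - or_b + 1                # relative excess risk due to interaction
    ap = reri / or_ab                             # attributable proportion due to interaction
    s = (or_ab - 1) / ((or_a - 1) + (or_b - 1))   # synergy index (undefined if denominator is 0)
    return {"RERI": reri, "AP": ap, "S": s}


# Hypothetical odds ratios, not results from the study.
print(additive_interaction(or_a=1.5, or_b=2.0, or_ab=3.5))
```

Here RERI = 1.0 and S > 1, both of which would indicate a joint effect larger than the sum of the separate effects on the additive scale.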
4
Islam MM, Rahman MJ, Menhazul Abedin M, Ahammed B, Ali M, Ahmed NF, Maniruzzaman M. Identification of the risk factors of type 2 diabetes and its prediction using machine learning techniques. Health Syst (Basingstoke) 2022; 12:243-254. [PMID: 37234468 PMCID: PMC10208154 DOI: 10.1080/20476965.2022.2141141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2020] [Accepted: 10/20/2022] [Indexed: 11/07/2022]
Abstract
This study identified the risk factors for type 2 diabetes (T2D) and proposed a machine learning (ML) technique for predicting T2D. The risk factors were identified by multiple logistic regression (MLR) using a significance threshold of p < 0.05. Five ML-based techniques, namely logistic regression, naïve Bayes, J48, multilayer perceptron, and random forest (RF), were then employed to predict T2D. The study used two publicly available datasets derived from the National Health and Nutrition Examination Survey (2009-2010 and 2011-2012). The 2009-2010 dataset included 4922 respondents, 387 of whom had T2D, and the 2011-2012 dataset included 4936 respondents, 373 of whom had T2D. Six risk factors (age, education, marital status, SBP, smoking, and BMI) were identified for 2009-2010 and nine (age, race, marital status, SBP, DBP, direct cholesterol, physical activity, smoking, and BMI) for 2011-2012. The RF-based classifier obtained 95.9% accuracy, 95.7% sensitivity, 95.3% F-measure, and an area under the curve of 0.946.
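Because only about 8% of respondents in each dataset had T2D, a trivial classifier that always predicts "no T2D" already exceeds 92% accuracy, which is a useful reference point when reading accuracy figures on imbalanced data. A short Python check of that majority-class baseline, using the dataset sizes reported in the abstract:

```python
def majority_class_accuracy(n_total: int, n_positive: int) -> float:
    """Accuracy of a classifier that always predicts the more frequent class."""
    return max(n_positive, n_total - n_positive) / n_total


# Respondent and T2D counts as reported in the abstract.
for label, n, pos in [("2009-2010", 4922, 387), ("2011-2012", 4936, 373)]:
    print(label, f"{majority_class_accuracy(n, pos):.1%}")
```

Sensitivity and AUC, which the abstract also reports, are not inflated by this imbalance in the same way.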
Affiliation(s)
- Md. Merajul Islam
- Department of Statistics, University of Rajshahi, Rajshahi, Bangladesh
- Department of Statistics, Jatiya Kabi Kazi Nazrul Islam University, Mymensingh, Bangladesh
- Benojir Ahammed
- Statistics Discipline, Khulna University, Khulna, Bangladesh
- Mohammad Ali
- Statistics Discipline, Khulna University, Khulna, Bangladesh
- N.A.M Faisal Ahmed
- Institute of Education and Research, University of Rajshahi, Rajshahi, Bangladesh
5
Diabetes Mellitus Disease Prediction Using Machine Learning Classifiers with Oversampling and Feature Augmentation. Adv Hum Comput Interact 2022. [DOI: 10.1155/2022/9220560] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Technical improvements in the healthcare sector have given rise to many new inventions in the field of artificial intelligence. Patterns for disease identification can be learned, and the onset of many diseases can be predicted, including diabetes mellitus, fatal heart disease, and symptomatic cancer. Many algorithms have played a critical role in disease prediction. This paper proposes an ML-based approach for predicting diabetes mellitus. Many ML algorithms are compared, and the three classifiers providing the highest accuracy are identified: RF, GBM, and LGBM. Prediction accuracy is measured on two datasets: the Pima Indians dataset and a curated dataset. The LGBM, GBM, and RF classifiers are used to build predictive models, and the accuracy of each classifier is recorded and compared. In addition to this general prediction mechanism, a data augmentation technique is applied, and the final prediction accuracy is obtained for the LGBM, GBM, and RF classifiers. The augmented and non-augmented settings are also compared on both datasets, with the aim of further improving accuracy in predicting diabetes.
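The title mentions oversampling, but the abstract does not name the specific method. The simplest variant, random oversampling, duplicates randomly chosen minority-class rows until every class matches the majority count; SMOTE-style methods instead synthesize new rows by interpolating between neighbours. A stdlib Python sketch of the random variant, assumed here purely for illustration:

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes are equal in size.

    Duplicated rows are references to existing row objects, not copies.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


# Hypothetical single-feature rows with an imbalanced outcome.
X_bal, y_bal = random_oversample([[1], [2], [3], [4], [5]], [0, 0, 0, 1, 1])
print(Counter(y_bal))
```

Oversampling is normally applied to the training split only, so that duplicated rows cannot appear in both the training and the test data.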
6
Blended Ensemble Learning Prediction Model for Strengthening Diagnosis and Treatment of Chronic Diabetes Disease. Comput Intell Neurosci 2022; 2022:4451792. [PMID: 35875742 PMCID: PMC9303104 DOI: 10.1155/2022/4451792] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/22/2022] [Accepted: 06/24/2022] [Indexed: 11/18/2022]
Abstract
Diabetes mellitus (DM), commonly known as diabetes, is a group of metabolic disorders characterized by persistently high blood sugar levels. Signs of elevated blood sugar include increased hunger, frequent urination, and increased thirst. If DM is not treated properly, it may lead to several complications. Diabetes is caused either by insufficient insulin production by the pancreas or by an insufficient response of the body's cells to insulin. Every year, 1.6 million people die from this disease. The objective of this research is to use relevant features to construct a blended ensemble learning (EL)-based forecasting system and to find the optimal classifier for comparing clinical outputs. This article proposes an EL model based on Bayesian networks and radial basis functions. Its performance is compared with that of five machine learning (ML) techniques: logistic regression (LR), decision tree (DT), support vector machine (SVM), K-nearest neighbors (KNN), and random forest (RF). Experiments show that the EL method outperforms the existing ML approaches in predicting diabetes, with an accuracy of 97.11%. The proposed ensemble could help specialists diagnose diabetes accurately and help patients receive appropriate therapy.
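The abstract does not describe how the base learners are blended. One simple interpretation, assumed here, is a weighted average of each model's predicted probability for the positive class, thresholded at 0.5; blending in the stacking sense would instead fit a meta-model on held-out predictions. The probabilities and weights below are hypothetical.

```python
def blend_probabilities(prob_lists, weights):
    """Weighted average of per-model positive-class probabilities, case by case."""
    if len(prob_lists) != len(weights):
        raise ValueError("need one weight per model")
    total = sum(weights)
    n_cases = len(prob_lists[0])
    return [sum(w * probs[i] for probs, w in zip(prob_lists, weights)) / total
            for i in range(n_cases)]


def predict_labels(blended, threshold=0.5):
    """Convert blended probabilities to 0/1 predictions."""
    return [int(p >= threshold) for p in blended]


# Hypothetical outputs of two base models (e.g., a Bayesian network and an RBF network).
bayes_net = [0.9, 0.4, 0.2]
rbf_net = [0.7, 0.8, 0.1]
blended = blend_probabilities([bayes_net, rbf_net], weights=[0.5, 0.5])
print(predict_labels(blended))
```

Unequal weights let a more reliable base model dominate; in practice they would be tuned on validation data rather than fixed by hand.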
7
Gollapalli M, Alansari A, Alkhorasani H, Alsubaii M, Sakloua R, Alzahrani R, Taha Al-Hariri M, Nasser Alfares M, AlKhafaji D, Jaafar Al Argan R, Albaker W. A novel stacking ensemble for detecting three types of diabetes mellitus using a Saudi Arabian dataset: Pre-diabetes, T1DM, and T2DM. Comput Biol Med 2022; 147:105757. [DOI: 10.1016/j.compbiomed.2022.105757] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/19/2022] [Revised: 05/27/2022] [Accepted: 06/18/2022] [Indexed: 11/29/2022]
8
Odukoya O, Nwaneri S, Odeniyi I, Akodu B, Oluwole E, Olorunfemi G, Popoola O, Osuntoki A. Development and Comparison of Three Data Models for Predicting Diabetes Mellitus Using Risk Factors in a Nigerian Population. Healthc Inform Res 2022; 28:58-67. [PMID: 35172091 PMCID: PMC8850175 DOI: 10.4258/hir.2022.28.1.58] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2020] [Accepted: 08/11/2021] [Indexed: 11/23/2022]
Abstract
Objectives This study developed and compared the performance of three widely used predictive models (logistic regression [LR], artificial neural network [ANN], and decision tree [DT]) for predicting diabetes mellitus from the socio-demographic, lifestyle, and physical attributes of a population of Nigerians. Methods We developed three predictive models using 10 input variables. Data preprocessing steps included the removal of missing values and outliers, min-max normalization, and feature extraction using principal component analysis. The models were trained and validated using 10-fold cross-validation. Accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and area under the receiver operating characteristic curve (AUROC) were used as performance evaluation metrics. Analysis and model development were performed in R version 3.6.1. Results The mean age of the participants was 50.52 ± 16.14 years. The classification accuracy, sensitivity, specificity, PPV, and NPV for LR were 81.31%, 84.32%, 77.24%, 72.75%, and 82.49%, respectively. Those for ANN were 98.64%, 98.37%, 99.00%, 98.61%, and 98.83%, and those for DT were 99.05%, 99.76%, 98.08%, 98.77%, and 99.82%, respectively. The best-performing and poorest-performing classifiers were DT and LR, with 99.05% and 81.31% accuracy, respectively. Similarly, the DT algorithm achieved the highest AUC (0.992), compared with ANN (0.976) and LR (0.892). Conclusions Our study demonstrated that DT, LR, and ANN models can be used effectively to predict diabetes mellitus in the Nigerian population based on certain risk factors. An overall comparative analysis showed that the DT model performed better than LR and ANN.
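Two of the preprocessing and validation steps named above, min-max normalization and 10-fold cross-validation, can be sketched in stdlib Python. This is an illustration under stated assumptions, not the authors' R code: the shuffling seed and the round-robin fold assignment are choices made here, and the ages are hypothetical.

```python
import random


def min_max_normalize(column: list[float]) -> list[float]:
    """Rescale one feature column to the [0, 1] range: (v - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0] * len(column)
    return [(v - lo) / (hi - lo) for v in column]


def k_fold_indices(n: int, k: int = 10, seed: int = 0):
    """Yield (train_idx, test_idx) pairs; every index lands in exactly one test fold."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # round-robin: fold sizes differ by at most 1
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield sorted(train), sorted(test)


# Hypothetical ages, not study data.
print(min_max_normalize([20, 50, 80]))  # [0.0, 0.5, 1.0]
folds = list(k_fold_indices(n=25, k=10))
print(len(folds), [len(test) for _, test in folds])
```

In a strict cross-validation setup, the minimum and maximum would be computed on each training fold and then applied to the corresponding test fold, so that no information from the test rows leaks into preprocessing.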
Affiliation(s)
- Oluwakemi Odukoya
- Department of Community Health and Primary Care, College of Medicine, University of Lagos, Lagos State, Nigeria
- Solomon Nwaneri
- Department of Biomedical Engineering, College of Medicine, University of Lagos, Lagos State, Nigeria
- Department of Biomedical Engineering, Faculty of Engineering, University of Lagos, Lagos State, Nigeria
- Ifedayo Odeniyi
- Endocrinology Unit, Department of Internal Medicine, College of Medicine, University of Lagos, Lagos State, Nigeria
- Babatunde Akodu
- Department of Community Health and Primary Care, College of Medicine, University of Lagos, Lagos State, Nigeria
- Esther Oluwole
- Department of Community Health and Primary Care, College of Medicine, University of Lagos, Lagos State, Nigeria
- Gbenga Olorunfemi
- Division of Epidemiology and Biostatistics, School of Public Health, University of Witwatersrand, Johannesburg, South Africa
- Oluwatoyin Popoola
- Department of Biomedical Engineering, College of Medicine, University of Lagos, Lagos State, Nigeria
- Department of Biomedical Engineering, Faculty of Engineering, University of Lagos, Lagos State, Nigeria
- Akinniyi Osuntoki
- Department of Biochemistry, College of Medicine, University of Lagos, Lagos State, Nigeria
9
10
Chen Y, Wang Y, Xu K, Zhou J, Yu L, Wang N, Liu T, Fu C. Adiposity and Long-Term Adiposity Change Are Associated with Incident Diabetes: A Prospective Cohort Study in Southwest China. Int J Environ Res Public Health 2021; 18:ijerph182111481. [PMID: 34769995 PMCID: PMC8582792 DOI: 10.3390/ijerph182111481] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/05/2021] [Revised: 10/25/2021] [Accepted: 10/29/2021] [Indexed: 02/08/2023]
Abstract
To estimate the associations of different adiposity indicators and long-term adiposity changes with the risk of incident type 2 diabetes (T2DM), we conducted a 10-year prospective cohort study of 7441 adults in Guizhou, China, from 2010 to 2020. Adiposity was measured at baseline and at follow-up. Cox proportional hazards models were used to estimate hazard ratios (HRs) and 95% confidence intervals (95% CIs). A total of 764 new diabetes cases were identified over an average follow-up of 7.06 years. The adiposity indicators (body mass index [BMI], waist circumference [WC], and waist-to-height ratio [WHtR]) and long-term adiposity changes (both weight change and WC change) were significantly associated with an increased risk of T2DM (adjusted HRs: 1.16–1.48). Significant non-linear relationships were found between weight/WC change and incident T2DM. Compared with subjects whose WC remained stable from baseline to follow-up, subjects with a WC gain of ≥9 cm had a 1.61-fold greater risk of T2DM, and those with WC loss had a 30% lower risk. The associations were stronger among participants aged 40 years or older, women, and Han Chinese. Preventing weight or WC gain and promoting maintenance of normal body weight or WC are important approaches to diabetes prevention, especially for the elderly, women, and Han Chinese.
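The reported counts allow a rough crude incidence rate. This assumes that the 7.06-year average follow-up applies across all 7441 participants; the abstract does not report total person-time, so the result (about 52,500 person-years) is only an approximation:

```python
def crude_incidence_rate(cases: int, participants: int,
                         mean_follow_up_years: float, per: int = 1000) -> float:
    """Cases per `per` person-years, approximating person-time as participants x mean follow-up."""
    person_years = participants * mean_follow_up_years
    return cases / person_years * per


rate = crude_incidence_rate(cases=764, participants=7441, mean_follow_up_years=7.06)
print(f"{rate:.1f} per 1,000 person-years")  # 14.5 per 1,000 person-years
```

The hazard ratios in the abstract come from Cox models, which use each participant's own follow-up time; this crude rate is only a back-of-the-envelope summary of the cohort.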
Affiliation(s)
- Yun Chen
- School of Public Health, Key Laboratory of Public Health Safety, NHC Key Laboratory of Health Technology Assessment, Fudan University, Shanghai 200032, China
- Yiying Wang
- Guizhou Center for Disease Control and Prevention, Guiyang 550004, China
- Kelin Xu
- School of Public Health, Key Laboratory of Public Health Safety, NHC Key Laboratory of Health Technology Assessment, Fudan University, Shanghai 200032, China
- Jie Zhou
- Guizhou Center for Disease Control and Prevention, Guiyang 550004, China
- Lisha Yu
- Guizhou Center for Disease Control and Prevention, Guiyang 550004, China
- Na Wang
- School of Public Health, Key Laboratory of Public Health Safety, NHC Key Laboratory of Health Technology Assessment, Fudan University, Shanghai 200032, China
- Tao Liu
- Guizhou Center for Disease Control and Prevention, Guiyang 550004, China
- Correspondence: T.L., C.F.; Tel.: +86-21-3356-3933 (C.F.)
- Chaowei Fu
- School of Public Health, Key Laboratory of Public Health Safety, NHC Key Laboratory of Health Technology Assessment, Fudan University, Shanghai 200032, China
- Correspondence: T.L., C.F.; Tel.: +86-21-3356-3933 (C.F.)
11
Stiglic G, Wang F, Sheikh A, Cilar L. Development and validation of the type 2 diabetes mellitus 10-year risk score prediction models from survey data. Prim Care Diabetes 2021; 15:699-705. [PMID: 33896755 DOI: 10.1016/j.pcd.2021.04.008] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/20/2020] [Accepted: 04/13/2021] [Indexed: 12/23/2022]
Abstract
AIMS In this paper, we demonstrate the development and validation of 10-year type 2 diabetes mellitus (T2DM) risk prediction models based on large survey data. METHODS Data from the Survey of Health, Ageing and Retirement in Europe (SHARE), collected in 12 European countries and comprising 53 variables representing the behavioural, physical and mental health characteristics of participants aged 50 or older, were used to build and validate the prediction models. To account for the strongly unbalanced outcome variable, each instance was assigned a weight according to the inverse proportion of its outcome label when the regularized logistic regression model was built. RESULTS A pooled sample of 16,363 individuals was used to build and validate a global regularized logistic regression model that achieved an area under the receiver operating characteristic curve (AUROC) of 0.702 (95% CI: 0.698-0.706). We also measured the performance of local country-specific models, whose AUROC ranged from 0.578 (0.565-0.592) to 0.768 (0.749-0.787). CONCLUSIONS We have developed and validated a survey-based 10-year T2DM risk prediction model for use across 12 European countries. Our results demonstrate the importance of recalibrating the models, as well as the strength of pooling data from multiple countries to reduce variance and thereby increase the precision of the results.
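Inverse-proportion weighting gives each instance the weight 1/p_c, where p_c is the share of instances carrying its outcome label, so every class contributes the same total weight (n) to the training loss. A minimal sketch with hypothetical labels; the exact normalization used in the study is not stated. For comparison, scikit-learn's `class_weight="balanced"` uses n / (k * n_c) for k classes, which differs from 1/p_c only by the constant factor 1/k.

```python
from collections import Counter


def inverse_proportion_weights(labels):
    """Per-instance weight 1 / p_c, where p_c is the proportion of the instance's label."""
    n = len(labels)
    proportions = {label: count / n for label, count in Counter(labels).items()}
    return [1.0 / proportions[label] for label in labels]


# Hypothetical, strongly unbalanced outcome: 9 without T2DM, 1 with T2DM.
labels = [0] * 9 + [1]
weights = inverse_proportion_weights(labels)
print([round(w, 2) for w in weights])
```

These weights would then multiply each instance's term in the (regularized) logistic loss, so the rare positive cases are not swamped by the majority class.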
Affiliation(s)
- Gregor Stiglic
- University of Maribor, Faculty of Health Sciences, Zitna ulica 15, 2000 Maribor, Slovenia; University of Maribor, Faculty of Electrical Engineering and Computer Science, Koroska cesta 46, 2000 Maribor, Slovenia; Usher Institute, University of Edinburgh, Old Medical School, Teviot Place, Edinburgh EH8 9AG, UK.
- Fei Wang
- Department of Population Health Sciences, Weill Cornell Medicine, 425 East 61 Street, New York, NY 10065
- Aziz Sheikh
- Usher Institute, University of Edinburgh, Old Medical School, Teviot Place, Edinburgh EH8 9AG, UK
- Leona Cilar
- University of Maribor, Faculty of Health Sciences, Zitna ulica 15, 2000 Maribor, Slovenia
12
Predicting Type 2 Diabetes Using Logistic Regression and Machine Learning Approaches. Int J Environ Res Public Health 2021; 18:ijerph18147346. [PMID: 34299797 PMCID: PMC8306487 DOI: 10.3390/ijerph18147346] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 05/27/2021] [Revised: 07/02/2021] [Accepted: 07/05/2021] [Indexed: 12/27/2022]
Abstract
Diabetes mellitus is one of the most common human diseases worldwide and may cause several health-related complications. It is responsible for considerable morbidity, mortality, and economic loss. Timely diagnosis and prediction of this disease could give patients an opportunity to adopt appropriate preventive and treatment strategies. To improve the understanding of risk factors, we predict type 2 diabetes in Pima Indian women using a logistic regression model and a decision tree (a machine learning algorithm). Our analysis finds five main predictors of type 2 diabetes: glucose, pregnancy, body mass index (BMI), diabetes pedigree function, and age. We further explore a classification tree to complement and validate our analysis. The six-fold classification tree indicates that glucose, BMI, and age are important factors, while the ten-node tree identifies glucose, BMI, pregnancy, diabetes pedigree function, and age as the significant predictors. Our preferred specification yields a prediction accuracy of 78.26% and a cross-validation error rate of 21.74%. We argue that our model can be applied to make reasonable predictions of type 2 diabetes and could potentially complement existing preventive measures to curb the incidence of diabetes and reduce associated costs.
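The abstract does not state which splitting criterion the classification tree used. CART-style classification trees commonly choose, for each numeric predictor, the threshold that minimizes the weighted Gini impurity of the two child nodes, where Gini = 1 - sum over classes of p_k^2. A minimal sketch for a single predictor, with hypothetical glucose values and labels:

```python
def gini(labels):
    """Gini impurity of a node with binary 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n  # fraction of label-1 (diabetic) cases
    return 1.0 - p ** 2 - (1.0 - p) ** 2


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the split `value <= threshold` with lowest impurity."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical plasma glucose values (mg/dL) and diabetes labels, not study data.
glucose = [85, 90, 100, 140, 155, 170]
diabetic = [0, 0, 0, 1, 1, 1]
print(best_threshold(glucose, diabetic))  # (120.0, 0.0)
```

A full tree repeats this search over every predictor at every node and recurses into the children, with pruning or cross-validation deciding the final size (as with the six- and ten-node trees above).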
13
Artificial Intelligence for Medical Diagnosis. Artif Intell Med 2021. [DOI: 10.1007/978-3-030-58080-3_29-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
14
Liberda EN, Zuk AM, Martin ID, Tsuji LJS. Fisher's Linear Discriminant Function Analysis and its Potential Utility as a Tool for the Assessment of Health-and-Wellness Programs in Indigenous Communities. Int J Environ Res Public Health 2020; 17:ijerph17217894. [PMID: 33126498 PMCID: PMC7663610 DOI: 10.3390/ijerph17217894] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/28/2020] [Revised: 10/22/2020] [Accepted: 10/25/2020] [Indexed: 11/16/2022]
Abstract
Diabetes mellitus is a growing public health problem affecting people in both developed and developing nations. The prevalence of type 2 diabetes mellitus (T2DM) is reported to be several times higher among Indigenous populations than among their non-Indigenous counterparts. Discriminant function analysis (DFA) is a potential tool for quantitatively evaluating the effectiveness of Indigenous health-and-wellness programs (e.g., on-the-land programs, T2DM interventions) by creating a pre- and post-program scoring system. Because the communities of the Eeyou Istchee territory in subarctic Quebec, Canada, have varying degrees of isolation, we derived a DFA tool for point-of-contact evaluations to aid the monitoring and assessment of health-and-wellness programs in rural and remote locations. We developed several DFA models to discriminate between participants from the Eeyou Istchee with and without T2DM using age, fasting blood glucose, body mass index, waist girth, systolic and diastolic blood pressure, high-density lipoprotein, triglycerides, and total cholesterol. The models showed ~97% specificity (i.e., ~97% of participants without T2DM were correctly classified). This study highlights how varying risk factor models can be used to identify people without T2DM with high specificity in James Bay Cree communities in Canada.
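Fisher's linear discriminant finds the direction w proportional to S_W^-1 (mu1 - mu0), where mu0 and mu1 are the class mean vectors and S_W is the pooled within-class scatter matrix; projecting a participant onto w yields a single discriminant score. The stdlib-only Python sketch below assumes two classes, uses two hypothetical features with made-up values, and places the cut-off at the midpoint of the projected class means (an equal-prior assumption). The abstract does not say how the authors set their cut-off or which software they used.

```python
def mean_vector(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def scatter_matrix(rows, mean):
    """Sum of outer products (x - mean)(x - mean)^T over one class."""
    d = len(mean)
    s = [[0.0] * d for _ in range(d)]
    for row in rows:
        diff = [x - m for x, m in zip(row, mean)]
        for i in range(d):
            for j in range(d):
                s[i][j] += diff[i] * diff[j]
    return s


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def project(w, x):
    """Discriminant score: dot product of w and x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def fisher_lda(class0, class1):
    """Return (w, threshold) with S_W w = mu1 - mu0 and a midpoint cut-off."""
    mu0, mu1 = mean_vector(class0), mean_vector(class1)
    s0, s1 = scatter_matrix(class0, mu0), scatter_matrix(class1, mu1)
    s_w = [[a + b for a, b in zip(r0, r1)] for r0, r1 in zip(s0, s1)]
    w = solve(s_w, [b - a for a, b in zip(mu0, mu1)])
    threshold = (project(w, mu0) + project(w, mu1)) / 2  # assumes equal class priors
    return w, threshold


def classify(x, w, threshold):
    """1 = assigned to class1 (e.g., T2DM), 0 = assigned to class0."""
    return int(project(w, x) > threshold)


# Hypothetical [fasting glucose (mmol/L), BMI] values, not study data.
no_t2dm = [[5.0, 24.0], [5.2, 26.0], [4.8, 23.0], [5.1, 25.0]]
t2dm = [[7.5, 31.0], [8.0, 29.0], [7.2, 33.0], [7.8, 30.0]]
w, threshold = fisher_lda(no_t2dm, t2dm)
print([classify(x, w, threshold) for x in no_t2dm + t2dm])
```

Specificity as reported above would be the share of class-0 participants assigned a 0; shifting the threshold upward trades sensitivity for higher specificity.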
Affiliation(s)
- Eric N. Liberda
- School of Occupational and Public Health, Ryerson University, Toronto, ON M5B 2K3, Canada
- Correspondence: Tel.: +1-416-979-5000
- Aleksandra M. Zuk
- Department of Physical and Environmental Sciences, University of Toronto, Toronto, ON M1C 1A4, Canada
- School of Nursing, Faculty of Health Sciences, Queen’s University, Kingston, ON K7L 3N6, Canada
- Ian D. Martin
- Department of Physical and Environmental Sciences, University of Toronto, Toronto, ON M1C 1A4, Canada
- Leonard J. S. Tsuji
- Department of Physical and Environmental Sciences, University of Toronto, Toronto, ON M1C 1A4, Canada
15
Hou R, Wu J, Xu L, Zou Q, Wu YJ. Computational Prediction of Protein Arginine Methylation Based on Composition-Transition-Distribution Features. ACS Omega 2020; 5:27470-27479. [PMID: 33134710 PMCID: PMC7594152 DOI: 10.1021/acsomega.0c03972] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/18/2020] [Accepted: 10/06/2020] [Indexed: 06/11/2023]
Abstract
Arginine methylation is one of the most important protein post-translational modifications, and identifying arginine methylation sites is a critical problem in biological research. Unfortunately, biological experiments such as mass spectrometry are expensive and time-consuming, so predicting arginine methylation with machine learning offers a fast and efficient alternative. In this paper, we focus on the systematic characterization of arginine methylation using composition-transition-distribution (CTD) features. The presented framework consists of three stages. In the first stage, we extract CTD features from 1750 samples and use a decision tree to generate predictions, reaching an accuracy of 96%. In the second stage, a support vector machine predicts the number of arginine methylation sites with an R-squared of 0.36. In the third stage, experiments on an updated arginine methylation site dataset show that using CTD features with random forest as the classifier outperforms previous methods, with identification accuracies of 82.1% and 82.5% on the single-methylarginine and double-methylarginine datasets, respectively. These findings may be helpful for future research on arginine methylation.
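CTD descriptors map each residue to one of three classes according to a physicochemical property, then summarize the encoded sequence: composition is the fraction of residues in each class, transition is how often adjacent residues switch between two classes, and distribution records where the first, 25%, 50%, 75% and 100% of each class's residues occur, as a percentage of sequence length. The abstract does not list the properties, groupings or sequence windows the authors used, so the sketch below assumes a single hydrophobicity grouping that is common in CTD work and an arbitrary, hypothetical peptide; for one property it produces 21 values.

```python
# Assumed hydrophobicity grouping (one property only; CTD feature sets usually
# combine several properties, each split into three classes).
GROUPS = {**{aa: 1 for aa in "RKEDQN"},    # polar
          **{aa: 2 for aa in "GASTPHY"},   # neutral
          **{aa: 3 for aa in "CLVIMFW"}}   # hydrophobic


def ctd_features(seq: str) -> list[float]:
    """Composition (3), transition (3) and distribution (15) descriptors for one property."""
    classes = [GROUPS[aa] for aa in seq.upper() if aa in GROUPS]  # unknown residues are skipped
    n = len(classes)
    if n == 0:
        raise ValueError("sequence contains no recognised residues")
    composition = [classes.count(g) / n for g in (1, 2, 3)]
    pairs = list(zip(classes, classes[1:]))
    transition = [sum(1 for a, b in pairs if {a, b} == {g, h}) / max(len(pairs), 1)
                  for g, h in ((1, 2), (1, 3), (2, 3))]
    distribution = []
    for g in (1, 2, 3):
        positions = [i + 1 for i, c in enumerate(classes) if c == g]  # 1-based positions
        for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
            if not positions:
                distribution.append(0.0)
                continue
            # Which occurrence to report; rounding conventions differ between implementations.
            k = max(1, int(frac * len(positions)))
            distribution.append(100.0 * positions[k - 1] / n)
    return composition + transition + distribution


# Hypothetical peptide window around a candidate arginine (window length is an assumption).
features = ctd_features("GGRGRGGLV")
print(len(features))  # 21
```

Each property contributes its own 21 values, and the concatenated vectors are what a classifier such as a decision tree or random forest would consume.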
Affiliation(s)
- Ruiyan Hou
- Laboratory of Molecular Toxicology, State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- College of Life Science, University of Chinese Academy of Sciences, Beijing 100049, China
- Jin Wu
- School of Management, Shenzhen Polytechnic, Shenzhen 518055, China
- Lei Xu
- School of Electronic and Engineering, Shenzhen Polytechnic, Shenzhen 518055, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yi-Jun Wu
- Laboratory of Molecular Toxicology, State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
16
Zhang L, Shang X, Sreedharan S, Yan X, Liu J, Keel S, Wu J, Peng W, He M. Predicting the Development of Type 2 Diabetes in a Large Australian Cohort Using Machine-Learning Techniques: Longitudinal Survey Study. JMIR Med Inform 2020; 8:e16850. [PMID: 32720912 PMCID: PMC7420582 DOI: 10.2196/16850] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 10/30/2019] [Revised: 02/20/2020] [Accepted: 02/26/2020] [Indexed: 01/22/2023]
Abstract
BACKGROUND Previous conventional models for the prediction of diabetes could be updated by incorporating the increasing amount of available health data and new risk prediction methodology. OBJECTIVE We aimed to develop a substantially improved diabetes risk prediction model using sophisticated machine-learning algorithms based on a large retrospective population cohort of over 230,000 people enrolled during 2006-2017. METHODS We collected demographic, medical, behavioral, and incidence data for type 2 diabetes mellitus (T2DM) in 236,684 diabetes-free participants recruited from the 45 and Up Study. We predicted and compared the risk of diabetes onset in these participants at 3, 5, 7, and 10 years using three machine-learning approaches and a conventional regression model. RESULTS Overall, 6.05% (14,313/236,684) of the participants developed T2DM during an average 8.8-year follow-up period. The 10-year diabetes incidence in men was 8.30% (8.08%-8.49%), significantly higher (odds ratio 1.37, 95% CI 1.32-1.41) than that in women, at 6.20% (6.00%-6.40%). The incidence of T2DM in individuals with obesity (men: 17.78% [17.05%-18.43%]; women: 14.59% [13.99%-15.17%]) was double that in nonobese individuals. The gradient boosting machine model performed best among the four models (area under the curve of 79% for 3-year prediction and 75% for 10-year prediction). All machine-learning models identified BMI as the most significant factor contributing to diabetes onset, explaining 12%-50% of the variance in the prediction of diabetes. The model predicted that if BMI in obese and overweight participants could hypothetically be reduced to a healthy range, the 10-year probability of diabetes onset would fall significantly, from 8.3% to 2.8% (P<.001). CONCLUSIONS A one-time self-reported survey can accurately predict the risk of diabetes using a machine-learning approach. Achieving a healthy BMI can significantly reduce the risk of developing T2DM.
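The sex difference reported above can be sanity-checked from the two incidence figures alone, because an odds ratio compares the odds p/(1 - p) in each group. This back-of-the-envelope sketch uses only the point estimates quoted in the abstract (8.30% in men, 6.20% in women) and the overall case count; it does not reproduce the confidence interval, which would need the underlying group sizes. The helper names are illustrative.

```python
def odds(p: float) -> float:
    """Convert a proportion into odds p / (1 - p)."""
    return p / (1.0 - p)


def odds_ratio(p_exposed: float, p_reference: float) -> float:
    """Odds ratio of the exposed group relative to the reference group."""
    return odds(p_exposed) / odds(p_reference)


# Point estimates of 10-year T2DM incidence quoted in the abstract.
incidence_men = 0.0830
incidence_women = 0.0620
print(f"OR = {odds_ratio(incidence_men, incidence_women):.2f}")  # OR = 1.37

# Overall cumulative incidence: 14,313 cases among 236,684 participants.
print(f"incidence = {14313 / 236684:.2%}")  # incidence = 6.05%
```

Both values match the abstract, which suggests the reported odds ratio is a crude (unadjusted) comparison of the two incidences.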
Affiliation(s)
- Lei Zhang
  - China-Australia Joint Research Center for Infectious Diseases, School of Public Health, Xi'an Jiaotong University Health Science Center, Xi'an, Shaanxi, China
- Xianwen Shang
  - Centre for Eye Research Australia; Ophthalmology, Department of Surgery, The University of Melbourne, Melbourne, Australia
- Subhashaan Sreedharan
  - Centre for Eye Research Australia; Ophthalmology, Department of Surgery, The University of Melbourne, Melbourne, Australia
- Xixi Yan
  - Centre for Eye Research Australia; Ophthalmology, Department of Surgery, The University of Melbourne, Melbourne, Australia
- Jianbin Liu
  - Centre for Eye Research Australia; Ophthalmology, Department of Surgery, The University of Melbourne, Melbourne, Australia
- Stuart Keel
  - Centre for Eye Research Australia; Ophthalmology, Department of Surgery, The University of Melbourne, Melbourne, Australia
- Jinrong Wu
  - Centre for Eye Research Australia; Ophthalmology, Department of Surgery, The University of Melbourne, Melbourne, Australia
- Wei Peng
  - Research Centre for Data Analytics and Cognition, La Trobe University, Melbourne, Australia
- Mingguang He
  - Centre for Eye Research Australia; Ophthalmology, Department of Surgery, The University of Melbourne, Melbourne, Australia
17
Rghioui A, Lloret J, Oumnad A. Big Data Classification and Internet of Things in Healthcare. International Journal of E-Health and Medical Communications 2020. [DOI: 10.4018/ijehmc.2020040102]
Abstract
Every day, a massive amount of data is generated by different medical data sources. Processing this wealth of data is a daunting task that requires smart and scalable computational strategies, including machine intelligence, big data analytics, and data classification. Big Data analysis, using existing machine learning algorithms with some modifications, can support effective decision-making in the healthcare domain. The fundamental purpose of this article is to summarize the role of Big Data analysis in healthcare and to provide a comprehensive analysis of the various techniques involved in mining big data. The article gives an overview of Big Data, its applicability in healthcare, some work in progress, and future work. In addition, it proposes the use of machine learning techniques for real-time analysis of diabetic patient data from IoT devices and gateways.
Affiliation(s)
- Amine Rghioui
  - Research Team in Smart Communications-ERSC–Research Centre E3S, EMI, Mohamed V University, Rabat, Morocco
- Jaime Lloret
  - Integrated Management Coastal Research Institute, Universitat Politecnica de Valencia, 46370 Valencia, Spain
- Abedlmajid Oumnad
  - Research Team in Smart Communications-ERSC–Research Centre E3S, EMI, Mohamed V University, Rabat, Morocco
18
Sun W, Wang L, Zhang Q, Dong Q. Microbial Biomarkers for Colorectal Cancer Identified with Random Forest Model. Exploratory Research and Hypothesis in Medicine 2020. [DOI: 10.14218/erhm.2019.00026]
19
Abstract
PURPOSE OF REVIEW Machine learning (ML) is increasingly being studied for the screening, diagnosis, and management of diabetes and its complications. Although various ML models have been developed, most have not led to practical solutions for real-world problems. There has been a disconnect among the efforts of ML developers, regulatory bodies, health services researchers, clinicians, and patients. Our aim is to review the current status of ML in various aspects of diabetes care and identify the key challenges that must be overcome to leverage ML to its full potential. RECENT FINDINGS ML has led to impressive progress in the development of automated insulin delivery systems and diabetic retinopathy screening tools. Compared with these, the use of ML in other aspects of diabetes is still at an early stage. The Food & Drug Administration (FDA) is adopting some innovative models to help bring technologies to market in an expeditious and safe manner. ML has great potential in managing diabetes, and the future lies in furthering the partnership of regulatory bodies with health service researchers, clinicians, developers, and patients to improve the outcomes of populations and individual patients with diabetes.
Affiliation(s)
- David T Broome
  - Department of Endocrinology, Diabetes & Metabolism, Cleveland Clinic Foundation, F-20 9500 Euclid Avenue, Cleveland, OH, 44195, USA
- C Beau Hilton
  - Cleveland Clinic Lerner College of Medicine of Case Western Reserve University, 9500 Euclid Ave, Cleveland, OH, 44195, USA
- Neil Mehta
  - Cleveland Clinic Lerner College of Medicine of Case Western Reserve University, EC-40 9500 Euclid Ave, Cleveland, OH, 44195, USA
20
Nguyen BP, Pham HN, Tran H, Nghiem N, Nguyen QH, Do TTT, Tran CT, Simpson CR. Predicting the onset of type 2 diabetes using wide and deep learning with electronic health records. Computer Methods and Programs in Biomedicine 2019; 182:105055. [PMID: 31505379 DOI: 10.1016/j.cmpb.2019.105055] [Received: 03/07/2019] [Revised: 08/17/2019] [Accepted: 08/27/2019]
Abstract
OBJECTIVE Diabetes is responsible for considerable morbidity, healthcare utilisation and mortality in both developed and developing countries. Because current methods of treating diabetes are inadequate and costly, prevention is an important step in reducing the burden of diabetes and its complications. Electronic health records (EHRs), for individuals and for whole populations, have become important tools for understanding disease trends, and using them to predict the onset of diabetes could improve the quality and efficiency of medical care. In this paper, we apply a wide and deep learning model that combines the strengths of a generalised linear model with various features and a deep feed-forward neural network to improve the prediction of the onset of type 2 diabetes mellitus (T2DM). MATERIALS AND METHODS The models were trained on a logistic loss function using stochastic gradient descent. We applied this model to public hospital record data provided by the Practice Fusion EHRs for the United States population. The dataset consists of de-identified electronic health records for 9948 patients, of whom 1904 have been diagnosed with T2DM. Prediction of diabetes in 2012 was based on data from previous years (2009-2011). Class imbalance was handled by applying the Synthetic Minority Oversampling Technique (SMOTE) within each cross-validation training fold, to analyse performance when synthetic examples of the minority class are created. We used SMOTE of 150 and 300 percent, where 300 percent means that three new synthetic instances are created for each minority class instance. This yields approximate diabetes:non-diabetes ratios in the training set of 1:2 and 1:1, respectively. RESULTS Without SMOTE, our final ensemble model obtained an accuracy of 84.28%, area under the receiver operating characteristic curve (AUC) of 84.13%, sensitivity of 31.17% and specificity of 96.85%. Using SMOTE of 150 and 300 percent did not improve AUC (83.33% and 82.12%, respectively) but increased sensitivity (49.40% and 71.57%, respectively), with a moderate decrease in specificity (90.16% and 76.59%, respectively). DISCUSSION AND CONCLUSIONS Our algorithm further improves the prediction of diabetes onset using a novel, state-of-the-art machine learning approach: the wide and deep learning neural network architecture.
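SMOTE creates each synthetic minority example by picking a minority instance, choosing one of its k nearest minority-class neighbours, and placing a new point at a random position on the line segment between them; an oversampling amount of 300 percent therefore yields three synthetic points per minority instance. The pure-Python sketch below illustrates that idea under assumed choices (Euclidean distance, k = 5 by default, a fixed seed) and is not the implementation used in the study; `smote` is a hypothetical helper. It also reproduces the class-ratio arithmetic: with 1904 diabetic and 8044 non-diabetic patients, 150 and 300 percent give minority counts of 4760 and 7616, or roughly 1:1.7 and 1:1.06, in line with the approximate 1:2 and 1:1 ratios quoted (within each cross-validation fold the counts are smaller but the proportions are similar).

```python
import math
import random


def smote(minority, percent=300, k=5, seed=0):
    """Minimal SMOTE sketch: synthetic samples for minority-class vectors.

    percent=300 creates three synthetic samples per minority instance;
    percentages that are not multiples of 100 (e.g. 150) are met by cycling
    through the instances until round(n * percent / 100) samples exist.
    """
    rng = random.Random(seed)
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)

    # k nearest minority-class neighbours of each instance (Euclidean).
    neighbours = []
    for i, x in enumerate(minority):
        ranked = sorted((math.dist(x, y), j) for j, y in enumerate(minority) if j != i)
        neighbours.append([j for _, j in ranked[:k]])

    synthetic = []
    for t in range(round(n * percent / 100)):
        i = t % n
        x, y = minority[i], minority[rng.choice(neighbours[i])]
        gap = rng.random()  # random point on the segment from x to y
        synthetic.append([a + gap * (b - a) for a, b in zip(x, y)])
    return synthetic


# Class sizes from the abstract: 1904 T2DM vs 9948 - 1904 = 8044 others.
for pct in (150, 300):
    minority_after = 1904 + round(1904 * pct / 100)
    print(pct, minority_after, round(8044 / minority_after, 2))

toy = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
print(len(smote(toy, percent=300, k=2)))  # 12
```

Oversampling only the training folds, as the abstract describes, keeps synthetic points out of the evaluation data, so the reported sensitivity and specificity are measured on real patients only.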
Affiliation(s)
- Binh P Nguyen
  - School of Mathematics and Statistics, Victoria University of Wellington, Kelburn Parade, Wellington 6140, New Zealand
- Hung N Pham
  - School of Information and Communication Technology, Hanoi University of Science and Technology, 1 Dai Co Viet Road, Hanoi 100000, Vietnam
- Hop Tran
  - School of Mathematics and Statistics, Victoria University of Wellington, Kelburn Parade, Wellington 6140, New Zealand
- Nhung Nghiem
  - Department of Public Health, University of Otago, 23A Mein Street, Wellington 6021, New Zealand
- Quang H Nguyen
  - School of Information and Communication Technology, Hanoi University of Science and Technology, 1 Dai Co Viet Road, Hanoi 100000, Vietnam
- Trang T T Do
  - Institute for Infocomm Research, Agency for Science, Technology and Research, 1 Fusionopolis Way, Singapore 138632, Singapore
- Cao Truong Tran
  - Faculty of Information Technology, Le Quy Don Technical University, 236 Hoang Quoc Viet Street, Hanoi 100000, Vietnam
- Colin R Simpson
  - Faculty of Health, Victoria University of Wellington, Kelburn Parade, Wellington 6140, New Zealand
  - Usher Institute, The University of Edinburgh, Edinburgh, EH8 9AG, United Kingdom
21
Lai H, Huang H, Keshavjee K, Guergachi A, Gao X. Predictive models for diabetes mellitus using machine learning techniques. BMC Endocr Disord 2019; 19:101. [PMID: 31615566 PMCID: PMC6794897 DOI: 10.1186/s12902-019-0436-6] [Received: 12/23/2018] [Accepted: 09/30/2019]
Abstract
BACKGROUND Diabetes Mellitus is an increasingly prevalent chronic disease characterized by the body's inability to metabolize glucose. The objective of this study was to build an effective predictive model with high sensitivity and selectivity to better identify Canadian patients at risk of having Diabetes Mellitus, based on patient demographic data and laboratory results from their visits to medical facilities. METHODS Using the most recent records of 13,309 Canadian patients aged 18 to 90 years, along with their demographic and laboratory information (age, sex, fasting blood glucose, body mass index, high-density lipoprotein, triglycerides, blood pressure, and low-density lipoprotein), we built predictive models using Logistic Regression and Gradient Boosting Machine (GBM) techniques. The area under the receiver operating characteristic curve (AROC) was used to evaluate the discriminatory capability of these models. We used the adjusted threshold method and the class weight method to improve sensitivity, that is, the proportion of Diabetes Mellitus patients correctly predicted by the model. We also compared these models with other machine learning techniques such as Decision Tree and Random Forest. RESULTS The AROC for the proposed GBM model is 84.7% with a sensitivity of 71.6%, and the AROC for the proposed Logistic Regression model is 84.0% with a sensitivity of 73.4%. The GBM and Logistic Regression models perform better than the Random Forest and Decision Tree models. CONCLUSIONS Our models predict Diabetes with high discriminatory ability and satisfactory sensitivity using commonly available lab results. They can be built into an online computer program to help physicians identify patients at risk of future diabetes and provide necessary preventive interventions. Because the model was developed and validated on the Canadian population, it is more specific and powerful for Canadian patients than existing models developed from US or other populations. Fasting blood glucose, body mass index, high-density lipoprotein, and triglycerides were the most important predictors in these models.
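The adjusted-threshold method trades specificity for sensitivity by lowering the probability cutoff above which a patient is labelled as having diabetes. The abstract does not state how the cutoff was chosen, so the sketch below assumes one simple rule, picking the highest cutoff that reaches a target sensitivity, applied to already-computed predicted probabilities. `threshold_for_sensitivity` and the toy data are hypothetical.

```python
def sensitivity_specificity(y_true, y_prob, threshold):
    """Sensitivity and specificity when predicting 1 for prob >= threshold."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def threshold_for_sensitivity(y_true, y_prob, target=0.8):
    """Hypothetical rule: the highest cutoff whose sensitivity >= target."""
    if 1 not in y_true or 0 not in y_true:
        raise ValueError("need both classes")
    # Scan observed probabilities from high to low; sensitivity can only
    # rise as the cutoff falls.
    for t in sorted(set(y_prob), reverse=True):
        if sensitivity_specificity(y_true, y_prob, t)[0] >= target:
            return t
    return min(y_prob)  # fallback (target > 1): every case predicted positive


# Toy predicted probabilities (not study data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.45, 0.3, 0.6, 0.4, 0.35, 0.2, 0.1, 0.05]
print(sensitivity_specificity(y_true, y_prob, 0.5))  # default cutoff
cutoff = threshold_for_sensitivity(y_true, y_prob, target=0.75)
print(cutoff, sensitivity_specificity(y_true, y_prob, cutoff))
```

The class weight method reaches a similar goal differently, by penalising missed positives more heavily during training rather than moving the cutoff afterwards.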
Affiliation(s)
- Hang Lai
  - Department of Mathematics and Statistics, York University, 4700 Keele Street, Toronto, Ontario M3J 1P3 Canada
  - The Fields Institute for Research in Mathematical Sciences, Center for Quantitative Analysis and Modelling (CQAM) Lab, 222 College Street, Toronto, Ontario M5T 3J1 Canada
- Huaxiong Huang
  - Department of Mathematics and Statistics, York University, 4700 Keele Street, Toronto, Ontario M3J 1P3 Canada
  - The Fields Institute for Research in Mathematical Sciences, Center for Quantitative Analysis and Modelling (CQAM) Lab, 222 College Street, Toronto, Ontario M5T 3J1 Canada
- Karim Keshavjee
  - The Fields Institute for Research in Mathematical Sciences, Center for Quantitative Analysis and Modelling (CQAM) Lab, 222 College Street, Toronto, Ontario M5T 3J1 Canada
  - Institute of Health Policy, Management and Evaluation, University of Toronto, 155 College Street, Suite 425, Toronto, Ontario M5T 3M6 Canada
- Aziz Guergachi
  - Department of Mathematics and Statistics, York University, 4700 Keele Street, Toronto, Ontario M3J 1P3 Canada
  - The Fields Institute for Research in Mathematical Sciences, Center for Quantitative Analysis and Modelling (CQAM) Lab, 222 College Street, Toronto, Ontario M5T 3J1 Canada
  - Ted Rogers School of Management - Information Technology Management, Ryerson University, 350 Victoria Street, Toronto, Ontario M5B 2K3 Canada
- Xin Gao
  - Department of Mathematics and Statistics, York University, 4700 Keele Street, Toronto, Ontario M3J 1P3 Canada
  - The Fields Institute for Research in Mathematical Sciences, Center for Quantitative Analysis and Modelling (CQAM) Lab, 222 College Street, Toronto, Ontario M5T 3J1 Canada
22
Xiong XL, Zhang RX, Bi Y, Zhou WH, Yu Y, Zhu DL. Machine Learning Models in Type 2 Diabetes Risk Prediction: Results from a Cross-sectional Retrospective Study in Chinese Adults. Curr Med Sci 2019; 39:582-588. [PMID: 31346994 DOI: 10.1007/s11596-019-2077-4] [Received: 07/17/2018] [Revised: 06/10/2019]
Abstract
Type 2 diabetes mellitus (T2DM) has become a prevalent health problem in China, especially in urban areas, and early prevention strategies are needed to reduce the associated mortality and morbidity. We applied a combination of rules and different machine learning techniques to assess the risk of developing T2DM in an urban Chinese adult population. A retrospective analysis was performed on 8000 people without diabetes and 3845 people with T2DM in Nanjing. Multilayer Perceptron (MLP), AdaBoost (AD), Trees Random Forest (TRF), Support Vector Machine (SVM), and Gradient Tree Boosting (GTB) machine learning techniques, with 10-fold cross-validation, were used with the proposed model to predict the risk of developing T2DM. The performance of these models was evaluated by accuracy, precision, sensitivity, specificity, and area under the receiver operating characteristic (ROC) curve (AUC). The prediction accuracies of the five machine learning models were 0.87, 0.86, 0.86, 0.86, and 0.86, respectively. A combination model that gave each component the same voting weight performed better than the individual models. The findings indicate that combining machine learning models could provide an accurate assessment model for T2DM risk prediction.
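A combination model in which every component has the same voting weight can be as simple as a majority vote over the models' predicted labels. The sketch below assumes hard (label) voting on 0/1 predictions, with ties sent to the positive class; the abstract specifies neither choice, and the prediction lists are placeholders standing in for the five fitted models.

```python
def majority_vote(predictions):
    """Equal-weight hard voting over per-model 0/1 label lists.

    Each model contributes one vote per case; a case is labelled 1 when at
    least half of the models vote 1 (ties, possible with an even number of
    models, go to 1 by assumption).
    """
    n_models = len(predictions)
    return [1 if 2 * sum(votes) >= n_models else 0 for votes in zip(*predictions)]


# Placeholder predictions standing in for the five fitted models.
preds = {
    "MLP": [1, 0, 1, 1],
    "AdaBoost": [1, 0, 0, 1],
    "RandomForest": [0, 0, 1, 1],
    "SVM": [1, 1, 0, 0],
    "GTB": [1, 0, 1, 0],
}
print(majority_vote(list(preds.values())))  # [1, 0, 1, 1]
```

With five voters no tie can occur, so the tie rule only matters if a model is dropped; soft voting (averaging predicted probabilities) is a common alternative.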
Affiliation(s)
- Xiao-Lu Xiong
  - Department of Endocrinology, Nanjing Drum Tower Hospital Clinical College of Nanjing Medical University, Nanjing, 210008, China
- Rong-Xin Zhang
  - School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, 211166, China
- Yan Bi
  - Department of Endocrinology, Nanjing Drum Tower Hospital Clinical College of Nanjing Medical University, Nanjing, 210008, China
- Wei-Hong Zhou
  - Department of Endocrinology, Nanjing Drum Tower Hospital Clinical College of Nanjing Medical University, Nanjing, 210008, China
- Yun Yu
  - School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, 211166, China
- Da-Long Zhu
  - Department of Endocrinology, Nanjing Drum Tower Hospital Clinical College of Nanjing Medical University, Nanjing, 210008, China
23
Pei D, Gong Y, Kang H, Zhang C, Guo Q. Accurate and rapid screening model for potential diabetes mellitus. BMC Med Inform Decis Mak 2019; 19:41. [PMID: 30866905 PMCID: PMC6416888 DOI: 10.1186/s12911-019-0790-3] [Received: 02/28/2018] [Accepted: 03/03/2019]
Abstract
Background Prediction or early diagnosis of diabetes is crucial for populations at high risk of diabetes. Methods In this study, we assessed the ability of five popular classifiers (J48, AdaboostM1, SMO, Bayes Net, and Naïve Bayes) to identify individuals with diabetes based on nine non-invasive and easily obtained clinical features: age, gender, body mass index (BMI), hypertension, history of cardiovascular disease or stroke, family history of diabetes, physical activity, work stress, and salty food preference. A total of 4205 data entries were obtained from annual physical examination reports for adults at Shengjing Hospital of China Medical University during January–April 2017. Weka data mining software was used to identify the best algorithm for diabetes classification. Results The decision tree classifier J48 had the best performance (accuracy = 0.9503, precision = 0.950, recall = 0.950, F-measure = 0.948, and AUC = 0.964). The decision tree structure shows that age is the most significant feature, followed by family history of diabetes, work stress, BMI, salty food preference, physical activity, hypertension, gender, and history of cardiovascular disease or stroke. Conclusions Our study shows that decision tree analyses can be applied to screen individuals for early diabetes risk without the need for invasive tests. This procedure will be particularly useful in developing regions with high epidemiological risk and poor socioeconomic status, and will enable clinical practitioners to rapidly screen patients for increased risk of diabetes. The key features in the tree structure could further facilitate diabetes prevention through targeted community interventions, potentially improving early diabetes diagnosis and reducing burdens on the healthcare system.
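J48 is Weka's implementation of the C4.5 decision-tree learner, which ranks candidate splits by gain ratio: the information gain of a split divided by the entropy of the split itself (its "split information"), so that features with many distinct values are not unduly favoured. The sketch below computes that criterion for one categorical feature in pure Python; the toy records are invented for illustration, and C4.5's handling of numeric thresholds (such as the age cutoffs reported for these trees), missing values and pruning is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(feature_values, labels):
    """C4.5-style gain ratio of splitting `labels` on a categorical feature."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)

    # Weighted entropy remaining after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder

    # Split information penalises features with many distinct values.
    split_info = -sum((len(g) / n) * math.log2(len(g) / n) for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0


# Toy records (not study data): how well does family history split the labels?
family_history = ["yes", "yes", "no", "no", "yes", "no"]
diabetic = [1, 1, 0, 0, 0, 0]
print(round(gain_ratio(family_history, diabetic), 3))  # 0.459
```

At each node the learner evaluates this score for every candidate feature and splits on the best one, which is why the feature order in the reported tree (age first, then family history, and so on) reflects how informative each feature was.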
Affiliation(s)
- Dongmei Pei
  - Department of Family Medicine, Shengjing Hospital, China Medical University, Shenyang, Liaoning, China
- Yang Gong
  - University of Texas Health Science Center at Houston, Houston, Texas, USA
- Hong Kang
  - University of Texas Health Science Center at Houston, Houston, Texas, USA
- Chengpu Zhang
  - Department of Family Medicine, Shengjing Hospital, China Medical University, Shenyang, Liaoning, China
- Qiyong Guo
  - Department of Radiology, Shengjing Hospital, China Medical University, Shenyang, Liaoning, China
24
Khandan M, Tirgari B, Abazari F, Cheraghi MA. Mothers' Experiences of Maze Path of Type 1 Diabetes Diagnosis in Children. Ethiop J Health Sci 2019; 28:635-644. [PMID: 30607079 PMCID: PMC6308784 DOI: 10.4314/ejhs.v28i5.15]
Abstract
Background Type 1 diabetes presenting with non-classic symptoms in children is one of the reasons for delayed follow-up, and failure by health professionals to diagnose it exposes mothers to many challenges. This study was conducted to explore mothers' experiences along the diagnostic pathway of Type 1 diabetes in their children. Methods Semi-structured qualitative interviews were conducted with fifteen mothers of children with Type 1 diabetes, selected by purposeful sampling. Each child had a medical file at a diabetes center in Kerman, Iran, had been diagnosed with diabetes at least one year earlier, and was at most 14 years old. Data were analyzed using content analysis, and three themes and nine sub-themes emerged. Results The extracted themes were “presence in the maze path to the child's disease”, “facing the reality of the child's disease”, and “to grin and bear with new conditions”. Conclusions According to the findings, these mothers experienced various challenges. It is therefore necessary for health professionals to identify these challenges in order to prevent and reduce them.
Affiliation(s)
- Maryam Khandan
  - Nursing Research Center, Kerman University of Medical Sciences, Kerman, Iran
- Batool Tirgari
  - Nursing Research Center, Kerman University of Medical Sciences, Kerman, Iran
- Farokh Abazari
  - Nursing Research Center, Kerman University of Medical Sciences, Kerman, Iran
- Mohammad Ali Cheraghi
  - School of Nursing and Midwifery, Tehran University of Medical Sciences, Tehran, Iran
25
Pei D, Zhang C, Quan Y, Guo Q. Identification of Potential Type II Diabetes in a Chinese Population with a Sensitive Decision Tree Approach. J Diabetes Res 2019; 2019:4248218. [PMID: 30805372 PMCID: PMC6362481 DOI: 10.1155/2019/4248218] [Received: 07/19/2018] [Revised: 11/20/2018] [Accepted: 12/18/2018]
Abstract
BACKGROUND Diabetes mellitus is a chronic disease with a steady increase in prevalence. Because of its chronic course and devastating complications, the disorder can easily impose a financial burden. Early diagnosis of diabetes remains one of the major challenges facing medical providers, and satisfactory screening tools or methods, especially population- or community-based tools, are still needed. METHODS This is a retrospective cross-sectional study involving 15,323 subjects who underwent an annual check-up in the Department of Family Medicine of Shengjing Hospital of China Medical University from January 2017 to June 2017. After strict data filtering, 10,436 records from eligible participants were used to develop a prediction model with the J48 decision tree algorithm. Nine variables were considered: age, gender, body mass index (BMI), hypertension, history of cardiovascular disease or stroke, family history of diabetes, physical activity, work-related stress, and salty food preference. RESULTS The accuracy, precision, recall, and area under the receiver operating characteristic curve (AUC) for identifying potential diabetes were 94.2%, 94.0%, 94.2%, and 94.8%, respectively. The structure of the decision tree shows that age is the most significant feature. Among participants aged ≤ 49, 5497 (97%) were identified as nondiabetic, whereas among those aged > 49, 771 (50%) were identified as nondiabetic. In the subgroup with 34 < age ≤ 49 and BMI ≥ 25, 89 of 97 individuals (92%) with a positive family history of diabetes were identified as diabetic, while 576 (58%) of those without a family history of diabetes were identified as nondiabetic. Work-related stress was identified as being associated with diabetes: among individuals with 34 < age ≤ 49, BMI ≥ 25 and no family history of diabetes, 22 (51%) of those with high work-related stress were identified as nondiabetic, compared with 349 (88%) of those with low or moderate work-related stress. CONCLUSIONS We proposed a decision tree classifier that uses nine easily obtained, noninvasive patient features as predictor variables to identify potential cases of diabetes. The classifier indicates that decision tree analysis can be successfully applied to screen for diabetes, supporting rapid diabetes identification by clinical practitioners. The model provides a means of targeting diabetes prevention, which could reduce the burden on the health system through effective case management.
Affiliation(s)
- Dongmei Pei
  - Department of Family Medicine, Shengjing Hospital, China Medical University, Shenyang, Liaoning, China
- Chengpu Zhang
  - Department of Family Medicine, Shengjing Hospital, China Medical University, Shenyang, Liaoning, China
- Yu Quan
  - Department of Informatics, Shengjing Hospital, China Medical University, Shenyang, Liaoning, China
- Qiyong Guo
  - Department of Radiology, Shengjing Hospital, China Medical University, Shenyang, Liaoning, China
26
Maeta K, Nishiyama Y, Fujibayashi K, Gunji T, Sasabe N, Iijima K, Naito T. Prediction of Glucose Metabolism Disorder Risk Using a Machine Learning Algorithm: Pilot Study. JMIR Diabetes 2018; 3:e10212. [PMID: 30478026 PMCID: PMC6288596 DOI: 10.2196/10212] [Received: 02/24/2018] [Revised: 08/16/2018] [Accepted: 10/17/2018]
Abstract
Background A 75-g oral glucose tolerance test (OGTT) provides important information about glucose metabolism, although the test is expensive and invasive. Complete OGTT information, such as 1-hour and 2-hour postloading plasma glucose and immunoreactive insulin levels, may be useful for predicting the future risk of diabetes or glucose metabolism disorders (GMD), which include both diabetes and prediabetes. Objective We trained several classification models for predicting the risk of developing diabetes or GMD using data from thousands of OGTTs and a machine learning technique (XGBoost). We report the receiver operating characteristic (ROC) curves and their area under the curve (AUC) values for the trained classification models, along with the sensitivity and specificity at the cutoff values determined by the Youden index. We compared the performance of the machine learning technique with logistic regression (LR), which is traditionally used in medical research studies. Methods Data were collected from subjects who underwent multiple OGTTs during comprehensive check-up medical examinations conducted at a single facility in Tokyo, Japan, from May 2006 to April 2017. At each examination, a subject was diagnosed with diabetes or prediabetes according to the American Diabetes Association guidelines. Two studies were conducted on these data: predicting the risk of developing diabetes (study 1) or GMD (study 2). For each study, the label data required for supervised machine learning were prepared: if a subject was diagnosed with diabetes or GMD at least once during the period, that subject's data from previous tests were classified into the risk group (y=1). After data processing, 13,581 and 6760 OGTTs were analyzed for study 1 and study 2, respectively. For each study, a randomly chosen 80% of the data was used to train 9 classification models, and the remaining 20% was used to evaluate them. Three classification models, A to C, used XGBoost with various input variables, some including OGTT data; the other 6 models, D to I, used LR for comparison. Results For study 1, the AUC values ranged from 0.78 to 0.93; for study 2, they ranged from 0.63 to 0.78. The machine learning approach using XGBoost performed better than the traditional LR methods, and the AUC values increased when the full OGTT variables were included. In our analysis with a particular set of input variables, XGBoost showed that the OGTT variables were more important than fasting plasma glucose or glycated hemoglobin. Conclusions A machine learning approach, XGBoost, showed better prediction accuracy than LR, suggesting that advanced machine learning methods are useful for detecting the early signs of diabetes or GMD. The prediction accuracy increased when all OGTT variables were added, indicating that complete OGTT information is important for accurately predicting the future risk of diabetes and GMD.
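The Youden index, J = sensitivity + specificity - 1, selects the ROC operating point that lies farthest above the chance diagonal. A minimal sketch of that selection, assuming the candidate cutoffs are the observed predicted scores and that a case is called positive when its score is at or above the cutoff; `youden_cutoff` and the toy scores are hypothetical, not the study's code or data.

```python
def youden_cutoff(y_true, scores):
    """Return (cutoff, J, sensitivity, specificity) maximising Youden's J.

    Ties keep the lowest cutoff found first, i.e. the more sensitive one.
    """
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative cases")

    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= cutoff)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < cutoff)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (cutoff, j, sens, spec)
    return best


# Toy scores (not study data).
y = [0, 0, 0, 1, 0, 1, 1, 1]
s = [0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8]
cutoff, j, sens, spec = youden_cutoff(y, s)
print(cutoff, j, sens, spec)  # 0.35 0.75 1.0 0.75
```

In this toy example a cutoff of 0.6 gives the same J with the balance reversed (sensitivity 0.75, specificity 1.0), which illustrates that the Youden index weights the two error types equally.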
Affiliation(s)
- Katsutoshi Maeta
  - Faculty of Informatics and Engineering, The University of Electro-Communications, Tokyo, Japan
- Yu Nishiyama
  - Faculty of Informatics and Engineering, The University of Electro-Communications, Tokyo, Japan
- Kazutoshi Fujibayashi
  - Department of General Medicine, School of Medicine, Juntendo University, Tokyo, Japan
- Toshiaki Gunji
  - Center for Preventive Medicine, NTT Medical Center Tokyo, Tokyo, Japan
- Noriko Sasabe
  - Center for Preventive Medicine, NTT Medical Center Tokyo, Tokyo, Japan
- Kimiko Iijima
  - Center for Preventive Medicine, NTT Medical Center Tokyo, Tokyo, Japan
- Toshio Naito
  - Department of General Medicine, School of Medicine, Juntendo University, Tokyo, Japan
27
Zou Q, Qu K, Luo Y, Yin D, Ju Y, Tang H. Predicting Diabetes Mellitus With Machine Learning Techniques. Front Genet 2018; 9:515. [PMID: 30459809 PMCID: PMC6232260 DOI: 10.3389/fgene.2018.00515] [Received: 07/29/2018] [Accepted: 10/12/2018]
Abstract
Diabetes mellitus is a chronic disease characterized by hyperglycemia that can cause many complications. Given its rising morbidity in recent years, the number of people with diabetes worldwide is projected to reach 642 million by 2040, meaning that roughly one in ten adults will have diabetes. This alarming figure clearly demands attention. With its rapid development, machine learning has been applied to many areas of medicine and health care. In this study, we used decision tree, random forest and neural network models to predict diabetes mellitus. The dataset consists of hospital physical examination data from Luzhou, China, and contains 14 attributes. Five-fold cross-validation was used to evaluate the models. To verify the general applicability of the methods, we selected the better-performing methods for independent test experiments. For the training set, we randomly selected data from 68,994 healthy people and 68,994 diabetic patients. Because the data were imbalanced, this random extraction was performed five times, and the reported result is the average of these five experiments. Principal component analysis (PCA) and minimum redundancy maximum relevance (mRMR) were used to reduce dimensionality. Random forest achieved the highest prediction accuracy (ACC = 0.8084) when all attributes were used.
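The evaluation design above (five-fold cross-validation, plus repeated random extraction to counter class imbalance) can be sketched as follows. This is an illustrative reading, not the authors' pipeline: the abstract does not say exactly how the five extractions were drawn, so `balanced_subsample` assumes simple random undersampling of each class to the size of the smallest class. Both function names are hypothetical, and the usage uses 3 folds only because the toy data are tiny.

```python
import random


def kfold_indices(n_samples, k=5, seed=0):
    """Shuffle 0..n_samples-1 and return k (train, test) index splits.

    Every index appears in exactly one test fold; fold sizes differ by at most one.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits


def balanced_subsample(labels, seed=0):
    """Randomly undersample every class to the size of the smallest class.

    Returns the sorted indices of the retained samples.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    m = min(len(members) for members in by_class.values())
    kept = []
    for members in by_class.values():
        kept.extend(rng.sample(members, m))
    return sorted(kept)


if __name__ == "__main__":
    labels = [0] * 10 + [1] * 3  # hypothetical imbalanced outcome
    for repeat in range(5):  # five random extractions; metrics would be averaged
        subset = balanced_subsample(labels, seed=repeat)
        for train_pos, test_pos in kfold_indices(len(subset), k=3, seed=repeat):
            train_ids = [subset[p] for p in train_pos]
            test_ids = [subset[p] for p in test_pos]
            print(repeat, len(train_ids), len(test_ids))
```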
Affiliation(s)
- Quan Zou
  - School of Computer Science and Technology, Tianjin University, Tianjin, China
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Kaiyang Qu
  - School of Computer Science and Technology, Tianjin University, Tianjin, China
- Yamei Luo
  - School of Medical Information and Engineering, Southwest Medical University, Luzhou, China
- Dehui Yin
  - School of Medical Information and Engineering, Southwest Medical University, Luzhou, China
- Ying Ju
  - School of Information Science and Technology, Xiamen University, Xiamen, China
- Hua Tang
  - Department of Pathophysiology, School of Basic Medicine, Southwest Medical University, Luzhou, China

28
Systematic evaluation of supervised classifiers for fecal microbiota-based prediction of colorectal cancer. Oncotarget 2018; 8:9546-9556. [PMID: 28061434 PMCID: PMC5354752 DOI: 10.18632/oncotarget.14488] [Citation(s) in RCA: 58] [Impact Index Per Article: 9.7] [Received: 10/27/2016] [Accepted: 12/15/2016] [Indexed: 12/13/2022]
Abstract
Predicting colorectal cancer (CRC) from fecal microbiota is a promising approach to non-invasive CRC screening, but how best to optimize the classification models remains unaddressed. The purpose of this study was to systematically evaluate the effectiveness of different supervised machine-learning models for predicting CRC in two independent populations, one eastern and one western. The structure of the fecal microflora in a Chinese population (N = 141) was determined by 454 FLX pyrosequencing, and different supervised classifiers were employed to predict CRC from fecal microbiota operational taxonomic units (OTUs). Bayes Net and Random Forest achieved higher accuracies than the other algorithms in both populations, and Bayes Net had a lower false negative rate than Random Forest. Gut microbiota-based prediction was more accurate than the standard fecal occult blood test (FOBT), and combining the two approaches further improved prediction accuracy. Moreover, when unclassified OTUs were used as input, the BayesDMNB text algorithm achieved higher accuracy in the Chinese population (AUC = 0.994). Taken together, our results suggest that a Bayes Net classification model combined with unclassified OTUs may provide an accurate method for predicting CRC from gut microbiota composition.
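A full Bayesian network learner is beyond a short example, but its simplest special case, a multinomial naive Bayes classifier in which every feature depends only on the class, shows how count features such as OTU abundances can be turned into class scores. This is a swapped-in illustration, not the study's Bayes Net or BayesDMNB model; the function names, the smoothing constant `alpha` and the toy counts are hypothetical.

```python
import math


def train_multinomial_nb(X, y, alpha=1.0):
    """Fit class log-priors and per-feature log-probabilities from count vectors.

    X: list of non-negative count vectors (e.g. OTU counts per sample)
    y: class label per sample; alpha: Laplace (add-alpha) smoothing constant
    """
    classes = sorted(set(y))
    n_features = len(X[0])
    log_prior, log_prob = {}, {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        log_prior[c] = math.log(len(rows) / len(X))
        # smoothed total count of each feature within class c
        totals = [sum(r[j] for r in rows) + alpha for j in range(n_features)]
        denom = sum(totals)
        log_prob[c] = [math.log(t / denom) for t in totals]
    return classes, log_prior, log_prob


def predict_multinomial_nb(model, x):
    """Return the class maximising log P(c) + sum_j x_j * log P(feature j | c)."""
    classes, log_prior, log_prob = model
    scores = {
        c: log_prior[c] + sum(cnt * lp for cnt, lp in zip(x, log_prob[c]))
        for c in classes
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical counts for three taxa in four samples.
    X = [[5, 0, 1], [4, 1, 0], [0, 5, 2], [1, 4, 3]]
    y = ["control", "control", "CRC", "CRC"]
    model = train_multinomial_nb(X, y)
    print(predict_multinomial_nb(model, [6, 0, 0]))  # control
    print(predict_multinomial_nb(model, [0, 6, 1]))  # CRC
```

Working in log space avoids underflow when many taxa are multiplied together, which matters for high-dimensional OTU tables.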
29
Kuo CY, Yu LC, Chen HC, Chan CL. Comparison of Models for the Prediction of Medical Costs of Spinal Fusion in Taiwan Diagnosis-Related Groups by Machine Learning Algorithms. Healthc Inform Res 2018; 24:29-37. [PMID: 29503750 PMCID: PMC5820083 DOI: 10.4258/hir.2018.24.1.29] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 12/11/2017] [Revised: 01/16/2018] [Accepted: 01/22/2018] [Indexed: 12/22/2022]
Abstract
Objectives The aims of this study were to compare the performance of machine learning methods for predicting the medical costs of spinal fusion, in terms of profit or loss, under Taiwan Diagnosis-Related Groups (Tw-DRGs), and to apply these methods to explore the important factors associated with those costs. Methods A data set was obtained from a regional hospital in Taoyuan City, Taiwan, containing data from 2010 to 2013 on patients in Tw-DRG49702 (posterior and other spinal fusion without complications or comorbidities). Naïve Bayes, support vector machine, logistic regression, C4.5 decision tree, and random forest methods were employed for prediction using WEKA 3.8.1. Results Five hundred thirty-two cases were categorized as belonging to the Tw-DRG49702 group. The mean medical cost was US$4,549.7, the mean patient age was 62.4 years, and the mean length of stay was 9.3 days. Length of stay was an important determinant of medical costs for patients undergoing spinal fusion. The random forest method had the best predictive performance of the methods compared, achieving an accuracy of 84.30%, a sensitivity of 71.4%, a specificity of 92.2%, and an AUC of 0.904. Conclusions Our study demonstrated that the random forest model can be employed to predict the medical costs of Tw-DRG49702 and could inform hospital strategy for improving the financial management efficiency of this operation.
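The accuracy, sensitivity and specificity reported here all derive from a 2×2 confusion matrix. A minimal sketch follows; the class name is an assumption, "positive" means whichever outcome is coded as the positive class, and the counts in the usage line are hypothetical, not the study's data.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # positive cases predicted positive
    fn: int  # positive cases predicted negative
    tn: int  # negative cases predicted negative
    fp: int  # negative cases predicted positive

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)

    def sensitivity(self) -> float:
        """True positive rate (recall)."""
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        """True negative rate."""
        return self.tn / (self.tn + self.fp)


if __name__ == "__main__":
    cm = ConfusionMatrix(tp=8, fn=2, tn=18, fp=2)  # hypothetical counts
    print(cm.sensitivity(), cm.specificity())  # 0.8 0.9
```

Because sensitivity and specificity condition on the true class, a model can show high accuracy with modest sensitivity when negatives dominate, as in the 71.4% versus 92.2% pattern above.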
Affiliation(s)
- Ching-Yen Kuo
  - Institute of Information Management, Yuan-Ze University, Taoyuan, Taiwan
  - Department of Medical Administration, Taoyuan General Hospital, Ministry of Health and Welfare, Taoyuan, Taiwan
- Liang-Chin Yu
  - Institute of Information Management, Yuan-Ze University, Taoyuan, Taiwan
- Hou-Chaung Chen
  - Department of Orthopedics, Taoyuan General Hospital, Ministry of Health and Welfare, Taoyuan, Taiwan
- Chien-Lung Chan
  - Institute of Information Management, Yuan-Ze University, Taoyuan, Taiwan
  - Innovation Center for Big Data and Digital Convergence, Yuan-Ze University, Taoyuan, Taiwan

30
Fang H, Lu B, Wang X, Zheng L, Sun K, Cai W. Application of data mining techniques to explore predictors of upper urinary tract damage in patients with neurogenic bladder. Braz J Med Biol Res 2017; 50:e6638. [PMID: 28832768 PMCID: PMC5561813 DOI: 10.1590/1414-431x20176638] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 05/03/2017] [Accepted: 06/29/2017] [Indexed: 11/30/2022]
Abstract
This study proposed a decision tree model to screen for upper urinary tract damage (UUTD) in patients with neurogenic bladder (NGB). Thirty-four NGB patients with UUTD were recruited into the case group, and 78 without UUTD were included in the control group. A decision tree method, classification and regression tree (CART), was then applied to develop the model, with UUTD as the dependent variable and history of urinary tract infections, bladder management, conservative treatment, and urodynamic findings as independent variables. The urethral function factor was the primary screening variable and formed the root node of the tree; Pabd max (maximum abdominal pressure, >14 cmH2O), Pves max (maximum intravesical pressure, ≤89 cmH2O), and gender (female) were also associated with UUTD. The accuracy of the proposed model was 84.8%, and the area under the curve was 0.901 (95%CI = 0.844-0.958), suggesting that the decision tree model might provide a new and convenient way to screen for UUTD in NGB patients in both undeveloped and developing areas.
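CART grows a tree by choosing, at each node, the variable and cut-point that most reduce Gini impurity; thresholds such as Pabd max > 14 cmH2O are cut-points of this kind. The sketch below searches a single numeric variable for its best binary split. It is a simplified illustration with hypothetical values, not the study's fitted model; full CART implementations also handle categorical variables, stopping rules and pruning.

```python
def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best rule `value <= threshold`.

    Candidate thresholds are midpoints between consecutive distinct values,
    a common convention for numeric variables.
    """
    distinct = sorted(set(values))
    n = len(values)
    best = None
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = (threshold, score)
    return best  # None if the variable has only one distinct value


if __name__ == "__main__":
    # Hypothetical maximum abdominal pressures (cmH2O) and UUTD status (1 = damage).
    pabd_max = [5, 10, 12, 20, 25, 30]
    uutd = [0, 0, 0, 1, 1, 1]
    print(best_split(pabd_max, uutd))  # (16.0, 0.0)
```

Repeating this search over every candidate variable, taking the overall best, and recursing on each child node yields the tree; the variable chosen first becomes the root node.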
Affiliation(s)
- H Fang
  - Shenzhen Hospital, Southern Medical University, Shenzhen, China
- B Lu
  - Shenzhen Hospital, Southern Medical University, Shenzhen, China
- X Wang
  - The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- L Zheng
  - The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- K Sun
  - The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
- W Cai
  - Shenzhen Hospital, Southern Medical University, Shenzhen, China

31
Predicting diabetes mellitus using SMOTE and ensemble machine learning approach: The Henry Ford ExercIse Testing (FIT) project. PLoS One 2017; 12:e0179805. [PMID: 28738059 PMCID: PMC5524285 DOI: 10.1371/journal.pone.0179805] [Citation(s) in RCA: 109] [Impact Index Per Article: 15.6] [Received: 03/03/2017] [Accepted: 06/05/2017] [Indexed: 01/21/2023]
Abstract
Machine learning is becoming a popular and important approach in medical research. In this study, we investigate the relative performance of various machine learning methods, such as Decision Tree, Naïve Bayes, Logistic Regression, Logistic Model Tree and Random Forests, for predicting incident diabetes using medical records of cardiorespiratory fitness. In addition, we apply different techniques to uncover potential predictors of diabetes. This FIT project study used data from 32,555 patients who were free of any known coronary artery disease or heart failure, underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems between 1991 and 2009, and had complete 5-year follow-up. By the end of the fifth year, 5,099 of these patients had developed diabetes. The dataset contained 62 attributes in four categories: demographic characteristics, disease history, medication use history, and stress test vital signs. We developed an ensemble-based predictive model using 13 attributes selected on the basis of their clinical importance, Multiple Linear Regression, and Information Gain Ranking. The negative effect of class imbalance on the constructed model was handled with the Synthetic Minority Oversampling Technique (SMOTE). The overall performance of the predictive model was improved by an ensemble machine learning approach using the Vote method with three decision trees (Naïve Bayes Tree, Random Forest, and Logistic Model Tree), which achieved high prediction accuracy (AUC = 0.92). The study shows the potential of ensembling and SMOTE for predicting incident diabetes from cardiorespiratory fitness data.
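SMOTE balances classes by creating synthetic minority-class samples on the line segments between a minority sample and one of its k nearest minority-class neighbours. A minimal pure-Python sketch of that idea follows; the function name, the `k` default and the toy points are assumptions, and in practice a maintained implementation such as imbalanced-learn's `SMOTE` class would normally be used.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic new points from a list of minority-class feature vectors.

    Each new point lies on the segment between a randomly chosen minority sample
    and one of its k nearest minority-class neighbours (Euclidean distance).
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical two-feature minority samples.
    minority = [[1.0, 1.0], [2.0, 1.5], [1.5, 2.0], [3.0, 3.0]]
    for point in smote(minority, n_synthetic=4, k=2, seed=42):
        print([round(v, 3) for v in point])
```

Oversampling should be applied only to the training portion of each split; generating synthetic points before splitting would leak information into the evaluation data.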
32
Machida Y, Shimauchi A, Kuroki Y, Tozaki M, Kato Y, Hoshi K, Fukuma E. Single focus on breast magnetic resonance imaging: diagnosis based on kinetic pattern and patient age. Acta Radiol 2017; 58:652-659. [PMID: 27664278 DOI: 10.1177/0284185116668212] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 12/29/2022]
Abstract
Background Because of its small size, a focus in breast magnetic resonance imaging (MRI) must be evaluated on the basis of characteristics other than morphologic features. Patient-related factors, including age, in conjunction with lesion-related factors, could be useful for decision-making. Purpose To assess the probability of malignancy of foci based on both lesion- and patient-related factors, and to propose a relevant decision-making method. Material and Methods Foci in our breast MRI database from April 2006 to June 2013 were retrospectively identified and analyzed. Fisher's exact test or the Mann-Whitney U test was used for univariate analyses, and factors significantly associated with outcome in the univariate analyses were entered into a multivariate logistic regression model. A decision tree was then drawn using the significant predictors confirmed by multivariate analysis. Results In total, 184 foci (168 benign, 16 malignant) in 184 patients were analyzed. The presence of a washout pattern and older age were significant predictors of malignancy (P < 0.0001, odds ratio [OR] 17.8; and P = 0.021, OR 1.1, respectively). The main decisive node of the decision tree was the presence of a washout pattern, followed by whether the patient was older than 63 years. Conclusion An enhancing focus showing a washout pattern, especially in older patients, may warrant immediate biopsy rather than short-interval follow-up.
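The univariate screening step used Fisher's exact test on 2×2 tables (for example, washout pattern versus malignancy). A minimal two-sided version built on the hypergeometric distribution is sketched below, together with the crude sample odds ratio. Note that the ORs quoted above come from the multivariate logistic regression, not from this 2×2 calculation; the function names and the example table are hypothetical, and `scipy.stats.fisher_exact` can serve as a cross-check.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test p-value for a 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution; the p-value sums the probabilities of every table that is
    no more likely than the observed one.
    """
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # small relative tolerance so floating-point ties with p_obs are included
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def odds_ratio(table):
    """Crude sample odds ratio ad / bc (undefined if b or c is zero)."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


if __name__ == "__main__":
    # Hypothetical table: rows = washout yes/no, columns = malignant/benign.
    table = [[3, 1], [1, 3]]
    print(round(fisher_exact_two_sided(table), 4))  # 0.4857
    print(odds_ratio(table))  # 9.0
```

An exact test suits this setting because, with only 16 malignant foci, several cells of such tables can be small enough to make chi-square approximations unreliable.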
Affiliation(s)
- Youichi Machida
  - Kameda Kyobashi Clinic, Tokyo, Japan
  - Kameda Medical Center, Chiba, Japan
- Yoshifumi Kuroki
  - Kameda Kyobashi Clinic, Tokyo, Japan
  - Sagara Hospital Affiliated Breast Center, Kagoshima City, Japan

33
Kavakiotis I, Tsave O, Salifoglou A, Maglaveras N, Vlahavas I, Chouvarda I. Machine Learning and Data Mining Methods in Diabetes Research. Comput Struct Biotechnol J 2017; 15:104-116. [PMID: 28138367 PMCID: PMC5257026 DOI: 10.1016/j.csbj.2016.12.005] [Citation(s) in RCA: 332] [Impact Index Per Article: 47.4] [Received: 09/15/2016] [Revised: 12/20/2016] [Accepted: 12/27/2016] [Indexed: 12/14/2022]
Abstract
Remarkable advances in biotechnology and the health sciences have led to the production of large amounts of data, such as high-throughput genetic data and clinical information generated from large Electronic Health Records (EHRs). Consequently, applying machine learning and data mining methods in the biosciences is now, more than ever before, vital for intelligently transforming all available information into valuable knowledge. Diabetes mellitus (DM) is defined as a group of metabolic disorders exerting significant pressure on human health worldwide. Extensive research in all aspects of diabetes (diagnosis, etiopathophysiology, therapy, etc.) has generated huge amounts of data. The aim of the present study is to conduct a systematic review of the applications of machine learning, data mining techniques and tools in diabetes research with respect to a) Prediction and Diagnosis, b) Diabetic Complications, c) Genetic Background and Environment, and d) Health Care and Management, with the first category appearing to be the most popular. A wide range of machine learning algorithms were employed. In general, 85% of those used were supervised learning approaches and 15% were unsupervised ones, more specifically association rules. Support vector machines (SVM) emerge as the most successful and widely used algorithm. Regarding the type of data, clinical datasets were mainly used. The applications described in the selected articles show the usefulness of extracting valuable knowledge, leading to new hypotheses aimed at a deeper understanding and further investigation of DM.
Affiliation(s)
- Ioannis Kavakiotis
  - Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece
  - Institute of Applied Biosciences, CERTH, Thessaloniki, Greece
- Olga Tsave
  - Laboratory of Inorganic Chemistry, Department of Chemical Engineering, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece
- Athanasios Salifoglou
  - Laboratory of Inorganic Chemistry, Department of Chemical Engineering, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece
- Nicos Maglaveras
  - Institute of Applied Biosciences, CERTH, Thessaloniki, Greece
  - Lab of Computing and Medical Informatics, Medical School, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece
- Ioannis Vlahavas
  - Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece
- Ioanna Chouvarda
  - Institute of Applied Biosciences, CERTH, Thessaloniki, Greece
  - Lab of Computing and Medical Informatics, Medical School, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece