1. Afsaneh E, Sharifdini A, Ghazzaghi H, Ghobadi MZ. Recent applications of machine learning and deep learning models in the prediction, diagnosis, and management of diabetes: a comprehensive review. Diabetol Metab Syndr 2022; 14:196. [PMID: 36572938 PMCID: PMC9793536 DOI: 10.1186/s13098-022-00969-9] [Received: 10/06/2022] [Accepted: 12/16/2022] [Indexed: 12/28/2022]
Abstract
Diabetes is a metabolic illness characterized by elevated blood glucose levels. This abnormal increase can seriously damage other organs such as the kidneys, eyes, heart, nerves, and blood vessels. Its prediction, prognosis, and management are therefore essential to prevent harmful effects and to recommend more effective treatments. Machine learning algorithms have attracted considerable attention for these goals and have been developed successfully. This review surveys recently proposed machine learning (ML) and deep learning (DL) models for these objectives. The reported results show that ML and DL algorithms are promising approaches for controlling blood glucose and managing diabetes. However, they should be improved and validated on large datasets to confirm their applicability.
2. Allaoui G, Rylander C, Averina M, Wilsgaard T, Fuskevåg O, Berg V. Longitudinal changes in blood biomarkers and their ability to predict type 2 diabetes mellitus—The Tromsø study. Endocrinol Diabetes Metab 2022; 5:e00325. [PMID: 35147293 PMCID: PMC8917864 DOI: 10.1002/edm2.325] [Received: 12/03/2021] [Revised: 01/31/2022] [Accepted: 02/02/2022] [Indexed: 11/07/2022]
Affiliation(s)
- Giovanni Allaoui
- Division of Diagnostic Services, Department of Laboratory Medicine, University Hospital of North Norway, Tromsø, Norway
- Department of Medical Biology, Faculty of Health Sciences, UiT‐The Arctic University of Norway, Tromsø, Norway
- Charlotta Rylander
- Department of Community Medicine, Faculty of Health Sciences, UiT‐The Arctic University of Norway, Tromsø, Norway
- Maria Averina
- Division of Diagnostic Services, Department of Laboratory Medicine, University Hospital of North Norway, Tromsø, Norway
- Department of Community Medicine, Faculty of Health Sciences, UiT‐The Arctic University of Norway, Tromsø, Norway
- Tom Wilsgaard
- Department of Community Medicine, Faculty of Health Sciences, UiT‐The Arctic University of Norway, Tromsø, Norway
- Ole‐Martin Fuskevåg
- Division of Diagnostic Services, Department of Laboratory Medicine, University Hospital of North Norway, Tromsø, Norway
- Vivian Berg
- Division of Diagnostic Services, Department of Laboratory Medicine, University Hospital of North Norway, Tromsø, Norway
- Department of Medical Biology, Faculty of Health Sciences, UiT‐The Arctic University of Norway, Tromsø, Norway
3. Romeo L, Frontoni E. A Unified Hierarchical XGBoost model for classifying priorities for COVID-19 vaccination campaign. Pattern Recognit 2022; 121:108197. [PMID: 34312570 PMCID: PMC8295058 DOI: 10.1016/j.patcog.2021.108197] [Received: 03/31/2021] [Revised: 06/21/2021] [Accepted: 07/20/2021] [Indexed: 05/03/2023]
Abstract
Current ML approaches do not fully address a still unresolved and topical challenge: predicting priorities for COVID-19 vaccine administration. This task also raises additional methodological challenges, mainly related to avoiding unwanted bias while handling categorical and ordinal data with a highly imbalanced distribution. The main contribution of this study is therefore a machine learning algorithm, Hierarchical Priority Classification eXtreme Gradient Boosting, for classifying COVID-19 vaccine administration priorities using the Italian Federation of General Practitioners dataset, which contains electronic health record data from 17k patients. We measured the effectiveness of the proposed methodology in classifying all priority classes and demonstrated a significant improvement over the state of the art. The proposed ML approach, integrated into a clinical decision support system, currently supports General Practitioners in assigning COVID-19 vaccine administration priorities to their patients.
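The abstract notes that the priority classes are highly imbalanced. One common way to counter imbalance in gradient boosting is to weight each training example by the inverse frequency of its class. The sketch below computes such weights with the heuristic `n_samples / (n_classes * class_count)` (the formula scikit-learn uses for `class_weight="balanced"`); it is an illustrative assumption, not the weighting scheme used by the authors, and the priority labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Return {class: weight} using n_samples / (n_classes * class_count).

    Rare classes get proportionally larger weights, so every class
    contributes the same total weight to the training loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def sample_weights(labels: Sequence[Hashable]) -> List[float]:
    """Expand the per-class weights into one weight per training example."""
    weights = balanced_class_weights(labels)
    return [weights[y] for y in labels]


# Hypothetical ordinal priority labels (0 = lowest, 3 = highest), heavily imbalanced.
labels = [0] * 80 + [1] * 12 + [2] * 6 + [3] * 2
class_w = balanced_class_weights(labels)
for cls in sorted(class_w):
    print(cls, round(class_w[cls], 3))
```

Per-example weights like these could, for instance, be passed through the `sample_weight` argument of `xgboost.XGBClassifier.fit`; whether the published model does so is not stated in the abstract.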
Affiliation(s)
- Luca Romeo
- Department of Information Engineering (DII), Università Politecnica delle Marche, Ancona, Italy
- Computational Statistics and Machine Learning, Istituto Italiano di Tecnologia, Genova, Italy
- Emanuele Frontoni
- Department of Information Engineering (DII), Università Politecnica delle Marche, Ancona, Italy
4. Fregoso-Aparicio L, Noguez J, Montesinos L, García-García JA. Machine learning and deep learning predictive models for type 2 diabetes: a systematic review. Diabetol Metab Syndr 2021; 13:148. [PMID: 34930452 PMCID: PMC8686642 DOI: 10.1186/s13098-021-00767-9] [Received: 07/06/2021] [Accepted: 12/07/2021] [Indexed: 12/12/2022]
Abstract
Diabetes mellitus is a severe, chronic disease that occurs when blood glucose levels rise above certain limits. In recent years, machine and deep learning techniques have been used to predict diabetes and its complications. However, researchers and developers still face two main challenges when building type 2 diabetes predictive models. First, previous studies are highly heterogeneous in the techniques they use, making it difficult to identify the optimal one. Second, there is a lack of transparency about the features used in the models, which reduces their interpretability. This systematic review aimed to address these challenges. The review primarily followed the PRISMA methodology, supplemented by the guidelines proposed by Keele and Durham Universities. Ninety studies were included, and the type of model, complementary techniques, dataset, and reported performance parameters were extracted. Eighteen different types of models were compared, with tree-based algorithms showing top performance. Deep neural networks proved suboptimal, despite their ability to deal with big and dirty data. Balancing data and feature selection techniques proved helpful in increasing model efficiency. Models trained on tidy datasets achieved almost perfect performance.
Affiliation(s)
- Luis Fregoso-Aparicio
- School of Engineering and Sciences, Tecnologico de Monterrey, Av Lago de Guadalupe KM 3.5, Margarita Maza de Juarez, 52926 Cd Lopez Mateos, Mexico
- Julieta Noguez
- School of Engineering and Sciences, Tecnologico de Monterrey, Ave. Eugenio Garza Sada 2501, 64849 Monterrey, Nuevo Leon, Mexico
- Luis Montesinos
- School of Engineering and Sciences, Tecnologico de Monterrey, Ave. Eugenio Garza Sada 2501, 64849 Monterrey, Nuevo Leon, Mexico
- José A. García-García
- Hospital General de Mexico Dr. Eduardo Liceaga, Dr. Balmis 148, Doctores, Cuauhtemoc, 06720 Mexico City, Mexico
5. Ravaut M, Harish V, Sadeghi H, Leung KK, Volkovs M, Kornas K, Watson T, Poutanen T, Rosella LC. Development and Validation of a Machine Learning Model Using Administrative Health Data to Predict Onset of Type 2 Diabetes. JAMA Netw Open 2021; 4:e2111315. [PMID: 34032855 PMCID: PMC8150694 DOI: 10.1001/jamanetworkopen.2021.11315] [Received: 11/20/2020] [Accepted: 04/01/2021] [Indexed: 11/14/2022]
Abstract
Importance: Systems-level barriers to diabetes care could be improved with population health planning tools that accurately discriminate between high- and low-risk groups to guide investments and targeted interventions.
Objective: To develop and validate a population-level machine learning model for predicting type 2 diabetes 5 years before diabetes onset using administrative health data.
Design, Setting, and Participants: This decision analytical model study used linked administrative health data from the diverse, single-payer health system in Ontario, Canada, between January 1, 2006, and December 31, 2016. A gradient boosting decision tree model was trained on data from 1 657 395 patients, validated on 243 442 patients, and tested on 236 506 patients. Costs associated with each patient were estimated using a validated costing algorithm. Data were analyzed from January 1, 2006, to December 31, 2016.
Exposures: A random sample of 2 137 343 residents of Ontario without type 2 diabetes was obtained at study start time. More than 300 features from data sets capturing demographic information, laboratory measurements, drug benefits, health care system interactions, social determinants of health, and ambulatory care and hospitalization records were compiled over 2-year patient medical histories to generate quarterly predictions.
Main Outcomes and Measures: Discrimination was assessed using the area under the receiver operating characteristic curve statistic, and calibration was assessed visually using calibration plots. Feature contribution was assessed with Shapley values. Costs were estimated in 2020 US dollars.
Results: This study trained a gradient boosting decision tree model on data from 1 657 395 patients (12 900 257 instances; 6 666 662 women [51.7%]). The developed model achieved a test area under the curve of 80.26 (range, 80.21-80.29), demonstrated good calibration, and was robust to sex, immigration status, area-level marginalization with regard to material deprivation and race/ethnicity, and low contact with the health care system. The top 5% of patients predicted as high risk by the model represented 26% of the total annual diabetes cost in Ontario.
Conclusions and Relevance: In this decision analytical model study, a machine learning model approach accurately predicted the incidence of diabetes in the population using routinely collected health administrative data. These results suggest that the model could be used to inform decision-making for population health planning and diabetes prevention.
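The study reports discrimination as an area under the ROC curve (on a 0-100 scale) and a cost-capture figure (the top 5% of predicted high-risk patients accounted for 26% of annual diabetes cost). Both quantities can be derived from a list of predicted risks. The sketch below assumes hypothetical lists of scores, binary outcomes, and per-patient costs: it computes AUC as the probability that a randomly chosen positive outscores a randomly chosen negative (ties count one half), and the share of total cost held by the top fraction of patients ranked by predicted risk. It is an illustration, not the authors' evaluation code.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of random positive > score of random negative); ties count 0.5.

    Pairwise O(n_pos * n_neg) version, adequate for illustration only.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def top_fraction_cost_share(scores: Sequence[float], costs: Sequence[float],
                            fraction: float) -> float:
    """Share of total cost held by the `fraction` of patients with the highest risk."""
    n_top = max(1, round(fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top_cost = sum(costs[i] for i in ranked[:n_top])
    return top_cost / sum(costs)


# Hypothetical predicted risks, observed onset (1 = diabetes), and annual costs.
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0]
costs = [500, 300, 100, 200, 50, 50]
print(auc(scores, labels), top_fraction_cost_share(scores, costs, 0.5))
```

For real evaluations, scikit-learn's `roc_auc_score` computes the same AUC far more efficiently; the pairwise loop is kept here only because it mirrors the probabilistic definition directly.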
Affiliation(s)
- Mathieu Ravaut
- Layer 6 AI, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Vinyas Harish
- Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Temerty Centre for Artificial Intelligence Research and Education in Medicine, University of Toronto, Toronto, Ontario, Canada
- Vector Institute, Toronto, Ontario, Canada
- Kathy Kornas
- Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Tristan Watson
- Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Institute of Clinical Evaluative Sciences (ICES), Toronto, Ontario, Canada
- Laura C. Rosella
- Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
- Temerty Centre for Artificial Intelligence Research and Education in Medicine, University of Toronto, Toronto, Ontario, Canada
- Vector Institute, Toronto, Ontario, Canada
- Institute of Clinical Evaluative Sciences (ICES), Toronto, Ontario, Canada
- Institute for Better Health, Trillium Health Partners, Mississauga, Ontario, Canada
6. Srivastava AK, Kumar Y, Singh PK. A Rule-Based Monitoring System for Accurate Prediction of Diabetes. International Journal of E-Health and Medical Communications 2020. [DOI: 10.4018/ijehmc.2020070103] [Indexed: 11/09/2022]
Abstract
Diabetes is a chronic disease, marked by high blood sugar levels, that can affect people's lives. Blood sugar rises because of insufficient insulin production in the body. Diabetes affects large numbers of people, and its prevalence can increase tremendously because of lifestyle behavior. Diabetes can also affect other organs, such as the kidneys, heart, and retinas, and can lead to their failure. This article presents a diabetes monitoring system that determines the risk of diabetes from patients' personal health records. Several rules are designed based on both clinical and non-clinical symptoms. The effectiveness of the monitoring system is tested on a set of 240 people, and the simulation results are compared with well-known diabetes prediction techniques. In this comparison, the proposed monitoring system achieves an accuracy rate of 90.41%.
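The abstract does not list the rules themselves, so the sketch below only illustrates the general shape of a rule-based risk monitor: ordered (condition, label) rules evaluated first-match, plus an accuracy check against reference labels. The clinical thresholds are the American Diabetes Association diagnostic cut-offs (fasting glucose of at least 126 mg/dL or HbA1c of at least 6.5% for the diabetes range; at least 100 mg/dL or 5.7% for the prediabetes range), chosen as an assumption; the `HealthRecord` fields, the symptom-count rule, and its threshold are hypothetical and are not taken from the article.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class HealthRecord:
    """Hypothetical personal health record fields used by the illustrative rules."""
    fasting_glucose_mg_dl: float
    hba1c_percent: float
    symptom_count: int  # hypothetical count of self-reported non-clinical symptoms


# Ordered (condition, risk label) pairs; the first rule that fires wins.
Rule = Tuple[Callable[[HealthRecord], bool], str]

RULES: List[Rule] = [
    # Clinical rules based on ADA diagnostic cut-offs (not the article's rules).
    (lambda r: r.fasting_glucose_mg_dl >= 126 or r.hba1c_percent >= 6.5, "high"),
    (lambda r: r.fasting_glucose_mg_dl >= 100 or r.hba1c_percent >= 5.7, "moderate"),
    # Non-clinical rule with a hypothetical symptom-count threshold.
    (lambda r: r.symptom_count >= 3, "moderate"),
]


def classify(record: HealthRecord, default: str = "low") -> str:
    """Return the label of the first matching rule, or `default` if none fires."""
    for condition, label in RULES:
        if condition(record):
            return label
    return default


def accuracy(records: List[HealthRecord], expected: List[str]) -> float:
    """Fraction of records whose rule-based label matches the reference label."""
    correct = sum(classify(r) == y for r, y in zip(records, expected))
    return correct / len(records)


records = [
    HealthRecord(130, 5.5, 0),  # fasting glucose in the ADA diabetes range
    HealthRecord(95, 5.9, 1),   # HbA1c in the ADA prediabetes range
    HealthRecord(90, 5.2, 4),   # only the symptom rule fires
    HealthRecord(88, 5.0, 0),   # no rule fires
]
print([classify(r) for r in records])
```

Keeping the rules as an ordered list of data rather than nested `if` statements makes it easy to audit, reorder, or extend them, which is one of the main attractions of rule-based monitoring systems.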
Affiliation(s)
- Yugal Kumar
- Jaypee University of Information Technology, India
7. Early temporal prediction of Type 2 Diabetes Risk Condition from a General Practitioner Electronic Health Record: A Multiple Instance Boosting Approach. Artif Intell Med 2020; 105:101847. [PMID: 32505428 DOI: 10.1016/j.artmed.2020.101847] [Received: 07/29/2019] [Revised: 02/12/2020] [Accepted: 03/20/2020] [Indexed: 11/22/2022]
Abstract
Early prediction of patients at high risk of developing type 2 diabetes (T2D) plays a significant role in preventing the onset of overt disease and its associated comorbidities. Although insulin resistance is fundamental in the early phases of T2D natural history, General Practitioners (GPs) do not usually quantify it. The triglyceride-glucose (TyG) index has proven useful in clinical studies for quantifying insulin resistance and for the early identification of individuals at T2D risk, but GPs still do not apply it for diagnostic purposes. The aim of this study is to propose a multiple instance learning boosting algorithm (MIL-Boost) for building a predictive model capable of early prediction of worsening insulin resistance (low vs high T2D risk) in terms of the TyG index. MIL-Boost is applied to patients' past electronic health record (EHR) information stored by a single GP. The proposed MIL-Boost algorithm proved effective for this task, performing better than other state-of-the-art ML competitors (recall from 0.70 up to 0.83). The proposed MIL-based approach can extract hidden patterns from past EHR temporal data, even without directly exploiting triglyceride and glucose measurements. The major advantage of our method lies in its ability to model the temporal evolution of longitudinal EHR data while dealing with a small sample size and variability in the observations (e.g., a small, variable number of prescriptions for non-hospitalized patients). The proposed algorithm may form the core of a clinical decision support system.
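Two ingredients of the abstract can be made concrete. The TyG index is commonly computed as ln[fasting triglycerides (mg/dL) × fasting glucose (mg/dL) / 2]. In the multiple-instance framing, each patient is a "bag" of EHR records, and a bag is positive if at least one instance is; the classic MILBoost formulation combines instance probabilities with a noisy-OR rule. The sketch below shows both; whether this article uses noisy-OR aggregation, and how it thresholds TyG into low vs high risk, are not stated in the abstract, so treat both as assumptions.

```python
import math
from typing import Sequence


def tyg_index(triglycerides_mg_dl: float, fasting_glucose_mg_dl: float) -> float:
    """Triglyceride-glucose index: ln(TG [mg/dL] * fasting glucose [mg/dL] / 2)."""
    if triglycerides_mg_dl <= 0 or fasting_glucose_mg_dl <= 0:
        raise ValueError("measurements must be positive")
    return math.log(triglycerides_mg_dl * fasting_glucose_mg_dl / 2.0)


def noisy_or(instance_probs: Sequence[float]) -> float:
    """Bag probability under noisy-OR: P(bag) = 1 - prod(1 - p_i).

    The bag (patient) is positive if at least one instance (EHR record) is,
    treating instances as independent.
    """
    prob_all_negative = 1.0
    for p in instance_probs:
        prob_all_negative *= 1.0 - p
    return 1.0 - prob_all_negative


# Hypothetical values: one visit's lab results, and per-visit risk probabilities.
print(tyg_index(150, 100))
print(noisy_or([0.1, 0.2, 0.5]))
```

Here `tyg_index(150, 100)` returns ln(7500), about 8.92, and the three hypothetical visit probabilities combine to a patient-level probability of 1 - 0.9 × 0.8 × 0.5 = 0.64.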
8. Nguyen BP, Pham HN, Tran H, Nghiem N, Nguyen QH, Do TTT, Tran CT, Simpson CR. Predicting the onset of type 2 diabetes using wide and deep learning with electronic health records. Comput Methods Programs Biomed 2019; 182:105055. [PMID: 31505379 DOI: 10.1016/j.cmpb.2019.105055] [Received: 03/07/2019] [Revised: 08/17/2019] [Accepted: 08/27/2019] [Indexed: 06/10/2023]
Abstract
Objective: Diabetes is responsible for considerable morbidity, healthcare utilisation and mortality in both developed and developing countries. Current methods of treating diabetes are inadequate and costly, so prevention is an important step in reducing the burden of diabetes and its complications. Electronic health records (EHRs) of individuals or populations have become important tools for understanding disease trends. Using EHRs to predict the onset of diabetes could improve the quality and efficiency of medical care. In this paper, we apply a wide and deep learning model that combines the strengths of a generalised linear model with various features and a deep feed-forward neural network to improve the prediction of the onset of type 2 diabetes mellitus (T2DM).
Materials and Methods: The proposed method was implemented by combining the various models into a single logistic loss function and training with stochastic gradient descent. We applied this model to public hospital record data provided by the Practice Fusion EHRs for the United States population. The dataset consists of de-identified electronic health records for 9948 patients, 1904 of whom have been diagnosed with T2DM. Prediction of diabetes in 2012 was based on data from previous years (2009-2011). Class imbalance was handled by applying the Synthetic Minority Oversampling Technique (SMOTE) within each cross-validation training fold, to analyse performance when synthetic examples of the minority class are created. We used SMOTE of 150 and 300 percent, where 300 percent means that three new synthetic instances are created for each minority class instance. This results in approximate diabetes:non-diabetes distributions in the training set of 1:2 and 1:1, respectively.
Results: Our final ensemble model without SMOTE obtained an accuracy of 84.28%, an area under the receiver operating characteristic curve (AUC) of 84.13%, a sensitivity of 31.17% and a specificity of 96.85%. Using SMOTE of 150 and 300 percent did not improve AUC (83.33% and 82.12%, respectively) but increased sensitivity (49.40% and 71.57%, respectively) with a moderate decrease in specificity (90.16% and 76.59%, respectively).
Discussion and Conclusions: Our algorithm further optimised the prediction of diabetes onset using a novel state-of-the-art machine learning algorithm: the wide and deep learning neural network architecture.
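The SMOTE ratios quoted above follow from simple arithmetic. The dataset has 1904 T2DM patients and 9948 - 1904 = 8044 non-diabetic patients. SMOTE of N percent adds N/100 synthetic samples per minority instance, so 150% yields 1904 × 2.5 = 4760 minority samples (about 1:1.69, i.e. roughly 1:2) and 300% yields 1904 × 4 = 7616 (about 1:1.06, i.e. roughly 1:1). The sketch below reproduces these counts on the full dataset (per-fold counts scale proportionally) and shows the core SMOTE step: each synthetic point lies at a random position on the segment between a minority sample and one of its k nearest minority neighbours. It is a minimal pure-Python illustration that cycles through the originals to spread the synthetic samples evenly, not the authors' implementation; applying it inside each cross-validation training fold is left to the caller.

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def minority_count_after_smote(n_minority: int, percent: int) -> int:
    """Minority size after adding `percent`/100 synthetic samples per original."""
    return n_minority + round(n_minority * percent / 100)


def k_nearest(points: Sequence[Point], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of points[i] (Euclidean), excluding itself."""
    dists = [(math.dist(points[i], p), j) for j, p in enumerate(points) if j != i]
    dists.sort()
    return [j for _, j in dists[:k]]


def smote(minority: Sequence[Point], percent: int, k: int = 5, seed: int = 0) -> List[Point]:
    """Generate synthetic minority samples by interpolating towards random neighbours."""
    rng = random.Random(seed)
    n_new = round(len(minority) * percent / 100)
    neighbours = [k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic = []
    for t in range(n_new):
        i = t % len(minority)        # cycle through the originals
        j = rng.choice(neighbours[i])
        gap = rng.random()           # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic


n_pos, n_neg = 1904, 9948 - 1904
for pct in (150, 300):
    m = minority_count_after_smote(n_pos, pct)
    print(pct, m, round(n_neg / m, 2))
```

In practice a maintained implementation such as the `SMOTE` class in imbalanced-learn would be used; fitting it only on training folds, as the authors describe, keeps synthetic points out of the evaluation data.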
Affiliation(s)
- Binh P Nguyen
- School of Mathematics and Statistics, Victoria University of Wellington, Kelburn Parade, Wellington 6140, New Zealand
- Hung N Pham
- School of Information and Communication Technology, Hanoi University of Science and Technology, 1 Dai Co Viet Road, Hanoi 100000, Vietnam
- Hop Tran
- School of Mathematics and Statistics, Victoria University of Wellington, Kelburn Parade, Wellington 6140, New Zealand
- Nhung Nghiem
- Department of Public Health, University of Otago, 23A Mein Street, Wellington 6021, New Zealand
- Quang H Nguyen
- School of Information and Communication Technology, Hanoi University of Science and Technology, 1 Dai Co Viet Road, Hanoi 100000, Vietnam
- Trang T T Do
- Institute for Infocomm Research, Agency for Science, Technology and Research, 1 Fusionopolis Way, Singapore 138632, Singapore
- Cao Truong Tran
- Faculty of Information Technology, Le Quy Don Technical University, 236 Hoang Quoc Viet Street, Hanoi 100000, Vietnam
- Colin R Simpson
- Faculty of Health, Victoria University of Wellington, Kelburn Parade, Wellington 6140, New Zealand
- Usher Institute, The University of Edinburgh, Edinburgh, EH8 9AG, United Kingdom