1
Morgan-Benita JA, Galván-Tejada CE, Cruz M, Galván-Tejada JI, Gamboa-Rosales H, Arceo-Olague JG, Luna-García H, Celaya-Padilla JM. Hard Voting Ensemble Approach for the Detection of Type 2 Diabetes in Mexican Population with Non-Glucose Related Features. Healthcare (Basel) 2022; 10:1362. [PMID: 35893185; PMCID: PMC9331873; DOI: 10.3390/healthcare10081362] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2022] [Revised: 07/11/2022] [Accepted: 07/15/2022] [Indexed: 11/16/2022] [Open Access]
Abstract
Type 2 diabetes mellitus (T2DM) is one of the biggest health problems in Mexico, and early detection of the disease and its complications is extremely important. For noninvasive detection, a machine learning (ML) approach based on ensemble classification models with dichotomous output can be fast and effective for the early detection and prediction of T2DM. In this article, a hard-voting ensemble is designed and implemented using generalized linear regression (GLM), support vector machines (SVM) and artificial neural networks (ANN) to classify T2DM patients. As a first step, the data are balanced, standardized and imputed, then fed into the three models to classify each patient with a dichotomous result. Features are selected with LASSO using 10-fold cross-validation, and the area under the curve (AUC) is used for final validation. LASSO selected 12 features, which are used in the implemented models to obtain the best possible scenario for the ensemble model. The best-performing of the three algorithms is SVM, with an AUC of 92% ± 3%. The ensemble model built from GLM, SVM and ANN obtained an AUC of 90% ± 3%.
Affiliation(s)
- Jorge A. Morgan-Benita
  - Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
- Carlos E. Galván-Tejada
  - Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
- Miguel Cruz
  - Unidad de Investigación Médica en Bioquímica, Hospital de Especialidades, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Av. Cuauhtémoc 330, Col. Doctores, Del. Cuauhtémoc, Mexico City 06720, Mexico
- Jorge I. Galván-Tejada
  - Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
- Hamurabi Gamboa-Rosales
  - Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
- Jose G. Arceo-Olague
  - Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
- Huizilopoztli Luna-García
  - Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
  - Corresponding author
- José M. Celaya-Padilla
  - Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
  - Corresponding author

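The hard-voting step described in the abstract of entry 1 can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `hard_vote` and the example prediction lists are hypothetical, and the sketch only assumes that each of the three models (GLM, SVM, ANN) has already produced a 0/1 label per patient.

```python
from collections import Counter
from typing import List, Sequence


def hard_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-model label lists by majority vote (one vote per model)."""
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every model must predict the same number of samples")

    combined = []
    # zip(*predictions) yields, for each sample, the tuple of all models' votes.
    for votes in zip(*predictions):
        # The label with the most votes wins; with three models and binary
        # labels there can be no tie.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical dichotomous outputs (1 = T2DM, 0 = control) for four patients.
glm_pred = [1, 0, 1, 0]
svm_pred = [1, 1, 0, 0]
ann_pred = [0, 1, 1, 0]

print(hard_vote([glm_pred, svm_pred, ann_pred]))  # [1, 1, 1, 0]
```

Because each model contributes only its final label, a hard-voting ensemble ignores how confident each model was, which is one plausible reason an ensemble can score slightly below its best single member.
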
2
Tuppad A, Patil SD. Machine learning for diabetes clinical decision support: a review. Advances in Computational Intelligence 2022; 2:22. [PMID: 35434723; PMCID: PMC9006199; DOI: 10.1007/s43674-022-00034-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 10/17/2021] [Revised: 02/27/2022] [Accepted: 03/03/2022] [Indexed: 12/14/2022]
Abstract
Type 2 diabetes has recently acquired the status of an epidemic silent killer, even though it is non-communicable. There are two main reasons for this perception of the disease. First, a gradual but exponential growth in prevalence has been witnessed irrespective of age group, geography or gender. Second, the disease dynamics are very complex in terms of the multifactorial risks involved, the initial asymptomatic period, the short-term and long-term complications that pose a serious health threat, and the related co-morbidities. The majority of its risk factors are lifestyle habits such as physical inactivity, lack of exercise, high body mass index (BMI), poor diet and smoking; the exceptions are inevitable ones such as family history of diabetes, ethnic predisposition and ageing. Machine learning (ML) is increasingly being applied to alleviate the health burden of diabetes, and many research works have been proposed in the literature to offer clinical decision support in different application areas. In this paper, we present a review of such efforts for the prevention and management of type 2 diabetes. First, we present the medical gaps in the diabetes knowledge base, guidelines and medical practice identified from relevant articles and highlight those that can be addressed by ML. Further, we review the ML research works in three application areas: (1) risk assessment (statistical risk scores and ML-based risk models), (2) diagnosis (using non-invasive and invasive features), and (3) prognosis (from normoglycemia or prior morbidity to incident diabetes, and from incident diabetes to related complications). We discuss and summarize the shortcomings or gaps in the existing ML methodologies for diabetes that should be addressed in the future. This review covers the breadth of ML predictive modeling applications for diabetes while highlighting the medical and technological gaps as well as various aspects involved in ML-based diabetes clinical decision support.
Affiliation(s)
- Ashwini Tuppad
  - School of Computer Science and Engineering, REVA University, Rukmini Knowledge Park, Kattigenahalli, Bangalore, Karnataka, India
- Shantala Devi Patil
  - School of Computer Science and Engineering, REVA University, Rukmini Knowledge Park, Kattigenahalli, Bangalore, Karnataka, India

3
Fregoso-Aparicio L, Noguez J, Montesinos L, García-García JA. Machine learning and deep learning predictive models for type 2 diabetes: a systematic review. Diabetol Metab Syndr 2021; 13:148. [PMID: 34930452; PMCID: PMC8686642; DOI: 10.1186/s13098-021-00767-9] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 07/06/2021] [Accepted: 12/07/2021] [Indexed: 12/12/2022] [Open Access]
Abstract
Diabetes mellitus is a severe, chronic disease that occurs when blood glucose levels rise above certain limits. In recent years, machine and deep learning techniques have been used to predict diabetes and its complications. However, researchers and developers still face two main challenges when building type 2 diabetes predictive models. First, previous studies are considerably heterogeneous in the techniques they use, making it challenging to identify the optimal one. Second, there is a lack of transparency about the features used in the models, which reduces their interpretability. This systematic review aimed to address both challenges. The review primarily followed the PRISMA methodology, enriched with the one proposed by Keele and Durham Universities. Ninety studies were included, and the type of model, complementary techniques, dataset, and reported performance parameters were extracted. Eighteen different types of models were compared, with tree-based algorithms showing top performance. Deep neural networks proved suboptimal, despite their ability to deal with big and dirty data. Data balancing and feature selection techniques proved helpful in increasing model efficiency. Models trained on tidy datasets achieved almost perfect performance.
Affiliation(s)
- Luis Fregoso-Aparicio
  - School of Engineering and Sciences, Tecnologico de Monterrey, Av Lago de Guadalupe KM 3.5, Margarita Maza de Juarez, 52926 Cd Lopez Mateos, Mexico
- Julieta Noguez
  - School of Engineering and Sciences, Tecnologico de Monterrey, Ave. Eugenio Garza Sada 2501, 64849 Monterrey, Nuevo Leon, Mexico
- Luis Montesinos
  - School of Engineering and Sciences, Tecnologico de Monterrey, Ave. Eugenio Garza Sada 2501, 64849 Monterrey, Nuevo Leon, Mexico
- José A. García-García
  - Hospital General de Mexico Dr. Eduardo Liceaga, Dr. Balmis 148, Doctores, Cuauhtemoc, 06720 Mexico City, Mexico

4
Song J, Gao Y, Yin P, Li Y, Li Y, Zhang J, Su Q, Fu X, Pi H. The Random Forest Model Has the Best Accuracy Among the Four Pressure Ulcer Prediction Models Using Machine Learning Algorithms. Risk Manag Healthc Policy 2021; 14:1175-1187. [PMID: 33776495; PMCID: PMC7987326; DOI: 10.2147/rmhp.s297838] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 12/17/2020] [Accepted: 02/26/2021] [Indexed: 12/11/2022] [Open Access]
Abstract
Purpose: To build machine learning models for predicting pressure ulcer nursing adverse events and to find an optimal model that accurately predicts the occurrence of pressure ulcers.
Patients and Methods: A total of 5814 patients were retrospectively enrolled, of whom 1673 suffered pressure ulcer events. Support vector machine (SVM), decision tree (DT), random forest (RF) and artificial neural network (ANN) models were each used to construct a pressure ulcer prediction model. A total of 19 variables were included, and the importance of the screened variables was evaluated. The performance of the prediction models was then evaluated and compared.
Results: All four pressure ulcer prediction models achieved good performance, with AUC values greater than 0.95. Comparison of the four models indicated that the RF model achieved the highest accuracy for predicting pressure ulcers.
Conclusion: This research verifies the feasibility of developing a management system for predicting nursing adverse events based on big data and machine learning technology. The random forest and decision tree models are more suitable for constructing a pressure ulcer prediction model. This study provides a reference for future pressure ulcer risk warning based on big data.
Affiliation(s)
- Jie Song
  - Medical School of Chinese PLA, Beijing, People's Republic of China
- Yuan Gao
  - First Medical Center, Chinese PLA General Hospital, Beijing, People's Republic of China
- Pengbin Yin
  - Fourth Medical Center, Chinese PLA General Hospital, Beijing, People's Republic of China
- Yi Li
  - Medical School of Chinese PLA, Beijing, People's Republic of China
- Yang Li
  - First Medical Center, Chinese PLA General Hospital, Beijing, People's Republic of China
- Jie Zhang
  - Sixth Medical Center, Chinese PLA General Hospital, Beijing, People's Republic of China
- Qingqing Su
  - Medical School of Chinese PLA, Beijing, People's Republic of China
- Xiaojie Fu
  - First Medical Center, Chinese PLA General Hospital, Beijing, People's Republic of China
- Hongying Pi
  - Medical Service Training Center, Chinese PLA General Hospital, Beijing, People's Republic of China

5
Basu S, Johnson KT, Berkowitz SA. Use of Machine Learning Approaches in Clinical Epidemiological Research of Diabetes. Curr Diab Rep 2020; 20:80. [PMID: 33270183; DOI: 10.1007/s11892-020-01353-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Accepted: 10/26/2020] [Indexed: 12/12/2022]
Abstract
PURPOSE OF REVIEW: Machine learning approaches, which seek to predict outcomes or classify patient features by recognizing patterns in large datasets, are increasingly applied to clinical epidemiology research on diabetes. Given its novelty and emergence in fields outside of biomedical research, machine learning terminology, techniques, and research findings may be unfamiliar to diabetes researchers. Our aim was to present the use of machine learning approaches in an approachable way, drawing from clinical epidemiological research in diabetes published from 1 Jan 2017 to 1 June 2020.
RECENT FINDINGS: Machine learning approaches using tree-based learners, which produce decision trees to help guide clinical interventions, frequently have higher sensitivity and specificity than traditional regression models for risk prediction. Machine learning approaches using neural networks and "deep learning" can be applied to medical image data, particularly for the identification and staging of diabetic retinopathy and skin ulcers. Among the machine learning approaches reviewed, researchers identified new strategies to develop standard datasets for rigorous comparisons across older and newer approaches, methods to illustrate how a machine learner was treating underlying data, and approaches to improve the transparency of the machine learning process. Machine learning approaches have the potential to improve risk stratification and outcome prediction for clinical epidemiology applications. Achieving this potential would be facilitated by the use of universal open-source datasets for fair comparisons. More work remains in applying strategies to communicate how machine learners generate their predictions.
Affiliation(s)
- Sanjay Basu
  - Center for Primary Care, Harvard Medical School, Boston, MA, USA
  - Research and Population Health, Collective Health, San Francisco, CA, USA
  - School of Public Health, Imperial College London, London, SW7, UK
- Karl T Johnson
  - General Medicine and Clinical Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Seth A Berkowitz
  - General Medicine and Clinical Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA

6
Vettoretti M, Longato E, Zandonà A, Li Y, Pagán JA, Siscovick D, Carnethon MR, Bertoni AG, Facchinetti A, Di Camillo B. Addressing practical issues of predictive models translation into everyday practice and public health management: a combined model to predict the risk of type 2 diabetes improves incidence prediction and reduces the prevalence of missing risk predictions. BMJ Open Diabetes Res Care 2020; 8(1):e001223. [PMID: 32747386; PMCID: PMC7398107; DOI: 10.1136/bmjdrc-2020-001223] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/24/2020] [Revised: 06/03/2020] [Accepted: 06/10/2020] [Indexed: 12/12/2022] [Open Access]
Abstract
INTRODUCTION: Many predictive models for incident type 2 diabetes (T2D) exist, but these models are not used frequently for public health management. Barriers to their application include (1) the problem of model choice (some models are applicable only to certain ethnic groups), (2) missing input variables, and (3) the lack of calibration. While (1) and (2) lead to missing predictions, (3) causes inaccurate incidence predictions. In this paper, a combined T2D risk model for public health management that addresses these three issues is developed.
RESEARCH DESIGN AND METHODS: The combined T2D risk model combines eight existing predictive models by weighted average to overcome the problem of missing incidence predictions. Moreover, the combined model implements a simple recalibration strategy in which the risk scores are rescaled based on the T2D incidence in the target population. The performance of the combined model was compared with that of the eight existing models using two test datasets extracted from the Multi-Ethnic Study of Atherosclerosis (MESA; n=1031) and the English Longitudinal Study of Ageing (ELSA; n=4820). Metrics of discrimination, calibration, and missing incidence predictions were used for the assessment.
RESULTS: The combined T2D model performed well in terms of both discrimination (concordance index: 0.83 on MESA; 0.77 on ELSA) and calibration (expected to observed event ratio: 1.00 on MESA; 1.17 on ELSA), similarly to the best-performing existing models. However, while the existing models yielded a large percentage of missing predictions (17%-45% on MESA; 63%-64% on ELSA), this was negligible with the combined model (0% on MESA; 4% on ELSA).
CONCLUSIONS: Leveraging existing T2D predictive models from the literature, a simple approach based on risk score rescaling and averaging was shown to provide accurate and robust incidence predictions, overcoming the problems of recalibration and missing predictions in the practical application of predictive models.
Affiliation(s)
- Martina Vettoretti
  - Department of Information Engineering, School of Engineering, University of Padova, Padova, Italy
- Enrico Longato
  - Department of Information Engineering, School of Engineering, University of Padova, Padova, Italy
- Alessandro Zandonà
  - Department of Information Engineering, School of Engineering, University of Padova, Padova, Italy
- Yan Li
  - Icahn School of Medicine at Mount Sinai, New York, New York, USA
- José Antonio Pagán
  - Department of Public Health Policy and Management, New York University, New York, New York, USA
  - Center for Health Innovation, New York Academy of Medicine, New York, New York, USA
- David Siscovick
  - Research, Evaluation & Policy, New York Academy of Medicine, New York, New York, USA
- Mercedes R Carnethon
  - Department of Preventive Medicine, Northwestern University, Chicago, Illinois, USA
- Alain G Bertoni
  - Division of Public Health Sciences, Wake Forest University Health Sciences, Winston-Salem, North Carolina, USA
- Andrea Facchinetti
  - Department of Information Engineering, School of Engineering, University of Padova, Padova, Italy
- Barbara Di Camillo
  - Department of Information Engineering, School of Engineering, University of Padova, Padova, Italy

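The two ideas in the abstract of entry 6, averaging only the models that can produce a prediction and rescaling risks to the target population's incidence, can be sketched roughly as follows. This is an illustration under stated assumptions, not the published method: the abstract does not give the weights or the exact rescaling rule, so the function names, the example weights, and the choice of a multiplicative rescaling toward the target incidence are all hypothetical.

```python
from typing import List, Optional, Sequence


def combine_risks(risks: Sequence[Optional[float]],
                  weights: Sequence[float]) -> Optional[float]:
    """Weighted average over the models that returned a risk (None = missing)."""
    if len(risks) != len(weights):
        raise ValueError("one weight per model is required")
    available = [(r, w) for r, w in zip(risks, weights) if r is not None]
    total_weight = sum(w for _, w in available)
    if total_weight <= 0:
        return None  # no model could score this person
    return sum(r * w for r, w in available) / total_weight


def rescale_to_incidence(scores: Sequence[float],
                         target_incidence: float) -> List[float]:
    """Assumed recalibration: scale risks so their mean equals the target incidence."""
    if not scores:
        raise ValueError("no scores to rescale")
    mean_score = sum(scores) / len(scores)
    if mean_score <= 0:
        raise ValueError("mean predicted risk must be positive")
    factor = target_incidence / mean_score
    # Cap at 1.0 so a rescaled value is still a valid probability.
    return [min(1.0, s * factor) for s in scores]


# Hypothetical person: the second model lacks an input variable and returns None.
person_risks = [0.20, None, 0.40]
model_weights = [1.0, 1.0, 2.0]
print(combine_risks(person_risks, model_weights))

# Hypothetical cohort whose mean predicted risk (0.2) exceeds a 10% incidence.
print(rescale_to_incidence([0.1, 0.3], target_incidence=0.1))
```

A person is left without a prediction only when every contributing model is missing an input, which is consistent with the abstract's report that missing predictions became negligible for the combined model.
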
7
De Giorgi A, Di Simone E, Cappadona R, Boari B, Savriè C, López-Soto PJ, Rodríguez-Borrego MA, Gallerani M, Manfredini R, Fabbian F. Validation and Comparison of a Modified Elixhauser Index for Predicting In-Hospital Mortality in Italian Internal Medicine Wards. Risk Manag Healthc Policy 2020; 13:443-451. [PMID: 32547275; PMCID: PMC7246324; DOI: 10.2147/rmhp.s247633] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/29/2020] [Accepted: 04/07/2020] [Indexed: 11/23/2022] [Open Access]
Abstract
Purpose: The burden of comorbidities appears to be related to clinical outcomes in hospitalized patients. Admitted patients can be clinically stratified by calculating a comorbidity score, which is the simplest way to identify the severity of patients' clinical conditions and a practical approach to assessing prevalent comorbidities. Our aim was to validate a modified Elixhauser score for predicting in-hospital mortality (IHM) in internal medicine admissions and to compare it with a different score, derived from clinical data and previously used in a similar setting, that has good prognostic accuracy.
Patients and Methods: A single-center retrospective study enrolled all patients admitted to the internal medicine department between January and June 2016. A modified Elixhauser score was calculated from chart review and from administrative data; in addition, a second prognostic index was calculated from chart review only. Comorbidity scores were compared using the c-statistic.
Results: We analyzed 1614 individuals, without selecting by reason for admission, of whom 224 (13.9%) died during their hospital stay. Deceased subjects were older (83.3±9.1 vs 78.4±13.5 years; p<0.001) and had a higher burden of comorbidities. The modified Elixhauser score calculated from administrative data, the same score calculated from chart review, and the comparator index were 18.13±9.36, 24.43±11.27 and 7.63±3.3, respectively, and the corresponding c-statistics were 0.758 (95% CI 0.727-0.790), 0.811 (95% CI 0.782-0.840) and 0.740 (95% CI 0.709-0.771).
Conclusion: The new modified Elixhauser score performed similarly to a previous clinical prognostic index when it was calculated from administrative data; however, its performance improved when the calculation was based on chart review.
Affiliation(s)
- Alfredo De Giorgi
  - Department of Internal Medicine, University Hospital St. Anna, Ferrara, Italy
- Emanuele Di Simone
  - Department of Internal Medicine, University Hospital St. Anna, Ferrara, Italy
- Rosaria Cappadona
  - Department of Medical Sciences, Faculty of Medicine, Pharmacy and Prevention, University of Ferrara, Ferrara, Italy
- Benedetta Boari
  - Department of Internal Medicine, University Hospital St. Anna, Ferrara, Italy
- Caterina Savriè
  - Department of Internal Medicine, University Hospital St. Anna, Ferrara, Italy
- Pablo J López-Soto
  - Department of Nursing, Maimonides Biomedical Research Institute of Cordoba (IMIBIC)/University of Córdoba, Córdoba, Spain
- María A Rodríguez-Borrego
  - Department of Nursing, Maimonides Biomedical Research Institute of Cordoba (IMIBIC)/University of Córdoba, Córdoba, Spain
- Massimo Gallerani
  - Department of Internal Medicine, University Hospital St. Anna, Ferrara, Italy
- Roberto Manfredini
  - Department of Medical Sciences, Faculty of Medicine, Pharmacy and Prevention, University of Ferrara, Ferrara, Italy
- Fabio Fabbian
  - Department of Medical Sciences, Faculty of Medicine, Pharmacy and Prevention, University of Ferrara, Ferrara, Italy
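The c-statistic used in entry 7 to compare comorbidity scores can be read as the probability that a randomly chosen patient who died has a higher score than a randomly chosen survivor. The sketch below computes it by direct pair counting; the function name and the five-patient example are hypothetical and are not data from the study.

```python
from typing import Sequence


def c_statistic(scores: Sequence[float], died: Sequence[int]) -> float:
    """Fraction of (deceased, survivor) pairs ranked correctly; ties count 0.5."""
    if len(scores) != len(died):
        raise ValueError("scores and outcomes must have the same length")
    deceased = [s for s, d in zip(scores, died) if d]
    survivors = [s for s, d in zip(scores, died) if not d]
    if not deceased or not survivors:
        raise ValueError("both outcome groups must be non-empty")

    concordant = 0.0
    for dead_score in deceased:
        for alive_score in survivors:
            if dead_score > alive_score:
                concordant += 1.0   # higher score for the patient who died
            elif dead_score == alive_score:
                concordant += 0.5   # tied scores get half credit
    return concordant / (len(deceased) * len(survivors))


# Hypothetical comorbidity scores and in-hospital mortality (1 = died).
example_scores = [30, 12, 25, 12, 8]
example_died = [1, 0, 0, 1, 0]
print(c_statistic(example_scores, example_died))  # 0.75
```

For a binary outcome this pairwise definition is equivalent to the area under the ROC curve, so a value of 0.5 means the score ranks patients no better than chance and 1.0 means perfect separation.
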