1
Khalilnejad A, Sun RT, Kompala T, Painter S, James R, Wang Y. Proactive Identification of Patients with Diabetes at Risk of Uncontrolled Outcomes during a Diabetes Management Program: Conceptualization and Development Study Using Machine Learning. JMIR Form Res 2024; 8:e54373. [PMID: 38669074 PMCID: PMC11087850 DOI: 10.2196/54373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 01/12/2024] [Accepted: 01/20/2024] [Indexed: 04/28/2024]
Abstract
BACKGROUND The growth in the capabilities of telehealth has made it possible to identify individuals with a higher risk of uncontrolled diabetes and provide them with targeted support and resources to help them manage their condition. Thus, predictive modeling has emerged as a valuable tool for the advancement of diabetes management. OBJECTIVE This study aimed to conceptualize and develop a novel machine learning (ML) approach to proactively identify participants enrolled in a remote diabetes monitoring program (RDMP) who were at risk of uncontrolled diabetes at 12 months in the program. METHODS Registry data from the Livongo for Diabetes RDMP were used to design separate dynamic predictive ML models to predict participant outcomes at each monthly checkpoint of the participants' program journey (month-n models), from the first day of onboarding (month-0 model) up to the 11th month (month-11 model). A participant's program journey began upon onboarding into the RDMP and monitoring their own blood glucose (BG) levels through the RDMP-provided BG meter. Each participant passed through 12 predictive models during their first year enrolled in the RDMP. Four categories of participant attributes (ie, survey data, BG data, medication fills, and health signals) were used for feature construction. The models were trained using the light gradient boosting machine and underwent hyperparameter tuning. The performance of the models was evaluated using standard metrics, including precision, recall, specificity, the area under the curve, the F1-score, and accuracy. RESULTS The ML models exhibited strong performance, accurately identifying observable at-risk participants, with recall ranging from 70% to 94% and precision from 40% to 88% across the 12-month program journey. The models also showed promising performance for unobservable at-risk participants, with recall ranging from 61% to 82% and precision from 42% to 61%.
Overall, model performance improved as participants progressed through their program journey, demonstrating the importance of engagement data in predicting long-term clinical outcomes. CONCLUSIONS This study explored the temporal and static attributes of Livongo for Diabetes RDMP participants, identified diabetes management patterns and characteristics, and examined their relationship to diabetes management outcomes. Proactive targeting ML models accurately identified participants at risk of uncontrolled diabetes with a high level of precision that was generalizable through future years within the RDMP. The ability to identify participants who are at risk at various time points throughout the program journey allows for personalized interventions to improve outcomes. This approach offers significant advancements in the feasibility of large-scale implementation in remote monitoring programs and can help prevent uncontrolled glycemic levels and diabetes-related complications. Future research should examine the impact of significant changes that can affect a participant's diabetes management.
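All of the evaluation metrics named in this abstract except the AUC derive from the binary confusion matrix; the AUC can be read as a ranking probability. The following is a minimal, stdlib-only Python sketch, assuming "positive" means "at risk of uncontrolled diabetes" and using made-up counts rather than anything reported by the study. The names `Confusion`, `binary_metrics` and `roc_auc` are hypothetical helpers, not part of the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion-matrix counts; positive = at risk (assumed labeling)."""
    tp: int  # at-risk participants the model flagged
    fp: int  # participants flagged who were not at risk
    tn: int  # participants correctly left unflagged
    fn: int  # at-risk participants the model missed


def _ratio(num: float, den: float) -> float:
    # Treat 0/0 as 0.0 so an empty class does not raise ZeroDivisionError.
    return num / den if den else 0.0


def binary_metrics(c: Confusion) -> dict[str, float]:
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)  # also called sensitivity
    specificity = _ratio(c.tn, c.tn + c.fp)
    f1 = _ratio(2 * precision * recall, precision + recall)
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "accuracy": accuracy,
    }


def roc_auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return _ratio(wins, len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical counts for one month-n model, not figures from the study.
    for name, value in binary_metrics(Confusion(tp=70, fp=30, tn=180, fn=20)).items():
        print(f"{name:>11}: {value:.3f}")
    print(f"{'auc':>11}: {roc_auc([0.9, 0.8], [0.1, 0.8]):.3f}")
```

The pairwise AUC is O(n·m) and is meant only to make the definition concrete; a rank-based or library implementation would be the practical choice for registry-scale data.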
2
Chen K, Abtahi F, Carrero JJ, Fernandez-Llatas C, Seoane F. Process mining and data mining applications in the domain of chronic diseases: A systematic review. Artif Intell Med 2023; 144:102645. [PMID: 37783545 DOI: 10.1016/j.artmed.2023.102645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2023] [Revised: 08/24/2023] [Accepted: 08/28/2023] [Indexed: 10/04/2023]
Abstract
The widespread use of information technology in healthcare leads to extensive data collection, which can be utilised to enhance patient care and manage chronic illnesses. Our objective is to summarise previous studies that have used data mining or process mining methods in the context of chronic diseases in order to identify research trends and future opportunities. The review covers articles that pertain to the application of data mining or process mining methods on chronic diseases that were published between 2000 and 2022. Articles were sourced from PubMed, Web of Science, EMBASE, and Google Scholar based on predetermined inclusion and exclusion criteria. A total of 71 articles met the inclusion criteria and were included in the review. Based on the literature review results, we detected a growing trend in the application of data mining methods in diabetes research. Additionally, a distinct increase in the use of process mining methods to model clinical pathways in cancer research was observed. Frequently, this takes the form of a collaborative integration of process mining, data mining, and traditional statistical methods. In light of this collaborative approach, the meticulous selection of statistical methods based on their underlying assumptions is essential when integrating these traditional methods with process mining and data mining methods. Another notable challenge is the lack of standardised guidelines for reporting process mining studies in the medical field. Furthermore, there is a pressing need to enhance the clinical interpretation of data mining and process mining results.
Affiliation(s)
- Kaile Chen
- Department of Clinical Science, Intervention and Technology, Karolinska Institutet, 17177 Stockholm, Sweden; School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Biomedical Engineering and Health Systems, Division of Ergonomics, KTH Royal Institute of Technology, 14157 Stockholm, Sweden
- Farhad Abtahi
- Department of Clinical Science, Intervention and Technology, Karolinska Institutet, 17177 Stockholm, Sweden; School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Biomedical Engineering and Health Systems, Division of Ergonomics, KTH Royal Institute of Technology, 14157 Stockholm, Sweden; Department of Clinical Physiology, Karolinska University Hospital, 17176 Stockholm, Sweden
- Juan-Jesus Carrero
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, 17177 Stockholm, Sweden
- Carlos Fernandez-Llatas
- Department of Clinical Science, Intervention and Technology, Karolinska Institutet, 17177 Stockholm, Sweden; SABIEN, ITACA, Universitat Politècnica de València, Spain
- Fernando Seoane
- Department of Clinical Science, Intervention and Technology, Karolinska Institutet, 17177 Stockholm, Sweden; Department of Clinical Physiology, Karolinska University Hospital, 17176 Stockholm, Sweden; Department of Medical Technology, Karolinska University Hospital, 17176 Stockholm, Sweden; Department of Textile Technology, University of Borås, 50190 Borås, Sweden
3
A hybrid super ensemble learning model for the early-stage prediction of diabetes risk. Med Biol Eng Comput 2023; 61:785-797. [PMID: 36602674 DOI: 10.1007/s11517-022-02749-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/31/2022] [Accepted: 12/22/2022] [Indexed: 01/06/2023]
Abstract
Diabetes mellitus has become a rapidly growing chronic health problem worldwide. There has been a noticeable increase in diabetes cases in the last two decades. Recent advances in ensemble machine learning methods play an important role in the early detection of diabetes mellitus. These methods are both faster and less costly than traditional methods. This study aims to propose a new super ensemble learning model to enable early diagnosis of diabetes mellitus. Super learner is a cross-validation-based approach that makes better predictions by combining the predictions of more than one machine learning algorithm. The proposed super learner model was created with four base learners (logistic regression, decision tree, random forest, gradient boosting) and a meta-learner (support vector machines) as a result of a case study. Three different datasets were used to measure the robustness of the proposed model. Chi-square was selected as the optimal feature selection technique from among five candidate techniques, and hyperparameters were tuned with GridSearch. Finally, the proposed super learner model achieved the best accuracy in detecting diabetes mellitus compared with the base learners on the early-stage diabetes risk prediction (99.6%), PIMA (92%), and diabetes 130-US hospitals (98%) datasets, respectively. This study revealed that super learner algorithms can be effectively used in the detection of diabetes mellitus. Also, the high and convincing statistical scores show the robustness of the proposed super learner model.
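The abstract names chi-square as the chosen feature-selection step. The plain-Python sketch below scores each non-negative feature by comparing its observed per-class sums with the sums expected from class frequencies, which, as far as can be told, mirrors the statistic behind scikit-learn's `chi2` scorer, and then keeps the top k features. It is not the authors' code: `chi2_scores` and `select_k_best` are hypothetical helpers, terms with zero expected value are simply skipped, and the other four candidate techniques and the GridSearch settings are not reproduced.

```python
def chi2_scores(X: list[list[float]], y: list[int]) -> list[float]:
    """Chi-square score per column of non-negative X against class labels y."""
    n_samples, n_features = len(X), len(X[0])
    classes = sorted(set(y))
    # Observed: per class, the sum of each feature's values.
    observed = {c: [0.0] * n_features for c in classes}
    for row, label in zip(X, y):
        for j, value in enumerate(row):
            observed[label][j] += value
    totals = [sum(row[j] for row in X) for j in range(n_features)]
    class_share = {c: y.count(c) / n_samples for c in classes}
    scores = []
    for j in range(n_features):
        score = 0.0
        for c in classes:
            # Expected: the feature total split by each class's share of samples.
            expected = class_share[c] * totals[j]
            if expected > 0:  # skip empty terms instead of dividing by zero
                score += (observed[c][j] - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_k_best(X: list[list[float]], y: list[int], k: int) -> list[int]:
    """Indices (ascending) of the k highest-scoring features."""
    scores = chi2_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Toy data: column 0 tracks the label, column 1 is constant noise.
    X = [[1, 1], [2, 1], [0, 1], [0, 1]]
    y = [1, 1, 0, 0]
    print(chi2_scores(X, y))       # [3.0, 0.0]
    print(select_k_best(X, y, 1))  # [0]
```

Because the score depends only on per-class sums, it is cheap to compute, but it assumes non-negative inputs (counts or scaled values), which is why continuous clinical variables are usually rescaled first.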
4
Chou CY, Hsu DY, Chou CH. Predicting the Onset of Diabetes with Machine Learning Methods. J Pers Med 2023; 13:406. [PMID: 36983587 PMCID: PMC10057336 DOI: 10.3390/jpm13030406] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 11/23/2022] [Revised: 02/16/2023] [Accepted: 02/22/2023] [Indexed: 03/03/2023]
Abstract
The number of people suffering from diabetes in Taiwan has continued to rise in recent years. According to the statistics of the International Diabetes Federation, about 537 million people worldwide (10.5% of the global population) suffer from diabetes, and it is estimated that 643 million people will develop the condition (11.3% of the total population) by 2030. If this trend continues, the number will jump to 783 million (12.2%) by 2045. At present, the number of people with diabetes in Taiwan has reached 2.18 million, with an average of one in ten people suffering from the disease. In addition, according to the Bureau of National Health Insurance in Taiwan, the prevalence rate of diabetes among adults in Taiwan has reached 5% and is increasing each year. Diabetes can cause acute and chronic complications that can be fatal. Meanwhile, chronic complications can result in a variety of disabilities or organ decline. If holistic treatments and preventions are not provided to diabetic patients, it will lead to the consumption of more medical resources and a rapid decline in the quality of life of society as a whole. In this study, based on the outpatient examination data of a Taipei Municipal medical center, 15,000 women aged between 20 and 80 were selected as the subjects. These women were patients who had gone to the medical center during 2018-2020 and 2021-2022 with or without the diagnosis of diabetes. This study investigated eight different characteristics of the subjects, including the number of pregnancies, plasma glucose level, diastolic blood pressure, sebum thickness, insulin level, body mass index, diabetes pedigree function, and age. After sorting out the complete data of the patients, this study used Microsoft Machine Learning Studio to train the models of various kinds of neural networks, and the prediction results were used to compare the predictive ability of the various parameters for diabetes. 
Finally, this study found that after comparing the models using two-class logistic regression as well as the two-class neural network, two-class decision jungle, or two-class boosted decision tree for prediction, the best model was the two-class boosted decision tree, as its area under the curve could reach a score of 0.991, which was better than other models.
Affiliation(s)
- Chun-Yang Chou
- Research Center for Healthcare Industry Innovation, National Taipei University of Nursing and Health Sciences, Taipei 112, Taiwan
- Ding-Yang Hsu
- Department of Industrial Design, Ming Chi University of Technology, Taipei 243, Taiwan
- Chun-Hung Chou
- Industrial Technology Research Institute, Hsinchu 310401, Taiwan
5
Gil M, Kim SS, Min EJ. Machine learning models for predicting risk of depression in Korean college students: Identifying family and individual factors. Front Public Health 2022; 10:1023010. [PMID: 36466485 PMCID: PMC9714606 DOI: 10.3389/fpubh.2022.1023010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2022] [Accepted: 10/24/2022] [Indexed: 11/19/2022]
Abstract
Background Depression is one of the most prevalent mental illnesses among college students worldwide. Using a family triad dataset, this study investigated machine learning (ML) models to predict the risk of depression in college students and identify important family and individual factors. Methods This study predicted college students at risk of depression and identified significant family and individual factors using data from 171 families (171 fathers, 171 mothers, and 171 college students). The prediction accuracy of three ML models, sparse logistic regression (SLR), support vector machine (SVM), and random forest (RF), was compared. Results The three ML models showed excellent prediction capabilities, and the RF model performed best. It revealed five significant factors responsible for depression: self-perceived mental health of college students, neuroticism, fearful-avoidant attachment, family cohesion, and mother's depression. Additionally, the logistic regression model identified five factors responsible for depression: the severity of cancer in the father, the severity of respiratory diseases in the mother, the self-perceived mental health of college students, conscientiousness, and neuroticism. Discussion These findings demonstrate the ability of ML models to accurately predict the risk of depression and identify family and individual factors related to depression among Korean college students. Given recent developments in ML applications, our study can help improve intelligent mental healthcare systems to detect early depressive symptoms and increase access to mental health services.
Affiliation(s)
- Minji Gil
- College of Nursing, Ewha Womans University, Seoul, South Korea
- Suk-Sun Kim
- College of Nursing, Ewha Womans University, Seoul, South Korea; Ewha Research Institute of Nursing Science, Ewha Womans University, Seoul, South Korea (corresponding author)
- Eun Jeong Min
- Department of Medical Life Sciences, School of Medicine, The Catholic University of Korea, Seoul, South Korea
6
Kibria HB, Matin A. The Severity Prediction of The Binary And Multi-Class Cardiovascular Disease - A Machine Learning-Based Fusion Approach. Comput Biol Chem 2022; 98:107672. [DOI: 10.1016/j.compbiolchem.2022.107672] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/08/2021] [Revised: 02/25/2022] [Accepted: 03/26/2022] [Indexed: 12/22/2022]
7
A Hybrid Machine Learning Model Based on Global and Local Learner Algorithms for Diabetes Mellitus Prediction. Journal of Biomimetics, Biomaterials and Biomedical Engineering 2022. [DOI: 10.4028/www.scientific.net/jbbbe.54.65] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/21/2022]
Abstract
Health has been critical for living things since long before technology existed. Today, the healthcare domain offers broad scope for research because it has evolved considerably. The most researched areas of the health sector include diabetes mellitus (DM), breast cancer, and brain tumors. DM is a severe chronic disease that affects human health and has a high prevalence throughout the world. Early prediction of DM is important to reduce its risk and even avoid it. In this study, we propose a DM prediction model based on global and local learner algorithms. The proposed global and local learners stacking (GLLS) model combines prediction algorithms from two largely different but complementary machine learning paradigms, namely XGBoost and NB from global learning and kNN and SVM (with an RBF kernel) from local learning, and aggregates them with a stacking ensemble technique that uses LR as the meta-learner. The effectiveness of the GLLS model was demonstrated by comparing several performance measures and the results of different contrast experiments. The evaluation results on the UCI Pima Indian diabetes dataset (PIDD) indicate that the model achieved better prediction performance than other results reported in the literature, with 99.5%, 99.5%, 99.5%, 99.1%, and 100% in terms of accuracy, AUC, F1 score, sensitivity, and specificity, respectively. Moreover, to further validate the GLLS model's performance, three additional medical datasets (Messidor, WBC, and ILPD) were considered, on which the model achieved accuracies of 82.1%, 98.6%, and 89.3%, respectively. Experimental results demonstrated the effectiveness and superiority of the proposed GLLS model.
8
Vehi J, Mujahid O, Contreras I. Aim and Diabetes. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
9
Nguyen P, Ohnmacht AJ, Galhoz A, Büttner M, Theis F, Menden MP. Künstliche Intelligenz und maschinelles Lernen in der Diabetesforschung [Artificial intelligence and machine learning in diabetes research]. Diabetologe 2021. [DOI: 10.1007/s11428-021-00817-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
10
Development and validation of a new diabetes index for the risk classification of present and new-onset diabetes: multicohort study. Sci Rep 2021; 11:15748. [PMID: 34344964 PMCID: PMC8333254 DOI: 10.1038/s41598-021-95341-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/28/2021] [Accepted: 07/26/2021] [Indexed: 02/07/2023]
Abstract
In this study, we aimed to propose a novel, highly accurate diabetes index for the risk classification of diabetes mellitus based on machine learning techniques. After analyzing demographic and biochemical data, we designated the 2013-16 Korea National Health and Nutrition Examination Survey (KNHANES), the 2017-18 KNHANES, and the Korean Genome and Epidemiology Study (KoGES) as the derivation, internal validation, and external validation sets, respectively. We constructed a new diabetes index using logistic regression (LR) and calculated the probability of diabetes in the validation sets. We used the area under the receiver operating characteristic curve (AUROC) and Cox regression analysis to measure performance in the internal and external validation sets, respectively. We constructed a gender-specific diabetes prediction model, with AUROCs of 0.93 and 0.94 for men and women, respectively. Based on this probability, we classified participants into five groups and analyzed cumulative incidence in the KoGES dataset. Group 5 demonstrated significantly worse outcomes than the other groups. Our novel model for predicting diabetes, based on two large-scale population-based cohort studies, showed high sensitivity and selectivity. Therefore, our diabetes index can be used to classify individuals at high risk of diabetes.
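The index described above turns a logistic regression score into a probability and then sorts people into five groups. The abstract does not say how the group boundaries were chosen, so the sketch below assumes quintiles of the predicted probability, and its feature names and coefficients (`age_decades`, `bmi`, `fasting_glucose_mmol`) are purely hypothetical. It illustrates the mechanics only, not the published, gender-specific model.

```python
import math
from bisect import bisect_right


def logistic_probability(intercept: float, coefs: dict[str, float], x: dict[str, float]) -> float:
    """p = 1 / (1 + exp(-z)), z = b0 + sum(b_j * x_j), computed without overflow."""
    z = intercept + sum(b * x[name] for name, b in coefs.items())
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so this cannot overflow
    return e / (1.0 + e)


def quintile_cutoffs(probs: list[float]) -> list[float]:
    """Four cut points that split the sorted probabilities into five near-equal groups (assumed scheme)."""
    ordered = sorted(probs)
    n = len(ordered)
    return [ordered[(n * q) // 5] for q in range(1, 5)]


def risk_group(p: float, cutoffs: list[float]) -> int:
    """1 = lowest predicted risk, 5 = highest."""
    return bisect_right(cutoffs, p) + 1


if __name__ == "__main__":
    # Hypothetical coefficients and participant values, not the published index.
    coefs = {"age_decades": 0.4, "bmi": 0.1, "fasting_glucose_mmol": 0.8}
    person = {"age_decades": 5.5, "bmi": 27.0, "fasting_glucose_mmol": 6.1}
    print(round(logistic_probability(-10.0, coefs, person), 3))

    cohort = [i / 10 for i in range(10)]
    cuts = quintile_cutoffs(cohort)
    print([risk_group(p, cuts) for p in cohort])  # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
```

Deriving the cut points once from a reference cohort and reusing them for new individuals keeps group membership stable; recomputing them per dataset would make "Group 5" mean different absolute risks in different cohorts.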
11
Ray A, Chaudhuri AK. Smart healthcare disease diagnosis and patient management: Innovation, improvement and skill development. Machine Learning with Applications 2021. [DOI: 10.1016/j.mlwa.2020.100011] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 01/13/2023]
12
Aim and Diabetes. Artif Intell Med 2021. [DOI: 10.1007/978-3-030-58080-3_158-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
13
Kopitar L, Kocbek P, Cilar L, Sheikh A, Stiglic G. Early detection of type 2 diabetes mellitus using machine learning-based prediction models. Sci Rep 2020; 10:11981. [PMID: 32686721 PMCID: PMC7371679 DOI: 10.1038/s41598-020-68771-z] [Citation(s) in RCA: 77] [Impact Index Per Article: 19.3] [Received: 12/23/2019] [Accepted: 06/30/2020] [Indexed: 02/07/2023]
Abstract
Most screening tests for T2DM in use today were developed using multivariate regression methods that are often further simplified to allow transformation into a scoring formula. The increasing volume of electronically collected data has opened the opportunity to develop more complex, accurate prediction models that can be continuously updated using machine learning approaches. This study compares machine learning-based prediction models (i.e. Glmnet, RF, XGBoost, LightGBM) to commonly used regression models for prediction of undiagnosed T2DM. The performance in prediction of fasting plasma glucose level was measured using 100 bootstrap iterations in different subsets of data simulating new incoming data in 6-month batches. With 6 months of data available, the simple regression model achieved the lowest average RMSE of 0.838, followed by RF (0.842), LightGBM (0.846), Glmnet (0.859) and XGBoost (0.881). When more data were added, Glmnet improved at the highest rate (+3.4%). The highest level of variable selection stability over time was observed with LightGBM models. Our results show no clinically relevant improvement when more sophisticated prediction models were used. Since higher stability of selected variables over time contributes to simpler interpretation of the models, interpretability and model calibration should also be considered in development of clinical prediction models.
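The evaluation idea in this abstract, RMSE summarized over 100 bootstrap iterations, can be illustrated with a short stdlib-only Python sketch. It only resamples a fixed evaluation set of paired observations and predictions; the study's simulation of new data in 6-month batches and any model refitting are not reproduced, and the glucose-like values and the two "models" in the demo are synthetic and hypothetical.

```python
import math
import random
from statistics import mean, stdev


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root mean squared error between paired observations and predictions."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def bootstrap_rmse(
    y_true: list[float], y_pred: list[float], n_iter: int = 100, seed: int = 0
) -> tuple[float, float]:
    """Mean and standard deviation of RMSE over bootstrap resamples of the evaluation set."""
    rng = random.Random(seed)  # fixed seed for reproducible resamples
    n = len(y_true)
    scores = []
    for _ in range(n_iter):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n indices with replacement
        scores.append(rmse([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    return mean(scores), stdev(scores)


if __name__ == "__main__":
    # Synthetic fasting-glucose-like values (mmol/L) and two hypothetical models.
    gen = random.Random(42)
    truth = [gen.gauss(5.5, 1.0) for _ in range(500)]
    model_a = [t + gen.gauss(0.0, 0.8) for t in truth]
    model_b = [t + gen.gauss(0.0, 0.9) for t in truth]
    for name, preds in (("model_a", model_a), ("model_b", model_b)):
        avg, sd = bootstrap_rmse(truth, preds)
        print(f"{name}: RMSE {avg:.3f} +/- {sd:.3f}")
```

Reporting the spread alongside the mean makes it easier to judge whether small RMSE gaps between models, such as those listed above, are larger than resampling noise.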
Affiliation(s)
- Leon Kopitar
- Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, 6000, Koper, Slovenia
- Primoz Kocbek
- Faculty of Health Sciences, University of Maribor, 2000, Maribor, Slovenia
- Leona Cilar
- Faculty of Health Sciences, University of Maribor, 2000, Maribor, Slovenia
- Aziz Sheikh
- Centre for Medical Informatics, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, Scotland, EH8 9AG, UK; Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital/Harvard Medical School, Boston, MA, 02115, USA
- Gregor Stiglic
- Faculty of Health Sciences, University of Maribor, 2000, Maribor, Slovenia; Faculty of Electrical Engineering and Computer Science, University of Maribor, 2000, Maribor, Slovenia