1
Sahid MA, Babar MUH, Uddin MP. Predictive modeling of multi-class diabetes mellitus using machine learning and filtering iraqi diabetes data dynamics. PLoS One 2024; 19:e0300785. [PMID: 38753669 PMCID: PMC11098411 DOI: 10.1371/journal.pone.0300785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Accepted: 03/05/2024] [Indexed: 05/18/2024]
Abstract
Diabetes is a persistent metabolic disorder linked to elevated levels of blood glucose, commonly referred to as blood sugar. Over time, this condition can damage the heart, blood vessels, eyes, kidneys, and nerves. It is a chronic ailment that arises when the body fails to produce enough insulin or is unable to use the insulin it produces effectively. When diabetes is not properly managed, it often leads to hyperglycemia, a condition characterized by elevated blood sugar levels or impaired glucose tolerance. This can result in significant harm to various body systems, including the nerves and blood vessels. In this paper, we propose a multiclass diabetes mellitus detection and classification approach using the extremely imbalanced Laboratory of Medical City Hospital data dynamics. We also formulate a new, moderately imbalanced dataset based on the Laboratory of Medical City Hospital data dynamics. To identify the multiclass diabetes mellitus correctly, we employ three machine learning classifiers, namely support vector machine, logistic regression, and k-nearest neighbor. We also apply dimensionality reduction (filter, wrapper, and embedded feature selection methods) to prune unnecessary features and improve classification performance. To optimize the classification performance of the classifiers, we tune the models by hyperparameter optimization with 10-fold grid search cross-validation. On the original extremely imbalanced dataset with a 70:30 partition, the support vector machine classifier achieved a maximum accuracy of 0.964, precision of 0.968, recall of 0.964, F1-score of 0.962, Cohen kappa of 0.835, and AUC of 0.99 using the top 4 features according to the filter method. Using the top 9 features according to wrapper-based sequential feature selection, the k-nearest neighbor classifier provides an accuracy of 0.935 and 1.0 for the other performance metrics.
For our created moderately imbalanced dataset with an 80:20 partition, the SVM classifier achieves a maximum accuracy of 0.938, and 1.0 for the other performance metrics. For multiclass diabetes mellitus detection and classification, our experiments outperformed previous research conducted on the Laboratory of Medical City Hospital data dynamics.
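The metrics quoted above can be computed from paired lists of true and predicted labels. What follows is a minimal pure-Python sketch, not the authors' code. It assumes support-weighted averaging for precision, recall and F1, because the abstract does not name the averaging scheme. The function name `multiclass_report` and the class labels N, P and Y (non-diabetic, prediabetic, diabetic) are illustrative.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def multiclass_report(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[str, float]:
    """Accuracy, support-weighted precision/recall/F1 and Cohen's kappa (illustrative helper)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    support = Counter(y_true)    # number of true instances per class
    predicted = Counter(y_pred)  # number of predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives per class

    accuracy = sum(correct.values()) / n
    precision = recall = f1 = 0.0
    for c in labels:
        prec_c = correct[c] / predicted[c] if predicted[c] else 0.0
        rec_c = correct[c] / support[c] if support[c] else 0.0
        f1_c = 2 * prec_c * rec_c / (prec_c + rec_c) if prec_c + rec_c else 0.0
        weight = support[c] / n  # weight each class by its share of the true labels
        precision += weight * prec_c
        recall += weight * rec_c
        f1 += weight * f1_c

    # Cohen's kappa: observed agreement corrected for the agreement expected by chance
    p_expected = sum(support[c] * predicted[c] for c in labels) / (n * n)
    kappa = (accuracy - p_expected) / (1 - p_expected) if p_expected < 1 else float("nan")
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "kappa": kappa}


# Hypothetical labels: N = non-diabetic, P = prediabetic, Y = diabetic
report = multiclass_report(["N", "N", "P", "Y", "Y", "Y"], ["N", "P", "P", "Y", "Y", "N"])
print(report)
```

Under support weighting, recall always equals accuracy. That is consistent with the identical accuracy and recall (0.964) reported for the support vector machine. Cohen's kappa then shows how far that accuracy exceeds chance agreement on an imbalanced class distribution.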
Affiliation(s)
- Md Abdus Sahid
- Department of Computer Science and Engineering, Hajee Mohammad Danesh Science and Technology University, Dinajpur, Bangladesh
- Mozaddid Ul Hoque Babar
- Department of Computer Science and Engineering, Hajee Mohammad Danesh Science and Technology University, Dinajpur, Bangladesh
- Md Palash Uddin
- Department of Computer Science and Engineering, Hajee Mohammad Danesh Science and Technology University, Dinajpur, Bangladesh
2
Mizuno S, Wagata M, Nagaie S, Ishikuro M, Obara T, Tamiya G, Kuriyama S, Tanaka H, Yaegashi N, Yamamoto M, Sugawara J, Ogishima S. Development of phenotyping algorithms for hypertensive disorders of pregnancy (HDP) and their application in more than 22,000 pregnant women. Sci Rep 2024; 14:6292. [PMID: 38491024 PMCID: PMC10943000 DOI: 10.1038/s41598-024-55914-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Accepted: 02/28/2024] [Indexed: 03/18/2024]
Abstract
Recently, many phenotyping algorithms for high-throughput cohort identification have been developed. Prospective genome cohort studies are critical resources for precision medicine, but there are many hurdles to precise cohort identification. Consequently, it is important to develop phenotyping algorithms for cohort data collection. Hypertensive disorders of pregnancy (HDP) are a leading cause of maternal morbidity and mortality. In this study, we developed, applied, and validated rule-based phenotyping algorithms for HDP. Two phenotyping algorithms, algorithms 1 and 2, were developed according to American and Japanese guidelines and applied to 22,452 pregnant women in the Birth and Three-Generation Cohort Study of the Tohoku Medical Megabank project. For precise cohort identification, we analyzed both structured data (e.g., laboratory and physiological tests) and unstructured clinical notes. The identified HDP subtypes were validated against reference standards. Algorithms 1 and 2 identified 7.93% and 8.08% of the subjects, respectively, as having HDP, along with their HDP subtypes. Both algorithms performed well, with high positive predictive values (0.96 and 0.90 for algorithms 1 and 2, respectively). Overcoming the hurdle of precise cohort identification in large-scale cohort data collection, we developed and implemented phenotyping algorithms that precisely identified HDP patients and their subtypes.
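The following deliberately simplified sketch shows what a rule-based HDP phenotype can look like over structured data. It is not algorithm 1 or 2 from the study. The thresholds (systolic ≥140 mmHg or diastolic ≥90 mmHg, onset before or after 20 gestational weeks, proteinuria as a boolean flag), the subtype names and the hypothetical `Visit` record are assumptions modelled on commonly used clinical definitions. The clinical-note processing that the authors describe is omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Visit:
    """Hypothetical antenatal visit record (structured data only)."""
    week: float        # gestational age in weeks
    sbp: int           # systolic blood pressure, mmHg
    dbp: int           # diastolic blood pressure, mmHg
    proteinuria: bool  # assumed pre-computed flag, e.g. from a urine test


def is_hypertensive(v: Visit) -> bool:
    # Assumed threshold: systolic >= 140 mmHg or diastolic >= 90 mmHg
    return v.sbp >= 140 or v.dbp >= 90


def classify_hdp(visits: List[Visit]) -> str:
    """Assign one simplified HDP subtype from a woman's visits (illustrative rules only)."""
    early_htn = any(is_hypertensive(v) for v in visits if v.week < 20)
    late = [v for v in visits if v.week >= 20]
    late_htn = any(is_hypertensive(v) for v in late)
    late_protein = any(v.proteinuria for v in late)

    if early_htn:  # hypertension already present before 20 weeks
        return "superimposed preeclampsia" if late_protein else "chronic hypertension"
    if late_htn:   # new-onset hypertension at or after 20 weeks
        return "preeclampsia" if late_protein else "gestational hypertension"
    return "no HDP"


print(classify_hdp([Visit(12, 118, 76, False), Visit(32, 146, 92, False)]))
```

Keeping each rule as a small, testable predicate makes it easy to swap in guideline-specific thresholds. That is one way two algorithms built on different guidelines could share the same code path.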
Affiliation(s)
- Satoshi Mizuno
- Department of Informatics for Genomic Medicine, Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-Machi, Aoba-Ku, Sendai, Miyagi, 980-8575, Japan
- Maiko Wagata
- Department of Feto-Maternal Medical Science, Tohoku Medical Megabank Organization, Tohoku University, Miyagi, Japan
- Satoshi Nagaie
- Department of Informatics for Genomic Medicine, Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-Machi, Aoba-Ku, Sendai, Miyagi, 980-8575, Japan
- Mami Ishikuro
- Department of Molecular Epidemiology, Tohoku Medical Megabank Organization, Tohoku University, Miyagi, Japan
- Taku Obara
- Department of Molecular Epidemiology, Tohoku Medical Megabank Organization, Tohoku University, Miyagi, Japan
- Gen Tamiya
- Department of Statistical Genetics and Genomics, Tohoku Medical Megabank Organization, Tohoku University, Miyagi, Japan
- Shinichi Kuriyama
- Department of Molecular Epidemiology, Tohoku Medical Megabank Organization, Tohoku University, Miyagi, Japan
- Nobuo Yaegashi
- Department of Gynecology and Obstetrics, Tohoku University Graduate School of Medicine, Tohoku University, Miyagi, Japan
- Masayuki Yamamoto
- Department of Biochemistry and Molecular Biology, Tohoku Medical Megabank Organization, Tohoku University, Miyagi, Japan
- Junichi Sugawara
- Department of Gynecology and Obstetrics, Tohoku University Graduate School of Medicine, Tohoku University, Miyagi, Japan
- Suzuki Memorial Hospital, 3-5-5, Satonomori, Iwanumashi, Miyagi, Japan
- Soichi Ogishima
- Department of Informatics for Genomic Medicine, Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-Machi, Aoba-Ku, Sendai, Miyagi, 980-8575, Japan.
- Advanced Research Center for Innovations in Next-Generation Medicine, Tohoku University, Miyagi, Japan.
3
He T, Belouali A, Patricoski J, Lehmann H, Ball R, Anagnostou V, Kreimeyer K, Botsis T. Trends and opportunities in computable clinical phenotyping: A scoping review. J Biomed Inform 2023; 140:104335. [PMID: 36933631 DOI: 10.1016/j.jbi.2023.104335] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 11/09/2022] [Revised: 03/07/2023] [Accepted: 03/09/2023] [Indexed: 03/18/2023]
Abstract
Identifying patient cohorts meeting the criteria of specific phenotypes is essential in biomedicine and particularly timely in precision medicine. Many research groups deliver pipelines that automatically retrieve and analyze data elements from one or more sources to automate this task and deliver high-performing computable phenotypes. We applied a systematic approach based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines to conduct a thorough scoping review on computable clinical phenotyping. Five databases were searched using a query that combined the concepts of automation, clinical context, and phenotyping. Subsequently, four reviewers screened 7960 records (after removing over 4000 duplicates) and selected 139 that satisfied the inclusion criteria. This dataset was analyzed to extract information on target use cases, data-related topics, phenotyping methodologies, evaluation strategies, and portability of developed solutions. Most studies supported patient cohort selection without discussing the application to specific use cases, such as precision medicine. Electronic Health Records were the primary source in 87.1 % (N = 121) of all studies, and International Classification of Diseases codes were heavily used in 55.4 % (N = 77) of all studies; however, only 25.9 % (N = 36) of the records described compliance with a common data model. In terms of the presented methods, traditional Machine Learning (ML) was the dominant method, often combined with natural language processing and other approaches, while external validation and portability of computable phenotypes were pursued in many cases. These findings revealed that defining target use cases precisely, moving away from sole ML strategies, and evaluating the proposed solutions in the real setting are essential opportunities for future work. 
There is also momentum and an emerging need for computable phenotyping to support clinical and epidemiological research and precision medicine.
Affiliation(s)
- Ting He
- Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA; Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
- Anas Belouali
- Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Jessica Patricoski
- Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Harold Lehmann
- Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Robert Ball
- Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, US FDA, Silver Spring, MD, USA
- Valsamo Anagnostou
- Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Kory Kreimeyer
- Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA; Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Taxiarchis Botsis
- Department of Oncology, The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD, USA; Biomedical Informatics and Data Science Section, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
4
Alhassan Z, Watson M, Budgen D, Alshammari R, Alessa A, Al Moubayed N. Improving Current Glycated Hemoglobin Prediction in Adults: Use of Machine Learning Algorithms With Electronic Health Records. JMIR Med Inform 2021; 9:e25237. [PMID: 34028357 PMCID: PMC8185616 DOI: 10.2196/25237] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/23/2020] [Revised: 01/05/2021] [Accepted: 04/22/2021] [Indexed: 01/30/2023]
Abstract
Background Predicting the risk of glycated hemoglobin (HbA1c) elevation can help identify patients with the potential for developing serious chronic health problems, such as diabetes. Early preventive interventions based on advanced predictive models that use electronic health record data to identify such patients can ultimately help provide better health outcomes. Objective Our study investigated the performance of several machine learning models for forecasting HbA1c elevation levels. We also examined how longitudinal patient electronic health record data affect the performance of the predictive models. Explainable methods were employed to interpret the decisions made by the black box models. Methods This study employed multiple logistic regression, random forest, support vector machine, and logistic regression models, as well as a deep learning model (multilayer perceptron), to classify patients with normal (<5.7%) and elevated (≥5.7%) levels of HbA1c. We also integrated current visit data with historical (longitudinal) data from previous visits. Explainable machine learning methods were used to interrogate the models and explain the reasons behind their decisions. All models were trained and tested using a large data set from Saudi Arabia with 18,844 unique patient records. Results The machine learning models achieved promising results for predicting current HbA1c elevation risk. When coupled with longitudinal data, the machine learning models outperformed the multiple logistic regression model used in the comparative study. The multilayer perceptron model achieved an area under the receiver operating characteristic curve of 83.22% when used with historical data. All models showed a close level of agreement on the contribution of the random blood sugar and age variables, with and without longitudinal data. 
Conclusions This study shows that machine learning models can provide promising results for the task of predicting current HbA1c levels (≥5.7% versus <5.7%). Using patients’ longitudinal data improved performance and affected the relative importance of the predictors used. The models showed results that are consistent with comparable studies.
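The following sketch illustrates the outcome definition and the AUROC metric. It labels HbA1c values with the 5.7% cut-off quoted in the abstract. It then computes the area under the ROC curve as the probability that a randomly chosen elevated case receives a higher model score than a randomly chosen normal case, with ties counting one half. The scores are hypothetical model outputs, and the function names are illustrative rather than taken from the study.

```python
from typing import List, Sequence

HBA1C_CUTOFF = 5.7  # percent; values >= 5.7 are labelled elevated, as in the abstract


def label_hba1c(values: Sequence[float]) -> List[int]:
    """Binary outcome: 1 = elevated (>= 5.7%), 0 = normal (< 5.7%)."""
    return [1 if v >= HBA1C_CUTOFF else 0 for v in values]


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of a random positive > score of a random negative), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = label_hba1c([5.2, 6.1, 5.7, 5.4])        # hypothetical HbA1c measurements
print(labels, auroc(labels, [0.2, 0.8, 0.4, 0.5]))  # hypothetical model scores
```

The pairwise loop is quadratic in the number of records, which is acceptable for illustration. For a data set of the size described (18,844 records), a rank-based formulation after sorting the scores would be preferable.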
Affiliation(s)
- Zakhriya Alhassan
- Department of Computer Science, Durham University, Durham, United Kingdom; College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
- Matthew Watson
- Department of Computer Science, Durham University, Durham, United Kingdom
- David Budgen
- Department of Computer Science, Durham University, Durham, United Kingdom
- Riyad Alshammari
- National Center for Artificial Intelligence, Saudi Data and Artificial Intelligence Authority, Riyadh, Saudi Arabia
- Ali Alessa
- Department of Information Technology Programs, Institute of Public Administration, Riyadh, Saudi Arabia
- Noura Al Moubayed
- Department of Computer Science, Durham University, Durham, United Kingdom
5
Avendaño-Valencia LD, Yderstræde KB, Nadimi ES, Blanes-Vidal V. Video-based eye tracking performance for computer-assisted diagnostic support of diabetic neuropathy. Artif Intell Med 2021; 114:102050. [PMID: 33875161 DOI: 10.1016/j.artmed.2021.102050] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/27/2020] [Revised: 02/16/2021] [Accepted: 02/21/2021] [Indexed: 10/22/2022]
Abstract
Diabetes is currently one of the major public health threats. The essential components for effective treatment of diabetes include early diagnosis and regular monitoring. However, health-care providers are often short of human resources to closely monitor populations at risk. In this work, a video-based eye-tracking method is proposed as a low-cost alternative for the detection of diabetic neuropathy. The method tracks eye trajectories recorded on video while the subject follows a target on a screen, which forces saccadic movements. After the eye trajectories are extracted, the resulting time series are represented with heteroscedastic ARX (H-ARX) models, which capture the dynamics and latency of the subject's response; features based on the predictive ability of the H-ARX models are then used for classification. The methodology is evaluated on a population of 11 control subjects and 20 insulin-treated diabetic individuals with diverse diabetic complications, including neuropathy and retinopathy. Results show significant differences in latency and eye movement precision between the control and diabetic populations, and demonstrate that the two groups can be classified with an accuracy of 95%. Although this study is limited by the small sample size, the results align with other findings in the literature and encourage further research.
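The H-ARX representation builds on the ordinary ARX (autoregressive with exogenous input) model. As a simplified sketch only, the code below fits a first-order ARX model, y[t] = a*y[t-1] + b*u[t-1] + e[t], by least squares. It assumes u is the on-screen target position and y is the tracked eye position. Unlike the paper's H-ARX model, it also assumes a constant noise variance. The returned residual variance stands in for one possible prediction-based feature, and the name `fit_arx11` is hypothetical.

```python
from typing import Sequence, Tuple


def fit_arx11(y: Sequence[float], u: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y[t] = a*y[t-1] + b*u[t-1] + e[t] (constant-variance ARX, not H-ARX).

    Returns (a, b, residual_variance)."""
    if len(y) != len(u) or len(y) < 3:
        raise ValueError("y and u must have equal length of at least 3")
    # Accumulate the 2x2 normal equations [syy syu; syu suu] [a, b]^T = [r1, r2]^T
    syy = syu = suu = r1 = r2 = 0.0
    for t in range(1, len(y)):
        y_prev, u_prev = y[t - 1], u[t - 1]
        syy += y_prev * y_prev
        syu += y_prev * u_prev
        suu += u_prev * u_prev
        r1 += y_prev * y[t]
        r2 += u_prev * y[t]
    det = syy * suu - syu * syu
    if abs(det) < 1e-12:
        raise ValueError("regressors are collinear; a and b are not identifiable")
    a = (r1 * suu - syu * r2) / det  # Cramer's rule
    b = (syy * r2 - syu * r1) / det
    residuals = [y[t] - a * y[t - 1] - b * u[t - 1] for t in range(1, len(y))]
    residual_variance = sum(e * e for e in residuals) / len(residuals)
    return a, b, residual_variance


# Synthetic, noise-free example: target input u drives the eye response y
u = [0.0, 1.0, 1.0, 0.0, -1.0, 0.5, 2.0, 0.0, 1.0, -0.5]
y = [0.0]
for t in range(1, len(u)):
    y.append(0.6 * y[t - 1] + 0.3 * u[t - 1])
print(fit_arx11(y, u))
```

A heteroscedastic variant would additionally model a time-varying noise variance, so that periods of less precise tracking contribute differently to the fit.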
Affiliation(s)
- Luis David Avendaño-Valencia
- Group of Applied AI and Data Science, Maersk-McKinney-Moller Institute, University of Southern Denmark, Campusvej 55, 5230 Odense M, Denmark.
- Knud B Yderstræde
- Steno Diabetes Center and Center for Innovative Medical Technology, Odense University Hospital, Sdr. Boulevard 29, 5000 Odense C, Denmark.
- Esmaeil S Nadimi
- Group of Applied AI and Data Science, Maersk-McKinney-Moller Institute, University of Southern Denmark, Campusvej 55, 5230 Odense M, Denmark.
- Victoria Blanes-Vidal
- Group of Applied AI and Data Science, Maersk-McKinney-Moller Institute, University of Southern Denmark, Campusvej 55, 5230 Odense M, Denmark.
6
Okui T, Nojiri C, Kimura S, Abe K, Maeno S, Minami M, Maeda Y, Tajima N, Kawamura T, Nakashima N. Performance evaluation of case definitions of type 1 diabetes for health insurance claims data in Japan. BMC Med Inform Decis Mak 2021; 21:52. [PMID: 33573645 PMCID: PMC7879626 DOI: 10.1186/s12911-021-01422-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/22/2020] [Accepted: 01/25/2021] [Indexed: 12/18/2022]
Abstract
Background No case definition of type 1 diabetes (T1D) for claims data has yet been proposed in Japan. This study aimed to evaluate the performance of candidate case definitions for T1D using electronic health records (EHR) and claims data from a university hospital in Japan. Methods The EHR and claims data for all patients visiting a university hospital were used. As candidate case definitions for claims data, we constructed 11 definitions from combinations of the International Statistical Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10) code for T1D and the claims codes for insulin needles for T1D patients, basal insulin, and the syringe pump for continuous subcutaneous insulin infusion (CSII). We constructed a predictive model for T1D patients using disease names, medical practices, and medications as explanatory variables. The predictive model was applied to patients in the test group (validation data), and the performance of the candidate case definitions was evaluated. Results The sensitivity of the confirmed disease name of T1D was 32.9 (95% CI: 28.4, 37.2), and the positive predictive value (PPV) was 33.3 (95% CI: 38.0, 38.4). With the case definition requiring both a confirmed diagnosis of T1D and either of the claims codes for the two insulin treatment methods (i.e., syringe pump for CSII and insulin needles), the PPV improved to 90.2 (95% CI: 85.2, 94.4). Conclusions We have established a case definition with a high PPV that can be used to detect T1D patients precisely from claims data in Japan.
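The best-performing case definition described above combines a confirmed T1D diagnosis with a claims code for either insulin treatment method. The sketch below encodes that logic together with the two reported metrics. It assumes T1D diagnoses are identified by ICD-10 codes starting with E10. The claims items are represented by the hypothetical labels `INSULIN_NEEDLE` and `CSII_PUMP`; Japanese claims data use their own code systems, which are not reproduced here. The reference labels stand in for the study's predictive model and are purely illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Hypothetical labels for the two insulin treatment claims items
INSULIN_TREATMENT = {"INSULIN_NEEDLE", "CSII_PUMP"}


@dataclass
class Patient:
    confirmed_icd10: Set[str]                            # confirmed (not suspected) diagnosis codes
    claim_items: Set[str] = field(default_factory=set)  # hypothetical claims-item labels


def meets_case_definition(p: Patient) -> bool:
    """Confirmed T1D diagnosis (ICD-10 E10*) AND a claim for either insulin treatment method."""
    has_t1d = any(code.startswith("E10") for code in p.confirmed_icd10)
    return has_t1d and bool(p.claim_items & INSULIN_TREATMENT)


def sensitivity_ppv(flags: List[bool], truth: List[bool]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); positive predictive value = TP / (TP + FP)."""
    tp = sum(f and t for f, t in zip(flags, truth))
    fn = sum((not f) and t for f, t in zip(flags, truth))
    fp = sum(f and (not t) for f, t in zip(flags, truth))
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return sensitivity, ppv


patients = [
    Patient({"E10.9"}, {"INSULIN_NEEDLE"}),
    Patient({"E10.9"}),                      # diagnosis only: not flagged
    Patient({"E11.9"}, {"INSULIN_NEEDLE"}),  # type 2 code: not flagged
    Patient({"E10.1"}, {"CSII_PUMP"}),
]
flags = [meets_case_definition(p) for p in patients]
print(flags, sensitivity_ppv(flags, [True, True, False, False]))
```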
Affiliation(s)
- Tasuku Okui
- Medical Information Center, Kyushu University Hospital, Maidashi 3-1-1 Higashi-ku, Fukuoka City, Fukuoka Prefecture, 812-8582, Japan.
- Chinatsu Nojiri
- Medical Information Center, Kyushu University Hospital, Maidashi 3-1-1 Higashi-ku, Fukuoka City, Fukuoka Prefecture, 812-8582, Japan
- Shinichiro Kimura
- Department of Molecular Medicine and Metabolism, Research Institute of Environmental Medicine, Nagoya University, Nagoya, Japan
- Kentaro Abe
- National Hospital Organization Kokura Medical Center, Fukuoka, Japan
- Naoko Tajima
- Jikei University School of Medicine, Tokyo, Japan
- Naoki Nakashima
- Medical Information Center, Kyushu University Hospital, Maidashi 3-1-1 Higashi-ku, Fukuoka City, Fukuoka Prefecture, 812-8582, Japan
7
Walters CE, Nitin R, Margulis K, Boorom O, Gustavson DE, Bush CT, Davis LK, Below JE, Cox NJ, Camarata SM, Gordon RL. Automated Phenotyping Tool for Identifying Developmental Language Disorder Cases in Health Systems Data (APT-DLD): A New Research Algorithm for Deployment in Large-Scale Electronic Health Record Systems. J Speech Lang Hear Res 2020; 63:3019-3035. [PMID: 32791019 PMCID: PMC7890229 DOI: 10.1044/2020_jslhr-19-00397] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/13/2019] [Revised: 04/23/2020] [Accepted: 05/19/2020] [Indexed: 05/13/2023]
Abstract
Purpose Data mining algorithms using electronic health records (EHRs) are useful in large-scale population-wide studies to classify etiology and comorbidities (Casey et al., 2016). Here, we apply this approach to developmental language disorder (DLD), a prevalent communication disorder whose risk factors and epidemiology remain largely undiscovered. Method We first created a reliable system for manually identifying DLD in EHRs based on speech-language pathologist (SLP) diagnostic expertise. We then developed and validated an automated algorithmic procedure, called Automated Phenotyping Tool for identifying DLD cases in health systems data (APT-DLD), that assigns DLD status to patients within EHRs on the basis of ICD (International Statistical Classification of Diseases and Related Health Problems) codes. APT-DLD was validated in a discovery sample (N = 973) using expert SLP manual phenotype coding as a gold-standard comparison, and was then applied and further validated in a replication sample of N = 13,652 EHRs. Results In the discovery sample, the APT-DLD algorithm correctly classified 98% of DLD cases in concordance with manually coded records in the training set, indicating that APT-DLD successfully mimics a comprehensive chart review. The output of APT-DLD was also validated against independently conducted SLP clinician coding in a subset of records, with a positive predictive value of 95% for cases classified as DLD. We also applied APT-DLD to the replication sample, where it achieved a positive predictive value of 90% relative to SLP clinician classification of DLD. Conclusions APT-DLD is a reliable, valid, and scalable tool for identifying DLD cohorts in EHRs. This new method has promising public health implications for future large-scale epidemiological investigations of DLD and may inform EHR data mining algorithms for other communication disorders. Supplemental Material https://doi.org/10.23641/asha.12753578.
Affiliation(s)
- Courtney E. Walters
- Department of Otolaryngology, Vanderbilt University Medical Center, Nashville, TN
- Neuroscience Program, College of Arts and Science, Vanderbilt University, Nashville, TN
- Rachana Nitin
- Department of Otolaryngology, Vanderbilt University Medical Center, Nashville, TN
- Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN
- Katherine Margulis
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN
- Kennedy Krieger Institute, Baltimore, MD
- Olivia Boorom
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN
- Daniel E. Gustavson
- Department of Otolaryngology, Vanderbilt University Medical Center, Nashville, TN
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN
- Catherine T. Bush
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN
- Lea K. Davis
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- Jennifer E. Below
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- Nancy J. Cox
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- Stephen M. Camarata
- Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN
- Reyna L. Gordon
- Department of Otolaryngology, Vanderbilt University Medical Center, Nashville, TN
- Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN
8
Kuo KM, Talley P, Kao Y, Huang CH. A multi-class classification model for supporting the diagnosis of type II diabetes mellitus. PeerJ 2020; 8:e9920. [PMID: 32974105 PMCID: PMC7487151 DOI: 10.7717/peerj.9920] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 05/21/2020] [Accepted: 08/20/2020] [Indexed: 12/21/2022]
Abstract
Background Numerous studies have used machine-learning techniques to predict the early onset of type 2 diabetes mellitus. However, fewer studies have predicted an appropriate diagnosis code for the type 2 diabetes mellitus condition, and ensemble techniques such as bagging and boosting have been used to an even lesser extent. The present study aims to identify appropriate diagnosis codes for type 2 diabetes mellitus patients by building a multi-class prediction model that is parsimonious and uses a minimal set of features. In addition, the importance of the features for predicting the diagnosis code is reported. Methods This study included 149 patients diagnosed with type 2 diabetes mellitus. The sample was collected from a large hospital in Taiwan from November 2017 to May 2018. Machine learning algorithms, including instance-based, decision tree, deep neural network, and ensemble algorithms, were used to build the predictive models. Average accuracy, area under the receiver operating characteristic curve, Matthews correlation coefficient, macro-precision, recall, weighted average of precision and recall, and model processing time were used to assess the performance of the models. Information gain and gain ratio were used to demonstrate feature importance. Results Most algorithms, except for the deep neural network, performed well on all performance indices on both the training and testing datasets. Ten features and their importance for determining the diagnosis code of type 2 diabetes mellitus were identified. Our proposed predictive model can be further developed into a clinical diagnosis support system or integrated into existing healthcare information systems. 
Both applications can effectively support physicians in diagnosing type 2 diabetes mellitus patients and foster better patient-care planning.
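Information gain and gain ratio, used above to rank features, can be computed directly from entropies. The following is a minimal sketch. It assumes discrete (or already discretised) feature values, and the diagnosis codes E11.9 and E11.65 serve only as illustrative ICD-10-CM labels, not as the study's actual classes.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of the empirical distribution of values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """H(labels) minus the weighted entropy of labels within each feature value."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain normalised by the split information (entropy of the feature itself)."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0


codes = ["E11.9", "E11.9", "E11.65", "E11.65"]  # illustrative diagnosis-code labels
print(information_gain(["low", "low", "high", "high"], codes),
      gain_ratio(["low", "low", "high", "high"], codes))
```

Gain ratio divides information gain by the entropy of the feature's own value distribution. This penalises features that split the sample into many small groups, which is why the two rankings can differ.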
Affiliation(s)
- Kuang-Ming Kuo
- Department of Healthcare Administration, I-Shou University, Kaohsiung City, Taiwan, Republic of China
- Paul Talley
- Department of Applied English, I-Shou University, Kaohsiung City, Taiwan, Republic of China
- YuHsi Kao
- Department of Endocrinology, E-Da Hospital, Kaohsiung City, Taiwan, Republic of China
- Chi Hsien Huang
- Department of Family Medicine, E-Da Hospital, I-Shou University, Kaohsiung City, Taiwan, Republic of China; Department of Community Healthcare and Geriatrics, Nagoya University Graduate School of Medicine, Nagoya, Japan
9
Jadhav AS, Patil PB, Biradar S. Analysis on diagnosing diabetic retinopathy by segmenting blood vessels, optic disc and retinal abnormalities. J Med Eng Technol 2020; 44:299-316. [PMID: 32729345 DOI: 10.1080/03091902.2020.1791986] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/11/2023]
Abstract
The main aim of mass screening programmes for Diabetic Retinopathy (DR) is to detect and diagnose the disorder before it leads to vision loss. Automated analysis of retinal images has the potential to improve the efficacy of screening programmes compared with manual image analysis. This article develops a framework for detecting DR from retinal fundus images using three evaluations based on the optic disc, blood vessels and retinal abnormalities. First, pre-processing steps such as green channel conversion and Contrast Limited Adaptive Histogram Equalisation are performed. Segmentation then proceeds with optic disc segmentation by open-close watershed transform, blood vessel segmentation by grey level thresholding, and abnormality segmentation (hard exudates, haemorrhages, microaneurysms and soft exudates) by top hat transform and Gabor filtering. From the three segmented images, features such as local binary pattern, texture energy measurement, and Shannon's and Kapur's entropy are extracted. These features then undergo optimal feature selection using a new hybrid optimisation algorithm termed the Trial-based Bypass Improved Dragonfly Algorithm (TB-DA). The selected features are fed to a hybrid machine learning classifier combining a neural network (NN) and a deep belief network (DBN). As a modification, the same hybrid TB-DA is used to enhance the training of the hybrid classifier, which categorises images as normal, mild, moderate or severe based on the three components.
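Of the features listed above, Shannon's and Kapur's entropy are straightforward to compute from a grey-level histogram. The sketch below assumes 8-bit intensities (for example, the green channel of a fundus image) supplied as a flat list. It computes the Shannon entropy of the histogram and Kapur's maximum-entropy threshold, which is the grey level that maximises the summed entropies of the background and foreground distributions. It does not reproduce the paper's CLAHE, segmentation, TB-DA or NN-DBN stages, and the function names are illustrative.

```python
import math
from collections import Counter
from typing import List, Sequence


def histogram(pixels: Sequence[int], levels: int = 256) -> List[float]:
    """Normalised grey-level histogram; pixels must lie in [0, levels)."""
    if not pixels or not all(0 <= v < levels for v in pixels):
        raise ValueError("pixels must be non-empty integers in [0, levels)")
    counts = Counter(pixels)
    n = len(pixels)
    return [counts.get(i, 0) / n for i in range(levels)]


def shannon_entropy(pixels: Sequence[int], levels: int = 256) -> float:
    """Shannon entropy (bits) of the grey-level histogram."""
    return -sum(p * math.log2(p) for p in histogram(pixels, levels) if p > 0)


def kapur_threshold(pixels: Sequence[int], levels: int = 256) -> int:
    """Kapur's method: threshold t maximising H(background <= t) + H(foreground > t)."""
    p = histogram(pixels, levels)
    best_t, best_h = 0, -math.inf
    for t in range(levels - 1):
        p0, p1 = sum(p[: t + 1]), sum(p[t + 1:])
        if p0 <= 0 or p1 <= 0:
            continue  # one class would be empty at this threshold
        h0 = -sum(q / p0 * math.log(q / p0) for q in p[: t + 1] if q > 0)
        h1 = -sum(q / p1 * math.log(q / p1) for q in p[t + 1:] if q > 0)
        if h0 + h1 > best_h:
            best_t, best_h = t, h0 + h1
    return best_t


green = [10] * 40 + [12] * 10 + [200] * 50  # toy green-channel intensities
print(shannon_entropy(green), kapur_threshold(green))
```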
Affiliation(s)
- Ambaji S Jadhav
- Department of Electrical and Electronics, B.L.D.E.A's V.P. Dr. P.G. Halakatti College of Engineering & Technology (Affiliated to Visvesvaraya Technological University, Belagavi), Vijayapur, India
- Pushpa B Patil
- Department of Computer Science & Engineering, B.L.D.E.A's V.P. Dr. P.G. Halakatti College of Engineering & Technology (Affiliated to Visvesvaraya Technological University, Belagavi), Vijayapur, India
- Sunil Biradar
- Department of Ophthalmology, Shri B.M. Patil Medical College Hospital and Research Center, Vijayapur, India
10
Robbins T, Lim Choi Keung SN, Sankar S, Randeva H, Arvanitis TN. Diabetes and the direct secondary use of electronic health records: Using routinely collected and stored data to drive research and understanding. Digit Health 2018; 4:2055207618804650. [PMID: 30305917 PMCID: PMC6176528 DOI: 10.1177/2055207618804650] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/24/2018] [Accepted: 09/05/2018] [Indexed: 12/19/2022]
Abstract
Introduction Electronic health records provide an unparalleled opportunity for the use of patient data that is routinely collected and stored, in order to drive research and develop an epidemiological understanding of disease. Diabetes, in particular, stands to benefit, being a data-rich, chronic-disease state. This article aims to provide an understanding of the extent to which the healthcare sector is using routinely collected and stored data to inform research and epidemiological understanding of diabetes mellitus. Methods Narrative literature review of articles, published in both the medical- and engineering-based informatics literature. Results There has been a significant increase in the number of papers published, which utilise electronic health records as a direct data source for diabetes research. These articles consider a diverse range of research questions. Internationally, the secondary use of electronic health records, as a research tool, is most prominent in the USA. The barriers most commonly described in research studies include missing values and misclassification, alongside challenges of establishing the generalisability of results. Discussion Electronic health record research is an important and expanding area of healthcare research. Much of the research output remains in the form of conference abstracts and proceedings, rather than journal articles. There is enormous opportunity within the United Kingdom to develop these research methodologies, due to national patient identifiers. Such a healthcare context may enable UK researchers to overcome many of the barriers encountered elsewhere and thus to truly unlock the potential of electronic health records.
Affiliation(s)
- Tim Robbins
- Institute of Digital Healthcare, WMG, University of Warwick, Coventry, UK; University Hospitals Coventry and Warwickshire NHS Trust, Coventry, UK
- Sailesh Sankar
- University Hospitals Coventry and Warwickshire NHS Trust, Coventry, UK
- Harpal Randeva
- University Hospitals Coventry and Warwickshire NHS Trust, Coventry, UK