1
Sahid MA, Babar MUH, Uddin MP. Predictive modeling of multi-class diabetes mellitus using machine learning and filtering Iraqi diabetes data dynamics. PLoS One 2024; 19:e0300785. [PMID: 38753669 PMCID: PMC11098411 DOI: 10.1371/journal.pone.0300785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Accepted: 03/05/2024] [Indexed: 05/18/2024]
Abstract
Diabetes is a persistent metabolic disorder linked to elevated levels of blood glucose, commonly referred to as blood sugar. This condition can have detrimental effects on the heart, blood vessels, eyes, kidneys, and nerves as time passes. It is a chronic ailment that arises when the body fails to produce enough insulin or is unable to effectively use the insulin it produces. When diabetes is not properly managed, it often leads to hyperglycemia, a condition characterized by elevated blood sugar levels or impaired glucose tolerance. This can result in significant harm to various body systems, including the nerves and blood vessels. In this paper, we propose a multiclass diabetes mellitus detection and classification approach using the extremely imbalanced Laboratory of Medical City Hospital data dynamics. We also formulate a new, moderately imbalanced dataset based on the Laboratory of Medical City Hospital data dynamics. To correctly identify multiclass diabetes mellitus, we employ three machine learning classifiers, namely support vector machine, logistic regression, and k-nearest neighbor. We also focus on dimensionality reduction (feature selection by filter, wrapper, and embedded methods) to prune unnecessary features and improve classification performance. To optimize the classifiers, we tune their hyperparameters with 10-fold grid search cross-validation. On the original extremely imbalanced dataset with a 70:30 partition, the support vector machine classifier achieved a maximum accuracy of 0.964, precision of 0.968, recall of 0.964, F1-score of 0.962, Cohen kappa of 0.835, and AUC of 0.99 using the top 4 features selected by the filter method. Using the top 9 features from wrapper-based sequential feature selection, the k-nearest neighbor classifier provides an accuracy of 0.935 and 1.0 for the other performance metrics. For our created moderately imbalanced dataset with an 80:20 partition, the SVM classifier achieves a maximum accuracy of 0.938, and 1.0 for other performance metrics. For multiclass diabetes mellitus detection and classification, our experiments outperformed previously conducted research based on the Laboratory of Medical City Hospital data dynamics.
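The abstract reports Cohen kappa next to accuracy, which matters on an imbalanced dataset because kappa discounts agreement expected by chance. As a minimal sketch of how the two figures relate, the following computes both from a multiclass confusion matrix. The matrix values and the function names are illustrative assumptions, not data or code from the paper.

```python
def accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def cohen_kappa(cm):
    """Cohen's kappa: agreement beyond chance, (p_o - p_e) / (1 - p_e)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_o = accuracy(cm)
    # Chance agreement from the row (true) and column (predicted) marginals.
    p_e = sum(
        sum(cm[i]) * sum(cm[r][i] for r in range(n)) for i in range(n)
    ) / (total * total)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted),
# with one dominant class to mimic an imbalanced dataset.
cm = [
    [8, 1, 1],
    [1, 4, 0],
    [2, 1, 82],
]
print(round(accuracy(cm), 3), round(cohen_kappa(cm), 3))
```

In this toy matrix the accuracy is 0.94 while kappa is noticeably lower, because most of the agreement comes from the dominant class.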
Affiliation(s)
- Md Abdus Sahid
  - Department of Computer Science and Engineering, Hajee Mohammad Danesh Science and Technology University, Dinajpur, Bangladesh
- Mozaddid Ul Hoque Babar
  - Department of Computer Science and Engineering, Hajee Mohammad Danesh Science and Technology University, Dinajpur, Bangladesh
- Md Palash Uddin
  - Department of Computer Science and Engineering, Hajee Mohammad Danesh Science and Technology University, Dinajpur, Bangladesh
2
Mizuno S, Wagata M, Nagaie S, Ishikuro M, Obara T, Tamiya G, Kuriyama S, Tanaka H, Yaegashi N, Yamamoto M, Sugawara J, Ogishima S. Development of phenotyping algorithms for hypertensive disorders of pregnancy (HDP) and their application in more than 22,000 pregnant women. Sci Rep 2024; 14:6292. [PMID: 38491024 PMCID: PMC10943000 DOI: 10.1038/s41598-024-55914-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Accepted: 02/28/2024] [Indexed: 03/18/2024]
Abstract
Recently, many phenotyping algorithms for high-throughput cohort identification have been developed. Prospective genome cohort studies are critical resources for precision medicine, but there are many hurdles to precise cohort identification. Consequently, it is important to develop phenotyping algorithms for cohort data collection. Hypertensive disorders of pregnancy (HDP) are a leading cause of maternal morbidity and mortality. In this study, we developed, applied, and validated rule-based phenotyping algorithms for HDP. Two phenotyping algorithms, algorithms 1 and 2, were developed according to American and Japanese guidelines and applied to 22,452 pregnant women in the Birth and Three-Generation Cohort Study of the Tohoku Medical Megabank project. For precise cohort identification, we analyzed both structured data (e.g., laboratory and physiological tests) and unstructured clinical notes. The identified subtypes of HDP were validated against reference standards. Algorithms 1 and 2 identified 7.93% and 8.08% of the subjects as having HDP, respectively, along with their HDP subtypes. Our algorithms performed well, with high positive predictive values (0.96 and 0.90 for algorithms 1 and 2, respectively). Overcoming the hurdle of precise cohort identification, we both developed and implemented phenotyping algorithms and precisely identified HDP patients and their subtypes from large-scale cohort data collection.
Affiliation(s)
- Satoshi Mizuno
  - Department of Informatics for Genomic Medicine, Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-Machi, Aoba-Ku, Sendai, Miyagi, 980-8575, Japan
- Maiko Wagata
  - Department of Feto-Maternal Medical Science, Tohoku Medical Megabank Organization, Tohoku University, Miyagi, Japan
- Satoshi Nagaie
  - Department of Informatics for Genomic Medicine, Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-Machi, Aoba-Ku, Sendai, Miyagi, 980-8575, Japan
- Mami Ishikuro
  - Department of Molecular Epidemiology, Tohoku Medical Megabank Organization, Tohoku University, Miyagi, Japan
- Taku Obara
  - Department of Molecular Epidemiology, Tohoku Medical Megabank Organization, Tohoku University, Miyagi, Japan
- Gen Tamiya
  - Department of Statistical Genetics and Genomics, Tohoku Medical Megabank Organization, Tohoku University, Miyagi, Japan
- Shinichi Kuriyama
  - Department of Molecular Epidemiology, Tohoku Medical Megabank Organization, Tohoku University, Miyagi, Japan
- Nobuo Yaegashi
  - Department of Gynecology and Obstetrics, Tohoku University Graduate School of Medicine, Tohoku University, Miyagi, Japan
- Masayuki Yamamoto
  - Department of Biochemistry and Molecular Biology, Tohoku Medical Megabank Organization, Tohoku University, Miyagi, Japan
- Junichi Sugawara
  - Department of Gynecology and Obstetrics, Tohoku University Graduate School of Medicine, Tohoku University, Miyagi, Japan
  - Suzuki Memorial Hospital, 3-5-5, Satonomori, Iwanumashi, Miyagi, Japan
- Soichi Ogishima
  - Department of Informatics for Genomic Medicine, Tohoku Medical Megabank Organization, Tohoku University, 2-1, Seiryo-Machi, Aoba-Ku, Sendai, Miyagi, 980-8575, Japan
  - Advanced Research Center for Innovations in Next-Generation Medicine, Tohoku University, Miyagi, Japan
3
Karmand H, Andishgar A, Tabrizi R, Sadeghi A, Pezeshki B, Ravankhah M, Taherifard E, Ahmadizar F. Machine-learning algorithms in screening for type 2 diabetes mellitus: Data from Fasa Adults Cohort Study. Endocrinol Diabetes Metab 2024; 7:e00472. [PMID: 38411386 PMCID: PMC10897867 DOI: 10.1002/edm2.472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Revised: 01/10/2024] [Accepted: 01/30/2024] [Indexed: 02/28/2024]
Abstract
INTRODUCTION The application of machine learning (ML) is increasingly growing in biomedical sciences. This study aimed to evaluate factors associated with type 2 diabetes mellitus (T2DM) and compare the performance of ML methods in identifying individuals with the disease in an Iranian setting. METHODS Using the baseline data from Fasa Adult Cohort Study (FACS) and in a sex-stratified manner, we studied factors associated with T2DM by applying seven different ML methods including Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbours (KNN), Gradient Boosting Machine (GBM), Extreme Gradient Boosting (XGB) and Bagging classifier (BAG). We further compared the performance of these methods; for each algorithm, accuracy, precision, sensitivity, specificity, F1 score, and Area Under Curve (AUC) were calculated. RESULTS 10,112 participants were recruited between 2014 and 2016, of whom 1246 had T2DM at baseline. 4566 (45%) participants were males, aged between 35 and 70 years. For males, age, sugar consumption, and history of hospitalization were the most weighted variables regarding their importance in screening for T2DM using the GBM model, respectively; these variables were sugar consumption, urine blood, and age for females. GBM outperformed other models for both males and females with AUC of 0.75 (0.69-0.82) and 0.76 (0.71-0.80), and F1 score of 0.33 (0.27-0.39) and 0.42 (0.38-0.46), respectively. GBM also showed a sensitivity of 0.24 (0.19-0.29) and a specificity of 0.98 (0.96-1.0) in males and a sensitivity of 0.38 (0.34-0.42) and specificity of 0.92 (0.89-0.95) in females. Notably, close performance characteristics were detected among other ML models. CONCLUSIONS GBM model might achieve better performance in screening for T2DM in a south Iranian population.
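The combination reported above (high specificity, low sensitivity, modest F1) follows directly from the metric definitions. The sketch below works through that arithmetic from confusion counts. The counts are hypothetical, chosen only so that sensitivity and specificity equal the 0.24 and 0.98 reported for males; they are not the study's confusion matrix, and the function name is an assumption.

```python
def screening_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, precision and F1 from confusion counts."""
    sensitivity = tp / (tp + fn)   # share of true cases that are flagged
    specificity = tn / (tn + fp)   # share of non-cases that are cleared
    precision = tp / (tp + fp)     # share of flagged people who are true cases
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts: 100 people with the disease, 800 without.
m = screening_metrics(tp=24, fp=16, tn=784, fn=76)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```

Because F1 is the harmonic mean of precision and sensitivity, it stays low whenever sensitivity is low, however high the specificity.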
Affiliation(s)
- Hanieh Karmand
  - Student Research Committee, School of Medicine, Fasa University of Medical Sciences, Fasa, Iran
- Aref Andishgar
  - USERN Office, Fasa University of Medical Sciences, Fasa, Iran
- Reza Tabrizi
  - Noncommunicable Diseases Research Center, Fasa University of Medical Science, Fasa, Iran
- Alireza Sadeghi
  - Student Research Committee, School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran
  - Health Policy Research Center, School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran
- Babak Pezeshki
  - Clinical Research Development Unit, Valiasr Hospital, Fasa University of Medical Sciences, Fasa, Iran
- Mahdi Ravankhah
  - Student Research Committee, School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran
- Erfan Taherifard
  - Student Research Committee, School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran
  - Health Policy Research Center, School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran
- Fariba Ahmadizar
  - Data Science and Biostatistics Department, Julius Global Health, Utrecht, The Netherlands
4
Gao Y, Sun F. Batch normalization followed by merging is powerful for phenotype prediction integrating multiple heterogeneous studies. PLoS Comput Biol 2023; 19:e1010608. [PMID: 37844077 PMCID: PMC10602384 DOI: 10.1371/journal.pcbi.1010608] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2022] [Revised: 10/26/2023] [Accepted: 09/30/2023] [Indexed: 10/18/2023]
Abstract
Heterogeneity in different genomic studies compromises the performance of machine learning models in cross-study phenotype predictions. Overcoming heterogeneity when incorporating different studies in terms of phenotype prediction is a challenging and critical step for developing machine learning algorithms with reproducible prediction performance on independent datasets. We investigated the best approaches to integrate different studies of the same type of omics data under a variety of different heterogeneities. We developed a comprehensive workflow to simulate a variety of different types of heterogeneity and evaluate the performances of different integration methods together with batch normalization by using ComBat. We also demonstrated the results through realistic applications on six colorectal cancer (CRC) metagenomic studies and six tuberculosis (TB) gene expression studies, respectively. We showed that heterogeneity in different genomic studies can markedly negatively impact the machine learning classifier's reproducibility. ComBat normalization improved the prediction performance of machine learning classifier when heterogeneous populations are present, and could successfully remove batch effects within the same population. We also showed that the machine learning classifier's prediction accuracy can be markedly decreased as the underlying disease model became more different in training and test populations. Comparing different merging and integration methods, we found that merging and integration methods can outperform each other in different scenarios. In the realistic applications, we observed that the prediction accuracy improved when applying ComBat normalization with merging or integration methods in both CRC and TB studies. We illustrated that batch normalization is essential for mitigating both population differences of different studies and batch effects. 
We also showed that both merging strategy and integration methods can achieve good performances when combined with batch normalization. In addition, we explored the potential of boosting phenotype prediction performance by rank aggregation methods and showed that rank aggregation methods had similar performance as other ensemble learning approaches.
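To make the idea of "normalize each study, then merge" concrete, here is a deliberately simplified sketch. It applies a plain per-batch location and scale adjustment to a single feature before pooling the studies. This is not ComBat, which additionally uses empirical Bayes shrinkage of the batch parameters and can preserve covariates; the study names, values, and function name are hypothetical.

```python
from statistics import mean, pstdev


def standardize_per_batch(batches):
    """Center and scale one feature within each batch, then merge.

    batches: dict mapping a batch (study) name to a list of feature values.
    Returns a list of (batch_name, standardized_value) tuples.
    """
    merged = []
    for name, values in batches.items():
        mu = mean(values)
        sd = pstdev(values)
        for v in values:
            # Guard against a constant feature within a batch.
            z = (v - mu) / sd if sd > 0 else 0.0
            merged.append((name, z))
    return merged


# Two hypothetical studies measuring the same feature with a batch offset.
studies = {
    "study_A": [1.0, 2.0, 3.0],
    "study_B": [11.0, 12.0, 13.0],
}
merged = standardize_per_batch(studies)
for batch, z in merged:
    print(batch, round(z, 3))
```

After the adjustment the two studies share the same scale, so a classifier trained on the merged data no longer sees the offset between them as signal.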
Affiliation(s)
- Yilin Gao
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California, United States of America
- Fengzhu Sun
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California, United States of America
5
Wu Y, Min H, Li M, Shi Y, Ma A, Han Y, Gan Y, Guo X, Sun X. Effect of Artificial Intelligence-based Health Education Accurately Linking System (AI-HEALS) for Type 2 diabetes self-management: protocol for a mixed-methods study. BMC Public Health 2023; 23:1325. [PMID: 37434126 DOI: 10.1186/s12889-023-16066-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Accepted: 06/06/2023] [Indexed: 07/13/2023]
Abstract
BACKGROUND Patients with type 2 diabetes (T2DM) have an increasing need for personalized and precise management as medical technology advances. Artificial intelligence (AI) technologies on mobile devices are being developed gradually in a variety of healthcare fields. As an AI field, knowledge graph (KG) is being developed to extract and store structured knowledge from massive data sets. It has great prospects for T2DM medical information retrieval, clinical decision-making, and individual intelligent question and answering (QA), but has yet to be thoroughly researched in T2DM intervention. Therefore, we designed an artificial intelligence-based health education accurately linking system (AI-HEALS) to evaluate whether the AI-HEALS-based intervention could help patients with T2DM improve their self-management abilities and blood glucose control in primary healthcare. METHODS This is a nested mixed-method study that includes a community-based cluster-randomized control trial and personal in-depth interviews. Individuals with T2DM between the ages of 18 and 75 will be recruited from 40-45 community health centers in Beijing, China. Participants will either receive standard diabetes primary care (SDPC) (control, 3 months) or SDPC plus the AI-HEALS online health education program (intervention, 3 months). The AI-HEALS runs in the WeChat service platform, which includes a KBQA, a system of physiological indicators and lifestyle recording and monitoring, medication and blood glucose monitoring reminders, and automated, personalized message sending. Data on sociodemography, medical examination, blood glucose, and self-management behavior will be collected at baseline, as well as 1, 3, 6, 12, and 18 months later. The primary outcome is a reduction in HbA1c levels. Secondary outcomes include changes in self-management behavior, social cognition, psychology, T2DM skills, and health literacy. Furthermore, the cost-effectiveness of the AI-HEALS-based intervention will be evaluated.
DISCUSSION KBQA system is an innovative and cost-effective technology for health education and promotion for T2DM patients, but it is not yet widely used in the T2DM interventions. This trial will provide evidence on the efficacy of AI and mHealth-based personalized interventions in primary care for improving T2DM outcomes and self-management behaviors. TRIAL REGISTRATION Biomedical Ethics Committee of Peking University: IRB00001052-22,058, 2022/06/06; Clinical Trials: ChiCTR2300068952, 02/03/2023.
Affiliation(s)
- Yibo Wu
  - Department of Social Medicine and Health Education, School of Public Health, Peking University, Beijing, China
- Hewei Min
  - Department of Social Medicine and Health Education, School of Public Health, Peking University, Beijing, China
- Mingzi Li
  - School of Nursing, Peking University, Beijing, China
- Yuhui Shi
  - Department of Social Medicine and Health Education, School of Public Health, Peking University, Beijing, China
- Aijuan Ma
  - Beijing Center for Disease Control and Prevention, Beijing, China
- Yumei Han
  - Beijing Medical Examination Center, Beijing, China
- Yadi Gan
  - Daxing District Center for Disease Control and Prevention of Beijing, Beijing, China
- Xiaohui Guo
  - Peking University First Hospital, Beijing, China
- Xinying Sun
  - Department of Social Medicine and Health Education, School of Public Health, Peking University, Beijing, China
6
Westhues CC, Mahone GS, da Silva S, Thorwarth P, Schmidt M, Richter JC, Simianer H, Beissinger TM. Prediction of Maize Phenotypic Traits With Genomic and Environmental Predictors Using Gradient Boosting Frameworks. Front Plant Sci 2021; 12:699589. [PMID: 34880880 PMCID: PMC8647909 DOI: 10.3389/fpls.2021.699589] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 04/23/2021] [Accepted: 10/15/2021] [Indexed: 05/26/2023]
Abstract
The development of crop varieties with stable performance in future environmental conditions represents a critical challenge in the context of climate change. Environmental data collected at the field level, such as soil and climatic information, can be relevant to improve predictive ability in genomic prediction models by describing more precisely genotype-by-environment interactions, which represent a key component of the phenotypic response for complex crop agronomic traits. Modern predictive modeling approaches can efficiently handle various data types and are able to capture complex nonlinear relationships in large datasets. In particular, machine learning techniques have gained substantial interest in recent years. Here we examined the predictive ability of machine learning-based models for two phenotypic traits in maize using data collected by the Maize Genomes to Fields (G2F) Initiative. The data we analyzed consisted of multi-environment trials (METs) dispersed across the United States and Canada from 2014 to 2017. An assortment of soil- and weather-related variables was derived and used in prediction models alongside genotypic data. Linear random effects models were compared to a linear regularized regression method (elastic net) and to two nonlinear gradient boosting methods based on decision tree algorithms (XGBoost, LightGBM). These models were evaluated under four prediction problems: (1) tested and new genotypes in a new year; (2) only unobserved genotypes in a new year; (3) tested and new genotypes in a new site; (4) only unobserved genotypes in a new site. Accuracy in forecasting grain yield performance of new genotypes in a new year was improved by up to 20% over the baseline model by including environmental predictors with gradient boosting methods. For plant height, an enhancement of predictive ability could neither be observed by using machine learning-based methods nor by using detailed environmental information. 
An investigation of key environmental factors using gradient boosting frameworks also revealed that temperature at flowering stage, frequency and amount of water received during the vegetative and grain filling stage, and soil organic matter content appeared as important predictors for grain yield in our panel of environments.
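The four prediction problems listed in the abstract differ only in which environment is held out (a year or a site) and in whether the test set keeps genotypes already seen in training. A minimal sketch of such a split is shown below. The record layout, field names, and function name are hypothetical and are not taken from the G2F data or the authors' code.

```python
def split_new_environment(records, held_out, key="year", only_unobserved=False):
    """Train on all other environments and test on the held-out one.

    key selects the environment dimension ("year" or "site").
    With only_unobserved=True, test genotypes seen in training are dropped.
    """
    train = [r for r in records if r[key] != held_out]
    test = [r for r in records if r[key] == held_out]
    if only_unobserved:
        seen = {r["genotype"] for r in train}
        test = [r for r in test if r["genotype"] not in seen]
    return train, test


# Hypothetical multi-environment trial records.
records = [
    {"genotype": "G1", "year": 2014, "site": "S1"},
    {"genotype": "G2", "year": 2014, "site": "S2"},
    {"genotype": "G1", "year": 2015, "site": "S1"},
    {"genotype": "G3", "year": 2015, "site": "S2"},
]

# Problem (1): tested and new genotypes in a new year.
train1, test1 = split_new_environment(records, 2015)
# Problem (2): only unobserved genotypes in a new year.
train2, test2 = split_new_environment(records, 2015, only_unobserved=True)
# Problems (3) and (4) use key="site" in the same way.
train3, test3 = split_new_environment(records, "S2", key="site")

print([r["genotype"] for r in test1], [r["genotype"] for r in test2])
```

Keeping the split logic separate from the model makes it easy to evaluate the same learner under each of the four scenarios.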
Affiliation(s)
- Cathy C. Westhues
  - Division of Plant Breeding Methodology, Department of Crop Sciences, University of Goettingen, Goettingen, Germany
  - Center for Integrated Breeding Research, University of Goettingen, Goettingen, Germany
- Sofia da Silva
  - Kleinwanzlebener Saatzucht (KWS) SAAT SE, Einbeck, Germany
- Malthe Schmidt
  - Kleinwanzlebener Saatzucht (KWS) SAAT SE, Einbeck, Germany
- Henner Simianer
  - Center for Integrated Breeding Research, University of Goettingen, Goettingen, Germany
  - Animal Breeding and Genetics Group, Department of Animal Sciences, University of Goettingen, Goettingen, Germany
- Timothy M. Beissinger
  - Division of Plant Breeding Methodology, Department of Crop Sciences, University of Goettingen, Goettingen, Germany
  - Center for Integrated Breeding Research, University of Goettingen, Goettingen, Germany
7
Najafi B, Mishra R. Harnessing Digital Health Technologies to Remotely Manage Diabetic Foot Syndrome: A Narrative Review. Medicina (Kaunas) 2021; 57:377. [PMID: 33919683 PMCID: PMC8069817 DOI: 10.3390/medicina57040377] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 03/19/2021] [Revised: 04/05/2021] [Accepted: 04/07/2021] [Indexed: 12/15/2022]
Abstract
About 422 million people worldwide have diabetes and approximately one-third of them have a major risk factor for diabetic foot ulcers, including poor sensation in their feet from peripheral neuropathy and/or poor perfusion to their feet from peripheral artery disease. The current healthcare ecosystem, which is centered on the treatment of established foot disease, often fails to adequately control key reversible risk factors to prevent diabetic foot ulcers, leading to an unacceptably high foot disease amputation rate, a 40% ulcer recurrence rate in the first year, and high hospital admissions. Thus, the latest diabetic foot ulcer guidelines emphasize that a paradigm shift in research priority from siloed hospital treatments to innovative integrated community prevention is now critical to address the high diabetic foot ulcer burden. The widespread uptake and acceptance of wearable and digital health technologies provide a means to monitor major risk factors associated with diabetic foot ulcers in a timely manner, empower patients in self-care, effectively deliver the remote monitoring and multi-disciplinary prevention needed for at-risk people, and address the health care access disadvantage faced by people living in remote areas. This narrative review paper summarizes some of the latest innovations in three specific areas, including technologies supporting triaging high-risk patients, technologies supporting care in place, and technologies empowering self-care. While many of these technologies are still in their infancy, we anticipate that in response to the Coronavirus Disease 2019 pandemic and current unmet needs to decentralize care for people with foot disease, we will see a new wave of innovations in the area of digital health, smart wearables, telehealth technologies, and “hospital-at-home” care delivery models.
These technologies will be quickly adopted at scale to improve remote management of diabetic foot ulcers, smartly triaging those who need to be seen in outpatient or inpatient clinics, and supporting acute or subacute care at home.
8
Okui T, Nojiri C, Kimura S, Abe K, Maeno S, Minami M, Maeda Y, Tajima N, Kawamura T, Nakashima N. Performance evaluation of case definitions of type 1 diabetes for health insurance claims data in Japan. BMC Med Inform Decis Mak 2021; 21:52. [PMID: 33573645 PMCID: PMC7879626 DOI: 10.1186/s12911-021-01422-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/22/2020] [Accepted: 01/25/2021] [Indexed: 12/18/2022]
Abstract
Background No case definition of Type 1 diabetes (T1D) for claims data has been proposed in Japan yet. This study aimed to evaluate the performance of candidate case definitions for T1D using electronic health care records (EHR) and claims data in a university hospital in Japan. Methods The EHR and claims data for all the visiting patients in a university hospital were used. As the candidate case definitions for claims data, we constructed 11 definitions by combinations of the International Statistical Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10) code of T1D, the claims code of insulin needles for T1D patients, basal insulin, and syringe pump for continuous subcutaneous insulin infusion (CSII). We constructed a predictive model for T1D patients using disease names, medical practices, and medications as explanatory variables. The predictive model was applied to patients of the test group (validation data), and the performance of the candidate case definitions was evaluated. Results In the performance evaluation, the sensitivity of the confirmed disease name of T1D was 32.9 (95% CI: 28.4, 37.2), and positive predictive value (PPV) was 33.3 (95% CI: 38.0, 38.4). By using the case definition of both the confirmed diagnosis of T1D and either of the claims codes of the two insulin treatment methods (i.e., syringe pump for CSII and insulin needles), PPV improved to 90.2 (95% CI: 85.2, 94.4). Conclusions We have established a case definition with high PPV, and the case definition can be used for precisely detecting T1D patients from claims data in Japan.
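The best-performing definition described above combines a confirmed T1D diagnosis with either of two insulin treatment claims. As a rough illustration of how such a rule can be scored against a reference label, the sketch below evaluates sensitivity and PPV for a diagnosis-only rule and for the combined rule. The patient records, field names, and function names are hypothetical and are not taken from the study.

```python
def code_only(p):
    """Case definition: confirmed T1D diagnosis code alone."""
    return p["t1d_code"]


def code_and_treatment(p):
    """Case definition: T1D code AND (CSII syringe pump OR insulin needles)."""
    return p["t1d_code"] and (p["csii_pump"] or p["insulin_needles"])


def sensitivity_ppv(patients, rule):
    """Score a case definition against the reference label 'is_t1d'."""
    tp = sum(1 for p in patients if rule(p) and p["is_t1d"])
    fp = sum(1 for p in patients if rule(p) and not p["is_t1d"])
    fn = sum(1 for p in patients if not rule(p) and p["is_t1d"])
    return tp / (tp + fn), tp / (tp + fp)


def patient(t1d_code, csii_pump, insulin_needles, is_t1d):
    return {"t1d_code": t1d_code, "csii_pump": csii_pump,
            "insulin_needles": insulin_needles, "is_t1d": is_t1d}


# Hypothetical toy cohort; 'is_t1d' plays the role of the reference standard.
patients = [
    patient(True, True, False, True),
    patient(True, False, True, True),
    patient(True, False, False, True),
    patient(True, False, False, False),
    patient(True, False, True, False),
    patient(False, False, True, False),
]

for rule in (code_only, code_and_treatment):
    sens, ppv = sensitivity_ppv(patients, rule)
    print(rule.__name__, round(sens, 3), round(ppv, 3))
```

In this toy cohort the stricter rule raises PPV at the cost of sensitivity, the usual trade-off when a case definition adds required criteria.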
Affiliation(s)
- Tasuku Okui
  - Medical Information Center, Kyushu University Hospital, Maidashi 3-1-1 Higashi-ku, Fukuoka City, Fukuoka Prefecture, 812-8582, Japan
- Chinatsu Nojiri
  - Medical Information Center, Kyushu University Hospital, Maidashi 3-1-1 Higashi-ku, Fukuoka City, Fukuoka Prefecture, 812-8582, Japan
- Shinichiro Kimura
  - Department of Molecular Medicine and Metabolism, Research Institute of Environmental Medicine, Nagoya University, Nagoya, Japan
- Kentaro Abe
  - National Hospital Organization Kokura Medical Center, Fukuoka, Japan
- Naoko Tajima
  - Jikei University School of Medicine, Tokyo, Japan
- Naoki Nakashima
  - Medical Information Center, Kyushu University Hospital, Maidashi 3-1-1 Higashi-ku, Fukuoka City, Fukuoka Prefecture, 812-8582, Japan
9
Wagholikar KB, Estiri H, Murphy M, Murphy SN. Polar labeling: silver standard algorithm for training disease classifiers. Bioinformatics 2020; 36:3200-3206. [PMID: 32049335 PMCID: PMC7214041 DOI: 10.1093/bioinformatics/btaa088] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/14/2019] [Revised: 01/30/2020] [Accepted: 02/04/2020] [Indexed: 01/29/2023]
Abstract
MOTIVATION Expert-labeled data are essential to train phenotyping algorithms for cohort identification. However, expert labeling is time- and labor-intensive, and the costs remain prohibitive for scaling phenotyping to wider use-cases. RESULTS We present an approach referred to as polar labeling (PL) to create a silver standard for training machine learning (ML) for disease classification. We test the hypothesis that ML models trained on the silver standard, created by applying PL on unlabeled patient records, are comparable in performance to ML models trained on a gold standard created by clinical experts through manual review of patient records. We perform experimental validation using health records of 38 023 patients spanning six diseases. Our results demonstrate the superior performance of the proposed approach. AVAILABILITY AND IMPLEMENTATION We provide a Python implementation of the algorithm and the Python code developed for this study on GitHub. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hossein Estiri
  - Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02114, USA
- Shawn N Murphy
  - Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02114, USA
10
Kuo KM, Talley P, Kao Y, Huang CH. A multi-class classification model for supporting the diagnosis of type II diabetes mellitus. PeerJ 2020; 8:e9920. [PMID: 32974105 PMCID: PMC7487151 DOI: 10.7717/peerj.9920] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 05/21/2020] [Accepted: 08/20/2020] [Indexed: 12/21/2022]
Abstract
Background Numerous studies have utilized machine-learning techniques to predict the early onset of type 2 diabetes mellitus. However, fewer studies have been conducted to predict an appropriate diagnosis code for the type 2 diabetes mellitus condition. Further, ensemble techniques such as bagging and boosting have been utilized to an even lesser extent. The present study aims to identify appropriate diagnosis codes for type 2 diabetes mellitus patients by building a multi-class prediction model that is parsimonious and uses a minimal set of features. In addition, the importance of features for predicting the diagnosis code is provided. Methods This study included 149 patients with type 2 diabetes mellitus. The sample was collected from a large hospital in Taiwan from November 2017 to May 2018. Machine learning algorithms including instance-based, decision tree, deep neural network, and ensemble algorithms were used to build the predictive models. Average accuracy, area under the receiver operating characteristic curve, Matthews correlation coefficient, macro-precision, recall, weighted average of precision and recall, and model process time were used to assess the performance of the built models. Information gain and gain ratio were used to demonstrate feature importance. Results The results showed that most algorithms, except for the deep neural network, performed well in terms of all performance indices regardless of whether the training or testing dataset was used. Ten features and their importance in determining the diagnosis code of type 2 diabetes mellitus were identified. Our proposed predictive model can be further developed into a clinical diagnosis support system or integrated into existing healthcare information systems.
Both methods of application can effectively support physicians whenever they are diagnosing type 2 diabetes mellitus patients in order to foster better patient-care planning.
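The abstract ranks features by information gain and gain ratio. As a small worked sketch of those two standard quantities for a categorical feature, the code below computes the entropy reduction and normalizes it by the split information. The labels, feature values, and function names are toy assumptions, not the study's variables.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def information_gain(feature, labels):
    """Entropy of the labels minus their entropy conditional on the feature."""
    n = len(labels)
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain divided by the split information of the feature."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0


# Toy data: one feature separates the classes perfectly, the other not at all.
labels = ["A", "A", "B", "B"]
perfect = ["x", "x", "y", "y"]
useless = ["x", "y", "x", "y"]

print(information_gain(perfect, labels), gain_ratio(perfect, labels))
print(information_gain(useless, labels), gain_ratio(useless, labels))
```

Gain ratio divides by the split information so that features with many distinct values are not favored simply for fragmenting the data.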
Collapse
Affiliation(s)
- Kuang-Ming Kuo
- Department of Healthcare Administration, I-Shou University, Kaohsiung City, Taiwan, Republic of China
- Paul Talley
- Department of Applied English, I-Shou University, Kaohsiung City, Taiwan, Republic of China
- YuHsi Kao
- Department of Endocrinology, E-Da Hospital, Kaohsiung City, Taiwan, Republic of China
- Chi Hsien Huang
- Department of Family Medicine, E-Da Hospital, I-Shou University, Kaohsiung City, Taiwan, Republic of China; Department of Community Healthcare and Geriatrics, Nagoya University Graduate School of Medicine, Nagoya, Japan
|
11
|
Musacchio N, Giancaterini A, Guaita G, Ozzello A, Pellegrini MA, Ponzani P, Russo GT, Zilich R, de Micheli A. Artificial Intelligence and Big Data in Diabetes Care: A Position Statement of the Italian Association of Medical Diabetologists. J Med Internet Res 2020; 22:e16922. [PMID: 32568088 PMCID: PMC7338925 DOI: 10.2196/16922] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 11/05/2019] [Revised: 03/09/2020] [Accepted: 04/12/2020] [Indexed: 12/24/2022] Open
Abstract
Over the last decade, most of our daily activities have become digital. Digital health takes into account the ever-increasing synergy between advanced medical technologies, innovation, and digital communication. Thanks to machine learning, we are no longer limited to a descriptive analysis of the data, as we can obtain greater value by identifying and predicting patterns resulting from inductive reasoning. Machine learning software programs that disclose the reasoning behind a prediction allow for “what-if” models by which it is possible to understand if and how, by changing certain factors, one may improve the outcomes, thereby identifying the optimal behavior. Currently, diabetes care is facing several challenges: the decreasing number of diabetologists, the increasing number of patients, the reduced time allowed for medical visits, the growing complexity of the disease from the standpoints of both clinical and patient care, the difficulty of achieving the relevant clinical targets, the growing burden of disease management for both the health care professional and the patient, and health care accessibility and sustainability. In this context, new digital technologies and the use of artificial intelligence are certainly a great opportunity. Herein, we report the results of a careful analysis of the current literature and present the vision of the Italian Association of Medical Diabetologists (AMD) on this controversial topic that, if well used, may be the key to great scientific innovation. AMD believes that the use of artificial intelligence will enable the conversion of data (descriptive) into knowledge of the factors that “affect” the behavior and correlations (predictive), thereby identifying the key aspects that may establish an improvement of the expected results (prescriptive).
Artificial intelligence can therefore become a tool of great technical support to help diabetologists become fully responsible for the individual patient, thereby assuring customized and precise medicine. This, in turn, will allow for comprehensive therapies to be built in accordance with the evidence criteria that should always be the basis for any therapeutic choice.
Affiliation(s)
- Annalisa Giancaterini
- Diabetology Service, Muggiò Polyambulatory, Azienda Socio Sanitaria Territoriale, Monza, Italy
- Giacomo Guaita
- Diabetology, Endocrinology and Metabolic Diseases Service, Azienda Tutela Salute Sardegna-Azienda Socio Sanitaria Locale, Carbonia, Italy
- Alessandro Ozzello
- Departmental Structure of Endocrine Diseases and Diabetology, Azienda Sanitaria Locale TO3, Pinerolo, Italy
- Maria A Pellegrini
- Italian Association of Diabetologists, Rome, Italy; New Coram Limited Liability Company, Udine, Italy
- Paola Ponzani
- Operative Unit of Diabetology, La Colletta Hospital, Azienda Sanitaria Locale 3, Genova, Italy
- Giuseppina T Russo
- Department of Clinical and Experimental Medicine, University of Messina, Messina, Italy
- Alberto de Micheli
- Associazione dei Cavalieri Italiani del Sovrano Militare Ordine di Malta, Genova, Italy
|
12
|
Abhari S, Niakan Kalhori SR, Ebrahimi M, Hasannejadasl H, Garavand A. Artificial Intelligence Applications in Type 2 Diabetes Mellitus Care: Focus on Machine Learning Methods. Healthc Inform Res 2019; 25:248-261. [PMID: 31777668 PMCID: PMC6859270 DOI: 10.4258/hir.2019.25.4.248] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 05/14/2019] [Revised: 10/06/2019] [Accepted: 10/09/2019] [Indexed: 12/18/2022] Open
Abstract
Objectives The incidence of type 2 diabetes mellitus has increased significantly in recent years. Artificial intelligence applications in healthcare are increasingly used for diagnosis, therapeutic decision making, and outcome prediction, especially in type 2 diabetes mellitus. This study aimed to identify the artificial intelligence (AI) applications for type 2 diabetes mellitus care. Methods This is a review conducted in 2018. We searched the PubMed, Web of Science, and Embase scientific databases, based on a combination of related MeSH terms. The article selection process was based on Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). Finally, 31 articles were selected after inclusion and exclusion criteria were applied. Data gathering was done by using a data extraction form. Data were summarized and reported based on the study objectives. Results The main applications of AI for type 2 diabetes mellitus care were screening and diagnosis in different stages. Among all of the reviewed AI methods, machine learning methods were the most commonly applied techniques (71%, n = 22). Many applications were in multi-method forms (23%). Among the machine learning algorithms applied, support vector machine (21%) and naive Bayesian (19%) were the most commonly used methods. The most important variables that were used in the selected studies were body mass index, fasting blood sugar, blood pressure, HbA1c, triglycerides, low-density lipoprotein, high-density lipoprotein, and demographic variables. Conclusions It is recommended to select optimal algorithms by testing various techniques. Support vector machine and naive Bayesian might achieve better performance than other applications due to the type of variables and targets in diabetes-related outcomes classification.
Affiliation(s)
- Shahabeddin Abhari
- Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
- Sharareh R Niakan Kalhori
- Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
- Mehdi Ebrahimi
- Department of Internal Medicine, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran; Endocrinology and Metabolism Research Center, Endocrinology and Metabolism Research Institute, Tehran University of Medical Sciences, Tehran, Iran
- Hajar Hasannejadasl
- Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
- Ali Garavand
- Department of Health Information Management and Technology, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
|
13
|
Abstract
Electronic Health Records (EHRs) are a rich repository of valuable clinical information held in primary and secondary care databases. In order to utilize EHRs for medical observational research, a range of algorithms for automatically identifying individuals with a specific phenotype have been developed. This review summarizes and critically evaluates the literature on the development of EHR phenotyping systems. It describes phenotyping systems and techniques based on structured and unstructured EHR data. Articles published on PubMed and Google Scholar between 2013 and 2017 have been reviewed, using search terms derived from Medical Subject Headings (MeSH). The popularity of using Natural Language Processing (NLP) techniques to extract features from narrative text has increased. This increased attention is due to the availability of open source NLP algorithms, combined with improvements in accuracy. In this review, concept extraction is the most popular NLP technique, having been used by more than 50% of the reviewed papers to extract features from EHRs. High-throughput phenotyping systems using unsupervised machine learning techniques have gained popularity due to their ability to efficiently and automatically extract a phenotype with minimal human effort.
|
14
|
Makino M, Yoshimoto R, Ono M, Itoko T, Katsuki T, Koseki A, Kudo M, Haida K, Kuroda J, Yanagiya R, Saitoh E, Hoshinaga K, Yuzawa Y, Suzuki A. Artificial intelligence predicts the progression of diabetic kidney disease using big data machine learning. Sci Rep 2019; 9:11862. [PMID: 31413285 PMCID: PMC6694113 DOI: 10.1038/s41598-019-48263-5] [Citation(s) in RCA: 86] [Impact Index Per Article: 17.2] [Received: 04/25/2019] [Accepted: 08/01/2019] [Indexed: 12/15/2022] Open
Abstract
Artificial intelligence (AI) is expected to support clinical judgement in medicine. We constructed a new predictive model for diabetic kidney diseases (DKD) using AI, processing natural language and longitudinal data with big data machine learning, based on the electronic medical records (EMR) of 64,059 diabetes patients. AI extracted raw features from the previous 6 months as the reference period and selected 24 factors to find time series patterns relating to 6-month DKD aggravation, using a convolutional autoencoder. AI constructed the predictive model with 3,073 features, including time series data using logistic regression analysis. AI could predict DKD aggravation with 71% accuracy. Furthermore, the group with DKD aggravation had a significantly higher incidence of hemodialysis than the non-aggravation group, over 10 years (N = 2,900). The new predictive model by AI could detect progression of DKD and may contribute to more effective and accurate intervention to reduce hemodialysis.
Affiliation(s)
- Masaki Makino
- Department of Endocrinology and Metabolism, Fujita Health University, Toyoake, Aichi, Japan
- Ryo Yoshimoto
- Department of Endocrinology and Metabolism, Fujita Health University, Toyoake, Aichi, Japan
- Kyoichi Haida
- Business Process Planning Department, The Dai-ichi Life Insurance Company, Limited, Tokyo, Japan
- Jun Kuroda
- IT Business Process Planning Department, The Dai-ichi Life Insurance Company, Limited, Tokyo, Japan
- Ryosuke Yanagiya
- Division of Medical Information Systems, Fujita Health University, Toyoake, Aichi, Japan
- Eiichi Saitoh
- Department of Rehabilitation Medicine, Fujita Health University, Toyoake, Aichi, Japan
- Yukio Yuzawa
- Department of Nephrology, Fujita Health University, Toyoake, Aichi, Japan
- Atsushi Suzuki
- Department of Endocrinology and Metabolism, Fujita Health University, Toyoake, Aichi, Japan.
|
15
|
Artificial Intelligence Transforms the Future of Health Care. Am J Med 2019; 132:795-801. [PMID: 30710543 PMCID: PMC6669105 DOI: 10.1016/j.amjmed.2019.01.017] [Citation(s) in RCA: 178] [Impact Index Per Article: 35.6] [Received: 01/06/2019] [Revised: 01/16/2019] [Accepted: 01/17/2019] [Indexed: 02/06/2023]
Abstract
Life sciences researchers using artificial intelligence (AI) are under pressure to innovate faster than ever. Large, multilevel, and integrated data sets offer the promise of unlocking novel insights and accelerating breakthroughs. Although more data are available than ever, only a fraction is being curated, integrated, understood, and analyzed. AI focuses on how computers learn from data and mimic human thought processes. AI increases learning capacity and provides decision support at scales that are transforming the future of health care. This article reviews applications of machine learning in health care, with a focus on clinical, translational, and public health applications, and gives an overview of the important role of privacy, data sharing, and genetic information.
|
16
|
Wagholikar KB, Fischer CM, Goodson AP, Herrick CD, Maclean TE, Smith KV, Fera L, Gaziano TA, Dunning JR, Bosque-Hamilton J, Matta L, Toscano E, Richter B, Ainsworth L, Oates MF, Aronson S, MacRae CA, Scirica BM, Desai AS, Murphy SN. Phenotyping to Facilitate Accrual for a Cardiovascular Intervention. J Clin Med Res 2019; 11:458-463. [PMID: 31143314 PMCID: PMC6522233 DOI: 10.14740/jocmr3830] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/02/2019] [Accepted: 04/30/2019] [Indexed: 01/29/2023] Open
Abstract
Background The conventional approach for clinical studies is to identify a cohort of potentially eligible patients and then screen for enrollment. In an effort to reduce the cost and manual effort involved in the screening process, several studies have leveraged electronic health records (EHR) to refine cohorts to better match the eligibility criteria, which is referred to as phenotyping. We extend this approach to dynamically identify a cohort by repeating phenotyping in alternation with manual screening. Methods Our approach consists of multiple screen cycles. At the start of each cycle, the phenotyping algorithm is used to identify eligible patients from the EHR, creating an ordered list such that patients that are most likely eligible are listed first. This list is then manually screened, and the results are analyzed to improve the phenotyping for the next cycle. We describe the preliminary results and challenges in the implementation of this approach for an intervention study on heart failure. Results A total of 1,022 patients were screened, with 223 (23%) of patients being found eligible for enrollment into the intervention study. The iterative approach improved the phenotyping in each screening cycle. Without an iterative approach, the positive screening rate (PSR) was expected to dip below the 20% measured in the first cycle; however, the cyclical approach increased the PSR to 23%. Conclusions Our study demonstrates that dynamic phenotyping can facilitate recruitment for prospective clinical study. Future directions include improved informatics infrastructure and governance policies to enable real-time updates to research repositories, tooling for EHR annotation, and methodologies to reduce human annotation.
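One screening cycle as described in the Methods (rank patients by estimated eligibility, screen manually, measure the positive screening rate) can be sketched as follows. The patient identifiers, scores, and screening outcomes are hypothetical, and the function names are illustrative rather than taken from the study.

```python
def ranked_worklist(scores):
    """Return patient ids ordered so the most likely eligible come first."""
    return sorted(scores, key=scores.get, reverse=True)


def positive_screening_rate(outcomes):
    """Fraction of manually screened patients who were found eligible."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


# Hypothetical phenotyping scores (higher means more likely eligible).
scores = {"p1": 0.2, "p2": 0.9, "p3": 0.6}
worklist = ranked_worklist(scores)

# Hypothetical manual screening results for the same patients.
screened = {"p1": False, "p2": True, "p3": False}
psr = positive_screening_rate([screened[p] for p in worklist])
```

In the iterative scheme the abstract describes, the screening results from one cycle would feed back into the phenotyping step before the next worklist is built; that feedback step is study-specific and is not modelled here.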
Affiliation(s)
- Kavishwar B Wagholikar
- Harvard Medical School, Boston, MA, USA; Massachusetts General Hospital, Boston, MA, USA
- Lina Matta
- Brigham and Women's Hospital, Boston, MA, USA
- Calum A MacRae
- Harvard Medical School, Boston, MA, USA; Brigham and Women's Hospital, Boston, MA, USA
- Benjamin M Scirica
- Harvard Medical School, Boston, MA, USA; Brigham and Women's Hospital, Boston, MA, USA
- Akshay S Desai
- Harvard Medical School, Boston, MA, USA; Brigham and Women's Hospital, Boston, MA, USA
- Shawn N Murphy
- Harvard Medical School, Boston, MA, USA; Massachusetts General Hospital, Boston, MA, USA
|
17
|
Kagawa R, Shinohara E, Imai T, Kawazoe Y, Ohe K. Bias of Inaccurate Disease Mentions in Electronic Health Record-based Phenotyping. Int J Med Inform 2019; 124:90-96. [PMID: 30784432 DOI: 10.1016/j.ijmedinf.2018.12.004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/07/2018] [Revised: 11/13/2018] [Accepted: 12/12/2018] [Indexed: 01/21/2023]
Abstract
OBJECTIVES Electronic health record (EHR)-based phenotyping is an automated technique for identifying patients diagnosed with a particular disease using EHR data. However, EHR-based phenotyping has difficulty achieving satisfactorily high performance because clinical notes include disease mentions that ultimately signify something other than the patient's diagnosis (such as differential diagnosis or screening). Our objective is to quantify the influence of such disease mentions on EHR-based phenotyping performance. METHODS Physicians manually reviewed whether the disease mentions indicated the patients' diseases in 487,300 clinical notes of 4,430 patients. Particular focus was placed on disease mentions that did not signify the patient's diagnosis even though they did not have any syntactic modifier or indicator in the same sentences. Patients were then classified according to whether their clinical notes included such disease mentions. RESULTS Among the patients whose clinical notes included disease mentions without any modifier or indicator, the proportion of patients whose disease mentions signified the patients' diagnosis was 78.1% (on average). This value can be interpreted as the bias that disease mentions not signifying the patient's diagnosis introduce into the precision of EHR-based phenotyping when disease mentions are extracted from clinical notes. CONCLUSION This study quantified, for four dataset types, the bias in the precision of EHR-based phenotyping that arises from disease mentions that incorrectly signify a patient's diagnosis. The results of this study will help researchers in diverse research environments with different available data types.
Affiliation(s)
- Rina Kagawa
- Department of Medical Informatics, Strategic Planning, and Management, University of Tsukuba Hospital, Japan; Department of Biomedical Informatics, Graduate School of Medicine, The University of Tokyo, Japan.
- Emiko Shinohara
- Department of Artificial Intelligence in Healthcare, Graduate School of Medicine, The University of Tokyo, Japan
- Takeshi Imai
- Center for Disease Biology and Integrative Medicine, Graduate School of Medicine, The University of Tokyo, Japan
- Yoshimasa Kawazoe
- Department of Artificial Intelligence in Healthcare, Graduate School of Medicine, The University of Tokyo, Japan
- Kazuhiko Ohe
- Department of Biomedical Informatics, Graduate School of Medicine, The University of Tokyo, Japan
|
18
|
Kruse C. The New Possibilities from "Big Data" to Overlooked Associations Between Diabetes, Biochemical Parameters, Glucose Control, and Osteoporosis. Curr Osteoporos Rep 2018; 16:320-324. [PMID: 29679305 DOI: 10.1007/s11914-018-0445-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 10/17/2022]
Abstract
PURPOSE OF REVIEW To review current practices and technologies within the scope of "Big Data" that can further our understanding of diabetes mellitus and osteoporosis from large volumes of data. "Big Data" techniques involving supervised machine learning, unsupervised machine learning, and deep learning image analysis are presented with examples of current literature. RECENT FINDINGS Supervised machine learning can allow us to better predict diabetes-induced osteoporosis and understand relative predictor importance of diabetes-affected bone tissue. Unsupervised machine learning can allow us to understand patterns in data between diabetic pathophysiology and altered bone metabolism. Image analysis using deep learning can allow us to be less dependent on surrogate predictors and use large volumes of images to classify diabetes-induced osteoporosis and predict future outcomes directly from images. "Big Data" techniques herald new possibilities to understand diabetes-induced osteoporosis and ascertain our current ability to classify, understand, and predict this condition.
Affiliation(s)
- Christian Kruse
- Steno Diabetes Center North Jutland, Sdr. Skovvej 15, 9000, Aalborg, Denmark.
- Department of Clinical Medicine, Aalborg University, Sdr. Skovvej 15, 9000, Aalborg, Denmark.
- Department of Endocrinology, Aalborg University Hospital, Hobrovej 19, 9100, Aalborg, Denmark.
|