Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Upadhyaya SG, Murphree DH, Ngufor CG, Knight AM, Cronk DJ, Cima RR, Curry TB, Pathak J, Carter RE, Kor DJ. Automated Diabetes Case Identification Using Electronic Health Record Data at a Tertiary Care Facility. Mayo Clin Proc Innov Qual Outcomes 2017;1:100-110. [PMID: 30225406 PMCID: PMC6135013 DOI: 10.1016/j.mayocpiqo.2017.04.005] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/05/2022] Open

For:	Upadhyaya SG, Murphree DH, Ngufor CG, Knight AM, Cronk DJ, Cima RR, Curry TB, Pathak J, Carter RE, Kor DJ. Automated Diabetes Case Identification Using Electronic Health Record Data at a Tertiary Care Facility. Mayo Clin Proc Innov Qual Outcomes 2017;1:100-110. [PMID: 30225406 PMCID: PMC6135013 DOI: 10.1016/j.mayocpiqo.2017.04.005] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/05/2022] Open

Number

Cited by Other Article(s)

McGuire D, Markus H, Yang L, Xu J, Montgomery A, Berg A, Li Q, Carrel L, Liu DJ, Jiang B. Dissecting heritability, environmental risk, and air pollution causal effects using > 50 million individuals in MarketScan. Nat Commun 2024;15:5357. [PMID: 38918381 PMCID: PMC11199552 DOI: 10.1038/s41467-024-49566-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2021] [Accepted: 06/10/2024] [Indexed: 06/27/2024] Open

Bazoge A, Morin E, Daille B, Gourraud PA. Applying Natural Language Processing to Textual Data From Clinical Data Warehouses: Systematic Review. JMIR Med Inform 2023;11:e42477. [PMID: 38100200 PMCID: PMC10757232 DOI: 10.2196/42477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2022] [Revised: 01/16/2023] [Accepted: 09/07/2023] [Indexed: 12/17/2023] Open

Abstract

BACKGROUND

In recent years, health data collected during the clinical care process have been often repurposed for secondary use through clinical data warehouses (CDWs), which interconnect disparate data from different sources. A large amount of information of high clinical value is stored in unstructured text format. Natural language processing (NLP), which implements algorithms that can operate on massive unstructured textual data, has the potential to structure the data and make clinical information more accessible.

OBJECTIVE

The aim of this review was to provide an overview of studies applying NLP to textual data from CDWs. It focuses on identifying the (1) NLP tasks applied to data from CDWs and (2) NLP methods used to tackle these tasks.

METHODS

This review was performed according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. We searched for relevant articles in 3 bibliographic databases: PubMed, Google Scholar, and ACL Anthology. We reviewed the titles and abstracts and included articles according to the following inclusion criteria: (1) focus on NLP applied to textual data from CDWs, (2) articles published between 1995 and 2021, and (3) written in English.

RESULTS

We identified 1353 articles, of which 194 (14.34%) met the inclusion criteria. Among all identified NLP tasks in the included papers, information extraction from clinical text (112/194, 57.7%) and the identification of patients (51/194, 26.3%) were the most frequent tasks. To address the various tasks, symbolic methods were the most common NLP methods (124/232, 53.4%), showing that some tasks can be partially achieved with classical NLP techniques, such as regular expressions or pattern matching that exploit specialized lexica, such as drug lists and terminologies. Machine learning (70/232, 30.2%) and deep learning (38/232, 16.4%) have been increasingly used in recent years, including the most recent approaches based on transformers. NLP methods were mostly applied to English language data (153/194, 78.9%).

CONCLUSIONS

CDWs are central to the secondary use of clinical texts for research purposes. Although the use of NLP on data from CDWs is growing, there remain challenges in this field, especially with regard to languages other than English. Clinical NLP is an effective strategy for accessing, extracting, and transforming data from CDWs. Information retrieved with NLP can assist in clinical research and have an impact on clinical practice.

Collapse

Cromer SJ, Chen V, Han C, Marshall W, Emongo S, Greaux E, Majarian T, Florez JC, Mercader J, Udler MS. Algorithmic identification of atypical diabetes in electronic health record (EHR) systems. PLoS One 2022;17:e0278759. [PMID: 36508462 PMCID: PMC9744270 DOI: 10.1371/journal.pone.0278759] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2022] [Accepted: 11/22/2022] [Indexed: 12/14/2022] Open

Abstract

AIMS

Understanding atypical forms of diabetes (AD) may advance precision medicine, but methods to identify such patients are needed. We propose an electronic health record (EHR)-based algorithmic approach to identify patients who may have AD, specifically those with insulin-sufficient, non-metabolic diabetes, in order to improve feasibility of identifying these patients through detailed chart review.

METHODS

Patients with likely T2D were selected using a validated machine-learning (ML) algorithm applied to EHR data. "Typical" T2D cases were removed by excluding individuals with obesity, evidence of dyslipidemia, antibody-positive diabetes, or cystic fibrosis. To filter out likely type 1 diabetes (T1D) cases, we applied six additional "branch algorithms," relying on various clinical characteristics, which resulted in six overlapping cohorts. Diabetes type was classified by manual chart review as atypical, not atypical, or indeterminate due to missing information.

RESULTS

Of 114,975 biobank participants, the algorithms collectively identified 119 (0.1%) potential AD cases, of which 16 (0.014%) were confirmed after expert review. The branch algorithm that excluded T1D based on outpatient insulin use had the highest percentage yield of AD (13 of 27; 48.2% yield). Together, the 16 AD cases had significantly lower BMI and higher HDL than either unselected T1D or T2D cases identified by ML algorithms (P<0.05). Compared to the ML T1D group, the AD group had a significantly higher T2D polygenic score (P<0.01) and lower hemoglobin A1c (P<0.01).

CONCLUSION

Our EHR-based algorithms followed by manual chart review identified collectively 16 individuals with AD, representing 0.22% of biobank enrollees with T2D. With a maximum yield of 48% cases after manual chart review, our algorithms have the potential to drastically improve efficiency of AD identification. Recognizing patients with AD may inform on the heterogeneity of T2D and facilitate enrollment in studies like the Rare and Atypical Diabetes Network (RADIANT).

Collapse

Affiliation(s)

Sara J. Cromer Diabetes Unit, Endocrine Division, Massachusetts General Hospital, Boston, Massachusetts, United States of America Department of Medicine, Harvard Medical School, Boston, Massachusetts, United States of America Programs in Metabolism and Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America Northeastern University, Boston, Massachusetts, United States of America Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts, United States of America
Victoria Chen Diabetes Unit, Endocrine Division, Massachusetts General Hospital, Boston, Massachusetts, United States of America Department of Medicine, Harvard Medical School, Boston, Massachusetts, United States of America
Christopher Han Diabetes Unit, Endocrine Division, Massachusetts General Hospital, Boston, Massachusetts, United States of America Department of Medicine, Harvard Medical School, Boston, Massachusetts, United States of America
William Marshall Diabetes Unit, Endocrine Division, Massachusetts General Hospital, Boston, Massachusetts, United States of America
Shekina Emongo Diabetes Unit, Endocrine Division, Massachusetts General Hospital, Boston, Massachusetts, United States of America
Evelyn Greaux Diabetes Unit, Endocrine Division, Massachusetts General Hospital, Boston, Massachusetts, United States of America
Tim Majarian Programs in Metabolism and Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America Northeastern University, Boston, Massachusetts, United States of America
Jose C. Florez Diabetes Unit, Endocrine Division, Massachusetts General Hospital, Boston, Massachusetts, United States of America Department of Medicine, Harvard Medical School, Boston, Massachusetts, United States of America Programs in Metabolism and Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America Northeastern University, Boston, Massachusetts, United States of America Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts, United States of America
Josep Mercader Diabetes Unit, Endocrine Division, Massachusetts General Hospital, Boston, Massachusetts, United States of America Department of Medicine, Harvard Medical School, Boston, Massachusetts, United States of America Programs in Metabolism and Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America Northeastern University, Boston, Massachusetts, United States of America Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts, United States of America
Miriam S. Udler Diabetes Unit, Endocrine Division, Massachusetts General Hospital, Boston, Massachusetts, United States of America Department of Medicine, Harvard Medical School, Boston, Massachusetts, United States of America Programs in Metabolism and Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America Northeastern University, Boston, Massachusetts, United States of America Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts, United States of America

Collapse

Olusanya MO, Ogunsakin RE, Ghai M, Adeleke MA. Accuracy of Machine Learning Classification Models for the Prediction of Type 2 Diabetes Mellitus: A Systematic Survey and Meta-Analysis Approach. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2022;19:ijerph192114280. [PMID: 36361161 PMCID: PMC9655196 DOI: 10.3390/ijerph192114280] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/27/2022] [Revised: 10/22/2022] [Accepted: 10/25/2022] [Indexed: 05/13/2023]

Application of machine learning methods for the prediction of true fasting status in patients performing blood tests. Sci Rep 2022;12:11929. [PMID: 35831336 PMCID: PMC9279373 DOI: 10.1038/s41598-022-15161-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2021] [Accepted: 06/20/2022] [Indexed: 11/28/2022] Open

Wang S, Song F, Qiao Q, Liu Y, Chen J, Ma J. A Comparative Study of Natural Language Processing Algorithms Based on Cities Changing Diabetes Vulnerability Data. Healthcare (Basel) 2022;10:healthcare10061119. [PMID: 35742169 PMCID: PMC9223144 DOI: 10.3390/healthcare10061119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2022] [Revised: 06/08/2022] [Accepted: 06/13/2022] [Indexed: 11/16/2022] Open

Li J, Xu Z, Xu T, Lin S. Predicting Diabetes in Patients with Metabolic Syndrome Using Machine-Learning Model Based on Multiple Years' Data. Diabetes Metab Syndr Obes 2022;15:2951-2961. [PMID: 36186938 PMCID: PMC9525025 DOI: 10.2147/dmso.s381146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/25/2022] [Accepted: 09/16/2022] [Indexed: 11/23/2022] Open

Abstract

PURPOSE

To evaluate the performance of machine-learning models based on multiple years of continuous data to predict incident diabetes among patients with metabolic syndrome.

PATIENTS AND METHODS

The dataset comprises the health records from 2008 to 2020 including 4510 nondiabetic participants with metabolic syndrome (MetS) at baseline and with at least 6 years of records. MetS was defined according to the International Diabetes Federation (IDF) criteria. Overall, 332 patients developed incident diabetes during the 7±1.4 years of follow-up. Three popular classification algorithms were evaluated on the dataset: logistic regression, random forest, and Xgboost. Five models including single-year models (year 1, year 2, and year 3) and multiple-year models (year 1-2 and year 1-3) were developed for each algorithm.

RESULTS

The model performances improved with the increasing longitudinal dataset as the area under the receiver operating characteristic curve (AUROC) was boosted for both random forest (year 1-3: AUROC=0.893; year 3: AUROC=0.862; year 1-2: AUROC=0.847; year 2: AUROC=0.838) and Xgboost (year 1-3: AUROC=0.897; year 3: AUROC=0.833; year 1-2: AUROC=0.856; year 2: AUROC=0.823) model. In the multiple-year models, the highest fasting plasma glucose, followed by the mean or lowest level of HbA1c and BMI had the most important predictive value for the onset of diabetes. In the "1-3" year model, "delta weight" which reflects the fluctuations of yearly change of weight was the fourth-most important feature.

CONCLUSION

This study demonstrated improved performance with the accumulation of longitudinal data when using machine learning for diabetes prediction in MetS patients. For individuals with similar clinical parameters, the variation trends of these parameters could change the risk of future diabetes. This result indicated that models based on longitudinal multiple years' data may provide more personalized assessment tools for risk evaluation.

Collapse

Brady V, Whisenant M, Wang X, Ly VK, Zhu G, Aguilar D, Wu H. Characterization of Symptoms and Symptom Clusters for Type 2 Diabetes Using a Large Nationwide Electronic Health Record Database. Diabetes Spectr 2022;35:159-170. [PMID: 35668892 PMCID: PMC9160545 DOI: 10.2337/ds21-0064] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]

Lenoir KM, Wagenknecht LE, Divers J, Casanova R, Dabelea D, Saydah S, Pihoker C, Liese AD, Standiford D, Hamman R, Wells BJ. Determining diagnosis date of diabetes using structured electronic health record (EHR) data: the SEARCH for diabetes in youth study. BMC Med Res Methodol 2021;21:210. [PMID: 34629073 PMCID: PMC8502379 DOI: 10.1186/s12874-021-01394-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2021] [Accepted: 09/07/2021] [Indexed: 12/01/2022] Open

Abstract

BACKGROUND

Disease surveillance of diabetes among youth has relied mainly upon manual chart review. However, increasingly available structured electronic health record (EHR) data have been shown to yield accurate determinations of diabetes status and type. Validated algorithms to determine date of diabetes diagnosis are lacking. The objective of this work is to validate two EHR-based algorithms to determine date of diagnosis of diabetes.

METHODS

A rule-based ICD-10 algorithm identified youth with diabetes from structured EHR data over the period of 2009 through 2017 within three children's hospitals that participate in the SEARCH for Diabetes in Youth Study: Cincinnati Children's Hospital, Cincinnati, OH, Seattle Children's Hospital, Seattle, WA, and Children's Hospital Colorado, Denver, CO. Previous research and a multidisciplinary team informed the creation of two algorithms based upon structured EHR data to determine date of diagnosis among diabetes cases. An ICD-code algorithm was defined by the year of occurrence of a second ICD-9 or ICD-10 diabetes code. A multiple-criteria algorithm consisted of the year of first occurrence of any of the following: diabetes-related ICD code, elevated glucose, elevated HbA1c, or diabetes medication. We assessed algorithm performance by percent agreement with a gold standard date of diagnosis determined by chart review.

RESULTS

Among 3777 cases, both algorithms demonstrated high agreement with true diagnosis year and differed in classification (p = 0.006): 86.5% agreement for the ICD code algorithm and 85.9% agreement for the multiple-criteria algorithm. Agreement was high for both type 1 and type 2 cases for the ICD code algorithm. Performance improved over time.

CONCLUSIONS

Year of occurrence of the second ICD diabetes-related code in the EHR yields an accurate diagnosis date within these pediatric hospital systems. This may lead to increased efficiency and sustainability of surveillance methods for incidence of diabetes among youth.

Collapse

Turchin A, Florez Builes LF. Using Natural Language Processing to Measure and Improve Quality of Diabetes Care: A Systematic Review. J Diabetes Sci Technol 2021;15:553-560. [PMID: 33736486 PMCID: PMC8120048 DOI: 10.1177/19322968211000831] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]

Lee S, Doktorchik C, Martin EA, D'Souza AG, Eastwood C, Shaheen AA, Naugler C, Lee J, Quan H. Electronic Medical Record-Based Case Phenotyping for the Charlson Conditions: Scoping Review. JMIR Med Inform 2021;9:e23934. [PMID: 33522976 PMCID: PMC7884219 DOI: 10.2196/23934] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2020] [Revised: 11/20/2020] [Accepted: 12/05/2020] [Indexed: 12/16/2022] Open

Abstract

Background

Electronic medical records (EMRs) contain large amounts of rich clinical information. Developing EMR-based case definitions, also known as EMR phenotyping, is an active area of research that has implications for epidemiology, clinical care, and health services research.

Objective

This review aims to describe and assess the present landscape of EMR-based case phenotyping for the Charlson conditions.

Methods

A scoping review of EMR-based algorithms for defining the Charlson comorbidity index conditions was completed. This study covered articles published between January 2000 and April 2020, both inclusive. Embase (Excerpta Medica database) and MEDLINE (Medical Literature Analysis and Retrieval System Online) were searched using keywords developed in the following 3 domains: terms related to EMR, terms related to case finding, and disease-specific terms. The manuscript follows the Preferred Reporting Items for Systematic reviews and Meta-analyses extension for Scoping Reviews (PRISMA) guidelines.

Results

A total of 274 articles representing 299 algorithms were assessed and summarized. Most studies were undertaken in the United States (181/299, 60.5%), followed by the United Kingdom (42/299, 14.0%) and Canada (15/299, 5.0%). These algorithms were mostly developed either in primary care (103/299, 34.4%) or inpatient (168/299, 56.2%) settings. Diabetes, congestive heart failure, myocardial infarction, and rheumatology had the highest number of developed algorithms. Data-driven and clinical rule–based approaches have been identified. EMR-based phenotype and algorithm development reflect the data access allowed by respective health systems, and algorithms vary in their performance.

Conclusions

Recognizing similarities and differences in health systems, data collection strategies, extraction, data release protocols, and existing clinical pathways is critical to algorithm development strategies. Several strategies to assist with phenotype-based case definitions have been proposed.

Collapse

Affiliation(s)

Seungwon Lee Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada.,Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada.,Alberta Health Services, Calgary, AB, Canada.,Data Intelligence for Health Lab, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
Chelsea Doktorchik Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada.,Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
Elliot Asher Martin Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada.,Alberta Health Services, Calgary, AB, Canada
Adam Giles D'Souza Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada.,Alberta Health Services, Calgary, AB, Canada
Cathy Eastwood Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada.,Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
Abdel Aziz Shaheen Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada.,Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada.,Department of Medicine, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
Christopher Naugler Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada.,Department of Pathology and Laboratory Medicine, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
Joon Lee Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada.,Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada.,Data Intelligence for Health Lab, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada.,Department of Cardiac Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
Hude Quan Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada.,Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada

Collapse

Weber C, Röschke L, Modersohn L, Lohr C, Kolditz T, Hahn U, Ammon D, Betz B, Kiehntopf M. Optimized Identification of Advanced Chronic Kidney Disease and Absence of Kidney Disease by Combining Different Electronic Health Data Resources and by Applying Machine Learning Strategies. J Clin Med 2020;9:jcm9092955. [PMID: 32932685 PMCID: PMC7563476 DOI: 10.3390/jcm9092955] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2020] [Revised: 08/26/2020] [Accepted: 08/28/2020] [Indexed: 12/31/2022] Open

Kuo KM, Talley P, Kao Y, Huang CH. A multi-class classification model for supporting the diagnosis of type II diabetes mellitus. PeerJ 2020;8:e9920. [PMID: 32974105 PMCID: PMC7487151 DOI: 10.7717/peerj.9920] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2020] [Accepted: 08/20/2020] [Indexed: 12/21/2022] Open

Abstract

Background

Numerous studies have utilized machine-learning techniques to predict the early onset of type 2 diabetes mellitus. However, fewer studies have been conducted to predict an appropriate diagnosis code for the type 2 diabetes mellitus condition. Further, ensemble techniques such as bagging and boosting have likewise been utilized to an even lesser extent. The present study aims to identify appropriate diagnosis codes for type 2 diabetes mellitus patients by means of building a multi-class prediction model which is both parsimonious and possessing minimum features. In addition, the importance of features for predicting diagnose code is provided.

Methods

This study included 149 patients who have contracted type 2 diabetes mellitus. The sample was collected from a large hospital in Taiwan from November, 2017 to May, 2018. Machine learning algorithms including instance-based, decision trees, deep neural network, and ensemble algorithms were all used to build the predictive models utilized in this study. Average accuracy, area under receiver operating characteristic curve, Matthew correlation coefficient, macro-precision, recall, weighted average of precision and recall, and model process time were subsequently used to assess the performance of the built models. Information gain and gain ratio were used in order to demonstrate feature importance.

Results

The results showed that most algorithms, except for deep neural network, performed well in terms of all performance indices regardless of either the training or testing dataset that were used. Ten features and their importance to determine the diagnosis code of type 2 diabetes mellitus were identified. Our proposed predictive model can be further developed into a clinical diagnosis support system or integrated into existing healthcare information systems. Both methods of application can effectively support physicians whenever they are diagnosing type 2 diabetes mellitus patients in order to foster better patient-care planning.

Collapse

Siontis KC, Yao X, Pirruccello JP, Philippakis AA, Noseworthy PA. How Will Machine Learning Inform the Clinical Care of Atrial Fibrillation? Circ Res 2020;127:155-169. [DOI: 10.1161/circresaha.120.316401] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]