Reference Citation Analysis

For: Zheng T, Xie W, Xu L, He X, Zhang Y, You M, Yang G, Chen Y. A machine learning-based framework to identify type 2 diabetes through electronic health records. Int J Med Inform 2016;97:120-127. [PMID: 27919371 DOI: 10.1016/j.ijmedinf.2016.09.014] [Citation(s) in RCA: 123] [Impact Index Per Article: 15.4] [Received: 02/23/2016] [Revised: 09/27/2016] [Accepted: 09/30/2016] [Indexed: 01/19/2023]

Cited by Other Article(s)

Luo X, Gandhi P, Zhang Z, Shao W, Han Z, Chandrasekaran V, Turzhitsky V, Bali V, Roberts AR, Metzger M, Baker J, La Rosa C, Weaver J, Dexter P, Huang K. Applying interpretable deep learning models to identify chronic cough patients using EHR data. Comput Methods Programs Biomed 2021;210:106395. [PMID: 34525412 DOI: 10.1016/j.cmpb.2021.106395] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/05/2021] [Accepted: 08/30/2021] [Indexed: 06/13/2023]

Abstract

BACKGROUND AND OBJECTIVE

Chronic cough (CC) affects approximately 10% of adults. Many disease states are associated with chronic cough, such as asthma, upper airway cough syndrome, bronchitis, and gastroesophageal reflux disease. The lack of an ICD code specific for chronic cough makes it challenging to identify such patients from electronic health records (EHRs). For clinical and research purposes, computational methods using EHR data are urgently needed to identify chronic cough cases. This research aims to investigate the data representations and deep learning algorithms for chronic cough prediction.

METHODS

Utilizing real-world EHR data from a large academic healthcare system from October 2005 to September 2015, we investigated Natural Language Representation of the EHR data and systematically evaluated deep learning and traditional machine learning models to predict chronic cough patients. We built these machine learning models using structured data (medication and diagnosis) and unstructured data (clinical notes).

RESULTS

The sensitivity and specificity of a transformer-based deep learning algorithm, specifically BERT with an attention model, were 0.856 and 0.866, respectively, using structured data (medication and diagnosis). Sensitivity and specificity improved to 0.952 and 0.930 when we combined structured data with symptoms extracted from clinical notes. We further found that the attention mechanism of deep learning models can be used to extract important features that drive the prediction decisions. Compared with our previously published rule-based algorithm, the deep learning algorithm can identify more chronic cough patients with structured data.
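For readers less familiar with the two metrics reported here, the following is a minimal sketch of how sensitivity and specificity are derived from binary predictions. The function name and the toy labels are invented for illustration and are not taken from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = case, 0 = non-case."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cases correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cases missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-cases correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-cases wrongly flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one non-case")
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels, invented for illustration only (not the study's data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Sensitivity is the share of true cases the model finds, and specificity is the share of non-cases it correctly leaves alone, so the two values are computed over different denominators.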

CONCLUSIONS

By applying deep learning models, chronic cough patients can be reliably identified for prospective or retrospective research through medication and diagnosis data, widely available in EHR and electronic claims data, thus improving the generalizability of the patient identification algorithm. Deep learning models can identify chronic cough patients with even higher sensitivity and specificity when structured and unstructured EHR data are utilized. We anticipate language-based data representation and deep learning models developed in this research could also be productively used for other disease prediction and case identification.

Affiliation(s)

Xiao Luo: Purdue School of Engineering and Technology, IUPUI, 799 W Michigan St, Indianapolis, IN 46202, United States.
Priyanka Gandhi: Purdue School of Engineering and Technology, IUPUI, 799 W Michigan St, Indianapolis, IN 46202, United States.
Zuoyi Zhang: Indiana University School of Medicine, 340 W 10th St #6200, Indianapolis, IN 46202, United States.
Wei Shao: Indiana University School of Medicine, 340 W 10th St #6200, Indianapolis, IN 46202, United States.
Zhi Han: Indiana University School of Medicine, 340 W 10th St #6200, Indianapolis, IN 46202, United States; Regenstrief Institute, 1101 W 10th Street, Indianapolis, IN 46202, United States.
Vasu Chandrasekaran: Center for Observational and Real-World Evidence, Merck & Co., Inc., 2000 Galloping Hill Rd, Kenilworth, NJ 07033, United States.
Vladimir Turzhitsky: Center for Observational and Real-World Evidence, Merck & Co., Inc., 2000 Galloping Hill Rd, Kenilworth, NJ 07033, United States.
Vishal Bali: Center for Observational and Real-World Evidence, Merck & Co., Inc., 2000 Galloping Hill Rd, Kenilworth, NJ 07033, United States.
Anna R Roberts: Regenstrief Institute, 1101 W 10th Street, Indianapolis, IN 46202, United States.
Megan Metzger: Regenstrief Institute, 1101 W 10th Street, Indianapolis, IN 46202, United States.
Jarod Baker: Regenstrief Institute, 1101 W 10th Street, Indianapolis, IN 46202, United States.
Carmen La Rosa: Center for Observational and Real-World Evidence, Merck & Co., Inc., 2000 Galloping Hill Rd, Kenilworth, NJ 07033, United States.
Jessica Weaver: Center for Observational and Real-World Evidence, Merck & Co., Inc., 2000 Galloping Hill Rd, Kenilworth, NJ 07033, United States.
Paul Dexter: Indiana University School of Medicine, 340 W 10th St #6200, Indianapolis, IN 46202, United States; Regenstrief Institute, 1101 W 10th Street, Indianapolis, IN 46202, United States; Eskenazi Health, 720 Eskenazi Ave, Indianapolis, IN 46202, United States.
Kun Huang: Indiana University School of Medicine, 340 W 10th St #6200, Indianapolis, IN 46202, United States; Regenstrief Institute, 1101 W 10th Street, Indianapolis, IN 46202, United States.

Estiri H, Strasser ZH, Murphy SN. High-throughput phenotyping with temporal sequences. J Am Med Inform Assoc 2021;28:772-781. [PMID: 33313899 DOI: 10.1093/jamia/ocaa288] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 07/01/2020] [Accepted: 11/04/2020] [Indexed: 12/15/2022]

Abstract

OBJECTIVE

High-throughput electronic phenotyping algorithms can accelerate translational research using data from electronic health record (EHR) systems. The temporal information buried in EHRs is often underutilized in developing computational phenotypic definitions. This study aims to develop a high-throughput phenotyping method, leveraging temporal sequential patterns from EHRs.

MATERIALS AND METHODS

We develop a representation mining algorithm to extract 5 classes of representations from EHR diagnosis and medication records: the aggregated vector of the records (aggregated vector representation), the standard sequential patterns (sequential pattern mining), the transitive sequential patterns (transitive sequential pattern mining), and 2 hybrid classes. Using EHR data on 10 phenotypes from the Mass General Brigham Biobank, we train and validate phenotyping algorithms.
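The abstract does not spell out how the sequence classes are constructed. As a rough sketch only, assuming that a "standard" sequential pattern pairs each record with the one immediately after it and that a "transitive" pattern pairs each record with every later record, the difference could look like this. The function names, record codes, and day offsets are all hypothetical.

```python
from itertools import combinations


def ordered_codes(records):
    """Sort (day, code) records by day and return the codes in time order."""
    return [code for _, code in sorted(records, key=lambda r: r[0])]


def adjacent_pairs(codes):
    """Assumed 'standard' patterns: each code paired with its immediate successor."""
    return set(zip(codes, codes[1:]))


def transitive_pairs(codes):
    """Assumed 'transitive' patterns: each code paired with every later code."""
    return set(combinations(codes, 2))


# Hypothetical patient timeline (days since first record, record code).
records = [(30, "rx:metformin"), (0, "dx:hypertension"), (45, "dx:type2_diabetes")]
codes = ordered_codes(records)
standard = adjacent_pairs(codes)
transitive = transitive_pairs(codes)
print(sorted(transitive - standard))
```

Under these assumptions the transitive set contains every adjacent pair plus the pairs that skip over intermediate records, which is why it grows much faster with timeline length.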

RESULTS

Phenotyping with temporal sequences resulted in a superior classification performance across all 10 phenotypes compared with the standard representations in electronic phenotyping. The high-throughput algorithm's classification performance was superior or similar to the performance of previously published electronic phenotyping algorithms. We characterize and evaluate the top transitive sequences of diagnosis records paired with the records of risk factors, symptoms, complications, medications, or vaccinations.

DISCUSSION

The proposed high-throughput phenotyping approach enables seamless discovery of sequential record combinations that may be difficult to assume from raw EHR data. Transitive sequences offer more accurate characterization of the phenotype, compared with its individual components, and reflect the actual lived experiences of the patients with that particular disease.

CONCLUSION

Sequential data representations provide a precise mechanism for incorporating raw EHR records into downstream machine learning. Our approach starts with user interpretability and works backward to the technology.

Xiong Y, Peng W, Chen Q, Huang Z, Tang B. A Unified Machine Reading Comprehension Framework for Cohort Selection. IEEE J Biomed Health Inform 2021;26:379-387. [PMID: 34236972 DOI: 10.1109/jbhi.2021.3095478] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]

Enriquez JS, Chu Y, Pudakalakatti S, Hsieh KL, Salmon D, Dutta P, Millward NZ, Lurie E, Millward S, McAllister F, Maitra A, Sen S, Killary A, Zhang J, Jiang X, Bhattacharya PK, Shams S. Hyperpolarized Magnetic Resonance and Artificial Intelligence: Frontiers of Imaging in Pancreatic Cancer. JMIR Med Inform 2021;9:e26601. [PMID: 34137725 PMCID: PMC8277399 DOI: 10.2196/26601] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/18/2020] [Revised: 02/24/2021] [Accepted: 04/03/2021] [Indexed: 12/24/2022]

Abstract

BACKGROUND

There is an unmet need for noninvasive imaging markers that can help identify the aggressive subtype(s) of pancreatic ductal adenocarcinoma (PDAC) at diagnosis and at an earlier time point, and evaluate the efficacy of therapy prior to tumor reduction. In the past few years, there have been two major developments with potential for a significant impact in establishing imaging biomarkers for PDAC and pancreatic cancer premalignancy: (1) hyperpolarized metabolic (HP)-magnetic resonance (MR), which increases the sensitivity of conventional MR by over 10,000-fold, enabling real-time metabolic measurements; and (2) applications of artificial intelligence (AI).

OBJECTIVE

Our objective of this review was to discuss these two exciting but independent developments (HP-MR and AI) in the realm of PDAC imaging and detection from the available literature to date.

METHODS

A systematic review following the PRISMA extension for Scoping Reviews (PRISMA-ScR) guidelines was performed. Studies addressing the utilization of HP-MR and/or AI for early detection, assessment of aggressiveness, and interrogating the early efficacy of therapy in patients with PDAC cited in recent clinical guidelines were extracted from the PubMed and Google Scholar databases. The studies were reviewed following predefined exclusion and inclusion criteria, and grouped based on the utilization of HP-MR and/or AI in PDAC diagnosis.

RESULTS

Part of the goal of this review was to highlight the knowledge gap of early detection in pancreatic cancer by any imaging modality, and to emphasize how AI and HP-MR can address this critical gap. We reviewed every paper published on HP-MR applications in PDAC, including six preclinical studies and one clinical trial. We also reviewed several HP-MR-related articles describing new probes with many functional applications in PDAC. On the AI side, we reviewed all existing papers that met our inclusion criteria on AI applications for evaluating computed tomography (CT) and MR images in PDAC. With the emergence of AI and its unique capability to learn across multimodal data, along with sensitive metabolic imaging using HP-MR, this knowledge gap in PDAC can be adequately addressed. CT is an accessible and widespread imaging modality worldwide as it is affordable; because of this reason alone, most of the data discussed are based on CT imaging datasets. Although there were relatively few MR-related papers included in this review, we believe that with rapid adoption of MR imaging and HP-MR, more clinical data on pancreatic cancer imaging will be available in the near future.

CONCLUSIONS

Integration of AI, HP-MR, and multimodal imaging information in pancreatic cancer may lead to the development of real-time biomarkers of early detection, assessing aggressiveness, and interrogating early efficacy of therapy in PDAC.

Affiliation(s)

José S Enriquez: Department of Cancer Systems Imaging, University of Texas MD Anderson Cancer Center, Houston, TX, United States; Graduate School of Biomedical Sciences, University of Texas MD Anderson Cancer Center, Houston, TX, United States
Yan Chu: School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, United States
Shivanand Pudakalakatti: Department of Cancer Systems Imaging, University of Texas MD Anderson Cancer Center, Houston, TX, United States
Kang Lin Hsieh: School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, United States
Duncan Salmon: Department of Electrical and Computer Engineering, Rice University, Houston, TX, United States
Prasanta Dutta: Department of Cancer Systems Imaging, University of Texas MD Anderson Cancer Center, Houston, TX, United States
Niki Zacharias Millward: Graduate School of Biomedical Sciences, University of Texas MD Anderson Cancer Center, Houston, TX, United States; Department of Urology, University of Texas MD Anderson Cancer Center, Houston, TX, United States
Eugene Lurie: Department of Translational Molecular Pathology, University of Texas MD Anderson Cancer Center, Houston, TX, United States
Steven Millward: Department of Cancer Systems Imaging, University of Texas MD Anderson Cancer Center, Houston, TX, United States; Graduate School of Biomedical Sciences, University of Texas MD Anderson Cancer Center, Houston, TX, United States
Florencia McAllister: Graduate School of Biomedical Sciences, University of Texas MD Anderson Cancer Center, Houston, TX, United States; Department of Clinical Cancer Prevention, University of Texas MD Anderson Cancer Center, Houston, TX, United States
Anirban Maitra: Graduate School of Biomedical Sciences, University of Texas MD Anderson Cancer Center, Houston, TX, United States; Department of Pathology, University of Texas MD Anderson Cancer Center, Houston, TX, United States
Subrata Sen: Graduate School of Biomedical Sciences, University of Texas MD Anderson Cancer Center, Houston, TX, United States; Department of Translational Molecular Pathology, University of Texas MD Anderson Cancer Center, Houston, TX, United States
Ann Killary: Graduate School of Biomedical Sciences, University of Texas MD Anderson Cancer Center, Houston, TX, United States; Department of Translational Molecular Pathology, University of Texas MD Anderson Cancer Center, Houston, TX, United States
Jian Zhang: Division of Computer Science and Engineering, Louisiana State University, Baton Rouge, LA, United States
Xiaoqian Jiang: School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, United States
Pratip K Bhattacharya: Department of Cancer Systems Imaging, University of Texas MD Anderson Cancer Center, Houston, TX, United States; Graduate School of Biomedical Sciences, University of Texas MD Anderson Cancer Center, Houston, TX, United States
Shayan Shams: School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, United States

Zhou L, Zheng X, Yang D, Wang Y, Bai X, Ye X. Application of multi-label classification models for the diagnosis of diabetic complications. BMC Med Inform Decis Mak 2021;21:182. [PMID: 34098959 PMCID: PMC8182940 DOI: 10.1186/s12911-021-01525-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/26/2021] [Accepted: 04/28/2021] [Indexed: 12/23/2022]

Abstract

Background

Early diagnosis of diabetic complications is clinically demanding and of great significance. Given the complexity of diabetic complications, we applied a multi-label classification (MLC) model to predict four diabetic complications simultaneously using data from modern electronic health records (EHRs), and leveraged the correlations between the complications to further improve the prediction accuracy.

Methods

We obtained the demographic characteristics and laboratory data from the EHRs for patients admitted to Changzhou No. 2 People’s Hospital, the affiliated hospital of Nanjing Medical University in China, from May 2013 to June 2020. The data included 93 biochemical indicators and 9,765 patients. We used the Pearson correlation coefficient (PCC) to analyze the correlations between different diabetic complications from a statistical perspective. We used an MLC model, based on the Random Forest (RF) technique, to leverage these correlations and predict four complications simultaneously. We explored four different MLC models: Label Power Set (LP), Classifier Chains (CC), Ensemble Classifier Chains (ECC), and Calibrated Label Ranking (CLR). We used traditional Binary Relevance (BR) as a comparison. We used 11 different performance metrics and the area under the receiver operating characteristic curve (AUROC) to evaluate these models. We analyzed the weights of the learned model and illustrated (1) the top 10 key indicators of different complications and (2) the correlations between different diabetic complications.

Results

The MLC models including CC, ECC and CLR outperformed the traditional BR method in most performance metrics; the ECC models performed the best in Hamming loss (0.1760), Accuracy (0.7020), F1_Score (0.7855), Precision (0.8649), F1_micro (0.8078), F1_macro (0.7773), Recall_micro (0.8631), Recall_macro (0.8009), and AUROC (0.8231). The two diabetic complication correlation matrices drawn from the PCC analysis and the MLC models were consistent with each other and indicated that the complications correlated to different extents. The top 10 key indicators given by the model are valuable in medical application.
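Several of the metrics named here are specific to multi-label problems. The sketch below shows, under the usual textbook definitions, how Hamming loss and micro- versus macro-averaged F1 are computed over a patient-by-complication label matrix. The function names and the toy matrices are invented for illustration and do not reproduce the study's data or code.

```python
def hamming_loss(Y_true, Y_pred):
    """Fraction of individual (patient, label) cells that are predicted wrongly."""
    total = sum(len(row) for row in Y_true)
    wrong = sum(t != p for rt, rp in zip(Y_true, Y_pred) for t, p in zip(rt, rp))
    return wrong / total


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(Y_true, Y_pred):
    """Return (micro F1, macro F1) across the label columns."""
    counts = []
    for j in range(len(Y_true[0])):
        col = [(rt[j], rp[j]) for rt, rp in zip(Y_true, Y_pred)]
        tp = sum(1 for t, p in col if t == 1 and p == 1)
        fp = sum(1 for t, p in col if t == 0 and p == 1)
        fn = sum(1 for t, p in col if t == 1 and p == 0)
        counts.append((tp, fp, fn))
    macro = sum(f1(*c) for c in counts) / len(counts)  # average of per-label F1
    micro = f1(*(sum(c[i] for c in counts) for i in range(3)))  # pool counts first
    return micro, macro


# Toy data: 3 patients x 4 complications, invented for illustration only.
Y_true = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 1, 0, 0]]
Y_pred = [[1, 0, 0, 0], [0, 1, 1, 1], [1, 0, 0, 0]]
loss = hamming_loss(Y_true, Y_pred)
micro, macro = micro_macro_f1(Y_true, Y_pred)
print(f"hamming={loss:.4f} micro_f1={micro:.4f} macro_f1={macro:.4f}")
```

Micro-averaging pools the counts across labels before scoring, so frequent complications dominate, while macro-averaging scores each complication separately and weights them equally. That is why the two F1 values reported in an abstract like this one can differ.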

Conclusions

Our MLC model can effectively utilize the potential correlation between different diabetic complications to further improve the prediction accuracy. This model should be explored further in other complex diseases with multiple complications.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12911-021-01525-7.

Machine Learning: Algorithms, Real-World Applications and Research Directions. SN Comput Sci 2021;2:160. [PMID: 33778771 PMCID: PMC7983091 DOI: 10.1007/s42979-021-00592-x] [Citation(s) in RCA: 463] [Impact Index Per Article: 154.3] [Received: 01/27/2021] [Accepted: 03/12/2021] [Indexed: 12/16/2022]

Saberi-Karimian M, Khorasanchi Z, Ghazizadeh H, Tayefi M, Saffar S, Ferns GA, Ghayour-Mobarhan M. Potential value and impact of data mining and machine learning in clinical diagnostics. Crit Rev Clin Lab Sci 2021;58:275-296. [PMID: 33739235 DOI: 10.1080/10408363.2020.1857681] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Indexed: 12/14/2022]

Abstract

Data mining involves the use of mathematical sciences, statistics, artificial intelligence, and machine learning to determine the relationships between variables from a large sample of data. It has previously been shown that data mining can improve the prediction and diagnostic precision of type 2 diabetes mellitus. A few studies have applied machine learning to assess hypertension and metabolic syndrome-related biomarkers, as well as refine the assessment of cardiovascular disease risk. Machine learning methods have also been applied to assess new biomarkers and survival outcomes in patients with renal diseases to predict the development of chronic kidney disease, disease progression, and renal graft survival. In the latter, random forest methods were found to be the best for the prediction of chronic kidney disease. Some studies have investigated the prognosis of nonalcoholic fatty liver disease and acute liver failure, as well as therapy response prediction in patients with viral disorders, using decision tree models. Machine learning techniques, such as Sparse High-Order Interaction Model with Rejection Option, have been used for diagnosing Alzheimer's disease. Data mining techniques have also been applied to identify the risk factors for serious mental illness, such as depression and dementia, and help to diagnose and predict the quality of life of such patients. In relation to child health, some studies have determined the best algorithms for predicting obesity and malnutrition. Machine learning has determined the important risk factors for preterm birth and low birth weight. Published studies of patients with cancer and bacterial diseases are limited and should perhaps be addressed more comprehensively in future studies. 
Herein, we provide an in-depth review of studies in which biochemical biomarker data were analyzed using machine learning methods to assess the risk of several common diseases, in order to summarize the potential applications of data mining methods in clinical diagnosis. Data mining techniques have now been increasingly applied to clinical diagnostics, and they have the potential to support this field.

Understanding current states of machine learning approaches in medical informatics: a systematic literature review. Health Technol 2021. [DOI: 10.1007/s12553-021-00538-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/25/2022]

Comparison of Diagnosis Accuracy between a Backpropagation Artificial Neural Network Model and Linear Regression in Digestive Disease Patients: an Empirical Research. Comput Math Methods Med 2021;2021:6662779. [PMID: 33727951 PMCID: PMC7937476 DOI: 10.1155/2021/6662779] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/18/2020] [Revised: 12/10/2020] [Accepted: 02/18/2021] [Indexed: 02/08/2023]

Annapragada AV, Donaruma-Kwoh MM, Annapragada AV, Starosolski ZA. A natural language processing and deep learning approach to identify child abuse from pediatric electronic medical records. PLoS One 2021;16:e0247404. [PMID: 33635890 PMCID: PMC7909689 DOI: 10.1371/journal.pone.0247404] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 05/28/2020] [Accepted: 02/07/2021] [Indexed: 01/16/2023]

Abstract

Child physical abuse is a leading cause of traumatic injury and death in children. In 2017, child abuse was responsible for 1688 fatalities in the United States, of 3.5 million children referred to Child Protection Services and 674,000 substantiated victims. While large referral hospitals maintain teams trained in Child Abuse Pediatrics, smaller community hospitals often do not have such dedicated resources to evaluate patients for potential abuse. Moreover, identification of abuse has a low margin of error, as false positive identifications lead to unwarranted separations, while false negatives allow dangerous situations to continue. This context makes the consistent detection of and response to abuse difficult, particularly given subtle signs in young, non-verbal patients. Here, we describe the development of artificial intelligence algorithms that use unstructured free text in the electronic medical record (including notes from physicians, nurses, and social workers) to identify children who are suspected victims of physical abuse. Importantly, only the notes from time of first encounter (e.g., birth, routine visit, sickness) to the last record before child protection team involvement were used. This allowed us to develop an algorithm using only information available prior to referral to the specialized child protection team. The study was performed in a multi-center referral pediatric hospital on patients screened for abuse within five different locations between 2015 and 2019. Of 1123 patients, 867 records were available after data cleaning and processing, and 55% were abuse-positive as determined by a multi-disciplinary team of clinical professionals. These electronic medical records were encoded with three natural language processing (NLP) algorithms, namely Bag of Words (BOW), Word Embeddings (WE), and Rules-Based (RB), and used to train multiple neural network architectures.
The BOW and WE encodings utilize the full free-text, while RB selects crucial phrases as identified by physicians. The best architecture was selected by average classification accuracy for the best performing model from each train-test split of a cross-validation experiment. Natural language processing coupled with neural networks detected cases of likely child abuse using only information available to clinicians prior to child protection team referral with average accuracy of 0.90±0.02 and average area under the receiver operator characteristic curve (ROC-AUC) 0.93±0.02 for the best performing Bag of Words models. The best performing rules-based models achieved average accuracy of 0.77±0.04 and average ROC-AUC 0.81±0.05, while a Word Embeddings strategy was severely limited by lack of representative embeddings. Importantly, the best performing model had a false positive rate of 8%, as compared to rates of 20% or higher in previously reported studies. This artificial intelligence approach can help screen patients for whom an abuse concern exists and streamline the identification of patients who may benefit from referral to a child protection team. Furthermore, this approach could be applied to develop computer-aided-diagnosis platforms for the challenging and often intractable problem of reliably identifying pediatric patients suffering from physical abuse.

Okui T, Nojiri C, Kimura S, Abe K, Maeno S, Minami M, Maeda Y, Tajima N, Kawamura T, Nakashima N. Performance evaluation of case definitions of type 1 diabetes for health insurance claims data in Japan. BMC Med Inform Decis Mak 2021;21:52. [PMID: 33573645 PMCID: PMC7879626 DOI: 10.1186/s12911-021-01422-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/22/2020] [Accepted: 01/25/2021] [Indexed: 12/18/2022]

Wu B, Chow W, Sakthivel M, Kakade O, Gupta K, Israel D, Chen YW, Kuruvilla AS. Body Mass Index Variable Interpolation to Expand the Utility of Real-world Administrative Healthcare Claims Database Analyses. Adv Ther 2021;38:1314-1327. [PMID: 33432543 PMCID: PMC7889527 DOI: 10.1007/s12325-020-01605-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/22/2020] [Accepted: 12/11/2020] [Indexed: 12/23/2022]

Abstract

INTRODUCTION

Administrative claims data provide an important source for real-world evidence (RWE) generation, but incomplete reporting, such as for body mass index (BMI), limits the sample sizes that can be analyzed to address certain research questions. The objective of this study was to construct models by implementing machine-learning (ML) algorithms to predict BMI classifications (≥ 30, ≥ 35, and ≥ 40 kg/m²) in administrative healthcare claims databases, and then internally and externally validate them.

METHODS

Five advanced ML algorithms were implemented for each BMI classification on a random sampling of BMI readings from the Optum PanTher Electronic Health Record database (2%) and the Optum Clinformatics Date of Death (20%) database, while incorporating baseline demographic and clinical characteristics. Sensitivity analyses with oversampling ratios were conducted. Model performance was validated internally and externally.

RESULTS

Models trained on the Super Learner ML algorithm (SLA) yielded the best BMI classification predictive performance. SLA model 1 utilized sociodemographic and clinical characteristics, including baseline BMI values; the area under the receiver operating characteristic curve (ROC AUC) was approximately 88% for the prediction of BMI classifications of ≥ 30, ≥ 35, and ≥ 40 kg/m² (internal validation), while accuracy ranged from 87.9% to 92.8% and specificity ranged from 91.8% to 94.7%. SLA model 2 utilized sociodemographic information and clinical characteristics, excluding baseline BMI values; ROC AUC was approximately 73% for the prediction of BMI classifications of ≥ 30, ≥ 35, and ≥ 40 kg/m² (internal validation), while accuracy ranged from 73.6% to 80.0% and specificity ranged from 71.6% to 85.9%. The external validation on the MarketScan Commercial Claims and Encounters database yielded relatively consistent results with slightly diminished performance.

CONCLUSION

This study demonstrated the feasibility and validity of using ML algorithms to predict BMI classifications in administrative healthcare claims data to expand the utility for RWE generation.

Lee S, Doktorchik C, Martin EA, D'Souza AG, Eastwood C, Shaheen AA, Naugler C, Lee J, Quan H. Electronic Medical Record-Based Case Phenotyping for the Charlson Conditions: Scoping Review. JMIR Med Inform 2021;9:e23934. [PMID: 33522976 PMCID: PMC7884219 DOI: 10.2196/23934] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 08/28/2020] [Revised: 11/20/2020] [Accepted: 12/05/2020] [Indexed: 12/16/2022]

Abstract

Background

Electronic medical records (EMRs) contain large amounts of rich clinical information. Developing EMR-based case definitions, also known as EMR phenotyping, is an active area of research that has implications for epidemiology, clinical care, and health services research.

Objective

This review aims to describe and assess the present landscape of EMR-based case phenotyping for the Charlson conditions.

Methods

A scoping review of EMR-based algorithms for defining the Charlson comorbidity index conditions was completed. This study covered articles published between January 2000 and April 2020, both inclusive. Embase (Excerpta Medica database) and MEDLINE (Medical Literature Analysis and Retrieval System Online) were searched using keywords developed in the following 3 domains: terms related to EMR, terms related to case finding, and disease-specific terms. The manuscript follows the Preferred Reporting Items for Systematic reviews and Meta-analyses extension for Scoping Reviews (PRISMA-ScR) guidelines.

Results

A total of 274 articles representing 299 algorithms were assessed and summarized. Most studies were undertaken in the United States (181/299, 60.5%), followed by the United Kingdom (42/299, 14.0%) and Canada (15/299, 5.0%). These algorithms were mostly developed either in primary care (103/299, 34.4%) or inpatient (168/299, 56.2%) settings. Diabetes, congestive heart failure, myocardial infarction, and rheumatology had the highest number of developed algorithms. Data-driven and clinical rule–based approaches have been identified. EMR-based phenotype and algorithm development reflect the data access allowed by respective health systems, and algorithms vary in their performance.
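The percentages in this paragraph use the 299 algorithms as the denominator, not the 274 articles. A short arithmetic check, using only the counts quoted above, makes that explicit. The variable names are illustrative.

```python
TOTAL_ALGORITHMS = 299  # denominator used in the abstract (not the 274 articles)

# Counts quoted in the abstract, keyed by country or care setting.
counts = {
    "United States": 181,
    "United Kingdom": 42,
    "Canada": 15,
    "primary care": 103,
    "inpatient": 168,
}

# Share of all algorithms, as a percentage rounded to one decimal place.
shares = {name: round(100 * n / TOTAL_ALGORITHMS, 1) for name, n in counts.items()}
for name, pct in shares.items():
    print(f"{name}: {counts[name]}/{TOTAL_ALGORITHMS} = {pct}%")
```

The recomputed shares agree with the figures reported in the abstract.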

Conclusions

Recognizing similarities and differences in health systems, data collection strategies, extraction, data release protocols, and existing clinical pathways is critical to algorithm development strategies. Several strategies to assist with phenotype-based case definitions have been proposed.

Affiliation(s)

Seungwon Lee Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Alberta Health Services, Calgary, AB, Canada; Data Intelligence for Health Lab, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
Chelsea Doktorchik Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
Elliot Asher Martin Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Alberta Health Services, Calgary, AB, Canada
Adam Giles D'Souza Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Alberta Health Services, Calgary, AB, Canada
Cathy Eastwood Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
Abdel Aziz Shaheen Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Medicine, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
Christopher Naugler Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Pathology and Laboratory Medicine, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
Joon Lee Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Data Intelligence for Health Lab, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Cardiac Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
Hude Quan Centre for Health Informatics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada

Deepa N, Prabadevi B, Maddikunta PK, Gadekallu TR, Baker T, Khan MA, Tariq U. An AI-based intelligent system for healthcare analysis using Ridge-Adaline Stochastic Gradient Descent Classifier. THE JOURNAL OF SUPERCOMPUTING 2021;77:1998-2017. [DOI: 10.1007/s11227-020-03347-2] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 08/30/2023]

Xiong Y, Shi X, Chen S, Jiang D, Tang B, Wang X, Chen Q, Yan J. Cohort selection for clinical trials using hierarchical neural network. J Am Med Inform Assoc 2021;26:1203-1208. [PMID: 31305921 DOI: 10.1093/jamia/ocz099] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2019] [Revised: 04/28/2019] [Accepted: 06/13/2019] [Indexed: 12/22/2022] Open

Abstract

OBJECTIVE

Cohort selection for clinical trials is a key step for clinical research. We proposed a hierarchical neural network to determine whether a patient satisfied selection criteria or not.

MATERIALS AND METHODS

We designed a hierarchical neural network (denoted as CNN-Highway-LSTM or LSTM-Highway-LSTM) for track 1 of the 2018 National Natural Language Processing (NLP) Clinical Challenges (n2c2) on cohort selection for clinical trials. The neural network is composed of 5 components: (1) sentence representation using a convolutional neural network (CNN) or long short-term memory (LSTM) network; (2) a highway network to adjust information flow; (3) a self-attention neural network to reweight sentences; (4) document representation using an LSTM, which takes sentence representations in chronological order as input; and (5) a fully connected neural network to determine whether each criterion is met. We compared the proposed method with its variants, including methods that use only the first component to represent documents directly, followed by the fully connected neural network for classification (denoted as CNN-only or LSTM-only), and methods without the highway network (denoted as CNN-LSTM or LSTM-LSTM). The performance of all methods was measured by micro-averaged precision, recall, and F1 score.
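As an illustration of the evaluation metric named above, the following minimal sketch computes micro-averaged precision, recall, and F1 by pooling true positives, false positives, and false negatives across all criteria before dividing. The function name `micro_prf`, the dictionary layout, the criterion names, and the labels are hypothetical and are not taken from the cited paper; treating "met" as the single positive class is a simplifying assumption.

```python
def micro_prf(gold, pred):
    """Micro-averaged precision, recall, and F1 over binary criterion labels.

    gold, pred: dict mapping a criterion name to a list of 0/1 labels,
    one label per patient (1 = criterion met). Hypothetical layout.
    """
    tp = fp = fn = 0
    for criterion, gold_labels in gold.items():
        for g, p in zip(gold_labels, pred[criterion]):
            if p == 1 and g == 1:
                tp += 1
            elif p == 1 and g == 0:
                fp += 1
            elif p == 0 and g == 1:
                fn += 1
    # Counts are pooled across criteria before the ratios are taken.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels, invented for illustration only.
gold = {"criterion_a": [1, 0, 1, 1], "criterion_b": [0, 0, 1, 0]}
pred = {"criterion_a": [1, 1, 1, 0], "criterion_b": [0, 0, 1, 1]}
precision, recall, f1 = micro_prf(gold, pred)
```

Because counts are pooled, criteria with many positive cases weigh more heavily in the micro-average than rare criteria do.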

RESULTS

The micro-averaged F1 scores of CNN-only, LSTM-only, CNN-LSTM, LSTM-LSTM, CNN-Highway-LSTM, and LSTM-Highway-LSTM were 85.24%, 84.25%, 87.27%, 88.68%, 88.48%, and 90.21%, respectively. The highest micro-averaged F1 score (90.21%) exceeds that of our submitted system (88.55%), which was one of the top-ranked results in the challenge. The results indicate that the proposed method is effective for cohort selection for clinical trials.

DISCUSSION

Although the proposed method achieved promising results, some errors were caused by word ambiguity, negation, number analysis, and an incomplete dictionary. Moreover, imbalanced data is another challenge that needs to be tackled in the future.

CONCLUSION

In this article, we proposed a hierarchical neural network for cohort selection. Experimental results show that this method performs well at cohort selection.

Qayyum A, Qadir J, Bilal M, Al-Fuqaha A. Secure and Robust Machine Learning for Healthcare: A Survey. IEEE Rev Biomed Eng 2021;14:156-180. [PMID: 32746371 DOI: 10.1109/rbme.2020.3013489] [Citation(s) in RCA: 81] [Impact Index Per Article: 27.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]

Sarker IH. Data Science and Analytics: An Overview from Data-Driven Smart Computing, Decision-Making and Applications Perspective. SN COMPUTER SCIENCE 2021;2:377. [PMID: 34278328 PMCID: PMC8274472 DOI: 10.1007/s42979-021-00765-8] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/09/2019] [Accepted: 07/02/2021] [Indexed: 02/07/2023]

Leary E, Stoker AM, Cook JL. Classification, Categorization, and Algorithms for Articular Cartilage Defects. J Knee Surg 2020;33:1069-1077. [PMID: 32663886 DOI: 10.1055/s-0040-1713778] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]

Frontoni E, Romeo L, Bernardini M, Moccia S, Migliorelli L, Paolanti M, Ferri A, Misericordia P, Mancini A, Zingaretti P. A Decision Support System for Diabetes Chronic Care Models Based on General Practitioner Engagement and EHR Data Sharing. IEEE JOURNAL OF TRANSLATIONAL ENGINEERING IN HEALTH AND MEDICINE 2020;8:3000112. [PMID: 33150095 PMCID: PMC7605604 DOI: 10.1109/jtehm.2020.3031107] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/13/2020] [Revised: 09/16/2020] [Accepted: 10/10/2020] [Indexed: 12/19/2022]

Abstract

Objective

Decision support systems (DSS) have been developed and promoted for their potential to improve the quality of health care. However, there is a lack of a common clinical strategy, poor management of clinical resources, and erroneous implementation of preventive medicine.

Methods

To overcome these problems, this work proposed an integrated system that relies on the creation and sharing of a database extracted from GPs' Electronic Health Records (EHRs) within the Netmedica Italian (NMI) cloud infrastructure. Although the proposed system is a pilot application specifically tailored to improving chronic Type 2 Diabetes (T2D) care, it could easily be retargeted to manage other chronic diseases. The proposed DSS is based on the EHR structure used by GPs in their daily activities and follows the most recent guidelines on data protection and sharing. The DSS is equipped with a Machine Learning (ML) method for analyzing the shared EHRs and thus tackling the high variability of EHRs. A novel set of T2D care-quality indicators is used specifically to determine the economic incentives, and the T2D features are presented as predictors of the proposed ML approach.

Results

The EHRs from 41,237 T2D patients were analyzed. No additional data collection, with respect to standard clinical practice, was required. The DSS exhibited competitive performance (up to an overall accuracy of 98%±2% and macro-recall of 96%±1%) for classifying chronic care quality across the different follow-up phases. The chronic care quality model led to a significant increase (up to 12%) in the proportion of T2D patients without complications. GPs who agreed to use the proposed system received an economic incentive. A further bonus was assigned when performance targets were achieved.

Conclusions

The quality care evaluation in a clinical use-case scenario demonstrated how the empowerment of the GPs through the use of the platform (integrating the proposed DSS), along with the economic incentives, may speed up the improvement of care.
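The overall accuracy and macro-recall reported in this abstract measure different things, which the minimal sketch below illustrates: accuracy is the share of correct predictions, whereas macro-recall is the unweighted mean of per-class recall, so a small class counts as much as a large one. The function names, class labels, and predictions are hypothetical and invented for illustration; they are not data from the cited study.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Share of predictions that match the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy care-quality labels with an imbalanced minority class (invented).
y_true = ["low", "low", "low", "low", "high", "high"]
y_pred = ["low", "low", "low", "low", "high", "low"]
acc = accuracy(y_true, y_pred)
mrec = macro_recall(y_true, y_pred)
```

In this toy case one error in the minority class lowers macro-recall more than it lowers accuracy, which is why both figures are often reported together for imbalanced classes.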

D'Adamo GL, Widdop JT, Giles EM. The future is now? Clinical and translational aspects of "Omics" technologies. Immunol Cell Biol 2020;99:168-176. [PMID: 32924178 DOI: 10.1111/imcb.12404] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2020] [Revised: 09/09/2020] [Accepted: 09/09/2020] [Indexed: 12/16/2022]

Sampa MB, Hossain MN, Hoque MR, Islam R, Yokota F, Nishikitani M, Ahmed A. Blood Uric Acid Prediction With Machine Learning: Model Development and Performance Comparison. JMIR Med Inform 2020;8:e18331. [PMID: 33030442 PMCID: PMC7582147 DOI: 10.2196/18331] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/20/2020] [Revised: 07/16/2020] [Accepted: 08/10/2020] [Indexed: 02/06/2023] Open

Abstract

Background

Uric acid is associated with noncommunicable diseases such as cardiovascular diseases, chronic kidney disease, coronary artery disease, stroke, diabetes, metabolic syndrome, vascular dementia, and hypertension. Therefore, uric acid is considered to be a risk factor for the development of noncommunicable diseases. Most studies on uric acid have been performed in developed countries, and the application of machine-learning approaches to uric acid prediction in developing countries is rare. Machine-learning algorithms perform differently across data types and diseases; therefore, each type of data requires its own investigation to identify the most accurate algorithms. Specifically, no study has yet focused on the urban corporate population in Bangladesh, despite the high risk of developing noncommunicable diseases in this population.

Objective

The aim of this study was to develop a model for predicting blood uric acid values based on basic health checkup test results, dietary information, and sociodemographic characteristics using machine-learning algorithms. The prediction of health checkup test measurements can be very helpful to reduce health management costs.

Methods

Various machine-learning approaches were used in this study because clinical input data are not completely independent and exhibit complex interactions. Conventional statistical models are limited in their ability to capture these complex interactions, whereas machine learning can consider all possible interactions among input data. We used boosted decision tree regression, decision forest regression, Bayesian linear regression, and linear regression to predict personalized blood uric acid based on basic health checkup test results, dietary information, and sociodemographic characteristics. We evaluated the performance of these five widely used machine-learning models using data collected from 271 employees in the Grameen Bank complex of Dhaka, Bangladesh.

Results

The mean uric acid level was 6.63 mg/dL, indicating a borderline result for the majority of the sample (normal range <7.0 mg/dL). Therefore, these individuals should monitor their uric acid regularly. The boosted decision tree regression model showed the best performance among the models tested, with a root mean squared error of 0.03, which is also better than that of any previously reported model.
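For readers unfamiliar with the metric used to rank the models above, the following minimal sketch computes the root mean squared error (RMSE) between measured and predicted values. The function name `rmse` and all numbers are hypothetical and invented for illustration; they are not values from the cited study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Invented uric acid values in mg/dL, for illustration only.
measured = [6.0, 7.0, 5.5, 6.5]
predicted = [6.5, 6.5, 5.5, 7.0]
error = rmse(measured, predicted)
```

RMSE is expressed in the same unit as the predicted quantity, so a lower value means predictions lie closer to the measurements on that scale.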

Conclusions

A uric acid prediction model was developed based on personal characteristics, dietary information, and some basic health checkup measurements. This model will be useful for improving awareness among high-risk individuals and populations, which can help to reduce medical costs. A future study could include additional features (eg, work stress, daily physical activity, alcohol intake, eating red meat) to improve prediction.

Tang Y, Gao R, Lee HH, Wells QS, Spann A, Terry JG, Carr JJ, Huo Y, Bao S, Landman BA. Prediction of Type II Diabetes Onset with Computed Tomography and Electronic Medical Records. MULTIMODAL LEARNING FOR CLINICAL DECISION SUPPORT AND CLINICAL IMAGE-BASED PROCEDURES : 10TH INTERNATIONAL WORKSHOP, ML-CDS 2020, AND 9TH INTERNATIONAL WORKSHOP, CLIP 2020, HELD IN CONJUNCTION WITH MICCAI 2020, LIMA, PERU, OCTOBER 4-8, ... 2020;12445:13-23. [PMID: 34113927 PMCID: PMC8188902 DOI: 10.1007/978-3-030-60946-7_2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/28/2022]

Luo YF, Henry S, Wang Y, Shen F, Uzuner O, Rumshisky A. The 2019 National Natural language processing (NLP) Clinical Challenges (n2c2)/Open Health NLP (OHNLP) shared task on clinical concept normalization for clinical records. J Am Med Inform Assoc 2020;27:1529-1537. [PMID: 32968800 PMCID: PMC7647359 DOI: 10.1093/jamia/ocaa106] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/10/2020] [Revised: 05/01/2020] [Accepted: 05/14/2020] [Indexed: 01/19/2023] Open

Alfian G, Syafrudin M, Anshari M, Benes F, Atmaji FTD, Fahrurrozi I, Hidayatullah AF, Rhee J. Blood glucose prediction model for type 1 diabetes based on artificial neural network with time-domain features. Biocybern Biomed Eng 2020. [DOI: 10.1016/j.bbe.2020.10.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]

Sengupta PP, Shrestha S, Berthon B, Messas E, Donal E, Tison GH, Min JK, D'hooge J, Voigt JU, Dudley J, Verjans JW, Shameer K, Johnson K, Lovstakken L, Tabassian M, Piccirilli M, Pernot M, Yanamala N, Duchateau N, Kagiyama N, Bernard O, Slomka P, Deo R, Arnaout R. Proposed Requirements for Cardiovascular Imaging-Related Machine Learning Evaluation (PRIME): A Checklist: Reviewed by the American College of Cardiology Healthcare Innovation Council. JACC Cardiovasc Imaging 2020;13:2017-2035. [PMID: 32912474 PMCID: PMC7953597 DOI: 10.1016/j.jcmg.2020.07.015] [Citation(s) in RCA: 125] [Impact Index Per Article: 31.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/18/2019] [Revised: 07/15/2020] [Accepted: 07/16/2020] [Indexed: 12/20/2022]

Affiliation(s)

Partho P Sengupta West Virginia University Heart and Vascular Institute, Division of Cardiology, Morgantown, West Virginia
Sirish Shrestha West Virginia University Heart and Vascular Institute, Division of Cardiology, Morgantown, West Virginia
Béatrice Berthon Physique pour la Médecine Paris, Inserm U1273, CNRS FRE 2031, ESPCI Paris, PSL Research University, Paris, France
Emmanuel Messas Université Paris Descartes, Sorbonne Paris Cité, Paris, France
Erwan Donal Département de Cardiologie et Maladies Vasculaires, Service de Cardiologie et maladies vasculaires, CHU Rennes, Rennes, France
Geoffrey H Tison Division of Cardiology, Department of Medicine, University of California San Francisco, San Francisco, California
James K Min Cleerly, Inc., New York, New York
Jan D'hooge Laboratory on Cardiovascular Imaging and Dynamics, Department of Cardiovascular Sciences, KU Leuven, Leuven, Belgium
Jens-Uwe Voigt Department of Cardiovascular Science, KU Leuven, Leuven, Belgium; Department of Cardiovascular Diseases, University Hospitals Leuven, Belgium
Joel Dudley Department of Genetics and Genomic Sciences and Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York; Institute for Next Generation Healthcare, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York
Johan W Verjans Australian Institute for Machine Learning, University of Adelaide, North Terrace, Adelaide, South Australia, Australia; Department of Circulation and Medical Imaging, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim, Norway
Khader Shameer Department of Genetics and Genomic Sciences and Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York; Institute for Next Generation Healthcare, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York
Kipp Johnson Department of Genetics and Genomic Sciences and Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York; Institute for Next Generation Healthcare, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York
Lasse Lovstakken Department of Circulation and Medical Imaging, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim, Norway
Mahdi Tabassian Laboratory on Cardiovascular Imaging and Dynamics, Department of Cardiovascular Sciences, KU Leuven, Leuven, Belgium
Marco Piccirilli West Virginia University Heart and Vascular Institute, Division of Cardiology, Morgantown, West Virginia
Mathieu Pernot Physique pour la Médecine Paris, Inserm U1273, CNRS FRE 2031, ESPCI Paris, PSL Research University, Paris, France
Naveena Yanamala West Virginia University Heart and Vascular Institute, Division of Cardiology, Morgantown, West Virginia
Nicolas Duchateau CREATIS, CNRS UMR 5220, INSERM U1206, Université Lyon 1, INSA-LYON, France
Nobuyuki Kagiyama West Virginia University Heart and Vascular Institute, Division of Cardiology, Morgantown, West Virginia
Olivier Bernard CREATIS, CNRS UMR 5220, INSERM U1206, Université Lyon 1, INSA-LYON, France
Piotr Slomka Department of Imaging and Medicine, Cedars-Sinai Medical Center, Los Angeles, California
Rahul Deo Division of Cardiology, Department of Medicine, University of California San Francisco, San Francisco, California
Rima Arnaout Division of Cardiology, Department of Medicine, University of California San Francisco, San Francisco, California

Jadhav AS, Patil PB, Biradar S. Analysis on diagnosing diabetic retinopathy by segmenting blood vessels, optic disc and retinal abnormalities. J Med Eng Technol 2020;44:299-316. [PMID: 32729345 DOI: 10.1080/03091902.2020.1791986] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/11/2023]

Ruan Y, Bellot A, Moysova Z, Tan GD, Lumb A, Davies J, van der Schaar M, Rea R. Predicting the Risk of Inpatient Hypoglycemia With Machine Learning Using Electronic Health Records. Diabetes Care 2020;43:1504-1511. [PMID: 32350021 DOI: 10.2337/dc19-1743] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/30/2019] [Accepted: 04/04/2020] [Indexed: 02/03/2023]

Srivastava AK, Kumar Y, Singh PK. A Rule-Based Monitoring System for Accurate Prediction of Diabetes. INTERNATIONAL JOURNAL OF E-HEALTH AND MEDICAL COMMUNICATIONS 2020. [DOI: 10.4018/ijehmc.2020070103] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

Yang T, Zhang L, Yi L, Feng H, Li S, Chen H, Zhu J, Zhao J, Zeng Y, Liu H. Ensemble Learning Models Based on Noninvasive Features for Type 2 Diabetes Screening: Model Development and Validation. JMIR Med Inform 2020;8:e15431. [PMID: 32554386 PMCID: PMC7333074 DOI: 10.2196/15431] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2019] [Revised: 12/22/2019] [Accepted: 02/07/2020] [Indexed: 11/13/2022] Open

Abstract

BACKGROUND

Early diabetes screening can effectively reduce the burden of disease. However, natural population-based screening projects require a large number of resources. With the emergence and development of machine learning, researchers have started to pursue more flexible and efficient methods to screen or predict type 2 diabetes.

OBJECTIVE

The aim of this study was to build prediction models based on the ensemble learning method for diabetes screening to further improve the health status of the population in a noninvasive and inexpensive manner.

METHODS

The dataset for building and evaluating the diabetes prediction model was extracted from the National Health and Nutrition Examination Survey from 2011-2016. After data cleaning and feature selection, the dataset was split into a training set (80%, 2011-2014), test set (20%, 2011-2014) and validation set (2015-2016). Three simple machine learning methods (linear discriminant analysis, support vector machine, and random forest) and easy ensemble methods were used to build diabetes prediction models. The performance of the models was evaluated through 5-fold cross-validation and external validation. The Delong test (2-sided) was used to test the performance differences between the models.
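The data partitioning described above can be sketched as follows: a random 80/20 split into training and test indices, then 5 folds over the training portion. This is a minimal standard-library sketch under stated assumptions; the function names, the seed, and the sample size of 100 are hypothetical, and the study's actual splitting procedure (including how the separate 2015-2016 validation set was handled) is not specified here.

```python
import random


def train_test_split(indices, test_fraction=0.2, seed=0):
    """Shuffle indices and hold out a fraction as the test set."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def k_fold(indices, k=5):
    """Yield (training, held_out) index lists, one pair per fold."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, held_out


# 100 hypothetical observation indices.
train, test = train_test_split(range(100))
splits = list(k_fold(train, k=5))
```

Each observation in the training portion is held out exactly once across the 5 folds, while the test set is never touched during cross-validation.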

RESULTS

We selected 8057 observations and 12 attributes from the database. In the 5-fold cross-validation, the three simple methods yielded models with high predictive performance, with areas under the curve (AUCs) over 0.800, and the ensemble methods significantly outperformed the simple methods. When we evaluated the models in the test set and validation set, the same trends were observed. The ensemble model of linear discriminant analysis yielded the best performance, with an AUC of 0.849, an accuracy of 0.730, a sensitivity of 0.819, and a specificity of 0.709 in the validation set.
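The accuracy, sensitivity, and specificity figures above all derive from the same confusion matrix. The minimal sketch below shows the relationship. The function name `screening_metrics` and the counts are hypothetical and invented for illustration; they do not reproduce the study's validation set.

```python
def screening_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases flagged
        "specificity": tn / (tn + fp),  # share of non-cases cleared
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Invented counts: 100 people with diabetes, 100 without.
metrics = screening_metrics(tp=82, fp=29, tn=71, fn=18)
```

Accuracy is a prevalence-weighted average of sensitivity and specificity, so it always falls between the two.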

CONCLUSIONS

This study indicates that efficient screening using machine learning methods with noninvasive tests can be applied to a large population and achieve the objective of secondary prevention.

Tjandra D, Migrino RQ, Giordani B, Wiens J. Cohort discovery and risk stratification for Alzheimer's disease: an electronic health record-based approach. ALZHEIMER'S & DEMENTIA (NEW YORK, N. Y.) 2020;6:e12035. [PMID: 32548236 PMCID: PMC7293993 DOI: 10.1002/trc2.12035] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/27/2020] [Accepted: 04/18/2020] [Indexed: 11/17/2022]

Toward Prevention of Adverse Events Using Anticipatory Analytics. PROGRESS IN PREVENTIVE MEDICINE 2020. [DOI: 10.1097/pp9.0000000000000029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]

Wang X, Yang Y, Xu Y, Chen Q, Wang H, Gao H. Predicting hypoglycemic drugs of type 2 diabetes based on weighted rank support vector machine. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.105868] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]

Dexter GP, Grannis SJ, Dixon BE, Kasthurirathne SN. Generalization of Machine Learning Approaches to Identify Notifiable Conditions from a Statewide Health Information Exchange. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2020;2020:152-161. [PMID: 32477634 PMCID: PMC7233074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]

Early temporal prediction of Type 2 Diabetes Risk Condition from a General Practitioner Electronic Health Record: A Multiple Instance Boosting Approach. Artif Intell Med 2020;105:101847. [PMID: 32505428 DOI: 10.1016/j.artmed.2020.101847] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2019] [Revised: 02/12/2020] [Accepted: 03/20/2020] [Indexed: 11/22/2022]

Lanera C, Berchialla P, Baldi I, Lorenzoni G, Tramontan L, Scamarcia A, Cantarutti L, Giaquinto C, Gregori D. Use of Machine Learning Techniques for Case-Detection of Varicella Zoster Using Routinely Collected Textual Ambulatory Records: Pilot Observational Study. JMIR Med Inform 2020;8:e14330. [PMID: 32369038 PMCID: PMC7238079 DOI: 10.2196/14330] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/10/2019] [Revised: 08/28/2019] [Accepted: 12/16/2019] [Indexed: 12/11/2022] Open

Abstract

Background

The detection of infectious diseases through the analysis of free text on electronic health reports (EHRs) can provide prompt and accurate background information for the implementation of preventative measures, such as advertising and monitoring the effectiveness of vaccination campaigns.

Objective

The purpose of this paper is to compare machine learning techniques in their application to EHR analysis for disease detection.

Methods

The Pedianet database was used as a data source for a real-world scenario on the identification of cases of varicella. The models’ training and test sets were based on two different Italian regions’ (Veneto and Sicilia) data sets of 7631 patients and 1,230,355 records, and 2347 patients and 569,926 records, respectively, for whom a gold standard of varicella diagnosis was available. Elastic-net regularized generalized linear model (GLMNet), maximum entropy (MAXENT), and LogitBoost (boosting) algorithms were implemented in a supervised environment and 5-fold cross-validated. The document-term matrix generated by the training set involves a dictionary of 1,871,532 tokens. The analysis was conducted on a subset of 29,096 tokens, corresponding to a matrix with no more than a 99% sparsity ratio.
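The reduction from the full dictionary to a subset of tokens with bounded sparsity can be illustrated with the minimal sketch below. It assumes that a term's sparsity is the share of documents in which the term does not appear, and it keeps terms at or below a threshold. The function names, the whitespace tokenization, the toy notes, and the 0.5 threshold are all hypothetical; the cited study's actual preprocessing pipeline is not described in the abstract.

```python
from collections import Counter


def document_term_counts(documents):
    """One Counter of token counts per document (whitespace tokenization)."""
    return [Counter(doc.lower().split()) for doc in documents]


def drop_sparse_terms(doc_counts, max_sparsity=0.99):
    """Keep terms absent from at most max_sparsity of the documents."""
    n_docs = len(doc_counts)
    doc_freq = Counter()
    for counts in doc_counts:
        doc_freq.update(counts.keys())  # count each term once per document
    return sorted(
        term for term, df in doc_freq.items() if 1 - df / n_docs <= max_sparsity
    )


# Invented free-text notes, for illustration only.
notes = ["fever and rash", "rash on trunk", "cough and fever", "rash itchy rash"]
kept = drop_sparse_terms(document_term_counts(notes), max_sparsity=0.5)
```

With a 0.5 threshold, only terms that occur in at least half of the toy notes survive; a 0.99 threshold on a large corpus would similarly discard tokens seen in fewer than 1% of records.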

Results

The highest predictive values were achieved through boosting (positive predictive value [PPV] 63.1%, 95% CI 42.7%-83.5% and negative predictive value [NPV] 98.8%, 95% CI 98.3%-99.3%). GLMNet delivered superior predictive capability compared to MAXENT (PPV 24.5% and NPV 98.3% vs PPV 11.0% and NPV 98.0%). MAXENT and GLMNet predictions agree only weakly with each other (agreement coefficient 1 [AC1]=0.60, 95% CI 0.58-0.62), as well as with LogitBoost (MAXENT: AC1=0.64, 95% CI 0.63-0.66 and GLMNet: AC1=0.53, 95% CI 0.51-0.55).

Conclusions

Boosting has demonstrated promising performance in large-scale EHR-based infectious disease identification.

Park Y, Ho JC. CaliForest: Calibrated Random Forest for Health Data. PROCEEDINGS OF THE ACM CONFERENCE ON HEALTH, INFERENCE, AND LEARNING 2020;2020:40-50. [PMID: 34308443 DOI: 10.1145/3368555.3384461] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/21/2022]

Li R, Chen Y, Ritchie MD, Moore JH. Electronic health records and polygenic risk scores for predicting disease risk. Nat Rev Genet 2020;21:493-502. [PMID: 32235907 DOI: 10.1038/s41576-020-0224-1] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 03/02/2020] [Indexed: 01/03/2023]

Zhang L, Wang Y, Niu M, Wang C, Wang Z. Machine learning for characterizing risk of type 2 diabetes mellitus in a rural Chinese population: the Henan Rural Cohort Study. Sci Rep 2020;10:4406. [PMID: 32157171 PMCID: PMC7064542 DOI: 10.1038/s41598-020-61123-x] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2019] [Accepted: 02/19/2020] [Indexed: 01/19/2023] Open

Abstract

With the development of data mining, machine learning offers opportunities to improve discrimination by analyzing complex interactions among massive variables. To test the ability of machine learning algorithms to predict the risk of type 2 diabetes mellitus (T2DM) in a rural Chinese population, we focused on a total of 36,652 eligible participants from the Henan Rural Cohort Study. Risk assessment models for T2DM were developed using six machine learning algorithms: logistic regression (LR), classification and regression tree (CART), artificial neural networks (ANN), support vector machine (SVM), random forest (RF), and gradient boosting machine (GBM). Model performance was measured by the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value, negative predictive value, and area under the precision-recall curve. The importance of variables was identified based on each classifier and the Shapley additive explanations approach. Using all available variables, all models for predicting risk of T2DM demonstrated strong predictive performance, with AUCs ranging from 0.811 to 0.872 using laboratory data and from 0.767 to 0.817 without laboratory data. Among them, the GBM model performed best (AUC: 0.872 with laboratory data and 0.817 without laboratory data). The performance of all models except CART plateaued once 30 variables had been introduced. The top 10 variables across all methods were sweet flavor, urine glucose, age, heart rate, creatinine, waist circumference, uric acid, pulse pressure, insulin, and hypertension. New important risk factors (urinary indicators, sweet flavor) were not found by previous risk prediction methods but were identified by machine learning in our study. These results show that machine learning methods are competent in predicting the risk of T2DM and lead to greater insight into disease risk factors with no a priori assumption of causality.
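The AUC values used above to compare models can be read as the probability that a randomly chosen case receives a higher risk score than a randomly chosen non-case. The minimal sketch below computes AUC directly from that pairwise definition, counting ties as one half. The function name `roc_auc`, the labels, and the scores are hypothetical and invented for illustration; this pairwise form is quadratic in sample size and is meant for explanation, not for a cohort of tens of thousands.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Invented labels (1 = T2DM) and model risk scores.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.4, 0.7, 0.8, 0.6, 0.2]
auc = roc_auc(labels, scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported 0.767 to 0.872 range should be read.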

Bernardini M, Romeo L, Misericordia P, Frontoni E. Discovering the Type 2 Diabetes in Electronic Health Records Using the Sparse Balanced Support Vector Machine. IEEE J Biomed Health Inform 2020;24:235-246. [DOI: 10.1109/jbhi.2019.2899218] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Indexed: 11/08/2022]

Talaei-Khoei A, Tavana M, Wilson JM. A predictive analytics framework for identifying patients at risk of developing multiple medical complications caused by chronic diseases. Artif Intell Med 2019;101:101750. [PMID: 31813486 DOI: 10.1016/j.artmed.2019.101750] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/14/2018] [Revised: 07/07/2019] [Accepted: 10/30/2019] [Indexed: 01/22/2023]

Lanera C, Berchialla P, Sharma A, Minto C, Gregori D, Baldi I. Screening PubMed abstracts: is class imbalance always a challenge to machine learning? Syst Rev 2019;8:317. [PMID: 31810495 PMCID: PMC6896747 DOI: 10.1186/s13643-019-1245-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 06/21/2019] [Accepted: 11/25/2019] [Indexed: 11/17/2022]

Alexander J, Edwards RA, Manca L, Grugni R, Bonfanti G, Emir B, Whalen E, Watt S, Brodsky M, Parsons B. Integrating Machine Learning With Microsimulation to Classify Hypothetical, Novel Patients for Predicting Pregabalin Treatment Response Based on Observational and Randomized Data in Patients With Painful Diabetic Peripheral Neuropathy. Pragmat Obs Res 2019;10:67-76. [PMID: 31802967 PMCID: PMC6827520 DOI: 10.2147/por.s214412] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/04/2019] [Accepted: 10/15/2019] [Indexed: 11/23/2022]

Abstract

Purpose

Variability in patient treatment responses can be a barrier to effective care. Utilization of available patient databases may improve the prediction of treatment responses. We evaluated machine learning methods to predict novel, individual patient responses to pregabalin for painful diabetic peripheral neuropathy, utilizing an agent-based modeling and simulation platform that integrates real-world observational study (OS) data and randomized clinical trial (RCT) data.

Patients and methods

The best supervised machine learning methods were selected (through literature review) and combined in a novel way for aligning patients with relevant subgroups that best enable prediction of pregabalin responses. Data were derived from a German OS of pregabalin (N=2642) and nine international RCTs (N=1320). Coarsened exact matching of OS and RCT patients was used and a hierarchical cluster analysis was implemented. We tested which machine learning methods would best align candidate patients with specific clusters that predict their pain scores over time. Cluster alignments would trigger assignments of cluster-specific time-series regressions with lagged variables as inputs in order to simulate "virtual" patients and generate 1000 trajectory variations for given novel patients.

Results

Instance-based machine learning methods (k-nearest neighbor, supervised fuzzy c-means) were selected for quantitative analyses. Each method alone correctly classified 56.7% and 39.1% of patients, respectively. An "ensemble method" (combining both methods) correctly classified 98.4% and 95.9% of patients in the training and testing datasets, respectively.

Conclusion

An ensemble combination of two instance-based machine learning techniques best accommodated different data types (dichotomous, categorical, continuous) and performed better than either technique alone in assigning novel patients to subgroups for predicting treatment outcomes using microsimulation. Assignment of novel patients to a cluster of similar patients has the potential to improve prediction of patient outcomes for chronic conditions in which initial treatment response can be incorporated using microsimulation.

Clinical trial registries

www.clinicaltrials.gov: NCT00156078, NCT00159679, NCT00143156, NCT00553475.

Nguyen BP, Pham HN, Tran H, Nghiem N, Nguyen QH, Do TTT, Tran CT, Simpson CR. Predicting the onset of type 2 diabetes using wide and deep learning with electronic health records. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2019;182:105055. [PMID: 31505379 DOI: 10.1016/j.cmpb.2019.105055] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 03/07/2019] [Revised: 08/17/2019] [Accepted: 08/27/2019] [Indexed: 06/10/2023]

Abstract

OBJECTIVE

Diabetes is responsible for considerable morbidity, healthcare utilisation and mortality in both developed and developing countries. Currently, methods of treating diabetes are inadequate and costly so prevention becomes an important step in reducing the burden of diabetes and its complications. Electronic health records (EHRs) for each individual or a population have become important tools in understanding developing trends of diseases. Using EHRs to predict the onset of diabetes could improve the quality and efficiency of medical care. In this paper, we apply a wide and deep learning model that combines the strength of a generalised linear model with various features and a deep feed-forward neural network to improve the prediction of the onset of type 2 diabetes mellitus (T2DM).

MATERIALS AND METHODS

The proposed method was implemented by training various models with a logistic loss function using stochastic gradient descent. We applied this model to public hospital record data provided by the Practice Fusion EHRs for the United States population. The dataset consists of de-identified electronic health records for 9948 patients, of whom 1904 have been diagnosed with T2DM. Prediction of diabetes in 2012 was based on data obtained from previous years (2009-2011). Class imbalance was handled by applying the Synthetic Minority Oversampling Technique (SMOTE) within each cross-validation training fold, to analyse performance when synthetic examples of the minority class are created. We used SMOTE of 150 and 300 percent, where 300 percent means that three new synthetic instances are created for each minority class instance. This results in approximate diabetes:non-diabetes distributions in the training set of 1:2 and 1:1, respectively.
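
The class ratios quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the helper `ratio_after_smote` is hypothetical, and it uses the whole-dataset counts from the abstract (9948 patients, 1904 with T2DM), whereas the study applied SMOTE within each cross-validation training fold, so per-fold counts would differ while the ratios should stay roughly the same.

```python
def ratio_after_smote(minority: int, majority: int, percent: int) -> float:
    """Majority-to-minority ratio after adding percent/100 synthetic cases per minority case."""
    synthetic = minority * percent // 100
    return majority / (minority + synthetic)


total, diabetic = 9948, 1904
non_diabetic = total - diabetic  # 8044 patients without a T2DM diagnosis

# 0 percent means no oversampling at all.
ratios = {p: ratio_after_smote(diabetic, non_diabetic, p) for p in (0, 150, 300)}
for percent, ratio in ratios.items():
    print(f"SMOTE {percent}%: diabetes:non-diabetes = 1:{ratio:.2f}")
```

Under these assumptions the ratios come out near 1:4.2 without SMOTE, 1:1.7 at 150 percent and 1:1.1 at 300 percent, which is in line with the approximate 1:2 and 1:1 distributions stated in the abstract.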

RESULTS

Our final ensemble model not using SMOTE obtained an accuracy of 84.28%, area under the receiver operating characteristic curve (AUC) of 84.13%, sensitivity of 31.17% and specificity of 96.85%. Using SMOTE of 150 and 300 percent did not improve AUC (83.33% and 82.12%, respectively) but increased sensitivity (49.40% and 71.57%, respectively) with a moderate decrease in specificity (90.16% and 76.59%, respectively).
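
The reported accuracy can be related to sensitivity and specificity through the identity accuracy = sensitivity × prevalence + specificity × (1 − prevalence). The sketch below applies it to the figures in this abstract, assuming the evaluation data have the same T2DM prevalence as the full dataset (1904 of 9948); that assumption and the helper name `accuracy_from_rates` are not from the source. Accuracy for the two SMOTE settings is not reported in the abstract, so those values are only what the identity implies.

```python
def accuracy_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Overall accuracy implied by sensitivity, specificity and class prevalence."""
    return sensitivity * prevalence + specificity * (1 - prevalence)


prevalence = 1904 / 9948  # assumed equal to the whole-dataset T2DM prevalence

# (sensitivity, specificity) pairs as reported in the abstract.
reported = {
    "no SMOTE": (0.3117, 0.9685),
    "SMOTE 150%": (0.4940, 0.9016),
    "SMOTE 300%": (0.7157, 0.7659),
}
implied = {
    name: accuracy_from_rates(sens, spec, prevalence)
    for name, (sens, spec) in reported.items()
}
for name, acc in implied.items():
    print(f"{name}: implied accuracy = {acc:.2%}")
```

For the model without SMOTE the implied value is about 84.28%, matching the reported accuracy; the SMOTE settings trade some implied accuracy for the higher sensitivity.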

DISCUSSION AND CONCLUSIONS

Our algorithm has further optimised the prediction of diabetes onset using a novel state-of-the-art machine learning algorithm: the wide and deep learning neural network architecture.

Dong Y, Xu L, Fan Y, Xiang P, Gao X, Chen Y, Zhang W, Ge Q. A novel surgical predictive model for Chinese Crohn's disease patients. Medicine (Baltimore) 2019;98:e17510. [PMID: 31725605 PMCID: PMC6867775 DOI: 10.1097/md.0000000000017510] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 02/07/2023]

Abstract

Due to the complexity of Crohn's disease (CD), it is difficult to predict disease course with a single stratification factor or biomarker. A logistic regression (LR) model has been proposed by Guizzetti et al to stratify patients by CD-related surgical risk, which could help decision-making on disease treatment. However, no relevant studies have been reported in the Chinese population. The aim of this study is to present and validate a novel surgical predictive model to facilitate therapeutic decision-making for Chinese CD patients. Data were extracted from retrospective full-mode electronic medical records, which contained 239 CD patients and 1524 instances. Two sub-datasets were generated according to different attribute selection strategies, and each was split randomly into training and testing sets. Class imbalance in the training sets was addressed with the synthetic minority over-sampling technique (SMOTE) algorithm before model development. Seven predictive models were employed using 5 popular machine learning algorithms: random forest (RF), LR, support vector machine (SVM), decision tree (DT) and artificial neural networks (ANN). The performance of each model was evaluated by accuracy, precision, F1-score, true negative (TN) rate, and the area under the receiver operating characteristic curve (AuROC). The results revealed that RF outperformed all other baseline models on both sub-datasets. The 10 leading risk factors for CD-related surgery returned by RF attribute ranking were changes of radiology, presence of a fistula, presence of an abscess, no infliximab use, enteroscopy findings, C-reactive protein, abdominal pain, white blood cells, erythrocyte sedimentation rate and platelet count. The proposed machine learning model can accurately predict the risk of surgical intervention in Chinese CD patients and could be used to tailor and modify treatment strategies for CD patients in clinical practice.

Abhari S, Niakan Kalhori SR, Ebrahimi M, Hasannejadasl H, Garavand A. Artificial Intelligence Applications in Type 2 Diabetes Mellitus Care: Focus on Machine Learning Methods. Healthc Inform Res 2019;25:248-261. [PMID: 31777668 PMCID: PMC6859270 DOI: 10.4258/hir.2019.25.4.248] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 05/14/2019] [Revised: 10/06/2019] [Accepted: 10/09/2019] [Indexed: 12/18/2022]

Abstract

Objectives

The incidence of type 2 diabetes mellitus has increased significantly in recent years. As artificial intelligence applications in healthcare have developed, they have come to be used for diagnosis, therapeutic decision making, and outcome prediction, especially in type 2 diabetes mellitus. This study aimed to identify the artificial intelligence (AI) applications for type 2 diabetes mellitus care.

Methods

This is a review conducted in 2018. We searched the PubMed, Web of Science, and Embase scientific databases using a combination of related MeSH terms. The article selection process was based on Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). Finally, 31 articles were selected after inclusion and exclusion criteria were applied. Data were gathered using a data extraction form, then summarized and reported based on the study objectives.

Results

The main applications of AI for type 2 diabetes mellitus care were screening and diagnosis in different stages. Among all of the reviewed AI methods, machine learning methods were the most commonly applied techniques (71%, n = 22). Many applications took multi-method forms (23%). Among the machine learning algorithms, support vector machine (21%) and naive Bayesian (19%) were the most commonly used methods. The most important variables used in the selected studies were body mass index, fasting blood sugar, blood pressure, HbA1c, triglycerides, low-density lipoprotein, high-density lipoprotein, and demographic variables.

Conclusions

It is recommended to select optimal algorithms by testing various techniques. Support vector machine and naive Bayesian might achieve better performance than other applications due to the type of variables and targets in diabetes-related outcomes classification.

Kim J, Chang H, Kim D, Jang DH, Park I, Kim K. Machine learning for prediction of septic shock at initial triage in emergency department. J Crit Care 2019;55:163-170. [PMID: 31734491 DOI: 10.1016/j.jcrc.2019.09.024] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 01/28/2019] [Revised: 09/05/2019] [Accepted: 09/23/2019] [Indexed: 12/23/2022]

Gilvary C, Madhukar N, Elkhader J, Elemento O. The Missing Pieces of Artificial Intelligence in Medicine. Trends Pharmacol Sci 2019;40:555-564. [DOI: 10.1016/j.tips.2019.06.001] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 02/13/2019] [Revised: 06/03/2019] [Accepted: 06/04/2019] [Indexed: 12/22/2022]

Bernardini M, Morettini M, Romeo L, Frontoni E, Burattini L. TyG-er: An ensemble Regression Forest approach for identification of clinical factors related to insulin resistance condition using Electronic Health Records. Comput Biol Med 2019;112:103358. [PMID: 31336327 DOI: 10.1016/j.compbiomed.2019.103358] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 05/21/2019] [Revised: 06/17/2019] [Accepted: 07/15/2019] [Indexed: 01/19/2023]

Abstract

BACKGROUND

Insulin resistance is an early-stage deterioration of Type 2 diabetes. Identification and quantification of insulin resistance requires specific blood tests; however, the triglyceride-glucose (TyG) index can provide a surrogate assessment from routine Electronic Health Record (EHR) data. Since insulin resistance is a multi-factorial condition, to improve its characterisation, this study aims to discover non-trivial clinical factors in EHR data to determine where the insulin-resistance condition is encoded.

METHODS

We proposed a highly interpretable Machine Learning approach (i.e., ensemble Regression Forest combined with data imputation strategies), named TyG-er. We applied three different experimental procedures to test TyG-er reliability on the Italian Federation of General Practitioners dataset, named FIMMG_obs dataset, which is publicly available and reflects the clinical use-case (i.e., not all laboratory exams are prescribed on a regular basis over time).

RESULTS

The results identified non-conventional clinical factors (i.e., uricemia, leukocytes, gamma-glutamyltransferase and protein profile) and provided novel insight into the best combination of clinical factors for detecting early glucose tolerance deterioration. The robustness of these extracted clinical factors was confirmed by the high agreement (Lin's correlation coefficient r_c from 0.664 to 0.911) of the TyG-er approach among different experimental procedures. Moreover, the results of the three experimental procedures outlined the predictive power of the TyG-er approach (up to a mean absolute error of 5.68% and r_c = 0.666, p < .05).
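
For readers unfamiliar with the agreement measure, the sketch below implements Lin's concordance correlation coefficient in its standard form, r_c = 2·s_xy / (s_x² + s_y² + (mean_x − mean_y)²), assuming this is the coefficient the abstract refers to as r_c. The function name `lin_ccc` and the toy data are hypothetical and unrelated to the FIMMG_obs dataset.

```python
from statistics import fmean


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient, using population (1/n) moments."""
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    s_x2 = sum((a - mean_x) ** 2 for a in x) / n
    s_y2 = sum((b - mean_y) ** 2 for b in y) / n
    # The squared mean difference penalizes systematic bias between the two series.
    return 2 * s_xy / (s_x2 + s_y2 + (mean_x - mean_y) ** 2)


# Toy data: identical series agree perfectly; a constant offset lowers agreement
# even though the two series remain perfectly linearly correlated.
perfect = lin_ccc([1, 2, 3], [1, 2, 3])
shifted = lin_ccc([1, 2, 3], [2, 3, 4])
print(perfect, shifted)
```

Unlike Pearson's correlation, this coefficient drops when predictions are systematically offset from the reference values, which is why it is used to measure agreement.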

CONCLUSIONS

The TyG-er approach is able to carry information about the identification of the TyG index, strictly correlated with the insulin-resistance condition, while extracting the most relevant non-glycemic features from routine data.

Yao J, Liu F, Geng Y. Query-specific optimal convolutional neural ranker. Neural Comput Appl 2019. [DOI: 10.1007/s00521-017-3257-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/29/2022]

A Comprehensive Medical Decision–Support Framework Based on a Heterogeneous Ensemble Classifier for Diabetes Prediction. ELECTRONICS 2019. [DOI: 10.3390/electronics8060635] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 12/20/2022]