Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Zheng T, Xie W, Xu L, He X, Zhang Y, You M, Yang G, Chen Y. A machine learning-based framework to identify type 2 diabetes through electronic health records. Int J Med Inform 2016;97:120-127. [PMID: 27919371 DOI: 10.1016/j.ijmedinf.2016.09.014] [Citation(s) in RCA: 118] [Impact Index Per Article: 14.8] [Received: 02/23/2016] [Revised: 09/27/2016] [Accepted: 09/30/2016] [Indexed: 01/19/2023]

Cited by Other Article(s)

Choong C, Brnabic A, Chinthammit C, Ravuri M, Terrell K, Kan H. Applying machine learning approaches for predicting obesity risk using US health administrative claims database. BMJ Open Diabetes Res Care 2024;12:e004193. [PMID: 39327067 DOI: 10.1136/bmjdrc-2024-004193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2024] [Accepted: 09/12/2024] [Indexed: 09/28/2024] Open

Abstract

INTRODUCTION

Body mass index (BMI) is inadequately recorded in US administrative claims databases. We aimed to validate the sensitivity and positive predictive value (PPV) of BMI-related diagnosis codes using an electronic medical records (EMR) claims-linked database. Additionally, we applied machine learning (ML) to identify features in US claims databases to predict obesity status.

RESEARCH DESIGN AND METHODS

This observational, retrospective analysis included 692 119 people ≥18 years of age, with ≥1 BMI reading in MarketScan Explorys Claims-EMR data (January 2013-December 2019). Claims-based obesity status was compared with EMR-based BMI (gold standard) to assess BMI-related diagnosis code sensitivity and PPV. Logistic regression (LR), penalized LR with L1 penalty (Least Absolute Shrinkage and Selection Operator), extreme gradient boosting (XGBoost) and random forest, with features drawn from insurance claims, were trained to predict obesity status (BMI ≥30 kg/m²) from EMR as the gold standard. Model performance was compared using several metrics, including the area under the receiver operating characteristic curve. The best-performing model was applied to assess feature importance. Obesity risk scores were computed from the best model generated from the claims database and compared against the BMI recorded in the EMR.

RESULTS

The PPV of diagnosis codes from claims alone remained high over the study period (85.4-89.2%); sensitivity was low (16.8-44.8%). XGBoost performed the best at predicting obesity with the highest area under the curve (AUC; 79.4%) and the lowest Brier score. The number of obesity diagnoses and obesity diagnoses from inpatient settings were the most important predictors of obesity. XGBoost showed an AUC of 74.1% when trained without an obesity diagnosis.

CONCLUSIONS

Obesity prevalence is under-reported in claims databases. ML models, with or without explicit obesity diagnoses, show promise in improving obesity prediction accuracy compared with obesity codes alone. Improved obesity status prediction may assist practitioners and payors to estimate the burden of obesity and investigate the potential unmet needs of current treatments.
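The sensitivity and PPV comparison described in this abstract can be illustrated with a minimal sketch. It assumes each person has a claims-based obesity flag and an EMR BMI value treated as the gold standard; the function name, the toy data, and the data layout are hypothetical and are not taken from the study.

```python
def sensitivity_and_ppv(claims_flags, emr_bmi, threshold=30.0):
    """Compare claims-based obesity flags with EMR BMI (gold standard).

    claims_flags: list of bool, True if a claims diagnosis code indicates obesity.
    emr_bmi: list of float, BMI recorded in the EMR for the same people.
    threshold: BMI cut-off for obesity (30 kg/m^2, as in the abstract).
    """
    tp = fp = fn = 0
    for flagged, bmi in zip(claims_flags, emr_bmi):
        truly_obese = bmi >= threshold
        if flagged and truly_obese:
            tp += 1  # code present and EMR confirms obesity
        elif flagged and not truly_obese:
            fp += 1  # code present but EMR BMI is below the threshold
        elif truly_obese:
            fn += 1  # EMR shows obesity but no code was recorded
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


# Hypothetical toy data, for illustration only.
claims_flags = [True, False, False, True, False, False]
emr_bmi = [34.2, 31.0, 27.5, 29.1, 36.8, 22.4]
sens, ppv = sensitivity_and_ppv(claims_flags, emr_bmi)
print(f"sensitivity={sens:.3f}, PPV={ppv:.3f}")
```

In this toy example most truly obese people carry no code, so sensitivity is low even though half of the recorded codes are correct, which mirrors the pattern of high PPV and low sensitivity that the abstract reports.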

Mahfouz M, Mahfouz Y, Harmouche-Karaki M, Matta J, Younes H, Helou K, Finan R, Abi-Tayeh G, Meslimani M, Moussa G, Chahrour N, Osseiran C, Skaiki F, Narbonne JF. Utilizing machine learning to classify persistent organic pollutants in the serum of pregnant women: a predictive modeling approach. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2024;31:52980-52995. [PMID: 39168932 DOI: 10.1007/s11356-024-34684-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Accepted: 08/07/2024] [Indexed: 08/23/2024]

Abstract

Polychlorinated biphenyls (PCBs), organochlorine pesticides (OCPs), polychlorinated dibenzo-p-dioxins and polychlorinated dibenzofurans (PCDD/Fs), and per- and poly-fluoroalkyl substances (PFAS) are persistent organic pollutants (POPs) that remain detrimental to critical subpopulations, namely pregnant women. Required tests for biomonitoring are quite expensive. Moreover, statistical models aiming to discover the relationships between pollutant levels and human characteristics have their limitations. Therefore, the objective of this study is to use machine learning predictive models to further examine the pollutants' predictors and to compare the models. Levels of 33 congeners were measured in the serum of 269 pregnant women, from whom data were collected regarding sociodemographic, dietary, environmental, and anthropometric characteristics. Several machine learning algorithms were compared in Python for each pollutant: support vector machine (SVM), random forest, XGBoost, and neural networks. The aforementioned characteristics were included in the model as features. Prediction, accuracy, precision, recall, F1-score, area under the ROC curve (AUC), sensitivity, and specificity were retrieved to compare the models with one another and across pollutants. The highest performing model for all pollutants was Random Forest. Results showed a moderate to acceptable performance and discriminative power among all POPs, with the OCP model performing slightly better than all other models. Top related features for each model were also presented using SHAP analysis, detailing the predictors' negative or positive impact on the model. In conclusion, developing such a tool is of major importance in a context of limited financial and research resources. Nevertheless, machine learning models should always be interpreted with caution by exploring all evaluation metrics.

Affiliation(s)

Maya Mahfouz: Department of Nutrition, Faculty of Pharmacy, Medical Sciences Campus, Saint Joseph University of Beirut, Damascus Road, Riad Solh, P.O. Box 115076, Beirut, 1107 2180, Lebanon
Yara Mahfouz: Department of Nutrition, Faculty of Pharmacy, Medical Sciences Campus, Saint Joseph University of Beirut, Damascus Road, Riad Solh, P.O. Box 115076, Beirut, 1107 2180, Lebanon
Mireille Harmouche-Karaki: Department of Nutrition, Faculty of Pharmacy, Medical Sciences Campus, Saint Joseph University of Beirut, Damascus Road, Riad Solh, P.O. Box 115076, Beirut, 1107 2180, Lebanon
Joseph Matta: Industrial Research Institute, Lebanese University Campus, Baabda, Hadath, Lebanon, P.O. Box 112806
Hassan Younes: Institut Polytechnique UniLaSalle, Collège Santé, Equipe PANASH, Membre de l'ULR 7519, Université d'Artois, 19 Rue Pierre Waguet, 60026, Beauvais, France
Khalil Helou: Department of Nutrition, Faculty of Pharmacy, Medical Sciences Campus, Saint Joseph University of Beirut, Damascus Road, Riad Solh, P.O. Box 115076, Beirut, 1107 2180, Lebanon
Ramzi Finan: Hotel-Dieu de France, Saint Joseph University of Beirut Hospital, Blvd Alfred Naccache, Beirut, Lebanon, P.O. Box 166830
Georges Abi-Tayeh: Hotel-Dieu de France, Saint Joseph University of Beirut Hospital, Blvd Alfred Naccache, Beirut, Lebanon, P.O. Box 166830
Mohamad Meslimani: General Management, Chtoura Hospital, Beqaa, Lebanon
Ghada Moussa: Department of Obstetrics and Gynecology, Chtoura Hospital, Beqaa, Lebanon
Nada Chahrour: Department of Obstetrics and Gynecology, SRH University Hospital, Nabatieh, Lebanon
Camille Osseiran: Department of Obstetrics and Gynecology, Kassab Hospital, Saida, Lebanon
Farouk Skaiki: Department of Molecular Biology, General Management, Al Karim Medical Laboratories, Saida, Lebanon
Jean-François Narbonne: Laboratoire de Physico-Toxico Chimie Des Systèmes Naturels, University of Bordeaux, 33405, Talence, CEDEX, France

Oss Boll H, Amirahmadi A, Ghazani MM, Morais WOD, Freitas EPD, Soliman A, Etminani F, Byttner S, Recamonde-Mendoza M. Graph neural networks for clinical risk prediction based on electronic health records: A survey. J Biomed Inform 2024;151:104616. [PMID: 38423267 DOI: 10.1016/j.jbi.2024.104616] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2023] [Revised: 02/21/2024] [Accepted: 02/23/2024] [Indexed: 03/02/2024]

Bali V, Turzhitsky V, Schelfhout J, Paudel M, Hulbert E, Peterson-Brandt J, Hertzberg J, Kelly NR, Patel RH. Machine learning to identify chronic cough from administrative claims data. Sci Rep 2024;14:2449. [PMID: 38291064 PMCID: PMC10828499 DOI: 10.1038/s41598-024-51522-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Accepted: 01/06/2024] [Indexed: 02/01/2024] Open

Jadhav P, Sears T, Floan G, Joskowitz K, Nienow S, Cruz S, David M, de Cos V, Choi P, Ignacio RC. Application of a Machine Learning Algorithm in Prediction of Abusive Head Trauma in Children. J Pediatr Surg 2024;59:80-85. [PMID: 37858394 DOI: 10.1016/j.jpedsurg.2023.09.027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/26/2023] [Accepted: 09/07/2023] [Indexed: 10/21/2023]

Abstract

PURPOSE

We explored the application of a machine learning algorithm for the timely detection of potential abusive head trauma (AHT) using the first free-text note of an encounter and demographic information.

METHODS

First free-text physician notes and demographic information were collected for children under 5 years of age at a Level 1 Trauma Center. The control group, which included patients with head/neck injury, was compared to those with AHT diagnosed by the Child Protective Team. Differential scores accounted for words overrepresented in AHT patient vs. control notes. Sentiment scores were reflective of note positivity/negativity and subjectivity scores accounted for note subjectivity/objectivity. The composite scores reflected the patient's differential score modified by the subjectivity score. Composite, sentiment, and subjectivity scores combined with demographic information trained a Random Forest (RF) machine learning algorithm to predict AHT.

RESULTS

Final composite scores with demographic information were highly associated with AHT in a test dataset. The control group included 587 patients and the test group included 193 patients. Combining composite scores with demographic information into the RF model improved AHT classification area under the curve (AUC) from 0.68 to 0.78, with an overall accuracy of 84%. Feature importance analysis of our RF model revealed that composite score, sentiment, age, and subjectivity were the most impactful predictors of AHT. The sentiment was not significantly different between control and AHT notes (p = 0.87), while subjectivity trended higher for AHT notes (p = 0.081).

CONCLUSION

We conclude that a machine learning algorithm can recognize patterns within free-text notes and demographic information that aid in AHT detection in children.

LEVEL OF EVIDENCE

III.

Paramasivam G, Sanmugam A, Palem VV, Sevanan M, Sairam AB, Nachiappan N, Youn B, Lee JS, Nallal M, Park KH. Nanomaterials for detection of biomolecules and delivering therapeutic agents in theragnosis: A review. Int J Biol Macromol 2024;254:127904. [PMID: 37939770 DOI: 10.1016/j.ijbiomac.2023.127904] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2023] [Revised: 10/30/2023] [Accepted: 11/03/2023] [Indexed: 11/10/2023]

Sampa MB, Biswas T, Rahman MS, Aziz NHBA, Hossain MN, Aziz NAA. A Machine Learning Web App to Predict Diabetic Blood Glucose Based on a Basic Noninvasive Health Checkup, Sociodemographic Characteristics, and Dietary Information: Case Study. JMIR Diabetes 2023;8:e49113. [PMID: 37999944 DOI: 10.2196/49113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Revised: 09/28/2023] [Accepted: 10/11/2023] [Indexed: 11/25/2023] Open

Abstract

BACKGROUND

Over the past few decades, diabetes has become a serious public health concern worldwide, particularly in Bangladesh. Advances in artificial intelligence can be harnessed to predict blood glucose levels for better health management. However, the practical validity of machine learning (ML) techniques for predicting health parameters using data from low- and middle-income countries, such as Bangladesh, is very low. Specifically, Bangladesh lacks research using ML techniques to predict blood glucose levels based on basic noninvasive clinical measurements and dietary and sociodemographic information.

OBJECTIVE

To formulate strategies for public health planning and the control of diabetes, this study aimed to develop a personalized ML model that predicts the blood glucose level of urban corporate workers in Bangladesh.

METHODS

Based on the basic noninvasive health checkup test results, dietary information, and sociodemographic characteristics of 271 employees of the Bangladeshi Grameen Bank complex, 5 well-known ML models, namely, linear regression, boosted decision tree regression, neural network, decision forest regression, and Bayesian linear regression, were used to predict blood glucose levels. Continuous blood glucose data were used in this study to train the models, and the trained models were then used to predict new blood glucose values.

RESULTS

Boosted decision tree regression demonstrated the greatest predictive performance of all evaluated models (root mean squared error=2.30). This means that, on average, our model's predicted blood glucose level deviated from the actual blood glucose level by around 2.30 mg/dL. The mean blood glucose value of the population studied was 128.02 mg/dL (SD 56.92), indicating a borderline result for the majority of the samples (normal value: 140 mg/dL). This suggests that the individuals should be monitoring their blood glucose levels regularly.

CONCLUSIONS

This ML-enabled web application for blood glucose prediction helps individuals to self-monitor their health condition. The application was developed with communities in remote areas of low- and middle-income countries, such as Bangladesh, in mind. These areas typically lack health facilities and have an insufficient number of qualified doctors and nurses. The web-based application is a simple, practical, and effective solution that can be adopted by the community. Use of the web application can save money on medical expenses, time, and health management expenses. The created system also aids in achieving the Sustainable Development Goals, particularly in ensuring that everyone in the community enjoys good health and well-being and lowering total morbidity and mortality.
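The root mean squared error cited in this abstract can be illustrated with a minimal sketch. The glucose values below are hypothetical toy numbers, not study data, and the helper names are illustrative. The mean absolute error is computed alongside for contrast, because RMSE weights large errors more heavily and is therefore not identical to the average deviation.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mae(actual, predicted):
    """Mean absolute error, shown only for comparison with RMSE."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical blood glucose values in mg/dL, for illustration only.
actual = [100.0, 120.0, 140.0, 160.0]
predicted = [102.0, 118.0, 143.0, 159.0]

rmse_value = rmse(actual, predicted)
mae_value = mae(actual, predicted)
print(f"RMSE={rmse_value:.3f} mg/dL, MAE={mae_value:.3f} mg/dL")
```

On these toy values the RMSE is slightly larger than the MAE, since the single 3 mg/dL error contributes disproportionately once squared.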

Zhou D, Xie J, Wang J, Zong J, Fang Q, Luo F, Zhang T, Ma H, Cao L, Yin H, Yin S, Li S. Establishment of a differential diagnosis method and an online prediction platform for AOSD and sepsis based on gradient boosting decision trees algorithm. Arthritis Res Ther 2023;25:220. [PMID: 37974244 PMCID: PMC10652592 DOI: 10.1186/s13075-023-03207-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2023] [Accepted: 11/03/2023] [Indexed: 11/19/2023] Open

Winkelman J, Nguyen D, vanSonnenberg E, Kirk A, Lieberman S. Artificial Intelligence (AI) in pediatric endocrinology. J Pediatr Endocrinol Metab 2023;36:903-908. [PMID: 37589444 DOI: 10.1515/jpem-2023-0287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2023] [Accepted: 08/03/2023] [Indexed: 08/18/2023]

Fraile Navarro D, Ijaz K, Rezazadegan D, Rahimi-Ardabili H, Dras M, Coiera E, Berkovsky S. Clinical named entity recognition and relation extraction using natural language processing of medical free text: A systematic review. Int J Med Inform 2023;177:105122. [PMID: 37295138 DOI: 10.1016/j.ijmedinf.2023.105122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2022] [Revised: 04/14/2023] [Accepted: 06/03/2023] [Indexed: 06/12/2023]

Abstract

BACKGROUND

Natural Language Processing (NLP) applications have developed over the past years in various fields, including the application of NLP to clinical free text for named entity recognition and relation extraction. However, developments over the last few years have been so rapid that there is currently no overview of them. Moreover, it is unclear how these models and tools have been translated into clinical practice. We aim to synthesize and review these developments.

METHODS

We reviewed literature from 2010 to date, searching PubMed, Scopus, the Association for Computational Linguistics (ACL), and Association for Computing Machinery (ACM) libraries for studies of NLP systems performing general-purpose (i.e., not disease- or treatment-specific) information extraction and relation extraction tasks in unstructured clinical text (e.g., discharge summaries).

RESULTS

The review included 94 studies, 30 of which were published in the last three years. Machine learning methods were used in 68 studies, rule-based in 5 studies, and both in 22 studies. 63 studies focused on Named Entity Recognition, 13 on Relation Extraction and 18 performed both. The most frequently extracted entities were "problem", "test" and "treatment". 72 studies used public datasets and 22 studies used proprietary datasets alone. Only 14 studies clearly defined a clinical or information task to be addressed by the system and just three studies reported use of the system outside the experimental setting. Only 7 studies shared a pre-trained model and only 8 an available software tool.

DISCUSSION

Machine learning-based methods have dominated the NLP field on information extraction tasks. More recently, Transformer-based language models are taking the lead and showing the strongest performance. However, these developments are mostly based on a few datasets and generic annotations, with very few real-world use cases. This may raise questions about the generalizability of findings and their translation into practice, and it highlights the need for robust clinical evaluation.

Sajjadi SF, Sacre JW, Chen L, Wild SH, Shaw JE, Magliano DJ. Algorithms to define diabetes type using data from administrative databases: A systematic review of the evidence. Diabetes Res Clin Pract 2023;203:110859. [PMID: 37517777 DOI: 10.1016/j.diabres.2023.110859] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2023] [Revised: 07/06/2023] [Accepted: 07/28/2023] [Indexed: 08/01/2023]

Wu Y, Min H, Li M, Shi Y, Ma A, Han Y, Gan Y, Guo X, Sun X. Effect of Artificial Intelligence-based Health Education Accurately Linking System (AI-HEALS) for Type 2 diabetes self-management: protocol for a mixed-methods study. BMC Public Health 2023;23:1325. [PMID: 37434126 DOI: 10.1186/s12889-023-16066-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Accepted: 06/06/2023] [Indexed: 07/13/2023] Open

Abstract

BACKGROUND

Patients with type 2 diabetes (T2DM) have an increasing need for personalized and precise management as medical technology advances. Artificial intelligence (AI) technologies on mobile devices are being developed gradually in a variety of healthcare fields. As an AI field, knowledge graph (KG) is being developed to extract and store structured knowledge from massive data sets. It has great prospects for T2DM medical information retrieval, clinical decision-making, and individual intelligent question and answering (QA), but has yet to be thoroughly researched in T2DM intervention. Therefore, we designed an artificial intelligence-based health education accurately linking system (AI-HEALS) to evaluate whether the AI-HEALS-based intervention could help patients with T2DM improve their self-management abilities and blood glucose control in primary healthcare.

METHODS

This is a nested mixed-method study that includes a community-based cluster-randomized control trial and personal in-depth interviews. Individuals with T2DM between the ages of 18 and 75 will be recruited from 40-45 community health centers in Beijing, China. Participants will either receive standard diabetes primary care (SDPC) (control, 3 months) or SDPC plus AI-HEALS online health education program (intervention, 3 months). The AI-HEALS runs in the WeChat service platform, which includes a KBQA, a system of physiological indicators and lifestyle recording and monitoring, medication and blood glucose monitoring reminders, and automated, personalized message sending. Data on sociodemography, medical examination, blood glucose, and self-management behavior will be collected at baseline, as well as 1, 3, 6, 12, and 18 months later. The primary outcome is the reduction in HbA1c levels. Secondary outcomes include changes in self-management behavior, social cognition, psychology, T2DM skills, and health literacy. Furthermore, the cost-effectiveness of the AI-HEALS-based intervention will be evaluated.

DISCUSSION

The KBQA system is an innovative and cost-effective technology for health education and promotion for T2DM patients, but it is not yet widely used in T2DM interventions. This trial will provide evidence on the efficacy of AI and mHealth-based personalized interventions in primary care for improving T2DM outcomes and self-management behaviors.

TRIAL REGISTRATION

Biomedical Ethics Committee of Peking University: IRB00001052-22,058, 2022/06/06; Clinical Trials: ChiCTR2300068952, 02/03/2023.

Ying W. Phenomic Studies on Diseases: Potential and Challenges. PHENOMICS (CHAM, SWITZERLAND) 2023;3:285-299. [PMID: 36714223 PMCID: PMC9867904 DOI: 10.1007/s43657-022-00089-4] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Received: 03/28/2021] [Revised: 11/21/2022] [Accepted: 11/24/2022] [Indexed: 01/23/2023]

Yun K, He T, Zhen S, Quan M, Yang X, Man D, Zhang S, Wang W, Han X. Development and validation of explainable machine-learning models for carotid atherosclerosis early screening. J Transl Med 2023;21:353. [PMID: 37246225 DOI: 10.1186/s12967-023-04093-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2022] [Accepted: 03/28/2023] [Indexed: 05/30/2023] Open

Abstract

BACKGROUND

Carotid atherosclerosis (CAS), an important factor in the development of stroke, is a major public health concern. The aim of this study was to establish and validate machine learning (ML) models for early screening of CAS using routine health check-up indicators in northeast China.

METHODS

A total of 69,601 health check-up records from the health examination center of the First Hospital of China Medical University (Shenyang, China) were collected between 2018 and 2019. For the 2019 records, 80% were assigned to the training set and 20% to the testing set. The 2018 records were used as the external validation dataset. Ten ML algorithms, including decision tree (DT), K-nearest neighbors (KNN), logistic regression (LR), naive Bayes (NB), random forest (RF), multilayer perceptron (MLP), extreme gradient boosting machine (XGB), gradient boosting decision tree (GBDT), linear support vector machine (SVM-linear), and non-linear support vector machine (SVM-nonlinear), were used to construct CAS screening models. The area under the receiver operating characteristic curve (auROC) and precision-recall curve (auPR) were used as measures of model performance. The SHapley Additive exPlanations (SHAP) method was used to demonstrate the interpretability of the optimal model.

RESULTS

A total of 6315 records of patients undergoing carotid ultrasonography were collected; of these, 1632, 407, and 1141 patients were diagnosed with CAS in the training, internal validation, and external validation datasets, respectively. The GBDT model achieved the highest performance metrics with auROC of 0.860 (95% CI 0.839-0.880) in the internal validation dataset and 0.851 (95% CI 0.837-0.863) in the external validation dataset. Individuals with diabetes or those over 65 years of age showed low negative predictive value. In the interpretability analysis, age was the most important factor influencing the performance of the GBDT model, followed by sex and non-high-density lipoprotein cholesterol.

CONCLUSIONS

The ML models developed could provide good performance for CAS identification using routine health check-up indicators and could hopefully be applied in scenarios without ethnic and geographic heterogeneity for CAS prevention.
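The data partition described in this abstract (2019 records split 80/20 into training and testing sets, 2018 records held out for external validation) can be sketched as follows. The record layout, the `year` field, the function name, and the random seed are assumptions made for illustration; the abstract does not state how the split was implemented.

```python
import random


def split_by_year(records, train_fraction=0.8, seed=0):
    """Split records into training, internal testing, and external validation sets.

    Records from 2019 are shuffled and divided by train_fraction;
    records from 2018 are kept apart as the external validation set.
    """
    internal = [r for r in records if r["year"] == 2019]
    external = [r for r in records if r["year"] == 2018]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(internal)
    cut = int(len(internal) * train_fraction)
    return internal[:cut], internal[cut:], external


# Hypothetical toy records, for illustration only.
records = [{"id": i, "year": 2019} for i in range(10)]
records += [{"id": i, "year": 2018} for i in range(10, 14)]

train, test, external = split_by_year(records)
print(len(train), len(test), len(external))
```

Holding out an entire calendar year keeps the external validation set separate from the records used for fitting, which a purely random split over all records would not do.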

Affiliation(s)

Ke Yun: National Clinical Research Center for Laboratory Medicine, The First Affiliated Hospital of China Medical University, Shenyang, Liaoning Province, China; Department of Laboratory Medicine, The First Affiliated Hospital of China Medical University, Shenyang, Liaoning Province, China
Tao He: Neusoft Research Institute, Neusoft Corporation, Shenyang, Liaoning Province, China
Shi Zhen: Department of Software Engineering, Northeastern University, Shenyang, Liaoning Province, China
Meihui Quan: National Clinical Research Center for Laboratory Medicine, The First Affiliated Hospital of China Medical University, Shenyang, Liaoning Province, China; Department of Laboratory Medicine, The First Affiliated Hospital of China Medical University, Shenyang, Liaoning Province, China
Xiaotao Yang: National Clinical Research Center for Laboratory Medicine, The First Affiliated Hospital of China Medical University, Shenyang, Liaoning Province, China; Department of Laboratory Medicine, The First Affiliated Hospital of China Medical University, Shenyang, Liaoning Province, China
Dongliang Man: National Clinical Research Center for Laboratory Medicine, The First Affiliated Hospital of China Medical University, Shenyang, Liaoning Province, China; Department of Laboratory Medicine, The First Affiliated Hospital of China Medical University, Shenyang, Liaoning Province, China
Shuang Zhang: National Clinical Research Center for Laboratory Medicine, The First Affiliated Hospital of China Medical University, Shenyang, Liaoning Province, China; Department of Laboratory Medicine, The First Affiliated Hospital of China Medical University, Shenyang, Liaoning Province, China
Wei Wang: Department of Physical Examination Center, The First Affiliated Hospital of China Medical University, Shenyang, Liaoning Province, China
Xiaoxu Han: National Clinical Research Center for Laboratory Medicine, The First Affiliated Hospital of China Medical University, Shenyang, Liaoning Province, China; Department of Laboratory Medicine, The First Affiliated Hospital of China Medical University, Shenyang, Liaoning Province, China; Laboratory Medicine Innovation Unit, Chinese Academy of Medical Sciences, Shenyang, Liaoning Province, China; NHC Key Laboratory of AIDS Immunology (China Medical University), The First Affiliated Hospital of China Medical University, Shenyang, Liaoning Province, China

Banaye Yazdipour A, Masoorian H, Ahmadi M, Mohammadzadeh N, Ayyoubzadeh SM. Predicting the toxicity of nanoparticles using artificial intelligence tools: a systematic review. Nanotoxicology 2023;17:62-77. [PMID: 36883698 DOI: 10.1080/17435390.2023.2186279] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 03/09/2023]

Hossain E, Rana R, Higgins N, Soar J, Barua PD, Pisani AR, Turner K. Natural Language Processing in Electronic Health Records in relation to healthcare decision-making: A systematic review. Comput Biol Med 2023;155:106649. [PMID: 36805219 DOI: 10.1016/j.compbiomed.2023.106649] [Citation(s) in RCA: 28] [Impact Index Per Article: 28.0] [Received: 10/03/2022] [Revised: 01/04/2023] [Accepted: 02/07/2023] [Indexed: 02/12/2023]

Dweekat OY, Lam SS. Optimized design of hybrid genetic algorithm with multilayer perceptron to predict patients with diabetes. Soft comput 2023. [DOI: 10.1007/s00500-023-07876-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2023]

Keller MS, Qureshi N, Albertson E, Pevnick J, Brandt N, Bui A, Sarkisian CA. Comparing risk prediction models aimed at predicting hospitalizations for adverse drug events in community dwelling older adults: a protocol paper. RESEARCH SQUARE 2023:rs.3.rs-2429369. [PMID: 36711695 PMCID: PMC9882666 DOI: 10.21203/rs.3.rs-2429369/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2023]

Hospital selection framework for remote MCD patients based on fuzzy q-rung orthopair environment. Neural Comput Appl 2023;35:6185-6196. [PMID: 36415285 PMCID: PMC9672551 DOI: 10.1007/s00521-022-07998-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2022] [Accepted: 10/25/2022] [Indexed: 11/18/2022]

Zhang S, Yin Q, Wang J. Elevator dynamic monitoring and early warning system based on machine learning algorithm. IET NETWORKS 2022. [DOI: 10.1049/ntw2.12077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2022]

Hahn SJ, Kim S, Choi YS, Lee J, Kang J. Prediction of type 2 diabetes using genome-wide polygenic risk score and metabolic profiles: A machine learning analysis of population-based 10-year prospective cohort study. EBioMedicine 2022;86:104383. [PMID: 36462406 PMCID: PMC9713286 DOI: 10.1016/j.ebiom.2022.104383] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 07/06/2022] [Revised: 11/09/2022] [Accepted: 11/09/2022] [Indexed: 12/05/2022] Open

Abstract

BACKGROUND

Previous work on predicting type 2 diabetes by integrating clinical and genetic factors has mostly focused on the Western population. In this study, we use genome-wide polygenic risk score (gPRS) and serum metabolite data for type 2 diabetes risk prediction in the Asian population.

METHODS

Data of 1425 participants from the Korean Genome and Epidemiology Study (KoGES) Ansan-Ansung cohort were used in this study. For gPRS analysis, genotypic and clinical information from KoGES health examinee (n = 58,701) and KoGES cardiovascular disease association (n = 8105) sub-cohorts were included. Linkage disequilibrium analysis identified 239,062 genetic variants that were used to determine the gPRS, while the metabolites were selected using the Boruta algorithm. We used bootstrapped cross-validation to evaluate logistic regression and random forest (RF)-based machine learning models. Finally, associations of gPRS and selected metabolites with the values of homeostatic model assessment of beta-cell function (HOMA-B) and insulin resistance (HOMA-IR) were further estimated.

FINDINGS

During the follow-up period (8.3 ± 2.8 years), 331 participants (23.2%) were diagnosed with type 2 diabetes. The areas under the curves of the RF-based models were 0.844, 0.876, and 0.883 for the model using only demographic and clinical factors, model including the gPRS, and model with both gPRS and metabolites, respectively. Incorporation of additional parameters in the latter two models improved the classification by 11.7% and 4.2% respectively. While gPRS was significantly associated with HOMA-B value, most metabolites had a significant association with HOMA-IR value.

INTERPRETATION

Incorporating both gPRS and metabolite data led to enhanced type 2 diabetes risk prediction by capturing distinct etiologies of type 2 diabetes development. An RF-based model using clinical factors, gPRS, and metabolites predicted type 2 diabetes risk more accurately than the logistic regression-based model.

FUNDING

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MEST) (No. 2019M3E5D1A02070863 and 2022R1C1C1005458). This work was also supported by the 2020 Research Fund (1.200098.01) of UNIST (Ulsan National Institute of Science & Technology).
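The areas under the curve reported in the findings above (0.844, 0.876, and 0.883) can be read as the probability that a randomly chosen participant who developed type 2 diabetes receives a higher predicted risk than a randomly chosen participant who did not. The following is a minimal sketch of that pairwise definition in plain Python. The function name `auc` and the toy labels and scores are hypothetical and are not taken from the study, whose models and evaluation pipeline are not described here in enough detail to reproduce.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 1 for participants who developed the outcome, 0 otherwise.
    scores: predicted risk, higher meaning more likely to develop the outcome.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Each (case, non-case) pair counts 1 if the case is ranked higher,
    # 0.5 if the two scores tie, and 0 otherwise.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, for illustration only.
toy_labels = [1, 1, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.3, 0.4]
print(auc(toy_labels, toy_scores))  # 0.75
```

The pairwise form is quadratic in the number of participants, which is acceptable for a sketch; a sort-based implementation would be the usual choice for a cohort of this size.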


Islam MM, Rahman MJ, Menhazul Abedin M, Ahammed B, Ali M, Ahmed NF, Maniruzzaman M. Identification of the risk factors of type 2 diabetes and its prediction using machine learning techniques. Health Syst (Basingstoke) 2022;12:243-254. [PMID: 37234468 PMCID: PMC10208154 DOI: 10.1080/20476965.2022.2141141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2020] [Accepted: 10/20/2022] [Indexed: 11/07/2022] Open

Liu F, Demosthenes P. Real-world data: a brief review of the methods, applications, challenges and opportunities. BMC Med Res Methodol 2022;22:287. [PMID: 36335315 PMCID: PMC9636688 DOI: 10.1186/s12874-022-01768-6] [Citation(s) in RCA: 86] [Impact Index Per Article: 43.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2022] [Accepted: 10/22/2022] [Indexed: 11/07/2022] Open

Xia S, Zhang Y, Peng B, Hu X, Zhou L, Chen C, Lu C, Chen M, Pang C, Dai Y, Ji J. Detection of mild cognitive impairment in type 2 diabetes mellitus based on machine learning using privileged information. Neurosci Lett 2022;791:136908. [DOI: 10.1016/j.neulet.2022.136908] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2022] [Revised: 09/28/2022] [Accepted: 10/04/2022] [Indexed: 01/21/2023]

Olusanya MO, Ogunsakin RE, Ghai M, Adeleke MA. Accuracy of Machine Learning Classification Models for the Prediction of Type 2 Diabetes Mellitus: A Systematic Survey and Meta-Analysis Approach. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2022;19:ijerph192114280. [PMID: 36361161 PMCID: PMC9655196 DOI: 10.3390/ijerph192114280] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/27/2022] [Revised: 10/22/2022] [Accepted: 10/25/2022] [Indexed: 05/13/2023]

Aplicaciones de aprendizaje automático en salud. REVISTA MÉDICA CLÍNICA LAS CONDES 2022. [DOI: 10.1016/j.rmclc.2022.10.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022] Open

Althomsons SP, Winglee K, Heilig CM, Talarico S, Silk B, Wortham J, Hill AN, Navin TR. Using Machine Learning Techniques and National Tuberculosis Surveillance Data to Predict Excess Growth in Genotyped Tuberculosis Clusters. Am J Epidemiol 2022;191:1936-1943. [PMID: 35780450 PMCID: PMC10790200 DOI: 10.1093/aje/kwac117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/01/2021] [Revised: 05/05/2022] [Accepted: 06/28/2022] [Indexed: 02/01/2023] Open

Kurasawa H, Waki K, Chiba A, Seki T, Hayashi K, Fujino A, Haga T, Noguchi T, Ohe K. Treatment Discontinuation Prediction in Patients With Diabetes Using a Ranking Model: Machine Learning Model Development. JMIR BIOINFORMATICS AND BIOTECHNOLOGY 2022;3:e37951. [PMID: 38935955 PMCID: PMC11135228 DOI: 10.2196/37951] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/28/2022] [Revised: 06/19/2022] [Accepted: 09/02/2022] [Indexed: 06/29/2024]

Abstract

BACKGROUND

Treatment discontinuation (TD) is one of the major prognostic issues in diabetes care, and several models have been proposed to predict a missed appointment that may lead to TD in patients with diabetes by using binary classification models for the early detection of TD and for providing intervention support for patients. However, as binary classification models output the probability of a missed appointment occurring within a predetermined period, they are limited in their ability to estimate the magnitude of TD risk in patients with inconsistent intervals between appointments, making it difficult to prioritize patients for whom intervention support should be provided.

OBJECTIVE

This study aimed to develop a machine-learned prediction model that can output a TD risk score defined by the length of time until TD and prioritize patients for intervention according to their TD risk.

METHODS

This model included patients with diagnostic codes indicative of diabetes at the University of Tokyo Hospital between September 3, 2012, and May 17, 2014. The model was internally validated with patients from the same hospital from May 18, 2014, to January 29, 2016. The data used in this study included 7551 patients who visited the hospital after January 1, 2004, and had diagnostic codes indicative of diabetes. In particular, data that were recorded in the electronic medical records between September 3, 2012, and January 29, 2016, were used. The main outcome was the TD of a patient, which was defined as missing a scheduled clinical appointment and having no hospital visits within 3 times the average number of days between the visits of the patient and within 60 days. The TD risk score was calculated by using the parameters derived from the machine-learned ranking model. The prediction capacity was evaluated by using test data with the C-index for the performance of ranking patients, area under the receiver operating characteristic curve, and area under the precision-recall curve for discrimination, in addition to a calibration plot.

RESULTS

The means (95% confidence limits) of the C-index, area under the receiver operating characteristic curve, and area under the precision-recall curve for the TD risk score were 0.749 (0.655, 0.823), 0.758 (0.649, 0.857), and 0.713 (0.554, 0.841), respectively. The observed and predicted probabilities were correlated with the calibration plots.

CONCLUSIONS

A TD risk score was developed for patients with diabetes by combining a machine-learned method with electronic medical records. The score calculation can be integrated into medical records to identify patients at high risk of TD, which would be useful in supporting diabetes care and preventing TD.
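The C-index used above to evaluate the ranking of patients can be illustrated with a short sketch. It assumes a Harrell-style concordance in which a higher risk score should correspond to a shorter time until treatment discontinuation, and in which a pair is comparable only when the patient with the shorter observed time actually experienced the event. The function name `c_index` and the toy data are hypothetical; the abstract does not state which C-index variant the authors used.

```python
from itertools import combinations


def c_index(times, events, scores):
    """Harrell-style concordance index.

    times:  observed days until the event or until censoring.
    events: 1 if the event (here, treatment discontinuation) was observed, 0 if censored.
    scores: risk scores, higher meaning the event is expected sooner.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let a be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # tied times or an early censored patient: pair not comparable
        comparable += 1
        if scores[a] > scores[b]:
            concordant += 1.0
        elif scores[a] == scores[b]:
            concordant += 0.5
    return concordant / comparable


# Hypothetical toy data, for illustration only.
toy_times = [30, 90, 200, 400]
toy_events = [1, 1, 0, 1]
toy_scores = [0.9, 0.4, 0.5, 0.1]
print(c_index(toy_times, toy_events, toy_scores))  # 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.749 should be read.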


Noori A, Magdamo C, Liu X, Tyagi T, Li Z, Kondepudi A, Alabsi H, Rudmann E, Wilcox D, Brenner L, Robbins GK, Moura L, Zafar S, Benson NM, Hsu J, Dickson JR, Serrano-Pozo A, Hyman BT, Blacker D, Westover MB, Mukerji SS, Das S. Development and Evaluation of a Natural Language Processing Annotation Tool to Facilitate Phenotyping of Cognitive Status in Electronic Health Records: Diagnostic Study. J Med Internet Res 2022;24:e40384. [PMID: 36040790 PMCID: PMC9472045 DOI: 10.2196/40384] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2022] [Revised: 07/29/2022] [Accepted: 07/31/2022] [Indexed: 11/13/2022] Open

Abstract

BACKGROUND

Electronic health records (EHRs) with large sample sizes and rich information offer great potential for dementia research, but current methods of phenotyping cognitive status are not scalable.

OBJECTIVE

The aim of this study was to evaluate whether natural language processing (NLP)-powered semiautomated annotation can improve the speed and interrater reliability of chart reviews for phenotyping cognitive status.

METHODS

In this diagnostic study, we developed and evaluated a semiautomated NLP-powered annotation tool (NAT) to facilitate phenotyping of cognitive status. Clinical experts adjudicated the cognitive status of 627 patients at Mass General Brigham (MGB) health care, using NAT or traditional chart reviews. Patient charts contained EHR data from two data sets: (1) records from January 1, 2017, to December 31, 2018, for 100 Medicare beneficiaries from the MGB Accountable Care Organization and (2) records from 2 years prior to COVID-19 diagnosis to the date of COVID-19 diagnosis for 527 MGB patients. All EHR data from the relevant period were extracted; diagnosis codes, medications, and laboratory test values were processed and summarized; clinical notes were processed through an NLP pipeline; and a web tool was developed to present an integrated view of all data. Cognitive status was rated as cognitively normal, cognitively impaired, or undetermined. Assessment time and interrater agreement of NAT compared to manual chart reviews for cognitive status phenotyping were evaluated.

RESULTS

NAT adjudication provided higher interrater agreement (Cohen κ=0.89 vs κ=0.80) and a significant speed-up (time difference mean 1.4, SD 1.3 minutes; P<.001; ratio median 2.2, min-max 0.4-20) over manual chart reviews. There was moderate agreement with manual chart reviews (Cohen κ=0.67). In the cases that exhibited disagreement with manual chart reviews, NAT adjudication was able to produce assessments that had broader clinical consensus due to its integrated view of highlighted relevant information and semiautomated NLP features.

CONCLUSIONS

NAT adjudication improves the speed and interrater reliability for phenotyping cognitive status compared to manual chart reviews. This study underscores the potential of an NLP-based clinically adjudicated method to build large-scale dementia research cohorts from EHRs.
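The interrater agreement figures above are Cohen κ values, which correct the raw agreement rate between two raters for the agreement expected by chance. The sketch below computes κ in plain Python for the three rating categories named in the abstract. The function name `cohen_kappa` and the ten example ratings are hypothetical and do not come from the study data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    n = len(rater_a)
    # Observed agreement: share of items on which both raters chose the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in labels) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical ratings for ten charts, for illustration only.
a = ["normal", "normal", "impaired", "impaired", "undetermined",
     "normal", "impaired", "normal", "normal", "impaired"]
b = ["normal", "normal", "impaired", "normal", "undetermined",
     "normal", "impaired", "normal", "impaired", "impaired"]
print(round(cohen_kappa(a, b), 3))  # 0.655
```

In this toy example the raters agree on 8 of 10 charts, but because 42% agreement would be expected by chance, κ is about 0.66 rather than 0.80.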


Affiliation(s)

Ayush Noori: Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
Colin Magdamo: Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
Xiao Liu: Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
Tanish Tyagi: Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
Zhaozhi Li: Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
Akhil Kondepudi: Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
Haitham Alabsi: Department of Neurology, Massachusetts General Hospital, Boston, MA, United States; Harvard Medical School, Boston, MA, United States
Emily Rudmann: Department of Neurology, Massachusetts General Hospital, Boston, MA, United States; Vaccine and Immunotherapy Center, Division of Infectious Disease, Boston, MA, United States
Douglas Wilcox: Department of Neurology, Massachusetts General Hospital, Boston, MA, United States; Harvard Medical School, Boston, MA, United States
Laura Brenner: Harvard Medical School, Boston, MA, United States; Division of Pulmonary and Critical Care Medicine, Massachusetts General Hospital, Boston, MA, United States
Gregory K Robbins: Harvard Medical School, Boston, MA, United States; Division of Infectious Diseases, Massachusetts General Hospital, Boston, MA, United States
Lidia Moura: Department of Neurology, Massachusetts General Hospital, Boston, MA, United States; Harvard Medical School, Boston, MA, United States
Sahar Zafar: Department of Neurology, Massachusetts General Hospital, Boston, MA, United States; Harvard Medical School, Boston, MA, United States
Nicole M Benson: Harvard Medical School, Boston, MA, United States; Mongan Institute, Massachusetts General Hospital, Boston, MA, United States; McLean Hospital, Belmont, MA, United States
John Hsu: Harvard Medical School, Boston, MA, United States; Mongan Institute, Massachusetts General Hospital, Boston, MA, United States
John R Dickson: Department of Neurology, Massachusetts General Hospital, Boston, MA, United States; Harvard Medical School, Boston, MA, United States
Alberto Serrano-Pozo: Department of Neurology, Massachusetts General Hospital, Boston, MA, United States; Harvard Medical School, Boston, MA, United States
Bradley T Hyman: Department of Neurology, Massachusetts General Hospital, Boston, MA, United States; Harvard Medical School, Boston, MA, United States
Deborah Blacker: Harvard Medical School, Boston, MA, United States; Department of Psychiatry, Massachusetts General Hospital, Boston, MA, United States
M Brandon Westover: Department of Neurology, Massachusetts General Hospital, Boston, MA, United States; Harvard Medical School, Boston, MA, United States
Shibani S Mukerji: Department of Neurology, Massachusetts General Hospital, Boston, MA, United States; Harvard Medical School, Boston, MA, United States; Division of Infectious Diseases, Massachusetts General Hospital, Boston, MA, United States
Sudeshna Das: Department of Neurology, Massachusetts General Hospital, Boston, MA, United States; Harvard Medical School, Boston, MA, United States


Thompson M, Hill BL, Rakocz N, Chiang JN, Geschwind D, Sankararaman S, Hofer I, Cannesson M, Zaitlen N, Halperin E. Methylation risk scores are associated with a collection of phenotypes within electronic health record systems. NPJ Genom Med 2022;7:50. [PMID: 36008412 PMCID: PMC9411568 DOI: 10.1038/s41525-022-00320-1] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2021] [Accepted: 07/18/2022] [Indexed: 12/20/2022] Open

Wang M, Lin Z, Li R, Li Y, Su J. Predicting disease progress with imprecise lab test results. Artif Intell Med 2022;132:102373. [DOI: 10.1016/j.artmed.2022.102373] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2021] [Revised: 05/16/2022] [Accepted: 07/28/2022] [Indexed: 11/02/2022]

Freda PJ, Kranzler HR, Moore JH. Novel digital approaches to the assessment of problematic opioid use. BioData Min 2022;15:14. [PMID: 35840990 PMCID: PMC9284824 DOI: 10.1186/s13040-022-00301-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2021] [Accepted: 06/30/2022] [Indexed: 11/16/2022] Open

Application of machine learning methods for the prediction of true fasting status in patients performing blood tests. Sci Rep 2022;12:11929. [PMID: 35831336 PMCID: PMC9279373 DOI: 10.1038/s41598-022-15161-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2021] [Accepted: 06/20/2022] [Indexed: 11/28/2022] Open

Yi H. Efficient machine learning algorithm for electroencephalogram modeling in brain–computer interfaces. Neural Comput Appl 2022. [DOI: 10.1007/s00521-020-04861-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]

Yiadom MYAB, Gong W, Patterson BW, Baugh CW, Mills AM, Gavin N, Podolsky SR, Salazar G, Mumma BE, Tanski M, Hadley K, Azzo C, Dorner SC, Ulintz A, Liu D. Fallacy of Median Door‐to‐ECG Time: Hidden Opportunities for STEMI Screening Improvement. J Am Heart Assoc 2022;11:e024067. [PMID: 35492001 PMCID: PMC9238601 DOI: 10.1161/jaha.121.024067] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]

Abstract

Background

ST‐segment elevation myocardial infarction (STEMI) guidelines recommend screening arriving emergency department (ED) patients for an early ECG in those with symptoms concerning for myocardial ischemia. Process measures target median door‐to‐ECG (D2E) time of 10 minutes.

Methods and Results

This 3‐year descriptive retrospective cohort study, including 676 ED‐diagnosed patients with STEMI from 10 geographically diverse facilities across the United States, examines an alternative approach to quantifying performance: proportion of patients meeting the goal of D2E≤10 minutes. We also identified characteristics associated with D2E>10 minutes and estimated the proportion of patients with screening ECG occurring during intake, triage, and main ED care periods. We found overall median D2E was 7 minutes (IQR:4–16; range: 0–1407 minutes; range of ED medians: 5–11 minutes). Proportion of patients with D2E>10 minutes was 37.9% (ED range: 21.5%–57.1%). Patients with D2E>10 minutes, compared to those with D2E≤10 minutes, were more likely female (32.8% versus 22.6%, P=0.005), Black (23.4% versus 12.4%, P=0.005), non‐English speaking (24.6% versus 19.5%, P=0.032), diabetic (40.2% versus 30.2%, P=0.010), and less frequently reported chest pain (63.3% versus 87.4%, P<0.001). ECGs were performed during ED intake in 62.1% of visits, ED triage in 25.3%, and main ED care in 12.6%.

Conclusions

Examining D2E>10 minutes can identify opportunities to improve care for more ED patients with STEMI. Our findings suggest sex, race, language, and diabetes are associated with STEMI diagnostic delays. Moving the acquisition of ECGs completed during triage to intake could achieve the D2E≤10 minutes goal for 87.4% of ED patients with STEMI. Sophisticated screening, accounting for differential risk and diversity in STEMI presentations, may further improve timely detection.
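The contrast drawn above between the median door-to-ECG time and the proportion of patients exceeding the 10-minute goal can be shown with a few lines of Python. The sketch computes both summaries for one list of times. The function name `d2e_summary` and the ten example values are hypothetical and are not data from the study.

```python
from statistics import median


def d2e_summary(minutes, goal=10):
    """Return (median time, share of patients whose time exceeds the goal)."""
    over_goal = sum(m > goal for m in minutes)
    return median(minutes), over_goal / len(minutes)


# Hypothetical door-to-ECG times in minutes, for illustration only.
toy_minutes = [3, 4, 5, 6, 7, 7, 8, 12, 25, 90]
med, share_over = d2e_summary(toy_minutes)
print(med, share_over)  # 7.0 0.3
```

In this toy example the median of 7 minutes meets the goal even though 30% of patients waited longer than 10 minutes, which is the pattern the abstract describes at a larger scale (median 7 minutes, 37.9% over the goal).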


Shao W, Luo X, Zhang Z, Han Z, Chandrasekaran V, Turzhitsky V, Bali V, Roberts AR, Metzger M, Baker J, La Rosa C, Weaver J, Dexter P, Huang K. Application of unsupervised deep learning algorithms for identification of specific clusters of chronic cough patients from EMR data. BMC Bioinformatics 2022;23:140. [PMID: 35439945 PMCID: PMC9019947 DOI: 10.1186/s12859-022-04680-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2022] [Accepted: 04/11/2022] [Indexed: 11/10/2022] Open

Tuppad A, Patil SD. Machine learning for diabetes clinical decision support: a review. ADVANCES IN COMPUTATIONAL INTELLIGENCE 2022;2:22. [PMID: 35434723 PMCID: PMC9006199 DOI: 10.1007/s43674-022-00034-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/17/2021] [Revised: 02/27/2022] [Accepted: 03/03/2022] [Indexed: 12/14/2022]

Abstract

Type 2 diabetes has recently acquired the status of an epidemic silent killer, though it is non-communicable. There are two main reasons behind this perception of the disease. First, a gradual but exponential growth in disease prevalence has been witnessed irrespective of age group, geography, or gender. Second, the disease dynamics are very complex in terms of the multifactorial risks involved, the initial asymptomatic period, the different short-term and long-term complications that pose serious health threats, and related co-morbidities. The majority of its risk factors are lifestyle habits such as physical inactivity, lack of exercise, high body mass index (BMI), poor diet, and smoking; the exceptions are unavoidable factors such as family history of diabetes, ethnic predisposition, and ageing. Machine learning (ML) is increasingly being applied to alleviate the health burden of diabetes, and many research works have been proposed in the literature to offer clinical decision support in different application areas. In this paper, we present a review of such efforts for the prevention and management of type 2 diabetes. First, we present the medical gaps in the diabetes knowledge base, guidelines, and medical practice identified from relevant articles and highlight those that can be addressed by ML. Further, we review the ML research works in three application areas: (1) risk assessment (statistical risk scores and ML-based risk models), (2) diagnosis (using non-invasive and invasive features), and (3) prognosis (from normoglycemia/prior morbidity to incident diabetes and prognosis of incident diabetes to related complications). We discuss and summarize the shortcomings or gaps in the existing ML methodologies for diabetes to be addressed in the future. This review provides the breadth of ML predictive modeling applications for diabetes while highlighting the medical and technological gaps as well as various aspects involved in ML-based diabetes clinical decision support.


Perez-Lebel A, Varoquaux G, Le Morvan M, Josse J, Poline JB. Benchmarking missing-values approaches for predictive models on health databases. Gigascience 2022;11:6568998. [PMID: 35426912 PMCID: PMC9012100 DOI: 10.1093/gigascience/giac013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2021] [Revised: 11/30/2021] [Accepted: 01/25/2022] [Indexed: 11/14/2022] Open

Abstract

BACKGROUND

As databases grow larger, it becomes harder to fully control their collection, and they frequently come with missing values. These large databases are well suited to train machine learning models, e.g., for forecasting or to extract biomarkers in biomedical settings. Such predictive approaches can use discriminative (rather than generative) modeling and thus open the door to new missing-values strategies. Yet existing empirical evaluations of strategies to handle missing values have focused on inferential statistics.

RESULTS

Here we conduct a systematic benchmark of missing-values strategies in predictive models with a focus on large health databases: 4 electronic health record datasets, 1 population brain imaging database, 1 health survey, and 2 intensive care surveys. Using gradient-boosted trees, we compare native support for missing values with simple and state-of-the-art imputation prior to learning. We investigate prediction accuracy and computational time. For prediction after imputation, we find that adding an indicator to express which values have been imputed is important, suggesting that the data are missing not at random. Elaborate missing-values imputation can improve prediction compared to simple strategies but requires longer computational time on large data. Learning trees that model missing values (with missing incorporated attribute) leads to robust, fast, and well-performing predictive modeling.

CONCLUSIONS

Native support for missing values in supervised machine learning predicts better than state-of-the-art imputation with much less computational cost. When using imputation, it is important to add indicator columns expressing which values have been imputed.
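The recommendation above to add indicator columns when imputing can be sketched as follows. The example uses column-mean imputation as a simple stand-in and appends one 0/1 indicator per feature that records whether the value was originally missing. The function name `impute_with_indicator` and the toy rows are hypothetical; this is not the benchmark's code, and it illustrates only the indicator advice, not the native missing-value handling in gradient-boosted trees that the abstract found to perform best.

```python
def impute_with_indicator(rows):
    """Mean-impute missing values (None) and append missingness indicators.

    rows: list of equal-length lists of numbers, with None marking a missing value.
    Returns new rows of the form [imputed values..., indicator flags...].
    Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    result = []
    for r in rows:
        values = [means[j] if r[j] is None else r[j] for j in range(n_cols)]
        flags = [1 if r[j] is None else 0 for j in range(n_cols)]
        result.append(values + flags)
    return result


# Hypothetical toy table with two features, for illustration only.
toy_rows = [[5.0, None], [7.0, 120.0], [None, 80.0]]
for row in impute_with_indicator(toy_rows):
    print(row)
# [5.0, 100.0, 0, 1]
# [7.0, 120.0, 0, 0]
# [6.0, 80.0, 1, 0]
```

In a real predictive setting the column means would be estimated on the training split only and then reused for the test split, so that no information leaks from held-out data.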

Cardozo G, Pintarelli GB, Andreis GR, Lopes ACW, Marques JLB. Use of Machine Learning and Routine Laboratory Tests for Diabetes Mellitus Screening. BIOMED RESEARCH INTERNATIONAL 2022;2022:8114049. [PMID: 35392258 PMCID: PMC8983182 DOI: 10.1155/2022/8114049] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/21/2021] [Revised: 02/18/2022] [Accepted: 03/10/2022] [Indexed: 12/28/2022]

Wu C, Zhou T, Tian Y, Wu J, Li J, Liu Z. A method for the early prediction of chronic diseases based on short sequential medical data. Artif Intell Med 2022;127:102262. [DOI: 10.1016/j.artmed.2022.102262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2020] [Revised: 02/18/2022] [Accepted: 02/23/2022] [Indexed: 11/30/2022]

Development of a visual attention based decision support system for autism spectrum disorder screening. Int J Psychophysiol 2022;173:69-81. [PMID: 35007668 DOI: 10.1016/j.ijpsycho.2022.01.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2021] [Revised: 12/14/2021] [Accepted: 01/04/2022] [Indexed: 11/24/2022]

Haneef R, Tijhuis M, Thiébaut R, Májek O, Pristaš I, Tolenan H, Gallay A. Methodological guidelines to estimate population-based health indicators using linked data and/or machine learning techniques. Arch Public Health 2022;80:9. [PMID: 34983651 PMCID: PMC8725299 DOI: 10.1186/s13690-021-00770-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2021] [Accepted: 12/17/2021] [Indexed: 12/23/2022] Open

Abstract

BACKGROUND

The capacity to use data linkage and artificial intelligence to estimate and predict health indicators varies across European countries. However, estimating health indicators from linked administrative data is challenging for several reasons: variability in data sources and data collection methods, which reduces interoperability at various levels and timeliness; the availability of a large number of variables; and a lack of skills and capacity to link and analyze big data. The main objective of this study is to develop methodological guidelines for calculating population-based health indicators, to guide European countries in using linked data and/or machine learning (ML) techniques with new methods.

METHOD

We systematically followed a step-wise approach to develop the methodological guidelines: (i) a scientific literature review, (ii) identification of inspiring examples from European countries, and (iii) development of a checklist of guideline contents.

RESULTS

We have developed the methodological guidelines, which provide a systematic approach for studies using linked data and/or ML-techniques to produce population-based health indicators. These guidelines include a detailed checklist of the following items: rationale and objective of the study (i.e., research question), study design, linked data sources, study population/sample size, study outcomes, data preparation, data analysis (i.e., statistical techniques, sensitivity analysis and potential issues during data analysis) and study limitations.

CONCLUSIONS

This is the first study to develop methodological guidelines for studies focused on population health using linked data and/or machine learning techniques. These guidelines would help researchers adopt and develop a systematic approach for high-quality research methods. There is a need for high-quality research methodologies using more linked data and ML-techniques to develop a structured cross-disciplinary approach for improving population health information and thereby population health.


McKnite AM, Job KM, Nelson R, Sherwin CM, Watt KM, Brewer SC. Medication based machine learning to identify subpopulations of pediatric hemodialysis patients in an electronic health record database. INFORMATICS IN MEDICINE UNLOCKED 2022;34. [PMID: 36405250 PMCID: PMC9674326 DOI: 10.1016/j.imu.2022.101104] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022] Open

Abstract

Electronic health records (EHRs) have given rise to large and complex databases of medical information that have the potential to become powerful tools for clinical research. However, differences in coding systems and the detail and accuracy of the information within EHRs can vary across institutions. This makes it challenging to identify subpopulations of patients and limits the widespread use of multi-institutional databases. In this study, we leveraged machine learning to identify patterns in medication usage among hospitalized pediatric patients receiving renal replacement therapy and created a predictive model that successfully differentiated between intermittent (iHD) and continuous renal replacement therapy (CRRT) hemodialysis patients. We trained six machine learning algorithms (logistic regression, Naïve Bayes, k-nearest neighbor, support vector machine, random forest, and gradient boosted trees) using patient records from a multi-center database (n = 533) and prescribed medication ingredients (n = 228) as features to discriminate between the two hemodialysis types. Predictive skill was assessed using a 5-fold cross-validation, and the algorithms showed a range of performance from 0.7 balanced accuracy (logistic regression) to 0.86 (random forest). The two best performing models were further tested using an independent single-center dataset and achieved 84–87% balanced accuracy. This model overcomes issues inherent within large databases and will allow us to utilize and combine historical records, significantly increasing population size and diversity within both iHD and CRRT populations for future clinical studies. Our work demonstrates the utility of using medications alone to accurately differentiate subpopulations of patients in large datasets, allowing codes to be transferred between different coding systems. This framework has the potential to be used to distinguish other subpopulations of patients where discriminatory ICD codes are not available, permitting more detailed insights and new lines of research.
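The evaluation described above, balanced accuracy under 5-fold cross-validation, can be sketched in plain Python. Balanced accuracy is the mean of the per-class recalls, so it is not inflated when one class dominates. The helper names `balanced_accuracy` and `k_fold_indices`, the unshuffled fold assignment, and the toy labels are all hypothetical choices for illustration; the abstract does not describe how the authors built their folds.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def k_fold_indices(n, k=5):
    """Split indices 0..n-1 into k (train, test) pairs without shuffling."""
    folds = []
    for f in range(k):
        test = list(range(f, n, k))  # every k-th index, starting at f
        held_out = set(test)
        train = [i for i in range(n) if i not in held_out]
        folds.append((train, test))
    return folds


# Hypothetical labels: 8 iHD and 2 CRRT patients, one CRRT patient misclassified.
y_true = ["iHD"] * 8 + ["CRRT"] * 2
y_pred = ["iHD"] * 8 + ["iHD", "CRRT"]
print(balanced_accuracy(y_true, y_pred))  # 0.75
print(len(k_fold_indices(10, k=5)))       # 5
```

Plain accuracy on the toy labels would be 0.9, while balanced accuracy is 0.75, which shows why the latter is the more informative summary when class sizes differ. In practice the indices would usually be shuffled, and often stratified by class, before being split into folds.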

Khademi F, Rabbani M, Motameni H, Akbari E. A weighted ensemble classifier based on WOA for classification of diabetes. Neural Comput Appl 2022. [DOI: 10.1007/s00521-021-06481-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]

Brady V, Whisenant M, Wang X, Ly VK, Zhu G, Aguilar D, Wu H. Characterization of Symptoms and Symptom Clusters for Type 2 Diabetes Using a Large Nationwide Electronic Health Record Database. Diabetes Spectr 2022;35:159-170. [PMID: 35668892 PMCID: PMC9160545 DOI: 10.2337/ds21-0064] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]

Different Data Mining Approaches Based Medical Text Data. J Healthc Eng 2021;2021:1285167. [PMID: 34912530 PMCID: PMC8668297 DOI: 10.1155/2021/1285167] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/02/2021] [Accepted: 11/18/2021] [Indexed: 12/15/2022]

Sharma T, Shah M. A comprehensive review of machine learning techniques on diabetes detection. Vis Comput Ind Biomed Art 2021;4:30. [PMID: 34862560 PMCID: PMC8642577 DOI: 10.1186/s42492-021-00097-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2021] [Accepted: 10/29/2021] [Indexed: 12/14/2022] Open

Tao K, Li J, Li J, Shan W, Yan H, Lu Y. Estimation of Heart Rate Using Regression Models and Artificial Neural Network in Middle-Aged Adults. Front Physiol 2021;12:742754. [PMID: 34658928 PMCID: PMC8514712 DOI: 10.3389/fphys.2021.742754] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2021] [Accepted: 09/07/2021] [Indexed: 11/13/2022] Open

Luo X, Gandhi P, Zhang Z, Shao W, Han Z, Chandrasekaran V, Turzhitsky V, Bali V, Roberts AR, Metzger M, Baker J, La Rosa C, Weaver J, Dexter P, Huang K. Applying interpretable deep learning models to identify chronic cough patients using EHR data. Comput Methods Programs Biomed 2021;210:106395. [PMID: 34525412 DOI: 10.1016/j.cmpb.2021.106395] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/05/2021] [Accepted: 08/30/2021] [Indexed: 06/13/2023]

Abstract

BACKGROUND AND OBJECTIVE

Chronic cough (CC) affects approximately 10% of adults. Many disease states are associated with chronic cough, such as asthma, upper airway cough syndrome, bronchitis, and gastroesophageal reflux disease. The lack of an ICD code specific for chronic cough makes it challenging to identify such patients from electronic health records (EHRs). For clinical and research purposes, computational methods using EHR data are urgently needed to identify chronic cough cases. This research aims to investigate the data representations and deep learning algorithms for chronic cough prediction.

METHODS

Utilizing real-world EHR data from a large academic healthcare system from October 2005 to September 2015, we investigated Natural Language Representation of the EHR data and systematically evaluated deep learning and traditional machine learning models to predict chronic cough patients. We built these machine learning models using structured data (medication and diagnosis) and unstructured data (clinical notes).

RESULTS

The sensitivity and specificity of a transformer-based deep learning algorithm, specifically BERT with attention model, were 0.856 and 0.866, respectively, using structured data (medication and diagnosis). Sensitivity and specificity improved to 0.952 and 0.930 when we combined structured data with symptoms extracted from clinical notes. We further found that the attention mechanism of deep learning models can be used to extract important features that drive the prediction decisions. Compared with our previously published rule-based algorithm, the deep learning algorithm can identify more chronic cough patients with structured data.
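For readers less familiar with the two metrics reported above, the following minimal sketch shows how sensitivity and specificity are derived from binary predictions. It is an illustration only: the function name and the toy labels are assumptions, and nothing here reproduces the study's models or data.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    sensitivity = TP / (TP + FN): share of true cases that are detected.
    specificity = TN / (TN + FP): share of non-cases correctly ruled out.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy labels: 1 = chronic cough case, 0 = non-case.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
```

Here three of four cases are detected (sensitivity 0.75) and five of six non-cases are correctly ruled out (specificity of about 0.83).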

CONCLUSIONS

By applying deep learning models, chronic cough patients can be reliably identified for prospective or retrospective research through medication and diagnosis data, widely available in EHR and electronic claims data, thus improving the generalizability of the patient identification algorithm. Deep learning models can identify chronic cough patients with even higher sensitivity and specificity when structured and unstructured EHR data are utilized. We anticipate language-based data representation and deep learning models developed in this research could also be productively used for other disease prediction and case identification.

Affiliation(s)

Xiao Luo Purdue School of Engineering and Technology, IUPUI, 799W Michigan St, Indianapolis, IN 46202, United States.
Priyanka Gandhi Purdue School of Engineering and Technology, IUPUI, 799W Michigan St, Indianapolis, IN 46202, United States.
Zuoyi Zhang Indiana University School of Medicine, 340W 10th St #6200, Indianapolis, IN 46202, United States.
Wei Shao Indiana University School of Medicine, 340W 10th St #6200, Indianapolis, IN 46202, United States.
Zhi Han Indiana University School of Medicine, 340W 10th St #6200, Indianapolis, IN 46202, United States; Regenstrief Institute, 1101W 10th Street, Indianapolis, IN, 46202, United States.
Vasu Chandrasekaran Center for Observational and Real-World Evidence, Merck & Co., Inc, 2000 Galloping Hill Rd, Kenilworth, NJ, 07033 United States.
Vladimir Turzhitsky Center for Observational and Real-World Evidence, Merck & Co., Inc, 2000 Galloping Hill Rd, Kenilworth, NJ, 07033 United States.
Vishal Bali Center for Observational and Real-World Evidence, Merck & Co., Inc, 2000 Galloping Hill Rd, Kenilworth, NJ, 07033 United States.
Anna R Roberts Regenstrief Institute, 1101W 10th Street, Indianapolis, IN, 46202, United States.
Megan Metzger Regenstrief Institute, 1101W 10th Street, Indianapolis, IN, 46202, United States.
Jarod Baker Regenstrief Institute, 1101W 10th Street, Indianapolis, IN, 46202, United States.
Carmen La Rosa Center for Observational and Real-World Evidence, Merck & Co., Inc, 2000 Galloping Hill Rd, Kenilworth, NJ, 07033 United States.
Jessica Weaver Center for Observational and Real-World Evidence, Merck & Co., Inc, 2000 Galloping Hill Rd, Kenilworth, NJ, 07033 United States.
Paul Dexter Indiana University School of Medicine, 340W 10th St #6200, Indianapolis, IN 46202, United States; Regenstrief Institute, 1101W 10th Street, Indianapolis, IN, 46202, United States; Eskenazi Health, 720 Eskenazi Ave, Indianapolis, IN 46202, United States.
Kun Huang Indiana University School of Medicine, 340W 10th St #6200, Indianapolis, IN 46202, United States; Regenstrief Institute, 1101W 10th Street, Indianapolis, IN, 46202, United States.

Estiri H, Strasser ZH, Murphy SN. High-throughput phenotyping with temporal sequences. J Am Med Inform Assoc 2021;28:772-781. [PMID: 33313899 DOI: 10.1093/jamia/ocaa288] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2020] [Accepted: 11/04/2020] [Indexed: 12/15/2022] Open

Abstract

OBJECTIVE

High-throughput electronic phenotyping algorithms can accelerate translational research using data from electronic health record (EHR) systems. The temporal information buried in EHRs is often underutilized in developing computational phenotypic definitions. This study aims to develop a high-throughput phenotyping method, leveraging temporal sequential patterns from EHRs.

MATERIALS AND METHODS

We develop a representation mining algorithm to extract 5 classes of representations from EHR diagnosis and medication records: the aggregated vector of the records (aggregated vector representation), the standard sequential patterns (sequential pattern mining), the transitive sequential patterns (transitive sequential pattern mining), and 2 hybrid classes. Using EHR data on 10 phenotypes from the Mass General Brigham Biobank, we train and validate phenotyping algorithms.
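The abstract does not define these representations in detail. The sketch below is a loose illustration under a stated assumption: that a "standard" sequence pairs each record with the one immediately after it, while a "transitive" sequence pairs each record with every later record. The function names, the record codes, and this reading of the terms are hypothetical and should not be taken as the authors' algorithm.

```python
from collections import Counter
from itertools import combinations


def ordered_codes(records):
    """Sort (ISO date, code) records chronologically and return the codes."""
    return [code for _, code in sorted(records)]


def aggregated_vector(records):
    """Count of each code, ignoring order (an aggregated representation)."""
    return Counter(code for _, code in records)


def adjacent_pairs(codes):
    """Pairs of immediately consecutive records (assumed 'standard' sequences)."""
    return set(zip(codes, codes[1:]))


def transitive_pairs(codes):
    """Every (earlier, later) pair of distinct codes (assumed 'transitive')."""
    return {(a, b) for a, b in combinations(codes, 2) if a != b}


# Hypothetical patient timeline; the codes are invented for illustration.
records = [
    ("2020-03-20", "rx:metformin"),
    ("2020-01-05", "dx:obesity"),
    ("2020-03-10", "dx:type2_diabetes"),
]

codes = ordered_codes(records)
counts = aggregated_vector(records)
standard = adjacent_pairs(codes)
transitive = transitive_pairs(codes)
```

Under this reading, the transitive set contains the pair (dx:obesity, rx:metformin) even though the two records are not adjacent, which is the kind of longer-range relationship an adjacency-only representation would miss.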

RESULTS

Phenotyping with temporal sequences resulted in a superior classification performance across all 10 phenotypes compared with the standard representations in electronic phenotyping. The high-throughput algorithm's classification performance was superior or similar to the performance of previously published electronic phenotyping algorithms. We characterize and evaluate the top transitive sequences of diagnosis records paired with the records of risk factors, symptoms, complications, medications, or vaccinations.

DISCUSSION

The proposed high-throughput phenotyping approach enables seamless discovery of sequential record combinations that may be difficult to assume from raw EHR data. Transitive sequences offer more accurate characterization of the phenotype, compared with its individual components, and reflect the actual lived experiences of the patients with that particular disease.

CONCLUSION

Sequential data representations provide a precise mechanism for incorporating raw EHR records into downstream machine learning. Our approach starts with user interpretability and works backward to the technology.
