Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Chung H, Ko H, Kang WS, Kim KW, Lee H, Park C, Song HO, Choi TY, Seo JH, Lee J. Prediction and Feature Importance Analysis for Severity of COVID-19 in South Korea Using Artificial Intelligence: Model Development and Validation. J Med Internet Res 2021;23:e27060. [PMID: 33764883 PMCID: PMC8057199 DOI: 10.2196/27060] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 01/09/2021] [Revised: 02/18/2021] [Accepted: 03/24/2021] [Indexed: 12/29/2022]

Cited by Other Article(s)

Zan J, Dong X, Yang H, Yan J, He Z, Tian J, Zhang Y. Application of the Unbalanced Ensemble Algorithm for Prognostic Prediction Outcomes of All-Cause Mortality in Coronary Heart Disease Patients Comorbid with Hypertension. Risk Manag Healthc Policy 2024;17:1921-1936. [PMID: 39135612 PMCID: PMC11317517 DOI: 10.2147/rmhp.s472398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Accepted: 07/24/2024] [Indexed: 08/15/2024]

Abstract

Purpose

This study sought to develop an unbalanced-ensemble model that could accurately predict death outcomes of patients with comorbid coronary heart disease (CHD) and hypertension and evaluate the factors contributing to death.

Patients and Methods

Medical records were collected for 1058 patients with coronary heart disease combined with hypertension, excluding those with acute coronary syndrome. Patients were followed up at the first, third, sixth, and twelfth months after discharge to record death events. Follow-up ended two years after discharge. Patients were divided into survival and nonsurvival groups. Using the medical records, gender, smoking, drinking, COPD, cerebral stroke, diabetes, hyperhomocysteinemia, heart failure, renal insufficiency, and other influencing factors were sorted and compared between the two groups, and feature selection was carried out to construct the models. Owing to class imbalance in the data, we developed four unbalanced-ensemble prediction models based on Balanced Random Forest (BRF), EasyEnsemble, RUSBoost, and SMOTEBoost, along with two base classification algorithms, AdaBoost and logistic regression. The hyperparameters of each model were optimised using GridSearchCV, and the models were evaluated using area under the curve (AUC), sensitivity, recall, Brier score, and geometric mean (G-mean). Additionally, to understand the influence of variables on model performance, we constructed a SHapley Additive exPlanations (SHAP) model based on the optimal model.

Results

There were significant differences in age, heart rate, COPD, cerebral stroke, heart failure and renal insufficiency in the nonsurvival group compared with the survival group. Among all models, BRF yielded the highest AUC (0.810; 95% CI, 0.778-0.839), sensitivity (0.990; 95% CI, 0.981-1.000), recall (0.990; 95% CI, 0.981-1.000), and G-mean (0.806; 95% CI, 0.778-0.827), and the lowest Brier score (0.181; 95% CI, 0.178-0.185). Therefore, we identified BRF as the optimal model. Furthermore, red blood cell count (RBC), body mass index (BMI), and lactate dehydrogenase were found to be important mortality-associated risk factors.

Conclusion

BRF combined with advanced machine learning methods and SHAP is highly effective and accurately predicts mortality in patients with CHD comorbid with hypertension. This model has the potential to assist clinicians in modifying treatment strategies to improve patient outcomes.
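Two of the evaluation metrics named in the Methods above, the Brier score and the G-mean, are less familiar than AUC. The following is a minimal sketch of how they are commonly defined, using only the Python standard library. The labels, probabilities, and the 0.5 decision threshold are made-up illustrative values, not data or settings from the study, and the study's own implementation is not described in the abstract.

```python
import math

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity for binary labels.

    Assumes both classes are present in y_true (otherwise a division by zero occurs).
    """
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)

# Hypothetical imbalanced toy data: 2 deaths (1) among 8 patients.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.4, 0.2, 0.6, 0.1, 0.3, 0.2, 0.1]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed threshold of 0.5

print("Brier score:", brier_score(y_true, y_prob))
print("G-mean:", g_mean(y_true, y_pred))
```

A lower Brier score indicates better-calibrated probabilities, while the G-mean rewards models that perform well on both the minority and the majority class, which is why it is often reported for imbalanced data.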

Kim J, Choi YS, Lee YJ, Yeo SG, Kim KW, Kim MS, Rahmati M, Yon DK, Lee J. Limitations of the Cough Sound-Based COVID-19 Diagnosis Artificial Intelligence Model and its Future Direction: Longitudinal Observation Study. J Med Internet Res 2024;26:e51640. [PMID: 38319694 PMCID: PMC10879967 DOI: 10.2196/51640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2023] [Revised: 11/10/2023] [Accepted: 01/02/2024] [Indexed: 02/07/2024]

Abstract

BACKGROUND

The outbreak of SARS-CoV-2 in 2019 has necessitated the rapid and accurate detection of COVID-19 to manage patients effectively and implement public health measures. Artificial intelligence (AI) models analyzing cough sounds have emerged as promising tools for large-scale screening and early identification of potential cases.

OBJECTIVE

This study aimed to investigate the efficacy of using cough sounds as a diagnostic tool for COVID-19, considering the unique acoustic features that differentiate positive and negative cases. We investigated whether an AI model trained on cough sound recordings from specific periods, especially the early stages of the COVID-19 pandemic, was applicable to the ongoing situation with persistent variants.

METHODS

We used cough sound recordings from 3 data sets (Cambridge, Coswara, and Virufy) representing different stages of the pandemic and variants. Our AI model was trained using the Cambridge data set with subsequent evaluation against all data sets. The performance was analyzed based on the area under the receiver operating characteristic curve (AUC) across different data measurement periods and COVID-19 variants.

RESULTS

The AI model demonstrated a high AUC when tested with the Cambridge data set, indicative of its initial effectiveness. However, the performance varied significantly with other data sets, particularly in detecting later variants such as Delta and Omicron, with a marked decline in AUC observed for the latter. These results highlight the challenges in maintaining the efficacy of AI models against the backdrop of an evolving virus.

CONCLUSIONS

While AI models analyzing cough sounds offer a promising noninvasive and rapid screening method for COVID-19, their effectiveness is challenged by the emergence of new virus variants. Ongoing research and adaptations in AI methodologies are crucial to address these limitations. The adaptability of AI models to evolve with the virus underscores their potential as a foundational technology for not only the current pandemic but also future outbreaks, contributing to a more agile and resilient global health infrastructure.
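The evaluation design described above (train on one data set, then compare AUC across data sets from different periods) can be sketched as follows. This is an illustrative sketch only: the data set names `early_period` and `later_period`, the labels, and the scores are hypothetical, and the AUC is computed with the pairwise-comparison definition rather than with whatever tooling the study used.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    Assumes at least one positive and one negative label.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical model scores on two evaluation sets (labels, scores).
datasets = {
    "early_period": ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),
    "later_period": ([1, 1, 0, 0], [0.6, 0.2, 0.5, 0.1]),
}

auc_by_dataset = {name: roc_auc(y, s) for name, (y, s) in datasets.items()}
for name, auc in auc_by_dataset.items():
    print(f"{name}: AUC = {auc:.2f}")
```

Reporting the metric separately per data set, instead of pooling all recordings, is what makes a decline on later-period data visible.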

Shi J, Bendig D, Vollmar HC, Rasche P. Mapping the Bibliometrics Landscape of AI in Medicine: Methodological Study. J Med Internet Res 2023;25:e45815. [PMID: 38064255 PMCID: PMC10746970 DOI: 10.2196/45815] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/20/2023] [Revised: 08/16/2023] [Accepted: 09/30/2023] [Indexed: 12/18/2023]

Abstract

BACKGROUND

Artificial intelligence (AI), conceived in the 1950s, has permeated numerous industries, intensifying in tandem with advancements in computing power. Despite the widespread adoption of AI, its integration into medicine trails other sectors. However, medical AI research has experienced substantial growth, attracting considerable attention from researchers and practitioners.

OBJECTIVE

In the absence of an existing framework, this study aims to outline the current landscape of medical AI research and provide insights into its future developments by examining all AI-related studies within PubMed over the past 2 decades. We also propose potential data acquisition and analysis methods, developed using Python (version 3.11) and to be executed in Spyder IDE (version 5.4.3), for future analogous research.

METHODS

Our dual-pronged approach involved (1) retrieving publication metadata related to AI from PubMed (spanning 2000-2022) via Python, including titles, abstracts, authors, journals, country, and publishing years, followed by keyword frequency analysis and (2) classifying relevant topics using latent Dirichlet allocation, an unsupervised machine learning approach, and defining the research scope of AI in medicine. In the absence of a universal medical AI taxonomy, we used an AI dictionary based on the European Commission Joint Research Centre AI Watch report, which emphasizes 8 domains: reasoning, planning, learning, perception, communication, integration and interaction, service, and AI ethics and philosophy.

RESULTS

From 2000 to 2022, a comprehensive analysis of 307,701 AI-related publications from PubMed highlighted a 36-fold increase. The United States emerged as a clear frontrunner, producing 68,502 of these articles. Despite its substantial contribution in terms of volume, China lagged in terms of citation impact. Diving into specific AI domains, as the Joint Research Centre AI Watch report categorized, the learning domain emerged dominant. Our classification analysis meticulously traced the nuanced research trajectories across each domain, revealing the multifaceted and evolving nature of AI's application in the realm of medicine.

CONCLUSIONS

The research topics have evolved as the volume of AI studies increases annually. Machine learning remains central to medical AI research, with deep learning expected to maintain its fundamental role. Empowered by predictive algorithms, pattern recognition, and imaging analysis capabilities, the future of AI research in medicine is anticipated to concentrate on medical diagnosis, robotic intervention, and disease management. Our topic modeling outcomes provide a clear insight into the focus of AI research in medicine over the past decades and lay the groundwork for predicting future directions. The domains that have attracted considerable research attention, primarily the learning domain, will continue to shape the trajectory of AI in medicine. Given the observed growing interest, the domain of AI ethics and philosophy also stands out as a prospective area of increased focus.
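The first prong of the Methods above, keyword frequency analysis over publication metadata, can be illustrated with a short standard-library sketch. The titles and the stopword list below are hypothetical placeholders, and the sketch does not reproduce the authors' PubMed retrieval or their latent Dirichlet allocation step.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stopword list for illustration.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with", "using"}

def keyword_frequencies(titles):
    """Count lowercase word tokens across titles, skipping stopwords."""
    counts = Counter()
    for title in titles:
        tokens = re.findall(r"[a-z]+", title.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts

# Hypothetical titles standing in for retrieved publication metadata.
titles = [
    "Deep learning for medical image analysis",
    "Machine learning in clinical diagnosis",
    "Deep learning and robotic surgery",
]

freq = keyword_frequencies(titles)
print(freq.most_common(3))
```

In a real pipeline the same counting would typically be run per publication year, so that shifts in dominant keywords over time become visible.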

Zhao Y, Chen Z, Jian X. A High-Generalizability Machine Learning Framework for Analyzing the Homogenized Properties of Short Fiber-Reinforced Polymer Composites. Polymers (Basel) 2023;15:3962. [PMID: 37836011 PMCID: PMC10575166 DOI: 10.3390/polym15193962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Revised: 09/25/2023] [Accepted: 09/25/2023] [Indexed: 10/15/2023]

Hida M, Imai R, Nakamura M, Nakao H, Kitagawa K, Wada C, Eto S, Takeda M, Imaoka M. Investigation of factors influencing low physical activity levels in community-dwelling older adults with chronic pain: a cross-sectional study. Sci Rep 2023;13:14062. [PMID: 37640818 PMCID: PMC10462701 DOI: 10.1038/s41598-023-41319-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2023] [Accepted: 08/24/2023] [Indexed: 08/31/2023]

Hosseinzadeh Kasani P, Lee JE, Park C, Yun CH, Jang JW, Lee SA. Evaluation of nutritional status and clinical depression classification using an explainable machine learning method. Front Nutr 2023;10:1165854. [PMID: 37229464 PMCID: PMC10203418 DOI: 10.3389/fnut.2023.1165854] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Accepted: 03/27/2023] [Indexed: 05/27/2023]

Abstract

Introduction

Depression is a prevalent disorder worldwide, with potentially severe implications. It contributes significantly to an increased risk of diseases associated with multiple risk factors. Early accurate diagnosis of depressive symptoms is a critical first step toward management, intervention, and prevention. Various nutritional and dietary compounds have been suggested to be involved in the onset, maintenance, and severity of depressive disorders. Despite the challenges to better understanding the association between nutritional risk factors and the occurrence of depression, assessing the interplay of these markers through supervised machine learning remains to be fully explored.

Methods

This study aimed to determine the ability of machine learning-based decision support methods to identify the presence of depression using publicly available health data from the Korean National Health and Nutrition Examination Survey. Two exploration techniques, namely, uniform manifold approximation and projection and Pearson correlation, were performed for explanatory analysis among datasets. A grid search optimization with cross-validation was performed to fine-tune the models for classifying depression with the highest accuracy. Several performance measures, including accuracy, precision, recall, F1 score, confusion matrix, areas under the precision-recall and receiver operating characteristic curves, and calibration plot, were used to compare classifier performances. We further investigated the importance of the features through visualized interpretation using ELI5, partial dependence plots, local interpretable model-agnostic explanations, and Shapley additive explanations for the prediction at both the population and individual levels.

Results

On the original dataset, the best results were an accuracy of 86.18% with XGBoost and an area under the curve of 84.96% with the random forest model; on the quantile-based dataset, the XGBoost algorithm achieved an accuracy of 86.02% and an area under the curve of 85.34%. The explainable results revealed a complementary observation of the relative changes in feature values, and, thus, the importance of emergent depression risks could be identified.

Discussion

The strength of our approach is the large sample size used for training with a fine-tuned model. The machine learning-based analysis showed that the hyper-tuned model has empirically higher accuracy in classifying patients with depressive disorder, as evidenced by the set of interpretable experiments, and can be an effective solution for disease control.

Paul SG, Saha A, Biswas AA, Zulfiker MS, Arefin MS, Rahman MM, Reza AW. Combating Covid-19 using machine learning and deep learning: Applications, challenges, and future perspectives. ARRAY 2023;17:100271. [PMID: 36530931 PMCID: PMC9737520 DOI: 10.1016/j.array.2022.100271] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/28/2022] [Revised: 12/05/2022] [Accepted: 12/07/2022] [Indexed: 12/14/2022]

Neves da Silva L, Domingues Fernandes R, Costa R, Oliveira A, Sá A, Mosca A, Oliveira B, Braga M, Mendes M, Carvalho A, Moreira P, Santa Cruz A. Prediction of Noninvasive Ventilation Failure in COVID-19 Patients: When Shall We Stop? Cureus 2022;14:e30599. [PMID: 36420242 PMCID: PMC9679987 DOI: 10.7759/cureus.30599] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/22/2022] [Indexed: 01/25/2023]

Abstract

INTRODUCTION

In coronavirus disease 2019 (COVID-19), there are no tools available for the difficult task of recognizing which patients do not benefit from maintaining respiratory support, such as noninvasive ventilation (NIV). Identifying treatment failure is crucial to provide the best possible care and optimize resources. Therefore, this study aimed to build a model that predicts NIV failure in patients who did not progress to invasive mechanical ventilation (IMV).

METHODS

This retrospective observational study included critical COVID-19 patients treated with NIV who did not progress to IMV. Patients were admitted to a Portuguese tertiary hospital between October 1, 2020, and March 31, 2021. The outcome of interest was NIV failure, defined as COVID-19-related in-hospital death. A binary logistic regression was performed, where the outcome (mortality) was the dependent variable. Using the independent variables of the logistic regression, a decision-tree classification model was implemented.

RESULTS

The study sample, composed of 103 patients, had a mean age of 66.3 years (SD=14.9), of whom 38.8% (40 patients) were female. Most patients (82.5%) were autonomous for basic activities of daily living. The prediction model was statistically significant with an area under the curve of 0.994 and a precision of 0.950. Higher age, a higher number of days with increases in the fraction of inspired oxygen (FiO2), a higher number of days of maximum expiratory positive airway pressure, a lower number of days on NIV, and a lower number of days from disease onset to hospital admission were, with statistical significance, associated with increased odds of death. A decision-tree classification model was then obtained to achieve the best combination of variables to predict the outcome of interest.

CONCLUSIONS

This study presents a model to predict death in COVID-19 patients treated with NIV who did not progress to IMV, based on easily applicable variables that mainly reflect patients' evolution during hospitalization. Along with the decision-tree classification model, these original findings may help clinicians define the best therapeutic approach to each patient, prioritizing life-comforting measures when adequate, and optimizing resources, which is crucial within limited or overloaded healthcare systems. Further research is needed on this subject of treatment failure, not only to understand if these results are reproducible but also, in a broader sense, to help fill this gap in modern medicine guidelines.

Healthcare Supply Chain Management under COVID-19 Settings: The Existing Practices in Hong Kong and the United States. Healthcare (Basel) 2022;10:healthcare10081549. [PMID: 36011207 PMCID: PMC9408565 DOI: 10.3390/healthcare10081549] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Revised: 08/11/2022] [Accepted: 08/15/2022] [Indexed: 11/18/2022]

Abstract

COVID-19 is recognized as an infectious disease caused by severe acute respiratory syndrome coronavirus 2. COVID-19 has rapidly spread all over the world within a short time period. Because the coronavirus pandemic has spread quickly worldwide, the impact on global healthcare systems and healthcare supply chain management has been profound. The COVID-19 outbreak has seriously influenced the routine and daily operations of healthcare facilities and the entire healthcare supply chain management and has brought about a public health crisis. As ensuring the availability of healthcare facilities during COVID-19 is crucial, the debate on how to take resilience actions for sustaining healthcare supply chain management has gained new momentum. Apart from the logistics of handling human remains in some countries, supplies within the communities are urgently needed for emergency response. This study focuses on a comprehensive evaluation of the current practices of healthcare supply chain management in Hong Kong and the United States under COVID-19 settings. A wide range of different aspects associated with healthcare supply chain operations are considered, including the best practices for using respirators, transport of life-saving medical supplies, contingency healthcare strategies, blood distribution, and best practices for using disinfectants, as well as human remains handling and logistics. The outcomes of the conducted research identify the existing healthcare supply chain trends in two major Eastern and Western regions of the world, Hong Kong and the United States, determine the key challenges, and propose some strategies that can improve the effectiveness of healthcare supply chain management under COVID-19 settings. The study highlights how to build resilient healthcare supply chain management preparedness for future emergencies.

Ng A, Wei B, Jain J, Ward EA, Tandon SD, Moskowitz JT, Krogh-Jespersen S, Wakschlag LS, Alshurafa N. Predicting the Next-Day Perceived and Physiological Stress of Pregnant Women by Using Machine Learning and Explainability: Algorithm Development and Validation. JMIR Mhealth Uhealth 2022;10:e33850. [PMID: 35917157 PMCID: PMC9382551 DOI: 10.2196/33850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2021] [Revised: 02/02/2022] [Accepted: 05/13/2022] [Indexed: 11/30/2022]

Abstract

Background

Cognitive behavioral therapy–based interventions are effective in reducing prenatal stress, which can have severe adverse health effects on mothers and newborns if unaddressed. Predicting next-day physiological or perceived stress can help to inform and enable pre-emptive interventions for a likely physiologically and perceptibly stressful day. Machine learning models are useful tools that can be developed to predict next-day physiological and perceived stress by using data collected from the previous day. Such models can improve our understanding of the specific factors that predict physiological and perceived stress and allow researchers to develop systems that collect selected features for assessment in clinical trials to minimize the burden of data collection.

Objective

The aim of this study was to build and evaluate a machine-learned model that predicts next-day physiological and perceived stress by using sensor-based, ecological momentary assessment (EMA)–based, and intervention-based features and to explain the prediction results.

Methods

We enrolled pregnant women into a prospective proof-of-concept study and collected electrocardiography, EMA, and cognitive behavioral therapy intervention data over 12 weeks. We used the data to train and evaluate 6 machine learning models to predict next-day physiological and perceived stress. After selecting the best performing model, Shapley Additive Explanations were used to identify the feature importance and explainability of each feature.

Results

A total of 16 pregnant women enrolled in the study. Overall, 4157.18 hours of data were collected, and participants answered 2838 EMAs. After applying feature selection, 8 and 10 features were found to positively predict next-day physiological and perceived stress, respectively. A random forest classifier performed the best in predicting next-day physiological stress (F1 score of 0.84) and next-day perceived stress (F1 score of 0.74) by using all features. Although any subset of sensor-based, EMA-based, or intervention-based features could reliably predict next-day physiological stress, EMA-based features were necessary to predict next-day perceived stress. The analysis of explainability metrics showed that the prolonged duration of physiological stress was highly predictive of next-day physiological stress and that physiological stress and perceived stress were temporally divergent.

Conclusions

In this study, we were able to build interpretable machine learning models to predict next-day physiological and perceived stress, and we identified unique features that were highly predictive of next-day stress that can help to reduce the burden of data collection.
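The Results above report model quality as F1 scores. As a reminder of what that number summarizes, here is a minimal standard-library sketch of the usual definition (harmonic mean of precision and recall). The labels and predictions are invented for illustration and are not taken from the study.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1): harmonic mean of precision and recall."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical next-day labels (1 = stressful day) and model predictions.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]

score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2f}")
```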

Chowdhury NK, Kabir MA, Rahman MM, Islam SMS. Machine learning for detecting COVID-19 from cough sounds: An ensemble-based MCDM method. Comput Biol Med 2022;145:105405. [PMID: 35318171 PMCID: PMC8926945 DOI: 10.1016/j.compbiomed.2022.105405] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 11/20/2021] [Revised: 03/10/2022] [Accepted: 03/11/2022] [Indexed: 12/16/2022]

Feature Importance Analysis by Nowcasting Perspective to Predict COVID-19. MOBILE NETWORKS AND APPLICATIONS 2022. [PMCID: PMC9033308 DOI: 10.1007/s11036-022-01966-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/31/2022]

Bartoszko J, Dranitsaris G, Wilcox ME, Del Sorbo L, Mehta S, Peer M, Parotto M, Bogoch I, Riazi S. Development of a repeated-measures predictive model and clinical risk score for mortality in ventilated COVID-19 patients. Can J Anaesth 2022;69:343-352. [PMID: 34931293 PMCID: PMC8687635 DOI: 10.1007/s12630-021-02163-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2021] [Revised: 09/13/2021] [Accepted: 10/04/2021] [Indexed: 11/09/2022]

Predictive Machine Learning Models and Survival Analysis for COVID-19 Prognosis Based on Hematochemical Parameters. SENSORS 2021;21:s21248503. [PMID: 34960595 PMCID: PMC8705488 DOI: 10.3390/s21248503] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/22/2021] [Revised: 12/15/2021] [Accepted: 12/17/2021] [Indexed: 12/26/2022]

Abstract

The coronavirus disease 2019 (COVID-19) pandemic has affected hundreds of millions of individuals and caused millions of deaths worldwide. Predicting the clinical course of the disease is of pivotal importance to manage patients. Several studies have found hematochemical alterations in COVID-19 patients, such as inflammatory markers. We retrospectively analyzed the anamnestic data and laboratory parameters of 303 patients diagnosed with COVID-19 who were admitted to the Polyclinic Hospital of Bari during the first phase of the COVID-19 global pandemic. After the pre-processing phase, we performed a survival analysis with Kaplan–Meier curves and Cox Regression, with the aim of discovering the most unfavorable predictors. The target outcomes were mortality or admission to the intensive care unit (ICU). Different machine learning models were also compared to realize a robust classifier relying on a low number of strongly significant factors to estimate the risk of death or admission to ICU. From the survival analysis, it emerged that the most significant laboratory parameter for both outcomes was C-reactive protein min; HR=17.963 (95% CI 6.548–49.277, p < 0.001) for death, HR=1.789 (95% CI 1.000–3.200, p = 0.050) for admission to ICU. The second most important parameter was Erythrocytes max; HR=1.765 (95% CI 1.141–2.729, p < 0.05) for death, HR=1.481 (95% CI 0.895–2.452, p = 0.127) for admission to ICU. The best model for predicting the risk of death was the decision tree, which resulted in ROC-AUC of 89.66%, whereas the best model for predicting the admission to ICU was support vector machine, which had ROC-AUC of 95.07%. The hematochemical predictors identified in this study can be utilized as a strong prognostic signature to characterize the severity of the disease in COVID-19 patients.
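The Kaplan–Meier curves mentioned in the abstract above are built from the product-limit estimator: at each event time, the running survival probability is multiplied by the fraction of at-risk patients who did not have the event. The sketch below shows that calculation with the standard library only. The follow-up durations and event flags are hypothetical, and the Cox regression step is not covered.

```python
def kaplan_meier(durations, events):
    """Product-limit survival estimate.

    durations: follow-up time per patient.
    events: 1 if the event (e.g., death) was observed at that time, 0 if censored.
    Returns a list of (time, survival_probability) at each distinct event time.
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(durations, events) if e == 1})
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical follow-up in days; the patients with event flag 0 are censored.
durations = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]

curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"day {t}: S(t) = {s:.2f}")
```

Censored patients still count in the at-risk denominator up to their last follow-up time, which is what distinguishes this estimate from a simple proportion of survivors.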

Chung H, Park C, Kang WS, Lee J. Gender Bias in Artificial Intelligence: Severity Prediction at an Early Stage of COVID-19. Front Physiol 2021;12:778720. [PMID: 34912242 PMCID: PMC8667070 DOI: 10.3389/fphys.2021.778720] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2021] [Accepted: 10/29/2021] [Indexed: 11/29/2022]

Doyle R. Machine Learning-Based Prediction of COVID-19 Mortality With Limited Attributes to Expedite Patient Prognosis and Triage: Retrospective Observational Study. JMIRX MED 2021;2:e29392. [PMID: 34843609 PMCID: PMC8601033 DOI: 10.2196/29392] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/05/2021] [Revised: 08/16/2021] [Accepted: 09/14/2021] [Indexed: 12/12/2022]

Abstract

Background

The onset and development of the COVID-19 pandemic have placed pressure on hospital resources and staff worldwide. The integration of more streamlined predictive modeling in prognosis and triage–related decision-making can partly ease this pressure.

Objective

The objective of this study is to assess the performance impact of dimensionality reduction on COVID-19 mortality prediction models, demonstrating the high impact of a limited number of features to limit the need for complex variable gathering before reaching meaningful risk labelling in clinical settings.

Methods

Standard machine learning classifiers were employed to predict an outcome of either death or recovery using 25 patient-level variables, spanning symptoms, comorbidities, and demographic information, from a geographically diverse sample representing 17 countries. The effects of feature reduction on the data were tested by running classifiers on a high-quality data set of 212 patients with populated entries for all 25 available features. The full data set was compared to two reduced variations with 7 features and 1 feature, respectively, extracted using univariate mutual information and chi-square testing. Classifier performance on each data set was then assessed on the basis of accuracy, sensitivity, specificity, and receiver operating characteristic–derived area under the curve metrics to quantify benefit or loss from reduction.

Results

The performance of the classifiers on the 212-patient sample resulted in strong mortality detection, with the highest performing model achieving specificity of 90.7% (95% CI 89.1%-92.3%) and sensitivity of 92.0% (95% CI 91.0%-92.9%). Dimensionality reduction provided strong benefits for performance. The baseline accuracy of a random forest classifier increased from 89.2% (95% CI 88.0%-90.4%) to 92.5% (95% CI 91.9%-93.0%) when training on 7 chi-square–extracted features and to 90.8% (95% CI 89.8%-91.7%) when training on 7 mutual information–extracted features. Reduction impact on a separate logistic classifier was mixed; however, when present, losses were marginal compared to the extent of feature reduction, altogether showing that reduction either improves performance or can reduce the variable-sourcing burden at hospital admission with little performance loss. Extreme feature reduction to a single most salient feature, often age, demonstrated large standalone explanatory power, with the best-performing model achieving an accuracy of 81.6% (95% CI 81.1%-82.1%); this demonstrates the relatively marginal improvement that additional variables bring to the tested models.

Conclusions

Predictive statistical models show promising performance in early prediction of death among patients with COVID-19. Strong dimensionality reduction was shown to further improve baseline performance on selected classifiers and only marginally reduce it in others, highlighting the importance of feature reduction in future model construction and the feasibility of deprioritizing large, hard-to-source, and nonessential feature sets in real-world settings.

Wong KCY, Xiang Y, Yin L, So HC. Uncovering Clinical Risk Factors and Predicting Severe COVID-19 Cases Using UK Biobank Data: Machine Learning Approach. JMIR Public Health Surveill 2021;7:e29544. [PMID: 34591027 PMCID: PMC8485986 DOI: 10.2196/29544] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 04/12/2021] [Revised: 06/24/2021] [Accepted: 07/31/2021] [Indexed: 01/08/2023]

Abstract

Background

COVID-19 is a major public health concern. Given the extent of the pandemic, it is urgent to identify risk factors associated with disease severity. More accurate prediction of those at risk of developing severe infections is of high clinical importance.

Objective

Based on the UK Biobank (UKBB), we aimed to build machine learning models to predict the risk of developing severe or fatal infections, and uncover major risk factors involved.

Methods

We first restricted the analysis to infected individuals (n=7846), then performed analysis at a population level, considering those with no known infection as controls (n=465,728 controls). Hospitalization was used as a proxy for severity. A total of 97 clinical variables (collected prior to the COVID-19 outbreak) covering demographic variables, comorbidities, blood measurements (eg, hematological/liver/renal function/metabolic parameters), anthropometric measures, and other risk factors (eg, smoking/drinking) were included as predictors. We also constructed a simplified (lite) prediction model using 27 covariates that can be more easily obtained (demographic and comorbidity data). XGBoost (gradient-boosted trees) was used for prediction, and predictive performance was assessed by cross-validation. Variable importance was quantified by Shapley values (ShapVal), permutation importance (PermImp), and accuracy gain. Shapley dependency and interaction plots were used to evaluate the pattern of relationships between risk factors and outcomes.
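To make the idea of Shapley values concrete, the following is a minimal brute-force sketch that computes exact Shapley values for one instance by enumerating all feature subsets. It is not the procedure used in the study (tree-based models are usually explained with specialized algorithms); the toy risk function, the feature order, and the all-zero baseline are assumptions chosen only for illustration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    model: Callable[[List[float]], float],
    instance: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(instance)

    def value(subset: set) -> float:
        x = [instance[i] if i in subset else baseline[i] for i in range(n)]
        return model(x)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy risk score over (age, number of drugs, waist-to-hip ratio).
def toy_risk(x: List[float]) -> float:
    age, n_drugs, whr = x
    return 0.5 * age + 0.25 * n_drugs + 0.25 * age * whr


instance = [2.0, 4.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_risk, instance, baseline)
print(phi)
```

In this toy case the interaction term is split equally between the two features involved, and the values sum to the difference between the prediction for the instance and the prediction for the baseline. The enumeration grows exponentially with the number of features, so it is practical only for very small feature sets.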

Results

A total of 2386 severe and 477 fatal cases were identified. For analyses within infected individuals (n=7846), our prediction model achieved area under the receiver operating characteristic curve (AUC–ROC) of 0.723 (95% CI 0.711-0.736) and 0.814 (95% CI 0.791-0.838) for severe and fatal infections, respectively. The top 5 contributing factors (sorted by ShapVal) for severity were age, number of drugs taken (cnt_tx), cystatin C (reflecting renal function), waist-to-hip ratio (WHR), and Townsend deprivation index (TDI). For mortality, the top features were age, testosterone, cnt_tx, waist circumference (WC), and red cell distribution width. For analyses involving the whole UKBB population, AUCs for severity and fatality were 0.696 (95% CI 0.684-0.708) and 0.825 (95% CI 0.802-0.848), respectively. The same top 5 risk factors were identified for both outcomes, namely, age, cnt_tx, WC, WHR, and TDI. Apart from the above, age, cystatin C, TDI, and cnt_tx were among the top 10 across all 4 analyses. Other diseases top ranked by ShapVal or PermImp were type 2 diabetes mellitus (T2DM), coronary artery disease, atrial fibrillation, and dementia, among others. For the “lite” models, predictive performances were broadly similar, with estimated AUCs of 0.716, 0.818, 0.696, and 0.830, respectively. The top ranked variables were similar to above, including age, cnt_tx, WC, sex (male), and T2DM.
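For readers unfamiliar with the AUC figures reported above, the metric can be read as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen non-case. The sketch below computes it directly from that pairwise definition, counting ties as one half; the labels and scores are made-up illustration data, not results from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = severe case) and predicted risk scores.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.8, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

The pairwise loop is quadratic in the sample size; for cohorts of the size described here, a rank-based formulation would be the usual choice, but it yields the same value.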

Conclusions

We identified numerous baseline clinical risk factors for severe/fatal infection using XGBoost. For example, age, central obesity, impaired renal function, multiple comorbidities, and cardiometabolic abnormalities may predispose to poorer outcomes. The prediction models may be useful at a population level to identify those susceptible to developing severe/fatal infections, facilitating targeted prevention strategies. A risk-prediction tool is also available online. Further replications in independent cohorts are required to verify our findings.
