1
Zan J, Dong X, Yang H, Yan J, He Z, Tian J, Zhang Y. Application of the Unbalanced Ensemble Algorithm for Prognostic Prediction Outcomes of All-Cause Mortality in Coronary Heart Disease Patients Comorbid with Hypertension. Risk Manag Healthc Policy 2024; 17:1921-1936. [PMID: 39135612 PMCID: PMC11317517 DOI: 10.2147/rmhp.s472398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Accepted: 07/24/2024] [Indexed: 08/15/2024]
Abstract
Purpose: This study sought to develop an unbalanced-ensemble model that could accurately predict death outcomes of patients with comorbid coronary heart disease (CHD) and hypertension and to evaluate the factors contributing to death.
Patients and Methods: Medical records of 1058 patients with coronary heart disease combined with hypertension, excluding those with acute coronary syndrome, were collected. Patients were followed up at the first, third, sixth, and twelfth months after discharge to record death events. Follow-up ended two years after discharge. Patients were divided into survival and nonsurvival groups. Based on the medical records, gender, smoking, drinking, COPD, cerebral stroke, diabetes, hyperhomocysteinemia, heart failure, renal insufficiency, and other influencing factors were sorted and compared between the two groups, and feature selection was carried out to construct the models. Owing to data imbalance, we developed four unbalanced-ensemble prediction models based on Balanced Random Forest (BRF), EasyEnsemble, RUSBoost, and SMOTEBoost, together with two base classification models based on AdaBoost and logistic regression. The hyperparameters of each model were tuned with GridSearchCV, and each model was evaluated using area under the curve (AUC), sensitivity, recall, Brier score, and geometric mean (G-mean). Additionally, to understand the influence of variables on model performance, we constructed a SHapley Additive exPlanations (SHAP) model based on the optimal model.
Results: There were significant differences in age, heart rate, COPD, cerebral stroke, heart failure, and renal insufficiency between the nonsurvival group and the survival group. Among all models, BRF yielded the highest AUC (0.810; 95% CI, 0.778-0.839), sensitivity (0.990; 95% CI, 0.981-1.000), recall (0.990; 95% CI, 0.981-1.000), and G-mean (0.806; 95% CI, 0.778-0.827), and the lowest Brier score (0.181; 95% CI, 0.178-0.185). Therefore, we identified BRF as the optimal model. Furthermore, red blood cell count (RBC), body mass index (BMI), and lactate dehydrogenase were found to be important mortality-associated risk factors.
Conclusion: BRF combined with advanced machine learning methods and SHAP is highly effective and accurately predicts mortality in patients with CHD comorbid with hypertension. This model has the potential to assist clinicians in modifying treatment strategies to improve patient outcomes.
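Two of the evaluation metrics named in this abstract, the G-mean and the Brier score, are less familiar than AUC. The sketch below shows how they are commonly defined, using only the Python standard library. The labels and probabilities are made-up toy values, not data from the study, and the helper names (`confusion_counts`, `g_mean`, `brier_score`) are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = event)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


# Toy, imbalanced example (assumed values): 2 events among 8 patients.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.4, 0.6, 0.2, 0.1, 0.3, 0.2, 0.1]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # 0.5 cut-off is an assumption

print("G-mean:", round(g_mean(y_true, y_pred), 3))
print("Brier score:", round(brier_score(y_true, y_prob), 3))
```

Because the G-mean multiplies the per-class rates, a model that ignores the minority class scores zero, which is why it is often reported for imbalanced data such as mortality outcomes.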
Affiliation(s)
- Jiaxin Zan
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, People’s Republic of China
  - Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Taiyuan, People’s Republic of China
- Xiaojing Dong
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, People’s Republic of China
  - Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Taiyuan, People’s Republic of China
- Hong Yang
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, People’s Republic of China
  - Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Taiyuan, People’s Republic of China
- Jingjing Yan
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, People’s Republic of China
  - Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Taiyuan, People’s Republic of China
- Zixuan He
  - Department of Cardiology, The First Hospital of Shanxi Medical University, Taiyuan, People’s Republic of China
- Jing Tian
  - Department of Cardiology, The First Hospital of Shanxi Medical University, Taiyuan, People’s Republic of China
- Yanbo Zhang
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, People’s Republic of China
  - Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Taiyuan, People’s Republic of China
  - School of Health Services and Management, Shanxi University of Chinese Medicine, Taiyuan, People’s Republic of China
2
Kim J, Choi YS, Lee YJ, Yeo SG, Kim KW, Kim MS, Rahmati M, Yon DK, Lee J. Limitations of the Cough Sound-Based COVID-19 Diagnosis Artificial Intelligence Model and its Future Direction: Longitudinal Observation Study. J Med Internet Res 2024; 26:e51640. [PMID: 38319694 PMCID: PMC10879967 DOI: 10.2196/51640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2023] [Revised: 11/10/2023] [Accepted: 01/02/2024] [Indexed: 02/07/2024]
Abstract
BACKGROUND: The outbreak of SARS-CoV-2 in 2019 has necessitated the rapid and accurate detection of COVID-19 to manage patients effectively and implement public health measures. Artificial intelligence (AI) models analyzing cough sounds have emerged as promising tools for large-scale screening and early identification of potential cases.
OBJECTIVE: This study aimed to investigate the efficacy of using cough sounds as a diagnostic tool for COVID-19, considering the unique acoustic features that differentiate positive and negative cases. We investigated whether an AI model trained on cough sound recordings from specific periods, especially the early stages of the COVID-19 pandemic, was applicable to the ongoing situation with persistent variants.
METHODS: We used cough sound recordings from 3 data sets (Cambridge, Coswara, and Virufy) representing different stages of the pandemic and variants. Our AI model was trained using the Cambridge data set with subsequent evaluation against all data sets. The performance was analyzed based on the area under the receiver operating characteristic curve (AUC) across different data measurement periods and COVID-19 variants.
RESULTS: The AI model demonstrated a high AUC when tested with the Cambridge data set, indicative of its initial effectiveness. However, the performance varied significantly with other data sets, particularly in detecting later variants such as Delta and Omicron, with a marked decline in AUC observed for the latter. These results highlight the challenges in maintaining the efficacy of AI models against the backdrop of an evolving virus.
CONCLUSIONS: While AI models analyzing cough sounds offer a promising noninvasive and rapid screening method for COVID-19, their effectiveness is challenged by the emergence of new virus variants. Ongoing research and adaptations in AI methodologies are crucial to address these limitations. The adaptability of AI models to evolve with the virus underscores their potential as a foundational technology for not only the current pandemic but also future outbreaks, contributing to a more agile and resilient global health infrastructure.
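The comparison described in this abstract rests on computing AUC separately for each evaluation data set. As a rough illustration, AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below computes it that way in plain Python. The set names and scores are invented for illustration and are not taken from the Cambridge, Coswara, or Virufy data, and the function name `auc` is a hypothetical helper.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical evaluation sets: one resembling the training period,
# one standing in for recordings collected under a later variant.
toy_sets = {
    "in_distribution": ([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]),
    "shifted": ([1, 1, 1, 0, 0, 0], [0.6, 0.4, 0.2, 0.5, 0.3, 0.1]),
}

auc_by_set = {name: auc(y, s) for name, (y, s) in toy_sets.items()}
for name, value in auc_by_set.items():
    print(f"{name}: AUC = {value:.3f}")
```

Reporting the metric per data set rather than pooled is what makes a decline on later recordings visible.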
Affiliation(s)
- Jina Kim
  - Department of Biomedical Engineering, Kyung Hee University, Seoul, Republic of Korea
- Yong Sung Choi
  - Department of Biomedical Engineering, Kyung Hee University, Seoul, Republic of Korea
- Young Joo Lee
  - Department of Biomedical Engineering, Kyung Hee University, Seoul, Republic of Korea
- Seung Geun Yeo
  - Department of Biomedical Engineering, Kyung Hee University, Seoul, Republic of Korea
- Kyung Won Kim
  - Department of Radiology and Research Institute of Radiology, Asan Image Metrics, Clinical Trial Center, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Min Seo Kim
  - Medical and Population Genetics and Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, United States
- Masoud Rahmati
  - Department of Physical Education and Sport Sciences, Faculty of Literature and Human Sciences, Lorestan University, Khoramabad, Iran
  - Department of Physical Education and Sport Sciences, Faculty of Literature and Humanities, Vali-E-Asr University of Rafsanjan, Rafsanjan, Iran
- Dong Keon Yon
  - Center for Digital Health, Medical Science Research Institute, Kyung Hee University Medical Center, Kyung Hee University College of Medicine, Seoul, Republic of Korea
  - Department of Pediatrics, Kyung Hee University Medical Center, Kyung Hee University College of Medicine, Seoul, Republic of Korea
- Jinseok Lee
  - Department of Biomedical Engineering, Kyung Hee University, Seoul, Republic of Korea
  - Department of Electronics and Information Convergence Engineering, Kyung Hee University, Yongin, Republic of Korea
3
Shi J, Bendig D, Vollmar HC, Rasche P. Mapping the Bibliometrics Landscape of AI in Medicine: Methodological Study. J Med Internet Res 2023; 25:e45815. [PMID: 38064255 PMCID: PMC10746970 DOI: 10.2196/45815] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/20/2023] [Revised: 08/16/2023] [Accepted: 09/30/2023] [Indexed: 12/18/2023]
Abstract
BACKGROUND: Artificial intelligence (AI), conceived in the 1950s, has permeated numerous industries, intensifying in tandem with advancements in computing power. Despite the widespread adoption of AI, its integration into medicine trails other sectors. However, medical AI research has experienced substantial growth, attracting considerable attention from researchers and practitioners.
OBJECTIVE: In the absence of an existing framework, this study aims to outline the current landscape of medical AI research and provide insights into its future developments by examining all AI-related studies within PubMed over the past 2 decades. We also propose potential data acquisition and analysis methods, developed using Python (version 3.11) and to be executed in Spyder IDE (version 5.4.3), for future analogous research.
METHODS: Our dual-pronged approach involved (1) retrieving publication metadata related to AI from PubMed (spanning 2000-2022) via Python, including titles, abstracts, authors, journals, country, and publishing years, followed by keyword frequency analysis and (2) classifying relevant topics using latent Dirichlet allocation, an unsupervised machine learning approach, and defining the research scope of AI in medicine. In the absence of a universal medical AI taxonomy, we used an AI dictionary based on the European Commission Joint Research Centre AI Watch report, which emphasizes 8 domains: reasoning, planning, learning, perception, communication, integration and interaction, service, and AI ethics and philosophy.
RESULTS: From 2000 to 2022, a comprehensive analysis of 307,701 AI-related publications from PubMed highlighted a 36-fold increase. The United States emerged as a clear frontrunner, producing 68,502 of these articles. Despite its substantial contribution in terms of volume, China lagged in terms of citation impact. Among the AI domains categorized by the Joint Research Centre AI Watch report, the learning domain emerged as dominant. Our classification analysis traced the research trajectories across each domain, revealing the multifaceted and evolving nature of AI's application in medicine.
CONCLUSIONS: The research topics have evolved as the volume of AI studies increases annually. Machine learning remains central to medical AI research, with deep learning expected to maintain its fundamental role. Empowered by predictive algorithms, pattern recognition, and imaging analysis capabilities, the future of AI research in medicine is anticipated to concentrate on medical diagnosis, robotic intervention, and disease management. Our topic modeling outcomes provide a clear insight into the focus of AI research in medicine over the past decades and lay the groundwork for predicting future directions. The domains that have attracted considerable research attention, primarily the learning domain, will continue to shape the trajectory of AI in medicine. Given the observed growing interest, the domain of AI ethics and philosophy also stands out as a prospective area of increased focus.
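The first step of the method in this abstract, keyword frequency analysis over publication metadata using a domain dictionary, can be sketched in a few lines of Python. The records, the keyword lists, and the helper names below are all assumptions made for illustration. The actual dictionary derived from the Joint Research Centre AI Watch report is not reproduced in the abstract, and the second step (latent Dirichlet allocation) is not shown.

```python
import re
from collections import Counter

# Hypothetical excerpt of a domain dictionary; the domain names come from the
# abstract, the keywords under each domain are illustrative guesses.
AI_DICTIONARY = {
    "learning": ["machine learning", "deep learning", "neural network"],
    "perception": ["computer vision", "image recognition"],
    "communication": ["natural language processing"],
}

# Made-up metadata records standing in for retrieved PubMed entries.
records = [
    {"year": 2000, "title": "Neural network for ECG beats",
     "abstract": "A neural network classifies beats."},
    {"year": 2022, "title": "Deep learning for chest X-ray image recognition",
     "abstract": "We apply deep learning."},
    {"year": 2022, "title": "Natural language processing of clinical notes",
     "abstract": "Machine learning models extract diagnoses."},
]


def count_domains(records, dictionary):
    """Count how many records mention at least one keyword of each domain."""
    counts = Counter()
    for record in records:
        text = f"{record['title']} {record['abstract']}".lower()
        text = re.sub(r"\s+", " ", text)  # normalise whitespace before matching
        for domain, keywords in dictionary.items():
            if any(keyword in text for keyword in keywords):
                counts[domain] += 1
    return counts


def count_by_year(records):
    """Publication volume per year, the basis for a growth comparison."""
    return Counter(record["year"] for record in records)


domain_counts = count_domains(records, AI_DICTIONARY)
year_counts = count_by_year(records)
print(domain_counts.most_common())
print(sorted(year_counts.items()))
```

Counting each record once per domain, rather than once per keyword hit, keeps long abstracts from dominating the totals; whether the study counted this way is not stated.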
Affiliation(s)
- Jin Shi
  - Institute for Entrepreneurship, University of Münster, Münster, Germany
- David Bendig
  - Institute for Entrepreneurship, University of Münster, Münster, Germany
- Peter Rasche
  - Department of Healthcare, University of Applied Science - Hochschule Niederrhein, Krefeld, Germany
4
Zhao Y, Chen Z, Jian X. A High-Generalizability Machine Learning Framework for Analyzing the Homogenized Properties of Short Fiber-Reinforced Polymer Composites. Polymers (Basel) 2023; 15:3962. [PMID: 37836011 PMCID: PMC10575166 DOI: 10.3390/polym15193962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Revised: 09/25/2023] [Accepted: 09/25/2023] [Indexed: 10/15/2023]
Abstract
This study aims to develop a high-generalizability machine learning framework for predicting the homogenized mechanical properties of short fiber-reinforced polymer composites. The ensemble machine learning model (EML) employs a stacking algorithm using three base models: Extra Trees (ET), eXtreme Gradient Boosting machine (XGBoost), and Light Gradient Boosting machine (LGBM). A micromechanical model based on a two-step homogenization algorithm is adopted and verified as an effective approach to modeling composites with randomly distributed fibers, and it is integrated with finite element simulations to provide a high-quality ground-truth dataset. The model performance is thoroughly assessed for its accuracy, efficiency, interpretability, and generalizability. The results suggest that: (1) the EML model outperforms the base members on prediction accuracy, achieving R² values of 0.988 and 0.952 on the train and test datasets, respectively; (2) the SHapley Additive exPlanations (SHAP) analysis identifies the Young's moduli of the matrix and the fiber, together with the fiber content, as the top three factors influencing the homogenized properties, whereas the anisotropy is predominantly determined by the fiber orientations; (3) the EML model shows good generalization capability on experimental data and is more efficient than high-fidelity computational models, significantly lowering computational costs while maintaining high accuracy.
Affiliation(s)
- Yunmei Zhao
  - School of Aerospace Engineering and Applied Mechanics, Tongji University, Shanghai 200092, China
- Zhenyue Chen
  - School of Aerospace Engineering and Applied Mechanics, Tongji University, Shanghai 200092, China
- Xiaobin Jian
  - Department of Aeronautics and Astronautics, Fudan University, Shanghai 200433, China
5
Hida M, Imai R, Nakamura M, Nakao H, Kitagawa K, Wada C, Eto S, Takeda M, Imaoka M. Investigation of factors influencing low physical activity levels in community-dwelling older adults with chronic pain: a cross-sectional study. Sci Rep 2023; 13:14062. [PMID: 37640818 PMCID: PMC10462701 DOI: 10.1038/s41598-023-41319-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2023] [Accepted: 08/24/2023] [Indexed: 08/31/2023]
Abstract
Low levels of physical activity in individuals with chronic pain can lead to additional functional impairment and disability. This study aims to investigate the predictors of low physical activity levels in individuals with chronic pain, and to determine the accuracy of the artificial neural network used to analyze these predictors. Community-dwelling older adults with chronic pain (n = 103) were surveyed for their physical activity levels and classified into low, moderate, or high physical activity level groups. Chronic pain-related measurements, physical function assessment, and clinical history, which all influence physical activity, were also taken at the same time. Logistic regression analysis and analysis of multilayer perceptron, an artificial neural network algorithm, were performed. Both analyses revealed that history of falls was a predictor of low levels of physical activity in community-dwelling older adults. Multilayer perceptron analysis was shown to have excellent accuracy. Our results emphasize the importance of fall prevention in improving the physical activity levels of community-dwelling older adults with chronic pain. Future cross-sectional studies should compare multiple analysis methods to show results with improved accuracy.
Affiliation(s)
- Mitsumasa Hida
  - Department of Rehabilitation, Osaka Kawasaki Rehabilitation University, 158 Mizuma, Kaizuka, Osaka, 597-0104, Japan
- Ryota Imai
  - Department of Rehabilitation, Osaka Kawasaki Rehabilitation University, 158 Mizuma, Kaizuka, Osaka, 597-0104, Japan
- Misa Nakamura
  - Department of Rehabilitation, Osaka Kawasaki Rehabilitation University, 158 Mizuma, Kaizuka, Osaka, 597-0104, Japan
- Hidetoshi Nakao
  - Department of Physical Therapy, Josai International University, 1 Gumyo, Togane, Chiba, 283-8555, Japan
- Kodai Kitagawa
  - National Institute of Technology, Hachinohe College, 16-1 Uwanotai, Tamonoki, Hachinohe, Aomori, 039-1192, Japan
- Chikamune Wada
  - Graduate School of Life Science and Systems Engineering, Kyushu Institute of Technology, Hibikino 2-4, Wakamatsu-ku, Kitakyushu, Fukuoka, 808-0135, Japan
- Shinji Eto
  - Graduate School of Life Science and Systems Engineering, Kyushu Institute of Technology, Hibikino 2-4, Wakamatsu-ku, Kitakyushu, Fukuoka, 808-0135, Japan
- Masatoshi Takeda
  - Department of Rehabilitation, Osaka Kawasaki Rehabilitation University, 158 Mizuma, Kaizuka, Osaka, 597-0104, Japan
- Masakazu Imaoka
  - Department of Rehabilitation, Osaka Kawasaki Rehabilitation University, 158 Mizuma, Kaizuka, Osaka, 597-0104, Japan
6
Hosseinzadeh Kasani P, Lee JE, Park C, Yun CH, Jang JW, Lee SA. Evaluation of nutritional status and clinical depression classification using an explainable machine learning method. Front Nutr 2023; 10:1165854. [PMID: 37229464 PMCID: PMC10203418 DOI: 10.3389/fnut.2023.1165854] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Accepted: 03/27/2023] [Indexed: 05/27/2023]
Abstract
Introduction: Depression is a prevalent disorder worldwide, with potentially severe implications. It contributes significantly to an increased risk of diseases associated with multiple risk factors. Early accurate diagnosis of depressive symptoms is a critical first step toward management, intervention, and prevention. Various nutritional and dietary compounds have been suggested to be involved in the onset, maintenance, and severity of depressive disorders. Despite the challenges to better understanding the association between nutritional risk factors and the occurrence of depression, assessing the interplay of these markers through supervised machine learning remains to be fully explored.
Methods: This study aimed to determine the ability of machine learning-based decision support methods to identify the presence of depression using publicly available health data from the Korean National Health and Nutrition Examination Survey. Two exploration techniques, namely, uniform manifold approximation and projection and Pearson correlation, were performed for explanatory analysis among datasets. A grid search optimization with cross-validation was performed to fine-tune the models for classifying depression with the highest accuracy. Several performance measures, including accuracy, precision, recall, F1 score, confusion matrix, areas under the precision-recall and receiver operating characteristic curves, and calibration plot, were used to compare classifier performances. We further investigated feature importance with visual interpretation tools, namely ELI5, partial dependence plots, local interpretable model-agnostic explanations, and Shapley additive explanations, for predictions at both the population and individual levels.
Results: In the original dataset, the best accuracy was 86.18% (XGBoost) and the best area under the curve was 84.96% (random forest); in the quantile-based dataset, XGBoost achieved an accuracy of 86.02% and an area under the curve of 85.34%. The explainable results revealed a complementary observation of the relative changes in feature values, and, thus, the importance of emergent depression risks could be identified.
Discussion: The strength of our approach is the large sample size used for training with a fine-tuned model. The machine learning-based analysis showed that the hyper-tuned model has empirically higher accuracy in classifying patients with depressive disorder, as evidenced by the set of interpretable experiments, and can be an effective solution for disease control.
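Grid search with cross-validation, the tuning procedure named in the Methods of this abstract, can be illustrated without any machine learning library. The sketch below tunes a single hyperparameter of a toy one-dimensional k-nearest-neighbours classifier by mean cross-validated accuracy. The data, the classifier, the grid, and the function names are all assumptions for illustration. The study's own models (for example XGBoost and random forest) and its tuning code are not shown in the abstract.

```python
from itertools import product
from statistics import mean


def knn_predict(train_x, train_y, x, n_neighbors):
    """Majority vote among the n_neighbors training points closest to x."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - x))
    nearest = nearest[:n_neighbors]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > len(nearest) else 0


def cross_val_accuracy(xs, ys, n_neighbors, n_folds):
    """Mean accuracy over folds; fold i holds out every n_folds-th sample."""
    scores = []
    for fold in range(n_folds):
        test_idx = [i for i in range(len(xs)) if i % n_folds == fold]
        train_idx = [i for i in range(len(xs)) if i % n_folds != fold]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        correct = sum(
            knn_predict(train_x, train_y, xs[i], n_neighbors) == ys[i]
            for i in test_idx
        )
        scores.append(correct / len(test_idx))
    return mean(scores)


def grid_search(xs, ys, param_grid, n_folds=3):
    """Evaluate every parameter combination and keep the best mean score."""
    names = list(param_grid)
    results = {}
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        results[values] = cross_val_accuracy(xs, ys, n_folds=n_folds, **params)
    best_values = max(results, key=results.get)
    return dict(zip(names, best_values)), results[best_values], results


# Toy data (assumed): one feature, binary label, slightly noisy boundary.
xs = [0.1, 0.2, 0.3, 0.4, 0.45, 0.6, 0.7, 0.8, 0.9, 1.0, 0.55, 0.35]
ys = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1]

best_params, best_score, all_results = grid_search(
    xs, ys, {"n_neighbors": [1, 3, 5]}
)
print("best parameters:", best_params)
print("mean cross-validated accuracy:", round(best_score, 3))
```

Selecting hyperparameters on held-out folds rather than on the training data is what keeps the tuned accuracy from being an optimistic estimate.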
Affiliation(s)
- Payam Hosseinzadeh Kasani
  - Department of Neurology, Kangwon National University Hospital, Chuncheon, Republic of Korea
  - Interdisciplinary Graduate Program in Medical Bigdata Convergence, Kangwon National University, Chuncheon, Republic of Korea
- Jung Eun Lee
  - Interdisciplinary Graduate Program in Medical Bigdata Convergence, Kangwon National University, Chuncheon, Republic of Korea
- Chihyun Park
  - Interdisciplinary Graduate Program in Medical Bigdata Convergence, Kangwon National University, Chuncheon, Republic of Korea
  - Department of Computer Science and Engineering, Kangwon National University, Chuncheon, Republic of Korea
- Cheol-Heui Yun
  - Department of Agricultural Biotechnology, Seoul National University, Seoul, Republic of Korea
  - Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea
- Jae-Won Jang
  - Department of Neurology, Kangwon National University Hospital, Chuncheon, Republic of Korea
  - Department of Neurology, Kangwon National University School of Medicine, Chuncheon, Republic of Korea
- Sang-Ah Lee
  - Interdisciplinary Graduate Program in Medical Bigdata Convergence, Kangwon National University, Chuncheon, Republic of Korea
  - Department of Preventive Medicine, College of Medicine, Kangwon National University, Chuncheon, Republic of Korea
7
Paul SG, Saha A, Biswas AA, Zulfiker MS, Arefin MS, Rahman MM, Reza AW. Combating Covid-19 using machine learning and deep learning: Applications, challenges, and future perspectives. ARRAY 2023; 17:100271. [PMID: 36530931 PMCID: PMC9737520 DOI: 10.1016/j.array.2022.100271] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/28/2022] [Revised: 12/05/2022] [Accepted: 12/07/2022] [Indexed: 12/14/2022]
Abstract
COVID-19 is a worldwide pandemic that has affected many people, and thousands of individuals have died from it during the last two years. Owing to the benefits of Artificial Intelligence (AI) in X-ray image interpretation, sound analysis, diagnosis, patient monitoring, and CT image identification, AI has been further researched in medical science during the COVID-19 period. This study has assessed the performance of, and investigated, different machine learning (ML) and deep learning (DL) approaches, as well as combinations of ML, DL, and AI approaches, that have been employed in recent studies with diverse data formats to combat the problems that have arisen due to the COVID-19 pandemic. Finally, this study compares the stand-alone ML- and DL-based research works on COVID-19 issues with the research works based on combinations of ML, DL, and AI. After in-depth analysis and comparison, this study responds to the proposed research questions and presents the future research directions in this context. This review will guide different research groups in developing viable applications based on ML, DL, and AI models, and will also guide healthcare institutes, researchers, and governments by showing them how these techniques can ease the process of tackling COVID-19.
Affiliation(s)
- Showmick Guha Paul
  - Department of Computer Science and Engineering, Daffodil International University, Dhaka, Bangladesh
- Arpa Saha
  - Department of Computer Science and Engineering, Daffodil International University, Dhaka, Bangladesh
- Al Amin Biswas (corresponding author)
  - Department of Computer Science and Engineering, Daffodil International University, Dhaka, Bangladesh
- Md. Sabab Zulfiker
  - Department of Computer Science and Engineering, Daffodil International University, Dhaka, Bangladesh
- Mohammad Shamsul Arefin
  - Department of Computer Science and Engineering, Daffodil International University, Dhaka, Bangladesh
  - Department of Computer Science and Engineering, Chittagong University of Engineering and Technology, Chittagong, Bangladesh
- Md. Mahfujur Rahman
  - Department of Computer Science and Engineering, Daffodil International University, Dhaka, Bangladesh
- Ahmed Wasif Reza
  - Department of Computer Science and Engineering, East West University, Dhaka, Bangladesh
8
Neves da Silva L, Domingues Fernandes R, Costa R, Oliveira A, Sá A, Mosca A, Oliveira B, Braga M, Mendes M, Carvalho A, Moreira P, Santa Cruz A. Prediction of Noninvasive Ventilation Failure in COVID-19 Patients: When Shall We Stop? Cureus 2022; 14:e30599. [PMID: 36420242 PMCID: PMC9679987 DOI: 10.7759/cureus.30599] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/22/2022] [Indexed: 01/25/2023]
Abstract
INTRODUCTION: In coronavirus disease 2019 (COVID-19), there are no tools available for the difficult task of recognizing which patients do not benefit from maintaining respiratory support, such as noninvasive ventilation (NIV). Identifying treatment failure is crucial to providing the best possible care and optimizing resources. Therefore, this study aimed to build a model that predicts NIV failure in patients who did not progress to invasive mechanical ventilation (IMV).
METHODS: This retrospective observational study included critical COVID-19 patients treated with NIV who did not progress to IMV. Patients were admitted to a Portuguese tertiary hospital between October 1, 2020, and March 31, 2021. The outcome of interest was NIV failure, defined as COVID-19-related in-hospital death. A binary logistic regression was performed, where the outcome (mortality) was the dependent variable. Using the independent variables of the logistic regression, a decision-tree classification model was implemented.
RESULTS: The study sample, composed of 103 patients, had a mean age of 66.3 years (SD=14.9), of which 38.8% (40 patients) were female. Most patients (82.5%) were autonomous for basic activities of daily living. The prediction model was statistically significant with an area under the curve of 0.994 and a precision of 0.950. Higher age, a higher number of days with increases in the fraction of inspired oxygen (FiO2), a higher number of days of maximum expiratory positive airway pressure, a lower number of days on NIV, and a lower number of days from disease onset to hospital admission were, with statistical significance, associated with increased odds of death. A decision-tree classification model was then obtained to achieve the best combination of variables to predict the outcome of interest.
CONCLUSIONS: This study presents a model to predict death in COVID-19 patients treated with NIV who did not progress to IMV, based on easily applicable variables that mainly reflect patients' evolution during hospitalization. Along with the decision-tree classification model, these original findings may help clinicians define the best therapeutic approach to each patient, prioritizing life-comforting measures when adequate, and optimizing resources, which is crucial within limited or overloaded healthcare systems. Further research is needed on this subject of treatment failure, not only to understand if these results are reproducible but also, in a broader sense, to help fill this gap in modern medicine guidelines.
Affiliation(s)
- Ricardo Costa
  - Department of Internal Medicine, Hospital de Braga, Braga, PRT
- Ana Oliveira
  - Department of Internal Medicine, Hospital de Braga, Braga, PRT
- Ana Sá
  - Department of Internal Medicine, Hospital de Braga, Braga, PRT
- Ana Mosca
  - Department of Internal Medicine, Hospital de Braga, Braga, PRT
- Marta Braga
  - Department of Internal Medicine, Hospital de Braga, Braga, PRT
- Marta Mendes
  - Department of Internal Medicine, Hospital de Braga, Braga, PRT
- Alexandre Carvalho
  - Department of Internal Medicine, Hospital de Braga, Braga, PRT
  - Life and Health Sciences Research Institute (ICVS), University of Minho, Braga, PRT
  - ICVS/3B's-PT Government Associate Laboratory, University of Minho, Guimarães, PRT
- Pedro Moreira
  - Psychological Neuroscience Laboratory, Psychology Research Center (CIPsi) School of Psychology, University of Minho, Braga, PRT
- André Santa Cruz
  - Department of Internal Medicine, Hospital de Braga, Braga, PRT
  - Life and Health Sciences Research Institute (ICVS), University of Minho, Braga, PRT
  - ICVS/3B's-PT Government Associate Laboratory, University of Minho, Guimarães, PRT
9
Healthcare Supply Chain Management under COVID-19 Settings: The Existing Practices in Hong Kong and the United States. Healthcare (Basel) 2022; 10:healthcare10081549. [PMID: 36011207 PMCID: PMC9408565 DOI: 10.3390/healthcare10081549] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Revised: 08/11/2022] [Accepted: 08/15/2022] [Indexed: 11/18/2022]
Abstract
COVID-19 is an infectious disease caused by severe acute respiratory syndrome coronavirus 2, and it has spread all over the world within a short period. Because of this rapid worldwide transmission, the impact on global healthcare systems and healthcare supply chain management has been profound. The COVID-19 outbreak has seriously affected the routine daily operations of healthcare facilities and the entire healthcare supply chain and has brought about a public health crisis. As ensuring the availability of healthcare facilities during COVID-19 is crucial, the debate on how to take resilience actions to sustain healthcare supply chain management has gained new momentum. Apart from the logistics of handling human remains in some countries, supplies within the communities are urgently needed for emergency response. This study focuses on a comprehensive evaluation of the current practices of healthcare supply chain management in Hong Kong and the United States under COVID-19 settings. A wide range of aspects associated with healthcare supply chain operations are considered, including the best practices for using respirators, transport of life-saving medical supplies, contingency healthcare strategies, blood distribution, and best practices for using disinfectants, as well as human remains handling and logistics. The outcomes of the research identify the existing healthcare supply chain trends in two major Eastern and Western regions of the world, Hong Kong and the United States, determine the key challenges, and propose some strategies that can improve the effectiveness of healthcare supply chain management under COVID-19 settings. The study highlights how to build resilient healthcare supply chain management preparedness for future emergencies.
|
10
|
Ng A, Wei B, Jain J, Ward EA, Tandon SD, Moskowitz JT, Krogh-Jespersen S, Wakschlag LS, Alshurafa N. Predicting the Next-Day Perceived and Physiological Stress of Pregnant Women by Using Machine Learning and Explainability: Algorithm Development and Validation. JMIR Mhealth Uhealth 2022; 10:e33850. [PMID: 35917157 PMCID: PMC9382551 DOI: 10.2196/33850] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2021] [Revised: 02/02/2022] [Accepted: 05/13/2022] [Indexed: 11/30/2022] Open
Abstract
Background Cognitive behavioral therapy–based interventions are effective in reducing prenatal stress, which can have severe adverse health effects on mothers and newborns if unaddressed. Predicting next-day physiological or perceived stress can help to inform and enable pre-emptive interventions for a likely physiologically and perceptibly stressful day. Machine learning models are useful tools that can be developed to predict next-day physiological and perceived stress by using data collected from the previous day. Such models can improve our understanding of the specific factors that predict physiological and perceived stress and allow researchers to develop systems that collect selected features for assessment in clinical trials to minimize the burden of data collection. Objective The aim of this study was to build and evaluate a machine-learned model that predicts next-day physiological and perceived stress by using sensor-based, ecological momentary assessment (EMA)–based, and intervention-based features and to explain the prediction results. Methods We enrolled pregnant women into a prospective proof-of-concept study and collected electrocardiography, EMA, and cognitive behavioral therapy intervention data over 12 weeks. We used the data to train and evaluate 6 machine learning models to predict next-day physiological and perceived stress. After selecting the best performing model, Shapley Additive Explanations were used to identify the feature importance and explainability of each feature. Results A total of 16 pregnant women enrolled in the study. Overall, 4157.18 hours of data were collected, and participants answered 2838 EMAs. After applying feature selection, 8 and 10 features were found to positively predict next-day physiological and perceived stress, respectively. A random forest classifier performed the best in predicting next-day physiological stress (F1 score of 0.84) and next-day perceived stress (F1 score of 0.74) by using all features. 
Although any subset of sensor-based, EMA-based, or intervention-based features could reliably predict next-day physiological stress, EMA-based features were necessary to predict next-day perceived stress. The analysis of explainability metrics showed that the prolonged duration of physiological stress was highly predictive of next-day physiological stress and that physiological stress and perceived stress were temporally divergent. Conclusions In this study, we were able to build interpretable machine learning models to predict next-day physiological and perceived stress, and we identified unique features that were highly predictive of next-day stress that can help to reduce the burden of data collection.
Affiliation(s)
- Ada Ng
  - McCormick School of Engineering, Northwestern University, Evanston, IL, United States
- Boyang Wei
  - McCormick School of Engineering, Northwestern University, Evanston, IL, United States
- Jayalakshmi Jain
  - McCormick School of Engineering, Northwestern University, Evanston, IL, United States
- Erin A Ward
  - Northwestern University Feinberg School of Medicine, Chicago, IL, United States
- S Darius Tandon
  - Northwestern University Feinberg School of Medicine, Chicago, IL, United States
- Judith T Moskowitz
  - Northwestern University Feinberg School of Medicine, Chicago, IL, United States
- Lauren S Wakschlag
  - Northwestern University Feinberg School of Medicine, Chicago, IL, United States
- Nabil Alshurafa
  - Northwestern University Feinberg School of Medicine, Chicago, IL, United States
|
11
|
Chowdhury NK, Kabir MA, Rahman MM, Islam SMS. Machine learning for detecting COVID-19 from cough sounds: An ensemble-based MCDM method. Comput Biol Med 2022; 145:105405. [PMID: 35318171 PMCID: PMC8926945 DOI: 10.1016/j.compbiomed.2022.105405] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 11/20/2021] [Revised: 03/10/2022] [Accepted: 03/11/2022] [Indexed: 12/16/2022]
Abstract
This research aims to analyze the performance of state-of-the-art machine learning techniques for classifying COVID-19 from cough sounds and to identify the model(s) that consistently perform well across different cough datasets. Different performance evaluation metrics (precision, sensitivity, specificity, AUC, accuracy, etc.) make selecting the best performance model difficult. To address this issue, in this paper, we propose an ensemble-based multi-criteria decision making (MCDM) method for selecting top performance machine learning technique(s) for COVID-19 cough classification. We use four cough datasets, namely Cambridge, Coswara, Virufy, and NoCoCoDa to verify the proposed method. At first, our proposed method uses the audio features of cough samples and then applies machine learning (ML) techniques to classify them as COVID-19 or non-COVID-19. Then, we consider a multi-criteria decision-making (MCDM) method that combines ensemble technologies (i.e., soft and hard) to select the best model. In MCDM, we use the technique for order preference by similarity to ideal solution (TOPSIS) for ranking purposes, while entropy is applied to calculate evaluation criteria weights. In addition, we apply the feature reduction process through recursive feature elimination with cross-validation under different estimators. The results of our empirical evaluations show that the proposed method outperforms the state-of-the-art models. We see that when the proposed method is used for analysis using the Extra-Trees classifier, it has achieved promising results (AUC: 0.95, Precision: 1, Recall: 0.97).
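The ranking step described above (entropy-derived criterion weights followed by TOPSIS) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the score matrix and model labels are hypothetical, every criterion is assumed to be a benefit criterion (higher is better), and the ensemble and feature-elimination stages are omitted.

```python
import math

def entropy_weights(matrix):
    """Entropy weights for an m x n matrix of positive scores (rows = models)."""
    m, n = len(matrix), len(matrix[0])
    raw = []
    for j in range(n):
        col = [row[j] for row in matrix]
        total = sum(col)
        probs = [v / total for v in col]
        # Shannon entropy scaled to [0, 1]; terms with p == 0 contribute 0.
        e = -sum(p * math.log(p) for p in probs if p > 0) / math.log(m)
        raw.append(1.0 - e)  # criteria that vary more get larger weights
    s = sum(raw)  # assumes at least one criterion differs across models
    return [w / s for w in raw]

def topsis(matrix, weights):
    """Relative closeness of each row to the ideal solution (benefit criteria only)."""
    n = len(matrix[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]
    best = [max(r[j] for r in v) for j in range(n)]
    worst = [min(r[j] for r in v) for j in range(n)]
    closeness = []
    for r in v:
        d_best, d_worst = math.dist(r, best), math.dist(r, worst)
        closeness.append(d_worst / (d_best + d_worst))
    return closeness

# Hypothetical scores; columns are (AUC, precision, recall).
scores_matrix = [
    [0.90, 0.85, 0.80],  # hypothetical model A
    [0.80, 0.75, 0.70],  # hypothetical model B
    [0.70, 0.80, 0.60],  # hypothetical model C
]
weights = entropy_weights(scores_matrix)
closeness = topsis(scores_matrix, weights)
best_index = max(range(len(closeness)), key=closeness.__getitem__)
print(weights, closeness, best_index)
```

Because hypothetical model A is best on every criterion, it coincides with the ideal solution and ranks first.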
Affiliation(s)
- Nihad Karim Chowdhury (corresponding author)
  - Department of Computer Science and Engineering, University of Chittagong, Bangladesh
- Muhammad Ashad Kabir
  - Data Science Research Unit, School of Computing, Mathematics and Engineering, Charles Sturt University, NSW, Australia
- Md. Muhtadir Rahman
  - Department of Computer Science and Engineering, University of Chittagong, Bangladesh
|
12
|
Feature Importance Analysis by Nowcasting Perspective to Predict COVID-19. MOBILE NETWORKS AND APPLICATIONS 2022. [PMCID: PMC9033308 DOI: 10.1007/s11036-022-01966-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/31/2022]
Abstract
The present work investigates prediction and feature importance for estimating COVID-19 infection using a machine learning approach. Our work analyzed the inclusion of climatic features, mobility, government actions and the number of cases per health sub-territory in an existing model. Random Forest with the Permutation Importance method was used to assess importance and list the thirty most relevant features that represent the probability of infection. Among all features, the most important were: i) variables per health region, ii) the period between the date of notification and symptom onset, iii) symptom features such as fever, cough and sore throat, iv) traffic flow and mobility variables, and v) weather features. The model was validated and reached an average accuracy of 81.82%, whereas the sensitivity and specificity reached 87.52% and 78.67%, respectively, in the infection estimate. Therefore, the proposed investigation represents an alternative to guide authorities in understanding aspects related to the disease.
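Permutation importance, as named in this abstract, measures how much a model's accuracy drops when one feature column is shuffled. The sketch below shows the idea in plain Python under loud assumptions: a fixed threshold rule stands in for the Random Forest, and the two-feature dataset is synthetic and hypothetical, not the study's data.

```python
import random

def accuracy(model, X, y):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    result = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(baseline - accuracy(model, X_perm, y))
        result.append(sum(drops) / n_repeats)
    return result

# Stand-in model (assumption): uses only feature 0 and ignores feature 1.
def toy_model(row):
    return int(row[0] > 0.5)

data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [toy_model(row) for row in X]  # labels depend on feature 0 only

importances = permutation_importance(toy_model, X, y)
print(importances)
```

Shuffling feature 1 leaves every prediction unchanged, so its importance is exactly zero, while shuffling feature 0 produces a clear drop in accuracy.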
|
13
|
Bartoszko J, Dranitsaris G, Wilcox ME, Del Sorbo L, Mehta S, Peer M, Parotto M, Bogoch I, Riazi S. Development of a repeated-measures predictive model and clinical risk score for mortality in ventilated COVID-19 patients. Can J Anaesth 2022; 69:343-352. [PMID: 34931293 PMCID: PMC8687635 DOI: 10.1007/s12630-021-02163-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2021] [Revised: 09/13/2021] [Accepted: 10/04/2021] [Indexed: 11/09/2022] Open
Abstract
PURPOSE The COVID-19 pandemic has caused intensive care units (ICUs) to reach capacities requiring triage. A tool to predict mortality risk in ventilated patients with COVID-19 could inform decision-making and resource allocation, and allow population-level comparisons across institutions. METHODS This retrospective cohort study included all mechanically ventilated adults with COVID-19 admitted to three tertiary care ICUs in Toronto, Ontario, between 1 March 2020 and 15 December 2020. Generalized estimating equations were used to identify variables predictive of mortality. The primary outcome was the probability of death at three-day intervals from the time of ICU admission (day 0), with risk re-calculation every three days to day 15; the final risk calculation estimated the probability of death at day 15 and beyond. A numerical algorithm was developed from the final model coefficients. RESULTS One hundred twenty-seven patients were eligible for inclusion. Median ICU length of stay was 26.9 (interquartile range, 15.4-52.0) days. Overall mortality was 42%. From day 0 to 15, the variables age, temperature, lactate level, ventilation tidal volume, and vasopressor use significantly predicted mortality. Our final clinical risk score had an area under the receiver-operating characteristics curve of 0.9 (95% confidence interval [CI], 0.8 to 0.9). For every ten-point increase in risk score, the relative increase in the odds of death was approximately 4, with an odds ratio of 4.1 (95% CI, 2.9 to 5.9). CONCLUSION Our dynamic prediction tool for mortality in ventilated patients with COVID-19 has excellent diagnostic properties. Notwithstanding, external validation is required before widespread implementation.
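The reported odds ratio of 4.1 per ten-point increase in risk score can be turned into a short worked calculation. The sketch assumes the odds scale log-linearly with the score, as in a logistic-type model; the baseline probability used below is hypothetical and is not taken from the study.

```python
OR_PER_10_POINTS = 4.1  # odds ratio per ten-point increase, as reported in the abstract

def odds_multiplier(delta_points, or_per_10=OR_PER_10_POINTS):
    """Factor by which the odds of death change for a given change in score.

    Assumes log-odds are linear in the score, so effects compound multiplicatively.
    """
    return or_per_10 ** (delta_points / 10)

def probability_after_increase(p0, delta_points):
    """Convert a baseline probability to odds, scale the odds, convert back."""
    odds = p0 / (1 - p0) * odds_multiplier(delta_points)
    return odds / (1 + odds)

twenty_point_factor = odds_multiplier(20)            # 4.1 squared
p_example = probability_after_increase(0.20, 10)     # 0.20 is a hypothetical baseline
print(twenty_point_factor, p_example)
```

Under this assumption a twenty-point increase multiplies the odds by 4.1 squared (16.81), which is why probabilities, unlike odds, do not scale by a constant factor.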
Affiliation(s)
- Justyna Bartoszko
  - Department of Anesthesia and Pain Management, University Health Network, 323-200 Elizabeth St, Toronto, ON, M5G 2C4, Canada
  - Department of Anesthesiology and Pain Medicine, University of Toronto, Toronto, ON, Canada
- George Dranitsaris
  - Department of Public Health, Falk College, Syracuse University, Syracuse, NY, USA
- M Elizabeth Wilcox
  - Interdepartmental Division of Critical Care Medicine, University of Toronto, Toronto, ON, Canada
  - Department of Medicine (Critical Care Medicine), University Health Network, Toronto, ON, Canada
- Lorenzo Del Sorbo
  - Interdepartmental Division of Critical Care Medicine, University of Toronto, Toronto, ON, Canada
  - Department of Medicine (Critical Care Medicine), University Health Network, Toronto, ON, Canada
- Sangeeta Mehta
  - Interdepartmental Division of Critical Care Medicine, University of Toronto, Toronto, ON, Canada
  - Department of Medicine, Sinai Health System, Toronto, ON, Canada
- Miki Peer
  - Department of Anesthesia and Pain Management, University Health Network, 323-200 Elizabeth St, Toronto, ON, M5G 2C4, Canada
- Matteo Parotto
  - Department of Anesthesia and Pain Management, University Health Network, 323-200 Elizabeth St, Toronto, ON, M5G 2C4, Canada
  - Interdepartmental Division of Critical Care Medicine, University of Toronto, Toronto, ON, Canada
- Isaac Bogoch
  - Division of General Internal Medicine and Infectious Diseases, University Health Network, Toronto, ON, Canada
  - Department of Medicine, University of Toronto, Toronto, ON, Canada
- Sheila Riazi
  - Department of Anesthesia and Pain Management, University Health Network, 323-200 Elizabeth St, Toronto, ON, M5G 2C4, Canada
  - Department of Anesthesiology and Pain Medicine, University of Toronto, Toronto, ON, Canada
|
14
|
Predictive Machine Learning Models and Survival Analysis for COVID-19 Prognosis Based on Hematochemical Parameters. SENSORS 2021; 21:s21248503. [PMID: 34960595 PMCID: PMC8705488 DOI: 10.3390/s21248503] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/22/2021] [Revised: 12/15/2021] [Accepted: 12/17/2021] [Indexed: 12/26/2022]
Abstract
The coronavirus disease 2019 (COVID-19) pandemic has affected hundreds of millions of individuals and caused millions of deaths worldwide. Predicting the clinical course of the disease is of pivotal importance for managing patients. Several studies have found hematochemical alterations in COVID-19 patients, such as in inflammatory markers. We retrospectively analyzed the anamnestic data and laboratory parameters of 303 patients diagnosed with COVID-19 who were admitted to the Polyclinic Hospital of Bari during the first phase of the COVID-19 global pandemic. After the pre-processing phase, we performed a survival analysis with Kaplan–Meier curves and Cox regression, with the aim of identifying the most unfavorable predictors. The target outcomes were mortality or admission to the intensive care unit (ICU). Different machine learning models were also compared to build a robust classifier that relies on a small number of strongly significant factors to estimate the risk of death or admission to ICU. From the survival analysis, the most significant laboratory parameter for both outcomes was C-reactive protein min: HR=17.963 (95% CI 6.548–49.277, p < 0.001) for death and HR=1.789 (95% CI 1.000–3.200, p = 0.050) for admission to ICU. The second most important parameter was Erythrocytes max: HR=1.765 (95% CI 1.141–2.729, p < 0.05) for death and HR=1.481 (95% CI 0.895–2.452, p = 0.127) for admission to ICU. The best model for predicting the risk of death was the decision tree, with a ROC-AUC of 89.66%, whereas the best model for predicting admission to ICU was the support vector machine, with a ROC-AUC of 95.07%. The hematochemical predictors identified in this study can be used as a strong prognostic signature to characterize the severity of the disease in COVID-19 patients.
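The Kaplan–Meier curves mentioned in this abstract come from the product-limit estimator: at each event time, the running survival probability is multiplied by one minus the fraction of at-risk patients who had the event. A minimal sketch follows; the follow-up times and event flags are hypothetical toy values, and the Cox regression step is not covered.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times[i] is the follow-up time of patient i; events[i] is 1 if the outcome
    occurred at that time and 0 if the patient was censored.
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Gather everyone whose follow-up ends at time t (events and censored).
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_removed
    return curve

# Hypothetical toy cohort: five patients, two of them censored.
toy_times = [2, 3, 3, 5, 8]
toy_events = [1, 1, 0, 1, 0]
km_curve = kaplan_meier(toy_times, toy_events)
print(km_curve)
```

Censored patients never reduce the survival estimate directly; they only shrink the at-risk denominator for later event times.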
|
15
|
Chung H, Park C, Kang WS, Lee J. Gender Bias in Artificial Intelligence: Severity Prediction at an Early Stage of COVID-19. Front Physiol 2021; 12:778720. [PMID: 34912242 PMCID: PMC8667070 DOI: 10.3389/fphys.2021.778720] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2021] [Accepted: 10/29/2021] [Indexed: 11/29/2022] Open
Abstract
Artificial intelligence (AI) technologies have been applied in various medical domains to predict patient outcomes with high accuracy. As AI becomes more widely adopted, the problem of model bias is increasingly apparent. In this study, we investigate the model bias that can occur when training a model using datasets for only one particular gender and aim to present new insights into the bias issue. For the investigation, we considered an AI model that predicts severity at an early stage based on the medical records of coronavirus disease (COVID-19) patients. For 5,601 confirmed COVID-19 patients, we used 37 medical records, namely, basic patient information, physical index, initial examination findings, clinical findings, comorbidity diseases, and general blood test results at an early stage. To investigate the gender-based AI model bias, we trained and evaluated two separate models: one trained using only the male group, and the other using only the female group. When the model trained on the male-group data was applied to the female testing data, the overall accuracy decreased: sensitivity from 0.93 to 0.86, specificity from 0.92 to 0.86, accuracy from 0.92 to 0.86, balanced accuracy from 0.93 to 0.86, and area under the curve (AUC) from 0.97 to 0.94. Similarly, when the model trained on the female-group data was applied to the male testing data, the overall accuracy again decreased: sensitivity from 0.97 to 0.90, specificity from 0.96 to 0.91, accuracy from 0.96 to 0.91, balanced accuracy from 0.96 to 0.90, and AUC from 0.97 to 0.95. Furthermore, when we evaluated each gender-dependent model with the test data from the same gender used for training, the resultant accuracy was also lower than that from the unbiased model.
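The metrics compared in this abstract all derive from the same confusion-matrix counts, which the short sketch below makes explicit. The counts are hypothetical and chosen only to show why accuracy and balanced accuracy diverge when classes are imbalanced; they are not the study's results.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
    }

# Hypothetical counts: 100 severe cases and 200 non-severe cases.
metrics = classification_metrics(tp=90, fn=10, tn=160, fp=40)
print(metrics)
```

Accuracy weights each class by its size, whereas balanced accuracy weights sensitivity and specificity equally, so the two differ whenever the classes are unequal in size and the per-class rates differ.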
Affiliation(s)
- Heewon Chung
  - Department of Biomedical Engineering, College of Electronics and Information, Kyung Hee University, Yongin-si, South Korea
- Chul Park
  - Department of Internal Medicine, Wonkwang University School of Medicine, Iksan, South Korea
- Wu Seong Kang
  - Department of Trauma Surgery, Cheju Halla General Hospital, Jeju-si, South Korea
- Jinseok Lee
  - Department of Biomedical Engineering, College of Electronics and Information, Kyung Hee University, Yongin-si, South Korea
|
16
|
Doyle R. Machine Learning-Based Prediction of COVID-19 Mortality With Limited Attributes to Expedite Patient Prognosis and Triage: Retrospective Observational Study. JMIRX MED 2021; 2:e29392. [PMID: 34843609 PMCID: PMC8601033 DOI: 10.2196/29392] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 04/05/2021] [Revised: 08/16/2021] [Accepted: 09/14/2021] [Indexed: 12/12/2022]
Abstract
Background The onset and development of the COVID-19 pandemic have placed pressure on hospital resources and staff worldwide. The integration of more streamlined predictive modeling in prognosis and triage–related decision-making can partly ease this pressure. Objective The objective of this study is to assess the performance impact of dimensionality reduction on COVID-19 mortality prediction models, demonstrating the high impact of a limited number of features to limit the need for complex variable gathering before reaching meaningful risk labelling in clinical settings. Methods Standard machine learning classifiers were employed to predict an outcome of either death or recovery using 25 patient-level variables, spanning symptoms, comorbidities, and demographic information, from a geographically diverse sample representing 17 countries. The effects of feature reduction on the data were tested by running classifiers on a high-quality data set of 212 patients with populated entries for all 25 available features. The full data set was compared to two reduced variations with 7 features and 1 feature, respectively, extracted using univariate mutual information and chi-square testing. Classifier performance on each data set was then assessed on the basis of accuracy, sensitivity, specificity, and receiver operating characteristic–derived area under the curve metrics to quantify benefit or loss from reduction. Results The performance of the classifiers on the 212-patient sample resulted in strong mortality detection, with the highest performing model achieving specificity of 90.7% (95% CI 89.1%-92.3%) and sensitivity of 92.0% (95% CI 91.0%-92.9%). Dimensionality reduction provided strong benefits for performance.
The baseline accuracy of a random forest classifier increased from 89.2% (95% CI 88.0%-90.4%) to 92.5% (95% CI 91.9%-93.0%) when training on 7 chi-square–extracted features and to 90.8% (95% CI 89.8%-91.7%) when training on 7 mutual information–extracted features. Reduction impact on a separate logistic classifier was mixed; however, when present, losses were marginal compared to the extent of feature reduction, altogether showing that reduction either improves performance or can reduce the variable-sourcing burden at hospital admission with little performance loss. Extreme feature reduction to a single most salient feature, often age, demonstrated large standalone explanatory power, with the best-performing model achieving an accuracy of 81.6% (95% CI 81.1%-82.1%); this demonstrates the relatively marginal improvement that additional variables bring to the tested models. Conclusions Predictive statistical models have promising performance in early prediction of death among patients with COVID-19. Strong dimensionality reduction was shown to further improve baseline performance on selected classifiers and only marginally reduce it in others, highlighting the importance of feature reduction in future model construction and the feasibility of deprioritizing large, hard-to-source, and nonessential feature sets in real world settings.
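Chi-square feature extraction, as used in this study, scores each feature by how far its contingency table with the outcome departs from what independence would predict, then keeps the highest-scoring features. The sketch below computes the statistic for two hypothetical binary features; the feature names and all counts are invented for illustration, and significance testing is left out.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table of observed counts.

    Rows index feature values and columns index outcome classes.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count under independence
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows = feature absent/present, columns = recovered/died.
tables = {
    "feature_a": [[30, 10], [10, 30]],  # strongly associated with outcome
    "feature_b": [[20, 20], [20, 20]],  # no association with outcome
}
chi2_scores = {name: chi_square_statistic(t) for name, t in tables.items()}
ranked_features = sorted(chi2_scores, key=chi2_scores.get, reverse=True)
print(chi2_scores, ranked_features)
```

Keeping the top k entries of such a ranking is the univariate selection idea behind the reduced 7-feature and 1-feature data sets, although the study's exact procedure may differ.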
|
17
|
Wong KCY, Xiang Y, Yin L, So HC. Uncovering Clinical Risk Factors and Predicting Severe COVID-19 Cases Using UK Biobank Data: Machine Learning Approach. JMIR Public Health Surveill 2021; 7:e29544. [PMID: 34591027 PMCID: PMC8485986 DOI: 10.2196/29544] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 04/12/2021] [Revised: 06/24/2021] [Accepted: 07/31/2021] [Indexed: 01/08/2023] Open
Abstract
Background COVID-19 is a major public health concern. Given the extent of the pandemic, it is urgent to identify risk factors associated with disease severity. More accurate prediction of those at risk of developing severe infections is of high clinical importance. Objective Based on the UK Biobank (UKBB), we aimed to build machine learning models to predict the risk of developing severe or fatal infections, and uncover major risk factors involved. Methods We first restricted the analysis to infected individuals (n=7846), then performed analysis at a population level, considering those with no known infection as controls (n_controls=465,728). Hospitalization was used as a proxy for severity. A total of 97 clinical variables (collected prior to the COVID-19 outbreak) covering demographic variables, comorbidities, blood measurements (eg, hematological/liver/renal function/metabolic parameters), anthropometric measures, and other risk factors (eg, smoking/drinking) were included as predictors. We also constructed a simplified (lite) prediction model using 27 covariates that can be more easily obtained (demographic and comorbidity data). XGboost (gradient-boosted trees) was used for prediction and predictive performance was assessed by cross-validation. Variable importance was quantified by Shapley values (ShapVal), permutation importance (PermImp), and accuracy gain. Shapley dependency and interaction plots were used to evaluate the pattern of relationships between risk factors and outcomes. Results A total of 2386 severe and 477 fatal cases were identified. For analyses within infected individuals (n=7846), our prediction model achieved area under the receiver operating characteristic curve (AUC–ROC) of 0.723 (95% CI 0.711-0.736) and 0.814 (95% CI 0.791-0.838) for severe and fatal infections, respectively.
The top 5 contributing factors (sorted by ShapVal) for severity were age, number of drugs taken (cnt_tx), cystatin C (reflecting renal function), waist-to-hip ratio (WHR), and Townsend deprivation index (TDI). For mortality, the top features were age, testosterone, cnt_tx, waist circumference (WC), and red cell distribution width. For analyses involving the whole UKBB population, AUCs for severity and fatality were 0.696 (95% CI 0.684-0.708) and 0.825 (95% CI 0.802-0.848), respectively. The same top 5 risk factors were identified for both outcomes, namely, age, cnt_tx, WC, WHR, and TDI. Apart from the above, age, cystatin C, TDI, and cnt_tx were among the top 10 across all 4 analyses. Other diseases top ranked by ShapVal or PermImp were type 2 diabetes mellitus (T2DM), coronary artery disease, atrial fibrillation, and dementia, among others. For the “lite” models, predictive performances were broadly similar, with estimated AUCs of 0.716, 0.818, 0.696, and 0.830, respectively. The top ranked variables were similar to above, including age, cnt_tx, WC, sex (male), and T2DM. Conclusions We identified numerous baseline clinical risk factors for severe/fatal infection by XGboost. For example, age, central obesity, impaired renal function, multiple comorbidities, and cardiometabolic abnormalities may predispose to poorer outcomes. The prediction models may be useful at a population level to identify those susceptible to developing severe/fatal infections, facilitating targeted prevention strategies. A risk-prediction tool is also available online. Further replications in independent cohorts are required to verify our findings.
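Shapley values, used above to rank risk factors, assign each feature its average marginal contribution over all subsets of the other features. The sketch below computes them exactly for a three-feature toy value function. The payoffs are hypothetical and only borrow feature names from the abstract; real analyses of tree models rely on efficient approximations rather than this exhaustive enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley values by enumerating every subset of the other features.

    value(S) returns the model payoff when only the features in frozenset S
    are available. Cost grows exponentially with the number of features.
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi

# Hypothetical payoff: age and waist circumference (wc) each help on their own
# and interact; tdi contributes nothing in this toy example.
def toy_value(s):
    v = 0.0
    if "age" in s:
        v += 0.3
    if "wc" in s:
        v += 0.1
    if "age" in s and "wc" in s:
        v += 0.2  # interaction term, split equally between the two features
    return v

feature_names = ["age", "wc", "tdi"]
phi = shapley_values(feature_names, toy_value)
print(phi)
```

The values sum to the difference between the payoff with all features and the payoff with none, which is the efficiency property that makes per-feature attributions add up to the prediction.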
Affiliation(s)
- Kenneth Chi-Yin Wong
  - School of Biomedical Sciences, The Chinese University of Hong Kong, Hong Kong, China
- Yong Xiang
  - School of Biomedical Sciences, The Chinese University of Hong Kong, Hong Kong, China
- Liangying Yin
  - School of Biomedical Sciences, The Chinese University of Hong Kong, Hong Kong, China
- Hon-Cheong So
  - School of Biomedical Sciences, The Chinese University of Hong Kong, Hong Kong, China
  - KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research of Common Diseases, Kunming Institute of Zoology and The Chinese University of Hong Kong, Kunming, China
  - CUHK Shenzhen Research Institute, Shenzhen, China
  - Department of Psychiatry, The Chinese University of Hong Kong, Hong Kong, China
  - Margaret K.L. Cheung Research Centre for Management of Parkinsonism, The Chinese University of Hong Kong, Hong Kong, China
  - Brain and Mind Institute, The Chinese University of Hong Kong, Hong Kong, China
  - Hong Kong Branch of the Chinese Academy of Sciences Center for Excellence in Animal Evolution and Genetics, The Chinese University of Hong Kong, Hong Kong, China
|