1
Safdari A, Keshav CS, Mody D, Verma K, Kaushal U, Burra VK, Ray S, Bandyopadhyay D. The external validity of machine learning-based prediction scores from hematological parameters of COVID-19: A study using hospital records from Brazil, Italy, and Western Europe. PLoS One 2025; 20:e0316467. [PMID: 39903736 PMCID: PMC11793750 DOI: 10.1371/journal.pone.0316467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2023] [Accepted: 12/11/2024] [Indexed: 02/06/2025]
Abstract
The unprecedented worldwide COVID-19 pandemic has motivated several research groups to develop machine learning-based approaches that aim to automate the diagnosis or screening of COVID-19 at large scale. The gold standard for COVID-19 detection, quantitative real-time polymerase chain reaction (qRT-PCR), is expensive and time-consuming. Haematology-based detection, by contrast, is fast and nearly as accurate, although it has been less explored. The external validity of haematology-based COVID-19 predictions in diverse populations has yet to be fully investigated. Here we report the external validity of machine learning-based prediction scores from haematological parameters recorded in different hospitals in Brazil, Italy, and Western Europe (raw sample size, 195554). Of the seven ML classifiers tested, XGBoost performed consistently best on all datasets. The working models use a set of either four or fourteen haematological parameters. The internal performance of the XGBoost models (AUC scores from 84% to 97%) was superior to that of ML models reported in the literature for some of these datasets (AUC scores from 84% to 87%). Meta-validation of the external performance showed that performance was reliable (AUC score 86%), with good accuracy of the probabilistic predictions (Brier score 14%), particularly when the model was trained and tested on fourteen haematological parameters from the same country (Brazil). External performance dropped when the model was trained on datasets from Italy and tested on Brazil (AUC score 69%) and Western Europe (AUC score 65%), presumably because of differences across populations in factors such as ethnicity, phenotype, immunity, and reference ranges.
The main contribution of the present study is a reliable and parsimonious COVID-19 prediction tool that uses fewer haematological features than the earlier study and is meta-validated on a sufficient sample size (n = 195554). The current models can therefore be applied at other demographic locations, preferably after first training the model on the same population. Availability: https://covipred.bits-hyderabad.ac.in/home; https://github.com/debashreebanerjee/CoviPred.
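The two headline metrics in this abstract, AUC and the Brier score, can both be computed directly from predicted probabilities. The minimal sketch below assumes binary labels (1 = qRT-PCR positive) and uses the rank-based (Mann-Whitney) form of the AUC; it is not the authors' CoviPred code, and the function names and toy data are hypothetical.

```python
def auc_score(labels, probs):
    """Rank-based (Mann-Whitney) AUC: the share of positive/negative pairs in
    which the positive case gets the higher predicted probability; ties count 0.5."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(labels, probs):
    """Mean squared difference between predicted probability and the 0/1 outcome.
    Lower is better; 0 means every prediction was certain and correct."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


# Four hypothetical patients (1 = qRT-PCR positive) with model probabilities
labels = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]
print(auc_score(labels, probs))              # 0.75
print(round(brier_score(labels, probs), 3))  # 0.158
```

On this 0 to 1 scale, the abstract's "Brier score 14%" presumably corresponds to a value of 0.14.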
Affiliation(s)
- Ali Safdari
- Department of Biological Sciences, Birla Institute of Technology and Science, Pilani, Hyderabad Campus, Hyderabad, Telangana, India
- Chanda Sai Keshav
- Department of Biological Sciences, Birla Institute of Technology and Science, Pilani, Hyderabad Campus, Hyderabad, Telangana, India
- Deepanshu Mody
- Department of Biological Sciences, Birla Institute of Technology and Science, Pilani, Hyderabad Campus, Hyderabad, Telangana, India
- Kshitij Verma
- Department of Biological Sciences, Birla Institute of Technology and Science, Pilani, Hyderabad Campus, Hyderabad, Telangana, India
- Utsav Kaushal
- Department of Biological Sciences, Birla Institute of Technology and Science, Pilani, Hyderabad Campus, Hyderabad, Telangana, India
- Vaadeendra Kumar Burra
- Department of Biological Sciences, Birla Institute of Technology and Science, Pilani, Hyderabad Campus, Hyderabad, Telangana, India
- Sibnath Ray
- Gencrest Private Limited, 301-302, B-Wing, Corporate Center, Mumbai, India
- Debashree Bandyopadhyay
- Department of Biological Sciences, Birla Institute of Technology and Science, Pilani, Hyderabad Campus, Hyderabad, Telangana, India
2
Abbasi Habashi S, Koyuncu M, Alizadehsani R. A Survey of COVID-19 Diagnosis Using Routine Blood Tests with the Aid of Artificial Intelligence Techniques. Diagnostics (Basel) 2023; 13:1749. [PMID: 37238232 PMCID: PMC10217633 DOI: 10.3390/diagnostics13101749] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/30/2023] [Revised: 04/19/2023] [Accepted: 04/29/2023] [Indexed: 05/28/2023]
Abstract
Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), which causes the disease COVID-19, produces an acute respiratory syndrome that has considerably affected the global economy and healthcare systems. The virus is diagnosed using a conventional technique, the Reverse Transcription Polymerase Chain Reaction (RT-PCR) test. However, RT-PCR frequently yields false-negative and otherwise incorrect results. Recent work indicates that COVID-19 can also be diagnosed using imaging, such as CT scans and X-rays, and using blood tests. Nevertheless, X-rays and CT scans cannot always be used for patient screening because of high costs, radiation doses, and an insufficient number of devices. A less expensive and faster diagnostic model to distinguish positive from negative COVID-19 cases is therefore needed. Blood tests are easy to perform and cost less than RT-PCR and imaging tests. Since biochemical parameters in routine blood tests change during COVID-19 infection, they may provide physicians with diagnostically useful information. This study reviews newly emerging artificial intelligence (AI)-based methods that diagnose COVID-19 from routine blood tests. We gathered research resources and inspected 92 carefully selected articles from a variety of publishers, such as IEEE, Springer, Elsevier, and MDPI. These 92 studies are then classified into two tables according to whether they use machine learning or deep learning models to diagnose COVID-19 from routine blood test datasets. Across these studies, Random Forest and logistic regression are the most widely used machine learning methods, and accuracy, sensitivity, specificity, and AUC are the most widely used performance metrics. Finally, we discuss and analyze these studies that apply machine learning and deep learning models to routine blood test datasets for COVID-19 detection.
This survey can serve as a starting point for beginner researchers working on COVID-19 classification.
Affiliation(s)
- Murat Koyuncu
- Department of Information Systems Engineering, Atilim University, 06830 Ankara, Turkey;
- Roohallah Alizadehsani
- Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Waurn Ponds, Geelong, VIC 3216, Australia
3
Bayraktar M, Tekin E, Kocak MN. How to diagnose COVID-19 in family practice? Usability of complete blood count as a COVID-19 diagnostic tool: a cross-sectional study in Turkey. BMJ Open 2023; 13:e069493. [PMID: 37068894 PMCID: PMC10111184 DOI: 10.1136/bmjopen-2022-069493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/19/2023]
Abstract
OBJECTIVE COVID-19 is currently diagnosed in hospital settings, but an easy and practical way to diagnose COVID-19 is needed in primary care. For this purpose, the usability of the complete blood count in the diagnosis of COVID-19 was investigated. DESIGN Retrospective, cross-sectional study. SETTING Single-centre study in a tertiary university hospital in Erzurum, Turkey. PARTICIPANTS Patients aged 18-70 years who presented to the hospital between March 2020 and February 2021 and underwent both a complete blood count and a reverse-transcription PCR test for COVID-19 were included and compared. Patients with conditions affecting the test parameters (oncological-haematological conditions, chronic diseases, drug usage) were excluded. OUTCOME MEASURE The complete blood count and COVID-19 results of eligible patients, identified using diagnostic codes [U07.3 (COVID-19) or Z03.8 (observation for other suspected diseases and conditions)], were investigated. RESULTS Of the 978 patients included, 39.4% (n=385) were positive for COVID-19 and 60.6% (n=593) were negative. The mean age was 41.5±14.5 years, and 53.9% (n=527) were male. COVID-19-positive patients had significantly lower leucocyte, neutrophil, lymphocyte, monocyte, basophil, platelet and immature granulocyte (IG) values (p<0.001). Neutrophil/lymphocyte, neutrophil/monocyte and IG/lymphocyte ratios were also significantly decreased (p<0.001). In logistic regression analysis, low lymphocyte count (OR 0.695; 95% CI 0.597 to 0.809) and low red cell distribution width-coefficient of variation (RDW-CV) (OR 0.887; 95% CI 0.818 to 0.962) were significantly associated with COVID-19 positivity. In receiver operating characteristic analysis, the cut-off values for lymphocyte count and RDW-CV were 0.745 and 12.35, respectively. CONCLUSION Although our study was retrospective and reflects regional data, it indicates that low lymphocyte count and low RDW-CV can be used in the diagnosis of COVID-19 in primary care.
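The abstract reports ROC-derived cut-offs (0.745 for lymphocyte count, 12.35 for RDW-CV) but does not say here how they were chosen. A common rule is the threshold that maximises Youden's J = sensitivity + specificity - 1; the sketch below assumes that rule and treats lower values as indicating COVID-19, matching the direction of effect reported. The function name and the data are hypothetical.

```python
def youden_cutoff(values, labels, lower_is_positive=True):
    """Return (threshold, J) maximising Youden's J = sensitivity + specificity - 1.

    With lower_is_positive=True a case is called positive when value <= threshold,
    matching the finding that *low* lymphocyte count and RDW-CV are associated
    with COVID-19 positivity.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best_t, best_j = None, float("-inf")
    for t in sorted(set(values)):
        tp = fp = 0
        for v, y in zip(values, labels):
            called_positive = v <= t if lower_is_positive else v >= t
            if called_positive and y == 1:
                tp += 1
            elif called_positive and y == 0:
                fp += 1
        sensitivity = tp / n_pos
        specificity = (n_neg - fp) / n_neg
        j = sensitivity + specificity - 1
        if j > best_j:  # strict '>' keeps the first (lowest) threshold on ties
            best_t, best_j = t, j
    return best_t, best_j


# Hypothetical lymphocyte counts (1 = RT-PCR positive)
lymph = [0.5, 0.6, 0.7, 1.2, 1.5, 2.0]
pcr = [1, 1, 1, 0, 0, 0]
print(youden_cutoff(lymph, pcr))  # (0.7, 1.0)
```

Other cut-off rules (for example, fixing a minimum sensitivity for primary-care rule-out) would give different thresholds on the same data.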
Affiliation(s)
- Erdal Tekin
- Emergency Medicine, Ataturk University, Erzurum, Turkey
4
Altantawy DA, Kishk SS. Equilibrium-based COVID-19 diagnosis from routine blood tests: A sparse deep convolutional model. EXPERT SYSTEMS WITH APPLICATIONS 2023; 213:118935. [PMID: 36210961 PMCID: PMC9527205 DOI: 10.1016/j.eswa.2022.118935] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2022] [Revised: 06/21/2022] [Accepted: 09/27/2022] [Indexed: 06/16/2023]
Abstract
SARS-CoV-2, the virus that causes COVID-19, has produced a pandemic that has severely affected human society, with a massive death toll worldwide. Hence, there is a persistent need for fast and reliable automatic tools to help health teams make clinical decisions. Predictive models could ease the strain on healthcare systems through early and reliable screening of COVID-19 patients, which helps to combat the spread of the disease. Recent studies have reported key advantages of using routine blood tests for the initial screening of COVID-19 patients. In this paper, we therefore propose a novel COVID-19 prediction model based on routine blood tests. The model exploits the real dependencies within the feature pool through a sparsification procedure. In this sparse domain, a hybrid feature selection mechanism is proposed that fuses the features selected from two perspectives: Pearson correlation and a new Minkowski-based equilibrium optimizer (MEO). The selected features are then fed into a new 1D Convolutional Neural Network (1DCNN) for the final diagnosis decision. The proposed prediction model is tested on a new public dataset from San Raffaele Hospital, Milan, Italy (the OSR dataset), which has two sub-datasets. According to the experimental results, the proposed model outperforms state-of-the-art techniques, with an average testing accuracy of 98.5% while using fewer than half of the features in the pool; that is, fewer than half of the blood tests in the dataset are needed to reach a final diagnosis decision.
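Only the Pearson-correlation half of the hybrid feature selection is simple enough to sketch here: each blood test is scored by the absolute correlation between its values and the binary diagnosis, and tests above a cut-off are kept. The threshold, feature names and data below are assumptions; the sparsification step and the MEO optimizer are not reproduced.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def pearson_filter(features, labels, threshold=0.3):
    """Keep features whose absolute correlation with the binary label meets the threshold.

    `features` maps a (hypothetical) blood-test name to its column of values.
    """
    return [
        name for name, column in features.items()
        if abs(pearson_r(column, labels)) >= threshold
    ]


features = {
    "test_a": [1.0, 2.0, 3.0, 4.0],  # rises with the label
    "test_b": [4.0, 3.0, 2.0, 1.0],  # falls with the label
    "test_c": [1.0, 1.0, 1.0, 1.0],  # constant, uninformative
    "test_d": [2.0, 1.0, 2.0, 1.0],  # uncorrelated with the label
}
labels = [0, 0, 1, 1]
print(pearson_filter(features, labels, threshold=0.5))  # ['test_a', 'test_b']
```

Using the absolute value keeps blood tests that fall with infection as well as those that rise with it.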
Affiliation(s)
- Doaa A Altantawy
- Electronics and Communications Engineering Department, Faculty of Engineering, Mansoura University, 60 El-Gomhoria Street, Mansoura, Egypt
- Sherif S Kishk
- Electronics and Communications Engineering Department, Faculty of Engineering, Mansoura University, 60 El-Gomhoria Street, Mansoura, Egypt
5
Huyut MT. Automatic Detection of Severely and Mildly Infected COVID-19 Patients with Supervised Machine Learning Models. Ing Rech Biomed 2023; 44:100725. [PMID: 35673548 PMCID: PMC9158375 DOI: 10.1016/j.irbm.2022.05.006] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 10/22/2021] [Revised: 04/24/2022] [Accepted: 05/29/2022] [Indexed: 02/07/2023]
Abstract
Objectives If the prognosis of COVID-19 can be determined early, the intense pressure on health services and the loss of their workforce can be partially reduced. The primary purpose of this article is to determine the feature dataset, consisting of routine blood values (RBV) and demographic data, that affects the prognosis of COVID-19. The second purpose is to apply this feature dataset to supervised machine learning (ML) models in order to identify severely and mildly infected COVID-19 patients at the time of admission. Material and methods The sample consists of severely (n = 192) and mildly (n = 4010) infected patients hospitalized with a diagnosis of COVID-19 between March and September 2021. The RBV data measured at admission and the age and gender of these patients were analyzed retrospectively. Features were selected using the minimum-redundancy-maximum-relevance (MRMR) method, principal component analysis, and forward multiple logistic regression analyses. The feature set was statistically compared between mildly and severely infected patients. The performance of various supervised ML models in identifying severely and mildly infected patients using the feature set was then compared. Results In this study, 28 RBV parameters and age were identified as the feature dataset. The effect of these features on the prognosis of the disease has been clinically proven. The ML models with the highest overall accuracy in identifying patient groups were, in order: locally weighted learning (LWL), 97.86%; K-star (K*), 96.31%; Naive Bayes (NB), 95.36%; and k-nearest neighbor (KNN), 94.05%. The models with the highest area under the receiver operating characteristic curve (AUC) in identifying patient groups were, in order: LWL, 0.95; K*, 0.91; NB, 0.85; and KNN, 0.75.
Conclusion The findings in this article provide significant motivation for healthcare professionals to identify severely and mildly infected COVID-19 patients at admission.
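Because this cohort is heavily imbalanced (192 severe versus 4010 mild patients), overall accuracy is best read against the accuracy of a trivial model that labels everyone as mild. The arithmetic below uses only the counts from the abstract to make that baseline explicit; the balanced-accuracy helper is an illustrative addition, not a metric the author reports, and both function names are hypothetical.

```python
def majority_baseline_accuracy(n_minority, n_majority):
    """Accuracy of a classifier that always predicts the majority class."""
    return n_majority / (n_minority + n_majority)


def balanced_accuracy(sensitivity, specificity):
    """Mean of per-class recall; equals 0.5 for any constant classifier."""
    return (sensitivity + specificity) / 2


severe, mild = 192, 4010  # counts reported in the abstract
print(round(majority_baseline_accuracy(severe, mild), 4))  # 0.9543
# "Everyone is mild": sensitivity for the severe class = 0, specificity = 1
print(balanced_accuracy(0.0, 1.0))  # 0.5
```

A constant "mild" prediction already reaches about 95.4% accuracy here, which is why AUC or balanced accuracy is the more informative comparison for this kind of dataset.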
Affiliation(s)
- M T Huyut
- Department of Biostatistics and Medical Informatics, Medical Faculty, Erzincan Binali Yıldırım University, 24100, Erzincan, Turkey
6
Cardozo G, Tirloni SF, Pereira Moro AR, Marques JLB. Use of Artificial Intelligence in the Search for New Information Through Routine Laboratory Tests: Systematic Review. JMIR BIOINFORMATICS AND BIOTECHNOLOGY 2022; 3:e40473. [PMID: 36644762 PMCID: PMC9828303 DOI: 10.2196/40473] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2022] [Revised: 08/28/2022] [Accepted: 10/31/2022] [Indexed: 11/05/2022]
Abstract
Background In recent decades, the use of artificial intelligence has been widely explored in health care. At the same time, the amount of data generated by a wide variety of medical processes has practically doubled every year, requiring new methods to analyze and process these data. Mainly aimed at aiding the diagnosis and prevention of diseases, this precision medicine has shown great potential in different medical disciplines. Laboratory tests, for example, almost always present their results separately, as individual values. However, physicians need to analyze a set of results to propose a presumptive diagnosis, which suggests that sets of laboratory tests may contain more information than each result presented separately. In this way, the processes of medical laboratories can be strongly affected by these techniques. Objective We sought to identify scientific research that used laboratory tests and machine learning techniques to predict hidden information and diagnose diseases. Methods The methodology followed the population, intervention, comparison, and outcomes principle, searching the main engineering and health sciences databases. The search terms were defined based on the list of terms in the Medical Subject Headings database. Data from this study were presented descriptively and followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses; 2020) statement flow diagram and the National Institutes of Health tool for quality assessment of articles. During the analysis, the inclusion and exclusion criteria were independently applied by 2 authors, with a third author consulted in cases of disagreement. Results Following the defined requirements, 40 studies of good quality were selected and evaluated. We found that the number of studies using this methodology has increased significantly in recent years, mainly because of COVID-19.
In general, the studies used machine learning classification models to predict new information, and the most used parameters were data from routine laboratory tests such as the complete blood count. Conclusions We conclude that laboratory tests, together with machine learning techniques, can predict new tests, thus helping the search for new diagnoses. This process has proved advantageous and innovative for medical laboratories: it makes it possible to discover hidden information and propose additional tests, reducing the number of false negatives and helping in the early discovery of unknown diseases.
Affiliation(s)
- Glauco Cardozo
- Federal Institute of Santa Catarina Florianópolis Brazil
7
Boer AK, Deneer R, Maas M, Ammerlaan HSM, van Balkom RHH, Thijssen WAHM, Bennenbroek S, Leers M, Martens RJH, Buijs MM, Kerremans JJ, Messchaert M, van Suijlen JJ, van Riel NAW, Scharnhorst V. Development and validation of an early warning score to identify COVID-19 in the emergency department based on routine laboratory tests: a multicentre case-control study. BMJ Open 2022; 12:e059111. [PMID: 35922102 PMCID: PMC9352566 DOI: 10.1136/bmjopen-2021-059111] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/10/2021] [Accepted: 06/10/2022] [Indexed: 01/08/2023]
Abstract
OBJECTIVES Identifying patients with a possible SARS-CoV-2 infection in the emergency department (ED) is challenging. Symptoms differ, incidence rates vary and test capacity may be limited. As PCR testing of all ED patients is neither feasible nor effective in most centres, a rapid, objective, low-cost early warning score to triage ED patients for possible infection was developed. DESIGN Case-control study. SETTING Secondary and tertiary hospitals in the Netherlands. PARTICIPANTS The study included patients presenting to the ED with venous blood sampling from July 2019 to July 2020 (n=10 417, 279 SARS-CoV-2-positive). The temporal validation cohort covered the period from July 2020 to October 2021 (n=14 080, 1093 SARS-CoV-2-positive). The external validation cohort consisted of patients presenting to the ED of three hospitals in the Netherlands (n=12 061, 652 SARS-CoV-2-positive). PRIMARY OUTCOME MEASURES The primary outcome was one or more positive SARS-CoV-2 PCR test results within 1 day before or 1 week after ED presentation. RESULTS The resulting 'CoLab-score' consists of 10 routine laboratory measurements and age. The score showed good discriminative ability (AUC: 0.930, 95% CI 0.909 to 0.945). The lowest CoLab-score had high sensitivity for COVID-19 (0.984, 95% CI 0.970 to 0.991; specificity: 0.411, 95% CI 0.285 to 0.520). Conversely, the highest score had high specificity (0.978, 95% CI 0.973 to 0.983; sensitivity: 0.608, 95% CI 0.522 to 0.685). The results were confirmed in temporal and external validation. CONCLUSIONS The CoLab-score is based on routine laboratory measurements and is available within 1 hour after presentation. Depending on the prevalence, COVID-19 may be safely ruled out in over one-third of ED presentations. Highly suspect cases can be identified regardless of presenting symptoms. Unlike the binary outcome of lateral flow testing, the CoLab-score is continuous, and it can guide PCR testing and the triage of ED patients.
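The conclusion that COVID-19 may be ruled out "depending on the prevalence" follows from Bayes' theorem: a fixed sensitivity and specificity translate into different predictive values as prevalence changes. The sketch below plugs in the lowest-score operating point from the abstract (sensitivity 0.984, specificity 0.411) and the derivation-cohort prevalence (279 of 10 417); the 30% prevalence is an arbitrary illustrative value, and the function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) of a test at a given pre-test prevalence (Bayes' theorem)."""
    tp = sensitivity * prevalence              # true positives per person tested
    fn = (1 - sensitivity) * prevalence        # false negatives
    tn = specificity * (1 - prevalence)        # true negatives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    return tp / (tp + fp), tn / (tn + fn)


# Operating point of the lowest CoLab-score, as reported in the abstract
sens, spec = 0.984, 0.411
for prevalence in (279 / 10417, 0.30):  # derivation-cohort prevalence, then a hypothetical 30%
    ppv, npv = predictive_values(sens, spec, prevalence)
    print(f"prevalence={prevalence:.3f}  PPV={ppv:.3f}  NPV={npv:.4f}")
```

With these inputs the NPV is about 0.999 at the derivation-cohort prevalence but falls to about 0.984 at 30%, which is why a low score is most useful for ruling out infection when prevalence is low.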
Affiliation(s)
- Arjen-Kars Boer
- Department of Laboratory Medicine, Catharina Hospital, Eindhoven, The Netherlands
- Ruben Deneer
- Department of Laboratory Medicine, Catharina Hospital, Eindhoven, The Netherlands
- Faculty of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
- Maaike Maas
- Department of Emergency Medicine, Catharina Hospital, Eindhoven, The Netherlands
- Heidi S M Ammerlaan
- Department of Internal Medicine, Catharina Hospital, Eindhoven, The Netherlands
- Wendy A H M Thijssen
- Department of Emergency Medicine, Catharina Ziekenhuis, Eindhoven, The Netherlands
- Sophie Bennenbroek
- Department of Emergency Medicine, Catharina Ziekenhuis, Eindhoven, The Netherlands
- Mathie Leers
- Department of Clinical Chemistry and Hematology, Zuyderland Medical Centre Heerlen, Heerlen, The Netherlands
- Remy J H Martens
- Department of Clinical Chemistry and Hematology, Zuyderland Medical Centre Heerlen, Heerlen, The Netherlands
- Jos J Kerremans
- Department of Medical Microbiology and Infection Prevention, Alrijne Hospital, Leiderdorp, The Netherlands
- Muriël Messchaert
- Department of Clinical Chemistry, Gelre Hospitals, Apeldoorn, The Netherlands
- Natal A W van Riel
- Faculty of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
- Amsterdam University Medical Center, University of Amsterdam, Amsterdam, The Netherlands
- Volkher Scharnhorst
- Department of Laboratory Medicine, Catharina Hospital, Eindhoven, The Netherlands
- Faculty of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
8
Diagnosis and Prognosis of COVID-19 Disease Using Routine Blood Values and LogNNet Neural Network. SENSORS 2022; 22:s22134820. [PMID: 35808317 PMCID: PMC9269123 DOI: 10.3390/s22134820] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 05/24/2022] [Revised: 06/16/2022] [Accepted: 06/23/2022] [Indexed: 01/08/2023]
Abstract
Since February 2020, the world has been engaged in an intense struggle with COVID-19, and health systems have come under severe pressure as the disease turned into a pandemic. The aim of this study is to identify the most effective routine blood values (RBV) for the diagnosis and prognosis of COVID-19 using a backward feature elimination algorithm for the LogNNet reservoir neural network. The first dataset consists of 5296 patients, with equal numbers of negative and positive COVID-19 tests. The LogNNet model achieved an accuracy of 99.5% in diagnosing the disease with 46 features, and 99.17% with only mean corpuscular hemoglobin concentration, mean corpuscular hemoglobin, and activated partial thromboplastin time. The second dataset consists of 3899 patients diagnosed with COVID-19 who were treated in hospital, of whom 203 were severe and 3696 were mild. The model reached an accuracy of 94.4% in determining the prognosis of the disease with 48 features, and 82.7% with only erythrocyte sedimentation rate, neutrophil count, and C-reactive protein. Our method will reduce the negative pressures on the health sector and help doctors to understand the pathogenesis of COVID-19 using the key features. The method is promising for creating mobile health monitoring systems in the Internet of Things.
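Backward feature elimination, used here to shrink the 46-feature model to three routine blood values, can be sketched as a greedy loop: at each step, drop the feature whose removal leaves the best validation score. In the sketch, `score_fn` is a hypothetical stand-in for training and evaluating LogNNet (or any model) on a feature subset; the toy scorer and feature names only demonstrate the control flow.

```python
def backward_eliminate(features, score_fn, target_size):
    """Greedily remove features until `target_size` remain.

    At each step every single-feature removal is scored with `score_fn`
    (higher is better) and the removal that leaves the best score is kept.
    Returns the surviving features and the (removed_feature, score) history.
    """
    current = list(features)
    history = []
    while len(current) > target_size:
        best_feature, best_score = None, float("-inf")
        for f in current:
            candidate = [g for g in current if g != f]
            s = score_fn(candidate)
            if s > best_score:
                best_feature, best_score = f, s
        current.remove(best_feature)
        history.append((best_feature, best_score))
    return current, history


# Toy scorer: each hypothetical feature contributes a fixed amount of "accuracy"
weights = {"feat_1": 5.0, "feat_2": 1.0, "feat_3": 3.0, "feat_4": 0.5}
kept, steps = backward_eliminate(weights, lambda fs: sum(weights[f] for f in fs), 2)
print(kept)   # ['feat_1', 'feat_3']
print(steps)  # [('feat_4', 9.0), ('feat_2', 8.0)]
```

Each step costs one model evaluation per remaining feature, so with a real model the loop is expensive; the recorded history shows how much accuracy is traded for a smaller panel, as in the 99.5% versus 99.17% comparison above.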
9
Clinical and Laboratory Approach to Diagnose COVID-19 Using Machine Learning. Interdiscip Sci 2022; 14:452-470. [PMID: 35133633 PMCID: PMC8846962 DOI: 10.1007/s12539-021-00499-4] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 10/27/2021] [Revised: 12/17/2021] [Accepted: 12/23/2021] [Indexed: 12/18/2022]
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which causes COVID-19, produces an acute respiratory syndrome that has had a significant impact on both the economy and health infrastructure worldwide. This novel virus is diagnosed using a conventional method, the RT-PCR (Reverse Transcription Polymerase Chain Reaction) test. This approach, however, produces many false-negative and erroneous results. According to recent studies, COVID-19 can also be diagnosed using X-rays, CT scans, blood tests and cough sounds. In this article, we use blood tests and machine learning to predict a diagnosis of infection with this deadly virus. We also present an extensive review of existing machine learning applications that diagnose COVID-19 from clinical and laboratory markers. Four different classifiers, combined with the Synthetic Minority Oversampling Technique (SMOTE), were used for classification. The Shapley Additive Explanations (SHAP) method was used to calculate the importance of each feature, and eosinophils, monocytes, leukocytes and platelets were found to be the most critical blood parameters distinguishing COVID-19 infection in our dataset. These classifiers can be used in conjunction with RT-PCR tests to improve sensitivity, and in emergency situations such as a pandemic outbreak that might be caused by new strains of the virus. The positive results indicate the prospective use of an automated framework that could help clinicians and medical personnel diagnose and screen patients.
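SMOTE balances the classes by creating synthetic minority-class samples on the line segments between a minority sample and one of its nearest minority-class neighbours. The minimal sketch below shows that interpolation step in plain Python; the neighbour count `k`, the seed and the 2-D toy points are assumptions, and in practice a maintained implementation would normally be used, with oversampling applied only to the training split.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Generate `n_synthetic` points by interpolating between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples (Euclidean)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nn)))
    return synthetic


# Four hypothetical minority-class patients with two standardised blood values
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_synthetic=5, k=2, seed=42)
print(len(new_points))  # 5
```

Because every synthetic point lies between two real minority samples, SMOTE fills in the minority region rather than duplicating existing patients.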
10
Kuo KM, Talley PC, Chang CS. The Accuracy of Machine Learning Approaches Using Non-image Data for the Prediction of COVID-19: A Meta-Analysis. Int J Med Inform 2022; 164:104791. [PMID: 35594810 PMCID: PMC9098530 DOI: 10.1016/j.ijmedinf.2022.104791] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/23/2021] [Revised: 04/08/2022] [Accepted: 05/09/2022] [Indexed: 12/12/2022]
Abstract
Objective COVID-19 is a novel, highly contagious disease with an enormous negative impact on humanity and the world economy. An expeditious, feasible tool for detecting COVID-19 remains elusive. Recently, there has been a surge of interest in applying machine learning techniques to predict COVID-19 from non-image data. We therefore undertook a meta-analysis to quantify the diagnostic performance of machine learning models for predicting COVID-19. Materials and methods A comprehensive electronic database search covering January 1st, 2021 to December 3rd, 2021 was undertaken to identify eligible studies. Summary sensitivity, specificity, and the area under the receiver operating characteristic curve were used to assess diagnostic accuracy. Risk of bias was assessed with the revised Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool. Results A total of 30 studies, including 34 models, met all inclusion criteria. Summary sensitivity, specificity, and area under the receiver operating characteristic curve were 0.86, 0.86, and 0.91, respectively. The purpose of the machine learning models, class imbalance, and feature selection were significant covariates explaining the between-study heterogeneity in both sensitivity and specificity. Conclusions Our findings show that non-image data can be used to predict COVID-19 with acceptable performance. Further, we suggest that handling of class imbalance and feature selection be incorporated whenever building models for the prediction of COVID-19, to further improve diagnostic performance.
11
Abayomi-Alli OO, Damaševičius R, Maskeliūnas R, Misra S. An Ensemble Learning Model for COVID-19 Detection from Blood Test Samples. SENSORS 2022; 22:s22062224. [PMID: 35336395 PMCID: PMC8955536 DOI: 10.3390/s22062224] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 12/31/2021] [Revised: 02/28/2022] [Accepted: 03/10/2022] [Indexed: 02/04/2023]
Abstract
Current research on applying artificial intelligence (AI) methods to the diagnosis of COVID-19 has proven indispensable, with very promising results. Despite these results, there are still limitations in the real-time detection of COVID-19 using reverse transcription polymerase chain reaction (RT-PCR) test data, such as limited datasets, imbalanced classes, high misclassification rates of models, and the need for specialized research to identify the best features and thus improve prediction rates. This study investigates and applies an ensemble learning approach to develop prediction models for the effective detection of COVID-19 from routine laboratory blood test results. An ensemble machine learning-based COVID-19 detection system is presented, aiming to help clinicians diagnose this virus effectively. The experiment used custom convolutional neural network (CNN) models as a first-stage classifier and 15 supervised machine learning algorithms as a second-stage classifier: K-Nearest Neighbors, Support Vector Machine (Linear and RBF), Naive Bayes, Decision Tree, Random Forest, MultiLayer Perceptron, AdaBoost, ExtraTrees, Logistic Regression, Linear and Quadratic Discriminant Analysis (LDA/QDA), Passive, Ridge, and Stochastic Gradient Descent Classifier. Our findings show that an ensemble learning model based on DNN and ExtraTrees achieved a mean accuracy of 99.28% and an area under the curve (AUC) of 99.4%, while AdaBoost gave a mean accuracy of 99.28% and an AUC of 98.8% on the San Raffaele Hospital dataset. A comparison with other state-of-the-art approaches on the same dataset shows that the proposed method outperforms several other COVID-19 diagnostic methods.
Affiliation(s)
- Olusola O. Abayomi-Alli
- Department of Software Engineering, Kaunas University of Technology, 51368 Kaunas, Lithuania;
- Robertas Damaševičius
- Department of Software Engineering, Kaunas University of Technology, 51368 Kaunas, Lithuania;
- Rytis Maskeliūnas
- Department of Multimedia Engineering, Kaunas University of Technology, 51368 Kaunas, Lithuania;
- Sanjay Misra
- Department of Computer Science and Communication, Ostfold University College, 3001 Halden, Norway;
12
Lunn Y, Patel R, Sokphat TS, Bourn L, Fields K, Fitzgerald A, Sundaresan V, Thomas G, Korvink M, Gunn LH. Assessing Hospital Resource Utilization with Application to Imaging for Patients Diagnosed with Prostate Cancer. Healthcare (Basel) 2022; 10:healthcare10020248. [PMID: 35206863 PMCID: PMC8872431 DOI: 10.3390/healthcare10020248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2021] [Revised: 01/25/2022] [Accepted: 01/25/2022] [Indexed: 02/04/2023]
Abstract
Resource utilization measures are typically modeled using clinical characteristics. However, in some settings those clinical markers are not available, and hospitals are unable to explore potential inefficiencies or resource misutilization. We propose a novel approach to exploring misutilization that relies solely on administrative data in the form of patient characteristics and competing resource utilization, the latter being a novel addition. We demonstrate this approach in a 2019 cohort of patients diagnosed with prostate cancer (n = 51,111) across 1056 U.S. healthcare facilities using Premier, Inc.’s (Charlotte, NC, USA) all-payer databases. A multivariable logistic regression model was fitted using administrative information and competing resource utilization. A decision curve analysis informed by industry-average standards of utilization allows misutilization to be defined with regard to these industry standards. Odds ratios were extracted at the patient level to demonstrate differences in misutilization by patient characteristics, such as race; Black individuals experienced higher under-utilization than White individuals (p < 0.0001). Volume-adjusted Poisson rate regression models allow facilities with large departures in utilization to be identified and ranked. The proposed approach is scalable, easily generalizable to other diseases and resources, and can be complemented with clinical information from electronic health records when available.
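The abstract extracts patient-level odds ratios from a multivariable logistic regression. Its simpler, unadjusted counterpart can be sketched from a 2x2 table with a Wald confidence interval on the log scale. The counts below are hypothetical, and adjusted odds ratios from the authors' model would generally differ:

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Unadjusted odds ratio for a 2x2 table with a Wald CI on the log scale.

    a: group of interest with the outcome   b: group of interest without it
    c: reference group with the outcome     d: reference group without it
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell; apply a continuity correction first")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi


# Hypothetical counts of patients flagged as under-utilized (the outcome) by group.
or_, lo, hi = odds_ratio_wald(a=120, b=880, c=300, d=3700)
print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

A logistic regression generalizes this by estimating each odds ratio while holding the other covariates fixed.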
Affiliation(s)
- Yazmine Lunn
- School of Data Science, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Rudra Patel
- School of Data Science, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Timothy S. Sokphat
- School of Data Science, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Department of Public Health Sciences, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Laura Bourn
- School of Data Science, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Department of Public Health Sciences, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Khalil Fields
- School of Data Science, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Department of Public Health Sciences, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Anna Fitzgerald
- School of Data Science, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Department of Public Health Sciences, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Vandana Sundaresan
- School of Data Science, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Department of Public Health Sciences, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Greeshma Thomas
- Department of Public Health Sciences, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Laura H. Gunn
- School of Data Science, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Department of Public Health Sciences, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Faculty of Medicine, School of Public Health, Imperial College London, London W6 8RP, UK
- Correspondence:
13
Meystre SM, Heider PM, Kim Y, Davis M, Obeid J, Madory J, Alekseyenko AV. Natural language processing enabling COVID-19 predictive analytics to support data-driven patient advising and pooled testing. J Am Med Inform Assoc 2021; 29:12-21. [PMID: 34415311 PMCID: PMC8714262 DOI: 10.1093/jamia/ocab186] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/11/2021] [Revised: 08/04/2021] [Accepted: 08/16/2021] [Indexed: 12/15/2022]
Abstract
OBJECTIVE The COVID-19 (coronavirus disease 2019) pandemic response at the Medical University of South Carolina included virtual care visits for patients with suspected severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. The telehealth system used for these visits exports only a text note to the electronic health record, but structured and coded information about COVID-19 (eg, exposure, risk factors, symptoms) was needed to support clinical care and early research, as well as predictive analytics for data-driven patient advising and pooled testing. MATERIALS AND METHODS To capture COVID-19 information from multiple sources, a new data mart and a new natural language processing (NLP) application prototype were developed. The NLP application combined reused components with dictionaries and rules crafted by domain experts. It was deployed as a Web service for hourly processing of new data from patients assessed or treated for COVID-19. The extracted information was then used to develop algorithms predicting SARS-CoV-2 diagnostic test results from symptoms and exposure information. RESULTS The dedicated data mart and NLP application were developed and deployed in a mere 10-day sprint in March 2020. The NLP application achieved good accuracy (85.8% recall and 81.5% precision). The SARS-CoV-2 testing predictive analytics algorithms were configured to provide patients with data-driven COVID-19 testing advice with a sensitivity of 81% to 92%, and to enable pooled testing with a negative predictive value of 90% to 91%, reducing the number of required tests to about 63%. CONCLUSIONS SARS-CoV-2 testing predictive analytics and NLP successfully enabled data-driven patient advising and pooled testing.
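The abstract reports that pooled testing reduced the required tests to about 63% but does not describe the pooling scheme. As an assumption-laden sketch, the classic two-stage Dorfman scheme (pool k samples, and retest every member individually only if the pool is positive) gives the expected number of tests per person as 1/k + 1 - (1 - p)^k, assuming a perfect assay and independent infections at prevalence p:

```python
def dorfman_tests_per_person(prevalence, pool_size):
    """Expected tests per person under two-stage Dorfman pooling, assuming a
    perfect assay and independent infections: one pooled test shared by k
    people, plus k individual retests whenever the pool is positive."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be in [0, 1]")
    if pool_size < 2:
        return 1.0  # no pooling: one test per person
    p_pool_negative = (1.0 - prevalence) ** pool_size
    return 1.0 / pool_size + (1.0 - p_pool_negative)


def best_pool_size(prevalence, max_size=32):
    """Pool size in [2, max_size] that minimizes expected tests per person."""
    return min(range(2, max_size + 1),
               key=lambda k: dorfman_tests_per_person(prevalence, k))


# Example: a hypothetical 5% prevalence among patients routed to pooling.
k = best_pool_size(0.05)
print(k, round(dorfman_tests_per_person(0.05, k), 3))  # 5 0.426
```

Savings shrink as prevalence in the pooled group rises, which is one reason a risk model that routes only low-risk patients to pools can make pooling more efficient.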
Affiliation(s)
- Stéphane M Meystre
- Biomedical Informatics Center, Medical University of South Carolina, Charleston, South Carolina, USA
- Paul M Heider
- Biomedical Informatics Center, Medical University of South Carolina, Charleston, South Carolina, USA
- Youngjun Kim
- Biomedical Informatics Center, Medical University of South Carolina, Charleston, South Carolina, USA
- Matthew Davis
- Information Solutions, Medical University of South Carolina, Charleston, South Carolina, USA
- Jihad Obeid
- Biomedical Informatics Center, Medical University of South Carolina, Charleston, South Carolina, USA
- James Madory
- Department of Pathology, Medical University of South Carolina, Charleston, South Carolina, USA
- Alexander V Alekseyenko
- Biomedical Informatics Center, Medical University of South Carolina, Charleston, South Carolina, USA
14
McRae AD, Hohl CM, Rosychuk R, Vatanpour S, Ghaderi G, Archambault PM, Brooks SC, Cheng I, Davis P, Hayward J, Lang E, Ohle R, Rowe B, Welsford M, Yadav K, Morrison LJ, Perry J. CCEDRRN COVID-19 Infection Score (CCIS): development and validation in a Canadian cohort of a clinical risk score to predict SARS-CoV-2 infection in patients presenting to the emergency department with suspected COVID-19. BMJ Open 2021; 11:e055832. [PMID: 34857584 PMCID: PMC8640195 DOI: 10.1136/bmjopen-2021-055832] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 12/23/2022]
Abstract
OBJECTIVES To develop and validate a clinical risk score that can accurately quantify the probability of SARS-CoV-2 infection in patients presenting to an emergency department without the need for laboratory testing. DESIGN Cohort study of participants in the Canadian COVID-19 Emergency Department Rapid Response Network (CCEDRRN) registry. Regression models were fitted to predict a positive SARS-CoV-2 test result using clinical and demographic predictors, as well as an indicator of local SARS-CoV-2 incidence. SETTING 32 emergency departments in eight Canadian provinces. PARTICIPANTS 27 665 consecutively enrolled patients who were tested for SARS-CoV-2 in participating emergency departments between 1 March and 30 October 2020. MAIN OUTCOME MEASURES Positive SARS-CoV-2 nucleic acid test result within 14 days of an index emergency department encounter for suspected COVID-19 disease. RESULTS We derived a 10-item CCEDRRN COVID-19 Infection Score using data from 21 743 patients. This score included variables from history and physical examination and an indicator of local disease incidence. The score had a c-statistic of 0.838 with excellent calibration. We externally validated the rule in 5295 patients. The score maintained excellent discrimination and calibration and had superior performance compared with another previously published risk score. Score cut-offs were identified that can rule-in or rule-out SARS-CoV-2 infection without the need for nucleic acid testing with 97.4% sensitivity (95% CI 96.4 to 98.3) and 95.9% specificity (95% CI 95.5 to 96.0). CONCLUSIONS The CCEDRRN COVID-19 Infection Score uses clinical characteristics and publicly available indicators of disease incidence to quantify a patient's probability of SARS-CoV-2 infection. 
The score can identify patients at sufficiently high risk of SARS-CoV-2 infection to warrant isolation and empirical therapy prior to test confirmation while also identifying patients at sufficiently low risk of infection that they may not need testing. TRIAL REGISTRATION NUMBER NCT04702945.
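The abstract describes score cut-offs that rule SARS-CoV-2 infection in or out at high sensitivity or specificity. A hedged sketch of how such cut-offs could be chosen on a validation set follows; the integer scores, labels, and targets are hypothetical, and the abstract does not describe the authors' exact selection procedure:

```python
def sens_spec(labels, scores, cutoff):
    """Sensitivity and specificity when a score >= cutoff is called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < cutoff)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cutoff)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def rule_out_cutoff(labels, scores, min_sensitivity):
    """Highest cutoff that still meets the sensitivity target; patients
    scoring below it would be candidates for ruling out without testing."""
    ok = [c for c in set(scores) if sens_spec(labels, scores, c)[0] >= min_sensitivity]
    return max(ok) if ok else None


def rule_in_cutoff(labels, scores, min_specificity):
    """Lowest cutoff that meets the specificity target; patients scoring
    at or above it would be candidates for ruling in."""
    ok = [c for c in set(scores) if sens_spec(labels, scores, c)[1] >= min_specificity]
    return min(ok) if ok else None


# Hypothetical integer risk scores and RT-PCR results (1 = positive).
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [9, 7, 6, 3, 5, 4, 2, 2, 1, 0]
print(rule_out_cutoff(labels, scores, 1.0))  # 3
print(rule_in_cutoff(labels, scores, 1.0))   # 6
```

Patients scoring between the two cut-offs remain in a middle band that would still need confirmatory testing.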
Affiliation(s)
- Andrew D McRae
- Department of Emergency Medicine, University of Calgary Cumming School of Medicine, Calgary, Alberta, Canada
- Corinne M Hohl
- Department of Emergency Medicine, The University of British Columbia Faculty of Medicine, Vancouver, British Columbia, Canada
- Rhonda Rosychuk
- Department of Paediatrics, University of Alberta Faculty of Medicine & Dentistry, Edmonton, Alberta, Canada
- Shabnam Vatanpour
- Department of Emergency Medicine, University of Calgary Cumming School of Medicine, Calgary, Alberta, Canada
- Gelareh Ghaderi
- Department of Emergency Medicine, The University of British Columbia Faculty of Medicine, Vancouver, British Columbia, Canada
- Patrick M Archambault
- Department of Emergency Medicine, Universite Laval Faculte de medecine, Quebec, Quebec, Canada
- Steven C Brooks
- Department of Emergency Medicine, Queen's University School of Medicine, Kingston, Ontario, Canada
- Ivy Cheng
- Department of Emergency Medicine, Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada
- Philip Davis
- Department of Emergency Medicine, University of Saskatchewan College of Medicine, Saskatoon, Saskatchewan, Canada
- Jake Hayward
- Department of Emergency Medicine, University of Alberta Faculty of Medicine & Dentistry, Edmonton, Alberta, Canada
- Eddy Lang
- Department of Emergency Medicine, University of Calgary Cumming School of Medicine, Calgary, Alberta, Canada
- Robert Ohle
- Department of Emergency Medicine, Northern Ontario School of Medicine, Thunder Bay, Ontario, Canada
- Brian Rowe
- Department of Emergency Medicine, University of Alberta Faculty of Medicine & Dentistry, Edmonton, Alberta, Canada
- Michelle Welsford
- Department of Emergency Medicine, McMaster University Faculty of Health Sciences, Hamilton, Ontario, Canada
- Krishan Yadav
- Department of Emergency Medicine, University of Ottawa Faculty of Medicine, Ottawa, Ontario, Canada
- Laurie J Morrison
- Department of Emergency Medicine, St Michael's Hospital, Toronto, Ontario, Canada
- Jeffrey Perry
- Department of Emergency Medicine, University of Ottawa Faculty of Medicine, Ottawa, Ontario, Canada
15
Dairi A, Harrou F, Sun Y. Deep Generative Learning-Based 1-SVM Detectors for Unsupervised COVID-19 Infection Detection Using Blood Tests. IEEE Transactions on Instrumentation and Measurement 2021; 71:2500211. [PMID: 35582656 PMCID: PMC8962827 DOI: 10.1109/tim.2021.3130675] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/28/2021] [Revised: 10/03/2021] [Accepted: 11/08/2021] [Indexed: 05/02/2023]
Abstract
A blood test has recently become an important tool to help identify false-positive/false-negative real-time reverse transcription polymerase chain reaction (rRT-PCR) tests, mainly because it is an inexpensive and convenient option for detecting potential COVID-19 patients. However, this test must be conducted by certified laboratories with expensive equipment and trained personnel, and 3-4 h are needed to deliver results. Furthermore, it has relatively high false-negative rates of around 15%-20%. Consequently, an alternative solution that is more accessible, quicker, and less costly is needed. This article introduces flexible, unsupervised data-driven approaches to detect COVID-19 infection from blood test samples. In other words, we treat COVID-19 infection detection from a blood test as an anomaly detection problem, addressed with an unsupervised deep hybrid model that combines the feature extraction capability of the variational autoencoder (VAE) with the detection sensitivity of the one-class support vector machine (1SVM) algorithm. Two sets of routine blood test samples, from the Albert Einstein Hospital, São Paulo, Brazil, and the San Raffaele Hospital, Milan, Italy, are used to assess the performance of the investigated deep learning models. Missing values were imputed with a random forest regressor. Compared with generative adversarial network (GAN)-, deep belief network (DBN)-, and restricted Boltzmann machine (RBM)-based 1SVM detectors, with the traditional VAE, GAN, DBN, and RBM using a softmax layer as the discriminator, and with the standalone 1SVM, the proposed VAE-based 1SVM detector offers superior discrimination of potential COVID-19 infections. The results also show that the deep learning-driven 1SVM detection approaches provide promising detection performance compared with conventional deep learning models.
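The VAE-based 1SVM hybrid is too involved for a short example, but the underlying one-class framing (fit only on COVID-negative samples and flag departures from them as anomalies) can be illustrated with a much simpler stand-in. The detector below uses per-feature z-scores rather than a VAE or a one-class SVM; the class name, features, threshold rule, and values are all hypothetical:

```python
import statistics


class ZScoreOneClassDetector:
    """Simplified one-class anomaly detector (a stand-in, not a VAE or 1-SVM).

    Fits per-feature means and standard deviations on negative (normal)
    samples only, scores a sample by its largest absolute z-score, and sets
    the decision threshold at a quantile of the training scores.
    """

    def __init__(self, quantile=0.95):
        self.quantile = quantile

    def fit(self, normal_rows):
        columns = list(zip(*normal_rows))
        self.means = [statistics.mean(col) for col in columns]
        # Fall back to 1.0 for constant features to avoid division by zero.
        self.stds = [statistics.stdev(col) or 1.0 for col in columns]
        train_scores = sorted(self.score(row) for row in normal_rows)
        idx = min(len(train_scores) - 1, int(self.quantile * len(train_scores)))
        self.threshold = train_scores[idx]
        return self

    def score(self, row):
        return max(abs(x - m) / s for x, m, s in zip(row, self.means, self.stds))

    def predict(self, row):
        """1 = anomalous (possible infection), 0 = consistent with the negatives."""
        return int(self.score(row) > self.threshold)


# Hypothetical COVID-negative rows: [leukocytes, lymphocytes, platelets].
negatives = [
    [6.0, 2.0, 250], [7.0, 2.2, 260], [5.5, 1.8, 240],
    [6.5, 2.1, 255], [6.2, 1.9, 245],
]
detector = ZScoreOneClassDetector().fit(negatives)
print(detector.predict([3.0, 0.6, 150]))  # 1: far from every negative profile
print(detector.predict([6.1, 2.0, 250]))  # 0
```

In the authors' approach, the VAE would learn a feature representation and the 1SVM would learn the decision boundary; here both roles are collapsed into a fixed z-score rule purely for illustration.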
Affiliation(s)
- Abdelkader Dairi
- Université des Sciences et de la Technologie d’Oran Mohamed-Boudiaf (USTOMB), Oran 31000, Algeria
- Laboratoire des Technologies de l’Environnement (LTE), Ecole Nationale Polytechnique Oran, Oran 31000, Algeria
- Fouzi Harrou
- Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Ying Sun
- Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
16
Çubukçu HC, Topcu Dİ, Bayraktar N, Gülşen M, Sarı N, Arslan AH. Detection of COVID-19 by Machine Learning Using Routine Laboratory Tests. Am J Clin Pathol 2021; 157:758-766. [PMID: 34791032 PMCID: PMC8690000 DOI: 10.1093/ajcp/aqab187] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 05/20/2021] [Accepted: 09/22/2021] [Indexed: 01/22/2023]
Abstract
Objectives The present study aimed to develop a clinical decision support tool to assist coronavirus disease 2019 (COVID-19) diagnoses with machine learning (ML) models using routine laboratory test results. Methods We developed ML models using laboratory data (n = 1,391) composed of six clinical chemistry (CC) results, 14 CBC parameter results, and results of a severe acute respiratory syndrome coronavirus 2 real-time reverse transcription–polymerase chain reaction as a gold standard method. Four ML algorithms, including random forest (RF), gradient boosting (XGBoost), support vector machine (SVM), and logistic regression, were used to build eight ML models using CBC and a combination of CC and CBC parameters. Performance evaluation was conducted on the test data set and external validation data set from Brazil. Results The accuracy values of all models ranged from 74% to 91%. The RF model trained from CC and CBC analytes showed the best performance on the present study’s data set (accuracy, 85.3%; sensitivity, 79.6%; specificity, 91.2%). The RF model trained from only CBC parameters detected COVID-19 cases with 82.8% accuracy. The best performance on the external validation data set belonged to the SVM model trained from CC and CBC parameters (accuracy, 91.18%; sensitivity, 100%; specificity, 84.21%). Conclusions ML models presented in this study can be used as clinical decision support tools to contribute to physicians’ clinical judgment for COVID-19 diagnoses.
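The reported external-validation figures for the SVM model can be related to raw confusion-matrix counts. The counts below are an assumption (15 positives, 19 negatives), chosen because they reproduce the reported percentages; the abstract does not state the actual split:

```python
def confusion_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # share of RT-PCR-positive cases detected
        "specificity": tn / (tn + fp),  # share of RT-PCR-negative cases cleared
    }


# Hypothetical counts: all 15 positives detected, 16 of 19 negatives correct.
m = confusion_metrics(tp=15, fn=0, tn=16, fp=3)
print({k: round(100 * v, 2) for k, v in m.items()})
# {'accuracy': 91.18, 'sensitivity': 100.0, 'specificity': 84.21}
```

Under these assumed counts, each additional misclassified negative would move specificity by about five percentage points, so results on a small external set should be read with wide uncertainty in mind.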
Affiliation(s)
- Hikmet Can Çubukçu
- Interdisciplinary Stem Cells and Regenerative Medicine, Ankara University Stem Cell Institute, Ankara, Turkey
- Deniz İlhan Topcu
- Departments of Medical Biochemistry and Clinical Microbiology, Başkent University Faculty of Medicine, Ankara, Turkey
- Nilüfer Bayraktar
- Departments of Medical Biochemistry and Clinical Microbiology, Başkent University Faculty of Medicine, Ankara, Turkey
- Murat Gülşen
- Department of Autism, Special Mental Needs and Rare Diseases Department, Turkish Ministry of Health, Ankara, Turkey
- Nuran Sarı
- Department of Infectious Diseases and Clinical Microbiology, Başkent University Faculty of Medicine, Ankara, Turkey
- Ayşe Hande Arslan
- Department of Infectious Diseases and Clinical Microbiology, Başkent University Faculty of Medicine, Ankara, Turkey
17
Baktash V, Hosack T, Rule R, Patel N, Kho J, Sekhar R, Mandal AKJ, Missouris CG. Development, evaluation and validation of machine learning algorithms to detect atypical and asymptomatic presentations of Covid-19 in hospital practice. QJM 2021; 114:496-501. [PMID: 34156436 DOI: 10.1093/qjmed/hcab172] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/05/2021] [Revised: 06/12/2021] [Indexed: 01/04/2023]
Abstract
BACKGROUND Diagnostic methods for Covid-19 have improved in both speed and availability. Because of atypical and asymptomatic carriage of the virus and nosocomial spread within institutions, timely diagnosis remains a challenge. Machine learning models trained on blood test results have shown promise in identifying cases of Covid-19. AIMS To train and validate a machine learning model capable of differentiating Covid-19 positive from negative patients using routine blood tests, and to assess the model's accuracy in atypical and asymptomatic presentations. DESIGN AND METHODS We conducted a retrospective analysis of medical admissions to our institution during March and April 2020. Participants were categorized as Covid-19 positive or negative based on clinical or radiological features or nasopharyngeal swab results. A machine learning model was trained on laboratory parameters, evaluated for accuracy, sensitivity and specificity, and externally validated at an unaffiliated institution. RESULTS An Ensemble Bagged Tree model trained on data from 405 patients (212 Covid-19 positive) achieved an accuracy of 81.79% (95% confidence interval (CI) 77.53-85.55%), a sensitivity of 85.85% (CI 80.42-90.24%) and a specificity of 76.65% (CI 69.49-82.84%). Accuracy was preserved in the atypical and asymptomatic subgroups. On an external data set of 226 patients (141 Covid-19 positive), the model achieved an accuracy of 76.82% (CI 70.87-82.08%), a sensitivity of 78.38% (CI 70.87-84.72%) and a specificity of 74.12% (CI 63.48-83.01%). CONCLUSION A machine learning model using routine laboratory parameters can detect atypical and asymptomatic presentations of Covid-19 and might be an adjunct to existing screening measures.
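The abstract reports proportions with 95% confidence intervals. One common way to compute such an interval is the Wilson score interval, sketched below; the authors' interval method is not stated, so the bounds may differ slightly from those reported. The count of 182 detected out of 212 positives is consistent with the reported 85.85% sensitivity but is not given explicitly in the abstract:

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z = 1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return centre - half_width, centre + half_width


# Hypothetical count consistent with the reported sensitivity of 85.85%.
lo, hi = wilson_interval(182, 212)
print(f"{182 / 212:.2%} (95% CI {lo:.2%} to {hi:.2%})")
```

For these counts the Wilson bounds come out at roughly 80.5% to 89.9%, slightly narrower than the reported 80.42-90.24%; exact (Clopper-Pearson) intervals are typically a little wider than Wilson intervals.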
Affiliation(s)
- V Baktash
- Department of Medicine, Wexham Park Hospital, Frimley Health NHS Foundation Trust, Wexham Street, Slough, Berkshire, SL2 4HL, UK
- T Hosack
- Department of Medicine, Stoke Mandeville Hospital, Mandeville Rd, Aylesbury, Buckinghamshire, HP21 8AL, UK
- R Rule
- Department of Medicine, Stoke Mandeville Hospital, Mandeville Rd, Aylesbury, Buckinghamshire, HP21 8AL, UK
- N Patel
- Department of Medicine, Wexham Park Hospital, Frimley Health NHS Foundation Trust, Wexham Street, Slough, Berkshire, SL2 4HL, UK
- J Kho
- Department of Medicine, Wexham Park Hospital, Frimley Health NHS Foundation Trust, Wexham Street, Slough, Berkshire, SL2 4HL, UK
- R Sekhar
- Department of Medicine, Stoke Mandeville Hospital, Mandeville Rd, Aylesbury, Buckinghamshire, HP21 8AL, UK
- A K J Mandal
- Department of Medicine, Wexham Park Hospital, Frimley Health NHS Foundation Trust, Wexham Street, Slough, Berkshire, SL2 4HL, UK
- C G Missouris
- Department of Medicine, Wexham Park Hospital, Frimley Health NHS Foundation Trust, Wexham Street, Slough, Berkshire, SL2 4HL, UK
- Department of Clinical Cardiology, University of Nicosia Medical School, 93 Agiou Nikolaou Street, Engomi 2408 Nicosia, Cyprus
18
Evaluation of Covid-19 Triage Assessment Scale in Patients Attending the Emergency Department. Journal of Basic and Clinical Health Sciences 2021. [DOI: 10.30621/jbachs.959016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/08/2023]
19
Blagojević A, Šušteršič T, Lorencin I, Šegota SB, Anđelić N, Milovanović D, Baskić D, Baskić D, Petrović NZ, Sazdanović P, Car Z, Filipović N. Artificial intelligence approach towards assessment of condition of COVID-19 patients - Identification of predictive biomarkers associated with severity of clinical condition and disease progression. Comput Biol Med 2021; 138:104869. [PMID: 34547582 PMCID: PMC8438805 DOI: 10.1016/j.compbiomed.2021.104869] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/22/2021] [Revised: 09/10/2021] [Accepted: 09/12/2021] [Indexed: 01/08/2023]
Abstract
BACKGROUND AND OBJECTIVES Although machine learning (ML) has been studied for various epidemiological and clinical problems, as well as for survival prediction in COVID-19, there is a noticeable shortage of literature on using ML to predict changes in disease severity over the course of the disease. Predicting progression from mild to moderate, severe, and critical condition would help not only to respond in a timely manner to prevent fatal outcomes, but also to minimize the number of patients hospitalized unnecessarily. METHODS We present a methodology for classifying patients into 4 distinct categories of COVID-19 clinical condition. Classification is based on the values of blood biomarkers that were assessed with a gradient boosting regressor and selected as the biomarkers with the greatest influence on the classification of patients with COVID-19. RESULTS Among the tested algorithms, the XGBoost classifier achieved the best results, with an average accuracy of 94% and an average F1-score of 94.3%. We also extracted the 10 best features from blood analysis that are strongly associated with patient condition; based on these features, the severity of the clinical condition can be predicted. CONCLUSIONS The main advantage of our system is that it is a decision tree-based algorithm, which is easier to interpret than black-box models, which are not appealing in medical practice.
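For a four-class severity task, an "average F1-score" is often a macro average (the unweighted mean of per-class F1 scores), although the abstract does not say which averaging was used. A minimal sketch with hypothetical labels:

```python
def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of one-vs-rest F1 scores over the given classes."""
    f1_scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


classes = ["mild", "moderate", "severe", "critical"]
y_true = ["mild", "mild", "moderate", "moderate", "severe", "severe", "critical", "critical"]
y_pred = ["mild", "mild", "moderate", "severe", "severe", "severe", "critical", "critical"]
print(round(macro_f1(y_true, y_pred, classes), 3))  # 0.867
```

Here accuracy is 7/8 = 0.875 while macro F1 is about 0.867, because the single misclassified moderate case lowers recall for one class and precision for another.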
Affiliation(s)
- Anđela Blagojević
- University of Kragujevac, Faculty of Engineering, Sestre Janjić 6, 34000, Kragujevac, Serbia
- Bioengineering Research and Development Center (BioIRC), Prvoslava Stojanovića 6, 34000, Kragujevac, Serbia
- Tijana Šušteršič
- University of Kragujevac, Faculty of Engineering, Sestre Janjić 6, 34000, Kragujevac, Serbia
- Bioengineering Research and Development Center (BioIRC), Prvoslava Stojanovića 6, 34000, Kragujevac, Serbia
- Ivan Lorencin
- University of Rijeka, Faculty of Engineering, Vukovarska 58, 51000, Rijeka, Croatia
- Sandi Baressi Šegota
- University of Rijeka, Faculty of Engineering, Vukovarska 58, 51000, Rijeka, Croatia
- Nikola Anđelić
- University of Rijeka, Faculty of Engineering, Vukovarska 58, 51000, Rijeka, Croatia
- Dragan Milovanović
- Clinical Centre Kragujevac, Zmaj Jovina 30, 34000, Kragujevac, Serbia
- University of Kragujevac, Faculty of Medical Sciences, Svetozara Markovića 69, 34000, Kragujevac, Serbia
- Danijela Baskić
- Clinical Centre Kragujevac, Zmaj Jovina 30, 34000, Kragujevac, Serbia
- Dejan Baskić
- University of Kragujevac, Faculty of Medical Sciences, Svetozara Markovića 69, 34000, Kragujevac, Serbia
- Institute of Public Health Kragujevac, Nikole Pašića 1, 34000, Kragujevac, Serbia
- Nataša Zdravković Petrović
- Clinical Centre Kragujevac, Zmaj Jovina 30, 34000, Kragujevac, Serbia
- University of Kragujevac, Faculty of Medical Sciences, Svetozara Markovića 69, 34000, Kragujevac, Serbia
- Predrag Sazdanović
- Clinical Centre Kragujevac, Zmaj Jovina 30, 34000, Kragujevac, Serbia
- University of Kragujevac, Faculty of Medical Sciences, Svetozara Markovića 69, 34000, Kragujevac, Serbia
- Zlatan Car
- University of Rijeka, Faculty of Engineering, Vukovarska 58, 51000, Rijeka, Croatia
- Nenad Filipović
- University of Kragujevac, Faculty of Engineering, Sestre Janjić 6, 34000, Kragujevac, Serbia
- Bioengineering Research and Development Center (BioIRC), Prvoslava Stojanovića 6, 34000, Kragujevac, Serbia
- Corresponding author: Faculty of Engineering, University of Kragujevac, Sestre Janjić 6, 34000 Kragujevac, Serbia
20
Rapid antigen test to identify COVID-19 infected patients with and without symptoms admitted to the Emergency Department. Am J Emerg Med 2021; 51:92-97. [PMID: 34717211 PMCID: PMC8530784 DOI: 10.1016/j.ajem.2021.10.022] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 03/24/2021] [Revised: 10/10/2021] [Accepted: 10/12/2021] [Indexed: 01/19/2023]
Abstract
Purpose Early detection of SARS-CoV-2 patients is essential to contain the pandemic and keep hospitals safe. The rapid antigen test appears to be a quick and easy diagnostic test for identifying patients infected with SARS-CoV-2. We aimed to assess the possible role of the antigen test in the Emergency Department (ED) assessment of potential SARS-CoV-2 infection in both symptomatic and asymptomatic patients. Methods Between 1 July 2020 and 10 December 2020, all patients consecutively assessed in the ED for suspected COVID-19 symptoms, or requiring hospitalisation for a condition not associated with COVID-19, underwent a rapid antigen test and an RT-PCR swab. The diagnostic accuracy of the antigen test was determined against the SARS-CoV-2 PCR test using contingency tables. The possible clinical benefit of the antigen test was evaluated globally through decision curve analysis (DCA). Results A total of 3899 patients underwent antigen tests and PCR swabs. The sensitivity, specificity and accuracy of the antigen test were 82.9%, 99.1% and 97.4%, respectively (Cohen's κ = 0.854, 95% CI 0.826–0.882, p < 0.001). Sensitivity was 89.8% in symptomatic patients and 63.1% in asymptomatic patients. DCA appears to confirm a net clinical benefit of the preliminary use of antigen tests. Conclusions The antigen test performed in the ED, though not ideal, can improve the overall identification of infected patients. It appears to perform well in symptomatic patients; in asymptomatic patients, although it improves their management, it does not appear to be definitive.
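The abstract summarizes antigen-versus-PCR agreement with Cohen's κ and clinical utility with decision curve analysis. Both have standard closed forms, sketched below with hypothetical counts: κ compares observed agreement with the agreement expected by chance, and the net benefit at threshold probability p_t weights false positives by p_t / (1 - p_t):

```python
def cohens_kappa_2x2(a, b, c, d):
    """Cohen's kappa for agreement between two binary tests.

    a: both positive          b: test 1 positive, test 2 negative
    c: test 1 negative, test 2 positive    d: both negative
    """
    n = a + b + c + d
    observed = (a + d) / n
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    return (observed - expected) / (1 - expected)


def net_benefit(tp, fp, n, threshold):
    """Decision-curve net benefit of acting on test positives at threshold p_t."""
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(prevalence, threshold):
    """Net benefit of acting on every patient (e.g., isolating everyone)."""
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical 2x2 table: antigen test (test 1) versus RT-PCR (test 2), n = 1000.
kappa = cohens_kappa_2x2(a=80, b=10, c=20, d=890)
nb_test = net_benefit(tp=80, fp=10, n=1000, threshold=0.20)
nb_all = net_benefit_treat_all(prevalence=0.10, threshold=0.20)
print(round(kappa, 3), round(nb_test, 4), round(nb_all, 4))
```

At a 20% threshold probability with 10% prevalence, acting on everyone has negative net benefit in this example, while acting only on antigen-positive results does not; a decision curve plots these quantities across a range of thresholds.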
21
Ortíz-Barrios MA, Coba-Blanco DM, Alfaro-Saíz JJ, Stand-González D. Process Improvement Approaches for Increasing the Response of Emergency Departments against the COVID-19 Pandemic: A Systematic Review. International Journal of Environmental Research and Public Health 2021; 18:8814. [PMID: 34444561 PMCID: PMC8392152 DOI: 10.3390/ijerph18168814] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/02/2021] [Revised: 08/15/2021] [Accepted: 08/17/2021] [Indexed: 12/23/2022]
Abstract
The COVID-19 pandemic has strongly affected the dynamics of Emergency Departments (EDs) worldwide and has accentuated the need to tackle operational inefficiencies that decrease the quality of care provided to infected patients. EDs continue to struggle against this outbreak by implementing strategies that maximize their performance within an uncertain healthcare environment. These efforts, however, have remained insufficient given the growing number of admissions and the increased severity of the disease. Therefore, the primary aim of this paper is to review the literature on process improvement interventions focused on increasing the ED response to the current COVID-19 outbreak, in order to delineate future research lines based on the gaps detected in practice. To this end, we applied the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines to review research papers published between December 2019 and April 2021 in the ISI Web of Science, Scopus, PubMed, IEEE, Google Scholar, and Science Direct databases. The articles were further classified by research domain, primary aim, journal, and publication year. A total of 65 papers published in 51 journals were found to satisfy the inclusion criteria. Our review found that most applications have been directed towards predicting health outcomes in COVID-19 patients through machine learning and data analytics techniques. During the ongoing pandemic, healthcare decision makers are strongly recommended to integrate artificial intelligence techniques with approaches from the operations research (OR) and quality management domains to upgrade ED performance under socio-economic restrictions.
Affiliation(s)
- Miguel Angel Ortíz-Barrios
- Department of Productivity and Innovation, Universidad de la Costa CUC, Barranquilla 081001, Colombia
- Dayana Milena Coba-Blanco
- Department of Productivity and Innovation, Universidad de la Costa CUC, Barranquilla 081001, Colombia
- Juan-José Alfaro-Saíz
- Research Centre on Production Management and Engineering, Universitat Politècnica de València, 46022 Valencia, Spain
- Daniela Stand-González
- Department of Productivity and Innovation, Universidad de la Costa CUC, Barranquilla 081001, Colombia
22
Dorn M, Grisci BI, Narloch PH, Feltes BC, Avila E, Kahmann A, Alho CS. Comparison of machine learning techniques to handle imbalanced COVID-19 CBC datasets. PeerJ Comput Sci 2021; 7:e670. [PMID: 34458574 PMCID: PMC8372002 DOI: 10.7717/peerj-cs.670] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/22/2021] [Accepted: 07/20/2021] [Indexed: 06/13/2023]
Abstract
The coronavirus pandemic caused by the novel SARS-CoV-2 has significantly impacted human health and the economy, especially in countries struggling with financial resources for medical testing and treatment, such as Brazil, the third most affected country by the pandemic. In this scenario, machine learning techniques have been heavily employed to analyze different types of medical data and aid decision making, offering a low-cost alternative. Owing to the urgency of fighting the pandemic, a large number of studies have applied machine learning approaches to clinical data, including complete blood count (CBC) tests, which are among the most widely available medical tests. In this work, we review the most commonly employed machine learning classifiers for CBC data, together with popular sampling methods for dealing with class imbalance. Additionally, we describe and critically analyze three publicly available Brazilian COVID-19 CBC datasets and evaluate the performance of eight classifiers and five sampling techniques on the selected datasets. Our work provides a panorama of which classifier and sampling methods yield the best results for different relevant metrics and discusses their impact on future analyses. The metrics and algorithms are introduced in a way that aids newcomers to the field. Finally, the panorama discussed here can significantly benefit the comparison of the results of new ML algorithms.
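Because class imbalance is central to this entry, a short illustration may help newcomers: with imbalanced labels, plain accuracy can look strong while a model misses every positive case. The Python sketch below computes several common metrics from a binary confusion matrix. The function name `imbalance_metrics` and the counts (100 positives, 900 negatives) are hypothetical and are not taken from the paper.

```python
def imbalance_metrics(tp, fn, tn, fp):
    """Common metrics from a binary confusion matrix (positive = COVID-19)."""
    total = tp + fn + tn + fp
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on positives
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Hypothetical cohort: 100 positives, 900 negatives.
# A degenerate model that labels every patient negative:
trivial = imbalance_metrics(tp=0, fn=100, tn=900, fp=0)
print(trivial["accuracy"], trivial["balanced_accuracy"], trivial["sensitivity"])
# prints: 0.9 0.5 0.0
```

The degenerate model reaches 90% accuracy yet detects no positives; balanced accuracy and sensitivity expose this, which is why imbalance studies usually report several metrics side by side.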
Affiliation(s)
- Marcio Dorn
- Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil
- Center of Biotechnology, Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil
- Forensic Science, National Institute of Science and Technology, Porto Alegre, RS, Brazil
- Bruno Iochins Grisci
- Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil
- Pedro Henrique Narloch
- Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil
- Bruno César Feltes
- Institute of Informatics, Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil
- Department of Genetics, Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil
- Eduardo Avila
- Forensic Science, National Institute of Science and Technology, Porto Alegre, RS, Brazil
- School of Health and Life Sciences, Pontifical Catholic University of Rio Grande do Sul, Porto Alegre, RS, Brazil
- Alessandro Kahmann
- Institute of Mathematics, Statistics and Physics, Federal University of Rio Grande, Rio Grande, RS, Brazil
- Clarice Sampaio Alho
- Forensic Science, National Institute of Science and Technology, Porto Alegre, RS, Brazil
- School of Health and Life Sciences, Pontifical Catholic University of Rio Grande do Sul, Porto Alegre, RS, Brazil
23
Cobre ADF, Stremel DP, Noleto GR, Fachi MM, Surek M, Wiens A, Tonin FS, Pontarolo R. Diagnosis and prediction of COVID-19 severity: can biochemical tests and machine learning be used as prognostic indicators? Comput Biol Med 2021; 134:104531. [PMID: 34091385 PMCID: PMC8164361 DOI: 10.1016/j.compbiomed.2021.104531] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 03/09/2021] [Revised: 05/21/2021] [Accepted: 05/25/2021] [Indexed: 01/08/2023]
Abstract
OBJECTIVE This study aimed to implement and evaluate machine learning-based models to predict COVID-19 diagnosis and disease severity. METHODS COVID-19 test samples (positive or negative results) from patients who attended a single hospital were evaluated. Patients diagnosed with COVID-19 were categorised according to the severity of the disease. Data were submitted to exploratory analysis (principal component analysis, PCA) to detect outlier samples, recognise patterns, and identify important variables. Based on patients' laboratory test results, machine learning models were implemented to predict disease positivity and severity. Artificial neural networks (ANN), decision trees (DT), partial least squares discriminant analysis (PLS-DA), and K-nearest neighbour (KNN) models were used. The four models were validated based on the accuracy (area under the ROC curve). RESULTS The first subset of data had 5,643 patient samples (5,086 negatives and 557 positives for COVID-19). The second subset included 557 COVID-19 positive patients. The ANN, DT, PLS-DA, and KNN models allowed the classification of negative and positive samples with >84% accuracy. It was also possible to classify patients with severe and non-severe disease with an accuracy >86%. The following were associated with the prediction of COVID-19 diagnosis and severity: hyperferritinaemia, hypocalcaemia, pulmonary hypoxia, hypoxemia, metabolic and respiratory acidosis, low urinary pH, and high levels of lactate dehydrogenase. CONCLUSION Our analysis shows that all the models could assist in the diagnosis and prediction of COVID-19 severity.
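The KNN models mentioned above classify a sample by majority vote among its nearest training samples. A minimal sketch follows, assuming Euclidean distance on z-scored features so that analytes measured on very different scales (for example ferritin in ng/mL versus calcium in mg/dL) contribute comparably. The feature set, values, and function names are hypothetical illustrations, not the authors' data or code.

```python
import math
from collections import Counter


def zscore_params(rows):
    """Per-feature mean and (population) standard deviation."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / n) or 1.0  # avoid divide-by-zero
        for col, m in zip(columns, means)
    ]
    return means, stds


def scale(row, means, stds):
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training rows (Euclidean, z-scored)."""
    means, stds = zscore_params(train_x)  # fitted on training data only
    q = scale(query, means, stds)
    neighbours = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(scale(pair[0], means, stds), q),
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical rows: [ferritin ng/mL, calcium mg/dL, LDH U/L]; 1 = positive.
train_x = [
    [900, 8.0, 400], [1200, 7.8, 450], [800, 8.1, 380],
    [100, 9.4, 180], [150, 9.6, 200], [120, 9.3, 190],
]
train_y = [1, 1, 1, 0, 0, 0]
print(knn_predict(train_x, train_y, [1000, 7.9, 420]))  # prints: 1
```

Without scaling, ferritin values in the hundreds would dominate the distance and calcium differences of about one unit would barely register.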
Affiliation(s)
- Dile Pontarolo Stremel
- Department of Forest Engineering and Technology, Universidade Federal Do Paraná, Curitiba, Brazil
- Mariana Millan Fachi
- Pharmaceutical Sciences Postgraduate Programme, Universidade Federal Do Paraná, Curitiba, Brazil
- Monica Surek
- Pharmaceutical Sciences Postgraduate Programme, Universidade Federal Do Paraná, Curitiba, Brazil
- Astrid Wiens
- Department of Pharmacy, Universidade Federal Do Paraná, Curitiba, Brazil
- Fernanda Stumpf Tonin
- Pharmaceutical Sciences Postgraduate Programme, Universidade Federal Do Paraná, Curitiba, Brazil
- Roberto Pontarolo
- Department of Pharmacy, Universidade Federal Do Paraná, Curitiba, Brazil (corresponding author)
24
Mamidi TKK, Tran-Nguyen TK, Melvin RL, Worthey EA. Development of An Individualized Risk Prediction Model for COVID-19 Using Electronic Health Record Data. Front Big Data 2021; 4:675882. [PMID: 34151259 PMCID: PMC8211871 DOI: 10.3389/fdata.2021.675882] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/04/2021] [Accepted: 05/19/2021] [Indexed: 11/13/2022]
Abstract
Developing an accurate and interpretable model to predict an individual's risk for Coronavirus Disease 2019 (COVID-19) is a critical step to efficiently triage testing and other scarce preventative resources. To aid in this effort, we have developed an interpretable risk calculator that utilized de-identified electronic health records (EHR) from the University of Alabama at Birmingham Informatics for Integrating Biology and the Bedside (UAB-i2b2) COVID-19 repository under the U-BRITE framework. The generated risk scores are analogous to commonly used credit scores where higher scores indicate higher risks for COVID-19 infection. By design, these risk scores can easily be calculated in spreadsheets or even with pen and paper. To predict risk, we implemented a Credit Scorecard modeling approach on longitudinal EHR data from 7,262 patients enrolled in the UAB Health System who were evaluated and/or tested for COVID-19 between January and June 2020. In this cohort, 912 patients were positive for COVID-19. Our workflow considered the timing of symptoms and medical conditions and tested the effects by applying different variable selection techniques such as LASSO and Elastic-Net. Within the two weeks before a COVID-19 diagnosis, the most predictive features were respiratory symptoms such as cough, abnormalities of breathing, pain in the throat and chest as well as other chronic conditions including nicotine dependence and major depressive disorder. When extending the timeframe to include all medical conditions across all time, our models also uncovered several chronic conditions impacting the respiratory, cardiovascular, central nervous and urinary organ systems. The whole pipeline of data processing, risk modeling and web-based risk calculator can be applied to any EHR data following the OMOP common data format. The results can be employed to generate questionnaires to estimate COVID-19 risk for screening in building entries or to optimize hospital resources.
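A scorecard of this kind typically turns a logistic model's log-odds into additive integer points, which is what makes pen-and-paper scoring possible. The sketch below assumes the common "points to double the odds" (PDO) scaling. Because the abstract states that higher scores indicate higher risk, the odds here are the odds of a positive test, which reverses the usual credit-scoring convention. The base score, base odds, PDO, intercept, coefficients, and function names are all hypothetical, and simple binary indicators stand in for the binned variables a real scorecard would use.

```python
import math

# Hypothetical scaling: a score of 600 corresponds to odds of a positive
# test of 1:50, and every 20 points doubles those odds.
BASE_SCORE, BASE_ODDS, PDO = 600, 1 / 50, 20
FACTOR = PDO / math.log(2)
OFFSET = BASE_SCORE - FACTOR * math.log(BASE_ODDS)


def to_points(intercept, coefs):
    """Turn logistic-regression terms (log-odds) into integer points."""
    base_points = round(OFFSET + FACTOR * intercept)
    feature_points = {name: round(FACTOR * beta) for name, beta in coefs.items()}
    return base_points, feature_points


def risk_score(patient, base_points, feature_points):
    """Add points for each binary indicator that is present (1)."""
    return base_points + sum(
        pts for name, pts in feature_points.items() if patient.get(name, 0)
    )


def score_to_probability(score):
    """Invert the scaling: odds double every PDO points above BASE_SCORE."""
    odds = BASE_ODDS * 2 ** ((score - BASE_SCORE) / PDO)
    return odds / (1 + odds)


# Hypothetical model: intercept and coefficients on binary indicators.
base, points = to_points(-3.0, {"cough": 1.1, "abnormal_breathing": 0.8,
                                "nicotine_dependence": 0.4})
print(base, points)
# prints: 626 {'cough': 32, 'abnormal_breathing': 23, 'nicotine_dependence': 12}
s = risk_score({"cough": 1, "abnormal_breathing": 1}, base, points)
print(s, round(score_to_probability(s), 3))  # prints: 681 0.249
```

Summing the points reproduces the model's log-odds up to rounding: the example patient's logistic probability is about 0.250, and the scorecard gives 0.249.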
Affiliation(s)
- Tarun Karthik Kumar Mamidi
- Center for Computational Genomics and Data Science, Departments of Pediatrics and Pathology, University of Alabama at Birmingham School of Medicine, Birmingham, AL, United States
| | - Thi K. Tran-Nguyen
- Hugh Kaul Precision Medicine Institute, University of Alabama at Birmingham, Birmingham, AL, United States
| | - Ryan L. Melvin
- Department of Anesthesiology and Perioperative Medicine, University of Alabama at Birmingham, Birmingham, AL, United States
| | - Elizabeth A. Worthey
- Center for Computational Genomics and Data Science, Departments of Pediatrics and Pathology, University of Alabama at Birmingham School of Medicine, Birmingham, AL, United States
- Hugh Kaul Precision Medicine Institute, University of Alabama at Birmingham, Birmingham, AL, United States
25
Alballa N, Al-Turaiki I. Machine learning approaches in COVID-19 diagnosis, mortality, and severity risk prediction: A review. Inform Med Unlocked 2021; 24:100564. [PMID: 33842685 PMCID: PMC8018906 DOI: 10.1016/j.imu.2021.100564] [Citation(s) in RCA: 80] [Impact Index Per Article: 20.0] [Received: 02/24/2021] [Revised: 03/26/2021] [Accepted: 03/27/2021] [Indexed: 02/06/2023]
Abstract
The existence of widespread COVID-19 infections has prompted worldwide efforts to control and manage the virus, and hopefully curb it completely. One important line of research is the use of machine learning (ML) to understand and fight COVID-19. This is currently an active research field. Although there are already many surveys in the literature, there is a need to keep up with the rapidly growing number of publications on COVID-19-related applications of ML. This paper presents a review of recent reports on ML algorithms used in relation to COVID-19. We focus on the potential of ML for two main applications: diagnosis of COVID-19 and prediction of mortality risk and severity, using readily available clinical and laboratory data. Aspects related to algorithm types, training data sets, and feature selection are discussed. As we cover work published between January 2020 and January 2021, a few key points have come to light. The bulk of the machine learning algorithms used in these two applications are supervised learning algorithms. The established models are yet to be used in real-world implementations, and much of the associated research is experimental. The diagnostic and prognostic features discovered by ML models are consistent with results presented in the medical literature. A limitation of the existing applications is the use of imbalanced data sets that are prone to selection bias.
Affiliation(s)
- Norah Alballa
- Computer Science Department, College of Computer and Information Sciences, King Saud University, Saudi Arabia
- Isra Al-Turaiki
- Information Technology Department, College of Computer and Information Sciences, King Saud University, Saudi Arabia
26
Syeda HB, Syed M, Sexton KW, Syed S, Begum S, Syed F, Prior F, Yu F. Role of Machine Learning Techniques to Tackle the COVID-19 Crisis: Systematic Review. JMIR Med Inform 2021; 9:e23811. [PMID: 33326405 PMCID: PMC7806275 DOI: 10.2196/23811] [Citation(s) in RCA: 74] [Impact Index Per Article: 18.5] [Received: 08/24/2020] [Revised: 10/27/2020] [Accepted: 11/15/2020] [Indexed: 12/15/2022]
Abstract
BACKGROUND SARS-CoV-2, the novel coronavirus responsible for COVID-19, has caused havoc worldwide, with patients presenting a spectrum of complications that have pushed health care experts to explore new technological solutions and treatment plans. Artificial Intelligence (AI)-based technologies have played a substantial role in solving complex problems, and several organizations have been swift to adopt and customize these technologies in response to the challenges posed by the COVID-19 pandemic. OBJECTIVE The objective of this study was to conduct a systematic review of the literature on the role of AI as a comprehensive and decisive technology to fight the COVID-19 crisis in the fields of epidemiology, diagnosis, and disease progression. METHODS A systematic search of PubMed, Web of Science, and CINAHL databases was performed according to PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analysis) guidelines to identify all potentially relevant studies published and made available online between December 1, 2019, and June 27, 2020. The search syntax was built using keywords specific to COVID-19 and AI. RESULTS The search strategy resulted in 419 articles published and made available online during the aforementioned period. Of these, 130 publications were selected for further analyses. These publications were classified into 3 themes based on AI applications employed to combat the COVID-19 crisis: Computational Epidemiology, Early Detection and Diagnosis, and Disease Progression. Of the 130 studies, 71 (54.6%) focused on predicting the COVID-19 outbreak, the impact of containment policies, and potential drug discoveries, which were classified under the Computational Epidemiology theme. Next, 40 of 130 (30.8%) studies that applied AI techniques to detect COVID-19 by using patients' radiological images or laboratory test results were classified under the Early Detection and Diagnosis theme. 
Finally, 19 of the 130 studies (14.6%) that focused on predicting disease progression, outcomes (ie, recovery and mortality), length of hospital stay, and number of days spent in the intensive care unit for patients with COVID-19 were classified under the Disease Progression theme. CONCLUSIONS In this systematic review, we assembled studies in the current COVID-19 literature that utilized AI-based methods to provide insights into different COVID-19 themes. Our findings highlight important variables, data types, and available COVID-19 resources that can assist in facilitating clinical and translational research.
Affiliation(s)
- Hafsa Bareen Syeda
- Department of Neurology, University of Arkansas for Medical Sciences, Little Rock, AR, United States
- Mahanazuddin Syed
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, United States
- Kevin Wayne Sexton
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, United States
- Department of Surgery, University of Arkansas for Medical Sciences, Little Rock, AR, United States
- Department of Health Policy and Management, University of Arkansas for Medical Sciences, Little Rock, AR, United States
- Shorabuddin Syed
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, United States
- Salma Begum
- Department of Information Technology, University of Arkansas for Medical Sciences, Little Rock, AR, United States
- Farhanuddin Syed
- College of Medicine, Shadan Institute of Medical Sciences, Hyderabad, India
- Fred Prior
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, United States
- Department of Radiology, University of Arkansas for Medical Sciences, Little Rock, AR, United States
- Feliciano Yu
- Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, United States
27
Yang HS, Hou Y, Vasovic LV, Steel PAD, Chadburn A, Racine-Brzostek SE, Velu P, Cushing MM, Loda M, Kaushal R, Zhao Z, Wang F. Routine Laboratory Blood Tests Predict SARS-CoV-2 Infection Using Machine Learning. Clin Chem 2020; 66:1396-1404. [PMID: 32821907 PMCID: PMC7499540 DOI: 10.1093/clinchem/hvaa200] [Citation(s) in RCA: 62] [Impact Index Per Article: 12.4] [Received: 06/17/2020] [Accepted: 08/12/2020] [Indexed: 01/08/2023]
Abstract
Background Accurate diagnostic strategies to rapidly identify SARS-CoV-2 positive individuals for management of patient care and protection of health care personnel are urgently needed. The predominant diagnostic test is viral RNA detection by RT-PCR from nasopharyngeal swab specimens; however, the results are not promptly obtainable in all patient care locations. Routine laboratory testing, in contrast, is readily available with a turnaround time (TAT) usually within 1-2 hours. Method We developed a machine learning model incorporating patient demographic features (age, sex, race) with 27 routine laboratory tests to predict an individual’s SARS-CoV-2 infection status. Laboratory test results obtained within two days before the release of the SARS-CoV-2 RT-PCR result were used to train a gradient boosted decision tree (GBDT) model on 3,356 SARS-CoV-2 RT-PCR tested patients (1,402 positive and 1,954 negative) evaluated at a metropolitan hospital. Results The model achieved an area under the receiver operating characteristic curve (AUC) of 0.854 (95% CI: 0.829-0.878). Application of this model to an independent patient dataset from a separate hospital resulted in a comparable AUC (0.838), validating its generalizability. Moreover, our model predicted initial SARS-CoV-2 RT-PCR positivity in 66% of individuals whose RT-PCR result changed from negative to positive within two days. Conclusion This model employing routine laboratory test results offers opportunities for early and rapid identification of high-risk SARS-CoV-2 infected patients before their RT-PCR results are available. It may play an important role in assisting the identification of SARS-CoV-2 infected patients in areas where RT-PCR testing is not accessible due to financial or supply constraints.
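The AUC reported above equals the probability that a randomly chosen positive patient receives a higher model score than a randomly chosen negative one, with ties counting one half. The sketch below computes it directly from that definition and adds a percentile bootstrap interval. The abstract does not say how its 95% CI was obtained, so the bootstrap is an assumption, and the scores, labels, and function names are made-up illustrations.

```python
import random


def roc_auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative).

    Ties between a positive and a negative count as half. This pairwise
    loop is O(P * N), which is fine for a sketch but slow for large cohorts.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (resampling patients)."""
    rng = random.Random(seed)
    n = len(scores)
    aucs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one of the classes
            aucs.append(roc_auc([scores[i] for i in idx], ys))
    aucs.sort()
    last = len(aucs) - 1
    lo = aucs[round(alpha / 2 * last)]
    hi = aucs[round((1 - alpha / 2) * last)]
    return lo, hi


# Made-up model scores and RT-PCR labels (1 = positive) for eight patients.
scores = [0.92, 0.85, 0.80, 0.64, 0.55, 0.41, 0.30, 0.12]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
print(roc_auc(scores, labels))  # prints: 0.75
print(bootstrap_auc_ci(scores, labels))
```

Here 12 of the 16 positive-negative pairs are ranked correctly, giving 0.75. With only eight patients the interval is very wide; it narrows as the cohort grows.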
Affiliation(s)
- He S Yang
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY
- New York-Presbyterian Hospital, Weill Cornell Medicine, New York, NY
- Yu Hou
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY
- Ljiljana V Vasovic
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY
- New York-Presbyterian Hospital, Lower Manhattan Hospital, New York, NY
- Peter A D Steel
- New York-Presbyterian Hospital, Weill Cornell Medicine, New York, NY
- Department of Emergency Medicine, Weill Cornell Medicine, New York, NY
- Amy Chadburn
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY
- New York-Presbyterian Hospital, Weill Cornell Medicine, New York, NY
- Sabrina E Racine-Brzostek
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY
- New York-Presbyterian Hospital, Weill Cornell Medicine, New York, NY
- Priya Velu
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY
- New York-Presbyterian Hospital, Weill Cornell Medicine, New York, NY
- Melissa M Cushing
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY
- New York-Presbyterian Hospital, Weill Cornell Medicine, New York, NY
- Massimo Loda
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY
- New York-Presbyterian Hospital, Weill Cornell Medicine, New York, NY
- Rainu Kaushal
- New York-Presbyterian Hospital, Weill Cornell Medicine, New York, NY
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY
- Zhen Zhao
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY
- New York-Presbyterian Hospital, Weill Cornell Medicine, New York, NY
- Fei Wang
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY
28
Yang HS, Hou Y, Vasovic LV, Steel PAD, Chadburn A, Racine-Brzostek SE, Velu P, Cushing MM, Loda M, Kaushal R, Zhao Z, Wang F. Routine Laboratory Blood Tests Predict SARS-CoV-2 Infection Using Machine Learning. Clin Chem 2020; 66:1396-1404. [PMID: 32821907 DOI: 10.1101/2020.06.17.20133892] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 06/17/2020] [Accepted: 08/12/2020] [Indexed: 05/21/2023]
Abstract
BACKGROUND Accurate diagnostic strategies to rapidly identify SARS-CoV-2 positive individuals for management of patient care and protection of health care personnel are urgently needed. The predominant diagnostic test is viral RNA detection by RT-PCR from nasopharyngeal swab specimens; however, the results are not promptly obtainable in all patient care locations. Routine laboratory testing, in contrast, is readily available with a turnaround time (TAT) usually within 1-2 hours. METHOD We developed a machine learning model incorporating patient demographic features (age, sex, race) with 27 routine laboratory tests to predict an individual's SARS-CoV-2 infection status. Laboratory test results obtained within 2 days before the release of the SARS-CoV-2 RT-PCR result were used to train a gradient boosting decision tree (GBDT) model on 3,356 SARS-CoV-2 RT-PCR tested patients (1,402 positive and 1,954 negative) evaluated at a metropolitan hospital. RESULTS The model achieved an area under the receiver operating characteristic curve (AUC) of 0.854 (95% CI: 0.829-0.878). Application of this model to an independent patient dataset from a separate hospital resulted in a comparable AUC (0.838), validating its generalizability. Moreover, our model predicted initial SARS-CoV-2 RT-PCR positivity in 66% of individuals whose RT-PCR result changed from negative to positive within 2 days. CONCLUSION This model employing routine laboratory test results offers opportunities for early and rapid identification of high-risk SARS-CoV-2 infected patients before their RT-PCR results are available. It may play an important role in assisting the identification of SARS-CoV-2 infected patients in areas where RT-PCR testing is not accessible due to financial or supply constraints.
Affiliation(s)
- He S Yang
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY
- New York-Presbyterian Hospital, Weill Cornell Medicine, New York, NY
- Yu Hou
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY
- Ljiljana V Vasovic
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY
- New York-Presbyterian Hospital, Lower Manhattan Hospital, New York, NY
- Peter A D Steel
- New York-Presbyterian Hospital, Weill Cornell Medicine, New York, NY
- Department of Emergency Medicine, Weill Cornell Medicine, New York, NY
- Amy Chadburn
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY
- New York-Presbyterian Hospital, Weill Cornell Medicine, New York, NY
- Sabrina E Racine-Brzostek
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY
- New York-Presbyterian Hospital, Weill Cornell Medicine, New York, NY
- Priya Velu
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY
- New York-Presbyterian Hospital, Weill Cornell Medicine, New York, NY
- Melissa M Cushing
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY
- New York-Presbyterian Hospital, Weill Cornell Medicine, New York, NY
- Massimo Loda
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY
- New York-Presbyterian Hospital, Weill Cornell Medicine, New York, NY
- Rainu Kaushal
- New York-Presbyterian Hospital, Weill Cornell Medicine, New York, NY
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY
- Zhen Zhao
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY
- New York-Presbyterian Hospital, Weill Cornell Medicine, New York, NY
- Fei Wang
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY
29
Cabitza F, Campagner A, Ferrari D, Di Resta C, Ceriotti D, Sabetta E, Colombini A, De Vecchi E, Banfi G, Locatelli M, Carobene A. Development, evaluation, and validation of machine learning models for COVID-19 detection based on routine blood tests. Clin Chem Lab Med 2020; 59:421-431. [PMID: 33079698 DOI: 10.1515/cclm-2020-1294] [Citation(s) in RCA: 70] [Impact Index Per Article: 14.0] [Received: 08/25/2020] [Accepted: 10/07/2020] [Indexed: 02/07/2023]
Abstract
Objectives The rRT-PCR test, the current gold standard for the detection of coronavirus disease (COVID-19), has known shortcomings, such as long turnaround time, potential shortage of reagents, false-negative rates around 15-20%, and expensive equipment. The hematochemical values of routine blood exams could represent a faster and less expensive alternative. Methods Three different training datasets of hematochemical values from 1,624 patients (52% COVID-19 positive), admitted at San Raffaele Hospital (OSR) from February to May 2020, were used for developing machine learning (ML) models: the complete OSR dataset (72 features: complete blood count (CBC), biochemical, coagulation, hemogasanalysis and CO-Oxymetry values, age, sex and specific symptoms at triage) and two sub-datasets (COVID-specific and CBC dataset, 32 and 21 features respectively). Fifty-eight cases (50% COVID-19 positive) from another hospital, and 54 negative patients collected in 2018 at OSR, were used for internal-external and external validation, respectively. Results We developed five ML models: for the complete OSR dataset, the area under the receiver operating characteristic curve (AUC) for the algorithms ranged from 0.83 to 0.90; for the COVID-specific dataset from 0.83 to 0.87; and for the CBC dataset from 0.74 to 0.86. The validations also achieved good results: AUC from 0.75 to 0.78 (internal-external validation) and specificity from 0.92 to 0.96 (external validation). Conclusions ML can be applied to blood tests as both an adjunct and an alternative method to rRT-PCR for the fast and cost-effective identification of COVID-19-positive patients. This is especially useful in developing countries, or in countries facing an increase in contagions.
Affiliation(s)
- Andrea Campagner
- IRCCS Istituto Ortopedico Galeazzi, Laboratory of Clinical Chemistry and Microbiology, Milan, Italy
- Chiara Di Resta
- Vita-Salute San Raffaele University; Unit of Genomics for Human Disease Diagnosis, Division of Genetics and Cell Biology, Milan, Italy
- Daniele Ceriotti
- Laboratory Medicine, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Eleonora Sabetta
- Laboratory Medicine, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Alessandra Colombini
- IRCCS Istituto Ortopedico Galeazzi, Laboratory of Clinical Chemistry and Microbiology, Milan, Italy
- Elena De Vecchi
- IRCCS Istituto Ortopedico Galeazzi, Laboratory of Clinical Chemistry and Microbiology, Milan, Italy
- Giuseppe Banfi
- IRCCS Istituto Ortopedico Galeazzi, Laboratory of Clinical Chemistry and Microbiology, Milan, Italy
- Massimo Locatelli
- Laboratory Medicine, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Anna Carobene
- Laboratory Medicine, IRCCS San Raffaele Scientific Institute, Milan, Italy
30
AlJame M, Ahmad I, Imtiaz A, Mohammed A. Ensemble learning model for diagnosing COVID-19 from routine blood tests. Inform Med Unlocked 2020; 21:100449. [PMID: 33102686 PMCID: PMC7572278 DOI: 10.1016/j.imu.2020.100449] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Received: 08/11/2020] [Revised: 09/28/2020] [Accepted: 10/07/2020] [Indexed: 12/19/2022]
Abstract
BACKGROUND AND OBJECTIVES The pandemic of novel coronavirus disease 2019 (COVID-19) has severely impacted human society with a massive death toll worldwide. There is an urgent need for early and reliable screening of COVID-19 patients to provide better and timely patient care and to combat the spread of the disease. In this context, recent studies have reported some key advantages of using routine blood tests for initial screening of COVID-19 patients. In this article, we first present a review of the emerging techniques for COVID-19 diagnosis using routine laboratory and/or clinical data. Then, we propose ERLX, an ensemble learning model for COVID-19 diagnosis from routine blood tests. METHOD At the first level, the proposed model uses three well-known, diverse classifiers with different architectures and learning characteristics (extra trees, random forest, and logistic regression), and then combines their predictions with a second-level extreme gradient boosting (XGBoost) classifier to achieve better performance. For data preparation, the proposed methodology employs a KNNImputer algorithm to handle null values in the dataset, isolation forest (iForest) to remove outlier data, and the synthetic minority oversampling technique (SMOTE) to balance the data distribution. For model interpretability, feature importances are reported using the SHapley Additive exPlanations (SHAP) technique. RESULTS The proposed model was trained and evaluated using a publicly available dataset from Albert Einstein Hospital in Brazil, which consisted of 5644 data samples with 559 confirmed COVID-19 cases. The ensemble model achieved outstanding performance, with an overall accuracy of 99.88% [95% CI: 99.6-100], an AUC of 99.38% [95% CI: 97.5-100], a sensitivity of 98.72% [95% CI: 94.6-100], and a specificity of 99.99% [95% CI: 99.99-100].
DISCUSSION The proposed model performed better than existing state-of-the-art studies (Banerjee et al., 2020; de Freitas Barbosa et al., 2020; de Moraes Batista et al., 2020; Soares et al., 2020) [3,22,56,71] on the same set of features they employed. Compared with the average accuracy of 95.159% of the best-performing Bayes Net model (de Freitas Barbosa et al., 2020) [22], ERLX achieved an average accuracy of 99.94%. Compared with the AUC of 85% reported for the SVM model (de Moraes Batista et al., 2020) [56], ERLX obtained an AUC of 99.77%, in addition to improvements in sensitivity and specificity. Compared with the average sensitivity of 70.25% and specificity of 85.98% of the ER-COV model (Soares et al., 2020) [71], the ERLX model achieved a sensitivity of 99.47% and a specificity of 99.99%. The ERLX model obtained considerably higher scores than the ANN model (Banerjee et al., 2020) [3] on all performance metrics. Therefore, the model presented is robust and can be deployed for reliable early and rapid screening of COVID-19 patients.
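SMOTE, used in ERLX's data preparation, balances classes by creating synthetic minority-class rows rather than duplicating existing ones. The standard-library sketch below shows the core interpolation step under simplifying assumptions (numeric features only, Euclidean distance, a hypothetical function name `smote`). It is not the ERLX implementation; production work would normally rely on a maintained library such as imbalanced-learn's `SMOTE` class.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority rows by SMOTE-style interpolation.

    Each synthetic row lies on the segment between a randomly chosen
    minority row and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority rows")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority rows
        neighbours = sorted(
            (row for j, row in enumerate(minority) if j != i),
            key=lambda row: math.dist(row, x),
        )[:k]
        z = rng.choice(neighbours)
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([xi + u * (zi - xi) for xi, zi in zip(x, z)])
    return synthetic


# Hypothetical two-feature minority-class rows (e.g. scaled blood values).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_rows = smote(minority, n_new=6, k=2)
print(len(new_rows))  # prints: 6
```

Oversampling is usually applied to the training portion of each split only; synthesizing rows before splitting lets near-copies of training samples leak into the evaluation data.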
Affiliation(s)
- Maryam AlJame
- Computer Engineering Department, Kuwait University, Kuwait
- Imtiaz Ahmad
- Computer Engineering Department, Kuwait University, Kuwait
- Ameer Mohammed
- Computer Engineering Department, Kuwait University, Kuwait