1
Lourenço AA, Amaral PHR, Paim AAO, Marques GF, Gomes-de-Pontes L, da Mata CPSM, da Fonseca FG, Pérez JCG, Coelho-dos-Reis JGA. Algorithms for predicting COVID outcome using ready-to-use laboratorial and clinical data. Front Public Health 2024; 12:1347334. [PMID: 38807995 PMCID: PMC11130428 DOI: 10.3389/fpubh.2024.1347334] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Accepted: 04/30/2024] [Indexed: 05/30/2024] Open
Abstract
The pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is an emerging crisis affecting the public health system. The clinical features of COVID-19 can range from an asymptomatic state to acute respiratory syndrome and multiple organ dysfunction. Although some hematological and biochemical parameters are altered during moderate and severe COVID-19, there is still a lack of tools to combine these parameters to predict the clinical outcome of a patient with COVID-19. Thus, this study aimed to use hematological and biochemical parameters of patients diagnosed with COVID-19 to build machine learning algorithms for predicting COVID-19 mortality or survival. Patients included in the study had a diagnosis of SARS-CoV-2 infection confirmed by RT-PCR, and biochemical and hematological measurements were performed at three different time points upon hospital admission. Among the parameters evaluated, the important features of the T1 time point (urea, lymphocytes, glucose, basophils, and age) stand out the most and could be biomarkers of COVID-19 severity. This study shows that urea is the parameter that best classifies patient severity and rises over time, making it a crucial analyte to be used in machine learning algorithms to predict patient outcome. In this study, optimal and medically interpretable machine learning algorithms for outcome prediction are presented for each time point. It was found that urea is the most important variable for outcome prediction over all three time points. However, the order of importance of other variables changes for each time point, demonstrating the importance of a dynamic approach for effective prediction of patient outcome.
All in all, the use of machine learning algorithms can be a defining tool for laboratory monitoring and clinical outcome prediction, which may bring benefits to public health in future pandemics with newly emerging and reemerging SARS-CoV-2 variants of concern.
Affiliation(s)
- Alice Aparecida Lourenço
  - Laboratório de Virologia Básica e Aplicada, Instituto de Ciências Biológicas, Departamento de Microbiologia, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Adriana Alves Oliveira Paim
  - Laboratório de Virologia Básica e Aplicada, Instituto de Ciências Biológicas, Departamento de Microbiologia, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Geovane Ferreira Marques
  - Laboratório de Virologia Básica e Aplicada, Instituto de Ciências Biológicas, Departamento de Microbiologia, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Leticia Gomes-de-Pontes
  - Laboratório de Virologia Básica e Aplicada, Instituto de Ciências Biológicas, Departamento de Microbiologia, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Flávio Guimarães da Fonseca
  - Laboratório de Virologia Básica e Aplicada, Instituto de Ciências Biológicas, Departamento de Microbiologia, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
  - CT Vacinas, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Juan Carlos González Pérez
  - Departamento de Física, Instituto de Ciências Exatas, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Jordana Grazziela Alves Coelho-dos-Reis
  - Laboratório de Virologia Básica e Aplicada, Instituto de Ciências Biológicas, Departamento de Microbiologia, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
2
Er AG, Ding DY, Er B, Uzun M, Cakmak M, Sadee C, Durhan G, Ozmen MN, Tanriover MD, Topeli A, Aydin Son Y, Tibshirani R, Unal S, Gevaert O. Multimodal data fusion using sparse canonical correlation analysis and cooperative learning: a COVID-19 cohort study. NPJ Digit Med 2024; 7:117. [PMID: 38714751 PMCID: PMC11076490 DOI: 10.1038/s41746-024-01128-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Accepted: 04/25/2024] [Indexed: 05/10/2024] Open
Abstract
Through technological innovations, patient cohorts can be examined from multiple views with high-dimensional, multiscale biomedical data to classify clinical phenotypes and predict outcomes. Here, we aim to present our approach for analyzing multimodal data using unsupervised and supervised sparse linear methods in a COVID-19 patient cohort. This prospective cohort study of 149 adult patients was conducted in a tertiary care academic center. First, we used sparse canonical correlation analysis (CCA) to identify and quantify relationships across different data modalities, including viral genome sequencing, imaging, clinical data, and laboratory results. Then, we used cooperative learning to predict the clinical outcome of COVID-19 patients: Intensive care unit admission. We show that serum biomarkers representing severe disease and acute phase response correlate with original and wavelet radiomics features in the LLL frequency channel (cor(Xu1, Zv1) = 0.596, p value < 0.001). Among radiomics features, histogram-based first-order features reporting the skewness, kurtosis, and uniformity have the lowest negative, whereas entropy-related features have the highest positive coefficients. Moreover, unsupervised analysis of clinical data and laboratory results gives insights into distinct clinical phenotypes. Leveraging the availability of global viral genome databases, we demonstrate that the Word2Vec natural language processing model can be used for viral genome encoding. It not only separates major SARS-CoV-2 variants but also allows the preservation of phylogenetic relationships among them. Our quadruple model using Word2Vec encoding achieves better prediction results in the supervised task. The model yields area under the curve (AUC) and accuracy values of 0.87 and 0.77, respectively. 
Our study illustrates that sparse CCA analysis and cooperative learning are powerful techniques for handling high-dimensional, multimodal data to investigate multivariate associations in unsupervised and supervised tasks.
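The statistic cor(Xu1, Zv1) quoted in the abstract is the correlation between the first pair of canonical variates: each data view is projected onto its weight vector, and the two projections are then correlated. The sketch below illustrates only that final step in Python. It is a minimal illustration under stated assumptions, not the authors' code: the matrices `X` and `Z`, the weight vectors `u1` and `v1`, and the helper names `project` and `pearson` are hypothetical, the numbers are made up, and the sparse weights are given rather than fitted.

```python
import math


def project(rows, weights):
    """Return one weighted sum per row, i.e. the canonical variate for that view."""
    return [sum(x * w for x, w in zip(row, weights)) for row in rows]


def pearson(a, b):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Made-up toy data: 4 patients, 3 features in view X and 2 features in view Z.
X = [[1.0, 0.2, 3.0], [2.0, 0.1, 1.0], [3.0, 0.4, 2.0], [4.0, 0.3, 0.5]]
Z = [[0.5, 10.0], [1.1, 12.0], [1.4, 9.0], [2.2, 11.0]]

# Hypothetical sparse weight vectors: only one feature per view is non-zero.
u1 = [1.0, 0.0, 0.0]
v1 = [1.0, 0.0]

# Correlation between the two canonical variates, analogous to cor(Xu1, Zv1).
r = pearson(project(X, u1), project(Z, v1))
print(f"cor(Xu1, Zv1) on toy data: {r:.3f}")
```

The sparsity penalty in sparse CCA drives many weights to exactly zero, which is why each toy weight vector keeps a single feature; in a real analysis the weights would be estimated from the data.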
Affiliation(s)
- Ahmet Gorkem Er
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA, 94305, USA
  - Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, 06800, Ankara, Turkey
  - Department of Infectious Diseases and Clinical Microbiology, Hacettepe University Faculty of Medicine, 06230, Ankara, Turkey
- Daisy Yi Ding
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
- Berrin Er
  - Department of Internal Medicine, Division of Intensive Care Medicine, Hacettepe University Faculty of Medicine, 06230, Ankara, Turkey
- Mertcan Uzun
  - Department of Infectious Diseases and Clinical Microbiology, Hacettepe University Faculty of Medicine, 06230, Ankara, Turkey
- Mehmet Cakmak
  - Department of Internal Medicine, Hacettepe University Faculty of Medicine, 06230, Ankara, Turkey
- Christoph Sadee
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA, 94305, USA
- Gamze Durhan
  - Department of Radiology, Hacettepe University Faculty of Medicine, 06230, Ankara, Turkey
- Mustafa Nasuh Ozmen
  - Department of Radiology, Hacettepe University Faculty of Medicine, 06230, Ankara, Turkey
- Mine Durusu Tanriover
  - Department of Internal Medicine, Hacettepe University Faculty of Medicine, 06230, Ankara, Turkey
- Arzu Topeli
  - Department of Internal Medicine, Division of Intensive Care Medicine, Hacettepe University Faculty of Medicine, 06230, Ankara, Turkey
- Yesim Aydin Son
  - Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, 06800, Ankara, Turkey
- Robert Tibshirani
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
  - Department of Statistics, Stanford University, Stanford, CA, 94305, USA
- Serhat Unal
  - Department of Infectious Diseases and Clinical Microbiology, Hacettepe University Faculty of Medicine, 06230, Ankara, Turkey
- Olivier Gevaert
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA, 94305, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
3
Yakimovich A. Toward the novel AI tasks in infection biology. mSphere 2024; 9:e0059123. [PMID: 38334404 PMCID: PMC10900907 DOI: 10.1128/msphere.00591-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/10/2024] Open
Abstract
Machine learning and artificial intelligence (AI) are becoming more common in infection biology laboratories around the world. Yet, as they gain traction in research, novel frontiers arise. Novel artificial intelligence algorithms are capable of addressing advanced tasks like image generation and question answering. However, similar algorithms can prove useful in addressing advanced questions in infection biology like prediction of host-pathogen interactions or inferring virus protein conformations. Addressing such tasks requires large annotated data sets, which are often scarce in biomedical research. In this review, I bring together several successful examples where such tasks were addressed. I underline the importance of formulating novel AI tasks in infection biology accompanied by freely available benchmark data sets to address these tasks. Furthermore, I discuss the current state of the field and potential future trends. I argue that one such trend involves AI tools becoming more versatile.
Affiliation(s)
- Artur Yakimovich
  - Center for Advanced Systems Understanding (CASUS), Görlitz, Germany
  - Helmholtz-Zentrum Dresden-Rossendorf e. V. (HZDR), Dresden, Germany
  - Department of Renal Medicine, Division of Medicine, Bladder Infection and Immunity Group (BIIG), University College London, Royal Free Hospital Campus, London, United Kingdom
  - Artificial Intelligence for Life Sciences CIC, Dorset, United Kingdom
  - Institute of Computer Science, University of Wroclaw, Wroclaw, Poland
4
Viderman D, Kotov A, Popov M, Abdildin Y. Machine and deep learning methods for clinical outcome prediction based on physiological data of COVID-19 patients: a scoping review. Int J Med Inform 2024; 182:105308. [PMID: 38091862 DOI: 10.1016/j.ijmedinf.2023.105308] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 11/20/2023] [Accepted: 12/03/2023] [Indexed: 01/07/2024]
Abstract
INTRODUCTION: Since the beginning of the COVID-19 pandemic, numerous machine and deep learning (MDL) methods have been proposed in the literature to analyze patient physiological data. The objective of this review is to summarize various aspects of these methods and assess their practical utility for predicting various clinical outcomes.
METHODS: We searched PubMed, Scopus, and Cochrane Library, screened and selected the studies matching the inclusion criteria. The clinical analysis focused on the characteristics of the patient cohorts in the studies included in this review, the specific tasks in the context of the COVID-19 pandemic that machine and deep learning methods were used for, and their practical limitations. The technical analysis focused on the details of specific MDL methods and their performance.
RESULTS: Analysis of the 48 selected studies revealed that the majority (∼54%) of them examined the application of MDL methods for the prediction of survival/mortality-related patient outcomes, while a smaller fraction (∼13%) of studies also examined applications to the prediction of patients' physiological outcomes and hospital resource utilization. 21% of the studies examined the application of MDL methods to multiple clinical tasks. Machine and deep learning methods have been shown to be effective at predicting several outcomes of COVID-19 patients, such as disease severity, complications, intensive care unit (ICU) transfer, and mortality. MDL methods also achieved high accuracy in predicting the required number of ICU beds and ventilators.
CONCLUSION: Machine and deep learning methods have been shown to be valuable tools for predicting disease severity, organ dysfunction and failure, patient outcomes, and hospital resource utilization during the COVID-19 pandemic. The discovered knowledge and our conclusions and recommendations can also be useful to healthcare professionals and artificial intelligence researchers in managing future pandemics.
Affiliation(s)
- Dmitriy Viderman
  - Department of Surgery, School of Medicine, Nazarbayev University, Astana, Kazakhstan; Department of Anesthesiology, Intensive Care, and Pain Medicine, National Research Oncology Center, Astana, Kazakhstan
- Alexander Kotov
  - Department of Computer Science, College of Engineering, Wayne State University, Detroit, USA
- Maxim Popov
  - Department of Computer Science, School of Engineering and Digital Sciences, Nazarbayev University, Astana, Kazakhstan
- Yerkin Abdildin
  - Department of Mechanical and Aerospace Engineering, School of Engineering and Digital Sciences, Nazarbayev University, Astana, Kazakhstan
5
Ahmad I, Amelio A, Merla A, Scozzari F. A survey on the role of artificial intelligence in managing Long COVID. Front Artif Intell 2024; 6:1292466. [PMID: 38274052 PMCID: PMC10808521 DOI: 10.3389/frai.2023.1292466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Accepted: 12/26/2023] [Indexed: 01/27/2024] Open
Abstract
In recent years, several artificial intelligence techniques have been applied to COVID-19 data. In addition to the symptoms related to COVID-19, many individuals with SARS-CoV-2 infection have described various long-lasting symptoms, now termed Long COVID. In this context, artificial intelligence techniques have been utilized to analyze data from Long COVID patients in order to assist doctors and alleviate the considerable strain on care and rehabilitation facilities. In this paper, we explore the impact of the machine learning methodologies that have been applied to analyze the many aspects of Long COVID syndrome, from clinical presentation through diagnosis. We also include the text mining techniques used to extract insights and trends from large amounts of text data related to Long COVID. Finally, we critically compare the various approaches and outline the work that remains to be done to create a robust artificial intelligence approach for efficient diagnosis and treatment of Long COVID.
Affiliation(s)
- Ijaz Ahmad
  - Department of Human, Legal and Economic Sciences, Telematic University “Leonardo da Vinci”, Chieti, Italy
- Alessia Amelio
  - Department of Engineering and Geology, University “G. d'Annunzio” Chieti-Pescara, Pescara, Italy
- Arcangelo Merla
  - Department of Engineering and Geology, University “G. d'Annunzio” Chieti-Pescara, Pescara, Italy
- Francesca Scozzari
  - Laboratory of Computational Logic and Artificial Intelligence, Department of Economic Studies, University “G. d'Annunzio” Chieti-Pescara, Pescara, Italy
6
Er AG, Ding DY, Er B, Uzun M, Cakmak M, Sadee C, Durhan G, Ozmen MN, Tanriover MD, Topeli A, Son YA, Tibshirani R, Unal S, Gevaert O. Multimodal Biomedical Data Fusion Using Sparse Canonical Correlation Analysis and Cooperative Learning: A Cohort Study on COVID-19. RESEARCH SQUARE 2023:rs.3.rs-3569833. [PMID: 38045288 PMCID: PMC10690316 DOI: 10.21203/rs.3.rs-3569833/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
Through technological innovations, patient cohorts can be examined from multiple views with high-dimensional, multiscale biomedical data to classify clinical phenotypes and predict outcomes. Here, we aim to present our approach for analyzing multimodal data using unsupervised and supervised sparse linear methods in a COVID-19 patient cohort. This prospective cohort study of 149 adult patients was conducted in a tertiary care academic center. First, we used sparse canonical correlation analysis (CCA) to identify and quantify relationships across different data modalities, including viral genome sequencing, imaging, clinical data, and laboratory results. Then, we used cooperative learning to predict the clinical outcome of COVID-19 patients. We show that serum biomarkers representing severe disease and acute phase response correlate with original and wavelet radiomics features in the LLL frequency channel (corr(Xu1, Zv1) = 0.596, p-value < 0.001). Among radiomics features, histogram-based first-order features reporting the skewness, kurtosis, and uniformity have the lowest negative, whereas entropy-related features have the highest positive coefficients. Moreover, unsupervised analysis of clinical data and laboratory results gives insights into distinct clinical phenotypes. Leveraging the availability of global viral genome databases, we demonstrate that the Word2Vec natural language processing model can be used for viral genome encoding. It not only separates major SARS-CoV-2 variants but also allows the preservation of phylogenetic relationships among them. Our quadruple model using Word2Vec encoding achieves better prediction results in the supervised task. The model yields area under the curve (AUC) and accuracy values of 0.87 and 0.77, respectively. 
Our study illustrates that sparse CCA analysis and cooperative learning are powerful techniques for handling high-dimensional, multimodal data to investigate multivariate associations in unsupervised and supervised tasks.
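For readers unfamiliar with the two metrics quoted in the abstract, AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties counted as one half), while accuracy is the fraction of correct predictions at a fixed decision threshold. The Python sketch below computes both on made-up labels and scores. The function names, the toy data, and the 0.5 threshold are assumptions for illustration and are not taken from the study.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases whose thresholded score matches the true label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Each positive/negative pair scores 1 if ranked correctly, 0.5 if tied, else 0.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = outcome occurred (e.g. ICU admission), 0 = it did not.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]

print(f"AUC = {auc(labels, scores):.2f}, accuracy = {accuracy(labels, scores):.2f}")
```

Because AUC depends only on how cases are ranked while accuracy depends on the chosen threshold, the two values can differ considerably for the same model.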
Affiliation(s)
- Ahmet Gorkem Er
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA, 94305, USA
  - Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, Ankara, 06800, Türkiye
  - Department of Infectious Diseases and Clinical Microbiology, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
- Daisy Yi Ding
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
- Berrin Er
  - Department of Internal Medicine, Division of Intensive Care Medicine, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
- Mertcan Uzun
  - Department of Infectious Diseases and Clinical Microbiology, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
- Mehmet Cakmak
  - Department of Internal Medicine, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
- Christoph Sadee
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA, 94305, USA
- Gamze Durhan
  - Department of Radiology, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
- Mustafa Nasuh Ozmen
  - Department of Radiology, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
- Mine Durusu Tanriover
  - Department of Internal Medicine, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
- Arzu Topeli
  - Department of Internal Medicine, Division of Intensive Care Medicine, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
- Yesim Aydin Son
  - Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, Ankara, 06800, Türkiye
- Robert Tibshirani
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
  - Department of Statistics, Stanford University, Stanford, CA, 94305, USA
- Serhat Unal
  - Department of Infectious Diseases and Clinical Microbiology, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
- Olivier Gevaert
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA, 94305, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA