1
Nirmalarajah K, Aftanas P, Barati S, Chien E, Crowl G, Faheem A, Farooqi L, Jamal AJ, Khan S, Kotwa JD, Li AX, Mozafarihashjin M, Nasir JA, Shigayeva A, Yim W, Yip L, Zhong XZ, Katz K, Kozak R, McArthur AG, Daneman N, Maguire F, McGeer AJ, Duvvuri VR, Mubareka S. Identification of patient demographic, clinical, and SARS-CoV-2 genomic factors associated with severe COVID-19 using supervised machine learning: a retrospective multicenter study. BMC Infect Dis 2025; 25:132. [PMID: 39875869 DOI: 10.1186/s12879-025-10450-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2024] [Accepted: 01/06/2025] [Indexed: 01/30/2025]
Abstract
BACKGROUND Drivers of COVID-19 severity are multifactorial and include multidimensional and potentially interacting factors encompassing viral determinants and host-related factors (i.e., demographics, pre-existing conditions and/or genetics), thus complicating the prediction of clinical outcomes for different severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants. Although millions of SARS-CoV-2 genomes have been publicly shared in global databases, linkages with detailed clinical data are scarce. Therefore, we aimed to establish a COVID-19 patient dataset with linked clinical and viral genomic data to then examine associations between SARS-CoV-2 genomic signatures and clinical disease phenotypes. METHODS A cohort of adult patients with laboratory-confirmed SARS-CoV-2 from 11 participating healthcare institutions in the Greater Toronto Area (GTA) was recruited from March 2020 to April 2022. Supervised machine learning (ML) models were developed to predict hospitalization using SARS-CoV-2 lineage-specific genomic signatures, patient demographics, symptoms, and pre-existing comorbidities. The relative importance of these features was then evaluated. RESULTS Complete clinical data and viral whole genome level information were obtained from 617 patients, 50.4% of whom were hospitalized. Notably, inpatients were older with a mean age of 66.67 years (SD ± 17.64 years), whereas outpatients had a mean age of 44.89 years (SD ± 16.00 years). SHapley Additive exPlanations (SHAP) analyses revealed that underlying vascular disease, underlying pulmonary disease, and fever were the most significant clinical features associated with hospitalization. In models built on the amino acid sequences of functional regions including spike, nucleocapsid, ORF3a, and ORF8 proteins, variants preceding the emergence of variants of concern (VOCs), or pre-VOC variants, were associated with hospitalization.
CONCLUSIONS Viral genomic features have limited utility in predicting hospitalization across SARS-CoV-2 diversity. Combining clinical and viral genomic datasets provides perspective on patient-specific and virus-related factors that impact COVID-19 disease severity. Overall, clinical features had greater discriminatory power than viral genomic features in predicting hospitalization.
Affiliation(s)
- Kuganya Nirmalarajah
  - Sunnybrook Research Institute, Toronto, ON, Canada
  - Public Health Ontario, 661 University Avenue, Toronto, ON, Canada
  - Department of Laboratory Medicine & Pathobiology, University of Toronto, Toronto, ON, Canada
- Emily Chien
  - Sunnybrook Research Institute, Toronto, ON, Canada
- Saman Khan
  - Sinai Health System, Toronto, ON, Canada
- Angel X Li
  - Sinai Health System, Toronto, ON, Canada
- Jalees A Nasir
  - Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, ON, Canada
  - Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, ON, Canada
- Winfield Yim
  - Sunnybrook Research Institute, Toronto, ON, Canada
- Lily Yip
  - Sunnybrook Research Institute, Toronto, ON, Canada
- Kevin Katz
  - Shared Hospital Laboratory, Toronto, ON, Canada
  - North York General Hospital, Toronto, ON, Canada
- Robert Kozak
  - Sunnybrook Research Institute, Toronto, ON, Canada
  - Shared Hospital Laboratory, Toronto, ON, Canada
  - Department of Laboratory Medicine & Pathobiology, University of Toronto, Toronto, ON, Canada
- Andrew G McArthur
  - Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, ON, Canada
  - Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, ON, Canada
- Nick Daneman
  - Sunnybrook Research Institute, Toronto, ON, Canada
- Finlay Maguire
  - Sunnybrook Research Institute, Toronto, ON, Canada
  - Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada
  - Department of Community Health & Epidemiology, Faculty of Medicine, Dalhousie University, Halifax, NS, Canada
- Allison J McGeer
  - Sinai Health System, Toronto, ON, Canada
  - Department of Laboratory Medicine & Pathobiology, University of Toronto, Toronto, ON, Canada
- Venkata R Duvvuri
  - Public Health Ontario, 661 University Avenue, Toronto, ON, Canada
  - Department of Laboratory Medicine & Pathobiology, University of Toronto, Toronto, ON, Canada
  - Laboratory for Industrial and Applied Mathematics, Department of Mathematics and Statistics, York University, Toronto, ON, Canada
- Samira Mubareka
  - Sunnybrook Research Institute, Toronto, ON, Canada
  - Shared Hospital Laboratory, Toronto, ON, Canada
  - Department of Laboratory Medicine & Pathobiology, University of Toronto, Toronto, ON, Canada
2
Dorsey AF. Urbanization and Infectious Disease. Am J Hum Biol 2025; 37:e24197. [PMID: 39605171 DOI: 10.1002/ajhb.24197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2024] [Revised: 11/20/2024] [Accepted: 11/21/2024] [Indexed: 11/29/2024]
Abstract
The United Nations currently estimates that over half of the global population has lived in cities since 2017 and that this proportion is continuing to grow, particularly in the Global South. While urbanization is not new, increased population density combined with accelerating rates of (re)emerging and noncommunicable diseases as well as growing economic disparities has created new challenges to human health and well-being. Here, I examine peri-urban communities, peripheral settlements on the edges of urban areas populated by rural people, and argue that these areas are often overlooked, despite becoming increasingly common. Thus, human biologists should move beyond categorizing these spaces as transitional. Using unplanned, peri-urban communities around Lima, Peru as a case study, I detail the complexity of political ecological factors that impact infectious disease risk and rates in peri-urban communities. Using disease mechanisms, I demonstrate the importance of a biocultural approach and a political ecology perspective when investigating infectious disease. I highlight how human biologists and anthropologists are uniquely positioned to explore the heterogeneity of infectious disease patterns and pathways in an increasingly urbanized world.
Affiliation(s)
- Achsah F Dorsey
  - Department of Anthropology, University of Massachusetts, Amherst, Massachusetts, USA
3
Cai G, Szalai EÁ, Martinekova P, Li X, Qian X, Veres DS, Péterfi Z, Biswakarma J, Nagy R, Mikó A, Ábrahám S, Erőss B, Hegyi P, Szentesi A. Concomitant virus infection increases mortality and worsens outcome of acute pancreatitis: A systematic review and meta-analysis. Pancreatology 2024:S1424-3903(24)00832-9. [PMID: 39690099 DOI: 10.1016/j.pan.2024.12.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2024] [Revised: 11/22/2024] [Accepted: 12/05/2024] [Indexed: 12/19/2024]
Abstract
BACKGROUND Acute pancreatitis (AP) is a major health threat, with a high mortality rate in severe forms. Though alcohol and bile-induced factors are the most common causes, increasing evidence suggests that viral infections such as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and human immunodeficiency virus (HIV) may also trigger AP development. Our study aims to explore this association in greater detail. METHODS After the PROSPERO registration, we systematically searched PubMed, Embase, Cochrane Library, China Science and Technology Journal Database, China National Knowledge Infrastructure, and Wanfang Data Knowledge Service Platform in February 2023. We included studies with the following PECO framework: Population: AP patients, Exposure/Comparison: with/without virus infection, Outcome: mortality, severity, and complications of AP. Pooled odds ratios (OR) were calculated with 95% confidence intervals (CIs). RESULTS Altogether, 29 cohorts with 2,295,172 patients were identified for the meta-analysis and 858 cases for the qualitative synthesis. Patients with concurrent SARS-CoV-2 infection and AP exhibited heightened odds of in-hospital mortality (OR: 3.15, CI: 2.08-4.76) and necrosis (OR: 1.83, CI: 1.13-2.97). Mild AP was less prevalent in the SARS-CoV-2 group (OR: 0.37, CI: 0.14-0.97) compared to moderately severe and severe AP together. In contrast, no evidence was found that concomitant HIV infection elevated in-hospital mortality (OR: 1.12, CI: 0.92-1.37) or sepsis occurrence (OR: 1.21, CI: 0.41-3.59). CONCLUSION Patients co-diagnosed with AP and SARS-CoV-2 infection require heightened attention due to an increased risk of mortality and complications. No evidence was found that HIV infection elevated the risk of a more severe outcome.
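As a reading aid for the odds ratios and confidence intervals quoted above, the following is a minimal sketch of how an odds ratio and a Wald-type 95% CI can be computed for a single 2x2 table. It is not the authors' pooling procedure (a meta-analysis combines study-level estimates, typically with a fixed- or random-effects model), and the function name `odds_ratio_ci` and the example counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for one 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    All four counts must be non-zero for this simple formula.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) (Woolf's method).
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only (not taken from the study).
or_, lo, hi = odds_ratio_ci(a=20, b=80, c=10, d=90)
print(f"OR = {or_:.2f}, 95% CI: {lo:.2f}-{hi:.2f}")
```

An interval that excludes 1 (as reported for in-hospital mortality with SARS-CoV-2) indicates an association at the chosen level, whereas an interval that spans 1 (as reported for HIV) does not.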
Affiliation(s)
- Gefu Cai
  - Centre for Translational Medicine, Semmelweis University, Budapest, Hungary
- Eszter Ágnes Szalai
  - Centre for Translational Medicine, Semmelweis University, Budapest, Hungary; Department of Restorative Dentistry and Endodontics, Semmelweis University, Budapest, Hungary
- Ximeng Li
  - Centre for Translational Medicine, Semmelweis University, Budapest, Hungary
- Xinyi Qian
  - Centre for Translational Medicine, Semmelweis University, Budapest, Hungary; Department of Prosthodontics, Semmelweis University, Budapest, Hungary
- Dániel Sándor Veres
  - Centre for Translational Medicine, Semmelweis University, Budapest, Hungary; Department of Biophysics and Radiation Biology, Semmelweis University, Budapest, Hungary
- Zoltán Péterfi
  - Department of Infectology, First Department of Medicine, Medical School, University of Pécs, Pécs, Hungary
- Rita Nagy
  - Centre for Translational Medicine, Semmelweis University, Budapest, Hungary; Institute for Translational Medicine, Medical School, University of Pécs, Pécs, Hungary; Heim Pál National Pediatric Institute, Budapest, Hungary
- Alexandra Mikó
  - Centre for Translational Medicine, Semmelweis University, Budapest, Hungary; Institute for Translational Medicine, Medical School, University of Pécs, Pécs, Hungary; Department of Medical Genetics, Medical School, University of Pécs, Pécs, Hungary
- Szabolcs Ábrahám
  - Centre for Translational Medicine, Semmelweis University, Budapest, Hungary; Department of Surgery, Albert Szent-Györgyi Medical School, University of Szeged, Szeged, Hungary
- Bálint Erőss
  - Centre for Translational Medicine, Semmelweis University, Budapest, Hungary; Institute for Translational Medicine, Medical School, University of Pécs, Pécs, Hungary; Institute of Pancreatic Diseases, Semmelweis University, Budapest, Hungary
- Péter Hegyi
  - Centre for Translational Medicine, Semmelweis University, Budapest, Hungary; Institute for Translational Medicine, Medical School, University of Pécs, Pécs, Hungary; Institute of Pancreatic Diseases, Semmelweis University, Budapest, Hungary; Translational Pancreatology Research Group, Interdisciplinary Centre of Excellence for Research Development and Innovation, University of Szeged, Szeged, Hungary
- Andrea Szentesi
  - Centre for Translational Medicine, Semmelweis University, Budapest, Hungary; Institute for Translational Medicine, Medical School, University of Pécs, Pécs, Hungary
4
Miao M, Ma Y, Tan J, Chen R, Men K. Enhanced predictability and interpretability of COVID-19 severity based on SARS-CoV-2 genomic diversity: a comprehensive study encompassing four years of data. Sci Rep 2024; 14:26992. [PMID: 39506014 PMCID: PMC11541897 DOI: 10.1038/s41598-024-78493-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2024] [Accepted: 10/31/2024] [Indexed: 11/08/2024]
Abstract
Despite the end of the global Coronavirus Disease 2019 (COVID-19) pandemic, the risk factors for COVID-19 severity continue to be a pivotal area of research. Specifically, studying the impact of the genomic diversity of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) on COVID-19 severity is crucial for predicting severe outcomes. Therefore, this study aimed to investigate the impact of the SARS-CoV-2 genome sequence, genotype, patient age, gender, and vaccination status on the severity of COVID-19, and to develop accurate and robust prediction models. The training set (n = 12,038), primary testing set (n = 4,006), and secondary testing set (n = 2,845) consist of SARS-CoV-2 genome sequences with patient information, which were obtained from the Global Initiative on Sharing All Influenza Data (GISAID) and span over four years. Four machine learning methods were employed to construct prediction models. By extracting SARS-CoV-2 genomic features, optimizing model parameters, and integrating models, this study improved the prediction accuracy. Furthermore, SHapley Additive exPlanations (SHAP) was applied to analyze the interpretability of the model and to identify risk factors, providing insights for the management of severe cases. The proposed ensemble model achieved an F-score of 88.842% and an Area Under the Curve (AUC) of 0.956 on the global testing dataset. In addition to factors such as patient age, gender, and vaccination status, over 40 amino acid site mutation characteristics were identified to have a significant impact on the severity of COVID-19. This work has the potential to facilitate the early identification of COVID-19 patients with high risks of severe illness, thus effectively reducing the rates of severe cases and mortality.
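To make the two metrics quoted in this abstract concrete, here is a minimal sketch of how an F-score (taken here to be F1, which is an assumption) and an AUC can be computed for a binary classifier. This is not the authors' code; the function names and the toy labels and scores are hypothetical, and the AUC uses the pairwise-ranking definition rather than any particular library routine.

```python
def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision/recall are zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    Requires at least one positive and one negative example.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, for illustration only (1 = severe, 0 = non-severe).
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]            # hard class predictions
scores = [0.9, 0.4, 0.5, 0.1]    # predicted probabilities of the severe class

print(f"F1  = {f1_score(y_true, y_pred):.3f}")
print(f"AUC = {roc_auc(y_true, scores):.3f}")
```

The F-score depends on a decision threshold applied to the model output, while the AUC summarizes ranking quality across all thresholds, which is why the two are usually reported together.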
Affiliation(s)
- Miao Miao
  - School of Public Health, Xi'an Medical University, Xi'an, 710021, Shaanxi, China
- Yonghong Ma
  - School of Public Health, Xi'an Medical University, Xi'an, 710021, Shaanxi, China
- Jiao Tan
  - School of Public Health, Xi'an Medical University, Xi'an, 710021, Shaanxi, China
- Renjuan Chen
  - School of Public Health, Xi'an Medical University, Xi'an, 710021, Shaanxi, China
- Ke Men
  - School of Public Health, Xi'an Medical University, Xi'an, 710021, Shaanxi, China
5
Er AG, Ding DY, Er B, Uzun M, Cakmak M, Sadee C, Durhan G, Ozmen MN, Tanriover MD, Topeli A, Aydin Son Y, Tibshirani R, Unal S, Gevaert O. Multimodal data fusion using sparse canonical correlation analysis and cooperative learning: a COVID-19 cohort study. NPJ Digit Med 2024; 7:117. [PMID: 38714751 PMCID: PMC11076490 DOI: 10.1038/s41746-024-01128-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Accepted: 04/25/2024] [Indexed: 05/10/2024]
Abstract
Through technological innovations, patient cohorts can be examined from multiple views with high-dimensional, multiscale biomedical data to classify clinical phenotypes and predict outcomes. Here, we aim to present our approach for analyzing multimodal data using unsupervised and supervised sparse linear methods in a COVID-19 patient cohort. This prospective cohort study of 149 adult patients was conducted in a tertiary care academic center. First, we used sparse canonical correlation analysis (CCA) to identify and quantify relationships across different data modalities, including viral genome sequencing, imaging, clinical data, and laboratory results. Then, we used cooperative learning to predict the clinical outcome of COVID-19 patients: Intensive care unit admission. We show that serum biomarkers representing severe disease and acute phase response correlate with original and wavelet radiomics features in the LLL frequency channel (cor(Xu1, Zv1) = 0.596, p value < 0.001). Among radiomics features, histogram-based first-order features reporting the skewness, kurtosis, and uniformity have the lowest negative, whereas entropy-related features have the highest positive coefficients. Moreover, unsupervised analysis of clinical data and laboratory results gives insights into distinct clinical phenotypes. Leveraging the availability of global viral genome databases, we demonstrate that the Word2Vec natural language processing model can be used for viral genome encoding. It not only separates major SARS-CoV-2 variants but also allows the preservation of phylogenetic relationships among them. Our quadruple model using Word2Vec encoding achieves better prediction results in the supervised task. The model yields area under the curve (AUC) and accuracy values of 0.87 and 0.77, respectively. 
Our study illustrates that sparse CCA analysis and cooperative learning are powerful techniques for handling high-dimensional, multimodal data to investigate multivariate associations in unsupervised and supervised tasks.
Affiliation(s)
- Ahmet Gorkem Er
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA, 94305, USA
  - Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, 06800, Ankara, Turkey
  - Department of Infectious Diseases and Clinical Microbiology, Hacettepe University Faculty of Medicine, 06230, Ankara, Turkey
- Daisy Yi Ding
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
- Berrin Er
  - Department of Internal Medicine, Division of Intensive Care Medicine, Hacettepe University Faculty of Medicine, 06230, Ankara, Turkey
- Mertcan Uzun
  - Department of Infectious Diseases and Clinical Microbiology, Hacettepe University Faculty of Medicine, 06230, Ankara, Turkey
- Mehmet Cakmak
  - Department of Internal Medicine, Hacettepe University Faculty of Medicine, 06230, Ankara, Turkey
- Christoph Sadee
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA, 94305, USA
- Gamze Durhan
  - Department of Radiology, Hacettepe University Faculty of Medicine, 06230, Ankara, Turkey
- Mustafa Nasuh Ozmen
  - Department of Radiology, Hacettepe University Faculty of Medicine, 06230, Ankara, Turkey
- Mine Durusu Tanriover
  - Department of Internal Medicine, Hacettepe University Faculty of Medicine, 06230, Ankara, Turkey
- Arzu Topeli
  - Department of Internal Medicine, Division of Intensive Care Medicine, Hacettepe University Faculty of Medicine, 06230, Ankara, Turkey
- Yesim Aydin Son
  - Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, 06800, Ankara, Turkey
- Robert Tibshirani
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
  - Department of Statistics, Stanford University, Stanford, CA, 94305, USA
- Serhat Unal
  - Department of Infectious Diseases and Clinical Microbiology, Hacettepe University Faculty of Medicine, 06230, Ankara, Turkey
- Olivier Gevaert
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA, 94305, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
6
Er AG, Ding DY, Er B, Uzun M, Cakmak M, Sadee C, Durhan G, Ozmen MN, Tanriover MD, Topeli A, Son YA, Tibshirani R, Unal S, Gevaert O. Multimodal Biomedical Data Fusion Using Sparse Canonical Correlation Analysis and Cooperative Learning: A Cohort Study on COVID-19. Research Square 2023:rs.3.rs-3569833. [PMID: 38045288 PMCID: PMC10690316 DOI: 10.21203/rs.3.rs-3569833/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
Through technological innovations, patient cohorts can be examined from multiple views with high-dimensional, multiscale biomedical data to classify clinical phenotypes and predict outcomes. Here, we aim to present our approach for analyzing multimodal data using unsupervised and supervised sparse linear methods in a COVID-19 patient cohort. This prospective cohort study of 149 adult patients was conducted in a tertiary care academic center. First, we used sparse canonical correlation analysis (CCA) to identify and quantify relationships across different data modalities, including viral genome sequencing, imaging, clinical data, and laboratory results. Then, we used cooperative learning to predict the clinical outcome of COVID-19 patients. We show that serum biomarkers representing severe disease and acute phase response correlate with original and wavelet radiomics features in the LLL frequency channel (corr(Xu1, Zv1) = 0.596, p-value < 0.001). Among radiomics features, histogram-based first-order features reporting the skewness, kurtosis, and uniformity have the lowest negative, whereas entropy-related features have the highest positive coefficients. Moreover, unsupervised analysis of clinical data and laboratory results gives insights into distinct clinical phenotypes. Leveraging the availability of global viral genome databases, we demonstrate that the Word2Vec natural language processing model can be used for viral genome encoding. It not only separates major SARS-CoV-2 variants but also allows the preservation of phylogenetic relationships among them. Our quadruple model using Word2Vec encoding achieves better prediction results in the supervised task. The model yields area under the curve (AUC) and accuracy values of 0.87 and 0.77, respectively. 
Our study illustrates that sparse CCA analysis and cooperative learning are powerful techniques for handling high-dimensional, multimodal data to investigate multivariate associations in unsupervised and supervised tasks.
Affiliation(s)
- Ahmet Gorkem Er
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA, 94305, USA
  - Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, Ankara, 06800, Türkiye
  - Department of Infectious Diseases and Clinical Microbiology, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
- Daisy Yi Ding
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
- Berrin Er
  - Department of Internal Medicine, Division of Intensive Care Medicine, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
- Mertcan Uzun
  - Department of Infectious Diseases and Clinical Microbiology, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
- Mehmet Cakmak
  - Department of Internal Medicine, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
- Christoph Sadee
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA, 94305, USA
- Gamze Durhan
  - Department of Radiology, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
- Mustafa Nasuh Ozmen
  - Department of Radiology, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
- Mine Durusu Tanriover
  - Department of Internal Medicine, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
- Arzu Topeli
  - Department of Internal Medicine, Division of Intensive Care Medicine, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
- Yesim Aydin Son
  - Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, Ankara, 06800, Türkiye
- Robert Tibshirani
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
  - Department of Statistics, Stanford University, Stanford, CA, 94305, USA
- Serhat Unal
  - Department of Infectious Diseases and Clinical Microbiology, Hacettepe University Faculty of Medicine, Ankara, 06230, Türkiye
- Olivier Gevaert
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA, 94305, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, 94305, USA
7
Kidambi Raju S, Ramaswamy S, Eid MM, Gopalan S, Karim FK, Marappan R, Khafaga DS. Evaluation of Mutual Information and Feature Selection for SARS-CoV-2 Respiratory Infection. Bioengineering (Basel) 2023; 10:880. [PMID: 37508907 PMCID: PMC10376564 DOI: 10.3390/bioengineering10070880] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Revised: 07/01/2023] [Accepted: 07/13/2023] [Indexed: 07/30/2023]
Abstract
This study aims to develop a predictive model for SARS-CoV-2 using machine-learning techniques and to explore various feature selection methods to enhance the accuracy of predictions. A precise forecast of the spread of SARS-CoV-2 respiratory infections can help with efficient planning and resource allocation. The proposed model utilizes stochastic regression to capture the virus transmission's stochastic nature, considering data uncertainties. Feature selection techniques are employed to identify the most relevant and informative features contributing to prediction accuracy. Furthermore, the study explores the use of neighbor embedding and Sammon mapping algorithms to visualize high-dimensional SARS-CoV-2 respiratory infection data in a lower-dimensional space, enabling better interpretation and understanding of the underlying patterns. The study also covers the application of machine-learning techniques for predicting SARS-CoV-2 respiratory infections, the use of statistical measures in healthcare, including confirmed cases, deaths, and recoveries, and an analysis of country-wise dynamics of the pandemic using machine-learning models. Our analysis involves the performance of various algorithms, including neural networks (NN), decision trees (DT), random forests (RF), the Adam optimizer (AD), hyperparameters (HP), stochastic regression (SR), neighbor embedding (NE), and Sammon mapping (SM). A pre-processed and feature-extracted SARS-CoV-2 respiratory infection dataset is combined with ADHPSRNESM to form a new orchestration in the proposed model, with the aim of improving prediction accuracy. The findings of this research can contribute to public health efforts by enabling policymakers and healthcare professionals to make informed decisions based on accurate predictions, ultimately aiding in managing and controlling the SARS-CoV-2 pandemic.
Affiliation(s)
- Marwa M Eid
  - Faculty of Artificial Intelligence, Delta University for Science and Technology, Mansoura 11152, Egypt
- Faten Khalid Karim
  - Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Raja Marappan
  - School of Computing, SASTRA Deemed University, Thanjavur 613401, India
- Doaa Sami Khafaga
  - Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
8
Gauthier S, Tran-Dinh A, Morilla I. Plasma proteome dynamics of COVID-19 severity learnt by a graph convolutional network of multi-scale topology. Life Sci Alliance 2023; 6:e202201624. [PMID: 36806094 PMCID: PMC9941303 DOI: 10.26508/lsa.202201624] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2022] [Revised: 02/06/2023] [Accepted: 02/06/2023] [Indexed: 02/22/2023]
Abstract
Efforts to understand the molecular mechanisms of COVID-19 have led to the identification of ACE2 as the main receptor for the SARS-CoV-2 spike protein on cell surfaces. However, there are still important questions about the role of other proteins in disease progression. To address these questions, we modelled the plasma proteome of 384 COVID-19 patients using protein level measurements taken at three different times and incorporating comprehensive clinical evaluation data collected 28 d after hospitalisation. Our analysis can accurately assess the severity of the illness using a metric based on WHO scores. By using topological vectorisation, we identified proteins that vary most in expression based on disease severity, and then utilised these findings to construct a graph convolutional network. This dynamic model allows us to learn the molecular interactions between these proteins, providing a tool to determine the severity of a COVID-19 infection at an early stage and identify potential pharmacological treatments by studying the dynamic interactions between the most relevant proteins.
Affiliation(s)
- Samy Gauthier
  - Université Sorbonne Paris Nord, LAGA, CNRS, UMR 7539, Laboratoire d'excellence Inflamex, Villetaneuse, France
- Alexy Tran-Dinh
  - Département d'anesthésie-Réanimation, INSERM, Université de Paris, AP-HP, Hôpital Bichat Claude Bernard, Paris, France
  - Université de Paris, LVTS, Inserm U1148, Paris, France
- Ian Morilla
  - Université Sorbonne Paris Nord, LAGA, CNRS, UMR 7539, Laboratoire d'excellence Inflamex, Villetaneuse, France
  - Department of Genetics, University of Malaga, MLiMO, Málaga, Spain
9
Alquraan L, Alzoubi KH, Rababa'h SY. Mutations of SARS-CoV-2 and their impact on disease diagnosis and severity. Informatics in Medicine Unlocked 2023; 39:101256. [PMID: 37131549 PMCID: PMC10127666 DOI: 10.1016/j.imu.2023.101256] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 02/24/2023] [Revised: 04/22/2023] [Accepted: 04/24/2023] [Indexed: 05/04/2023]
Abstract
Numerous variations of the Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2), including D614G, B.1.1.7 (United Kingdom), B.1.1.28 (Brazil P1, P2), CAL.20C (Southern California), B.1.351 (South Africa), B.1.617 (B.1.617.1 Kappa & Delta B.1.617.2) and B.1.1.529, have been reported worldwide. The receptor-binding domain (RBD) of the spike (S) protein is involved in virus-cell binding, where virus-neutralizing antibodies (NAbs) react. Novel variants in the S-protein could maximize viral affinity for the human angiotensin-converting enzyme 2 (ACE2) receptor and increase virus transmission. False-negative molecular detection results may be attributable to mutations in the part of the viral genome targeted for diagnosis. Furthermore, these changes in S-protein structure alter the neutralizing ability of NAbs, resulting in a reduction in vaccine efficiency. Further information is needed to evaluate how new mutations may affect vaccine efficacy.
Affiliation(s)
- Laiali Alquraan
  - Department of Biology, Faculty of Science, Yarmouk University, Irbid, Jordan
- Karem H Alzoubi
  - Department of Pharmacy Practice and Pharmacotherapeutics, College of Pharmacy, University of Sharjah, Sharjah, United Arab Emirates
  - Department of Clinical Pharmacy, Faculty of Pharmacy, Jordan University of Science and Technology, Irbid, Jordan
- Suzie Y Rababa'h
  - Department of Medical Science, Irbid Faculty, Al-Balqa Applied University (BAU), Irbid, Jordan
10
Sokhansanj BA, Zhao Z, Rosen GL. Interpretable and Predictive Deep Neural Network Modeling of the SARS-CoV-2 Spike Protein Sequence to Predict COVID-19 Disease Severity. Biology 2022; 11:1786. [PMID: 36552295 PMCID: PMC9774807 DOI: 10.3390/biology11121786] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2022] [Revised: 11/28/2022] [Accepted: 12/05/2022] [Indexed: 12/13/2022]
Abstract
Through the COVID-19 pandemic, SARS-CoV-2 has gained and lost multiple mutations in novel or unexpected combinations. Predicting how complex mutations affect COVID-19 disease severity is critical in planning public health responses as the virus continues to evolve. This paper presents a novel computational framework to complement conventional lineage classification and applies it to predict the severe disease potential of viral genetic variation. The transformer-based neural network model architecture has additional layers that provide sample embeddings and sequence-wide attention for interpretation and visualization. First, training a model to predict SARS-CoV-2 taxonomy validates the architecture's interpretability. Second, an interpretable predictive model of disease severity is trained on spike protein sequence and patient metadata from GISAID. Confounding effects of changing patient demographics, increasing vaccination rates, and improving treatment over time are addressed by including demographics and case date as independent input to the neural network model. The resulting model can be interpreted to identify potentially significant virus mutations and proves to be a robust predictive tool. Although trained on sequence data obtained entirely before the availability of empirical data for Omicron, the model can predict Omicron's reduced risk of severe disease, in accord with epidemiological and experimental data.
Affiliation(s)
- Bahrad A. Sokhansanj
  - Ecological and Evolutionary Signal-Processing and Informatics Laboratory, Department of Electrical & Computer Engineering, College of Engineering, Drexel University, Philadelphia, PA 19104, USA