1
Alfonsi T, Bernasconi A, Chiara M, Ceri S. Data-driven recombination detection in viral genomes. Nat Commun 2024; 15:3313. [PMID: 38632281 PMCID: PMC11024102 DOI: 10.1038/s41467-024-47464-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Accepted: 03/25/2024] [Indexed: 04/19/2024] Open
Abstract
Recombination is a key molecular mechanism for the evolution and adaptation of viruses. The first recombinant SARS-CoV-2 genomes were recognized in 2021; as of today, more than ninety SARS-CoV-2 lineages are designated as recombinant. In the wake of the COVID-19 pandemic, several methods for detecting recombination in SARS-CoV-2 have been proposed; however, none could faithfully confirm manual analyses by experts in the field. We hereby present RecombinHunt, an original data-driven method for the identification of recombinant genomes, capable of recognizing recombinant SARS-CoV-2 genomes (or lineages) with one or two breakpoints with high accuracy and within reduced turn-around times. RecombinHunt shows high specificity and sensitivity, compares favorably with other state-of-the-art methods, and faithfully confirms manual analyses by experts. RecombinHunt identifies recombinant viral genomes from the recent monkeypox epidemic in high concordance with manually curated analyses by experts, suggesting that our approach is robust and can be applied to any epidemic/pandemic virus.
Affiliation(s)
- Tommaso Alfonsi
- Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Via Ponzio 34/5, 20133, Milan, Italy
- Anna Bernasconi
- Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Via Ponzio 34/5, 20133, Milan, Italy
- Matteo Chiara
- Department of Biosciences, Università degli Studi di Milano, Via Celoria 26, 20133, Milan, Italy
- Stefano Ceri
- Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Via Ponzio 34/5, 20133, Milan, Italy
2
Drake KO, Boyd O, Franceschi VB, Colquhoun RM, Ellaby NAF, Volz EM. Phylogenomic early warning signals for SARS-CoV-2 epidemic waves. EBioMedicine 2024; 100:104939. [PMID: 38194742 PMCID: PMC10792554 DOI: 10.1016/j.ebiom.2023.104939] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 12/11/2023] [Accepted: 12/12/2023] [Indexed: 01/11/2024] Open
Abstract
BACKGROUND Epidemic waves of coronavirus disease 2019 (COVID-19) infections have often been associated with the emergence of novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants. Rapid detection of growing genomic variants can therefore serve as a predictor of future waves, enabling timely implementation of countermeasures such as non-pharmaceutical interventions (social distancing), additional vaccination (booster campaigns), or healthcare capacity adjustments. The large amount of SARS-CoV-2 genomic sequence data produced during the pandemic has provided a unique opportunity to explore the utility of these data for generating early warning signals (EWS). METHODS We developed an analytical pipeline (Transmission Fitness Polymorphism Scanner - designated in an R package mrc-ide/tfpscanner) for systematically exploring all clades within a SARS-CoV-2 virus phylogeny to detect variants showing unusually high growth rates. We investigated the use of these cluster growth rates as the basis for a variety of statistical time series to use as leading indicators for the epidemic waves in the UK during the pandemic between August 2020 and March 2022. We also compared the performance of these phylogeny-derived leading indicators with a range of non-phylogeny-derived leading indicators. Our experiments simulated data generation and real-time analysis. FINDINGS Using phylogenomic analysis, we identified leading indicators that would have generated EWS ahead of significant increases in COVID-19 hospitalisations in the UK between August 2020 and March 2022. Our results also show that EWS lead time is sensitive to the threshold set for the number of false positive (FP) EWS. It is often possible to generate longer EWS lead times if more FP EWS are tolerated. 
On the basis of maximising lead time and minimising the number of FP EWS, the best performing leading indicators that we identified, amongst a set of 1.4 million, were the maximum logistic growth rate (LGR) amongst clusters of the dominant Pango lineage and the mean simple LGR across a broader set of clusters. In the case of the former, the time between the EWS and wave inflection points (a conservative measure of wave start dates) for the seven waves ranged between a 20-day lead time and a 7-day lag, with a mean lead time of 5.4 days. The maximum number of FP EWS generated prior to a true positive (TP) EWS was two and this only occurred for two of the seven waves in the period. The mean simple LGR amongst a broader set of clusters also performed well in terms of lead time but with slightly more FP EWS. INTERPRETATION As a result of the significant surveillance effort during the pandemic, early detection of SARS-CoV-2 variants of concern Alpha, Delta, and Omicron provided some of the first examples where timely detection and characterisation of pathogen variants has been used to tailor public health response. The success of our method in generating early warning signals based on phylogenomic analysis for SARS-CoV-2 in the UK may make it a worthwhile addition to existing surveillance strategies. In addition, the method may be translatable to other countries and/or regions, and to other pathogens with large-scale and rapid genomic surveillance. FUNDING This research was funded in whole, or in part, by the Wellcome Trust (220885_Z_20_Z). For the purpose of open access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission. 
KOD, OB, VBF and EMV acknowledge funding from the MRC Centre for Global Infectious Disease Analysis (reference MR/X020258/1), jointly funded by the UK Medical Research Council (MRC) and the UK Foreign, Commonwealth & Development Office (FCDO), under the MRC/FCDO Concordat agreement and is also part of the EDCTP2 programme supported by the European Union. RMC acknowledges funding from the Wellcome Trust Collaborators Award (206298/Z/17/Z).
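As an illustration of the logistic growth rate (LGR) indicator described in this abstract, the following is a minimal sketch and not the tfpscanner implementation. It assumes that a cluster's LGR can be approximated by the least-squares slope of the log-odds of the cluster's sample frequency over time, and that an early warning signal is raised the first time an indicator exceeds a fixed threshold. The function names, the add-half correction, and the threshold value are hypothetical choices made for this example.

```python
import math


def logistic_growth_rate(days, cluster_counts, total_counts):
    """Approximate a logistic growth rate (per day) for one cluster.

    Assumption for this sketch: the rate is the ordinary least-squares
    slope of logit(frequency) against sampling day. At least two distinct
    sampling days are required.
    """
    xs, ys = [], []
    for day, k, n in zip(days, cluster_counts, total_counts):
        # Add-half correction keeps the logit finite when k == 0 or k == n.
        p = (k + 0.5) / (n + 1.0)
        xs.append(day)
        ys.append(math.log(p / (1.0 - p)))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def early_warning(indicator_values, threshold):
    """Return the index of the first value above threshold, or None."""
    for i, value in enumerate(indicator_values):
        if value > threshold:
            return i
    return None


# Hypothetical weekly counts for one cluster out of 100 sampled genomes.
rate = logistic_growth_rate([0, 7, 14, 21], [5, 10, 20, 40], [100] * 4)
print(f"estimated growth rate per day: {rate:.3f}")
```

Raising the threshold in `early_warning` trades lead time for fewer false positives, which mirrors the trade-off reported in the abstract.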
Affiliation(s)
- Kieran O Drake
- MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, United Kingdom.
- Olivia Boyd
- MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, United Kingdom
- Vinicius B Franceschi
- MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, United Kingdom
- Rachel M Colquhoun
- Institute of Evolutionary Biology, Ashworth Laboratories, University of Edinburgh, Edinburgh, United Kingdom
- Erik M Volz
- MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, United Kingdom
3
Heron EA, Valle G, Bernasconi A. Editorial: Identification of phenotypically important genomic variants. Frontiers in Bioinformatics 2023; 3:1328945. [PMID: 38025396 PMCID: PMC10668015 DOI: 10.3389/fbinf.2023.1328945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Accepted: 11/03/2023] [Indexed: 12/01/2023] Open
Affiliation(s)
- Giorgio Valle
- Department of Biology, Università di Padova, Padova, Italy
- Anna Bernasconi
- Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Milan, Italy
4
Pinoli P, Canakoglu A, Ceri S, Chiara M, Ferrandi E, Minotti L, Bernasconi A. VariantHunter: a method and tool for fast detection of emerging SARS-CoV-2 variants. Database (Oxford) 2023; 2023:baad044. [PMID: 37410916 PMCID: PMC10325486 DOI: 10.1093/database/baad044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2023] [Revised: 05/31/2023] [Accepted: 06/16/2023] [Indexed: 07/08/2023]
Abstract
With the progression of the COVID-19 pandemic, large datasets of SARS-CoV-2 genome sequences were collected to closely monitor the evolution of the virus and identify the novel variants/strains. By analyzing genome sequencing data, health authorities can 'hunt' novel emerging variants of SARS-CoV-2 as early as possible, and then monitor their evolution and spread. We designed VariantHunter, a highly flexible and user-friendly tool for systematically monitoring the evolution of SARS-CoV-2 at global and regional levels. In VariantHunter, amino acid changes are analyzed over an interval of 4 weeks in an arbitrary geographical area (continent, country, or region); for every week in the interval, the prevalence is computed and changes are ranked based on their increase or decrease in prevalence. VariantHunter supports two main types of analysis: lineage-independent and lineage-specific. The former considers all the available data and aims to discover new viral variants. The latter evaluates specific lineages/viral variants to identify novel candidate designations (sub-lineages and sub-variants). Both analyses use simple statistics and visual representations (diffusion charts and heatmaps) to track viral evolution. A dataset explorer allows users to visualize available data and refine their selection. VariantHunter is a web application free to all users. The two types of supported analysis (lineage-independent and lineage-specific) allow user-friendly monitoring of the viral evolution, empowering genomic surveillance without requiring any computational background. Database URL http://gmql.eu/variant_hunter/.
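The weekly prevalence ranking described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not VariantHunter's code: the input format (one `(week_index, set_of_changes)` pair per sequence), the function name `rank_changes`, and the choice to rank by the difference between last-week and first-week prevalence are all hypothetical.

```python
from collections import defaultdict


def rank_changes(records, weeks=4):
    """Rank amino acid changes by their prevalence shift over an interval.

    records: iterable of (week_index, set_of_amino_acid_changes), one item
    per sequence collected in the chosen geographical area.
    Returns (change, weekly_prevalence, last_minus_first) tuples, with the
    largest increase first.
    """
    totals = [0] * weeks
    counts = defaultdict(lambda: [0] * weeks)
    for week, changes in records:
        totals[week] += 1
        for change in changes:
            counts[change][week] += 1

    ranking = []
    for change, per_week in counts.items():
        # Prevalence = share of that week's sequences carrying the change.
        prevalence = [c / t if t else 0.0 for c, t in zip(per_week, totals)]
        ranking.append((change, prevalence, prevalence[-1] - prevalence[0]))
    ranking.sort(key=lambda row: row[2], reverse=True)
    return ranking


# Hypothetical toy data: two sequences per week over four weeks.
toy = [
    (0, {"S:D614G"}), (0, {"S:D614G"}),
    (1, {"S:D614G", "S:N501Y"}), (1, {"S:D614G"}),
    (2, {"S:D614G", "S:N501Y"}), (2, {"S:N501Y"}),
    (3, {"S:N501Y"}), (3, {"S:N501Y"}),
]
for change, prevalence, shift in rank_changes(toy):
    print(change, prevalence, shift)
```

A lineage-specific analysis in the sense of the abstract would apply the same ranking after filtering `records` to sequences of one lineage.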
Affiliation(s)
- Pietro Pinoli
- Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Via Ponzio 34/5, Milan 20133, Italy
- Arif Canakoglu
- Department of Anesthesia, Critical Care and Emergency, Fondazione IRCCS Ca’ Granda Ospedale Maggiore Policlinico, Via Francesco Sforza 28, Milan 20122, Italy
- Stefano Ceri
- Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Via Ponzio 34/5, Milan 20133, Italy
- Matteo Chiara
- Department of Biosciences, University of Milan, Via Celoria 26, Milan 20133, Italy
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnology, Consiglio Nazionale delle Ricerche, via Amendola 122/O, Bari 70126, Italy
- Erika Ferrandi
- Department of Biosciences, University of Milan, Via Celoria 26, Milan 20133, Italy
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnology, Consiglio Nazionale delle Ricerche, via Amendola 122/O, Bari 70126, Italy
- Luca Minotti
- Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Via Ponzio 34/5, Milan 20133, Italy
- Anna Bernasconi
- Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Via Ponzio 34/5, Milan 20133, Italy
5
Ren H, Ling Y, Cao R, Wang Z, Li Y, Huang T. Early warning of emerging infectious diseases based on multimodal data. Biosafety and Health 2023; 5:S2590-0536(23)00074-5. [PMID: 37362865 PMCID: PMC10245235 DOI: 10.1016/j.bsheal.2023.05.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Revised: 05/18/2023] [Accepted: 05/31/2023] [Indexed: 06/28/2023] Open
Abstract
The coronavirus disease 2019 (COVID-19) pandemic has dramatically increased the awareness of emerging infectious diseases. The advancement of multiomics analysis technology has resulted in the development of several databases containing virus information. Several scientists have integrated existing data on viruses to construct phylogenetic trees and predict virus mutation and transmission in different ways, providing prospective technical support for epidemic prevention and control. This review summarized the databases of known emerging infectious viruses and techniques focusing on virus variant forecasting and early warning. It focuses on the multi-dimensional information integration and database construction of emerging infectious viruses, virus mutation spectrum construction and variant forecast model, analysis of the affinity between mutation antigen and the receptor, propagation model of virus dynamic evolution, and monitoring and early warning for variants. As people have suffered from COVID-19 and repeated flu outbreaks, we focused on the research results of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and influenza viruses. This review comprehensively viewed the latest virus research and provided a reference for future virus prevention and control research.
Affiliation(s)
- Haotian Ren
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Yunchao Ling
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Ruifang Cao
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Zhen Wang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Yixue Li
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- School of Life Science, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China
- Guangzhou Laboratory, Guangzhou 510005, China
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai 200433, China
- Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai 200031, China
6
Huang Q, Qiu H, Bible PW, Huang Y, Zheng F, Gu J, Sun J, Hao Y, Liu Y. Early detection of SARS-CoV-2 variants through dynamic co-mutation network surveillance. Front Public Health 2023; 11:1015969. [PMID: 36755900 PMCID: PMC9901361 DOI: 10.3389/fpubh.2023.1015969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Accepted: 01/02/2023] [Indexed: 01/25/2023] Open
Abstract
Background Precise public health and clinical interventions for the COVID-19 pandemic have spurred a global rush on SARS-CoV-2 variant tracking, but current approaches to variant tracking are challenged by the flood of viral genome sequences leading to a loss of timeliness, accuracy, and reliability. Here, we devised a new co-mutation network framework, aiming to tackle these difficulties in variant surveillance. Methods To avoid simultaneous input and modeling of the whole large-scale data, we dynamically investigate the nucleotide covarying pattern of weekly sequences. The community detection algorithm is applied to a co-occurring genomic alteration network constructed from mutation corpora of weekly collected data. Co-mutation communities are identified, extracted, and characterized as variant markers. They contribute to the creation and weekly updates of a community-based variant dictionary tree representing SARS-CoV-2 evolution, where highly similar ones between weeks have been merged to represent the same variants. Emerging communities imply the presence of novel viral variants or new branches of existing variants. This process was benchmarked with worldwide GISAID data and validated using national level data from six COVID-19 hotspot countries. Results A total of 235 co-mutation communities were identified after a 120-week investigation of worldwide sequence data, from March 2020 to mid-June 2022. The dictionary tree progressively developed from these communities perfectly recorded the time course of SARS-CoV-2 branching, coinciding with GISAID clades. The time-varying prevalence of these communities in the viral population showed a good match with the emergence and circulation of the variants they represented. All these benchmark results not only exhibited the methodology features but also demonstrated high efficiency in detection of the pandemic variants. 
When it was applied to regional variant surveillance, our method displayed significantly earlier identification of feature communities of major WHO-named SARS-CoV-2 variants in contrast with Pangolin's monitoring. Conclusion An efficient genomic surveillance framework built from weekly co-mutation networks and a dynamic community-based variant dictionary tree enables early detection and continuous investigation of SARS-CoV-2 variants overcoming genomic data flood, aiding in the response to the COVID-19 pandemic.
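The idea of grouping co-occurring mutations from one week of sequences into communities can be illustrated with a small sketch. It is not the authors' pipeline. As a simplifying assumption, edges are kept when the Jaccard similarity of two mutations' carrier sets passes a threshold, and connected components (computed with union-find) stand in for a real community detection algorithm. The function names, the threshold, and the mutation labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_mutation_edges(sequences, min_jaccard=0.8):
    """Return mutation pairs that co-occur in most sequences carrying either."""
    single = Counter()
    pair = Counter()
    for mutations in sequences:
        unique = sorted(set(mutations))
        single.update(unique)
        pair.update(combinations(unique, 2))
    edges = []
    for (a, b), both in pair.items():
        # Jaccard similarity of the sets of sequences carrying a and b.
        jaccard = both / (single[a] + single[b] - both)
        if jaccard >= min_jaccard:
            edges.append((a, b))
    return edges


def communities(edges):
    """Connected components via union-find (stand-in for community detection)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical weekly batch: each inner list is one sequence's mutations.
week = [
    ["C241T", "A23403G"],
    ["C241T", "A23403G"],
    ["G28881A", "G28882A"],
    ["G28881A", "G28882A"],
    ["C241T", "A23403G", "G28881A"],
]
print(communities(co_mutation_edges(week, min_jaccard=0.6)))
```

Running this per week and matching similar communities across weeks would correspond loosely to the weekly dictionary-tree updates the abstract describes; that matching step is not shown here.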
Affiliation(s)
- Qiang Huang
- Department of Medical Statistics, School of Public Health, Sun Yat-sen University, Guangzhou, China
- Huining Qiu
- Guangdong Artificial Intelligence Machine Vision Engineering Technology Research Center, Guangzhou, China
- Paul W. Bible
- College of Arts and Sciences, Marian University, Indianapolis, IN, United States
- Yong Huang
- Institute of Public Health, Guangzhou Medical University & Guangzhou Center for Disease Control and Prevention, Guangzhou, China
- Fangfang Zheng
- School of Traditional Chinese Medicine Healthcare, Guangdong Food and Drug Vocational College, Guangzhou, China
- Jing Gu
- Department of Medical Statistics, School of Public Health, Sun Yat-sen University, Guangzhou, China
- Jian Sun
- Department of Clinical Research, The Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Corresponding author.
- Yuantao Hao
- Peking University Center for Public Health and Epidemic Preparedness & Response, Beijing, China
- Corresponding author.
- Yu Liu
- Department of Medical Statistics, School of Public Health, Sun Yat-sen University, Guangzhou, China
- Corresponding author.
7
Serna García G, Al Khalaf R, Invernici F, Ceri S, Bernasconi A. CoVEffect: interactive system for mining the effects of SARS-CoV-2 mutations and variants based on deep learning. Gigascience 2022; 12:giad036. [PMID: 37222749 PMCID: PMC10205000 DOI: 10.1093/gigascience/giad036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Revised: 04/11/2023] [Accepted: 04/27/2023] [Indexed: 05/25/2023] Open
Abstract
BACKGROUND Literature about SARS-CoV-2 widely discusses the effects of variations that have spread in the past 3 years. Such information is dispersed in the texts of several research articles, hindering the possibility of practically integrating it with related datasets (e.g., millions of SARS-CoV-2 sequences available to the community). We aim to fill this gap by mining literature abstracts to extract, for each variant/mutation, its related effects (in epidemiological, immunological, clinical, or viral kinetics terms) with labeled higher/lower levels in relation to the nonmutated virus. RESULTS The proposed framework comprises (i) the provisioning of abstracts from a COVID-19-related big data corpus (CORD-19) and (ii) the identification of mutation/variant effects in abstracts using a GPT2-based prediction model. The above techniques enable the prediction of mutations/variants with their effects and levels in 2 distinct scenarios: (i) the batch annotation of the most relevant CORD-19 abstracts and (ii) the on-demand annotation of any user-selected CORD-19 abstract through the CoVEffect web application (http://gmql.eu/coveffect), which assists expert users with semiautomated data labeling. On the interface, users can inspect the predictions and correct them; user inputs can then extend the training dataset used by the prediction model. Our prototype model was trained through a carefully designed process, using a minimal and highly diversified pool of samples. CONCLUSIONS The CoVEffect interface serves for the assisted annotation of abstracts, allowing the download of curated datasets for further use in data integration or analysis pipelines. The overall framework can be adapted to resolve similar unstructured-to-structured text translation tasks, which are typical of biomedical domains.
Affiliation(s)
- Giuseppe Serna García
- Dipartimento di Informazione, Elettronica e Bioingegneria, 20133 Milano, Italy
- Ruba Al Khalaf
- Dipartimento di Informazione, Elettronica e Bioingegneria, 20133 Milano, Italy
- Francesco Invernici
- Dipartimento di Informazione, Elettronica e Bioingegneria, 20133 Milano, Italy
- Stefano Ceri
- Dipartimento di Informazione, Elettronica e Bioingegneria, 20133 Milano, Italy
- Anna Bernasconi
- Dipartimento di Informazione, Elettronica e Bioingegneria, 20133 Milano, Italy
8
Bernasconi A, Guizzardi G, Pastor O, Storey VC. Semantic interoperability: ontological unpacking of a viral conceptual model. BMC Bioinformatics 2022; 23:491. [PMID: 36396980 PMCID: PMC9672571 DOI: 10.1186/s12859-022-05022-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/17/2022] [Accepted: 10/29/2022] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Genomics and virology are unquestionably important, but complex, domains being investigated by a large number of scientists. The need to facilitate and support work within these domains requires sharing of databases, although it is often difficult to do so because of the different ways in which data is represented across the databases. To foster semantic interoperability, models are needed that provide a deep understanding and interpretation of the concepts in a domain, so that the data can be consistently interpreted among researchers. RESULTS In this research, we propose the use of conceptual models to support semantic interoperability among databases and assess their ontological clarity to support their effective use. This modeling effort is illustrated by its application to the Viral Conceptual Model (VCM) that captures and represents the sequencing of viruses, inspired by the need to understand the genomic aspects of the virus responsible for COVID-19. For achieving semantic clarity on the VCM, we leverage the "ontological unpacking" method, a process of ontological analysis that reveals the ontological foundation of the information that is represented in a conceptual model. This is accomplished by applying the stereotypes of the OntoUML ontology-driven conceptual modeling language. As a result, we propose a new OntoVCM, an ontologically grounded model, based on the initial VCM, but with guaranteed interoperability among the data sources that employ it. CONCLUSIONS We propose and illustrate how the unpacking of the Viral Conceptual Model resolves several issues related to semantic interoperability, the importance of which is recognized by the "I" in FAIR principles. The research addresses conceptual uncertainty within the domain of SARS-CoV-2 data and knowledge. The method employed provides the basis for further analyses of complex models currently used in life science applications, but lacking ontological grounding, subsequently hindering the interoperability needed for scientists to progress their research.
Affiliation(s)
- Anna Bernasconi
- Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milan, Italy.
- PROS Research Center, VRAIN Research Institute, Universitat Politècnica de València, Valencia, Spain.
- Giancarlo Guizzardi
- Conceptual and Cognitive Modeling Research Group, Free University of Bozen-Bolzano, Bolzano, Italy
- Services and Cybersecurity Group, University of Twente, Enschede, The Netherlands
- Oscar Pastor
- PROS Research Center, VRAIN Research Institute, Universitat Politècnica de València, Valencia, Spain
- Veda C Storey
- J. Mack Robinson College of Business, Georgia State University, Atlanta, Georgia, USA
9
Al Khalaf R, Bernasconi A, Pinoli P, Ceri S. Analysis of co-occurring and mutually exclusive amino acid changes and detection of convergent and divergent evolution events in SARS-CoV-2. Comput Struct Biotechnol J 2022; 20:4238-4250. [PMID: 35945925 PMCID: PMC9352683 DOI: 10.1016/j.csbj.2022.07.051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2022] [Revised: 07/29/2022] [Accepted: 07/29/2022] [Indexed: 11/28/2022] Open
Affiliation(s)
- Ruba Al Khalaf
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
- Anna Bernasconi
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
- Corresponding author.
- Pietro Pinoli
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
- Stefano Ceri
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
10
Sokhansanj BA, Rosen GL. Mapping Data to Deep Understanding: Making the Most of the Deluge of SARS-CoV-2 Genome Sequences. mSystems 2022; 7:e0003522. [PMID: 35311562 PMCID: PMC9040592 DOI: 10.1128/msystems.00035-22] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Accepted: 02/27/2022] [Indexed: 12/22/2022] Open
Abstract
Next-generation sequencing has been essential to the global response to the COVID-19 pandemic. As of January 2022, nearly 7 million severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) sequences are available to researchers in public databases. Sequence databases are an abundant resource from which to extract biologically relevant and clinically actionable information. As the pandemic has gone on, SARS-CoV-2 has rapidly evolved, involving complex genomic changes that challenge current approaches to classifying SARS-CoV-2 variants. Deep sequence learning could be a potentially powerful way to build complex sequence-to-phenotype models. Unfortunately, while it can be predictive, deep learning typically produces "black box" models that cannot directly provide biological and clinical insight. Researchers should therefore consider implementing emerging methods for visualizing and interpreting deep sequence models. Finally, researchers should address important data limitations, including (i) global sequencing disparities, (ii) insufficient sequence metadata, and (iii) screening artifacts due to poor sequence quality control.
Affiliation(s)
- Bahrad A. Sokhansanj
- Drexel University, Ecological and Evolutionary Signal-Processing and Informatics Laboratory, Department of Electrical & Computer Engineering, College of Engineering, Philadelphia, Pennsylvania, USA
- Gail L. Rosen
- Drexel University, Ecological and Evolutionary Signal-Processing and Informatics Laboratory, Department of Electrical & Computer Engineering, College of Engineering, Philadelphia, Pennsylvania, USA
11
Huang Q, Zhang Q, Bible PW, Liang Q, Zheng F, Wang Y, Hao Y, Liu Y. A New Way to Trace SARS-CoV-2 Variants Through Weighted Network Analysis of Frequency Trajectories of Mutations. Front Microbiol 2022; 13:859241. [PMID: 35369526 PMCID: PMC8966897 DOI: 10.3389/fmicb.2022.859241] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/21/2022] [Accepted: 02/18/2022] [Indexed: 11/13/2022] Open
Abstract
Early detection of SARS-CoV-2 variants enables timely tracking of clinically important strains in order to inform the public health response. Current subtype-based variant surveillance depending on prior subtype assignment according to lag features and their continuous risk assessment may delay this process. We proposed a weighted network framework to model the frequency trajectories of mutations (FTMs) for SARS-CoV-2 variant tracing, without requiring prior subtype assignment. This framework modularizes the FTMs and conglomerates synchronous FTMs together to represent the variants. It also generates module clusters to unveil the epidemic stages and their contemporaneous variants. Eventually, the module-based variants are assessed by phylogenetic tree through sub-sampling to facilitate communication and control of the epidemic. This process was benchmarked using worldwide GISAID data, which not only demonstrated all the methodology features but also showed the module-based variant identification had highly specific and sensitive mapping with the global phylogenetic tree. When applying this process to regional data like India and South Africa for SARS-CoV-2 variant surveillance, the approach clearly elucidated the national dispersal history of the viral variants and their co-circulation pattern, and provided much earlier warning of Beta (B.1.351), Delta (B.1.617.2), and Omicron (B.1.1.529). In summary, our work showed that the weighted network modeling of FTMs enables us to rapidly and easily track down SARS-CoV-2 variants overcoming prior viral subtyping with lag features, accelerating the understanding and surveillance of COVID-19.
Affiliation(s)
- Qiang Huang
- Department of Medical Statistics and Epidemiology, School of Public Health, Sun Yat-sen University, Guangzhou, China
| | - Qiang Zhang
- College of Computer, Chengdu University, Chengdu, China
| | - Paul W Bible
- College of Arts and Sciences, Marian University, Indianapolis, IN, United States
| | - Qiaoxing Liang
- State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangzhou, China
| | - Fangfang Zheng
- School of Traditional Chinese Medicine Healthcare, Guangdong Food and Drug Vocational College, Guangzhou, China
| | - Ying Wang
- Department of Medical Statistics and Epidemiology, School of Public Health, Sun Yat-sen University, Guangzhou, China
| | - Yuantao Hao
- Department of Medical Statistics and Epidemiology, School of Public Health, Sun Yat-sen University, Guangzhou, China
| | - Yu Liu
- Department of Medical Statistics and Epidemiology, School of Public Health, Sun Yat-sen University, Guangzhou, China
|
12
|
Shi Q, Herbert C, Ward DV, Simin K, McCormick BA, Ellison III RT, Zai AH. COVID-19 Variant Surveillance and Social Determinants in Central Massachusetts: Development Study (Preprint). JMIR Form Res 2022; 6:e37858. [PMID: 35658093 PMCID: PMC9196873 DOI: 10.2196/37858] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/09/2022] [Revised: 05/08/2022] [Accepted: 05/25/2022] [Indexed: 11/25/2022] Open
Abstract
Background: Public health scientists have used spatial tools such as web-based Geographical Information System (GIS) applications to monitor and forecast the progression of the COVID-19 pandemic and track the impact of their interventions. The ability to track SARS-CoV-2 variants and incorporate the social determinants of health with street-level granularity can facilitate the identification of local outbreaks, highlight variant-specific geospatial epidemiology, and inform effective interventions. We developed a novel dashboard, the University of Massachusetts’ Graphical user interface for Geographic Information (MAGGI) variant tracking system, which combines GIS, health-associated sociodemographic data, and viral genomic data to visualize the spatiotemporal incidence of SARS-CoV-2 variants with street-level resolution while safeguarding protected health information. The specificity and richness of the dashboard enhance the local understanding of variant introductions and transmissions so that appropriate public health strategies can be devised and evaluated.
Objective: We developed a web-based dashboard that simultaneously visualizes the geographic distribution of SARS-CoV-2 variants in Central Massachusetts, the social determinants of health, and vaccination data to support public health efforts to locally mitigate the impact of the COVID-19 pandemic.
Methods: MAGGI uses a server-client model–based system, enabling users to access data and visualizations via an encrypted web browser, thus securing patient health information. We integrated data from electronic medical records, SARS-CoV-2 genomic analysis, and public health resources. We built the following functionalities into MAGGI: spatial and temporal selection capability by zip codes of interest, the detection of variant clusters, and a tool to display variant distribution by the social determinants of health. MAGGI was built on the Environmental Systems Research Institute ecosystem and is readily adaptable to monitor other infectious diseases and their variants in real time.
Results: We created a geo-referenced database and added sociodemographic and viral genomic data to the ArcGIS dashboard, which interactively displays the spatiotemporal variant distribution in Central Massachusetts. Genomic epidemiologists and public health officials use MAGGI to show the occurrence of SARS-CoV-2 genomic variants at high geographic resolution and refine the display by selecting a combination of data features such as variant subtype, subject zip codes, or date of COVID-19–positive sample collection. Furthermore, they use it to scale time and space to visualize association patterns between socioeconomics, social vulnerability based on the Centers for Disease Control and Prevention’s social vulnerability index, and vaccination rates. We launched the system at the University of Massachusetts Chan Medical School to support internal research projects starting in March 2021.
Conclusions: We developed a COVID-19 variant surveillance dashboard to advance our geospatial technologies to study SARS-CoV-2 variant transmission dynamics. This real-time, GIS-based tool exemplifies how spatial informatics can support public health officials, genomic epidemiologists, infectious disease specialists, and other researchers to track and study the spread patterns of SARS-CoV-2 variants in our communities.
Affiliation(s)
- Qiming Shi
- Center for Clinical and Translational Science, UMass Chan Medical School, Worcester, MA, United States
| | - Carly Herbert
- Department of Population and Quantitative Health Sciences, UMass Chan Medical School, Worcester, MA, United States
- Department of Medicine, UMass Chan Medical School, Worcester, MA, United States
| | - Doyle V Ward
- Department of Microbiology and Physiological Systems, UMass Chan Medical School, Worcester, MA, United States
- Center for Microbiome Research, UMass Chan Medical School, Worcester, MA, United States
| | - Karl Simin
- Molecular, Cell, and Cancer Biology, UMass Chan Medical School, Worcester, MA, United States
| | - Beth A McCormick
- Department of Microbiology and Physiological Systems, UMass Chan Medical School, Worcester, MA, United States
- Center for Microbiome Research, UMass Chan Medical School, Worcester, MA, United States
| | - Richard T Ellison III
- Department of Medicine, UMass Chan Medical School, Worcester, MA, United States
- Department of Microbiology and Physiological Systems, UMass Chan Medical School, Worcester, MA, United States
| | - Adrian H Zai
- Center for Clinical and Translational Science, UMass Chan Medical School, Worcester, MA, United States
- Department of Population and Quantitative Health Sciences, UMass Chan Medical School, Worcester, MA, United States
|
13
|
Caetano-Anollés K, Hernandez N, Mughal F, Tomaszewski T, Caetano-Anollés G. The seasonal behaviour of COVID-19 and its galectin-like culprit of the viral spike. Methods in Microbiology 2021; 50:27-81. [PMID: 38620818 PMCID: PMC8590929 DOI: 10.1016/bs.mim.2021.10.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/28/2022]
Abstract
Seasonal behaviour is an attribute of many viral diseases. Like other 'winter' RNA viruses, infections caused by the causative agent of COVID-19, SARS-CoV-2, appear to exhibit significant seasonal changes. Here we discuss the seasonal behaviour of COVID-19, emerging viral phenotypes, viral evolution, and how the mutational landscape of the virus affects the seasonal attributes of the disease. We propose that the multiple seasonal drivers behind infectious disease spread (and the spread of COVID-19 specifically) are in 'trade-off' relationships and can be better described within a framework of a 'triangle of viral persistence' modulated by the environment, physiology, and behaviour. This 'trade-off' exists as one trait cannot increase without a decrease in another. We also propose that molecular components of the virus can act as sensors of environment and physiology, and could represent molecular culprits of seasonality. We searched for flexible protein structures capable of being modulated by the environment and identified a galectin-like fold within the N-terminal domain of the spike protein of SARS-CoV-2 as a potential candidate. Tracking the prevalence of mutations in this structure resulted in the identification of a hemisphere-dependent seasonal pattern driven by mutational bursts. We propose that the galectin-like structure is a frequent target of mutations because it helps the virus evade or modulate the physiological responses of the host to further its spread and survival. The flexible regions of the N-terminal domain should now become a focus for mitigation through vaccines and therapeutics and for prediction and informed public health decision making.
Affiliation(s)
| | - Nicolas Hernandez
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL, United States
| | - Fizza Mughal
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL, United States
| | - Tre Tomaszewski
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL, United States
| | - Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, IL, United States
|