1
Rudar J, Kruczkiewicz P, Vernygora O, Golding GB, Hajibabaei M, Lung O. Sequence signatures within the genome of SARS-CoV-2 can be used to predict host source. Microbiol Spectr 2024; 12:e0358423. [PMID: 38436242 PMCID: PMC10986507 DOI: 10.1128/spectrum.03584-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2023] [Accepted: 02/11/2024] [Indexed: 03/05/2024] Open
Abstract
We conducted an in silico analysis to better understand the potential factors impacting host adaptation of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in white-tailed deer, humans, and mink due to the strong evidence of sustained transmission within these hosts. Classification models trained on single nucleotide and amino acid differences between samples effectively identified white-tailed deer-, human-, and mink-derived SARS-CoV-2. For example, the balanced accuracy score of Extremely Randomized Trees classifiers was 0.984 ± 0.006. Eighty-eight commonly identified predictive mutations are found at sites under strong positive and negative selective pressure. A large fraction of sites under selection (86.9%) or identified by machine learning (87.1%) are found in genes other than the spike. Some locations encoded by these gene regions are predicted to be B- and T-cell epitopes or are implicated in modulating the immune response suggesting that host adaptation may involve the evasion of the host immune system, modulation of the class-I major-histocompatibility complex, and the diminished recognition of immune epitopes by CD8+ T cells. Our selection and machine learning analysis also identified that silent mutations, such as C7303T and C9430T, play an important role in discriminating deer-derived samples across multiple clades. Finally, our investigation into the origin of the B.1.641 lineage from white-tailed deer in Canada discovered an additional human sequence from Michigan related to the B.1.641 lineage sampled near the emergence of this lineage. 
These findings demonstrate that machine-learning approaches can be used in combination with evolutionary genomics to identify factors possibly involved in the cross-species transmission of viruses and the emergence of novel viral lineages.
IMPORTANCE
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a highly transmissible virus capable of infecting and establishing itself in human and wildlife populations, such as white-tailed deer. This fact highlights the importance of developing novel ways to identify genetic factors that contribute to its spread and adaptation to new host species. This is especially important since these populations can serve as reservoirs that potentially facilitate the re-introduction of new variants into human populations. In this study, we apply machine learning and phylogenetic methods to uncover biomarkers of SARS-CoV-2 adaptation in mink and white-tailed deer. We find evidence demonstrating that both non-synonymous and silent mutations can be used to differentiate animal-derived sequences from human-derived ones and each other. This evidence also suggests that host adaptation involves the evasion of the immune system and the suppression of antigen presentation. Finally, the methods developed here are general and can be used to investigate host adaptation in viruses other than SARS-CoV-2.
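The abstract above reports classifier quality as a balanced accuracy score. As a minimal sketch of what that metric measures (the mean of per-class recall, so a rare host class counts as much as a common one), the following uses hypothetical labels that are not data from the study; the function name `balanced_accuracy` is an assumption made for illustration and is not the authors' code.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall over the classes present in y_true."""
    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced host labels (illustration only).
y_true = ["human"] * 8 + ["deer"] * 2 + ["mink"] * 2
y_pred = ["human"] * 8 + ["deer", "human"] + ["mink", "mink"]

score = balanced_accuracy(y_true, y_pred)
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"balanced accuracy = {score:.3f}, plain accuracy = {plain:.3f}")
```

In this toy case one of two deer samples is misclassified, so balanced accuracy falls further than plain accuracy does, which is why the metric suits data sets dominated by human-derived sequences.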
Affiliation(s)
- Josip Rudar
  - National Centre for Foreign Animal Disease, Canadian Food Inspection Agency, Winnipeg, Manitoba, Canada
  - Department of Integrative Biology & Centre for Biodiversity Genomics, University of Guelph, Guelph, Ontario, Canada
- Peter Kruczkiewicz
  - National Centre for Foreign Animal Disease, Canadian Food Inspection Agency, Winnipeg, Manitoba, Canada
- Oksana Vernygora
  - National Centre for Foreign Animal Disease, Canadian Food Inspection Agency, Winnipeg, Manitoba, Canada
- G. Brian Golding
  - Department of Biology, McMaster University, Hamilton, Ontario, Canada
- Mehrdad Hajibabaei
  - Department of Integrative Biology & Centre for Biodiversity Genomics, University of Guelph, Guelph, Ontario, Canada
- Oliver Lung
  - National Centre for Foreign Animal Disease, Canadian Food Inspection Agency, Winnipeg, Manitoba, Canada
  - Department of Biological Sciences, University of Manitoba, Winnipeg, Manitoba, Canada
2
Gupta S, Gupta D, Bhatnagar S. Analysis of SARS-CoV-2 genome evolutionary patterns. Microbiol Spectr 2024; 12:e0265423. [PMID: 38197644 PMCID: PMC10846092 DOI: 10.1128/spectrum.02654-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Accepted: 11/20/2023] [Indexed: 01/11/2024] Open
Abstract
The spread of SARS-CoV-2 virus accompanied by public availability of abundant sequence data provides a window for the determination of viral evolutionary patterns. In this study, SARS-CoV-2 genome sequences were collected from seven countries in the period January 2020-December 2022. The sequences were classified into three phases, namely, pre-vaccination, post-vaccination, and recent period. Comparison was performed between these phases based on parameters like mutation rates, selection pressure (dN/dS ratio), and transition to transversion ratios (Ti/Tv). Similar comparisons were performed among SARS-CoV-2 variants. Statistical significance was tested using the GraphPad unpaired t-test. The analysis showed an increase in the percent genomic mutation rates in the post-vaccination and recent periods across all countries relative to the pre-vaccination sequences. Mutation rates were highest in NSP3, S, N, and NSP12b before vaccination and increased further after vaccination. NSP4 showed the largest change in mutation rates after vaccination. The dN/dS ratios showed purifying selection that shifted toward neutral selection after vaccination. N, ORF8, ORF3a, and ORF10 were under highest positive selection before vaccination. The shift toward neutral selection was driven by E, NSP3, and ORF7a in the post-vaccination set. In recent sequences, the largest dN/dS change was observed in E, NSP1, and NSP13. The Ti/Tv ratios decreased with time. C→U was the most frequent transition and G→U the most frequent transversion. However, U→G was the most frequent transversion in the recent period. The Omicron variant had the highest genomic mutation rates, while Delta showed the highest dN/dS ratio. Protein-wise dN/dS ratio was also seen to vary across the different variants.
IMPORTANCE
To the best of our knowledge, there exists no other large-scale study of the genomic and protein-wise mutation patterns during the time course of evolution in different countries. Analyzing the SARS-CoV-2 evolutionary patterns in view of the varying spatial, temporal, and biological signals is important for diagnostics, therapeutics, and pharmacovigilance of SARS-CoV-2.
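The transition-to-transversion ratio (Ti/Tv) used in the abstract above can be illustrated with a short sketch. Transitions swap a purine for a purine (A, G) or a pyrimidine for a pyrimidine (C, U); every other substitution is a transversion. The function names and the substitution list below are hypothetical and are not taken from the study's pipeline.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}


def is_transition(ref, alt):
    """True when both bases are purines or both are pyrimidines."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ti_tv_ratio(substitutions):
    """Compute Ti/Tv from an iterable of (ref, alt) bases in the RNA alphabet."""
    ti = tv = 0
    for ref, alt in substitutions:
        if ref == alt:
            continue  # not a substitution
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed; Ti/Tv is undefined")
    return ti / tv


# Hypothetical substitutions (illustration only).
subs = [("C", "U"), ("C", "U"), ("A", "G"), ("G", "U"), ("U", "G"), ("C", "U")]
ratio = ti_tv_ratio(subs)  # 4 transitions, 2 transversions
print(ratio)
```

Under this classification C→U counts as a transition while G→U and U→G count as transversions, matching the usage in the abstract.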
Affiliation(s)
- Shubhangi Gupta
  - Department of Biological Sciences and Engineering, Computational and Structural Biology Laboratory, Netaji Subhas University of Technology, Dwarka, New Delhi, India
- Deepanshu Gupta
  - Division of Biotechnology, Computational and Structural Biology Laboratory, Netaji Subhas Institute of Technology, Dwarka, New Delhi, India
- Sonika Bhatnagar
  - Department of Biological Sciences and Engineering, Computational and Structural Biology Laboratory, Netaji Subhas University of Technology, Dwarka, New Delhi, India
  - Division of Biotechnology, Computational and Structural Biology Laboratory, Netaji Subhas Institute of Technology, Dwarka, New Delhi, India
3
Wang Y, Li Z, Wang X, Jiang W, Jiang W. SARS-CoV-2 continuously optimizes its codon usage to adapt to human lung environment. J Appl Genet 2023; 64:831-837. [PMID: 37740828 DOI: 10.1007/s13353-023-00790-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Revised: 09/14/2023] [Accepted: 09/16/2023] [Indexed: 09/25/2023]
Abstract
Viruses need to utilize the resources from host cells to reproduce themselves. RNA translation rate, which is largely determined by codon usage, is the rate-limiting step across the life cycle of viruses. Adapting to the codon usage of hosts would help a virus proliferate better. We retrieved the time-course mutation profile of millions of world-wide SARS-CoV-2 sequences. For synonymous mutations, we defined whether a mutation elevates or reduces the relative synonymous codon usage (RSCU). We found that if a synonymous mutation in SARS-CoV-2 increases the RSCU (calculated from human lungs), denoted as delta RSCU > 0, then this mutation is positively selected because the allele frequency (AF) of this mutation increases with time, and vice versa. The results suggest that in SARS-CoV-2, the synonymous mutations that increase codon optimality are beneficial to the virus and are favored by natural selection. For the first time, we used the dynamics of allele frequency to demonstrate that SARS-CoV-2 is continuously optimizing its codon usage to adapt to human lungs. Nevertheless, adaptation to other human tissues cannot be excluded. These results warn us that under this global pandemic, synonymous mutations in SARS-CoV-2 should not be automatically ignored since they indeed change the fitness of the virus.
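To make the delta RSCU idea in the abstract above concrete, here is a minimal sketch using the standard RSCU definition (a codon's observed count divided by the mean count across its synonymous family). The abstract does not say how the lung-specific codon counts were derived, so the counts below are hypothetical, and the helper names `rscu` and `delta_rscu` are assumptions made for illustration.

```python
def rscu(codon_counts, synonymous_family):
    """Return RSCU for each codon in one synonymous family.

    RSCU = observed count / mean count across the family, so a value of
    1.0 means the codon is used as often as expected without any bias.
    """
    counts = [codon_counts.get(codon, 0) for codon in synonymous_family]
    mean = sum(counts) / len(synonymous_family)
    if mean == 0:
        raise ValueError("no observations for this synonymous family")
    return {codon: codon_counts.get(codon, 0) / mean for codon in synonymous_family}


def delta_rscu(ref_codon, alt_codon, rscu_table):
    """Change in host RSCU caused by a synonymous mutation ref -> alt."""
    return rscu_table[alt_codon] - rscu_table[ref_codon]


# Hypothetical host codon counts for the two lysine codons (illustration only).
LYSINE = ["AAA", "AAG"]
host_counts = {"AAA": 40, "AAG": 60}

table = rscu(host_counts, LYSINE)
change = delta_rscu("AAA", "AAG", table)
print(table, round(change, 3))
```

With these made-up counts, an AAA to AAG mutation has delta RSCU > 0, the direction the abstract associates with rising allele frequency; the reverse mutation would give the same magnitude with a negative sign.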
Affiliation(s)
- Yinglian Wang
  - Institute of Integrated Medicine, Qingdao Medical College, Qingdao University, Qingdao, 266071, Shandong, China
  - Changyi People's Hospital, Weifang, 261300, Shandong, China
- Zhenhua Li
  - Pulmonary and Critical Care Medicine Department 2, Qingdao Municipal Hospital of Traditional Chinese Medicine (Qingdao Hiser Medical Group), Qingdao, 266033, China
  - Department of Respiratory Diseases, The Affiliated Qingdao Hiser Hospital of Qingdao University, Qingdao Haici Hospital, Qingdao, 266033, Shandong, China
- Xiuxiu Wang
  - Department of Respiratory Medicine, Qilu Hospital (Qingdao), Cheeloo College of Medicine, Shandong University, Qingdao, 266035, Shandong, China
- Wen Jiang
  - Pulmonary and Critical Care Medicine Department 2, Qingdao Municipal Hospital of Traditional Chinese Medicine (Qingdao Hiser Medical Group), Qingdao, 266033, China
  - Department of Respiratory Diseases, The Affiliated Qingdao Hiser Hospital of Qingdao University, Qingdao Haici Hospital, Qingdao, 266033, Shandong, China
- Wenqing Jiang
  - Pulmonary and Critical Care Medicine Department 2, Qingdao Municipal Hospital of Traditional Chinese Medicine (Qingdao Hiser Medical Group), Qingdao, 266033, China
  - Department of Respiratory Diseases, The Affiliated Qingdao Hiser Hospital of Qingdao University, Qingdao Haici Hospital, Qingdao, 266033, Shandong, China
4
Saldivar-Espinoza B, Garcia-Segura P, Novau-Ferré N, Macip G, Martínez R, Puigbò P, Cereto-Massagué A, Pujadas G, Garcia-Vallve S. The Mutational Landscape of SARS-CoV-2. Int J Mol Sci 2023; 24:ijms24109072. [PMID: 37240420 DOI: 10.3390/ijms24109072] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 03/28/2023] [Revised: 05/12/2023] [Accepted: 05/16/2023] [Indexed: 05/28/2023] Open
Abstract
Mutation research is crucial for detecting and treating SARS-CoV-2 and developing vaccines. Using over 5,300,000 sequences from SARS-CoV-2 genomes and custom Python programs, we analyzed the mutational landscape of SARS-CoV-2. Although almost every nucleotide in the SARS-CoV-2 genome has mutated at some time, the substantial differences in the frequency and regularity of mutations warrant further examination. C>U mutations are the most common. They are found in the largest number of variants, pangolin lineages, and countries, which indicates that they are a driving force behind the evolution of SARS-CoV-2. Not all SARS-CoV-2 genes have mutated in the same way. Fewer non-synonymous single nucleotide variations are found in genes that encode proteins with a critical role in virus replication than in genes with ancillary roles. Some genes, such as spike (S) and nucleocapsid (N), show more non-synonymous mutations than others. Although the prevalence of mutations in the target regions of COVID-19 diagnostic RT-qPCR tests is generally low, in some cases, such as for some primers that bind to the N gene, it is significant. Therefore, ongoing monitoring of SARS-CoV-2 mutations is crucial. The SARS-CoV-2 Mutation Portal provides access to a database of SARS-CoV-2 mutations.
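The distinction the abstract above draws between synonymous and non-synonymous single nucleotide variations can be sketched with the standard genetic code. This is an illustrative stand-in for, not a reproduction of, the authors' custom Python programs; the function name `classify_snv` and the toy coding sequence are hypothetical. Sequences are written in the DNA alphabet, so a C>U change appears as C>T.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_snv(cds, pos, alt):
    """Classify a single-base change in an in-frame coding sequence.

    cds: DNA string starting at the first base of a codon
    pos: 0-based index of the changed base
    alt: the new base at that position
    Returns (kind, reference amino acid, alternate amino acid).
    """
    start = pos - pos % 3
    ref_codon = cds[start:start + 3]
    if len(ref_codon) != 3:
        raise ValueError("position falls in an incomplete codon")
    offset = pos % 3
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return kind, ref_aa, alt_aa


# Toy coding sequence ATG GCC AAA (Met-Ala-Lys); not a SARS-CoV-2 gene.
toy_cds = "ATGGCCAAA"
silent = classify_snv(toy_cds, 5, "T")    # GCC -> GCT, third codon position
missense = classify_snv(toy_cds, 3, "A")  # GCC -> ACC, first codon position
print(silent, missense)
```

Counting the two outcomes per gene over many genomes would give the kind of per-gene comparison the abstract describes, although the study's actual filtering and counting rules are not stated here.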
Affiliation(s)
- Bryan Saldivar-Espinoza
  - Departament de Bioquímica i Biotecnologia, Research Group in Cheminformatics & Nutrition, Campus de Sescelades, Universitat Rovira i Virgili, 43007 Tarragona, Spain
- Pol Garcia-Segura
  - Departament de Bioquímica i Biotecnologia, Research Group in Cheminformatics & Nutrition, Campus de Sescelades, Universitat Rovira i Virgili, 43007 Tarragona, Spain
- Nil Novau-Ferré
  - Departament de Bioquímica i Biotecnologia, Research Group in Cheminformatics & Nutrition, Campus de Sescelades, Universitat Rovira i Virgili, 43007 Tarragona, Spain
- Guillem Macip
  - Departament de Bioquímica i Biotecnologia, Research Group in Cheminformatics & Nutrition, Campus de Sescelades, Universitat Rovira i Virgili, 43007 Tarragona, Spain
- Pere Puigbò
  - Department of Biology, University of Turku, 20500 Turku, Finland
  - Department of Biochemistry and Biotechnology, Rovira i Virgili University, 43007 Tarragona, Spain
  - Eurecat, Technology Centre of Catalonia, Unit of Nutrition and Health, 43204 Reus, Spain
- Adrià Cereto-Massagué
  - EURECAT Centre Tecnològic de Catalunya, Centre for Omic Sciences (COS), Joint Unit Universitat Rovira i Virgili-EURECAT, Unique Scientific and Technical Infrastructures (ICTS), 43204 Reus, Spain
- Gerard Pujadas
  - Departament de Bioquímica i Biotecnologia, Research Group in Cheminformatics & Nutrition, Campus de Sescelades, Universitat Rovira i Virgili, 43007 Tarragona, Spain
- Santiago Garcia-Vallve
  - Departament de Bioquímica i Biotecnologia, Research Group in Cheminformatics & Nutrition, Campus de Sescelades, Universitat Rovira i Virgili, 43007 Tarragona, Spain
5
Si F, Song S, Yu R, Li Z, Wei W, Wu C. Coronavirus accessory protein ORF3 biology and its contribution to viral behavior and pathogenesis. iScience 2023; 26:106280. [PMID: 36945252 PMCID: PMC9972675 DOI: 10.1016/j.isci.2023.106280] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 03/06/2023] Open
Abstract
Porcine epidemic diarrhea virus (PEDV), a coronavirus classified in the genus Alphacoronavirus, family Coronaviridae, encodes a single accessory protein, the ORF3 protein. However, how ORF3 contributes to viral pathogenicity, adaptability, and replication is obscure. In this review, we summarize current knowledge and identify gaps in many aspects of ORF3 protein in PEDV, with emphasis on its unique biological features, including membrane topology, Golgi retention mechanism, potential intrinsic disordered property, functional motifs, protein glycosylation, and codon usage phenotypes related to genetic evolution and gene expression. In addition, we pose open questions about the ORF3 protein that we hope will stimulate further studies and encourage collaboration among virologists worldwide, yielding knowledge about the unique characteristics and biological functions of ORF3 that can clarify its role in viral behavior and pathogenesis.
Affiliation(s)
- Fusheng Si
  - Institute of Animal Science and Veterinary Medicine, Shanghai Academy of Agricultural Sciences, Shanghai Key Laboratory of Agricultural Genetics and Breeding, Shanghai Engineering Research Center of Breeding Pig, Shanghai 201106, P.R. China
- Shuai Song
  - Institute of Animal Health, Guangdong Academy of Agricultural Sciences, Scientific Observation and Experiment Station of Veterinary Drugs and Diagnostic Techniques of Guangdong Province, Ministry of Agriculture of Rural Affairs, and Key Laboratory of Animal Disease Prevention of Guangdong Province, Guangzhou 510640, P.R. China
- Ruisong Yu
  - Institute of Animal Science and Veterinary Medicine, Shanghai Academy of Agricultural Sciences, Shanghai Key Laboratory of Agricultural Genetics and Breeding, Shanghai Engineering Research Center of Breeding Pig, Shanghai 201106, P.R. China
- Zhen Li
  - Institute of Animal Science and Veterinary Medicine, Shanghai Academy of Agricultural Sciences, Shanghai Key Laboratory of Agricultural Genetics and Breeding, Shanghai Engineering Research Center of Breeding Pig, Shanghai 201106, P.R. China
- Wenqiang Wei
  - Department of Microbiology, School of Basic Medical Sciences, Henan University, Kaifeng, Henan 475004, P.R. China
- Chao Wu
  - Department of Pathology and Immunology, Washington University in St. Louis, St. Louis, MO 63110, USA
6
Lu X, Chen Y, Zhang G. Functional evolution of SARS-CoV-2 spike protein: Maintaining wide host spectrum and enhancing infectivity via surface charge of spike protein. Comput Struct Biotechnol J 2023; 21:2068-2074. [PMID: 36936817 PMCID: PMC10008190 DOI: 10.1016/j.csbj.2023.03.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2022] [Revised: 03/08/2023] [Accepted: 03/08/2023] [Indexed: 03/14/2023] Open
Abstract
The SARS-CoV-2 virus, which causes COVID-19, is rapidly accumulating mutations to adapt to its hosts. We collected SARS-CoV-2 sequence data from the end of 2019 to January 2023 to analyze their evolutionary features during the pandemic. We found that most of the SARS-CoV-2 genes are undergoing purifying (negative) selection, while the spike protein gene (S-gene) is undergoing rapid positive selection. From the original strain to the alpha, delta and omicron variant types, the Ka/Ks of the S-gene increases, while the Ka/Ks within one variant type decreases over time. During the evolution, the codon usage did not evolve towards optimal translation and protein expression. In contrast, only S-gene mutations showed a remarkable trend toward accumulating more positive charges. This facilitates infection via binding human ACE2 for cell entry and binding furin for cleavage. Such a functional evolution emphasizes the survival strategy of SARS-CoV-2 and indicates a new druggable target to contain the viral infection. The nearly fully positively charged interaction surfaces indicate that the infectivity of the SARS-CoV-2 virus may approach a limit.
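The charge trend described in the abstract above can be illustrated with a deliberately crude sketch: count +1 for each lysine or arginine and -1 for each aspartate or glutamate, ignoring histidine, termini, and structural context. This simplification is an assumption made for illustration and is not the surface-charge method used in the paper; the function names are hypothetical, and the substitution labels are used only as examples.

```python
POSITIVE = set("KR")  # lysine, arginine
NEGATIVE = set("DE")  # aspartate, glutamate


def net_charge(protein):
    """Approximate net charge near neutral pH from residue counts alone."""
    return sum((aa in POSITIVE) - (aa in NEGATIVE) for aa in protein.upper())


def charge_change(ref_aa, alt_aa):
    """Charge difference introduced by substituting ref_aa with alt_aa."""
    return net_charge(alt_aa) - net_charge(ref_aa)


# Example substitutions written as (label, reference residue, new residue).
substitutions = [
    ("E484K", "E", "K"),  # negative -> positive
    ("D614G", "D", "G"),  # negative -> neutral
    ("P681R", "P", "R"),  # neutral -> positive
    ("N501Y", "N", "Y"),  # neutral -> neutral
]

changes = {label: charge_change(ref, alt) for label, ref, alt in substitutions}
total_shift = sum(changes.values())
print(changes, total_shift)
```

Summing such per-substitution changes over the mutations of a lineage gives a rough view of whether it is drifting toward a more positive charge, which is the kind of trend the abstract reports for the S-gene.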
Affiliation(s)
- Xiaolong Lu
  - Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou, China
- Yang Chen
  - Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou, China
- Gong Zhang
  - Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes and MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou, China
  - Chi-Biotech Co. Ltd., Shenzhen, China