1
|
Elkin ME, Zhu X. Paying attention to the SARS-CoV-2 dialect: a deep neural network approach to predicting novel protein mutations. Commun Biol 2025; 8:98. [PMID: 39838059 PMCID: PMC11751191 DOI: 10.1038/s42003-024-07262-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2024] [Accepted: 11/13/2024] [Indexed: 01/23/2025] Open
Abstract
Predicting novel mutations has long-lasting impacts on life science research. Traditionally, this problem is addressed through wet-lab experiments, which are often expensive and time-consuming. Recent advances in neural language models have produced striking results in modeling and deciphering sequences. In this paper, we propose a Deep Novel Mutation Search (DNMS) method that uses deep neural networks to model protein sequences for mutation prediction. We use the SARS-CoV-2 spike protein as the target and a protein language model to predict novel mutations. Unlike existing research, which is often limited to mutating the reference sequence for prediction, we propose a parent-child mutation prediction paradigm in which a parent sequence is modeled for mutation prediction. Because mutations change the context of the underlying sequence, DNMS models three aspects of the protein sequences: semantic changes, grammatical changes, and attention changes, which capture shifts in semantics, grammatical coherence, and amino-acid interactions in latent space, respectively. A ranking approach is proposed to combine all three aspects to capture mutations demonstrating evolving traits, in accordance with real-world SARS-CoV-2 spike protein sequence evolution. DNMS can be adopted for an early-warning variant detection system, creating public health awareness of future SARS-CoV-2 mutations.
Affiliation(s)
- Magdalyn E Elkin
  - Dept. Electrical Engineering and Computer Science, Florida Atlantic University, 777 Glades Road, Boca Raton, FL, 33431, USA.
- Xingquan Zhu
  - Dept. Electrical Engineering and Computer Science, Florida Atlantic University, 777 Glades Road, Boca Raton, FL, 33431, USA.
|
2
|
Garcia-Segura P, Llop-Peiró A, Novau-Ferré N, Mestres-Truyol J, Saldivar-Espinoza B, Pujadas G, Garcia-Vallvé S. SARS-CoV-2 main protease (M-pro) mutational profiling: An insight into mutation coldspots. Comput Biol Med 2025; 184:109344. [PMID: 39531923 DOI: 10.1016/j.compbiomed.2024.109344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2024] [Revised: 09/20/2024] [Accepted: 10/30/2024] [Indexed: 11/16/2024]
Abstract
SARS-CoV-2 and the COVID-19 pandemic have marked a milestone in the history of scientific research worldwide. To ensure that treatments are successful in the medium to long term, it is crucial to characterize SARS-CoV-2 mutations, as they might lead to viral resistance. Data from >5,700,000 SARS-CoV-2 genomes available at GISAID were used to report SARS-CoV-2 mutations. Given the pivotal role of its main protease (M-pro) in virus replication, a detailed analysis of SARS-CoV-2 M-pro mutations was conducted, with particular attention to mutation-resistant residues, or mutation coldspots, defined as those residues that have mutated in five or fewer genomes. Thirty-two mutation coldspots were identified, most of which mediate interprotomer interactions or funnel interaction networks from the substrate-binding site towards the dimerization surface and vice versa. In addition, mutation coldspots were virtually conserved in all main proteases from other CoVs. Our results provide valuable information about residues that are key to M-pro structure, which could be useful in rational target-directed drug design, and establish a solid groundwork based on mutation analyses for the inhibition of M-pro dimerization, with potential applicability to future coronavirus outbreaks.
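The coldspot definition in this abstract (a residue mutated in five or fewer genomes) can be illustrated with a minimal Python sketch. The function name `find_coldspots`, the input layout (one list of mutated residue positions per genome), and the toy data are assumptions made for illustration only; they are not the authors' pipeline or data.

```python
from collections import Counter


def find_coldspots(mutated_positions_per_genome, protein_length, max_genomes=5):
    """Return 1-based residue positions mutated in at most `max_genomes` genomes.

    Hypothetical helper: the input format is an assumption, not the paper's.
    """
    counts = Counter()
    for positions in mutated_positions_per_genome:
        # Count each genome at most once per residue, even if it is listed twice.
        counts.update(set(positions))
    # Counter returns 0 for positions that never mutated, so they qualify too.
    return [pos for pos in range(1, protein_length + 1) if counts[pos] <= max_genomes]


# Toy data (made up): seven genomes over a four-residue protein.
genomes = [[1, 3], [3], [3, 3], [2, 3], [3], [3], [3]]
coldspots = find_coldspots(genomes, protein_length=4)
print(coldspots)
```

In the toy data, residue 3 is mutated in all seven genomes and is therefore excluded, while residues 1, 2, and 4 stay at or below the threshold of five genomes.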
Affiliation(s)
- Pol Garcia-Segura
  - Universitat Rovira i Virgili, Departament de Bioquímica i Biotecnologia, Research group in Cheminformatics & Nutrition, Campus de Sescelades, 43007, Tarragona, Spain.
- Ariadna Llop-Peiró
  - Universitat Rovira i Virgili, Departament de Bioquímica i Biotecnologia, Research group in Cheminformatics & Nutrition, Campus de Sescelades, 43007, Tarragona, Spain.
- Nil Novau-Ferré
  - Universitat Rovira i Virgili, Departament de Bioquímica i Biotecnologia, Research group in Cheminformatics & Nutrition, Campus de Sescelades, 43007, Tarragona, Spain.
- Júlia Mestres-Truyol
  - Universitat Rovira i Virgili, Departament de Bioquímica i Biotecnologia, Research group in Cheminformatics & Nutrition, Campus de Sescelades, 43007, Tarragona, Spain.
- Bryan Saldivar-Espinoza
  - Universitat Rovira i Virgili, Departament de Bioquímica i Biotecnologia, Research group in Cheminformatics & Nutrition, Campus de Sescelades, 43007, Tarragona, Spain.
- Gerard Pujadas
  - Universitat Rovira i Virgili, Departament de Bioquímica i Biotecnologia, Research group in Cheminformatics & Nutrition, Campus de Sescelades, 43007, Tarragona, Spain.
- Santiago Garcia-Vallvé
  - Universitat Rovira i Virgili, Departament de Bioquímica i Biotecnologia, Research group in Cheminformatics & Nutrition, Campus de Sescelades, 43007, Tarragona, Spain.
|
3
|
Choi WJ, Park J, Seong DY, Chung DS, Hong D. A prediction of mutations in infectious viruses using artificial intelligence. Genomics Inform 2024; 22:15. [PMID: 39380083 PMCID: PMC11463117 DOI: 10.1186/s44342-024-00019-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2024] [Accepted: 09/18/2024] [Indexed: 10/10/2024] Open
Abstract
Many subtypes of SARS-CoV-2 have emerged since its early stages, with mutations showing regional and racial differences. These mutations significantly affected the infectivity and severity of the virus. This study aimed to predict the mutations that occur during the evolution of SARS-CoV-2 and identify the key characteristics for making these predictions. We collected and organized data on the lineage, date, clade, and mutations of SARS-CoV-2 from publicly available databases and processed them to predict the mutations. In addition, we utilized various artificial intelligence models to predict newly emerging mutations and created various training sets based on clade information. Using only mutation information resulted in low performance of the learning models, whereas incorporating clade differentiation resulted in high performance in machine learning models, including XGBoost (accuracy: 0.999). However, mutations fixed in the receptor-binding motif (RBM) region of Omicron resulted in decreased predictive performance. Using these models, we predicted potential mutation positions for 24C, following the recently emerged 24A and 24B clades. We identified a mutation at position Q493 in the RBM region. Our study developed effective artificial intelligence models and characteristics for predicting new mutations in continuously evolving infectious viruses.
Affiliation(s)
- Won Jong Choi
  - Department of Precision Medicine and Big Data, College of Medicine, The Catholic University of Korea, Seoul, 06591, Republic of Korea
  - Department of Medical Informatics, The Catholic University of Korea, Seoul, 06591, Republic of Korea
- Jongkeun Park
  - Department of Medical Informatics, The Catholic University of Korea, Seoul, 06591, Republic of Korea
- Do Young Seong
  - Department of Precision Medicine and Big Data, College of Medicine, The Catholic University of Korea, Seoul, 06591, Republic of Korea
  - Department of Medical Informatics, The Catholic University of Korea, Seoul, 06591, Republic of Korea
- Dae Sun Chung
  - Department of Medical Informatics, The Catholic University of Korea, Seoul, 06591, Republic of Korea
  - Department of Medical Sciences, Graduate School, College of Medicine, The Catholic University of Korea, Seoul, 06591, Republic of Korea
- Dongwan Hong
  - Department of Precision Medicine and Big Data, College of Medicine, The Catholic University of Korea, Seoul, 06591, Republic of Korea.
  - Department of Medical Informatics, The Catholic University of Korea, Seoul, 06591, Republic of Korea.
  - Department of Medical Sciences, Graduate School, College of Medicine, The Catholic University of Korea, Seoul, 06591, Republic of Korea.
  - Precision Medicine Research Center, College of Medicine, The Catholic University of Korea, Seoul, 06591, Republic of Korea.
  - Cancer Evolution Research Center, College of Medicine, The Catholic University of Korea, Seoul, 06591, Republic of Korea.
  - College of Medicine, CMC Institute for Basic Medical Science, The Catholic University of Korea, Seoul, 06591, Republic of Korea.
|
4
|
Rogozin IB, Saura A, Poliakov E, Bykova A, Roche-Lima A, Pavlov YI, Yurchenko V. Properties and Mechanisms of Deletions, Insertions, and Substitutions in the Evolutionary History of SARS-CoV-2. Int J Mol Sci 2024; 25:3696. [PMID: 38612505 PMCID: PMC11011937 DOI: 10.3390/ijms25073696] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2024] [Revised: 03/22/2024] [Accepted: 03/23/2024] [Indexed: 04/14/2024] Open
Abstract
SARS-CoV-2 has accumulated many mutations since its emergence in late 2019. Nucleotide substitutions leading to amino acid replacements constitute the primary material for natural selection. Insertions, deletions, and substitutions appear to be critical for coronavirus's macro- and microevolution. Understanding the molecular mechanisms of mutations in the mutational hotspots (positions, loci with recurrent mutations, and nucleotide context) is important for disentangling roles of mutagenesis and selection. In the SARS-CoV-2 genome, deletions and insertions are frequently associated with repetitive sequences, whereas C>U substitutions are often surrounded by nucleotides resembling the APOBEC mutable motifs. We describe various approaches to mutation spectra analyses, including the context features of RNAs that are likely to be involved in the generation of recurrent mutations. We also discuss the interplay between mutations and natural selection as a complex evolutionary trend. The substantial variability and complexity of pipelines for the reconstruction of mutations and the huge number of genomic sequences are major problems for the analyses of mutations in the SARS-CoV-2 genome. As a solution, we advocate for the development of a centralized database of predicted mutations, which needs to be updated on a regular basis.
Affiliation(s)
- Igor B. Rogozin
  - Life Science Research Centre, Faculty of Science, University of Ostrava, 710 00 Ostrava, Czech Republic
- Andreu Saura
  - Life Science Research Centre, Faculty of Science, University of Ostrava, 710 00 Ostrava, Czech Republic
- Eugenia Poliakov
  - National Eye Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Anastassia Bykova
  - Life Science Research Centre, Faculty of Science, University of Ostrava, 710 00 Ostrava, Czech Republic
- Abiel Roche-Lima
  - Center for Collaborative Research in Health Disparities—RCMI Program, Medical Sciences Campus, University of Puerto Rico, San Juan 00936, Puerto Rico
- Youri I. Pavlov
  - Eppley Institute for Research in Cancer and Allied Diseases, University of Nebraska Medical Center, Omaha, NE 68198, USA
- Vyacheslav Yurchenko
  - Life Science Research Centre, Faculty of Science, University of Ostrava, 710 00 Ostrava, Czech Republic
|
5
|
Saldivar-Espinoza B, Garcia-Segura P, Novau-Ferré N, Macip G, Martínez R, Puigbò P, Cereto-Massagué A, Pujadas G, Garcia-Vallve S. The Mutational Landscape of SARS-CoV-2. Int J Mol Sci 2023; 24:9072. [PMID: 37240420 DOI: 10.3390/ijms24109072] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 03/28/2023] [Revised: 05/12/2023] [Accepted: 05/16/2023] [Indexed: 05/28/2023] Open
Abstract
Mutation research is crucial for detecting and treating SARS-CoV-2 and developing vaccines. Using over 5,300,000 sequences from SARS-CoV-2 genomes and custom Python programs, we analyzed the mutational landscape of SARS-CoV-2. Although almost every nucleotide in the SARS-CoV-2 genome has mutated at some time, the substantial differences in the frequency and regularity of mutations warrant further examination. C>U mutations are the most common. They are found in the largest number of variants, pangolin lineages, and countries, which indicates that they are a driving force behind the evolution of SARS-CoV-2. Not all SARS-CoV-2 genes have mutated in the same way. Fewer non-synonymous single nucleotide variations are found in genes that encode proteins with a critical role in virus replication than in genes with ancillary roles. Some genes, such as spike (S) and nucleocapsid (N), show more non-synonymous mutations than others. Although the prevalence of mutations in the target regions of COVID-19 diagnostic RT-qPCR tests is generally low, in some cases, such as for some primers that bind to the N gene, it is significant. Therefore, ongoing monitoring of SARS-CoV-2 mutations is crucial. The SARS-CoV-2 Mutation Portal provides access to a database of SARS-CoV-2 mutations.
Affiliation(s)
- Bryan Saldivar-Espinoza
  - Departament de Bioquímica i Biotecnologia, Research Group in Cheminformatics & Nutrition, Campus de Sescelades, Universitat Rovira i Virgili, 43007 Tarragona, Spain
- Pol Garcia-Segura
  - Departament de Bioquímica i Biotecnologia, Research Group in Cheminformatics & Nutrition, Campus de Sescelades, Universitat Rovira i Virgili, 43007 Tarragona, Spain
- Nil Novau-Ferré
  - Departament de Bioquímica i Biotecnologia, Research Group in Cheminformatics & Nutrition, Campus de Sescelades, Universitat Rovira i Virgili, 43007 Tarragona, Spain
- Guillem Macip
  - Departament de Bioquímica i Biotecnologia, Research Group in Cheminformatics & Nutrition, Campus de Sescelades, Universitat Rovira i Virgili, 43007 Tarragona, Spain
- Pere Puigbò
  - Department of Biology, University of Turku, 20500 Turku, Finland
  - Department of Biochemistry and Biotechnology, Rovira i Virgili University, 43007 Tarragona, Spain
  - Eurecat, Technology Centre of Catalonia, Unit of Nutrition and Health, 43204 Reus, Spain
- Adrià Cereto-Massagué
  - EURECAT Centre Tecnològic de Catalunya, Centre for Omic Sciences (COS), Joint Unit Universitat Rovira i Virgili-EURECAT, Unique Scientific and Technical Infrastructures (ICTS), 43204 Reus, Spain
- Gerard Pujadas
  - Departament de Bioquímica i Biotecnologia, Research Group in Cheminformatics & Nutrition, Campus de Sescelades, Universitat Rovira i Virgili, 43007 Tarragona, Spain
- Santiago Garcia-Vallve
  - Departament de Bioquímica i Biotecnologia, Research Group in Cheminformatics & Nutrition, Campus de Sescelades, Universitat Rovira i Virgili, 43007 Tarragona, Spain
|
6
|
Lippi G, Henry BM, Plebani M. A Simple Epidemiologic Model for Predicting Impaired Neutralization of New SARS-CoV-2 Variants. Vaccines (Basel) 2023; 11:128. [PMID: 36679973 PMCID: PMC9863154 DOI: 10.3390/vaccines11010128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Revised: 12/29/2022] [Accepted: 01/03/2023] [Indexed: 01/06/2023] Open
Abstract
This study aimed to develop a simple epidemiologic model that could help predict the impaired neutralization of new SARS-CoV-2 variants. We explored the potential association between neutralization of recent and more prevalent SARS-CoV-2 sublineages belonging to the Omicron family (i.e., BA.4/5, BA.4.6, BA.2.75.2, BQ.1.1 and XBB.1), expressed as FFRNT50 (>50% suppression of fluorescent foci in the fluorescent focus reduction neutralization test) in recipients of four doses of monovalent mRNA-based coronavirus disease 2019 (COVID-19) vaccines, and epidemiologic variables such as the emergence date and number of spike protein mutations of these sublineages, cumulative worldwide COVID-19 cases, and the cumulative number of COVID-19 vaccine doses administered worldwide at the time of SARS-CoV-2 Omicron sublineage emergence. In the univariate analysis, the FFRNT50 value for the different SARS-CoV-2 Omicron sublineages was significantly associated with all of these variables except the number of spike protein mutations. These associations were confirmed in the multivariate analysis, which enabled the construction of the equation: “−0.3917 × [Emergence (date)] + 1.403 × [COVID-19 cases (million)] − 121.8 × [COVID-19 Vaccine doses (billion)] + 18,250”, predicting the FFRNT50 value of the five SARS-CoV-2 Omicron sublineages with 0.996 accuracy (p = 0.013). We have shown in this work that a simple mathematical approach, encompassing a limited number of widely available epidemiologic variables, such as the emergence date of new variants and the number of COVID-19 cases and vaccinations, could help identify the emergence and surge of future lineages with a major propensity to impair humoral immunity.
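The regression equation quoted in this abstract can be written as a small Python function. This is only a sketch of the stated linear formula: the function and parameter names are hypothetical, and the abstract does not say how the emergence date is encoded as a number, so `emergence_date` is assumed to be supplied already in whatever numeric form the authors used.

```python
def predict_ffrnt50(emergence_date, cases_million, vaccine_doses_billion):
    """Evaluate the linear equation quoted in the abstract.

    emergence_date: numeric date value (encoding is NOT given in the abstract;
        the caller must supply it in the authors' units, which is an assumption).
    cases_million: cumulative worldwide COVID-19 cases, in millions.
    vaccine_doses_billion: cumulative vaccine doses administered, in billions.
    """
    return (
        -0.3917 * emergence_date
        + 1.403 * cases_million
        - 121.8 * vaccine_doses_billion
        + 18250
    )


# With all inputs at zero, only the intercept of the equation remains.
intercept_only = predict_ffrnt50(0.0, 0.0, 0.0)
print(intercept_only)
```

The signs show the direction of each association in the fitted model: the predicted FFRNT50 falls with a later emergence date and with more vaccine doses administered, and rises with cumulative cases.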
Affiliation(s)
- Giuseppe Lippi
  - Section of Clinical Biochemistry, School of Medicine, University of Verona, Piazzale L.A. Scuro 10, 37134 Verona, Italy
- Brandon M. Henry
  - Clinical Laboratory, Division of Nephrology and Hypertension, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA
- Mario Plebani
  - Department of Medicine, University of Padova, 35128 Padova, Italy
|