1
Elsherbini AMA, Elkholy AH, Fadel YM, Goussarov G, Elshal AM, El-Hadidi M, Mysara M. Utilizing genomic signatures to gain insights into the dynamics of SARS-CoV-2 through Machine and Deep Learning techniques. BMC Bioinformatics 2024; 25:131. [PMID: 38539073; PMCID: PMC10967124; DOI: 10.1186/s12859-024-05648-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Accepted: 01/10/2024] [Indexed: 11/11/2024] Open
Abstract
The global spread of the SARS-CoV-2 pandemic, originating in Wuhan, China, has had profound consequences for both health and the economy. Traditional alignment-based phylogenetic tree methods for tracking epidemic dynamics demand substantial computational power because of the growing number of sequenced strains. Consequently, there is a pressing need for an alignment-free approach to characterize these strains and monitor the dynamics of the various variants. In this work, we introduce a swift and straightforward tool named GenoSig, implemented in C++. The tool exploits di- and trinucleotide frequency signatures to delineate the taxonomic lineages of SARS-CoV-2 using diverse machine learning (ML) and deep learning (DL) models. Our approach achieved a tenfold cross-validation accuracy of 87.88% (± 0.013) for the DL model and 86.37% (± 0.0009) for the Random Forest (RF) model, surpassing the performance of other ML models. Validation on an additional unexposed dataset yielded comparable results. Despite the differences in architecture between DL and RF, both performed better on the later clades, specifically GRA, GRY, and GK, than on the earlier clades G and GH. For the continental origin of the virus, both DL and RF models performed worse than they did in predicting clades. However, both models showed relatively higher accuracy for Europe, North America, and South America than for other continents, with DL outperforming RF. Both models consistently demonstrated a preference for cytosine and guanine over adenine and thymine in both the clade and continental analyses, for both di- and trinucleotide frequency signatures. Our findings suggest that GenoSig provides a straightforward approach to taxonomic, epidemiological, and biological inquiries, using a reductive method applicable not only to SARS-CoV-2 but also to similar research questions in an alignment-free context.
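The idea of a di- and trinucleotide frequency signature can be illustrated with a minimal Python sketch. This is not GenoSig itself (which the abstract describes as a C++ tool); the function names `kmer_frequencies` and `signature`, the choice to skip windows containing ambiguous bases, and the normalization to relative frequencies are all assumptions made here for illustration.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k):
    """Return relative frequencies of all 4**k k-mers over A, C, G, T.

    Hypothetical helper: windows containing any other symbol (e.g. N)
    are ignored, which is an assumption of this sketch.
    """
    seq = sequence.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Normalize only over windows made of unambiguous bases.
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def signature(sequence):
    """Concatenate di- and trinucleotide frequencies (16 + 64 = 80 values)."""
    di = kmer_frequencies(sequence, 2)
    tri = kmer_frequencies(sequence, 3)
    return [di[m] for m in sorted(di)] + [tri[m] for m in sorted(tri)]
```

Calling `signature(genome_string)` yields a fixed-length vector of 80 values regardless of genome length, which is what makes this kind of representation usable as classifier input without a sequence alignment.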
Affiliation(s)
- Ahmed M A Elsherbini
  - Bioinformatics Group, Center for Informatics Science, School of Information Technology and Computer Science, Nile University, Giza, Egypt
- Amr Hassan Elkholy
  - Bioinformatics Group, Center for Informatics Science, School of Information Technology and Computer Science, Nile University, Giza, Egypt
- Youssef M Fadel
  - Bioinformatics Group, Center for Informatics Science, School of Information Technology and Computer Science, Nile University, Giza, Egypt
- Gleb Goussarov
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK•CEN), Mol, Belgium
- Ahmed Mohamed Elshal
  - Bioinformatics Group, Center for Informatics Science, School of Information Technology and Computer Science, Nile University, Giza, Egypt
- Mohamed El-Hadidi
  - Bioinformatics Group, Center for Informatics Science, School of Information Technology and Computer Science, Nile University, Giza, Egypt
- Mohamed Mysara
  - Bioinformatics Group, Center for Informatics Science, School of Information Technology and Computer Science, Nile University, Giza, Egypt
2
Banerjee S, Sengupta A, Ghosh SK, Banerjee R. CDH1 gene as biomarker towards breast cancer prediction. J Biomol Struct Dyn 2024:1-14. [PMID: 38373072; DOI: 10.1080/07391102.2024.2316770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Accepted: 02/03/2024] [Indexed: 02/21/2024]
Abstract
Breast cancer is thought to arise from genetic aberrations. Among the many genes expressed, cadherin 1, type 1 (CDH1) helps in several ways to control the metabolic order in humans. Deregulation of the function of the protein E-cadherin, expressed from CDH1, plays an important role in lobular breast cancer. To understand the root cause of this recent claim, we focus on the CDH1 gene and ask whether deviations, alterations, or modifications in its sequence are related to the occurrence of different types of breast cancer. To this end, quantitative analysis of the biophysical and biochemical properties of the CDH1 gene at the genomic and proteomic levels, based on the available genomic (cDNA) sequences of CDH1 (obtained from the COSMIC database for 78 patients with various types of breast cancer), clearly indicates that alteration or modification of the CDH1 sequence can be detrimental. Furthermore, random forest, k-nearest neighbour, and stochastic gradient descent (SGD) algorithms are applied to the derived dataset to classify the types of breast cancer and to validate our hypothesis regarding the critical role of CDH1 as a potential biomarker for breast cancer. Analysis of the mutated CDH1 gene sequences and their related parameters using these machine learning techniques clearly establishes that the CDH1 gene can play a determining role in predicting the occurrence of different types of breast cancer, with an accuracy of > 90%. Such an observation opens a new paradigm in the diagnostic approach to breast cancer.
Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Srijan Banerjee
  - Department of Biotechnology, Maulana Abul Kalam Azad University of Technology, Nadia, West Bengal, India
- Antara Sengupta
  - Department of Computer Science and Engineering, University of Calcutta, Kolkata, West Bengal, India
- Shankar Kumar Ghosh
  - Department of Computer Science and Engineering, Shiv Nadar Institution of Eminence, Delhi, India
- Raja Banerjee
  - Department of Biotechnology, Maulana Abul Kalam Azad University of Technology, Nadia, West Bengal, India
3
Setthapramote C, Wongsuk T, Thongnak C, Phumisantiphong U, Hansirisathit T, Thanunchai M. SARS-CoV-2 Variants by Whole-Genome Sequencing in a University Hospital in Bangkok: First to Third COVID-19 Waves. Pathogens 2023; 12:626. [PMID: 37111512; PMCID: PMC10146024; DOI: 10.3390/pathogens12040626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2023] [Revised: 03/30/2023] [Accepted: 04/17/2023] [Indexed: 04/29/2023] Open
Abstract
BACKGROUND: Multiple severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants emerged globally during the recent coronavirus disease (COVID-19) pandemic. From April 2020 to April 2021, Thailand experienced three COVID-19 waves, and each wave was driven by different variants. Therefore, we aimed to analyze the genetic diversity of circulating SARS-CoV-2 using whole-genome sequencing analysis.
METHODS: A total of 33 SARS-CoV-2 positive samples from three consecutive COVID-19 waves were collected and sequenced by whole-genome sequencing, of which 8, 10, and 15 samples were derived from the first, second, and third waves, respectively. The genetic diversity of variants in each wave and the correlation between mutations and disease severity were explored.
RESULTS: During the first wave, A.6, B, B.1, and B.1.375 were found to be predominant. The occurrence of mutations in these lineages was associated with low asymptomatic and mild symptoms, providing no transmission advantage and resulting in extinction after a few months of circulation. B.1.36.16, the predominant lineage of the second wave, caused more symptomatic COVID-19 cases and contained a small number of key mutations. This variant was replaced by the VOC alpha variant, which later became dominant in the third wave. We found that B.1.1.7 lineage-specific mutations were crucial for increasing transmissibility and infectivity, but not likely associated with disease severity. There were six additional mutations found only in severe COVID-19 patients, which might have altered the virus phenotype with an inclination toward more highly pathogenic SARS-CoV-2.
CONCLUSION: The findings of this study highlighted the importance of whole-genome analysis in tracking newly emerging variants, exploring the genetic determinants essential for transmissibility, infectivity, and pathogenicity, and helping better understand the evolutionary process in the adaptation of viruses in humans.
Affiliation(s)
- Chayanee Setthapramote
  - Department of Clinical Pathology, Faculty of Medicine Vajira Hospital, Navamindradhiraj University, Bangkok 10300, Thailand
- Thanwa Wongsuk
  - Department of Clinical Pathology, Faculty of Medicine Vajira Hospital, Navamindradhiraj University, Bangkok 10300, Thailand
- Chuphong Thongnak
  - Department of Clinical Pathology, Faculty of Medicine Vajira Hospital, Navamindradhiraj University, Bangkok 10300, Thailand
- Uraporn Phumisantiphong
  - Department of Clinical Pathology, Faculty of Medicine Vajira Hospital, Navamindradhiraj University, Bangkok 10300, Thailand
  - Department of Central Laboratory and Blood Bank, Faculty of Medicine Vajira Hospital, Navamindradhiraj University, Bangkok 10300, Thailand
- Tonsan Hansirisathit
  - Department of Central Laboratory and Blood Bank, Faculty of Medicine Vajira Hospital, Navamindradhiraj University, Bangkok 10300, Thailand
- Maytawan Thanunchai
  - Department of Clinical Pathology, Faculty of Medicine Vajira Hospital, Navamindradhiraj University, Bangkok 10300, Thailand
  - Division of Clinical Microbiology, Department of Medical Technology, Faculty of Associated Medical Sciences, Chiang Mai University, Chiang Mai 50200, Thailand
4
Sadeghi K, Zadheidar S, Zebardast A, Nejati A, Faraji M, Ghavami N, Kalantari S, Salimi V, Yavarian J, Abedi A, Jandaghi NZS, Mokhtari‐Azad T. Genomic surveillance of SARS-CoV-2 strains circulating in Iran during six waves of the pandemic. Influenza Other Respir Viruses 2023; 17:e13135. [PMID: 37078070; PMCID: PMC10106497; DOI: 10.1111/irv.13135] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 02/13/2023] [Revised: 03/25/2023] [Accepted: 04/03/2023] [Indexed: 04/21/2023] Open
Abstract
Background: SARS-CoV-2 genomic surveillance is necessary for the detection, monitoring, and evaluation of virus variants, which can have increased transmissibility, disease severity, or other adverse effects. We sequenced 330 SARS-CoV-2 genomes during the sixth wave of the COVID pandemic in Iran and compared them with those of the five previous waves to identify SARS-CoV-2 variants, examine the genomic behavior of the virus, and understand its characteristics.
Methods: After viral RNA extraction from clinical samples collected during the COVID-19 pandemic, next-generation sequencing was performed using the NextSeq and Nanopore platforms. The sequencing data were analyzed and compared with reference sequences.
Results: In Iran, the V and L clades were detected during the first wave. The second wave was characterized by the G, GH, and GR clades. The circulating clades during the third wave were GH and GR. In the fourth wave, GRY (alpha variant), GK (delta variant), and one GH clade (beta variant) were detected. All viruses in the fifth wave belonged to the GK clade (delta variant). In the sixth wave, the Omicron variant (GRA clade) was circulating.
Conclusions: Genome sequencing, a key strategy in genomic surveillance systems, helps to detect and monitor the prevalence of SARS-CoV-2 variants, follow the viral evolution of SARS-CoV-2, identify new variants for disease prevention, control, and treatment, and provide information to guide public health measures in this area. With this system, Iran could be ready for surveillance of other respiratory virus diseases besides influenza and SARS-CoV-2.
Affiliation(s)
- Kaveh Sadeghi
  - Virology Department, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran
- Sevrin Zadheidar
  - Virology Department, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran
- Arghavan Zebardast
  - Virology Department, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran
- Ahmad Nejati
  - Virology Department, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran
- Marziyeh Faraji
  - Virology Department, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran
- Nastaran Ghavami
  - Virology Department, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran
- Shirin Kalantari
  - Virology Department, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran
- Vahid Salimi
  - Virology Department, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran
- Jila Yavarian
  - Virology Department, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran
  - Research Center for Antibiotic Stewardship & Antimicrobial Resistance, Tehran University of Medical Sciences, Tehran, Iran
- Adel Abedi
  - Mathematics Department, Shahid Beheshti University, Tehran, Iran
- Talat Mokhtari‐Azad
  - Virology Department, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran
5
Basu S, Plewczynski D. Computational methods and strategies for combating COVID-19. Methods 2022; 206:99-100. [PMID: 36028161; PMCID: PMC9398558; DOI: 10.1016/j.ymeth.2022.08.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022] Open
Affiliation(s)
- Subhadip Basu
  - Computer Science & Engineering Department, Jadavpur University, Kolkata 700032, India
- Dariusz Plewczynski
  - Centre of New Technologies, University of Warsaw, Warsaw, Poland; Faculty of Mathematics and Information Sciences, Warsaw University of Technology, Warsaw, Poland
6
Santoni D, Ghosh N, Saha I. An entropy-based study on mutational trajectory of SARS-CoV-2 in India. Infection, Genetics and Evolution: Journal of Molecular Epidemiology and Evolutionary Genetics in Infectious Diseases 2022; 97:105154. [PMID: 34808395; PMCID: PMC8603812; DOI: 10.1016/j.meegid.2021.105154] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/01/2021] [Revised: 11/17/2021] [Accepted: 11/17/2021] [Indexed: 01/20/2023]
Abstract
The COVID-19 pandemic has been haunting us for almost two years. Although the vaccination drive is in full swing throughout the world, different mutations of the SARS-CoV-2 virus are making it very difficult to put an end to the pandemic. The second wave in India, one of the countries worst affected by this pandemic, can be attributed mainly to the Delta variant, i.e. B.1.617.2. Thus, it is very important to analyse and understand the mutational trajectory of SARS-CoV-2 through the study of the 26 virus proteins. To this end, more than 17,000 protein sequences of Indian SARS-CoV-2 genomes are analysed using an entropy-based approach to find the monthly mutational trajectory. Furthermore, the Hellinger distance is used to show the difference in mutation events between consecutive months for each of the 26 SARS-CoV-2 proteins. The results show that the mutation rates and mutation events of the viral proteins, though changing in the initial months, start stabilizing later on, mainly for the four structural proteins, while the non-structural proteins mostly exhibit a more constant trend. As a consequence, it can be inferred that the emergence of new mutational configurations will eventually decline.
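The two quantities named in this abstract, entropy and Hellinger distance, can be sketched generically in Python. The abstract does not specify how the authors computed them, so everything here is an assumption: `shannon_entropy` measures variability of the symbols observed at one sequence position, and `hellinger` compares two discrete probability distributions given as dictionaries (for example, distributions of mutation events in two consecutive months). Both function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy (in bits) of the symbols observed at one position.

    Illustrative only: 0.0 means the position is fully conserved.
    """
    counts = Counter(symbols)
    n = sum(counts.values())
    # p * log2(1/p) summed over the observed symbols.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def hellinger(p, q):
    """Hellinger distance between two discrete distributions.

    p and q map outcomes to probabilities; missing outcomes count as 0.
    The result lies between 0 (identical) and 1 (disjoint support).
    """
    keys = set(p) | set(q)
    squared = sum(
        (math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2
        for k in keys
    )
    return math.sqrt(squared) / math.sqrt(2)
```

Under these assumptions, a conserved position gives an entropy of zero and a position split evenly between two residues gives one bit, while a Hellinger distance that shrinks between consecutive months would correspond to the stabilization the abstract describes.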
Affiliation(s)
- Daniele Santoni
  - Institute for System Analysis and Computer Science "Antonio Ruberti", National Research Council of Italy, Via dei Taurini 19, Rome 00185, Italy
- Nimisha Ghosh
  - Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland; Department of Computer Science and Information Technology, Institute of Technical Education and Research, Siksha 'O' Anusandhan (Deemed to be University), Bhubaneswar, Odisha, India
- Indrajit Saha
  - Department of Computer Science and Engineering, National Institute of Technical Teachers' Training and Research, Kolkata, West Bengal, India