1. Ahsan R, Ebrahimi F, Ebrahimi M. Classification of imbalanced protein sequences with deep-learning approaches; application on influenza A imbalanced virus classes. Informatics in Medicine Unlocked 2022. [DOI: 10.1016/j.imu.2022.100860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
2. Ahsan R, Tahsili MR, Ebrahimi F, Ebrahimie E, Ebrahimi M. Image processing unravels the evolutionary pattern of SARS-CoV-2 against SARS and MERS through position-based pattern recognition. Comput Biol Med 2021; 134:104471. [PMID: 34004573 PMCID: PMC8106241 DOI: 10.1016/j.compbiomed.2021.104471] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/21/2021] [Revised: 04/27/2021] [Accepted: 05/02/2021] [Indexed: 12/16/2022]
Abstract
SARS-CoV-2, the severe acute respiratory syndrome (SARS) coronavirus, and the Middle East respiratory syndrome-related coronavirus (MERS) all belong to the Coronaviridae family; the first caused a global pandemic (with a low mortality rate), while the other two were confined to limited regions (with high mortality rates). To investigate possible structural differences among the three viruses at a basic level, genomic and proteomic sequences were downloaded and converted to polynomial datasets. Seven attribute weighting (feature selection) models were employed to find the key differences in their genomic nucleotide sequences. Most attribute weighting models selected the terminal stretch of the genome (from nucleotide position 29,000 to the end) as significantly different among the three virus classes. The genome and proteome sequences of this hot zone (which corresponds to the 3'UTR region and encodes the nucleoprotein (N)) and the Spike (S) protein sequences (as the most important viral protein) were converted into binary images and analyzed by image processing techniques and a deep convolutional neural network (CNN). Although the predictive accuracy of the CNN for Spike (S) proteins was low (0.48%), the machine learning algorithms were able to classify the three members of the Coronaviridae with 100% accuracy based on the 3'UTR region. This is the first report of a relationship between possible sequence-level structural differences among coronaviruses and their pathogenesis, which paves the way to deciphering the high pathogenicity of SARS-CoV-2.
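The abstract does not say how sequences were turned into binary images, so the following is only an illustrative sketch of one common choice, a one-hot encoding with one row per symbol and one column per sequence position. The function name `sequence_to_binary_image`, the four-letter alphabet, and the toy sequence are assumptions made for this example and are not taken from the paper.

```python
# Hypothetical one-hot encoding; the paper's actual image construction is not given in the abstract.
ALPHABET = "ACGT"  # assumed nucleotide alphabet


def sequence_to_binary_image(seq, alphabet=ALPHABET):
    """Return a 0/1 grid with len(alphabet) rows and len(seq) columns.

    A cell is 1 when the symbol of that row occurs at that position.
    Symbols outside the alphabet (for example N) leave their column all zeros.
    """
    row_of = {symbol: row for row, symbol in enumerate(alphabet)}
    image = [[0] * len(seq) for _ in alphabet]
    for position, symbol in enumerate(seq.upper()):
        row = row_of.get(symbol)
        if row is not None:
            image[row][position] = 1
    return image


# Toy input, not real viral sequence data.
toy_image = sequence_to_binary_image("ACGTN")
for row in toy_image:
    print("".join(str(cell) for cell in row))
```

A grid like this can be passed to an image classifier as a single-channel image; for protein sequences the same function would work with a 20-letter amino acid alphabet passed as `alphabet`.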
Affiliation(s)
- Reza Ahsan: Department of Computer Engineering, Qom Branch, Islamic Azad University, Qom, Iran
- Faezeh Ebrahimi: Faculty of Life Sciences and Biotechnology, Department of Microbiology and Microbial Biotechnology, Shahid Beheshti University, Tehran, Iran
- Esmaeil Ebrahimie: Genomics Research Platform, School of Life Sciences, College of Science, Health and Engineering, La Trobe University, Melbourne, Victoria, 3086, Australia; School of Animal and Veterinary Sciences, The University of Adelaide, South Australia, 5371, Australia
- Mansour Ebrahimi (corresponding author): Department of Biology, School of Basic Sciences, University of Qom, Qom, Iran; School of Animal and Veterinary Sciences, The University of Adelaide, South Australia, 5371, Australia