1
Han S, Liu L. GP-HTNLoc: A graph prototype head-tail network-based model for multi-label subcellular localization prediction of ncRNAs. Comput Struct Biotechnol J 2024; 23:2034-2048. [PMID: 38765609 PMCID: PMC11101938 DOI: 10.1016/j.csbj.2024.04.052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Revised: 04/17/2024] [Accepted: 04/18/2024] [Indexed: 05/22/2024]
Abstract
Numerous studies have demonstrated that understanding the subcellular localization of non-coding RNAs (ncRNAs) is pivotal in elucidating their roles and regulatory mechanisms in cells. Although more than ten computational models are dedicated to predicting the subcellular localization of ncRNAs, most of them are designed solely for single-label prediction. In reality, ncRNAs often localize to multiple subcellular compartments. Furthermore, the existing multi-label localization prediction models do not adequately address the scarcity of training samples and the class imbalance in ncRNA datasets. To address these limitations, this study proposes a novel multi-label localization prediction model for ncRNAs, named GP-HTNLoc. To mitigate class imbalance, GP-HTNLoc adopts separate training approaches for head and tail location labels. Additionally, GP-HTNLoc introduces a graph prototype module to enhance its performance in small-sample, multi-label scenarios. Experimental results based on 10-fold cross-validation on benchmark datasets demonstrate that GP-HTNLoc achieves competitive predictive performance. The average results from 10 rounds of testing on an independent dataset show that GP-HTNLoc outperforms the best existing models on the human lncRNA, human snoRNA, and human miRNA subsets, with average precision improvements of 31.5%, 14.2%, and 5.6%, respectively, reaching 0.685, 0.632, and 0.704. A user-friendly online GP-HTNLoc server is accessible at https://56s8y85390.goho.co.
Affiliation(s)
- Shuangkai Han
- School of Information, Yunnan Normal University, Kunming, China
- Engineering Research Center of Computer Vision and Intelligent Control Technology, Department of Education of Yunnan Province, China
- Lin Liu
- School of Information, Yunnan Normal University, Kunming, China
- Engineering Research Center of Computer Vision and Intelligent Control Technology, Department of Education of Yunnan Province, China
2
Dixit H, Upadhyay V, Kulharia M, Verma SK. The Study of Metalloproteome of DNA Viruses: Identification, Functional Annotation, and Diversity Analysis of Viral Metal-Binding Proteins. J Proteome Res 2024; 23:4014-4026. [PMID: 39134029 DOI: 10.1021/acs.jproteome.4c00358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/07/2024]
Abstract
Metalloproteins are fundamental to diverse biological processes but still lack extensive investigation in viral contexts. This study reveals the prevalence and functional diversity of metal-binding proteins in DNA viruses. Among a subset of 1432 metalloproteins, zinc- and magnesium-binding proteins are notably abundant, indicating their importance in viral biology. Furthermore, significant numbers of proteins binding to iron, manganese, copper, nickel, mercury, and cadmium were also detected. Human-infecting viral proteins displayed a rich landscape of metalloproteins, with MeBiPred (964 proteins) and Pfam (666) yielding the highest numbers. Interestingly, many essential viral proteins exhibited metal-binding capabilities, including polymerases, DNA-binding proteins, helicases, dUTPase, thymidine kinase, and various structural and accessory proteins. This study sheds light on the ubiquitous presence of metalloproteins, their functional signatures, subcellular placements, and metal-utilization patterns, providing valuable insights into viral biology. Functionally similar proteins showed similar metal-utilization patterns across the various DNA viruses. Furthermore, these findings provide a foundation for identifying potential drug targets for combating viral infections.
Affiliation(s)
- Himisha Dixit
- Centre for Computational Biology & Bioinformatics, Central University of Himachal Pradesh, Kangra, 176206, Himachal Pradesh, India
- Vipin Upadhyay
- Centre for Computational Biology & Bioinformatics, Central University of Himachal Pradesh, Kangra, 176206, Himachal Pradesh, India
- Mahesh Kulharia
- Centre for Computational Biology & Bioinformatics, Central University of Himachal Pradesh, Kangra, 176206, Himachal Pradesh, India
3
Biswas R, Swetha RG, Basu S, Roy A, Ramaiah S, Anbarasu A. Designing multi-epitope vaccine against human cytomegalovirus integrating pan-genome and reverse vaccinology pipelines. Biologicals 2024; 87:101782. [PMID: 39003966 DOI: 10.1016/j.biologicals.2024.101782] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Revised: 05/13/2024] [Accepted: 07/08/2024] [Indexed: 07/16/2024]
Abstract
Human cytomegalovirus (HCMV) is responsible for high morbidity in neonates and immunosuppressed individuals. Due to the high genetic variability of HCMV, current prophylactic measures are insufficient. In this study, we employed a pan-genome and reverse vaccinology approach to screen targets for efficient vaccine candidates. Four proteins, envelope glycoprotein M, UL41A, US23, and US28, were shortlisted based on cellular localization, high solubility, antigenicity, and immunogenicity. A total of 29 B-cell and 44 T-cell highly immunogenic and antigenic epitopes with high global population coverage were finalized using immunoinformatics tools and algorithms. Further, the epitopes that overlapped between the finalized B-cell and T-cell sets were joined with suitable linkers to form various combinations of multi-epitope vaccine constructs. Among 16 vaccine constructs, Vc12 was selected based on physicochemical and structural properties. Docking and molecular simulations of Vc12 showed high binding affinity (-23.35 kcal/mol) towards TLR4, owing to intermolecular hydrogen bonds, salt bridges, and hydrophobic interactions, with only minimal fluctuations. Furthermore, Vc12, which elicited a good immune response, was checked for expression in Escherichia coli through in silico cloning and codon optimization, suggesting that it is a potent vaccine candidate.
Affiliation(s)
- Rhitam Biswas
- Medical and Biological Computing Laboratory, School of Biosciences and Technology (SBST), Vellore Institute of Technology (VIT), Vellore, 632014, Tamil Nadu, India; Department of Biotechnology, SBST, VIT, Vellore, 632014, Tamil Nadu, India
- Rayapadi G Swetha
- Medical and Biological Computing Laboratory, School of Biosciences and Technology (SBST), Vellore Institute of Technology (VIT), Vellore, 632014, Tamil Nadu, India; Department of Biosciences, SBST, VIT, Vellore, 632014, Tamil Nadu, India
- Soumya Basu
- Department of Biotechnology, NIST University, Berhampur, 761008, Odisha, India
- Aditi Roy
- Medical and Biological Computing Laboratory, School of Biosciences and Technology (SBST), Vellore Institute of Technology (VIT), Vellore, 632014, Tamil Nadu, India; Department of Biotechnology, SBST, VIT, Vellore, 632014, Tamil Nadu, India
- Sudha Ramaiah
- Medical and Biological Computing Laboratory, School of Biosciences and Technology (SBST), Vellore Institute of Technology (VIT), Vellore, 632014, Tamil Nadu, India; Department of Biosciences, SBST, VIT, Vellore, 632014, Tamil Nadu, India
- Anand Anbarasu
- Medical and Biological Computing Laboratory, School of Biosciences and Technology (SBST), Vellore Institute of Technology (VIT), Vellore, 632014, Tamil Nadu, India; Department of Biotechnology, SBST, VIT, Vellore, 632014, Tamil Nadu, India
4
Roy A, Swetha RG, Basu S, Biswas R, Ramaiah S, Anbarasu A. Integrating pan-genome and reverse vaccinology to design multi-epitope vaccine against Herpes simplex virus type-1. 3 Biotech 2024; 14:176. [PMID: 38855144 PMCID: PMC11153438 DOI: 10.1007/s13205-024-04022-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2024] [Accepted: 05/27/2024] [Indexed: 06/11/2024]
Abstract
Herpes simplex virus type-1 (HSV-1), the etiological agent of sporadic encephalitis and recurring oral (sometimes genital) infections in humans, affects millions each year. The evolving viral genome reduces susceptibility to existing antivirals and thus necessitates new therapeutic strategies. Immunoinformatics strategies have shown promise in designing novel vaccine candidates in the absence of a clinically licensed vaccine to prevent HSV-1. To encourage clinical translation, the HSV-1 pan-genome was integrated with the reverse-vaccinology pipeline for rigorous screening of universal vaccine candidates. Viral targets were screened from 104 available complete genomes. Among 364 proteins, envelope glycoprotein D, an outer membrane protein with a high antigenicity score (> 0.4) and solubility (> 0.6), was selected for epitope screening. A total of 17 T-cell and 4 B-cell epitopes with highly antigenic, immunogenic, and non-toxic properties and high global population coverage were identified. Furthermore, 8 vaccine constructs were designed using different combinations of epitopes and suitable linkers. VC-8 was identified as the most promising vaccine candidate in terms of chemical and structural stability. Molecular docking revealed high interactive affinity (low binding energy: -56.25 kcal/mol) of VC-8 with the target, mediated by firm intermolecular H-bonds, salt bridges, and hydrophobic interactions, which was validated with simulations. Compatibility of the vaccine candidate for expression in the pET-29a(+) plasmid was established by in silico cloning studies. Immune simulations confirmed the potential of VC-8 to trigger robust B-cell, T-cell, cytokine, and antibody-mediated responses, thereby suggesting a promising candidate for future HSV-1 prevention. Supplementary Information: The online version contains supplementary material available at 10.1007/s13205-024-04022-6.
Affiliation(s)
- Aditi Roy
- Medical and Biological Computing Laboratory, School of Biosciences and Technology (SBST), Vellore Institute of Technology (VIT), Vellore, Tamil Nadu 632014 India
- Department of Biotechnology, SBST, VIT, Vellore, Tamil Nadu 632014 India
- Rayapadi G. Swetha
- Medical and Biological Computing Laboratory, School of Biosciences and Technology (SBST), Vellore Institute of Technology (VIT), Vellore, Tamil Nadu 632014 India
- Department of Biosciences, SBST, VIT, Vellore, Tamil Nadu 632014 India
- Soumya Basu
- Department of Biotechnology, NIST University, Berhampur, Odisha 761008 India
- Rhitam Biswas
- Medical and Biological Computing Laboratory, School of Biosciences and Technology (SBST), Vellore Institute of Technology (VIT), Vellore, Tamil Nadu 632014 India
- Department of Biotechnology, SBST, VIT, Vellore, Tamil Nadu 632014 India
- Sudha Ramaiah
- Medical and Biological Computing Laboratory, School of Biosciences and Technology (SBST), Vellore Institute of Technology (VIT), Vellore, Tamil Nadu 632014 India
- Department of Biosciences, SBST, VIT, Vellore, Tamil Nadu 632014 India
- Anand Anbarasu
- Medical and Biological Computing Laboratory, School of Biosciences and Technology (SBST), Vellore Institute of Technology (VIT), Vellore, Tamil Nadu 632014 India
- Department of Biotechnology, SBST, VIT, Vellore, Tamil Nadu 632014 India
5
Praveen M. Characterizing the West Nile Virus's polyprotein from nucleotide sequence to protein structure - Computational tools. J Taibah Univ Med Sci 2024; 19:338-350. [PMID: 38304694 PMCID: PMC10831166 DOI: 10.1016/j.jtumed.2024.01.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Revised: 11/27/2023] [Accepted: 01/08/2024] [Indexed: 02/03/2024]
Abstract
Objectives: West Nile virus (WNV) belongs to the Flaviviridae family and causes West Nile fever. Transmission involves Culex mosquito species. Infected individuals are primarily asymptomatic, and few exhibit common symptoms. Moreover, 10% of neuronal infections caused by this virus are fatal. The proteins encoded by its genes had been uncharacterized, although understanding their function and structure is important for formulating antiviral drugs. Methods: Herein, we used in silico approaches, including various bioinformatic tools and databases, to analyse the proteins from the WNV polyprotein individually. The characterization included GC content, physicochemical properties, conserved domains, soluble and transmembrane regions, signal localization, protein disorder, secondary structure features, and the respective 3D protein structures. Results: Among 11 proteins, eight had >50% GC content, eight had basic pI values, three were unstable under in vitro conditions, four were thermostable according to aliphatic index (AI) values >100, and some had negative GRAVY values in physicochemical analyses. All conserved protein domains were shared among Flaviviridae family members. Five proteins were soluble and lacked transmembrane regions. Two proteins had signals for localization in the host endoplasmic reticulum. Non-structural (NS) 2A showed low protein disorder. The secondary structural features and tertiary structure models provide a valuable biochemical resource for designing selective substrates and synthetic inhibitors. Conclusions: WNV proteins NS2A, NS2B, PM, NS3 and NS5 can be used as drug targets for the pharmacological design of lead antiviral compounds.
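As an illustration of the first characterization step named in this abstract, GC content can be computed directly from a coding sequence. The sketch below is a minimal, hypothetical example: the function name `gc_content` and the test sequence are assumptions for illustration and are not taken from the paper, which does not state which tool it used for this step.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    # Count only unambiguous bases so that N or gap characters do not skew the result.
    valid = [base for base in seq if base in "ACGTU"]
    if not valid:
        raise ValueError("sequence contains no A/C/G/T/U bases")
    gc = sum(1 for base in valid if base in "GC")
    return 100.0 * gc / len(valid)


# Hypothetical 12-nt sequence, not a WNV gene.
example = "ATGGCGCCTTAA"
print(f"{gc_content(example):.1f}% GC")
```

A protein-coding region with a value above 50 from this function would fall into the ">50% GC content" group reported in the abstract.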
Affiliation(s)
- Mallari Praveen
- Department of Zoology, Indira Gandhi National Tribal University, Amarkantak, Madhya Pradesh, India
6
Xiao H, Zou Y, Wang J, Wan S. A Review for Artificial Intelligence Based Protein Subcellular Localization. Biomolecules 2024; 14:409. [PMID: 38672426 PMCID: PMC11048326 DOI: 10.3390/biom14040409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Revised: 03/21/2024] [Accepted: 03/25/2024] [Indexed: 04/28/2024]
Abstract
Proteins need to be located in appropriate spatiotemporal contexts to carry out their diverse biological functions. Mislocalized proteins may lead to a broad range of diseases, such as cancer and Alzheimer's disease. Knowing where a target protein resides within a cell will give insights into tailored drug design for a disease. As the gold validation standard, the conventional wet lab uses fluorescent microscopy imaging, immunoelectron microscopy, and fluorescent biomarker tags for protein subcellular location identification. However, the booming era of proteomics and high-throughput sequencing generates tons of newly discovered proteins, making protein subcellular localization by wet-lab experiments a mission impossible. To tackle this concern, in the past decades, artificial intelligence (AI) and machine learning (ML), especially deep learning methods, have made significant progress in this research area. In this article, we review the latest advances in AI-based method development in three typical types of approaches, including sequence-based, knowledge-based, and image-based methods. We also elaborately discuss existing challenges and future directions in AI-based method development in this research field.
Affiliation(s)
- Hanyu Xiao
- Department of Genetics, Cell Biology and Anatomy, College of Medicine, University of Nebraska Medical Center, Omaha, NE 68198, USA
- Yijin Zou
- College of Veterinary Medicine, China Agricultural University, Beijing 100193, China
- Jieqiong Wang
- Department of Neurological Sciences, College of Medicine, University of Nebraska Medical Center, Omaha, NE 68198, USA
- Shibiao Wan
- Department of Genetics, Cell Biology and Anatomy, College of Medicine, University of Nebraska Medical Center, Omaha, NE 68198, USA
7
Wang C, Wang Y, Ding P, Li S, Yu X, Yu B. ML-FGAT: Identification of multi-label protein subcellular localization by interpretable graph attention networks and feature-generative adversarial networks. Comput Biol Med 2024; 170:107944. [PMID: 38215617 DOI: 10.1016/j.compbiomed.2024.107944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 12/08/2023] [Accepted: 01/01/2024] [Indexed: 01/14/2024]
Abstract
The prediction of multi-label protein subcellular localization (SCL) is a pivotal area in bioinformatics research. Recent advancements in protein structure research have facilitated the application of graph neural networks. This paper introduces a novel approach termed ML-FGAT. The approach begins by extracting node information of proteins from sequence data, physical-chemical properties, evolutionary insights, and structural details. Subsequently, various evolutionary techniques are integrated to consolidate multi-view information. A linear discriminant analysis framework, grounded on entropy weight, is then employed to reduce the dimensionality of the merged features. To enhance the robustness of the model, the training dataset is augmented using feature-generative adversarial networks. For the primary prediction step, graph attention networks are employed to determine multi-label protein SCL, leveraging both node and neighboring information. The interpretability is enhanced by analyzing the attention weight parameters. The training is based on the Gram-positive bacteria dataset, while validation employs newly constructed datasets: human, virus, Gram-negative bacteria, plant, and SARS-CoV-2. Following a leave-one-out cross-validation procedure, ML-FGAT demonstrates noteworthy superiority in this domain.
Affiliation(s)
- Congjing Wang
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; School of Data Science, Qingdao University of Science and Technology, Qingdao, 266061, China
- Yifei Wang
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; School of Data Science, Qingdao University of Science and Technology, Qingdao, 266061, China
- Pengju Ding
- College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao, 266061, China
- Shan Li
- School of Mathematics and Statistics, Central South University, Changsha, 410083, China
- Xu Yu
- Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum, Qingdao, 266580, China
- Bin Yu
- School of Data Science, Qingdao University of Science and Technology, Qingdao, 266061, China; School of Data Science, University of Science and Technology of China, Hefei, 230027, China
8
Hossain MI, Asha AT, Hossain MA, Mahmud S, Chowdhury K, Mohiuddin RB, Nahar N, Sarker S, Napis S, Hossain MS, Mohiuddin A. Investigating the role of hypothetical protein (AAB33144.1) in HIV-1 virus pathogenicity: A comparative study with FDA-Approved inhibitor compounds through In silico analysis and molecular docking. Heliyon 2024; 10:e23183. [PMID: 38163140 PMCID: PMC10755284 DOI: 10.1016/j.heliyon.2023.e23183] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2023] [Accepted: 11/28/2023] [Indexed: 01/03/2024]
Abstract
Aim and objective: Because many proteins in HIV-1 remain unexplored, this research aimed to explore the functional roles of a hypothetical protein (AAB33144.1) that might play a key role in HIV-1 pathogenicity. Methods: The homologous protein was identified, and its 3D structure was built and validated, using several bioinformatics tools. Results: Retroviral aspartyl protease and retropepsin-like functional domains and motifs, the folding pattern (cupredoxins), and subcellular localization in the cytoplasmic membrane were determined as indicators of biological activity. In addition, the functional annotation revealed that the chosen hypothetical protein possessed protease-like activity. To validate our generated protein 3D structure, molecular docking was performed with five compounds, of which nelfinavir showed the best binding affinity (-8.2 kcal/mol) against the HXB2 viral protease (PDB ID: 7SJX) and main protease (PDB ID: 4EYR) proteins. Conclusions: This study suggests that the annotated hypothetical protein is related to protease action, which may be useful in viral genetics and drug discovery.
Affiliation(s)
- Md. Imran Hossain
- Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Tangail, 1902, Bangladesh
- Anika Tabassum Asha
- Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Tangail, 1902, Bangladesh
- Md. Arju Hossain
- Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Tangail, 1902, Bangladesh
- Shahin Mahmud
- Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Tangail, 1902, Bangladesh
- Kamal Chowdhury
- Biology Department, Claflin University, 400 Magnolia St, Orangeburg, SC 29115, USA
- Ramisa Binti Mohiuddin
- Department of Pharmacy, Mawlana Bhashani Science and Technology University, Tangail, 1902, Bangladesh
- Nazneen Nahar
- Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Tangail, 1902, Bangladesh
- Saborni Sarker
- Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Tangail, 1902, Bangladesh
- Suhaimi Napis
- Department of Cell and Molecular Biology, Faculty of Biotechnology and Biomolecular Sciences, Universiti Putra Malaysia, 43400 Serdang, Selangor D.E., Malaysia
- Md Sanower Hossain
- Centre for Sustainability of Mineral and Resource Recovery Technology (Pusat SMaRRT), Universiti Malaysia Pahang Al-Sultan Abdullah, Kuantan 26300, Malaysia
- A.K.M. Mohiuddin
- Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Tangail, 1902, Bangladesh
9
Khan S, Irfan M, Hameed AR, Ullah A, Abideen SA, Ahmad S, Haq MU, El Bakri Y, Al-Harbi AI, Ali M, Haleem A. Vaccinomics to design a multi-epitope-based vaccine against monkeypox virus using surface-associated proteins. J Biomol Struct Dyn 2023; 41:10859-10868. [PMID: 36533379 DOI: 10.1080/07391102.2022.2158942] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 10/02/2022] [Accepted: 12/10/2022] [Indexed: 12/23/2022]
Abstract
In 2022, the ongoing multi-country outbreak of monkeypox virus, now also occurring outside Africa, is a global health concern. Monkeypox is a zoonotic virus that causes disease mainly in animals and is then transferred to humans. In the recent monkeypox epidemic, a large number of human cases emerged while the global health community worked to tackle the outbreak and save lives. Herein, a multi-epitope-based vaccine is designed against monkeypox virus using two surface-associated proteins: MPXVgp002 (accession number YP_010377003.1) and MPXVgp008 (accession number YP_010377007.1). These proteins were used for B- and T-cell epitope prediction. The epitopes were further screened, and the filter retained KCKDNEYRSR, RSCNTTHNR, and RTRRETGAS, with antigenicity scores of 0.5279, 0.5604, and 0.7628, respectively. Overall, the epitopes can induce immunity in 99.74% of the world population. Further, GPGPG linkers were used to join the epitopes, and an EAAAK linker was used for adjuvant attachment. The three-dimensional structure of the construct was modelled with a view to retaining structural stability. Three pairs of amino acid residues able to form disulfide bonds were chosen: Gly1-Ser82, Cys7-Tyr10, and Phe51-Ile55. Molecular docking of the vaccine was performed with the immune cell receptors toll-like receptor (TLR) 2, 3, 4, and 8. The docking results indicated that the vaccine is a potential candidate molecule owing to its favorable binding affinity with TLR-2, TLR-3, TLR-4, and TLR-8. The top docked complex for each receptor, selected by the lowest energy score (-888.7 kcal/mol for TLR-2, -976.3 kcal/mol for TLR-3, -801.9 kcal/mol for TLR-4, and -955.4 kcal/mol for TLR-8), was subjected to simulation. The docked complexes were evaluated in 500 ns of MD simulation. Throughout the simulation time, no significant deviation occurred. This confirmed the vaccine as a potential candidate for interacting with immune cell receptors. This interaction is important for immune system activation. In conclusion, the proposed vaccine construct against monkeypox could induce an effective immune response and speed up the vaccine development process. However, the study is based entirely on a computational approach; hence, experimental validation is required. Communicated by Ramaswamy H. Sarma.
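The construct assembly described in this abstract (epitopes joined by GPGPG linkers, adjuvant attached through an EAAAK linker) can be sketched as simple string concatenation. The example below is illustrative only: the function name `build_construct`, the order adjuvant-linker-epitopes, and the placeholder adjuvant `MKKK` are assumptions, since the abstract gives neither the adjuvant sequence nor the exact layout. Only the three epitopes and the two linker sequences come from the abstract.

```python
def build_construct(adjuvant, epitopes,
                    adjuvant_linker="EAAAK", epitope_linker="GPGPG"):
    """Concatenate an adjuvant and a list of epitopes into one peptide string.

    Assumed layout (not confirmed by the abstract):
    adjuvant + EAAAK + epitope1 + GPGPG + epitope2 + GPGPG + ...
    """
    if not epitopes:
        raise ValueError("at least one epitope is required")
    return adjuvant + adjuvant_linker + epitope_linker.join(epitopes)


# Epitopes as listed in the abstract.
epitopes = ["KCKDNEYRSR", "RSCNTTHNR", "RTRRETGAS"]

# "MKKK" is a hypothetical placeholder, not the adjuvant used in the study.
construct = build_construct("MKKK", epitopes)
print(construct)
print(len(construct), "residues")
```

Keeping the linkers as parameters makes it straightforward to compare alternative linker choices when exploring different construct designs.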
Affiliation(s)
- Saifullah Khan
- Institute of Biotechnology and Microbiology, Bacha Khan University, Charsadda, Pakistan
- Muhammad Irfan
- Department of Oral Biology, College of Dentistry, University of Florida, Gainesville, Florida, USA
- Alaa R Hameed
- Department of Medical Laboratory Techniques, School of Life Sciences, Dijlah University College, Baghdad, Iraq
- Asad Ullah
- Department of Health and Biological Sciences, Abasyn University, Peshawar, Pakistan
- Syed Ainul Abideen
- School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
- Sajjad Ahmad
- Department of Health and Biological Sciences, Abasyn University, Peshawar, Pakistan
- Mahboob Ul Haq
- Department of Pharmacy, Abasyn University, Peshawar, Pakistan
- Youness El Bakri
- Department of Theoretical and Applied Chemistry, South Ural State University, Chelyabinsk, Russian Federation
- Alhanouf I Al-Harbi
- Department of Medical Laboratory, College of Applied Medical Sciences, Taibah University, Yanbu, Saudi Arabia
- Mahwish Ali
- Department of Biological Science, National University of Medical Sciences, Rawalpindi, Pakistan
- Abdul Haleem
- Department of Microbiology, Quaid-i-Azam University, Islamabad, Pakistan
10
Beltrán JF, Belén LH, Farias JG, Zamorano M, Lefin N, Miranda J, Parraguez-Contreras F. VirusHound-I: prediction of viral proteins involved in the evasion of host adaptive immune response using the random forest algorithm and generative adversarial network for data augmentation. Brief Bioinform 2023; 25:bbad434. [PMID: 38033292 PMCID: PMC10753651 DOI: 10.1093/bib/bbad434] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/25/2023] [Revised: 10/18/2023] [Accepted: 11/05/2023] [Indexed: 12/02/2023]
Abstract
Throughout evolution, pathogenic viruses have developed different strategies to evade the response of the adaptive immune system. To carry out successful replication, some pathogenic viruses encode different proteins that manipulate the molecular mechanisms of host cells. Currently, there are different bioinformatics tools for virus research; however, none of them focus on predicting viral proteins that evade the adaptive immune system. In this work, we have developed a novel tool, named VirusHound-I, based on machine and deep learning for predicting this type of viral protein. This tool is based on a model developed with the multilayer perceptron algorithm using the dipeptide composition molecular descriptor. In this study, we have also demonstrated the robustness of our strategy for data augmentation of the positive dataset based on generative adversarial networks. During the 10-fold cross-validation step in the training dataset, the predictive model showed 0.947 accuracy, 0.994 precision, 0.943 F1 score, 0.995 specificity, 0.896 sensitivity, 0.894 kappa, 0.898 Matthews correlation coefficient and 0.989 AUC. During the testing step, the model showed 0.964 accuracy, 1.0 precision, 0.967 F1 score, 1.0 specificity, 0.936 sensitivity, 0.929 kappa, 0.931 Matthews correlation coefficient and 1.0 AUC. Taking this model into account, we have developed a tool called VirusHound-I that makes it possible to predict viral proteins that evade the host's adaptive immune system. We believe that VirusHound-I can be very useful in accelerating studies on the molecular mechanisms of evasion of pathogenic viruses, as well as in the discovery of therapeutic targets.
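The dipeptide composition descriptor mentioned in this abstract is conventionally a 400-dimensional vector giving the relative frequency of each ordered pair of the 20 standard amino acids in a protein sequence. The sketch below shows that general definition. It is not the VirusHound-I implementation: the names `dipeptide_composition` and `DIPEPTIDES`, the alphabetical ordering of features, and the choice to skip pairs containing non-standard residues are assumptions made for illustration.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs, in a fixed order so every protein maps to the same feature layout.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq):
    """Return the 400 dipeptide frequencies of a protein sequence as a list of floats."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    # Slide a window of width 2 over the sequence (overlapping pairs).
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # assumption: pairs with non-standard residues are skipped
            counts[pair] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence has no dipeptide made of standard amino acids")
    return [counts[d] / total for d in DIPEPTIDES]


# Hypothetical toy sequence: contains the pairs AC, CA, AC.
features = dipeptide_composition("ACAC")
print(len(features))
print(features[DIPEPTIDES.index("AC")], features[DIPEPTIDES.index("CA")])
```

A fixed-length vector like this can be passed to any standard classifier regardless of the length of the input protein.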
Affiliation(s)
- Jorge F Beltrán
- Department of Chemical Engineering, Faculty of Engineering and Science, Universidad de La Frontera, Ave. Francisco Salazar 01145, Temuco, Chile
- Jorge G Farias
- Department of Chemical Engineering, Faculty of Engineering and Science, Universidad de La Frontera, Ave. Francisco Salazar 01145, Temuco, Chile
- Mauricio Zamorano
- Department of Chemical Engineering, Faculty of Engineering and Science, Universidad de La Frontera, Ave. Francisco Salazar 01145, Temuco, Chile
- Nicolás Lefin
- Department of Chemical Engineering, Faculty of Engineering and Science, Universidad de La Frontera, Ave. Francisco Salazar 01145, Temuco, Chile
- Javiera Miranda
- Department of Chemical Engineering, Faculty of Engineering and Science, Universidad de La Frontera, Ave. Francisco Salazar 01145, Temuco, Chile
- Fernanda Parraguez-Contreras
- Department of Chemical Engineering, Faculty of Engineering and Science, Universidad de La Frontera, Ave. Francisco Salazar 01145, Temuco, Chile
11
Muhammad SA, Guo J, Noor K, Mustafa A, Amjad A, Bai B. Pangenomic and immunoinformatics based analysis of Nipah virus revealed CD4 + and CD8 + T-Cell epitopes as potential vaccine candidates. Front Pharmacol 2023; 14:1290436. [PMID: 38035008 PMCID: PMC10682379 DOI: 10.3389/fphar.2023.1290436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Accepted: 10/31/2023] [Indexed: 12/02/2023]
Abstract
Introduction: Nipah virus (NiV) is a deadly zoonotic bat-borne virus that causes neurological and respiratory infections, which ultimately lead to death. To date, 706 cases have been reported, mainly in Asia, of which 409 were fatal. No vaccine or effective treatment is available for NiV infections, and such strategies must be designed promptly, as the world cannot bear another pandemic. Methods: In this study, we screened viral proteins of NiV strains based on pangenomics analysis, antigenicity, molecular weight, and sub-cellular localization. An immunoproteomics-based approach was used to predict T-cell epitopes of MHC class I and II as potential vaccine candidates. These epitopes are capable of activating CD4+ and CD8+ T cells and T-cell-dependent B lymphocytes. Results: The two surface proteins, fusion glycoprotein (F) and attachment glycoprotein (G), are antigenic, with molecular weights of 60 kDa and 67 kDa, respectively. Three epitopes of F protein (VNYNSEGIA, PNFILVRNT, and IKMIPNVSN) were ranked and selected based on binding affinity with MHC class I molecules, and three epitopes (VILNKRYYS, ILVRNTLIS, and VKLQETAEK) with MHC class II molecules. Similarly, for G protein, three epitopes each for MHC-I (GKYDKVMPY, ILKPKLISY, and KNKIWCISL) and MHC-II (LRNIEKGKY, FLIDRINWI, and FLLKNKIWC) with substantial binding energies were predicted. Based on their physicochemical properties, all these epitopes are non-toxic, hydrophilic, and stable. Conclusion: Our vaccinomics and system-level investigation could help to trigger the host immune system to prevent NiV infection.
Affiliation(s)
- Syed Aun Muhammad
  - Institute of Molecular Biology and Biotechnology, Bahauddin Zakariya University, Multan, Pakistan
- Jinlei Guo
  - School of Intelligent Medical Engineering, Sanquan College of Xinxiang Medical University, Xinxiang, China
- Komal Noor
  - Institute of Molecular Biology and Biotechnology, Bahauddin Zakariya University, Multan, Pakistan
- Aymen Mustafa
  - University of Health Sciences Lahore, Lahore, Pakistan
- Anam Amjad
  - Institute of Molecular Biology and Biotechnology, Bahauddin Zakariya University, Multan, Pakistan
- Baogang Bai
  - School of Information and Technology, Wenzhou Business College, Wenzhou, China
  - Zhejiang Province Engineering Research Center of Intelligent Medicine, Wenzhou, China
  - The 1st School of Medical, School of Information and Engineering, The 1st Affiliated Hospital of Wenzhou Medical University, Wenzhou, China

12
Chaudhuri D, Majumder S, Datta J, Giri K. In silico designing of an epitope-based peptide vaccine cocktail against Nipah virus: an Indian population-based epidemiological study. Arch Microbiol 2023; 205:380. [PMID: 37955744 DOI: 10.1007/s00203-023-03717-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 10/09/2023] [Accepted: 10/21/2023] [Indexed: 11/14/2023]
Abstract
Nipah virus, a zoonotic virus of the family Paramyxoviridae, has caused significant loss of life to date, with the most recent outbreak in India reported in Kerala. The virus has a considerably high mortality rate and lacks characteristic symptoms, which delays its detection. No specific vaccine is available for the virus, although monoclonal antibody treatment has been shown to be effective along with favipiravir. The high mortality and complications caused by the virus underscore the need to develop alternative modes of vaccination. One such method is designed in this study: a peptide cocktail consisting of immunologically important epitopes for use as a vaccine. The human leukocyte antigens used in the study were analyzed for their presence in various ethnic Indian populations. This study may open a new avenue for the development of more efficient peptide cocktail vaccines in the near future, based on population genetics and ethnicity.
Affiliation(s)
- Dwaipayan Chaudhuri
  - Department of Life Sciences, Presidency University, 86/1 College Street, Kolkata, 700073, India
- Satyabrata Majumder
  - Department of Life Sciences, Presidency University, 86/1 College Street, Kolkata, 700073, India
- Joyeeta Datta
  - Department of Life Sciences, Presidency University, 86/1 College Street, Kolkata, 700073, India
- Kalyan Giri
  - Department of Life Sciences, Presidency University, 86/1 College Street, Kolkata, 700073, India

13
Koçkaya ES, Can H, Yaman Y, Ün C. In silico discovery of epitopes of gag and env proteins for the development of a multi-epitope vaccine candidate against Maedi Visna Virus using reverse vaccinology approach. Biologicals 2023; 84:101715. [PMID: 37793308 DOI: 10.1016/j.biologicals.2023.101715] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2023] [Revised: 08/28/2023] [Accepted: 09/25/2023] [Indexed: 10/06/2023]
Abstract
Maedi Visna Virus (MVV) causes a chronic viral disease in sheep. Since no specific therapeutic drug targets MVV, development of a vaccine against MVV is necessary. This study aimed to analyze the gag and env proteins as vaccine candidate proteins, to identify epitopes in these proteins, and to construct a multi-epitope vaccine candidate. According to the results, the gag protein was more conserved and had a higher antigenicity value. It also had more alpha helices in its secondary structure, and no transmembrane helices were detected. Although many B cell and MHC-I/II epitopes were predicted, only 19 of them were antigenic, non-allergenic, non-toxic, soluble, and non-hemolytic. Of these epitopes, five stood out for having the highest antigenicity values. However, the final multi-epitope vaccine was constructed with 19 epitopes. A strong affinity was shown between the final multi-epitope vaccine and TLR-2/4. In conclusion, the gag protein was the better antigen, although both proteins had epitopes with high antigenicity values. The final multi-epitope vaccine construct also has the potential to be used as a peptide vaccine, based on its immunoinformatics results.
Affiliation(s)
- Ecem Su Koçkaya
  - Ege University Faculty of Science Department of Biology Molecular Biology Section, İzmir, Türkiye
- Hüseyin Can
  - Ege University Faculty of Science Department of Biology Molecular Biology Section, İzmir, Türkiye
- Yalçın Yaman
  - Siirt University Faculty of Veterinary Medicine, Department of Genetics, Siirt, Türkiye
- Cemal Ün
  - Ege University Faculty of Science Department of Biology Molecular Biology Section, İzmir, Türkiye

14
Ismail M, Bai B, Guo J, Bai Y, Sajid Z, Muhammad SA, Shaikh RS. Experimental Validation of MHC Class I and II Peptide-Based Potential Vaccine Candidates for Human Papilloma Virus Using Sprague-Dawly Models. Molecules 2023; 28:1687. [PMID: 36838675 PMCID: PMC9968051 DOI: 10.3390/molecules28041687] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2022] [Revised: 01/09/2023] [Accepted: 02/01/2023] [Indexed: 02/12/2023]
Abstract
Human papilloma virus (HPV) causes cervical and many other cancers. Recent trends in vaccine design have shifted toward epitope-based developments that are more specific, safe, and easy to produce. In this study, we predicted eight immunogenic peptides for CD4+ and CD8+ T-lymphocytes (MHC class I and II, denoted M1 and M2) from the early proteins (E2 and E6) and the major (L1) and minor (L2) capsid proteins. Groups of male and female Sprague-Dawley rats were immunized with each synthetic peptide. L1M1, L1M2, L2M1, and L2M2 induced a significant immunogenic response compared to E2M1, E2M2, E6M1, and E6M2. We observed optimal titers of IgG antibodies (>1.25 g/L), interferon-γ (>64 ng/L), and granzyme-B (>40 pg/mL) compared to control at the second booster dose (240 µg/500 µL). The induction of peptide-specific IgG antibodies in immunized rats indicates T-cell-dependent B-lymphocyte activation. A substantial CD4+ and CD8+ cell count was observed at 240 µg/500 µL. In male and female rats, the CD8+ cell count for L1 and L2 peptides was 3000 and 3118, and the CD4+ count was 3369 and 3484, respectively, compared to control. In conclusion, we demonstrated that L1M1, L1M2, L2M1, and L2M2 are likely to contain potential epitopes for induction of immune responses, supporting the feasibility of peptide-based vaccine development for HPV.
Affiliation(s)
- Mehreen Ismail
  - Institute of Molecular Biology and Biotechnology, Bahauddin Zakariya University, Multan 60800, Pakistan
- Baogang Bai
  - School of Information and Technology, Wenzhou Business College, Wenzhou 325015, China
  - Engineering Research Center of Intelligent Medicine, Wenzhou 325000, China
  - The 1st School of Medical, School of Information and Engineering, The 1st Affiliated Hospital of Wenzhou Medical University, Wenzhou 325015, China
- Jinlei Guo
  - School of Medical Engineering, Sanquan College of Xinxiang Medical University, Xinxiang 453513, China
- Yuhui Bai
  - Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China
- Zureesha Sajid
  - Institute of Molecular Biology and Biotechnology, Bahauddin Zakariya University, Multan 60800, Pakistan
- Syed Aun Muhammad
  - Institute of Molecular Biology and Biotechnology, Bahauddin Zakariya University, Multan 60800, Pakistan
- Rehan Sadiq Shaikh
  - Institute of Molecular Biology and Biotechnology, Bahauddin Zakariya University, Multan 60800, Pakistan
  - Centre for Applied Molecular Biology, University of the Punjab, Lahore 54000, Pakistan

15
Dixit H, Kulharia M, Verma SK. Metalloproteome of human-infective RNA viruses: a study towards understanding the role of metal ions in virology. Pathog Dis 2023; 81:ftad020. [PMID: 37653445 DOI: 10.1093/femspd/ftad020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Revised: 08/07/2023] [Accepted: 08/29/2023] [Indexed: 09/02/2023]
Abstract
Metalloproteins and metal-based inhibitors have been shown to effectively combat infectious diseases, particularly those caused by RNA viruses. In this study, a diverse set of bioinformatics methods was employed to identify metal-binding proteins of human RNA viruses. Seventy-three viral proteins with a high probability of being metal-binding proteins were identified. These proteins included 40 zinc-, 47 magnesium- and 14 manganese-binding proteins belonging to 29 viral species and eight significant viral families, including Coronaviridae, Flaviviridae and Retroviridae. Further functional characterization has revealed that these proteins play a critical role in several viral processes, including viral replication, fusion and host viral entry. They fall under the essential categories of viral proteins, including polymerase and protease enzymes. Magnesium ion is abundantly predicted to interact with these viral enzymes, followed by zinc. In addition, this study also examined the evolutionary aspects of predicted viral metalloproteins, offering essential insights into the metal utilization patterns among different viral species. The analysis indicates that the metal utilization patterns are conserved within the functional classes of the proteins. In conclusion, the findings of this study provide significant knowledge on viral metalloproteins that can serve as a valuable foundation for future research in this area.
Affiliation(s)
- Himisha Dixit
  - Centre for Computational Biology & Bioinformatics, Central University of Himachal Pradesh, Kangra 176206, Himachal Pradesh, India
- Mahesh Kulharia
  - Centre for Computational Biology & Bioinformatics, Central University of Himachal Pradesh, Kangra 176206, Himachal Pradesh, India
- Shailender Kumar Verma
  - Centre for Computational Biology & Bioinformatics, Central University of Himachal Pradesh, Kangra 176206, Himachal Pradesh, India
  - Department of Environmental Studies, University of Delhi 110007, Delhi, India

16
Dixit H, Upadhyay V, Kulharia M, Verma SK. The putative metal-binding proteome of the Coronaviridae family. Metallomics: Integrated Biometal Science 2023; 15:6969429. [PMID: 36610727 DOI: 10.1093/mtomcs/mfad001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2022] [Accepted: 12/28/2022] [Indexed: 01/09/2023]
Abstract
Metalloproteins are well known for their roles in various physicochemical processes in all life forms, including viruses. Some life-threatening viruses (such as some members of the Coronaviridae family) emerge and re-emerge frequently and are rapidly transmitted around the globe. This study aims to identify and characterize the metal-binding proteins (MBPs) of the Coronaviridae family and to provide insight into the role of MBPs in sustaining and propagating viruses inside a host cell and in the outer environment. In this study, the available proteome of the Coronaviridae family was exploited. Identified potential MBPs were analyzed for their functional domains, structural aspects, and subcellular localization. We also examine the phylogeny of all predicted MBPs among other Coronaviridae family members to understand the evolutionary trend among their respective hosts. A total of 256 proteins from 51 different species of coronaviruses are predicted to be MBPs. These MBPs perform various key roles in the replication and survival of viruses within the host cell. Cysteine, aspartic acid, threonine, and glutamine are the key amino acid residues interacting with the respective metal ions. Our observations also indicate that the metalloproteins of this family of viruses circulated and evolved in different hosts, which supports the zoonotic nature of coronaviruses. The comprehensive information on MBPs of the Coronaviridae family may help in designing novel therapeutic metalloprotein targets. Moreover, the study of viral MBPs can also help to understand the roles of MBPs in virus pathogenesis and virus-host interactions.
Affiliation(s)
- Himisha Dixit
  - Centre for Computational Biology & Bioinformatics, Central University of Himachal Pradesh, Kangra 176206, India
- Vipin Upadhyay
  - Centre for Computational Biology & Bioinformatics, Central University of Himachal Pradesh, Kangra 176206, India
- Mahesh Kulharia
  - Centre for Computational Biology & Bioinformatics, Central University of Himachal Pradesh, Kangra 176206, India
- Shailender Kumar Verma
  - Centre for Computational Biology & Bioinformatics, Central University of Himachal Pradesh, Kangra 176206, India
  - Department of Environmental Studies, University of Delhi, Delhi 110007, India

17
Characterization of a putative novel higrevirus infecting Phellodendron amurense Rupr. in China. Arch Virol 2023; 168:58. [PMID: 36617592 DOI: 10.1007/s00705-022-05676-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Accepted: 10/26/2022] [Indexed: 01/10/2023]
Abstract
Phellodendron-associated higre-like virus (PaHLV) was identified in Phellodendron amurense Rupr. in China. Three near-full-length sequences of the viral genomic RNAs (RNA1-RNA3) were first obtained by RNA-seq, and their complete sequences were then determined by RT-PCR, 5'-RACE, and 3'-RACE. RNA1-3 of PaHLV were determined to be 8,183, 3,062, and 3,998 nucleotides (nt) in length, respectively, excluding the poly(A) tail. All of the viral proteins encoded by PaHLV shared the highest amino acid sequence identity (44.8-78.1%) with the unclassified kitavirid pistachio virus X (PiVX, MT334618-MT334620) from Iranian pistachio. Sequence comparisons and phylogenetic analysis also showed PiVX to be the closest relative of PaHLV and supported their inclusion in the genus Higrevirus, family Kitaviridae. Thus, PaHLV is proposed to be a member of a new species in this genus, for which we suggest the binomial name "Higrevirus amur".

18
Multi-Epitope Vaccine for Monkeypox Using Pan-Genome and Reverse Vaccinology Approaches. Viruses 2022; 14:v14112504. [PMID: 36423113 PMCID: PMC9695528 DOI: 10.3390/v14112504] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 10/27/2022] [Revised: 11/09/2022] [Accepted: 11/10/2022] [Indexed: 11/16/2022]
Abstract
Outbreaks of monkeypox virus infections have raised major health concerns worldwide, with a high morbidity threat to children and immunocompromised adults. Although repurposed drugs and vaccines are being used to curb the disease, the evolving traits of the virus, which shows considerable genetic dynamicity, challenge the limits of a targeted treatment. A pan-genome-based reverse vaccinology approach can provide fast and efficient solutions to persistent difficulties in experimental vaccine design during an outbreak emergency. The approach encompassed screening of available monkeypox whole genomes (n = 910) to identify viral targets. From 102 screened viral targets, viral proteins L5L, A28, and L5 were finalized based on their location, solubility, and antigenicity. The potential T-cell and B-cell epitopes were extracted from the proteins using immunoinformatics tools and algorithms. Multiple vaccine constructs were designed by combining the epitopes. Based on immunological properties, chemical stability, and structural quality, a novel multi-epitopic vaccine construct, V4, was finalized. Flexible docking and coarse-dynamics simulation indicated that V4 had high binding affinity towards human HLA proteins (binding energy < -15.0 kcal/mol) with low conformational fluctuations (<1 Å). Thus, the vaccine construct (V4) may act as an efficient vaccine to induce immunity against monkeypox, which encourages experimental validation and similar approaches against emerging viral infections.

19
Zhao Y, Wu P, Liu L, Ma B, Pan M, Huang Y, Du N, Yu H, Sui L, Wang ZD, Hou Z, Liu Q. Characterization and subcellular localization of Alongshan virus proteins. Front Microbiol 2022; 13:1000322. [PMID: 36238596 PMCID: PMC9551281 DOI: 10.3389/fmicb.2022.1000322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2022] [Accepted: 09/05/2022] [Indexed: 11/20/2022]
Abstract
Alongshan virus (ALSV), in the Jingmenvirus group within the family Flaviviridae, is a newly discovered tick-borne virus associated with human disease, whose genome includes four segments and encodes four structural proteins (VP1a, VP1b, VP2, VP3, and VP4) and two non-structural proteins (NSP1 and NSP2). Here, we characterized the subcellular distribution and potential function of ALSV proteins in host cells. We found that the viral proteins exhibited diverse subcellular distributions in cells derived from multiple tissues and induced various morphological changes in the endoplasmic reticulum (ER), and that NSP2, VP1b, VP2, and VP4 were all co-localized in the ER. The nuclear transfer and co-localization of VP4 and calnexin (a marker protein of the ER), which were independent of their interaction, were unique to HepG2 cells. Expression of NSP1 could significantly reduce mitochondria quantity by inducing mitophagy. These findings contribute to a better understanding of the pathogenesis of emerging segmented flaviviruses.
Affiliation(s)
- Yinghua Zhao
  - College of Wildlife and Protected Area, Northeast Forestry University, Harbin, China
- Ping Wu
  - College of Wildlife and Protected Area, Northeast Forestry University, Harbin, China
- Li Liu
  - College of Wildlife and Protected Area, Northeast Forestry University, Harbin, China
- Baohua Ma
  - College of Wildlife and Protected Area, Northeast Forestry University, Harbin, China
- Mingming Pan
  - College of Wildlife and Protected Area, Northeast Forestry University, Harbin, China
- Yuan Huang
  - College of Wildlife and Protected Area, Northeast Forestry University, Harbin, China
- Nianyan Du
  - College of Wildlife and Protected Area, Northeast Forestry University, Harbin, China
- Hongyan Yu
  - College of Wildlife and Protected Area, Northeast Forestry University, Harbin, China
- Liyan Sui
  - Department of Infectious Diseases, Center of Infectious Diseases and Pathogen Biology, Key Laboratory of Organ Regeneration and Transplantation of the Ministry of Education, The First Hospital of Jilin University, Changchun, China
- Ze-Dong Wang
  - Department of Infectious Diseases, Center of Infectious Diseases and Pathogen Biology, Key Laboratory of Organ Regeneration and Transplantation of the Ministry of Education, The First Hospital of Jilin University, Changchun, China
- Zhijun Hou
  - College of Wildlife and Protected Area, Northeast Forestry University, Harbin, China
  - *Correspondence: Zhijun Hou,
- Quan Liu
  - College of Wildlife and Protected Area, Northeast Forestry University, Harbin, China
  - Department of Infectious Diseases, Center of Infectious Diseases and Pathogen Biology, Key Laboratory of Organ Regeneration and Transplantation of the Ministry of Education, The First Hospital of Jilin University, Changchun, China
  - School of Life Sciences and Engineering, Foshan University, Foshan, China
  - Quan Liu,

20
Abbasi BA, Saraf D, Sharma T, Sinha R, Singh S, Sood S, Gupta P, Gupta A, Mishra K, Kumari P, Rawal K. Identification of vaccine targets & design of vaccine against SARS-CoV-2 coronavirus using computational and deep learning-based approaches. PeerJ 2022; 10:e13380. [PMID: 35611169 PMCID: PMC9124463 DOI: 10.7717/peerj.13380] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 04/02/2020] [Accepted: 04/13/2022] [Indexed: 01/13/2023]
Abstract
An unusual pneumonia infection, named COVID-19, was reported in December 2019 in China. It was reported to be caused by a novel coronavirus, which had infected approximately 220 million people worldwide with a death toll of 4.5 million as of September 2021. This study focuses on finding potential vaccine candidates and designing in silico subunit multi-epitope vaccine candidates using a unique computational pipeline that integrates reverse vaccinology, molecular docking, and simulation methods. The spike protein of SARS-CoV-2 (GenBank ID QHD43416.1) was shortlisted as a potential vaccine candidate and was examined for the presence of B-cell and T-cell epitopes. We also investigated the antigenicity of the epitopes and their interaction with distinct polymorphic alleles. High-ranking epitopes such as DLCFTNVY (B cell epitope), KIADYNKL (MHC class I), and VKNKCVNFN (MHC class II) were shortlisted for subsequent analysis. Digestion analysis verified the safety and stability of the shortlisted peptides. The docking study reported strong binding of the proposed peptides with HLA-A*02 and HLA-B7 alleles. We used standard methods to construct a vaccine model, and this construct was evaluated further for its antigenicity, physicochemical properties, and 2D and 3D structure prediction and validation. Further, molecular docking followed by molecular dynamics simulation was performed to evaluate the binding affinity and stability of the TLR-4 and vaccine complex. Finally, the vaccine construct was reverse transcribed and adapted for E. coli strain K12 prior to insertion into the pET-28a(+) vector for determining translational and microbial expression, followed by conservancy analysis. In addition, six multi-epitope subunit vaccines were constructed using different strategies, containing immunogenic epitopes, appropriate adjuvants, and linker sequences. We propose that our vaccine constructs can be used for downstream in vitro and in vivo investigations to design an effective and safe vaccine against different strains of SARS-CoV-2.

21
Liu Y, Jin S, Gao H, Wang X, Wang C, Zhou W, Yu B. Predicting the multi-label protein subcellular localization through multi-information fusion and MLSI dimensionality reduction based on MLFE classifier. Bioinformatics 2021; 38:1223-1230. [PMID: 34864897 PMCID: PMC8690230 DOI: 10.1093/bioinformatics/btab811] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/19/2021] [Revised: 11/17/2021] [Accepted: 11/30/2021] [Indexed: 01/05/2023]
Abstract
MOTIVATION: Multi-label (ML) protein subcellular localization (SCL) is an indispensable way to study protein function. It can locate a given protein (such as the human transmembrane protein that promotes invasion by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)) or expression product at a specific location in a cell, which can provide a reference for the clinical treatment of diseases such as coronavirus disease 2019 (COVID-19). RESULTS: The article proposes a novel method named ML-locMLFE. First, six feature extraction methods are adopted to obtain effective protein information: pseudo amino acid composition, encoding based on grouped weight, gene ontology, multi-scale continuous and discontinuous, residue probing transformation, and evolutionary distance transformation. Next, the ML information latent semantic index method is used to avoid interference from redundant information. Finally, ML learning with feature-induced labeling information enrichment is adopted to predict ML protein SCL. The Gram-positive bacteria dataset is chosen as the training set, and the Gram-negative bacteria, virus, newPlant, and SARS-CoV-2 datasets as the test sets. The overall actual accuracies on the first four datasets are 99.23%, 93.82%, 93.24%, and 96.72% by leave-one-out cross-validation. The overall actual accuracy of our predictor on the SARS-CoV-2 dataset is 72.73%. The results indicate that the ML-locMLFE method has clear advantages in predicting the SCL of ML proteins, which provides new ideas for further research on the SCL of ML proteins. AVAILABILITY AND IMPLEMENTATION: The source codes and datasets are publicly available at https://github.com/QUST-AIBBDRC/ML-locMLFE/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yushuang Liu
  - College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
  - Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
- Shuping Jin
  - College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
  - Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
- Hongli Gao
  - College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
  - Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
- Xue Wang
  - College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
  - Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
- Congjing Wang
  - College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
  - Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
- Weifeng Zhou
  - College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
  - Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
- Bin Yu
  - School of Data Science, Qingdao University of Science and Technology, Qingdao 266061, China
  - College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266061, China
  - To whom correspondence should be addressed.

22
Naqvi STQ, Yasmeen M, Ismail M, Muhammad SA, Nawazish-i-Husain S, Ali A, Munir F, Zhang Q. Designing of Potential Polyvalent Vaccine Model for Respiratory Syncytial Virus by System Level Immunoinformatics Approaches. BioMed Research International 2021; 2021:9940010. [PMID: 34136576 PMCID: PMC8177976 DOI: 10.1155/2021/9940010] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/05/2021] [Revised: 04/18/2021] [Accepted: 05/10/2021] [Indexed: 11/25/2022]
Abstract
BACKGROUND: Respiratory syncytial virus (RSV) infection is a public health epidemic, leading to around 3 million hospitalizations and about 66,000 deaths each year. It is a life-threatening condition exclusive to children with no effective treatment. METHODS: In this study, we used system-level and vaccinomics approaches to design a polyvalent vaccine for RSV, which could stimulate the immune components of the host to manage this infection. Our framework involves data accession, antigenicity and subcellular localization analysis, T cell epitope prediction, proteasomal and conservancy evaluation, host-pathogen-protein interactions, pathway studies, and in silico binding affinity analysis. RESULTS: We identified the glycoprotein (G), fusion protein (F), and small hydrophobic protein (SH) of RSV as potential vaccine candidates. From these proteins (G, F, and SH), we found 9 epitopes with significant binding affinity for multiple alleles of MHC classes I and II. These potential epitopes were linked to form a polyvalent construct using AAY and GPGPG linkers and a cholera toxin B adjuvant at the N-terminus, giving 224 amino acid residues with a molecular weight of 23.9 kDa. The final construct was a stable, immunogenic, and nonallergenic protein containing cleavage sites, TAP transport efficiency, posttranslation shifts, and CTL epitopes. Molecular docking indicated the optimum binding affinity of the RSV polyvalent construct with MHC molecules (-12.49 and -10.48 kcal/mol for MHC classes I and II, respectively). This interaction showed that a polyvalent construct could manage and control this disease. CONCLUSION: Our vaccinomics and system-level investigation could be appropriate to trigger the host immune system to prevent RSV infection.
Affiliation(s)
- Mamoona Yasmeen
  - Institute of Molecular Biology and Biotechnology, Bahauddin Zakariya University Multan, Pakistan
- Mehreen Ismail
  - Institute of Molecular Biology and Biotechnology, Bahauddin Zakariya University Multan, Pakistan
- Syed Aun Muhammad
  - Institute of Molecular Biology and Biotechnology, Bahauddin Zakariya University Multan, Pakistan
- Amjad Ali
  - ASAB, National University of Sciences and Technology (NUST), Islamabad, Pakistan
- Fahad Munir
  - Department of Hepatobiliary Surgery, The First Affiliated Hospital of Wenzhou Medical University, China
  - Wenzhou Medical University, Wenzhou, Zhejiang Province, China
- QiYu Zhang
  - Department of Hepatobiliary Surgery, The First Affiliated Hospital of Wenzhou Medical University, China

23
Zhang Q, Zhang Y, Li S, Han Y, Jin S, Gu H, Yu B. Accurate prediction of multi-label protein subcellular localization through multi-view feature learning with RBRL classifier. Brief Bioinform 2021; 22:6127451. [PMID: 33537726 DOI: 10.1093/bib/bbab012] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/20/2020] [Revised: 12/12/2020] [Accepted: 01/06/2021] [Indexed: 01/27/2023]
Abstract
Multi-label proteins can participate in carrier transportation, enzyme catalysis, hormone regulation, and other life activities. They also play a key role in biopharmaceuticals and in gene and cell therapy. This article proposes a prediction method called Mps-mvRBRL to predict the subcellular localization (SCL) of multi-label proteins. First, pseudo position-specific scoring matrix, dipeptide composition, position-specific scoring matrix-transition probability composition, gene ontology, and pseudo amino acid composition algorithms are used to obtain numerical information from different views. Based on the contribution of the five individual feature extraction methods, differential evolution is used for the first time to learn the weight of each feature, and the original features are then fused by weighted combination into multi-view information. Second, a weighted linear discriminant analysis framework based on binary weight form is applied to the fused high-dimensional features to eliminate irrelevant information. Finally, the best feature vector is input into the joint ranking support vector machine and binary relevance with robust low-rank learning classifier to predict the SCL. Under leave-one-out cross-validation, the overall actual accuracy (OAA) and overall location accuracy (OLA) of Mps-mvRBRL on the Gram-positive bacteria training set are both 99.81%. The OAA values on the plant, virus, and Gram-negative bacteria test sets are 97.24%, 98.55%, and 98.20%, respectively, and the OLA values are 97.16%, 97.62%, and 98.28%, respectively. The results show that the model achieves good performance in predicting the SCL of multi-label proteins.
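The abstract reports OAA and OLA without defining them. The sketch below assumes the definitions commonly used for multi-label localization predictors: OAA counts a protein as correct only when the predicted location set equals the true set exactly, while OLA is the fraction of all true (protein, location) pairs that were predicted. The function names and the toy data are hypothetical and are not taken from the Mps-mvRBRL code.

```python
from typing import List, Set


def overall_actual_accuracy(true_sets: List[Set[str]], pred_sets: List[Set[str]]) -> float:
    """Fraction of proteins whose predicted location set matches the true set exactly."""
    exact = sum(1 for t, p in zip(true_sets, pred_sets) if t == p)
    return exact / len(true_sets)


def overall_location_accuracy(true_sets: List[Set[str]], pred_sets: List[Set[str]]) -> float:
    """Fraction of all true (protein, location) pairs that were predicted."""
    hits = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    total = sum(len(t) for t in true_sets)
    return hits / total


# Toy data (hypothetical): three proteins, four true locations in total.
true_sets = [{"cytoplasm"}, {"nucleus", "cytoplasm"}, {"membrane"}]
pred_sets = [{"cytoplasm"}, {"nucleus"}, {"membrane", "nucleus"}]

oaa = overall_actual_accuracy(true_sets, pred_sets)    # only the first protein matches exactly
ola = overall_location_accuracy(true_sets, pred_sets)  # three of four true locations recovered
print(f"OAA = {oaa:.3f}, OLA = {ola:.3f}")
```

On this toy data OAA is 1/3 and OLA is 0.75. Under these assumed definitions OLA does not penalize over-prediction (the extra "nucleus" on the third protein), which is why OAA is the stricter of the two measures.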
Affiliation(s)
- Qi Zhang
  - College of Mathematics and Physics, Qingdao University of Science and Technology, China
- Yandan Zhang
  - College of Mathematics and Physics, Qingdao University of Science and Technology, China
- Shan Li
  - School of Mathematics and Statistics, Central South University, China
- Yu Han
  - College of Mathematics and Physics, Qingdao University of Science and Technology, China
- Shuping Jin
  - College of Mathematics and Physics, Qingdao University of Science and Technology, China
- Haiming Gu
  - College of Mathematics and Physics, Qingdao University of Science and Technology, China
- Bin Yu
  - College of Mathematics and Physics, Qingdao University of Science and Technology, China
|
24
|
Wang H, Ding Y, Tang J, Zou Q, Guo F. Identify RNA-associated subcellular localizations based on multi-label learning using Chou's 5-steps rule. BMC Genomics 2021; 22:56. [PMID: 33451286 PMCID: PMC7811227 DOI: 10.1186/s12864-020-07347-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 08/05/2020] [Accepted: 12/22/2020] [Indexed: 12/04/2022] Open
Abstract
BACKGROUND Biological functions of biomolecules rely on the cellular compartments where they are located in cells. Importantly, RNAs are assigned to specific locations in a cell, enabling the cell to carry out diverse biochemical processes concurrently. However, many existing RNA subcellular localization classifiers only solve the problem of single-label classification. It is of great practical significance to extend RNA subcellular localization to a multi-label classification problem. RESULTS In this study, we extract multi-label classification datasets about RNA-associated subcellular localizations on various types of RNAs, and then construct subcellular localization datasets on four RNA categories. In order to study Homo sapiens, we further establish human RNA subcellular localization datasets. Furthermore, we utilize different nucleotide property composition models to extract effective features that adequately represent the important information of nucleotide sequences. In the most critical part, we address a major challenge: fusing the multivariate information through multiple kernel learning based on the Hilbert-Schmidt independence criterion. The optimal combined kernel can be put into an integration support vector machine model for identifying multi-label RNA subcellular localizations. Our method obtained average precision values of 0.703, 0.757, 0.787, and 0.800, respectively, on the four RNA data sets. CONCLUSION Our novel method outperforms other prediction tools on novel benchmark datasets. Moreover, we establish a user-friendly web server implementing our method.
Affiliation(s)
- Hao Wang
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Yijie Ding
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Jijun Tang
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
  - School of Computational Science and Engineering, University of South Carolina, Columbia, 29208, SC, US
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Fei Guo
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
|
25
|
Bhattacharya S, Banerjee A, Ray S. Development of new vaccine target against SARS-CoV2 using envelope (E) protein: An evolutionary, molecular modeling and docking based study. Int J Biol Macromol 2020; 172:74-81. [PMID: 33385461 PMCID: PMC7833863 DOI: 10.1016/j.ijbiomac.2020.12.192] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/24/2020] [Revised: 12/24/2020] [Accepted: 12/25/2020] [Indexed: 02/04/2023]
Abstract
COVID-19 is one of the most fatal pandemics worldwide. For cellular fusion, its antigenic peptides are presented by the major histocompatibility complex (MHC) in humans. Therefore, exploring the residual interaction details of SARS-CoV2 with MHCs is a promising starting point for vaccine development. Envelope (E) protein, the smallest outer surface protein from the SARS-CoV2 genome, was found to possess the highest antigenicity and is therefore used to identify B-cell and T-cell epitopes. Four novel mutations (T55S, V56F, E69R and G70del) were observed in the E-protein of SARS-CoV2 after evolutionary analysis. It showed a coil➔helix transition in the protein conformation. Antigenic variability of the epitopes was also checked to explore the novel mutations in the epitope region. It was found that SARS-CoV2 E-protein formed more interactions with MHC-I than with MHC-II, through several ionic and H-bonds. Tyr42 and Tyr57 played a predominant role in the interaction with MHC-I. The higher ΔG values with lower dissociation constant values also affirm the stronger and spontaneous interaction of SARS-CoV2 proteins with MHCs. In comparison with the consensus E-protein, SARS-CoV2 E-protein showed stronger interaction with the MHCs and lower solvent accessibility. E-protein can therefore be considered a potential vaccine target against SARS-CoV2.
Affiliation(s)
- Shreya Bhattacharya
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Guwahati, Guwahati, Assam, India
- Arundhati Banerjee
  - Department of Biochemistry and Biophysics, University of Kalyani, Kalyani, Nadia, India
- Sujay Ray
  - Amity Institute of Biotechnology, Amity University, Kolkata, India
|
26
|
Can H, Köseoğlu AE, Erkunt Alak S, Güvendi M, Döşkaya M, Karakavuk M, Gürüz AY, Ün C. In silico discovery of antigenic proteins and epitopes of SARS-CoV-2 for the development of a vaccine or a diagnostic approach for COVID-19. Sci Rep 2020; 10:22387. [PMID: 33372181 PMCID: PMC7769971 DOI: 10.1038/s41598-020-79645-9] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 04/27/2020] [Accepted: 12/10/2020] [Indexed: 12/11/2022] Open
Abstract
In the genome of SARS-CoV-2, the 5′-terminus encodes a polyprotein, which is further cleaved into 15 non-structural proteins whereas the 3′ terminus encodes four structural proteins and eight accessory proteins. Among these 27 proteins, the present study aimed to discover likely antigenic proteins and epitopes to be used for the development of a vaccine or serodiagnostic assay using an in silico approach. For this purpose, after the full genome analysis of SARS-CoV-2 Wuhan isolate and variant proteins that are detected frequently, surface proteins including spike, envelope, and membrane proteins as well as proteins with signal peptide were determined as probable vaccine candidates whereas the remaining were considered as possible antigens to be used during the development of serodiagnostic assays. According to results obtained, among 27 proteins, 26 of them were predicted as probable antigen. In 26 proteins, spike protein was selected as the best vaccine candidate because of having a signal peptide, negative GRAVY value, one transmembrane helix, moderate aliphatic index, a big molecular weight, a long-estimated half-life, beta wrap motifs as well as having stable, soluble and non-allergic features. In addition, orf7a, orf8, and nsp-10 proteins with signal peptide were considered as potential vaccine candidates. Nucleocapsid protein and a highly antigenic GGDGKMKD epitope were identified as ideal antigens to be used in the development of serodiagnostic assays. Moreover, considering MHC-I alleles, highly antigenic KLNDLCFTNV and ITLCFTLKRK epitopes can be used to develop an epitope-based peptide vaccine.
Affiliation(s)
- Hüseyin Can
  - Department of Biology Molecular Biology Section, Faculty of Science, Ege University, Bornova, İzmir, Turkey
- Ahmet Efe Köseoğlu
  - Department of Biology Molecular Biology Section, Faculty of Science, Ege University, Bornova, İzmir, Turkey
- Sedef Erkunt Alak
  - Department of Biology Molecular Biology Section, Faculty of Science, Ege University, Bornova, İzmir, Turkey
- Mervenur Güvendi
  - Department of Biology Molecular Biology Section, Faculty of Science, Ege University, Bornova, İzmir, Turkey
- Mert Döşkaya
  - Department of Parasitology, Faculty of Medicine, Ege University, Bornova, İzmir, Turkey
- Adnan Yüksel Gürüz
  - Department of Parasitology, Faculty of Medicine, Ege University, Bornova, İzmir, Turkey
- Cemal Ün
  - Department of Biology Molecular Biology Section, Faculty of Science, Ege University, Bornova, İzmir, Turkey
|
27
|
Liu GH, Zhang BW, Qian G, Wang B, Mao B, Bichindaritz I. Bioimage-Based Prediction of Protein Subcellular Location in Human Tissue with Ensemble Features and Deep Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:1966-1980. [PMID: 31107658 DOI: 10.1109/tcbb.2019.2917429] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 06/09/2023]
Abstract
Prediction of protein subcellular location has currently become a hot topic because it has been proven to be useful for understanding both the disease mechanisms and novel drug design. With the rapid development of automated microscopic imaging technology in recent years, classification methods of bioimage-based protein subcellular location have attracted considerable attention for images can describe the protein distribution intuitively and in detail. In the current study, a prediction method of protein subcellular location was proposed based on multi-view image features that are extracted from three different views, including the four texture features of the original image, the global and local features of the protein extracted from the protein channel images after color segmentation, and the global features of DNA extracted from the DNA channel image. Finally, the extracted features were combined together to improve the performance of subcellular localization prediction. From the performance comparison of different combination features under the same classifier, the best ensemble features could be obtained. In this work, a classifier based on Stacked Auto-encoders and the random forest was also put forward. To improve the prediction results, the deep network was combined with the traditional statistical classification methods. Stringent cross-validation and independent validation tests on the benchmark dataset demonstrated the efficacy of the proposed method.
|
28
|
Cong H, Liu H, Chen Y, Cao Y. Self-evoluting framework of deep convolutional neural network for multilocus protein subcellular localization. Med Biol Eng Comput 2020; 58:3017-3038. [PMID: 33078303 DOI: 10.1007/s11517-020-02275-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/02/2019] [Accepted: 10/14/2020] [Indexed: 12/12/2022]
Abstract
In the present paper, a deep convolutional neural network (DCNN) is applied to multilocus protein subcellular localization as it is more suitable for multi-class classification. There are two main problems with this application. First, the appropriate features for correlation between multiple sites are hard to find. Second, the classifier structure is difficult to determine as it is greatly affected by the distribution of classified data. To solve these problems, a self-evoluting framework using DCNNs for multilocus protein subcellular localization is proposed. It has three characteristics that the previous algorithms do not. The first is that it combines the ant colony algorithm with the DCNN to form a self-evoluting algorithm for multilocus protein subcellular localization. The second is that it randomly groups subcellular sites using a limited random k-labelsets multi-label classification method. It also solves complex problems in a divide-and-conquer approach and proposes a flexible expansion model. The third is that it realizes the random selection feature extraction method in the positioning process and avoids the defects in individual feature extraction methods. The algorithm in the present paper is tested on the human database, and the overall correct rate is 67.17%, which is higher than that for the stacked self-encoder (SAE), support vector machine (SVM), random forest classifier (RF), or single deep convolutional neural network.
Graphical abstract: The algorithm mentioned in the present paper mainly includes four parts: protein sequence data preprocessing, integrated DCNN model construction, finding the optimal DCNN combination by ant colony optimization, and protein subcellular localization for sequences. These parts are sequential, and the data obtained in each part is the basis for the next. In the data preprocessing part, the limited RAkEL multi-label classification method is used to randomly group subcellular sites. At the same time, feature fusion of protein sequences is carried out using multiple feature extraction methods. Each combination of features and site information corresponds to a DCNN model. In the part that finds the optimal DCNN combination by ant colony optimization, the main purpose is to find the best combination of DCNN models through the global optimization ability of the ant colony algorithm. The positioning of sequences mainly obtains the multilocus subcellular localization from the optimal model combination.
Affiliation(s)
- Hanhan Cong
  - School of Information Science and Engineering, Shandong Normal University, No. 88, Wenhua East Road, Jinan City, China; Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Shandong Normal University, Jinan, China
- Hong Liu
  - School of Information Science and Engineering, Shandong Normal University, No. 88, Wenhua East Road, Jinan City, China; Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Shandong Normal University, Jinan, China
- Yuehui Chen
  - School of Information Science and Engineering, University of Jinan, Jinan, China; Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China
- Yi Cao
  - School of Information Science and Engineering, University of Jinan, Jinan, China; Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China
|
29
|
Sahu SS, Loaiza CD, Kaundal R. Plant-mSubP: a computational framework for the prediction of single- and multi-target protein subcellular localization using integrated machine-learning approaches. AOB PLANTS 2020; 12:plz068. [PMID: 32528639 PMCID: PMC7274489 DOI: 10.1093/aobpla/plz068] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 04/28/2019] [Accepted: 10/11/2019] [Indexed: 05/18/2023]
Abstract
The subcellular localization of proteins is very important for characterizing their function in a cell. Accurate computational prediction of subcellular locations has been an active area of interest. Most of the work has focused on single localization prediction. Only a few studies have discussed multi-target localization, and they have not achieved good accuracy so far; in plant sciences, very limited work has been done. Here we report the development of a novel tool, Plant-mSubP, which is based on integrated machine learning approaches to efficiently predict the subcellular localizations in plant proteomes. The proposed approach predicts with high accuracy 11 single localizations and three dual locations of the plant cell. Several hybrid features based on composition and physicochemical properties of a protein such as amino acid composition, pseudo amino acid composition, auto-correlation descriptors, quasi-sequence-order descriptors and hybrid features are used to represent the protein. The performance of the proposed method has been assessed through a training set as well as an independent test set. Using the hybrid feature of the pseudo amino acid composition, N-Center-C terminal amino acid composition and the dipeptide composition (PseAAC-NCC-DIPEP), an overall accuracy of 81.97 %, 84.75 % and 87.88 % is achieved on the training data set of proteins containing the single-label, single- and dual-label combined, and dual-label proteins, respectively. When tested on the independent data, an accuracy of 64.36 %, 64.84 % and 81.08 % is achieved on the single-label, single- and dual-label, and dual-label proteins, respectively. The prediction models have been implemented on a web server available at http://bioinfo.usu.edu/Plant-mSubP/. The results indicate that the proposed approach is comparable to the existing methods in single localization prediction and outperforms all other existing tools when compared for dual-label proteins. The prediction tool will be a useful resource for better annotation of various plant proteomes.
Affiliation(s)
- Sitanshu S Sahu
  - Department of Electronics and Communication Engineering, Birla Institute of Technology, Mesra, Ranchi, India
- Cristian D Loaiza
  - Department of Plants, Soils, and Climate/Center for Integrated BioSystems, College of Agriculture and Applied Sciences, Utah State University, Logan, UT, USA
- Rakesh Kaundal
  - Department of Plants, Soils, and Climate/Center for Integrated BioSystems, College of Agriculture and Applied Sciences, Utah State University, Logan, UT, USA
  - Bioinformatics Facility, Center for Integrated BioSystems, Utah State University, Logan, UT, USA
|
30
|
In silico designing of peptide based vaccine for Hepatitis viruses using reverse vaccinology approach. INFECTION GENETICS AND EVOLUTION 2020; 84:104388. [PMID: 32485330 DOI: 10.1016/j.meegid.2020.104388] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/13/2020] [Revised: 05/15/2020] [Accepted: 05/29/2020] [Indexed: 12/27/2022]
Abstract
Five different hepatitis viruses from different viral species cause viral hepatitis, a life-threatening disease that leads to a large loss of life every year. The mode of infection and transmission differs for each species; the viruses mostly spread by direct contact and body fluids (for HBV and HCV). No vaccine is available that can cure all types of hepatitis with cross-protection. Thus, our study involves a peptide-based vaccine design using an immunoinformatics approach. We focused only on the secretory and extracellular proteins of each type and identified their epitopes. Epitopes were examined for antigenicity, allergenicity, toxicity, anti-inflammatory property and IFN-γ induction. The short-listed peptides were stitched together using linkers and a TLR4 adjuvant. The final vaccine was shown to have good physicochemical and structural properties. A simulation study to determine the structural stability of the vaccine showed good results. The docked structure of the vaccine with TLR4 shows high-affinity binding. Immune simulation reveals favourable induction of an immune response, with a high level of production of interleukins important for immunity. Periplasmic expression in the E. coli K12 strain was quite satisfactory. This study of designing a recombinant chimeric vaccine using the reverse vaccinology method provides some idea about vaccine production against hepatitis viruses.
|
31
|
Nath B, Sharma K, Ahire K, Goyal A, Kumar S. Structure analysis of the nucleoprotein of Newcastle disease virus: An insight towards its multimeric form in solution. Int J Biol Macromol 2020; 151:402-411. [PMID: 32061852 DOI: 10.1016/j.ijbiomac.2020.02.133] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/07/2019] [Revised: 02/12/2020] [Accepted: 02/12/2020] [Indexed: 10/25/2022]
Abstract
Newcastle disease virus (NDV) has been explored to a great extent to understand the biology of negative-sense RNA viruses. Nucleoprotein (N) is the most abundant protein in the virus particles, and its primary function is to encapsidate the virus genome for its transcription, replication, and packaging. Here, we report the structural investigations of the N protein of NDV (NDV-N) in solution. The N gene of NDV was cloned and expressed in E. coli as a soluble protein of ~53 kDa in size. The FE-TEM imaging of the purified NDV-N displayed a nearly spherical shape with a diameter of 28 nm and the DLS analysis of the purified NDV-N displayed a monodispersed nature, with averaged hydrodynamic radius, 26.5 nm. The conformational behavior of the NDV-N in solution was studied by SAXS analysis, which suggested two ring structures of NDV-N formed by thirteen monomeric units each. Each ring interacts with RNA molecules and forms a large molecule with a size of ~1450 kDa and are stacked on each other in a spiral arrangement. More profound knowledge of the N protein structure will help us in deciphering the control of viral RNA synthesis at the early stage of NDV life-cycle.
Affiliation(s)
- Barnali Nath
  - Viral Immunology Lab, Indian Institute of Technology Guwahati, Guwahati, Assam 781039, India
- Kedar Sharma
  - Carbohydrate Enzyme Biotechnology Laboratory, Department of Biosciences and Bioengineering, Indian Institute of Technology Guwahati, Guwahati, Assam 781039, India
- Komal Ahire
  - Viral Immunology Lab, Indian Institute of Technology Guwahati, Guwahati, Assam 781039, India
- Arun Goyal
  - Carbohydrate Enzyme Biotechnology Laboratory, Department of Biosciences and Bioengineering, Indian Institute of Technology Guwahati, Guwahati, Assam 781039, India
- Sachin Kumar
  - Viral Immunology Lab, Indian Institute of Technology Guwahati, Guwahati, Assam 781039, India
|
32
|
ACNNT3: Attention-CNN Framework for Prediction of Sequence-Based Bacterial Type III Secreted Effectors. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2020; 2020:3974598. [PMID: 32328150 PMCID: PMC7157791 DOI: 10.1155/2020/3974598] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 02/06/2020] [Revised: 03/09/2020] [Accepted: 03/17/2020] [Indexed: 12/18/2022]
Abstract
The type III secretion system (T3SS) is a special protein delivery system in Gram-negative bacteria which delivers T3SS-secreted effectors (T3SEs) to host cells causing pathological changes. Numerous experiments have verified that T3SEs play important roles in many biological activities and in host-pathogen interactions. Accurate identification of T3SEs is therefore essential to help understand the pathogenic mechanism of bacteria; however, many existing biological experimental methods are time-consuming and expensive. New deep-learning methods have recently been successfully applied to T3SE recognition, but improving the recognition accuracy of T3SEs is still a challenge. In this study, we developed a new deep-learning framework, ACNNT3, based on the attention mechanism. We converted 100 residues of the N-terminal of the protein sequence into a fusion feature vector of protein primary structure information (one-hot encoding) and position-specific scoring matrix (PSSM) which are used as the feature input of the network model. We then embedded the attention layer into CNN to learn the characteristic preferences of type III effector proteins, which can accurately classify any protein directly as either T3SEs or non-T3SEs. We found that the introduction of new protein features can improve the recognition accuracy of the model. Our method combines the advantages of CNN and the attention mechanism and is superior in many indicators when compared to other popular methods. Using the common independent dataset, our method is more accurate than the previous method, showing an improvement of 4.1-20.0%.
|
33
|
Shao Y, Chou KC. pLoc_Deep-mVirus: A CNN Model for Predicting Subcellular Localization of Virus Proteins by Deep Learning. Natural Science 2020. [DOI: 10.4236/ns.2020.126033] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/20/2022]
|
34
|
Javed F, Hayat M. Predicting subcellular localization of multi-label proteins by incorporating the sequence features into Chou's PseAAC. Genomics 2019; 111:1325-1332. [DOI: 10.1016/j.ygeno.2018.09.004] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Received: 09/02/2018] [Accepted: 09/04/2018] [Indexed: 12/13/2022]
|
35
|
pLoc_bal-mHum: Predict subcellular localization of human proteins by PseAAC and quasi-balancing training dataset. Genomics 2019; 111:1274-1282. [DOI: 10.1016/j.ygeno.2018.08.007] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Received: 05/03/2018] [Revised: 08/14/2018] [Accepted: 08/16/2018] [Indexed: 12/17/2022]
|
36
|
Chou KC. Impacts of Pseudo Amino Acid Components and 5-steps Rule to Proteomics and Proteome Analysis. Curr Top Med Chem 2019; 19:2283-2300. [DOI: 10.2174/1568026619666191018100141] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 08/07/2019] [Revised: 08/18/2019] [Accepted: 08/26/2019] [Indexed: 01/27/2023]
Abstract
Stimulated by the 5-steps rule during the last decade or so, computational proteomics has achieved remarkable progress in the following three areas: (1) protein structural class prediction; (2) protein subcellular location prediction; (3) post-translational modification (PTM) site prediction. The results obtained by these predictions are very useful not only for an in-depth study of the functions of proteins and their biological processes in a cell, but also for developing novel drugs against major diseases such as cancers, Alzheimer’s, and Parkinson’s. Moreover, since the targets to be predicted may have the multi-label feature, two sets of metrics are introduced: one for inspecting the global prediction quality, and the other for the local prediction quality. All the predictors covered in this review have a user-friendly web-server, through which the majority of experimental scientists can easily obtain their desired data without the need to go through the complicated mathematics.
Affiliation(s)
- Kuo-Chen Chou
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China
|
37
|
Shen Y, Ding Y, Tang J, Zou Q, Guo F. Critical evaluation of web-based prediction tools for human protein subcellular localization. Brief Bioinform 2019; 21:1628-1640. [DOI: 10.1093/bib/bbz106] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 05/07/2019] [Revised: 07/23/2019] [Accepted: 07/27/2019] [Indexed: 11/12/2022] Open
Abstract
Human protein subcellular localization has an important research value in biological processes, also in elucidating protein functions and identifying drug targets. Over the past decade, a number of protein subcellular localization prediction tools have been designed and made freely available online. The purpose of this paper is to summarize the progress of research on the subcellular localization of human proteins in recent years, including commonly used data sets proposed by the predecessors and the performance of all selected prediction tools against the same benchmark data set. We carry out a systematic evaluation of several publicly available subcellular localization prediction methods on various benchmark data sets. Among them, we find that mLASSO-Hum and pLoc-mHum provide a statistically significant improvement in performance, as measured by the value of accuracy, relative to the other methods. Meanwhile, we build a new data set using the latest version of Uniprot database and construct a new GO-based prediction method HumLoc-LBCI in this paper. Then, we test all selected prediction tools on the new data set. Finally, we discuss the possible development directions of human protein subcellular localization. Availability: The codes and data are available from http://www.lbci.cn/syn/.
Affiliation(s)
- Yinan Shen
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Yijie Ding
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, China
- Jijun Tang
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
  - School of Computational Science and Engineering, University of South Carolina, Columbia, U.S
  - Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Fei Guo
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
|
38
|
Chou KC. Advances in Predicting Subcellular Localization of Multi-label Proteins and its Implication for Developing Multi-target Drugs. Curr Med Chem 2019; 26:4918-4943. [PMID: 31060481 DOI: 10.2174/0929867326666190507082559] [Citation(s) in RCA: 78] [Impact Index Per Article: 15.6] [Received: 09/28/2018] [Revised: 01/29/2019] [Accepted: 01/31/2019] [Indexed: 12/16/2022]
Abstract
The smallest unit of life is a cell, which contains numerous protein molecules. Most of the functions critical to the cell’s survival are performed by these proteins located in its different organelles, usually called “subcellular locations”. Information on the subcellular localization of a protein can provide useful clues about its function. To reveal the intricate pathways at the cellular level, knowledge of the subcellular localization of proteins in a cell is a prerequisite. Therefore, one of the fundamental goals in molecular cell biology and proteomics is to determine the subcellular locations of proteins in an entire cell. It is also indispensable for prioritizing and selecting the right targets for drug development. Unfortunately, it is both time-consuming and costly to determine the subcellular locations of proteins purely based on experiments. With the avalanche of protein sequences generated in the post-genomic age, it is highly desirable to develop computational methods for rapidly and effectively identifying the subcellular locations of uncharacterized proteins based on their sequence information alone. Actually, considerable progress has been achieved in this regard. This review is focused on those methods that have the capacity to deal with multi-label proteins that may simultaneously exist in two or more subcellular location sites. Protein molecules with this kind of characteristic are vitally important for finding multi-target drugs, a current hot trend in drug development. This review also focuses on those methods that have user-friendly web-servers established so that the majority of experimental scientists can use them to get the desired results without the need to go through the detailed mathematics involved.
Affiliation(s)
- Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA 02478, United States
| |
|
40
|
Han GS, Yu ZG. ML-rRBF-ECOC: A Multi-Label Learning Classifier for Predicting Protein Subcellular Localization with Both Single and Multiple Sites. Curr Proteomics 2019. [DOI: 10.2174/1570164616666190103143945] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/22/2022]
Abstract
Background:
The subcellular localization of a protein is closely related to its functions and interactions. Increasing evidence shows that proteins may simultaneously exist at, or move between, two or more different subcellular localizations. Therefore, predicting protein subcellular localization is an important but challenging problem.
Observation:
Most of the existing methods for predicting protein subcellular localization assume that a protein is located at a single site. Although a few methods have been proposed to deal with proteins with multiple sites, correlations between subcellular localizations are not efficiently taken into account. In this paper, we propose an integrated method for predicting protein subcellular localizations with both single and multiple sites.
Methods:
Firstly, we extend the Multi-Label Radial Basis Function (ML-RBF) method to a regularized version, and augment the first layer of ML-RBF to take local correlations between subcellular localizations into account. Secondly, we embed the modified ML-RBF into a multi-label Error-Correcting Output Codes (ECOC) method in order to further consider the subcellular localization dependency. We name our method ML-rRBF-ECOC. Finally, the performance of ML-rRBF-ECOC is evaluated on three benchmark datasets.
Results:
The results demonstrate that ML-rRBF-ECOC is highly competitive with the related multi-label learning method and some state-of-the-art methods for predicting protein subcellular localizations with multiple sites. Considering the dependency between subcellular localizations can contribute to the improvement of prediction performance.
Conclusion:
This also indicates that correlations between different subcellular localizations really exist. Our method at least plays a complementary role to existing methods for predicting protein subcellular localizations with multiple sites.
Affiliation(s)
- Guo-Sheng Han
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Hunan 411105, China
| | - Zu-Guo Yu
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Hunan 411105, China
| |
|
41
|
Xiao X, Cheng X, Chen G, Mao Q, Chou KC. pLoc_bal-mVirus: Predict Subcellular Localization of Multi-Label Virus Proteins by Chou's General PseAAC and IHTS Treatment to Balance Training Dataset. Med Chem 2019; 15:496-509. [DOI: 10.2174/1573406415666181217114710] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 09/20/2018] [Revised: 10/23/2018] [Accepted: 12/12/2018] [Indexed: 12/17/2022]
Abstract
Background/Objective:
Knowledge of protein subcellular localization is vitally important for both basic research and drug development. Facing the avalanche of protein sequences emerging in the post-genomic age, it is urgent to develop computational tools for timely and effectively identifying their subcellular localization based on the sequence information alone. Recently, a predictor called “pLoc-mVirus” was developed for identifying the subcellular localization of virus proteins. Its performance is overwhelmingly better than that of the other predictors for the same purpose, particularly in dealing with multi-label systems in which some proteins, known as “multiplex proteins”, may simultaneously occur in, or move between two or more subcellular location sites. Despite the fact that it is indeed a very powerful predictor, more efforts are definitely needed to further improve it. This is because pLoc-mVirus was trained by an extremely skewed dataset in which some subset was over 10 times the size of the other subsets. Accordingly, it cannot avoid the biased consequence caused by such an uneven training dataset.
Methods:
Using the Chou's general PseAAC (Pseudo Amino Acid Composition) approach and the IHTS (Inserting Hypothetical Training Samples) treatment to balance out the training dataset, we have developed a new predictor called “pLoc_bal-mVirus” for predicting the subcellular localization of multi-label virus proteins.
Results:
Cross-validation tests on exactly the same experiment-confirmed dataset have indicated that the proposed new predictor is remarkably superior to pLoc-mVirus, the existing state-of-the-art predictor for the same purpose.
Conclusion:
Its user-friendly web-server is available at http://www.jci-bioinfo.cn/pLoc_balmVirus/, by which the majority of experimental scientists can easily get their desired results without the need to go through the detailed complicated mathematics. Accordingly, pLoc_bal-mVirus will become a very useful tool for designing multi-target drugs and in-depth understanding of the biological process in a cell.
Affiliation(s)
- Xuan Xiao
- Gordon Life Science Institute, Boston, MA 02478, United States
| | - Xiang Cheng
- Gordon Life Science Institute, Boston, MA 02478, United States
| | - Genqiang Chen
- College of Chemistry, Chemical Engineering and Biotechnology, Donghua University, Shanghai 201620, China
| | - Qi Mao
- College of Information Science and Technology, Donghua University, Shanghai, China
| | - Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA 02478, United States
| |
|
42
|
Abstract
Background:
Revealing the subcellular location of a newly discovered protein can bring insight into its function and guide research at the cellular level. The experimental methods currently used to identify protein subcellular locations are both time-consuming and expensive. Thus, it is highly desirable to develop computational methods for efficiently and effectively identifying protein subcellular locations. In particular, the rapidly increasing number of protein sequences entering the genome databases has called for the development of automated analysis methods.
Methods:
In this review, we describe the recent advances in predicting protein subcellular locations with machine learning from the following aspects: i) protein subcellular location benchmark dataset construction, ii) protein feature representation and feature descriptors, iii) common machine learning algorithms, iv) cross-validation test methods and assessment metrics, v) web servers.
Result & Conclusion:
With the large number of protein sequences generated by high-throughput technologies, four future directions for predicting protein subcellular locations with machine learning deserve attention. One direction is the selection of novel and effective features (e.g., statistical, physicochemical, evolutionary) from the sequences and structures of proteins. Another is the feature fusion strategy. The third is the design of a powerful predictor, and the fourth is the prediction of multiple protein location sites.
Affiliation(s)
- Ting-He Zhang
- School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
| | - Shao-Wu Zhang
- School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
| |
|
43
|
Dehzangi A, López Y, Taherzadeh G, Sharma A, Tsunoda T. SumSec: Accurate Prediction of Sumoylation Sites Using Predicted Secondary Structure. Molecules 2018; 23:E3260. [PMID: 30544729 PMCID: PMC6320791 DOI: 10.3390/molecules23123260] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 10/28/2018] [Revised: 11/30/2018] [Accepted: 12/05/2018] [Indexed: 12/13/2022] Open
Abstract
Post Translational Modification (PTM) is defined as the modification of amino acids along the protein sequences after the translation process. These modifications significantly impact on the functioning of proteins. Therefore, having a comprehensive understanding of the underlying mechanism of PTMs turns out to be critical in studying the biological roles of proteins. Among a wide range of PTMs, sumoylation is one of the most important modifications due to its known cellular functions which include transcriptional regulation, protein stability, and protein subcellular localization. Despite its importance, determining sumoylation sites via experimental methods is time-consuming and costly. This has led to a great demand for the development of fast computational methods able to accurately determine sumoylation sites in proteins. In this study, we present a new machine learning-based method for predicting sumoylation sites called SumSec. To do this, we employed the predicted secondary structure of amino acids to extract two types of structural features from neighboring amino acids along the protein sequence which has never been used for this task. As a result, our proposed method is able to enhance the sumoylation site prediction task, outperforming previously proposed methods in the literature. SumSec demonstrated high sensitivity (0.91), accuracy (0.94) and MCC (0.88). The prediction accuracy achieved in this study is 21% better than those reported in previous studies. The script and extracted features are publicly available at: https://github.com/YosvanyLopez/SumSec.
Affiliation(s)
- Abdollah Dehzangi
- Department of Computer Science, Morgan State University, Baltimore, MD 21251, USA.
| | - Yosvany López
- Genesis Institute of Genetic Research, Genesis Healthcare Co., Tokyo 150-6015, Japan.
| | - Ghazaleh Taherzadeh
- School of Information and Communication Technology, Griffith University, Gold Coast 4222, Australia.
| | - Alok Sharma
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane 4111, Australia.
- School of Engineering & Physics, University of the South Pacific, Suva, Fiji.
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.
- CREST, JST, Tokyo 102-0076, Japan.
- Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo 113-8510, Japan.
| | - Tatsuhiko Tsunoda
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.
- CREST, JST, Tokyo 102-0076, Japan.
- Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo 113-8510, Japan.
| |
|
44
|
Shen Y, Tang J, Guo F. Identification of protein subcellular localization via integrating evolutionary and physicochemical information into Chou's general PseAAC. J Theor Biol 2018; 462:230-239. [PMID: 30452958 DOI: 10.1016/j.jtbi.2018.11.012] [Citation(s) in RCA: 101] [Impact Index Per Article: 16.8] [Received: 09/17/2018] [Revised: 11/07/2018] [Accepted: 11/15/2018] [Indexed: 01/07/2023]
Abstract
Identifying the location of proteins in a cell plays an important role in understanding their functions, such as drug design, therapeutic target discovery and biological research. However, the traditional subcellular localization experiments are time-consuming, laborious and small scale. With the development of next-generation sequencing technology, the number of proteins has grown exponentially, which lays the foundation of the computational method for identifying protein subcellular localization. Although many methods for predicting subcellular localization of proteins have been proposed, most of them are limited to single-location. In this paper, we propose a multi-kernel SVM to predict subcellular localization of both multi-location and single-location proteins. First, we make use of the evolutionary information extracted from position specific scoring matrix (PSSM) and physicochemical properties of proteins, by Chou's general PseAAC and other efficient functions. Then, we propose a multi-kernel support vector machine (SVM) model to identify multi-label protein subcellular localization. As a result, our method has a good performance on predicting subcellular localization of proteins. It achieves an average precision of 0.7065 and 0.6889 on two human datasets, respectively. All results are higher than those achieved by other existing methods. Therefore, we provide an efficient system via a novel perspective to study the protein subcellular localization.
Affiliation(s)
- Yinan Shen
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Yaguan Road, Jinnan District, Tianjin, PR China.
| | - Jijun Tang
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Yaguan Road, Jinnan District, Tianjin, PR China; School of Computational Science and Engineering, University of South Carolina, Columbia, USA.
| | - Fei Guo
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Yaguan Road, Jinnan District, Tianjin, PR China.
| |
|
45
|
Qiu WR, Sun BQ, Xiao X, Xu ZC, Jia JH, Chou KC. iKcr-PseEns: Identify lysine crotonylation sites in histone proteins with pseudo components and ensemble classifier. Genomics 2018; 110:239-246. [DOI: 10.1016/j.ygeno.2017.10.008] [Citation(s) in RCA: 99] [Impact Index Per Article: 16.5] [Received: 10/10/2017] [Revised: 10/23/2017] [Accepted: 10/25/2017] [Indexed: 01/23/2023]
|
46
|
Kumar R, Kumari B, Kumar M. Proteome-wide prediction and annotation of mitochondrial and sub-mitochondrial proteins by incorporating domain information. Mitochondrion 2018; 42:11-22. [DOI: 10.1016/j.mito.2017.10.004] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 01/27/2017] [Revised: 07/21/2017] [Accepted: 10/06/2017] [Indexed: 12/22/2022]
|
47
|
Uddin MR, Sharma A, Farid DM, Rahman MM, Dehzangi A, Shatabda S. EvoStruct-Sub: An accurate Gram-positive protein subcellular localization predictor using evolutionary and structural features. J Theor Biol 2018; 443:138-146. [DOI: 10.1016/j.jtbi.2018.02.002] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 12/30/2017] [Revised: 01/18/2018] [Accepted: 02/03/2018] [Indexed: 12/21/2022]
|
48
|
Xiao X, Ye HX, Liu Z, Jia JH, Chou KC. iROS-gPseKNC: Predicting replication origin sites in DNA by incorporating dinucleotide position-specific propensity into general pseudo nucleotide composition. Oncotarget 2016; 7:34180-34189. [PMID: 27147572 PMCID: PMC5085147 DOI: 10.18632/oncotarget.9057] [Citation(s) in RCA: 109] [Impact Index Per Article: 18.2] [Received: 03/02/2016] [Accepted: 04/09/2016] [Indexed: 11/25/2022] Open
Abstract
DNA replication, occurring in all living organisms and being the basis for biological inheritance, is the process of producing two identical replicas from one original DNA molecule. To understand such an important biological process in depth and use it to develop new strategies against genetic diseases, knowledge of duplication origin sites in DNA is indispensable. With the explosive growth of DNA sequences emerging in the postgenomic age, it is highly desirable to develop high-throughput tools to identify these regions based on the sequence information alone. In this paper, by incorporating the dinucleotide position-specific propensity information into the general pseudo nucleotide composition and using the random forest classifier, a new predictor called iROS-gPseKNC was proposed. Rigorous cross-validations have indicated that the proposed predictor is significantly better than the best existing method in sensitivity, specificity, overall accuracy, and stability. Furthermore, a user-friendly web-server for iROS-gPseKNC has been established at http://www.jci-bioinfo.cn/iROS-gPseKNC, by which users can easily get their desired results without needing to bother with the complicated mathematics, which were presented just for the integrity of the methodology itself.
Affiliation(s)
- Xuan Xiao
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen, 333403, China; Information School, ZheJiang Textile and Fashion College, NingBo, 315211, China; Gordon Life Science Institute, Boston, Massachusetts, 02478, USA
| | - Han-Xiao Ye
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen, 333403, China
| | - Zi Liu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China
| | - Jian-Hua Jia
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen, 333403, China
| | - Kuo-Chen Chou
- Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, 21589, Saudi Arabia; Gordon Life Science Institute, Boston, Massachusetts, 02478, USA
| |
|
49
|
Prediction of protein subcellular localization with oversampling approach and Chou's general PseAAC. J Theor Biol 2018; 437:239-250. [DOI: 10.1016/j.jtbi.2017.10.030] [Citation(s) in RCA: 76] [Impact Index Per Article: 12.7] [Received: 08/05/2017] [Revised: 09/29/2017] [Accepted: 10/27/2017] [Indexed: 12/27/2022]
|
50
|
Wang L, Zhao Y, Chen Y, Wang D. The effect of three novel feature extraction methods on the prediction of the subcellular localization of multi-site virus proteins. Bioengineered 2018; 9:196-202. [PMID: 28886267 PMCID: PMC5972939 DOI: 10.1080/21655979.2017.1373536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2017] [Accepted: 07/05/2017] [Indexed: 11/08/2022] Open
Abstract
Experimental methods play a crucial role in identifying the subcellular localization of proteins and building high-quality databases. However, more efficient, automated computational methods are required to predict the subcellular localization of proteins on a large scale. Various efficient feature extraction methods have been proposed to predict subcellular localization, but challenges remain. In this paper, three novel feature extraction methods are established to improve multi-site prediction. The first novel feature extraction method utilizes repetitive information via moving windows based on a dipeptide pseudo amino acid composition method (R-Dipeptide). The second novel feature extraction method utilizes the impact of each amino acid residue on its following residues based on pseudo amino acids (I-PseAAC). The third novel feature extraction method provides local information about protein sequences that reflects the strength of the physicochemical properties of residues (PseAAC2). The multi-label k-nearest neighbor algorithm (MLKNN) is used to predict the subcellular localization of multi-site virus proteins. The best overall accuracy values of R-Dipeptide, I-PseAAC, and PseAAC2 when applied to dataset S from Virus-mPloc are 59.92%, 59.13%, and 57.94% respectively.
Affiliation(s)
- Lei Wang
- School of Information Science and Engineering, University of Jinan, Jinan, China
- Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, China
| | - Yaou Zhao
- School of Information Science and Engineering, University of Jinan, Jinan, China
- Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, China
| | - Yuehui Chen
- School of Information Science and Engineering, University of Jinan, Jinan, China
- Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, China
| | - Dong Wang
- School of Information Science and Engineering, University of Jinan, Jinan, China
- Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, China
| |
|