1
|
Chong LC, Khan AM. A Systematic Bioinformatics Approach for Mapping the Minimal Set of a Viral Peptidome. Curr Protoc 2024; 4:e1056. [PMID: 38856995 DOI: 10.1002/cpz1.1056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
Sequence changes in viral genomes generate protein sequence diversity that enables viruses to evade the host immune system, hindering the development of effective preventive and therapeutic interventions. The massive proliferation of sequence data provides unprecedented opportunities to study viral adaptation and evolution. An alignment-free approach removes various restrictions posed by an alignment-dependent approach for studying sequence diversity. The publicly available tool, UNIQmin, offers an alignment-free approach for studying viral sequence diversity at any given rank of taxonomy lineage and is big data ready. The tool performs an exhaustive search to determine the minimal set of sequences required to capture the peptidome diversity within a given dataset. This compression is possible through the removal of identical sequences and unique sequences that do not contribute effectively to the peptidome diversity pool. Herein, we describe a detailed four-part protocol utilizing UNIQmin to generate the minimal set for the purpose of viral diversity analyses, alignment-free at any rank of the taxonomy lineage, using the recent global public health threat Monkeypox virus (MPX) sequence data as a case study. The protocol enables a systematic bioinformatics approach to study sequence diversity across taxonomic lineages, which is crucial for our future preparedness against viral epidemics. This is particularly important when data are abundant, freely available, and alignment is not an option. © 2024 Wiley Periodicals LLC.
Basic Protocol 1: Tool installation and input file preparation.
Basic Protocol 2: Generation of a minimal set of sequences for a given dataset.
Basic Protocol 3: Comparative minimal set analysis across taxonomic lineage ranks.
Basic Protocol 4: Factors affecting the minimal set of sequences.
Affiliation(s)
- Li Chuin Chong
  - Centre for Bioinformatics, School of Data Sciences, Perdana University, Kuala Lumpur, Malaysia
  - Beykoz Institute of Life Sciences and Biotechnology, Bezmialem Vakif University, Beykoz, Turkey
  - Current affiliation: Institute for Experimental Virology, TWINCORE Centre for Experimental and Clinical Infection Research, a Medical School Hannover (MHH) and Helmholtz Centre for Infection Research (HZI) joint venture, Hannover, Germany
- Asif M Khan
  - Centre for Bioinformatics, School of Data Sciences, Perdana University, Kuala Lumpur, Malaysia
  - Beykoz Institute of Life Sciences and Biotechnology, Bezmialem Vakif University, Beykoz, Turkey
  - Current affiliation: College of Computing and Information Technology, University of Doha for Science and Technology, Doha, Qatar
|
2
|
Praveen M. Characterizing the West Nile Virus's polyprotein from nucleotide sequence to protein structure - Computational tools. J Taibah Univ Med Sci 2024; 19:338-350. [PMID: 38304694 PMCID: PMC10831166 DOI: 10.1016/j.jtumed.2024.01.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Revised: 11/27/2023] [Accepted: 01/08/2024] [Indexed: 02/03/2024] Open
Abstract
Objectives West Nile virus (WNV) belongs to the Flaviviridae family and causes West Nile fever. The virus is transmitted by Culex mosquito species. Infected individuals are primarily asymptomatic, and few exhibit common symptoms. Moreover, 10% of neuronal infections caused by this virus are fatal. The proteins encoded by the viral genes had been uncharacterized, although understanding their function and structure is important for formulating antiviral drugs. Methods Herein, we used in silico approaches, including various bioinformatic tools and databases, to analyse the proteins from the WNV polyprotein individually. The characterization included GC content, physicochemical properties, conserved domains, soluble and transmembrane regions, signal localization, protein disorder, and secondary structure features and their respective 3D protein structures. Results Among 11 proteins, eight had >50% GC content, eight had basic pI values, three were unstable under in vitro conditions, four were thermostable, with aliphatic index (AI) values >100, and some had negative GRAVY values in physicochemical analyses. All protein-conserved domains were shared among Flaviviridae family members. Five proteins were soluble and lacked transmembrane regions. Two proteins had signals for localization in the host endoplasmic reticulum. Non-structural (NS) 2A showed low protein disorder. The secondary structural features and tertiary structure models provide a valuable biochemical resource for designing selective substrates and synthetic inhibitors. Conclusions WNV proteins NS2A, NS2B, PM, NS3 and NS5 can be used as drug targets for the pharmacological design of lead antiviral compounds.
Affiliation(s)
- Mallari Praveen
  - Department of Zoology, Indira Gandhi National Tribal University, Amarkantak, Madhya Pradesh, India
|
3
|
James SA, Ong HS, Hari R, Khan AM. A systematic bioinformatics approach for large-scale identification and characterization of host-pathogen shared sequences. BMC Genomics 2021; 22:700. [PMID: 34583643 PMCID: PMC8477458 DOI: 10.1186/s12864-021-07657-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2021] [Accepted: 04/28/2021] [Indexed: 11/10/2022] Open
Abstract
Background Biology has entered the era of big data with the advent of high-throughput omics technologies. Biological databases provide public access to petabytes of data and information facilitating knowledge discovery. Over the years, sequence data of pathogens has seen a large increase in the number of records, given the relatively small genome size and their important role as infectious and symbiotic agents. Humans are host to numerous pathogenic diseases, such as those caused by viruses, many of which are responsible for high mortality and morbidity. The interaction between pathogens and humans over evolutionary history has resulted in the sharing of sequences, with important biological and evolutionary implications. Results This study describes a large-scale, systematic bioinformatics approach for identification and characterization of shared sequences between the host and pathogen. An application of the approach is demonstrated through identification and characterization of the Flaviviridae-human share-ome. A total of 2430 nonamers represented the Flaviviridae-human share-ome with 100% identity. Although the share-ome represented a small fraction of the repertoire of Flaviviridae (~ 0.12%) and human (~ 0.013%) non-redundant nonamers, the 2430 shared nonamers mapped to 16,946 Flaviviridae and 7506 human non-redundant protein sequences. The shared nonamer sequences mapped to 125 species of Flaviviridae, including several with unclassified genus. The majority (~ 68%) of the shared sequences mapped to Hepacivirus C species; West Nile, dengue and Zika viruses of the Flavivirus genus accounted for ~ 11%, ~ 7%, and ~ 3%, respectively, of the Flaviviridae protein sequences (16,946) mapped by the share-ome. Further characterization of the share-ome provided important structural-functional insights to Flaviviridae-human interactions.
Conclusion Mapping of the host-pathogen share-ome has important implications for the design of vaccines and drugs, diagnostics, disease surveillance and the discovery of unknown, potential host-pathogen interactions. The generic workflow presented herein is potentially applicable to a variety of pathogens, such as of viral, bacterial or parasitic origin. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-07657-4.
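The share-ome idea in the abstract above (nonamers present with 100% identity in both pathogen and host proteins) can be illustrated with a set intersection. This is only a minimal sketch under stated assumptions: the function names `kmers` and `shared_kmers` and the toy sequences are hypothetical, and the published workflow may differ in how sequences are collected, de-duplicated, and mapped.

```python
def kmers(seq, k=9):
    """Return the set of overlapping k-mers (nonamers by default) in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(pathogen_seqs, host_seqs, k=9):
    """Return k-mers found, with exact identity, in both sequence collections."""
    pathogen = set().union(*(kmers(s, k) for s in pathogen_seqs))
    host = set().union(*(kmers(s, k) for s in host_seqs))
    return pathogen & host


# Toy, made-up sequences (not real viral or human proteins).
toy_pathogen = ["MKTAYIAKQRQ"]
toy_host = ["XXKTAYIAKQRZZ"]
toy_shared = shared_kmers(toy_pathogen, toy_host)
```

Using sets keeps the comparison exact and non-redundant, which matches the "non-redundant nonamers" wording of the abstract; memory use for real proteome-scale data would need separate consideration.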
Affiliation(s)
- Stephen Among James
  - Centre for Bioinformatics, School of Data Sciences, Perdana University, Damansara Heights, Kuala Lumpur, 50490, Malaysia
  - Department of Biochemistry, Faculty of Science, Kaduna State University, Kaduna, 800211, Nigeria
- Hui San Ong
  - Centre for Bioinformatics, School of Data Sciences, Perdana University, Damansara Heights, Kuala Lumpur, 50490, Malaysia
- Ranjeev Hari
  - Centre for Bioinformatics, School of Data Sciences, Perdana University, Damansara Heights, Kuala Lumpur, 50490, Malaysia
- Asif M Khan
  - Centre for Bioinformatics, School of Data Sciences, Perdana University, Damansara Heights, Kuala Lumpur, 50490, Malaysia
  - Beykoz Institute of Life Sciences and Biotechnology, Bezmialem Vakif University, Beykoz, Istanbul, 34820, Turkey
|
4
|
Chong LC, Lim WL, Ban KHK, Khan AM. An Alignment-Independent Approach for the Study of Viral Sequence Diversity at Any Given Rank of Taxonomy Lineage. BIOLOGY 2021; 10:biology10090853. [PMID: 34571730 PMCID: PMC8466476 DOI: 10.3390/biology10090853] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/11/2021] [Revised: 08/13/2021] [Accepted: 08/19/2021] [Indexed: 11/16/2022]
Abstract
The study of viral diversity is imperative in understanding sequence change and its implications for intervention strategies. The widely used alignment-dependent approaches to study viral diversity are limited in their utility as sequence dissimilarity increases, particularly when expanded to the genus or higher ranks of viral species lineage. Herein, we present an alignment-independent algorithm, implemented as a tool, UNIQmin, to determine the effective viral sequence diversity at any rank of the viral taxonomy lineage. This is done by performing an exhaustive search to generate the minimal set of sequences for a given viral non-redundant sequence dataset. The minimal set is comprised of the smallest possible number of unique sequences required to capture the diversity inherent in the complete set of overlapping k-mers encoded by all the unique sequences in the given dataset. Such dataset compression is possible through the removal of unique sequences, whose entire repertoire of overlapping k-mers can be represented by other sequences, thus rendering them redundant to the collective pool of sequence diversity. A significant reduction, namely ~44%, ~45%, and ~53%, was observed for all reported unique sequences of species Dengue virus, genus Flavivirus, and family Flaviviridae, respectively, while still capturing the entire repertoire of nonamer (9-mer) viral peptidome diversity present in the initial input dataset. The algorithm is scalable for big data as it was applied to ~2.2 million non-redundant sequences of all reported viruses. UNIQmin is open source and publicly available on GitHub. The concept of a minimal set is generic and, thus, potentially applicable to other pathogenic microorganisms of non-viral origin, such as bacteria.
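To make the minimal-set concept concrete, the sketch below drops duplicate sequences and then greedily picks sequences until every overlapping k-mer in the dataset is covered. It is not UNIQmin's implementation: the abstract describes an exhaustive search, whereas this greedy set-cover heuristic is a swapped-in simplification that is not guaranteed to find the smallest possible set. The function names and toy sequences are hypothetical.

```python
def kmers(seq, k=9):
    """Return the set of overlapping k-mers in a sequence (empty if len(seq) < k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_minimal_set(sequences, k=9):
    """Greedy approximation of a minimal set covering every k-mer in the input.

    Assumption: a simple set-cover heuristic, not the UNIQmin algorithm.
    """
    unique = list(dict.fromkeys(sequences))  # remove identical sequences, keep order
    kmer_sets = {s: kmers(s, k) for s in unique}
    uncovered = set().union(*kmer_sets.values())
    chosen = []
    while uncovered:
        # Pick the sequence contributing the most still-uncovered k-mers.
        best = max(unique, key=lambda s: len(kmer_sets[s] & uncovered))
        chosen.append(best)
        uncovered -= kmer_sets[best]
    return chosen


# Toy input: the second and third sequences add no k-mers beyond the first.
toy = ["ABCDEFGHIJK", "ABCDEFGHIJ", "BCDEFGHIJK", "ABCDEFGHIJK"]
toy_min = greedy_minimal_set(toy)
```

In the toy input the first sequence already encodes all three nonamers, so the other sequences are redundant to the k-mer pool, which is the kind of compression the abstract describes.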
Affiliation(s)
- Li Chuin Chong
  - Centre for Bioinformatics, School of Data Sciences, Perdana University, Kuala Lumpur 50490, Malaysia
- Wei Lun Lim
  - Faculty of Computing and Informatics, Multimedia University, Cyberjaya 63100, Malaysia
- Kenneth Hon Kim Ban
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117596, Singapore
- Asif M. Khan
  - Centre for Bioinformatics, School of Data Sciences, Perdana University, Kuala Lumpur 50490, Malaysia
  - Beykoz Institute of Life Sciences and Biotechnology, Bezmialem Vakif University, Beykoz, 34820 Istanbul, Turkey
|
5
|
Avian Influenza H7N9 Virus Adaptation to Human Hosts. Viruses 2021; 13:v13050871. [PMID: 34068495 PMCID: PMC8150935 DOI: 10.3390/v13050871] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/22/2021] [Revised: 04/03/2021] [Accepted: 04/05/2021] [Indexed: 01/06/2023] Open
Abstract
Avian influenza virus A (H7N9), after circulating in avian hosts for decades, was identified as a human pathogen in 2013. Herein, amino acid substitutions possibly essential for human adaptation were identified by comparing the 4706 aligned overlapping nonamer position sequences (1–9, 2–10, etc.) of the reported 2014 and 2017 avian and human H7N9 datasets. The initial set of virus sequences (as of year 2014) exhibited a total of 109 avian-to-human (A2H) signature amino acid substitutions. Each represented the most prevalent substitution at a given avian virus nonamer position that was selectively adapted as the corresponding index (most prevalent sequence) of the human viruses. The majority of these avian substitutions were long-standing in the evolution of H7N9, and only 17 were first detected in 2013 as possibly essential for the initial human adaptation. Strikingly, continued evolution of the avian H7N9 virus has resulted in avian and human protein sequences that are almost identical. This rapid and continued adaptation of the avian H7N9 virus to the human host, with near identity of the avian and human viruses, is associated with increased human infection and a predicted greater risk of human-to-human transmission.
|
6
|
Abd Raman HS, Tan S, August JT, Khan AM. Dynamics of Influenza A (H5N1) virus protein sequence diversity. PeerJ 2020; 7:e7954. [PMID: 32518710 PMCID: PMC7261124 DOI: 10.7717/peerj.7954] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/06/2019] [Accepted: 09/26/2019] [Indexed: 11/20/2022] Open
Abstract
Background Influenza A (H5N1) virus is a global concern with potential as a pandemic threat. High sequence variability of influenza A viruses is a major challenge for effective vaccine design. A continuing goal towards this is a greater understanding of influenza A (H5N1) proteome sequence diversity in the context of the immune system (antigenic diversity), the dynamics of mutation, and effective strategies to overcome the diversity for vaccine design. Methods Herein, we report a comprehensive study of the dynamics of H5N1 mutations by analysis of the aligned overlapping nonamer positions (1–9, 2–10, etc.) of more than 13,000 protein sequences of avian and human influenza A (H5N1) viruses, reported over at least 50 years. Entropy calculations were performed on 9,408 overlapping nonamer positions of the proteome to study the diversity in the context of the immune system. The nonamers represent the predominant length of the binding cores for peptides recognized by the cellular immune system. To further dissect the sequence diversity, each overlapping nonamer position was quantitatively analyzed for four patterns of sequence diversity motifs: index, major, minor and unique. Results Almost all of the aligned overlapping nonamer positions of each viral proteome exhibited variants (major, minor, and unique) to the predominant index sequence. Each variant motif displayed a characteristic pattern of incidence change in relation to increased total variants. The major variant exhibited a restrictive pyramidal incidence pattern, with peak incidence at 50% total variants. Beyond this peak incidence, the minor variants became the predominant motif for the majority of the positions. Unique variants, each sequence observed only once, were present at nearly all of the nonamer positions. The diversity motifs (index and variants) demonstrated complex inter-relationships, with motif switching being a common phenomenon.
Additionally, 25 highly conserved sequences were identified to be shared across viruses of both hosts, with half conserved to several other influenza A subtypes. Discussion The presence of distinct sequences (nonatypes) at nearly all nonamer positions represents a large repertoire of reported viral variants in the proteome, which influence the variability dynamics of the viral population. This work elucidated and provided important insights on the components that make up the viral diversity, delineating inherent patterns in the organization of sequence changes that function in the viral fitness-selection. Additionally, it provides a catalogue of all the mutational changes involved in the dynamics of H5N1 viral diversity for both avian and human host populations. This work provides data relevant for the design of prophylactics and therapeutics that overcome the diversity of the virus, and can aid in the surveillance of existing and future strains of influenza viruses.
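The per-position analysis described in this abstract can be sketched for a single aligned nonamer position: Shannon entropy, H = -Σ p·log2(p) over the distinct nonamers observed, plus a labeling of each distinct nonamer as index, major, minor, or unique. The labeling rules below are an assumption inferred from the abstract (index = most prevalent sequence; unique = observed once; major = most common variant seen more than once; minor = the remaining variants), and the function names and toy data are hypothetical.

```python
from collections import Counter
from math import log2


def nonamer_entropy(peptides):
    """Shannon entropy (bits) of the nonamers observed at one aligned position."""
    counts = Counter(peptides)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def classify_motifs(peptides):
    """Label each distinct nonamer as index, major, minor, or unique.

    Assumed rules (not confirmed by the source); ties are broken by first occurrence.
    """
    ranked = Counter(peptides).most_common()
    motifs = {}
    major_assigned = False
    for rank, (peptide, count) in enumerate(ranked):
        if rank == 0:
            motifs[peptide] = "index"
        elif count == 1:
            motifs[peptide] = "unique"
        elif not major_assigned:
            motifs[peptide] = "major"
            major_assigned = True
        else:
            motifs[peptide] = "minor"
    return motifs


# Toy position: 11 made-up nonamers drawn from four distinct sequences.
toy_position = ["A" * 9] * 5 + ["B" * 9] * 3 + ["C" * 9] * 2 + ["D" * 9]
toy_motifs = classify_motifs(toy_position)
toy_entropy = nonamer_entropy(toy_position)
```

A fully conserved position gives an entropy of zero, and entropy grows as more distinct nonamers share the incidence at that position.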
Affiliation(s)
- Swan Tan
  - School of Data Sciences, Perdana University, Serdang, Selangor, Malaysia
  - Institute for Immunology and Informatics, University of Rhode Island, Providence, RI, United States of America
- Joseph Thomas August
  - School of Medicine, Johns Hopkins University, Baltimore, MD, United States of America
- Asif M Khan
  - School of Data Sciences, Perdana University, Serdang, Selangor, Malaysia
  - School of Medicine, Johns Hopkins University, Baltimore, MD, United States of America
  - Beykoz Institute of Life Sciences and Biotechnology, Bezmialem Vakif University, Beykoz, Istanbul, Turkey
|
7
|
West Nile Virus Vaccine Design by T Cell Epitope Selection: In Silico Analysis of Conservation, Functional Cross-Reactivity with the Human Genome, and Population Coverage. J Immunol Res 2020; 2020:7235742. [PMID: 32258174 PMCID: PMC7106935 DOI: 10.1155/2020/7235742] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/21/2019] [Accepted: 12/05/2019] [Indexed: 12/22/2022] Open
Abstract
West Nile Virus (WNV) causes a debilitating and life-threatening neurological disease in humans. Since its emergence in Africa 50 years ago, new strains of WNV and an expanding geographical distribution have increased public health concerns. There are no licensed therapeutics against WNV, limiting effective infection control. Vaccines represent the most efficacious and efficient medical intervention known. Epitope-based vaccines against WNV remain significantly underexploited. Here, we use a selection protocol to identify a set of conserved prevalidated immunogenic T cell epitopes comprising a putative WNV vaccine. Experimentally validated immunogenic WNV epitopes and WNV sequences were retrieved from the IEDB and West Nile Virus Variation Database. Clustering and multiple sequence alignment identified a smaller subset of representative sequences. Protein variability analysis identified evolutionarily conserved sequences, which were used to select a diverse set of immunogenic candidate T cell epitopes. Cross-reactivity and human leukocyte antigen-binding affinities were assessed to eliminate unsuitable epitope candidates. Population protection coverage (PPC) quantified individual epitopes and epitope combinations against the world population. Three CD8+ T cell epitopes (ITYTDVLRY, TLARGFPFV, and SYHDRRWCF) and one CD4+ epitope (VTVNPFVSVATANAKVLI) were selected as a putative WNV vaccine, with an estimated PPC of 97.14%.
|
8
|
Chong LC, Khan AM. Identification of highly conserved, serotype-specific dengue virus sequences: implications for vaccine design. BMC Genomics 2019; 20:921. [PMID: 31874646 PMCID: PMC6929274 DOI: 10.1186/s12864-019-6311-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 11/12/2019] [Accepted: 11/19/2019] [Indexed: 11/24/2022] Open
Abstract
Background The sequence diversity of dengue virus (DENV) is one of the challenges in developing an effective vaccine against the virus. Highly conserved, serotype-specific (HCSS), immune-relevant DENV sequences are attractive candidates for vaccine design, and represent an alternative to the approach of selecting pan-DENV conserved sequences. The former aims to limit the number of possible cross-reactive epitope variants in the population, while the latter aims to limit the cross-reactivity between the serotypes to favour a serotype-specific response. Herein, we performed a large-scale systematic study to map and characterise HCSS sequences in the DENV proteome. Methods All reported DENV protein sequence data for each serotype was retrieved from the NCBI Entrez Protein (nr) Database (txid: 12637). The downloaded sequences were then separated according to the individual serotype proteins by use of BLASTp search, and subsequently removed for duplicates and co-aligned across the serotypes. Shannon’s entropy and mutual information (MI) analyses, by use of AVANA, were performed to measure the diversity within and between the serotype proteins to identify HCSS nonamers. The sequences were evaluated for the presence of promiscuous T-cell epitopes by use of NetCTLpan 1.1 and NetMHCIIpan 3.2 server for human leukocyte antigen (HLA) class I and class II supertypes, respectively. The predicted epitopes were matched to reported epitopes in the Immune Epitope Database. Results A total of 2321 nonamers met the HCSS selection criteria of entropy < 0.25 and MI > 0.8. Concatenating these resulted in a total of 337 HCSS sequences. DENV4 had the most number of HCSS nonamers; NS5, NS3 and E proteins had among the highest, with none in the C and only one in prM. The HCSS sequences were immune-relevant; 87 HCSS sequences were both reported T-cell epitopes/ligands in human and predicted epitopes, supporting the accuracy of the predictions. 
A number of the HCSS sequences clustered as immunological hotspots and exhibited putative promiscuity beyond a single HLA supertype. The HCSS sequences represented, on average, ~ 40% of the proteome length for each serotype, more than double that of pan-DENV sequences (conserved across the four serotypes), and thus offer a larger choice of sequences for vaccine target selection. HCSS sequences of a given serotype showed significant amino acid difference to all the variants of the other serotypes, supporting the notion of serotype-specificity. Conclusion This work provides a catalogue of HCSS sequences in the DENV proteome, as candidates for vaccine target selection. The methodology described herein provides a framework for similar application to other pathogens.
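The HCSS selection step reported in this abstract (entropy < 0.25 and MI > 0.8, followed by concatenation of the selected nonamers) can be sketched as a filter plus a merge of overlapping positions. This is a hedged illustration only: the record layout, the function names, and the rule that overlapping or abutting nonamers are merged into one region are assumptions, and the entropy and MI values are taken as given rather than computed (the study used AVANA for that).

```python
def select_hcss(positions, max_entropy=0.25, min_mi=0.8):
    """Keep nonamer start positions meeting the reported thresholds.

    `positions` is a hypothetical list of dicts with 'start', 'entropy', 'mi'.
    """
    return [p["start"] for p in positions
            if p["entropy"] < max_entropy and p["mi"] > min_mi]


def merge_nonamers(starts, k=9):
    """Merge overlapping or abutting k-mers into (start, end) regions, 1-based inclusive."""
    regions = []
    for s in sorted(starts):
        end = s + k - 1
        if regions and s <= regions[-1][1] + 1:
            # Extend the current region instead of opening a new one.
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((s, end))
    return regions


# Made-up per-position values for illustration.
toy_positions = [
    {"start": 1, "entropy": 0.10, "mi": 0.90},
    {"start": 2, "entropy": 0.20, "mi": 0.85},
    {"start": 3, "entropy": 0.05, "mi": 0.95},
    {"start": 4, "entropy": 0.40, "mi": 0.90},   # fails the entropy threshold
    {"start": 20, "entropy": 0.10, "mi": 0.81},
    {"start": 30, "entropy": 0.10, "mi": 0.50},  # fails the MI threshold
]
toy_regions = merge_nonamers(select_hcss(toy_positions))
```

Merging runs of selected positions is one plausible way many nonamers reduce to fewer, longer sequences, as in the reported 2321 nonamers and 337 HCSS sequences.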
Affiliation(s)
- Li Chuin Chong
  - Centre for Bioinformatics, School of Data Sciences, Perdana University, Jalan MAEPS Perdana, 43400, Serdang, Selangor Darul Ehsan, Malaysia
- Asif M Khan
  - Centre for Bioinformatics, School of Data Sciences, Perdana University, Jalan MAEPS Perdana, 43400, Serdang, Selangor Darul Ehsan, Malaysia
|
9
|
Abstract
With the rise in novel infectious agents and disease pandemics, a new era of vaccine discovery is necessary. To address this, the new field of immunomics is described, which is synergistically powered by integrating bioinformatics methodologies with technological advances in biology and high-throughput instrumentation. By incorporating biological data from immunology and molecular biology with current genomics and proteomics, immunomics is geared to deliver insight into immune function, optimal stimulation of immune responses, and precise mapping and rational selection of immune targets that cover antigenic diversity. These efforts are expected to contribute towards the development of a new generation of vaccines, tailored to the genetic make-up of both the human population and the pathogen. Vaccine technologies are also being explored for prevention or control of non-communicable diseases.
|
10
|
Tripathi NK, Karothia D, Shrivastava A, Banger S, Kumar JS. Enhanced production and immunological characterization of recombinant West Nile virus envelope domain III protein. N Biotechnol 2018; 46:7-13. [PMID: 29768182 DOI: 10.1016/j.nbt.2018.05.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 06/22/2017] [Revised: 05/04/2018] [Accepted: 05/06/2018] [Indexed: 12/18/2022]
Abstract
West Nile virus (WNV) is an emerging mosquito-borne virus which is responsible for severe and fatal encephalitis in humans and for which there is no licensed vaccine or therapeutic available to prevent infection. The envelope domain III protein (EDIII) of WNV was over-expressed in Escherichia coli and purified using a two-step chromatography process which included immobilized metal affinity chromatography and ion exchange chromatography. E. coli cells were grown in a bioreactor to high density using batch and fed-batch cultivation. Wet biomass obtained after batch and fed-batch cultivation processes was 11.2 g and 84 g/L of culture respectively. Protein yield after affinity purification was 5.76 mg and 5.81 mg/g wet cell weight after batch and fed-batch processes respectively. The purified WNV EDIII elicited specific antibodies in rabbits, confirming its immunogenicity. Moreover, the antibodies were able to neutralize WNV in vitro. These results established that the refolded and purified WNV EDIII could be a potential vaccine candidate.
Affiliation(s)
- Nagesh K Tripathi
  - Bioprocess Scale Up Facility, Defence Research and Development Establishment, Jhansi Road, Gwalior, 474002, India
- Divyanshi Karothia
  - Division of Virology, Defence Research and Development Establishment, Jhansi Road, Gwalior, 474002, India
- Ambuj Shrivastava
  - Division of Virology, Defence Research and Development Establishment, Jhansi Road, Gwalior, 474002, India
- Swati Banger
  - Bioprocess Scale Up Facility, Defence Research and Development Establishment, Jhansi Road, Gwalior, 474002, India
- Jyoti S Kumar
  - Division of Virology, Defence Research and Development Establishment, Jhansi Road, Gwalior, 474002, India
|
11
|
Abstract
Background Ebolavirus (EBOV) is responsible for one of the most fatal diseases encountered by mankind. Cellular T-cell responses have been implicated to be important in providing protection against the virus. Antigenic variation can result in viral escape from immune recognition. Mapping targets of immune responses among the sequence of viral proteins is, thus, an important first step towards understanding the immune responses to viral variants and can aid in the identification of vaccine targets. Herein, we performed a large-scale, proteome-wide mapping and diversity analyses of putative HLA supertype-restricted T-cell epitopes of Zaire ebolavirus (ZEBOV), the most pathogenic species among the EBOV family. Methods All publicly available ZEBOV sequences (14,098) for each of the nine viral proteins were retrieved, removed of irrelevant and duplicate sequences, and aligned. The overall proteome diversity of the non-redundant sequences was studied by use of Shannon’s entropy. The sequences were predicted, by use of the NetCTLpan server, for HLA-A2, -A3, and -B7 supertype-restricted epitopes, which are relevant to African and other ethnicities and provide for large (~86%) population coverage. The predicted epitopes were mapped to the alignment of each protein for analyses of antigenic sequence diversity and relevance to structure and function. The putative epitopes were validated by comparison with experimentally confirmed epitopes. Results & discussion ZEBOV proteome was generally conserved, with an average entropy of 0.16. The 185 HLA supertype-restricted T-cell epitopes predicted (82 (A2), 37 (A3) and 66 (B7)) mapped to 125 alignment positions and covered ~24% of the proteome length. Many of the epitopes showed a propensity to co-localize at select positions of the alignment. Thirty (30) of the mapped positions were completely conserved and may be attractive for vaccine design. 
The remaining (95) positions had one or more epitopes, with or without non-epitope variants. A significant number (24) of the putative epitopes matched reported experimentally validated HLA ligands/T-cell epitopes of A2, A3 and/or B7 supertype representative allele restrictions. The epitopes generally corresponded to functional motifs/domains and there was no correlation to localization on the protein 3D structure. These data and the epitope map provide important insights into the interaction between EBOV and the host immune system. Electronic supplementary material The online version of this article (10.1186/s12864-017-4328-8) contains supplementary material, which is available to authorized users.
|
12
|
Khan AM, Hu Y, Miotto O, Thevasagayam NM, Sukumaran R, Abd Raman HS, Brusic V, Tan TW, Thomas August J. Analysis of viral diversity for vaccine target discovery. BMC Med Genomics 2017; 10:78. [PMID: 29322922 PMCID: PMC5763473 DOI: 10.1186/s12920-017-0301-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Viral vaccine target discovery requires understanding the diversity of both the virus and the human immune system. The readily available and rapidly growing pool of viral sequence data in the public domain enables the identification and characterization of immune targets relevant to adaptive immunity. A systematic bioinformatics approach is necessary to facilitate the analysis of such large datasets for selection of potential candidate vaccine targets. RESULTS This work describes a computational methodology to achieve this analysis, with data of dengue, West Nile, hepatitis A, HIV-1, and influenza A viruses as examples. Our methodology has been implemented as an analytical pipeline that brings significant advancement to the field of reverse vaccinology, enabling systematic screening of known sequence data in nature for identification of vaccine targets. This includes the key steps of (i) comprehensive and extensive collection of sequence data of viral proteomes (the virome), (ii) data cleaning, (iii) large-scale sequence alignments, (iv) peptide entropy analysis, (v) intra- and inter-species variation analysis of conserved sequences, including human homology analysis, and (vi) functional and immunological relevance analysis. CONCLUSION These steps are combined into the pipeline, ensuring that a more refined process, as compared to a simple evolutionary conservation analysis, will facilitate a better selection of vaccine targets and their prioritization for subsequent experimental validation.
Affiliation(s)
- Asif M. Khan
- Centre for Bioinformatics, School of Data Sciences, Perdana University, Jalan MAEPS Perdana, Serdang, Selangor Darul Ehsan 43400 Malaysia
- Department of Pharmacology and Molecular Sciences, The Johns Hopkins University School of Medicine, 725 North Wolfe Street, Baltimore, MD 21205 USA
- Yongli Hu
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, 8 Medical Drive, Singapore, 117597 Singapore
- Olivo Miotto
- Centre for Genomics and Global Health, University of Oxford, Oxford, UK
- Mahidol-Oxford Research Unit, Faculty of Tropical Medicine, Mahidol University, Rajthevee, Bangkok, Thailand
- Natascha M. Thevasagayam
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, 8 Medical Drive, Singapore, 117597 Singapore
- Rashmi Sukumaran
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, 8 Medical Drive, Singapore, 117597 Singapore
- Hadia Syahirah Abd Raman
- Centre for Bioinformatics, School of Data Sciences, Perdana University, Jalan MAEPS Perdana, Serdang, Selangor Darul Ehsan 43400 Malaysia
- Vladimir Brusic
- Menzies Health Institute Queensland, Griffith University, Parklands Dr, Southport, 4215 QLD Australia
- Tin Wee Tan
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, 8 Medical Drive, Singapore, 117597 Singapore
- J. Thomas August
- Department of Pharmacology and Molecular Sciences, The Johns Hopkins University School of Medicine, 725 North Wolfe Street, Baltimore, MD 21205 USA
|
13
|
T cell epitope mapping of the E-protein of West Nile virus in BALB/c mice. PLoS One 2014; 9:e115343. [PMID: 25506689 PMCID: PMC4266646 DOI: 10.1371/journal.pone.0115343] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 05/23/2014] [Accepted: 11/21/2014] [Indexed: 11/29/2022] Open
Abstract
West Nile virus (WNV) is a zoonotic virus that is transmitted by mosquitoes. It is the causative agent of the disease syndrome called West Nile fever. In some human cases, a WNV infection can be associated with severe neurological symptoms. The immune response to WNV is multifactorial and includes both humoral and cellular immunity. T-cell epitope mapping of the WNV envelope (E) protein has been performed in C57BL/6 mice, but not in BALB/c mice. Therefore, we performed T-cell epitope mapping in BALB/c mice using a series of peptides spanning the WNV envelope (E) protein. To this end, the WNV-E specific T cell repertoire was first expanded by vaccinating BALB/c mice with a DNA vaccine that generates subviral particles that resemble West Nile virus. Furthermore, the WNV structural protein was expressed in Escherichia coli as a series of overlapping 20-mer peptides fused to a carrier protein. Cytokine-based ELISPOT assays using these purified peptides revealed positive WNV-specific T cell responses to peptides within the different domains of the E-protein.
|
14
|
Langevin SA, Bowen RA, Reisen WK, Andrade CC, Ramey WN, Maharaj PD, Anishchenko M, Kenney JL, Duggal NK, Romo H, Bera AK, Sanders TA, Bosco-Lauth A, Smith JL, Kuhn R, Brault AC. Host competence and helicase activity differences exhibited by West Nile viral variants expressing NS3-249 amino acid polymorphisms. PLoS One 2014; 9:e100802. [PMID: 24971589 PMCID: PMC4074097 DOI: 10.1371/journal.pone.0100802] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 03/04/2014] [Accepted: 05/14/2014] [Indexed: 01/27/2023] Open
Abstract
A single helicase amino acid substitution, NS3-T249P, has been shown to increase viremia magnitude/mortality in American crows (AMCRs) following West Nile virus (WNV) infection. Lineage/intra-lineage geographic variants exhibit consistent amino acid polymorphisms at this locus; however, the majority of WNV isolates associated with recent outbreaks reported worldwide have a proline at the NS3-249 residue. In order to evaluate the impact of NS3-249 variants on avian and mammalian virulence, multiple amino acid substitutions were engineered into a WNV infectious cDNA (NY99; NS3-249P) and the resulting viruses inoculated into AMCRs, house sparrows (HOSPs) and mice. Differential viremia profiles were observed between mutant viruses in the two bird species; however, the NS3-249P virus produced the highest mean peak viral loads in both avian models. In contrast, this avian modulating virulence determinant had no effect on LD50 or the neurovirulence phenotype in the murine model. Recombinant helicase proteins demonstrated variable helicase and ATPase activities; however, differences did not correlate with avian or murine viremia phenotypes. These in vitro and in vivo data indicate that avian-specific phenotypes are modulated by critical viral-host protein interactions involving the NS3-249 residue that directly influence transmission efficiency and therefore the magnitude of WNV epizootics in nature.
Affiliation(s)
- Stanley A. Langevin
- Center for Vectorborne Diseases and Department of Pathology, Microbiology and Immunology, School of Veterinary Medicine, University of California, Davis, California, United States of America
- Richard A. Bowen
- Department of Biomedical Sciences, Colorado State University, Fort Collins, Colorado, United States of America
- William K. Reisen
- Center for Vectorborne Diseases and Department of Pathology, Microbiology and Immunology, School of Veterinary Medicine, University of California, Davis, California, United States of America
- Christy C. Andrade
- Center for Vectorborne Diseases and Department of Pathology, Microbiology and Immunology, School of Veterinary Medicine, University of California, Davis, California, United States of America
- Wanichaya N. Ramey
- Center for Vectorborne Diseases and Department of Pathology, Microbiology and Immunology, School of Veterinary Medicine, University of California, Davis, California, United States of America
- Payal D. Maharaj
- Center for Vectorborne Diseases and Department of Pathology, Microbiology and Immunology, School of Veterinary Medicine, University of California, Davis, California, United States of America
- Michael Anishchenko
- Division of Vector-Borne Diseases, Centers for Disease Control and Prevention, Fort Collins, Colorado, United States of America
- Joan L. Kenney
- Division of Vector-Borne Diseases, Centers for Disease Control and Prevention, Fort Collins, Colorado, United States of America
- Nisha K. Duggal
- Division of Vector-Borne Diseases, Centers for Disease Control and Prevention, Fort Collins, Colorado, United States of America
- Hannah Romo
- Division of Vector-Borne Diseases, Centers for Disease Control and Prevention, Fort Collins, Colorado, United States of America
- Aloke Kumar Bera
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, United States of America
- Todd A. Sanders
- United States Fish and Wildlife Service, Portland, Oregon, United States of America
- Angela Bosco-Lauth
- Department of Biomedical Sciences, Colorado State University, Fort Collins, Colorado, United States of America
- Division of Vector-Borne Diseases, Centers for Disease Control and Prevention, Fort Collins, Colorado, United States of America
- Janet L. Smith
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, United States of America
- Richard Kuhn
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, United States of America
- Aaron C. Brault
- Center for Vectorborne Diseases and Department of Pathology, Microbiology and Immunology, School of Veterinary Medicine, University of California, Davis, California, United States of America
- Division of Vector-Borne Diseases, Centers for Disease Control and Prevention, Fort Collins, Colorado, United States of America
|
15
|
Hu Y, Tan PT, Tan TW, August JT, Khan AM. Dissecting the dynamics of HIV-1 protein sequence diversity. PLoS One 2013; 8:e59994. [PMID: 23593157 PMCID: PMC3617185 DOI: 10.1371/journal.pone.0059994] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 07/09/2012] [Accepted: 02/21/2013] [Indexed: 12/22/2022] Open
Abstract
The rapid mutation of human immunodeficiency virus-type 1 (HIV-1) and the limited characterization of the composition and incidence of the variant population are major obstacles to the development of an effective HIV-1 vaccine. This issue was addressed by a comprehensive analysis of over 58,000 clade B HIV-1 protein sequences reported over at least 26 years. The sequences were aligned and the 2,874 overlapping nonamer amino acid positions of the viral proteome, each a possible core binding domain for human leukocyte antigen molecules and T-cell receptors, were quantitatively analyzed for four patterns of sequence motifs: (1) "index", the most prevalent sequence; (2) "major" variant, the most common variant sequence; (3) "minor" variants, multiple different sequences, each with an incidence less than that of the major variant; and (4) "unique" variants, each observed only once in the alignment. The collective incidence of the major, minor, and unique variants at each nonamer position represented the total variant population for the position. Positions with more than 50% total variants contained correspondingly reduced incidences of index and major variant sequences and increased minor and unique variants. Highly diverse positions, with 80 to 98% variant nonamer sequences, were present in each protein, including 5% of Gag, and 27% of Env and Nef, each. The multitude of different variant nonamer sequences (i.e., nonatypes; up to 68%) at the highly diverse positions, represented by the major, multiple minor, and multiple unique variants, likely supported the function of variants both in immune escape and as altered peptide ligands with deleterious T-cell responses. The patterns of mutational change were consistent with the sequences of individual HXB2 and C1P viruses and can be considered applicable to all HIV-1 viruses. This characterization of HIV-1 protein mutation provides a foundation for the design of peptide-based vaccines and therapeutics.
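The four motif classes described in this abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `classify_position`, the returned keys, and the toy nonamers are hypothetical, and two handling rules are assumptions (a variant seen only once is always counted as unique, and ties in incidence are broken alphabetically).

```python
from collections import Counter

def classify_position(nonamers):
    """Classify the nonamers observed at one aligned position.

    Assumptions (not from the source): a variant seen once is 'unique'
    even when no other variant is more common; ties break alphabetically.
    """
    counts = Counter(nonamers)
    total = sum(counts.values())
    # Rank by descending count, then alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    index_seq, index_n = ranked[0]           # most prevalent sequence
    variants = ranked[1:]                    # everything that is not the index
    unique = [(s, n) for s, n in variants if n == 1]
    repeated = [(s, n) for s, n in variants if n > 1]
    major = repeated[0] if repeated else None  # most common variant
    minor = repeated[1:]                       # remaining repeated variants
    return {
        "index": (index_seq, index_n / total),
        "major": (major[0], major[1] / total) if major else None,
        "minor_incidence": sum(n for _, n in minor) / total,
        "unique_incidence": len(unique) / total,
        "total_variant_incidence": (total - index_n) / total,
    }

# Toy position with ten sequences (illustrative data only).
toy = (["AAAAAAAAA"] * 4 + ["AAAAAAAAC"] * 3
       + ["AAAAAAAAD"] * 2 + ["AAAAAAAAE"])
print(classify_position(toy))
```

Applied to each of the overlapping nonamer positions of an alignment, a function of this kind would give the per-position incidences that the abstract summarizes.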
Affiliation(s)
- Yongli Hu
- Perdana University Graduate School of Medicine, Selangor Darul Ehsan, Malaysia
|
16
|
Yang CW. A comparative study of short linear motif compositions of the influenza A virus ribonucleoproteins. PLoS One 2012; 7:e38637. [PMID: 22715401 PMCID: PMC3371030 DOI: 10.1371/journal.pone.0038637] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 02/09/2012] [Accepted: 05/08/2012] [Indexed: 11/19/2022] Open
Abstract
Protein-protein interactions through short linear motifs (SLiMs) are an emerging concept that is different from interactions between globular domains. The SLiMs encode a functional interaction interface in a short (three to ten residues) poorly conserved sequence. This characteristic makes them much more likely to arise/disappear spontaneously via mutations, and they may be more evolutionarily labile than globular domains. The diversity of SLiM composition may provide functional diversity for a viral protein from different viral strains. This study is designed to determine the different SLiM compositions of ribonucleoproteins (RNPs) from influenza A viruses (IAVs) from different hosts and with different levels of virulence. The 96 consensus sequences (regular expressions) of SLiMs from the ELM server were used to conduct a comprehensive analysis of the 52,513 IAV RNP sequences. The SLiM compositions of RNPs from IAVs from different hosts and with different levels of virulence were compared. The SLiM compositions of 845 RNPs from highly virulent/pandemic IAVs were also analyzed. In total, 292 highly conserved SLiMs were found in RNPs regardless of the IAV host range. These SLiMs may be basic motifs that are essential for the normal functions of RNPs. Moreover, several SLiMs that are rare in seasonal IAV RNPs but are present in RNPs from highly virulent/pandemic IAVs were identified. The SLiMs identified in this study provide a useful resource for experimental virologists to study the interactions between IAV RNPs and host intracellular proteins. Moreover, the SLiM compositions of IAV RNPs also provide insights into signal transduction pathways and protein interaction networks with which IAV RNPs might be involved. Information about SLiMs might be useful for the development of anti-IAV drugs.
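Scanning sequences with motif regular expressions, as this abstract describes, can be sketched in a few lines of Python. The sketch below is only an illustration: the two patterns in `TOY_MOTIFS` are simplified stand-ins, not entries taken from the ELM server, and the function name `motif_composition` and the toy sequences are hypothetical.

```python
import re

# Simplified stand-in patterns (assumption: NOT actual ELM entries).
TOY_MOTIFS = {
    "PxxP": re.compile(r"P..P"),
    "RxL": re.compile(r"R.L"),
}

def motif_composition(sequences, motifs):
    """Return the fraction of sequences containing each motif at least once."""
    n = len(sequences)
    return {
        name: sum(1 for seq in sequences if rx.search(seq)) / n
        for name, rx in motifs.items()
    }

# Toy protein fragments (illustrative data only).
toy_sequences = ["MAPKLPRT", "MRALPPGP", "MKKKKKKK", "APSTPRQL"]
print(motif_composition(toy_sequences, TOY_MOTIFS))
```

Comparing such fractions between groups of sequences (for example, by host or by virulence) is one simple way to express the differences in SLiM composition that the study reports.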
Affiliation(s)
- Chu-Wen Yang
- Department of Microbiology, Soochow University, Shih-Lin, Taipei, Taiwan, Republic of China.
|
17
|
West Nile virus T-cell ligand sequences shared with other flaviviruses: a multitude of variant sequences as potential altered peptide ligands. J Virol 2012; 86:7616-24. [PMID: 22573867 DOI: 10.1128/jvi.00166-12] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 12/25/2022] Open
Abstract
Phylogenetic relatedness and cocirculation of several major human pathogen flaviviruses are recognized as a possible cause of deleterious immune responses to mixed infection or immunization and call for a greater understanding of the inter-Flavivirus protein homologies. This study focused on the identification of human leukocyte antigen (HLA)-restricted West Nile virus (WNV) T-cell ligands and characterization of their distribution in reported sequence data of WNV and other flaviviruses. H-2-deficient mice transgenic for either A2, A24, B7, DR2, DR3, or DR4 HLA alleles were immunized with overlapping peptides of the WNV proteome, and peptide-specific T-cell activation was measured by gamma interferon (IFN-γ) enzyme-linked immunosorbent spot (ELISpot) assays. Approximately 30% (137) of the WNV proteome peptides were identified as HLA-restricted T-cell ligands. The majority of these ligands were conserved in ∼≥88% of analyzed WNV sequences. Notably, only 51 were WNV specific, and the remaining 86, chiefly of E, NS3, and NS5, shared an identity of nine or more consecutive amino acids with sequences of 64 other flaviviruses, including several major human pathogens. Many of the shared ligands had an incidence of >50% in the analyzed sequences of one or more of six major flaviviruses. The multitude of WNV sequences shared with other flaviviruses as interspecies variants highlights the possible hazard of defective T-cell activation by altered peptide ligands in the event of dual exposure to WNV and other flaviviruses, by either infection or immunization. The data suggest the possible preferred use of sequences that are pathogen specific with minimum interspecies sequence homology for the design of Flavivirus vaccines.
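The criterion of "nine or more consecutive amino acids" shared between a ligand and other viral sequences can be checked with a simple k-mer set intersection. The Python sketch below is an illustration under stated assumptions, not the study's procedure: the function names `kmers` and `viruses_sharing_kmer`, the virus labels, and the toy sequences are all hypothetical.

```python
def kmers(seq, k=9):
    """Return the set of all overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def viruses_sharing_kmer(ligand, other_sequences, k=9):
    """List the labels whose sequence shares at least one k-mer with the ligand."""
    ligand_kmers = kmers(ligand, k)
    return [
        name for name, seq in other_sequences.items()
        if ligand_kmers & kmers(seq, k)
    ]

# Toy data (illustrative only): virus_A carries an identical 9-mer,
# virus_B differs at a residue present in every 9-mer of the ligand.
ligand = "ACDEFGHIKLMN"
others = {
    "virus_A": "QQACDEFGHIKQQ",
    "virus_B": "ACDEFGHIQLMN",
}
print(viruses_sharing_kmer(ligand, others))
```

A ligand for which this list is empty would correspond to the pathogen-specific sequences that the abstract suggests preferring for vaccine design.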
|
18
|
Olsen LR, Zhang GL, Keskin DB, Reinherz EL, Brusic V. Conservation analysis of dengue virus T-cell epitope-based vaccine candidates using Peptide block entropy. Front Immunol 2011; 2:69. [PMID: 22566858 PMCID: PMC3341948 DOI: 10.3389/fimmu.2011.00069] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 08/16/2011] [Accepted: 11/14/2011] [Indexed: 01/02/2023] Open
Abstract
Broad coverage of the pathogen population is particularly important when designing CD8+ T-cell epitope vaccines against viral pathogens. Traditional approaches are based on combinations of highly conserved T-cell epitopes. Peptide block entropy analysis is a novel approach for assembling sets of broadly covering antigens. Since T-cell epitopes are recognized as peptides rather than individual residues, this method is based on calculating the information content of blocks of peptides from a multiple sequence alignment of homologous proteins rather than using the information content of individual residues. The block entropy analysis provides broad coverage of variant antigens. We applied the block entropy analysis method to the proteomes of the four serotypes of dengue virus (DENV) and found 1,551 blocks of 9-mer peptides, which cover 99% of available sequences with five or fewer unique peptides. In contrast, the benchmark study by Khan et al. (2008) resulted in 165 conserved 9-mer peptides. Many of the conserved blocks are located consecutively in the proteins. Connecting these blocks resulted in 78 conserved regions. Of the 1,551 blocks of 9-mer peptides, 110 comprised predicted HLA binder sets. In total, 457 subunit peptides encompass the diversity of all sequenced DENV strains, of which 333 are T-cell epitope candidates.
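The idea of scoring a block of peptides, rather than single residues, can be sketched as follows. This is a simplified reading of the abstract and not the published implementation: the function name `block_stats`, its parameters, and the toy alignment are hypothetical, gaps are assumed to be absent, and entropy is taken as the Shannon entropy (in bits) of the peptides observed in the block.

```python
import math
from collections import Counter

def block_stats(aligned, start, k=9, coverage=0.99):
    """Return (entropy in bits, peptides needed to reach the coverage).

    Assumptions: all sequences are aligned, of equal length, without gaps.
    """
    peptides = [seq[start:start + k] for seq in aligned]
    counts = Counter(peptides)
    n = len(peptides)
    # Shannon entropy over whole peptides in the block.
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values())
    # Add peptides from most to least frequent until coverage is reached.
    covered = 0
    needed = 0
    for _, c in counts.most_common():
        if covered / n >= coverage:
            break
        covered += c
        needed += 1
    return entropy, needed

# Toy alignment with 3-mer blocks (illustrative data only).
toy_alignment = ["ABCD", "ABCD", "ABCE", "ABDE"]
print(block_stats(toy_alignment, 0, k=3, coverage=0.75))
print(block_stats(toy_alignment, 1, k=3, coverage=0.99))
```

Under this reading, a block would qualify in the sense of the abstract when `needed` is five or fewer at 99% coverage with `k=9`.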
Affiliation(s)
- Lars Rønn Olsen
- Cancer Vaccine Center, Dana-Farber Cancer Institute Boston, MA, USA
|
19
|
Tong JC, Ng LFP. Understanding infectious agents from an in silico perspective. Drug Discov Today 2010; 16:42-9. [PMID: 20974283 PMCID: PMC7185741 DOI: 10.1016/j.drudis.2010.10.007] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 06/21/2010] [Revised: 10/01/2010] [Accepted: 10/18/2010] [Indexed: 12/31/2022]
Abstract
Knowledge of infectious diseases now emerging from genomic, proteomic, epidemiological and clinical data can provide insights into the mechanisms of immune function, disease pathogenesis and epidemiology. Here, we describe how considerable advances in computational methods of data mining, mathematical modeling in epidemiology and simulation have been used to enhance our understanding of infectious agents, and we discuss their impact on the discovery of new therapeutics and on controlling the spread of these agents.
Affiliation(s)
- Joo Chuan Tong
- Data Mining Department, Institute for Infocomm Research, 1 Fusionopolis Way, 21-01 Connexis South Tower, Singapore 138632, Singapore.
|
20
|
Larsen MV, Lelic A, Parsons R, Nielsen M, Hoof I, Lamberth K, Loeb MB, Buus S, Bramson J, Lund O. Identification of CD8+ T cell epitopes in the West Nile virus polyprotein by reverse-immunology using NetCTL. PLoS One 2010; 5:e12697. [PMID: 20856867 PMCID: PMC2939062 DOI: 10.1371/journal.pone.0012697] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.4] [Received: 06/18/2010] [Accepted: 08/21/2010] [Indexed: 11/19/2022] Open
Abstract
Background West Nile virus (WNV) is a growing threat to public health and a greater understanding of the immune response raised against WNV is important for the development of prophylactic and therapeutic strategies. Methodology/Principal Findings In a reverse-immunology approach, we used bioinformatics methods to predict WNV-specific CD8+ T cell epitopes and selected a set of peptides that constitutes maximum coverage of 20 fully sequenced WNV strains. We then tested these putative epitopes for cellular reactivity in a cohort of WNV-infected patients. We identified 26 new CD8+ T cell epitopes, which we propose are restricted by 11 different HLA class I alleles. Aiming for optimal coverage of human populations, we suggest that 11 of these new WNV epitopes would be sufficient to cover from 48% to 93% of ethnic populations in various areas of the world. Conclusions/Significance The 26 identified CD8+ T cell epitopes contribute to our knowledge of the immune response against WNV infection and greatly extend the list of known WNV CD8+ T cell epitopes. A polytope incorporating these and other epitopes could possibly serve as the basis for a WNV vaccine.
Affiliation(s)
- Mette Voldby Larsen
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark.
|