1
Trevizani R, Yan Z, Greenbaum JA, Sette A, Nielsen M, Peters B. A comprehensive analysis of the IEDB MHC class-I automated benchmark. Brief Bioinform 2022; 23:6632617. [PMID: 35794711 DOI: 10.1093/bib/bbac259] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2022] [Revised: 05/27/2022] [Accepted: 06/05/2022] [Indexed: 11/12/2022] Open
Abstract
In 2014, the Immune Epitope Database automated benchmark was created to compare the performance of the MHC class I binding predictors. However, this is not a straightforward process due to the different and non-standardized outputs of the methods. Additionally, some methods are more restrictive regarding the HLA alleles and epitope sizes for which they predict binding affinities, while others are more comprehensive. To address how these problems impacted the ranking of the predictors, we developed an approach to assess the reliability of different metrics. We found that using percentile-ranked results improved the stability of the ranks and allowed the predictors to be reliably ranked despite not being evaluated on the same data. We also found that given the rate new data are incorporated into the benchmark, a new method must wait for at least 4 years to be ranked against the pre-existing methods. The best-performing tools with statistically indistinguishable scores in this benchmark were NetMHCcons, NetMHCpan4.0, ANN3.4, NetMHCpan3.0 and NetMHCpan2.8. The results of this study will be used to improve the evaluation and display of benchmark performance. We highly encourage anyone working on MHC binding predictions to participate in this benchmark to get an unbiased evaluation of their predictors.
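The percentile-ranking idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes each predictor's raw output is compared against that predictor's own background distribution of scores (for example, scores for random peptides); the function name `percentile_rank`, the `lower_is_better` flag, and the toy numbers are hypothetical and are not taken from the benchmark's implementation.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, background, lower_is_better=True):
    """Percentage of background scores that are as good as or better than `score`.

    A small value means the score is among the best in the background,
    regardless of the raw scale the predictor happens to use.
    """
    ordered = sorted(background)
    if lower_is_better:
        # e.g. IC50-like outputs: count background values <= score
        n_as_good = bisect_right(ordered, score)
    else:
        # e.g. probability-like outputs: count background values >= score
        n_as_good = len(ordered) - bisect_left(ordered, score)
    return 100.0 * n_as_good / len(ordered)


# Two hypothetical predictors with incompatible raw scales
ic50_like = percentile_rank(500, [50, 500, 5000, 20000])
prob_like = percentile_rank(0.6, [0.1, 0.2, 0.6, 0.9], lower_is_better=False)
print(ic50_like, prob_like)  # → 50.0 50.0
```

Both toy predictors land on the same percentile even though one reports an affinity-like number and the other a 0 to 1 score, which is the kind of normalization that makes non-standardized outputs comparable.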
Affiliation(s)
- Raphael Trevizani
  - Division of Vaccine Discovery, La Jolla Institute for Immunology, La Jolla, California 92037, USA; Fiocruz Ceará, Fundação Oswaldo Cruz, Rua São José s/n, Precabura, Eusébio/CE, Brazil
- Zhen Yan
  - Bioinformatics Core, La Jolla Institute for Immunology, La Jolla, California 92037, USA
- Jason A Greenbaum
  - Bioinformatics Core, La Jolla Institute for Immunology, La Jolla, California 92037, USA
- Alessandro Sette
  - Division of Vaccine Discovery, La Jolla Institute for Immunology, La Jolla, California 92037, USA; Department of Medicine, University of California San Diego, La Jolla, California 92093, USA
- Morten Nielsen
  - Department of Health Technology, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark; Instituto de Investigaciones Biotecnológicas, Universidad Nacional de San Martín, B1650 Buenos Aires, Argentina
- Bjoern Peters
  - Division of Vaccine Discovery, La Jolla Institute for Immunology, La Jolla, California 92037, USA; Department of Medicine, University of California San Diego, La Jolla, California 92093, USA
2
Boßelmann CM, Hedrich UB, Müller P, Sonnenberg L, Parthasarathy S, Helbig I, Lerche H, Pfeifer N. Predicting the functional effects of voltage-gated potassium channel missense variants with multi-task learning. EBioMedicine 2022; 81:104115. [PMID: 35759918 PMCID: PMC9250003 DOI: 10.1016/j.ebiom.2022.104115] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 12/16/2021] [Revised: 05/30/2022] [Accepted: 05/31/2022] [Indexed: 01/18/2023] Open
Abstract
BACKGROUND Variants in genes encoding voltage-gated potassium channels are associated with a broad spectrum of neurological diseases including epilepsy, ataxia, and intellectual disability. Knowledge of the resulting functional changes, characterized as overall ion channel gain- or loss-of-function, is essential to guide clinical management including precision medicine therapies. However, for an increasing number of variants, little to no experimental data is available. New tools are needed to evaluate variant functional effects. METHODS We catalogued a comprehensive dataset of 959 functional experiments across 19 voltage-gated potassium channels, leveraging data from 782 unique disease-associated and synthetic variants. We used these data to train a taxonomy-based multi-task learning support vector machine (MTL-SVM), and compared performance to several baseline methods. FINDINGS MTL-SVM maintains channel family structure during model training, improving overall predictive performance (mean balanced accuracy 0·718 ± 0·041, AU-ROC 0·761 ± 0·063) over baseline (mean balanced accuracy 0·620 ± 0·045, AU-ROC 0·711 ± 0·022). We can obtain meaningful predictions even for channels with few known variants (KCNC1, KCNQ5). INTERPRETATION Our model enables functional variant prediction for voltage-gated potassium channels. It may assist in tailoring current and future precision therapies for the increasing number of patients with ion channel disorders. FUNDING This work was supported by intramural funding of the Medical Faculty, University of Tuebingen (PATE F.1315137.1), the Federal Ministry for Education and Research (Treat-ION, 01GM1907A/B/G/H) and the German Research Foundation (FOR-2715, Le1030/16-2, He8155/1-2).
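The balanced accuracy figures quoted in the abstract can be read with the standard definition in mind: the mean of per-class recall. The sketch below is a plain-Python illustration of that definition only; the labels `"GOF"` and `"LOF"` and the toy data are assumptions for demonstration and do not reproduce the study's data or pipeline.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    recalls = []
    for label in sorted(set(y_true)):
        # Positions where the true class is `label`
        positions = [i for i, t in enumerate(y_true) if t == label]
        correct = sum(1 for i in positions if y_pred[i] == label)
        recalls.append(correct / len(positions))
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced set: 2 gain-of-function, 8 loss-of-function variants
y_true = ["GOF"] * 2 + ["LOF"] * 8
y_pred = ["LOF"] * 10  # a trivial classifier that always predicts the majority class

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy, balanced_accuracy(y_true, y_pred))  # → 0.8 0.5
```

The majority-class classifier reaches 0.8 plain accuracy but only 0.5 balanced accuracy, which is why balanced accuracy is a common choice when the classes are uneven.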
Affiliation(s)
- Christian Malte Boßelmann
  - Department of Neurology and Epileptology, Hertie Institute for Clinical Brain Research, University of Tuebingen, Hoppe-Seyler-Str. 3, D-72076 Tuebingen, Germany; Methods in Medical Informatics, Department of Computer Science, University of Tuebingen, Sand 14, D-72076 Tuebingen, Germany
- Ulrike B.S. Hedrich
  - Department of Neurology and Epileptology, Hertie Institute for Clinical Brain Research, University of Tuebingen, Hoppe-Seyler-Str. 3, D-72076 Tuebingen, Germany
- Peter Müller
  - Department of Neurology and Epileptology, Hertie Institute for Clinical Brain Research, University of Tuebingen, Hoppe-Seyler-Str. 3, D-72076 Tuebingen, Germany
- Lukas Sonnenberg
  - Institute for Neurobiology, University of Tuebingen, Tuebingen, Germany
- Shridhar Parthasarathy
  - Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA; The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA; Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Ingo Helbig
  - Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA; The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA, USA; Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA, USA; Department of Neurology, University of Pennsylvania, Philadelphia, PA, USA
- Holger Lerche
  - Department of Neurology and Epileptology, Hertie Institute for Clinical Brain Research, University of Tuebingen, Hoppe-Seyler-Str. 3, D-72076 Tuebingen, Germany; Corresponding authors.
- Nico Pfeifer
  - Methods in Medical Informatics, Department of Computer Science, University of Tuebingen, Sand 14, D-72076 Tuebingen, Germany; Interfaculty Institute for Biomedical Informatics (IBMI), University of Tuebingen, Tuebingen, Germany; Faculty of Medicine, University of Tuebingen, Tuebingen, Germany; German Center for Infection Research, Partner Site Tuebingen, Tuebingen, Germany; Corresponding authors.
3
Khetan R, Curtis R, Deane CM, Hadsund JT, Kar U, Krawczyk K, Kuroda D, Robinson SA, Sormanni P, Tsumoto K, Warwicker J, Martin ACR. Current advances in biopharmaceutical informatics: guidelines, impact and challenges in the computational developability assessment of antibody therapeutics. MAbs 2022; 14:2020082. [PMID: 35104168 PMCID: PMC8812776 DOI: 10.1080/19420862.2021.2020082] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Indexed: 01/03/2023] Open
Abstract
Therapeutic monoclonal antibodies and their derivatives are key components of clinical pipelines in the global biopharmaceutical industry. The availability of large datasets of antibody sequences, structures, and biophysical properties is increasingly enabling the development of predictive models and computational tools for the "developability assessment" of antibody drug candidates. Here, we provide an overview of the antibody informatics tools applicable to the prediction of developability issues such as stability, aggregation, immunogenicity, and chemical degradation. We further evaluate the opportunities and challenges of using biopharmaceutical informatics for drug discovery and optimization. Finally, we discuss the potential of developability guidelines based on in silico metrics that can be used for the assessment of antibody stability and manufacturability.
Affiliation(s)
- Rahul Khetan
  - Manchester Institute of Biotechnology, University of Manchester, Manchester, UK
- Robin Curtis
  - Manchester Institute of Biotechnology, University of Manchester, Manchester, UK
- Uddipan Kar
  - Department of Biological Engineering, Massachusetts Institute of Technology (MIT), Cambridge, MA, USA
- Daisuke Kuroda
  - Department of Bioengineering, School of Engineering, The University of Tokyo, Tokyo, Japan; Medical Device Development and Regulation Research Center, School of Engineering, The University of Tokyo, Tokyo, Japan; Department of Chemistry and Biotechnology, School of Engineering, The University of Tokyo, Tokyo, Japan
- Pietro Sormanni
  - Chemistry of Health, Yusuf Hamied Department of Chemistry, University of Cambridge
- Kouhei Tsumoto
  - Department of Bioengineering, School of Engineering, The University of Tokyo, Tokyo, Japan; Medical Device Development and Regulation Research Center, School of Engineering, The University of Tokyo, Tokyo, Japan; Department of Chemistry and Biotechnology, School of Engineering, The University of Tokyo, Tokyo, Japan; The Institute of Medical Science, The University of Tokyo, Tokyo, Japan
- Jim Warwicker
  - Manchester Institute of Biotechnology, University of Manchester, Manchester, UK
- Andrew C R Martin
  - Institute of Structural and Molecular Biology, Division of Biosciences, University College London, London, UK
4
McCaffrey P. Bioinformatic Techniques for Vaccine Development: Epitope Prediction and Structural Vaccinology. Methods Mol Biol 2022; 2412:413-423. [PMID: 34918258 DOI: 10.1007/978-1-0716-1892-9_21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Structural vaccinology involves characterizing the interactions between an antigen and antibodies or host immune receptors. Central to this is the task of epitope prediction, which involves describing the binding affinity and interactions of a given peptide, typically with the major histocompatibility complex in the case of T-cells or with antibodies in the case of B-cells. Several computational models exist for this purpose, which we review here. Generally, epitope predictions for MHC-I and MHC-II are substantially different tasks, as are epitope predictions for continuous versus discontinuous B-cell epitopes. Overall, these models suffer from overprediction of epitopes, although general themes support both the use of neural networks and the incorporation of more abundant and more varied experimental annotation into model training as valuable in improving predictive performance.
Affiliation(s)
- Peter McCaffrey
  - Department of Pathology, University of Texas Medical Branch, Galveston, TX, USA.
5
Di D, Nunes JM, Jiang W, Sanchez-Mazas A. Like Wings of a Bird: Functional Divergence and Complementarity between HLA-A and HLA-B Molecules. Mol Biol Evol 2021; 38:1580-1594. [PMID: 33320202 PMCID: PMC8355449 DOI: 10.1093/molbev/msaa325] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/14/2022] Open
Abstract
Human leukocyte antigen (HLA) genes are among the most polymorphic of our genome, as a likely consequence of balancing selection related to their central role in adaptive immunity. HLA-A and HLA-B genes were recently suggested to evolve through a model of joint divergent asymmetric selection conferring all human populations, including those with severe loss of diversity, an equivalent immune potential. However, the mechanisms by which these two genes might undergo joint evolution while displaying very distinct allelic profiles in populations are still unknown. To address this issue, we carried out extensive data analyses (among which factorial correspondence analysis and linear modeling) on 2,909 common and rare HLA-A, HLA-B, and HLA-C alleles and 200,000 simulated pathogenic peptides by taking into account sequence variation, predicted peptide-binding affinity and HLA allele frequencies in 123 populations worldwide. Our results show that HLA-A and HLA-B (but not HLA-C) molecules maintain considerable functional divergence in almost all populations, which likely plays an instrumental role in their immune defense. We also provide robust evidence of functional complementarity between HLA-A and HLA-B molecules, which display asymmetric relationships in terms of amino acid diversity at both inter- and intraprotein levels and in terms of promiscuous or fastidious peptide-binding specificities. Like two wings of a flying bird, the functional complementarity of HLA-A and HLA-B is a perfect example, in our genome, of duplicated genes sharing their capacity of assuming common vital functions while being submitted to complex and sometimes distinct environmental pressures.
Affiliation(s)
- Da Di
  - Laboratory of Anthropology, Genetics and Peopling History (AGP Lab), Department of Genetics and Evolution-Anthropology Unit, University of Geneva, Geneva, Switzerland
- Jose Manuel Nunes
  - Laboratory of Anthropology, Genetics and Peopling History (AGP Lab), Department of Genetics and Evolution-Anthropology Unit, University of Geneva, Geneva, Switzerland; Institute of Genetics and Genomics in Geneva (IGE3), University of Geneva Medical Centre (CMU), Geneva, Switzerland
- Wei Jiang
  - Department of Plant Sciences, University of Cambridge, Cambridge, United Kingdom
- Alicia Sanchez-Mazas
  - Laboratory of Anthropology, Genetics and Peopling History (AGP Lab), Department of Genetics and Evolution-Anthropology Unit, University of Geneva, Geneva, Switzerland; Institute of Genetics and Genomics in Geneva (IGE3), University of Geneva Medical Centre (CMU), Geneva, Switzerland
6
Sadri Najafabadi Z, Nazarian S, Kargar M, Kafilzadeh F. Designing of a chimeric protein contains StxB, intimin and EscC against toxicity and adherence of enterohemorrhagic Escherichia coli O157:H7 and evaluation of serum antibody titers against it. Mol Immunol 2021; 134:218-227. [PMID: 33823320 DOI: 10.1016/j.molimm.2021.03.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2020] [Revised: 02/08/2021] [Accepted: 03/15/2021] [Indexed: 10/21/2022]
Abstract
Enterohemorrhagic Escherichia coli (EHEC) O157:H7 strain is known as one of the major human foodborne pathogens. Lack of effective clinical treatment for human diarrheal diseases confirms the need for vaccine production against enteric bacteria such as E. coli O157:H7. Shiga-like toxin (Stx), EscC, and Intimin are the main virulence factors of this enteric pathogen. In the present study, a comparative omics analysis was conducted to identify the most invasive EHEC antigenic factors as potential immunogens. An SEI (Stx-EscC-Intimin) trivalent chimeric protein was designed from the exposed and epitope-rich parts of these virulence factors. Sequence optimization, physicochemical properties, mRNA folding, three-dimensional structure and immunoinformatics data were investigated. The chimeric gene was synthesized with the codon bias of E. coli. Recombinant protein was expressed and confirmed by western blot analysis. To evaluate the immunogenicity of the designed protein, the protein was administered to BALB/c mice and the serum IgG was determined by ELISA. Based on the Ramachandran plot, the validation data showed that 90.1% of residues lie in the favored region. The high antigenicity of the multimeric protein was predicted by the immunoinformatic analysis. Epitope prediction showed the proper distribution of linear and conformational B-cell epitopes and the competition of T-cell epitopes to bind MHC molecules. Recombinant ESI Protein with 74.5 kDa was expressed in E. coli. Western blot analysis with an anti-Stx antibody confirmed a single band of chimeric protein. Consequently, the chimeric gene was designed and constructed after assessments. From the in silico approach, the protein deduced from this cassette can be an immunogen candidate and act against toxicity and adherence of EHEC.
Affiliation(s)
- Shahram Nazarian
  - Department of Biological Sciences, Faculty of Science, Imam Hossein University, Tehran, Iran.
- Mohammad Kargar
  - Department of Microbiology, Jahrom Branch, Islamic Azad University, Jahrom, Iran
- Farshid Kafilzadeh
  - Department of Biology, Jahrom Branch, Islamic Azad University, Jahrom, Iran
7
Abstract
Immunoinformatics is a discipline that applies methods of computer science to study and model the immune system. A fundamental question addressed by immunoinformatics is how to understand the rules of antigen presentation by MHC molecules to T cells, a process that is central to adaptive immune responses to infections and cancer. In the modern era of personalized medicine, the ability to model and predict which antigens can be presented by MHC is key to manipulating the immune system and designing strategies for therapeutic intervention. Since the MHC is both polygenic and extremely polymorphic, each individual possesses a personalized set of MHC molecules with different peptide-binding specificities, and collectively they present a unique individualized peptide imprint of the ongoing protein metabolism. Mapping all MHC allotypes is an enormous undertaking that cannot be achieved without a strong bioinformatics component. Computational tools for the prediction of peptide-MHC binding have thus become essential in most pipelines for T cell epitope discovery and an inescapable component of vaccine and cancer research. Here, we describe the development of several such tools, from pioneering efforts to the current state-of-the-art methods, that have allowed for accurate predictions of peptide binding of all MHC molecules, even including those that have not yet been characterized experimentally.
Affiliation(s)
- Morten Nielsen
  - Department of Health Technology, Technical University of Denmark, DK-2800 Kongens Lyngby, Denmark
  - Instituto de Investigaciones Biotecnológicas, Universidad Nacional de San Martín, CP 1650 San Martin, Buenos Aires, Argentina
- Massimo Andreatta
  - Instituto de Investigaciones Biotecnológicas, Universidad Nacional de San Martín, CP 1650 San Martin, Buenos Aires, Argentina
- Bjoern Peters
  - Division of Vaccine Discovery, La Jolla Institute for Immunology, La Jolla, California 92037, USA
  - Department of Medicine, University of California, San Diego, La Jolla, California 92093, USA
- Søren Buus
  - Department of Immunology and Microbiology, Faculty of Health Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark
8
Abstract
Throughout the body, T cells monitor MHC-bound ligands expressed on the surface of essentially all cell types. MHC ligands that trigger a T cell immune response are referred to as T cell epitopes. Identifying such epitopes enables tracking, phenotyping, and stimulating T cells involved in immune responses in infectious disease, allergy, autoimmunity, transplantation, and cancer. The specific T cell epitopes recognized in an individual are determined by genetic factors such as the MHC molecules the individual expresses, in parallel to the individual's environmental exposure history. The complexity and importance of T cell epitope mapping have motivated the development of computational approaches that predict what T cell epitopes are likely to be recognized in a given individual or in a broader population. Such predictions guide experimental epitope mapping studies and enable computational analysis of the immunogenic potential of a given protein sequence region.
Affiliation(s)
- Bjoern Peters
  - Division of Vaccine Discovery, La Jolla Institute for Immunology, La Jolla, California 92037, USA
  - Department of Medicine, University of California San Diego, La Jolla, California 92093, USA
- Morten Nielsen
  - Department of Health Technology, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
  - Instituto de Investigaciones Biotecnológicas, Universidad Nacional de San Martín, B1650 Buenos Aires, Argentina
- Alessandro Sette
  - Division of Vaccine Discovery, La Jolla Institute for Immunology, La Jolla, California 92037, USA
  - Department of Medicine, University of California San Diego, La Jolla, California 92093, USA
9
Bugembe DL, Ekii AO, Ndembi N, Serwanga J, Kaleebu P, Pala P. Computational MHC-I epitope predictor identifies 95% of experimentally mapped HIV-1 clade A and D epitopes in a Ugandan cohort. BMC Infect Dis 2020; 20:172. [PMID: 32087680 PMCID: PMC7036183 DOI: 10.1186/s12879-020-4876-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/28/2019] [Accepted: 02/12/2020] [Indexed: 12/21/2022] Open
Abstract
Background Identifying immunogens that induce HIV-1-specific immune responses is a lengthy process that can benefit from computational methods, which predict T-cell epitopes for various HLA types. Methods We tested the performance of the NetMHCpan4.0 computational neural network in re-identifying 93 T-cell epitopes that had been previously independently mapped using the whole proteome IFN-γ ELISPOT assays in 6 HLA class I typed Ugandan individuals infected with HIV-1 subtypes A1 and D. To provide a benchmark we compared the predictions for NetMHCpan4.0 to MHCflurry1.2.0 and NetCTL1.2. Results NetMHCpan4.0 performed best correctly predicting 88 of the 93 experimentally mapped epitopes for a set length of 9-mer and matched HLA class I alleles. Receiver Operator Characteristic (ROC) analysis gave an area under the curve (AUC) of 0.928. Setting NetMHCpan4.0 to predict 11-14mer length did not improve the prediction (37–79 of 93 peptides) with an inverse correlation between the number of predictions and length set. Late time point peptides were significantly stronger binders than early peptides (Wilcoxon signed rank test: p = 0.0000005). MHCflurry1.2.0 similarly predicted all but 2 of the peptides that NetMHCpan4.0 predicted and NetCTL1.2 predicted only 14 of the 93 experimental peptides. Conclusion NetMHCpan4.0 class I epitope predictions covered 95% of the epitope responses identified in six HIV-1 infected individuals, and would have reduced the number of experimental confirmatory tests by > 80%. Algorithmic epitope prediction in conjunction with HLA allele frequency information can cost-effectively assist immunogen design through minimizing the experimental effort.
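The two headline numbers in this abstract (88 of 93 epitopes recovered, and a ROC AUC) can be related to their definitions with a short sketch. The recovery figure uses the counts stated above; the AUC function uses the standard pairwise interpretation (probability that a true epitope outscores a non-epitope, ties counted as one half). The score lists are made-up toy values, not data from the study, and the function names are hypothetical.

```python
def recovery_percent(n_recovered, n_total):
    """Share of experimentally mapped epitopes that the predictor re-identified."""
    return 100.0 * n_recovered / n_total


def roc_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos_scores) * len(neg_scores))


# Counts taken from the abstract: 88 of 93 mapped epitopes predicted
print(round(recovery_percent(88, 93)))  # → 95

# Toy scores only, to show the mechanics of the AUC calculation
toy_auc = roc_auc([0.9, 0.8, 0.4], [0.5, 0.3])
```

In the toy example five of the six positive/negative pairs are ordered correctly, so the AUC is 5/6; the quadratic pairwise loop is chosen for clarity rather than speed.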
Affiliation(s)
- Daniel Lule Bugembe
  - MRC/UVRI and LSHTM Uganda Research Unit, P. O. Box 49, Plot 51-59 Nakiwogo Road, Entebbe, Uganda.
- Andrew Obuku Ekii
  - MRC/UVRI and LSHTM Uganda Research Unit, P. O. Box 49, Plot 51-59 Nakiwogo Road, Entebbe, Uganda
- Jennifer Serwanga
  - MRC/UVRI and LSHTM Uganda Research Unit, P. O. Box 49, Plot 51-59 Nakiwogo Road, Entebbe, Uganda; Uganda Virus Research Institute, Entebbe, Uganda
- Pontiano Kaleebu
  - MRC/UVRI and LSHTM Uganda Research Unit, P. O. Box 49, Plot 51-59 Nakiwogo Road, Entebbe, Uganda; Uganda Virus Research Institute, Entebbe, Uganda
- Pietro Pala
  - MRC/UVRI and LSHTM Uganda Research Unit, P. O. Box 49, Plot 51-59 Nakiwogo Road, Entebbe, Uganda
10
Lohia N, Baranwal M. An Immunoinformatics Approach in Design of Synthetic Peptide Vaccine Against Influenza Virus. Methods Mol Biol 2020; 2131:229-243. [PMID: 32162257 DOI: 10.1007/978-1-0716-0389-5_11] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 06/10/2023]
Abstract
Peptide-based vaccines are an appealing strategy that uses short synthetic peptides to engineer a highly targeted immune response. These short synthetic peptides contain potential T- and B-cell epitopes. Experimental approaches to identifying these epitopes are time-consuming and expensive; hence the immunoinformatics approach came into the picture. The immunoinformatics approach involves epitope prediction tools, molecular docking, and population coverage analysis in the design of desired immunogenic peptides. To overcome the antigenic variation of viruses, conserved regions are targeted to find the potential epitopes. The present chapter demonstrates the use of the immunoinformatics approach to select a potential peptide containing multiple T- (CD8+ and CD4+) and B-cell epitopes from Avian H3N2 M1 Protein. Further, molecular docking (to analyse HLA-peptide interaction) and population coverage analysis have been used to verify the potential of the peptide to be presented by polymorphic HLA molecules. The in silico approach to epitope prediction has proven to be a successful methodology for screening the putative epitopes among numerous possible vaccine targets in a given protein.
Affiliation(s)
- Neha Lohia
  - Department of Biotechnology, Thapar Institute of Engineering and Technology, Patiala, India.
  - School of Life Sciences, Jaipur National University, Jaipur, India.
- Manoj Baranwal
  - Department of Biotechnology, Thapar Institute of Engineering and Technology, Patiala, India
11
Liu Z, Cui Y, Xiong Z, Nasiri A, Zhang A, Hu J. DeepSeqPan, a novel deep convolutional neural network model for pan-specific class I HLA-peptide binding affinity prediction. Sci Rep 2019; 9:794. [PMID: 30692623 PMCID: PMC6349913 DOI: 10.1038/s41598-018-37214-1] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 07/26/2018] [Accepted: 12/04/2018] [Indexed: 11/09/2022] Open
Abstract
Interactions between human leukocyte antigens (HLAs) and peptides play a critical role in the human immune system. Accurate computational prediction of HLA-binding peptides can be used for peptide drug discovery. Currently, the best prediction algorithms are neural network-based pan-specific models, which take advantage of the large amount of data across HLA alleles. However, current pan-specific models are all based on the pseudo sequence encoding for modeling the binding context, which is based on 34 positions identified from the HLA protein-peptide bound structures in early works. In this work, we proposed a novel deep convolutional neural network model (DCNN) for HLA-peptide binding prediction, in which the encoding of the HLA sequence and the binding context are both learned by the network itself without requiring the HLA-peptide bound structure information. Our DCNN model is also characterized by its binding context extraction layer and dual outputs with both binding affinity output and binding probability outputs. Evaluation on public benchmark datasets shows that our DeepSeqPan model without HLA structural information in training achieves state-of-the-art performance on a large number of HLA alleles with good generalization capability. Since our model only needs raw sequences from the HLA-peptide binding pairs, it can be applied to binding predictions of HLAs without structure information and can also be applied to other protein binding problems such as protein-DNA and protein-RNA bindings. The implementation code and trained models are freely available at https://github.com/pcpLiu/DeepSeqPan .
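The abstract states that the model needs only raw sequences from the HLA-peptide pair. One common way to turn a raw amino acid sequence into network input is one-hot encoding with padding to a fixed length; the sketch below illustrates that general technique. It is an assumption for illustration and is not taken from the DeepSeqPan repository; the function name `one_hot`, the alphabet ordering, and the padding length are all hypothetical choices.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues; ordering is arbitrary


def one_hot(sequence, length):
    """Encode a sequence as `length` rows of 20 zeros/ones.

    Unknown residues and padding positions become all-zero rows.
    Sequences longer than `length` are truncated.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for residue in sequence[:length]:
        row = [0] * len(AMINO_ACIDS)
        if residue in index:
            row[index[residue]] = 1
        rows.append(row)
    # Pad short sequences so every input has the same shape
    while len(rows) < length:
        rows.append([0] * len(AMINO_ACIDS))
    return rows


# A hypothetical 9-mer peptide padded to 15 positions
encoded = one_hot("SIINFEKLM", 15)
print(len(encoded), len(encoded[0]))  # → 15 20
```

The same function could be applied to the peptide and to the HLA protein sequence separately, giving two fixed-shape matrices that a convolutional network can consume without any structure-derived pseudo sequence.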
Affiliation(s)
- Zhonghao Liu
  - Department of Computer Science and Engineering, University of South Carolina, 29201, Columbia, SC, United States
- Yuxin Cui
  - Department of Computer Science and Engineering, University of South Carolina, 29201, Columbia, SC, United States
- Zheng Xiong
  - Department of Computer Science and Engineering, University of South Carolina, 29201, Columbia, SC, United States
- Alierza Nasiri
  - Department of Computer Science and Engineering, University of South Carolina, 29201, Columbia, SC, United States
- Ansi Zhang
  - School of Mechanical Engineering, Guizhou University, 50033, Guiyang, Guizhou, China
- Jianjun Hu
  - Department of Computer Science and Engineering, University of South Carolina, 29201, Columbia, SC, United States.
  - School of Mechanical Engineering, Guizhou University, 50033, Guiyang, Guizhou, China.
12
Degoot AM, Chirove F, Ndifon W. Trans-Allelic Model for Prediction of Peptide:MHC-II Interactions. Front Immunol 2018; 9:1410. [PMID: 29988560 PMCID: PMC6026802 DOI: 10.3389/fimmu.2018.01410] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 02/06/2018] [Accepted: 06/06/2018] [Indexed: 12/30/2022] Open
Abstract
Major histocompatibility complex class two (MHC-II) molecules are trans-membrane proteins and key components of the cellular immune system. Upon recognition of foreign peptides expressed on the MHC-II binding groove, CD4+ T cells mount an immune response against invading pathogens. Therefore, mechanistic identification and knowledge of physicochemical features that govern interactions between peptides and MHC-II molecules is useful for the design of effective epitope-based vaccines, as well as for understanding of immune responses. In this article, we present a comprehensive trans-allelic prediction model, a generalized version of our previous biophysical model, that can predict peptide interactions for all three human MHC-II loci (HLA-DR, HLA-DP, and HLA-DQ), using both peptide sequence data and structural information of MHC-II molecules. The advantage of this approach over other machine learning models is that it offers a simple and plausible physical explanation for peptide–MHC-II interactions. We train the model using a benchmark experimental dataset and measure its predictive performance using novel data. Despite its relative simplicity, we find that the model has comparable performance to the state-of-the-art method, the NetMHCIIpan method. Focusing on the physical basis of peptide–MHC binding, we find support for previous theoretical predictions about the contributions of certain binding pockets to the binding energy. In addition, we find that binding pocket P5 of HLA-DP, which was not previously considered as a primary anchor, does make strong contribution to the binding energy. Together, the results indicate that our model can serve as a useful complement to alternative approaches to predicting peptide–MHC interactions.
Affiliation(s)
- Abdoelnaser M Degoot
- African Institute of Mathematical Sciences (AIMS), Muizenberg, South Africa; School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, South Africa; DST-NRF Centre of Excellence in Mathematical and Statistical Sciences (CoE-MaSS), Gauteng, South Africa
- Faraimunashe Chirove
- School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, South Africa
- Wilfred Ndifon
- African Institute of Mathematical Sciences (AIMS), Muizenberg, South Africa
|
13
|
Di Carluccio AR, Triffon CF, Chen W. Perpetual complexity: predicting human CD8+ T-cell responses to pathogenic peptides. Immunol Cell Biol 2018; 96:358-369. [PMID: 29424002 DOI: 10.1111/imcb.12019] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 08/17/2017] [Revised: 02/01/2018] [Accepted: 02/02/2018] [Indexed: 01/17/2023]
Abstract
The accurate prediction of human CD8+ T-cell epitopes has great potential clinical and translational implications in the context of infection, cancer and autoimmunity. Prediction algorithms have traditionally focused on calculated peptide affinity for the binding groove of MHC-I. However, over the years it has become increasingly clear that the ultimate T-cell recognition of MHC-I-bound peptides is governed by many contributing factors within the complex antigen presentation pathway. Recent advances in next-generation sequencing and immunopeptidomics have increased the precision of HLA-I sub-allele classification, and have led to the discovery of peptide processing events and individual allele-specific binding preferences. Here, we review some of the discoveries that initiated the development of peptide prediction algorithms, and outline some of the currently available online tools for CD8+ T-cell epitope prediction.
Affiliation(s)
- Anthony R Di Carluccio
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC, Australia
- Cristina F Triffon
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC, Australia
- Weisan Chen
- Department of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC, Australia
|
14
|
Kazi A, Chuah C, Majeed ABA, Leow CH, Lim BH, Leow CY. Current progress of immunoinformatics approach harnessed for cellular- and antibody-dependent vaccine design. Pathog Glob Health 2018. [PMID: 29528265 DOI: 10.1080/20477724.2018.1446773] [Citation(s) in RCA: 71] [Impact Index Per Article: 11.8] [Indexed: 10/17/2022] Open
Abstract
Immunoinformatics plays a pivotal role in vaccine design, immunodiagnostic development, and antibody production. In the past, antibody design and vaccine development depended exclusively on immunological experiments which are relatively expensive and time-consuming. However, recent advances in the field of immunological bioinformatics have provided feasible tools which can be used to lessen the time and cost required for vaccine and antibody development. This approach allows the selection of immunogenic regions from the pathogen genomes. The ideal regions could be developed as potential vaccine candidates to trigger protective immune responses in the hosts. At present, epitope-based vaccines are attractive concepts which have been successfully trialed to develop vaccines which target rapidly mutating pathogens. In this article, we provide an overview of the current progress of immunoinformatics and its applications in vaccine design, immune system modeling and therapeutics.
Affiliation(s)
- Ada Kazi
- Institute for Research in Molecular Medicine (INFORMM), Universiti Sains Malaysia, Kelantan, Malaysia; School of Health Sciences, Universiti Sains Malaysia, Kelantan, Malaysia
- Candy Chuah
- School of Medical Sciences, Universiti Sains Malaysia, Kelantan, Malaysia
- Chiuan Herng Leow
- Institute for Research in Molecular Medicine (INFORMM), Universiti Sains Malaysia, Penang, Malaysia
- Boon Huat Lim
- School of Health Sciences, Universiti Sains Malaysia, Kelantan, Malaysia
- Chiuan Yee Leow
- Institute for Research in Molecular Medicine (INFORMM), Universiti Sains Malaysia, Kelantan, Malaysia
|
15
|
Kar P, Ruiz-Perez L, Arooj M, Mancera RL. Current methods for the prediction of T-cell epitopes. Pept Sci (Hoboken) 2018. [DOI: 10.1002/pep2.24046] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 11/10/2022]
Affiliation(s)
- Prattusha Kar
- School of Pharmacy and Biomedical Sciences; Curtin Health Innovation Research Institute and Curtin Institute for Computation, Curtin University; Perth Western Australia 6845 Australia
- Lanie Ruiz-Perez
- School of Pharmacy and Biomedical Sciences; Curtin Health Innovation Research Institute and Curtin Institute for Computation, Curtin University; Perth Western Australia 6845 Australia
- Mahreen Arooj
- School of Pharmacy and Biomedical Sciences; Curtin Health Innovation Research Institute and Curtin Institute for Computation, Curtin University; Perth Western Australia 6845 Australia
- Ricardo L. Mancera
- School of Pharmacy and Biomedical Sciences; Curtin Health Innovation Research Institute and Curtin Institute for Computation, Curtin University; Perth Western Australia 6845 Australia
|
16
|
Reginald K, Chan Y, Plebanski M, Poh CL. Development of Peptide Vaccines in Dengue. Curr Pharm Des 2018; 24:1157-1173. [PMID: 28914200 PMCID: PMC6040172 DOI: 10.2174/1381612823666170913163904] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 06/28/2017] [Revised: 08/30/2017] [Accepted: 09/06/2017] [Indexed: 12/11/2022]
Abstract
Dengue is one of the most important arboviral infections worldwide, infecting up to 390 million people and causing 25,000 deaths annually. Although a licensed dengue vaccine is available, it is not efficacious against dengue serotypes that infect people living in South East Asia, where dengue is an endemic disease. Hence, there is an urgent need to develop an efficient dengue vaccine for this region. Data from different clinical trials indicate that a successful dengue vaccine must elicit both neutralizing antibodies and cell mediated immunity. This can be achieved by designing a multi-epitope peptide vaccine comprising B, CD8+ and CD4+ T cell epitopes. As recognition of T cell epitopes is restricted by human leukocyte antigens (HLA), T cell epitopes which are able to recognize several major HLAs will be preferentially included in the vaccine design. While peptide vaccines are safe, biocompatible and cost-effective, they are poorly immunogenic. Strategies to improve their immunogenicity by the use of long peptides, adjuvants and nanoparticle delivery mechanisms are discussed.
Affiliation(s)
- Chit Laa Poh
- Address correspondence to this author at the Research Centre for Biomedical Sciences, School of Science and Technology, Sunway University, 5 Jalan University, Bandar Sunway, 47500 Subang Jaya, Selangor, Malaysia; Tel: +60-3-7491 8622 ext. 7338; E-mail:
|
17
|
Fundamentals and Methods for T- and B-Cell Epitope Prediction. J Immunol Res 2017; 2017:2680160. [PMID: 29445754 PMCID: PMC5763123 DOI: 10.1155/2017/2680160] [Citation(s) in RCA: 284] [Impact Index Per Article: 40.6] [Received: 07/27/2017] [Revised: 11/22/2017] [Accepted: 11/27/2017] [Indexed: 12/25/2022] Open
Abstract
Adaptive immunity is mediated by T- and B-cells, which are immune cells capable of developing pathogen-specific memory that confers immunological protection. Memory and effector functions of B- and T-cells are predicated on the recognition through specialized receptors of specific targets (antigens) in pathogens. More specifically, B- and T-cells recognize portions within their cognate antigens known as epitopes. There is great interest in identifying epitopes in antigens for a number of practical reasons, including understanding disease etiology, immune monitoring, developing diagnosis assays, and designing epitope-based vaccines. Epitope identification is costly and time-consuming as it requires experimental screening of large arrays of potential epitope candidates. Fortunately, researchers have developed in silico prediction methods that dramatically reduce the burden associated with epitope mapping by decreasing the list of potential epitope candidates for experimental testing. Here, we analyze aspects of antigen recognition by T- and B-cells that are relevant for epitope prediction. Subsequently, we provide a systematic and inclusive review of the most relevant B- and T-cell epitope prediction methods and tools, paying particular attention to their foundations.
|
18
|
Naiyer MM, Cassidy SA, Magri A, Cowton V, Chen K, Mansour S, Kranidioti H, Mbirbindi B, Rettman P, Harris S, Fanning LJ, Mulder A, Claas FHJ, Davidson AD, Patel AH, Purbhoo MA, Khakoo SI. KIR2DS2 recognizes conserved peptides derived from viral helicases in the context of HLA-C. Sci Immunol 2017; 2:2/15/eaal5296. [DOI: 10.1126/sciimmunol.aal5296] [Citation(s) in RCA: 63] [Impact Index Per Article: 9.0] [Received: 12/06/2016] [Revised: 05/30/2017] [Accepted: 08/03/2017] [Indexed: 12/22/2022]
|
19
|
Capietto AH, Jhunjhunwala S, Delamarre L. Characterizing neoantigens for personalized cancer immunotherapy. Curr Opin Immunol 2017; 46:58-65. [PMID: 28478383 DOI: 10.1016/j.coi.2017.04.007] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 01/17/2017] [Accepted: 04/17/2017] [Indexed: 12/18/2022]
Abstract
Somatic mutations can generate neoantigens that are presented on MHC molecules and drive effective T cell responses against cancer. Mutation load in cancer patients predicts response to immune checkpoint blockade therapy. Additionally, vaccination targeting neoantigens controls established tumor growth in preclinical models. These recent findings led to a renewed interest in the field of cancer vaccines and the development of antigen-targeted cancer immunotherapies. However, targeting neoantigens is challenging, as most mutations are unique to each cancer patient. In addition, only a small fraction of the mutations are immunogenic and therefore their accurate prediction is critical. In this review, we discuss the properties of neoantigens that influence their immunogenicity, along with questions that remain to be addressed in order to improve prediction algorithms.
|
20
|
Nikumbh S, Pfeifer N. Genetic sequence-based prediction of long-range chromatin interactions suggests a potential role of short tandem repeat sequences in genome organization. BMC Bioinformatics 2017; 18:218. [PMID: 28420341 PMCID: PMC5395875 DOI: 10.1186/s12859-017-1624-x] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 10/18/2016] [Accepted: 04/05/2017] [Indexed: 11/25/2022] Open
Abstract
Background Knowing the three-dimensional (3D) structure of the chromatin is important for obtaining a complete picture of the regulatory landscape. Changes in the 3D structure have been implicated in diseases. While there exist approaches that attempt to predict the long-range chromatin interactions, they focus only on interactions between specific genomic regions — the promoters and enhancers, neglecting other possibilities, for instance, the so-called structural interactions involving intervening chromatin. Results We present a method that can be trained on 5C data using the genetic sequence of the candidate loci to predict potential genome-wide interaction partners of a particular locus of interest. We have built locus-specific support vector machine (SVM)-based predictors using the oligomer distance histograms (ODH) representation. The method shows good performance with a mean test AUC (area under the receiver operating characteristic (ROC) curve) of 0.7 or higher for various regions across cell lines GM12878, K562 and HeLa-S3. In cases where any locus did not have sufficient candidate interaction partners for model training, we employed multitask learning to share knowledge between models of different loci. In this scenario, across the three cell lines, the method attained an average performance increase of 0.09 in the AUC. Performance evaluation of the models trained on 5C data regarding prediction on an independent high-resolution Hi-C dataset (which is a rather hard problem) shows 0.56 AUC, on average. Additionally, we have developed new, intuitive visualization methods that enable interpretation of sequence signals that contributed towards prediction of locus-specific interaction partners. The analysis of these sequence signals suggests a potential general role of short tandem repeat sequences in genome organization. 
Conclusions We demonstrated how our approach can 1) provide insights into sequence features of locus-specific interaction partners, and 2) also identify their cell-line specificity. That our models deem short tandem repeat sequences as discriminative for prediction of potential interaction partners, suggests that they could play a larger role in genome organization. Thus, our approach can (a) be beneficial to broadly understand, at the sequence-level, chromatin interactions and higher-order structures like (meta-) topologically associating domains (TADs); (b) study regions omitted from existing prediction approaches using various information sources (e.g., epigenetic information); and (c) improve methods that predict the 3D structure of the chromatin.
Affiliation(s)
- Sarvesh Nikumbh
- Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus, Building E1.4, Saarbruecken, D-66123, Germany.
- Nico Pfeifer
- Computational Biology & Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus, Building E1.4, Saarbruecken, D-66123, Germany; Present address: Department of Computer Science, University of Tübingen, Sand 14, Tübingen, D-72076, Germany
|
21
|
Prediction of peptide binding to a major histocompatibility complex class I molecule based on docking simulation. J Comput Aided Mol Des 2016; 30:875-887. [PMID: 27624584 DOI: 10.1007/s10822-016-9967-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 06/02/2016] [Accepted: 09/07/2016] [Indexed: 10/21/2022]
Abstract
Binding between major histocompatibility complex (MHC) class I molecules and immunogenic epitopes is one of the most important processes for cell-mediated immunity. Consequently, computational prediction of amino acid sequences of MHC class I binding peptides from a given sequence may lead to important biomedical advances. In this study, an efficient structure-based method for predicting peptide binding to MHC class I molecules was developed, in which the binding free energy of the peptide was evaluated by two individual docking simulations. An original penalty function and restriction of degrees of freedom were determined by analysis of 361 published X-ray structures of the complex and were then introduced into the docking simulations. To validate the method, calculations using a 50-amino acid sequence as a prediction target were performed. In 27 calculations, the binding free energy of the known peptide was within the top 5 of 166 peptides generated from the 50-amino acid sequence. Finally, demonstrative calculations using a whole sequence of a protein as a prediction target were performed. These data clearly demonstrate high potential of this method for predicting peptide binding to MHC class I molecules.
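As a rough check on the peptide count quoted in the abstract above, the sketch below enumerates every contiguous peptide of a sequence with a sliding window. The peptide lengths (8 to 11 residues), the function name and the 50-residue sequence are assumptions made here for illustration; the abstract does not state which lengths were used, although this choice does give 43 + 42 + 41 + 40 = 166 windows for 50 residues.

```python
def enumerate_peptides(sequence, lengths=(8, 9, 10, 11)):
    """Return every contiguous peptide of the given lengths (sliding window)."""
    peptides = []
    for k in lengths:
        # A sequence of length n contains n - k + 1 windows of length k.
        for start in range(len(sequence) - k + 1):
            peptides.append(sequence[start:start + k])
    return peptides


# Hypothetical 50-residue sequence, used only to count windows.
sequence = "ACDEFGHIKLMNPQRSTVWY" * 2 + "ACDEFGHIKL"
peptides = enumerate_peptides(sequence)
print(len(sequence), len(peptides))
```

Each candidate peptide would then be scored by the binding predictor of choice; the docking-based scoring itself is not reproduced here.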
|
22
|
sNebula, a network-based algorithm to predict binding between human leukocyte antigens and peptides. Sci Rep 2016; 6:32115. [PMID: 27558848 PMCID: PMC4997263 DOI: 10.1038/srep32115] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 04/06/2016] [Accepted: 08/02/2016] [Indexed: 12/19/2022] Open
Abstract
Understanding the binding between human leukocyte antigens (HLAs) and peptides is important to understand the functioning of the immune system. Since it is time-consuming and costly to measure the binding between large numbers of HLAs and peptides, computational methods including machine learning models and network approaches have been developed to predict HLA-peptide binding. However, there are several limitations for the existing methods. We developed a network-based algorithm called sNebula to address these limitations. We curated qualitative Class I HLA-peptide binding data and demonstrated the prediction performance of sNebula on this dataset using leave-one-out cross-validation and five-fold cross-validations. This algorithm can predict not only peptides of different lengths and different types of HLAs, but also the peptides or HLAs that have no existing binding data. We believe sNebula is an effective method to predict HLA-peptide binding and thus improve our understanding of the immune system.
|
23
|
Zhou Q, Zhao Q. Flexible Clustered Multi-Task Learning by Learning Representative Tasks. IEEE Trans Pattern Anal Mach Intell 2016; 38:266-278. [PMID: 26761733 DOI: 10.1109/tpami.2015.2452911] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 06/05/2023]
Abstract
Multi-task learning (MTL) methods have shown promising performance by learning multiple relevant tasks simultaneously, thereby sharing useful information across relevant tasks. Among various MTL methods, clustered multi-task learning (CMTL) assumes that all tasks can be clustered into groups and attempts to learn the underlying cluster structure from the training data. In this paper, we present a new approach for CMTL, called flexible clustered multi-task learning (FCMTL), in which the cluster structure is learned by identifying representative tasks. The new approach allows an arbitrary task to be described by multiple representative tasks, effectively soft-assigning a task to multiple clusters with different weights. Unlike existing counterparts, the proposed approach is more flexible in that (a) it does not require clusters to be disjoint, (b) tasks within one particular cluster do not have to share information to the same extent, and (c) the number of clusters is automatically inferred from data. Computationally, the proposed approach is formulated as a row-sparsity pursuit problem. We validate the proposed FCMTL on both synthetic and real-world data sets, and empirical results demonstrate that it outperforms many existing MTL methods.
|
24
|
Abstract
Immunoinformatics involves the application of computational methods to immunological problems. Prediction of B- and T-cell epitopes has long been the focus of immunoinformatics, given the potential translational implications, and many tools have been developed. With the advent of next-generation sequencing (NGS) methods, an unprecedented wealth of information has become available that requires more-advanced immunoinformatics tools. Based on information from whole-genome sequencing, exome sequencing and RNA sequencing, it is possible to characterize with high accuracy an individual’s human leukocyte antigen (HLA) allotype (i.e., the individual set of HLA alleles of the patient), as well as changes arising in the HLA ligandome (the collection of peptides presented by the HLA) owing to genomic variation. This has allowed new opportunities for translational applications of epitope prediction, such as epitope-based design of prophylactic and therapeutic vaccines, and personalized cancer immunotherapies. Here, we review a wide range of immunoinformatics tools, with a focus on B- and T-cell epitope prediction. We also highlight fundamental differences in the underlying algorithms and discuss the various metrics employed to assess prediction quality, comparing their strengths and weaknesses. Finally, we discuss the new challenges and opportunities presented by high-throughput data-sets for the field of epitope prediction.
Affiliation(s)
- Linus Backert
- Applied Bioinformatics, Center of Bioinformatics and Department of Computer Science, University of Tübingen, Sand 14, 72076, Tübingen, Germany.
- Oliver Kohlbacher
- Applied Bioinformatics, Center of Bioinformatics and Department of Computer Science, University of Tübingen, Sand 14, 72076, Tübingen, Germany; Quantitative Biology Center, University of Tübingen, Auf der Morgenstelle 10, 72076, Tübingen, Germany; Biomolecular Interactions, Max Planck Institute for Developmental Biology, Spemannstrasse 35, 72076, Tübingen, Germany
|
25
|
Valdivia-Olarte H, Requena D, Ramirez M, Saravia LE, Izquierdo R, Falconi-Agapito F, Zavaleta M, Best I, Fernández-Díaz M, Zimic M. Design of a predicted MHC restricted short peptide immunodiagnostic and vaccine candidate for Fowl adenovirus C in chicken infection. Bioinformation 2015; 11:460-5. [PMID: 26664030 PMCID: PMC4658644 DOI: 10.6026/97320630011460] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 09/18/2015] [Accepted: 10/18/2015] [Indexed: 11/23/2022] Open
Abstract
Fowl adenoviruses (FAdVs) are the etiologic agents of multiple pathologies in chicken. There are five different species of FAdVs grouped as FAdV-A, FAdV-B, FAdV-C, FAdV-D, and FAdV-E. It is of interest to develop immunodiagnostics and vaccine candidate for Peruvian FAdV-C in chicken infection using MHC restricted short peptide candidates. We sequenced the complete genome of one FAdV strain isolated from a chicken of a local farm. A total of 44 protein coding genes were identified in each genome. We sequenced twelve Cobb chicken MHC alleles from animals of different farms in the central coast of Peru, and subsequently determined three optimal human MHC-I and four optimal human MHC-II substitute alleles for MHC-peptide prediction. The potential MHC restricted short peptide epitope-like candidates were predicted using human specific (with determined suitable chicken substitutes) NetMHC MHC-peptide prediction model with web server features from all the FAdV genomes available. FAdV specific peptides with calculated binding values to known substituted chicken MHC-I and MHC-II were further filtered for diagnostics and potential vaccine epitopes. Promiscuity to the 3/4 optimal human MHC-I/II alleles and conservation among the available FAdV genomes was considered in this analysis. The localization on the surface of the protein was considered for class II predicted peptides. Thus, a set of class I and class II specific peptides from FAdV were reported in this study. Hence, a multiepitopic protein was built with these peptides, and subsequently tested to confirm the production of specific antibodies in chicken.
Affiliation(s)
- Hugo Valdivia-Olarte
- Farvet S.A.C. Carretera Panamericana Sur N° 766 Km 198.5, Chincha Alta. Ica – Peru
- Laboratorio de Bioinformática y Biología Molecular, Laboratorios de Investigación y Desarrollo, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia, Av. Honorio Delgado 430, San Martin de Porres. Lima – Peru
- David Requena
- Farvet S.A.C. Carretera Panamericana Sur N° 766 Km 198.5, Chincha Alta. Ica – Peru
- Laboratorio de Bioinformática y Biología Molecular, Laboratorios de Investigación y Desarrollo, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia, Av. Honorio Delgado 430, San Martin de Porres. Lima – Peru
- Manuel Ramirez
- Farvet S.A.C. Carretera Panamericana Sur N° 766 Km 198.5, Chincha Alta. Ica – Peru
- Laboratorio de Bioinformática y Biología Molecular, Laboratorios de Investigación y Desarrollo, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia, Av. Honorio Delgado 430, San Martin de Porres. Lima – Peru
- Luis E Saravia
- Farvet S.A.C. Carretera Panamericana Sur N° 766 Km 198.5, Chincha Alta. Ica – Peru
- Ray Izquierdo
- Farvet S.A.C. Carretera Panamericana Sur N° 766 Km 198.5, Chincha Alta. Ica – Peru
- Milagros Zavaleta
- Farvet S.A.C. Carretera Panamericana Sur N° 766 Km 198.5, Chincha Alta. Ica – Peru
- Iván Best
- Farvet S.A.C. Carretera Panamericana Sur N° 766 Km 198.5, Chincha Alta. Ica – Peru
- Mirko Zimic
- Farvet S.A.C. Carretera Panamericana Sur N° 766 Km 198.5, Chincha Alta. Ica – Peru
- Laboratorio de Bioinformática y Biología Molecular, Laboratorios de Investigación y Desarrollo, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia, Av. Honorio Delgado 430, San Martin de Porres. Lima – Peru
|
26
|
Luo H, Ye H, Ng HW, Shi L, Tong W, Mendrick DL, Hong H. Machine Learning Methods for Predicting HLA-Peptide Binding Activity. Bioinform Biol Insights 2015; 9:21-9. [PMID: 26512199 PMCID: PMC4603527 DOI: 10.4137/bbi.s29466] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 06/23/2015] [Revised: 07/30/2015] [Accepted: 08/02/2015] [Indexed: 11/23/2022] Open
Abstract
As major histocompatibility complexes in humans, the human leukocyte antigens (HLAs) have important functions to present antigen peptides onto T-cell receptors for immunological recognition and responses. Interpreting and predicting HLA–peptide binding are important to study T-cell epitopes, immune reactions, and the mechanisms of adverse drug reactions. We review different types of machine learning methods and tools that have been used for HLA–peptide binding prediction. We also summarize the descriptors based on which the HLA–peptide binding prediction models have been constructed and discuss the limitation and challenges of the current methods. Lastly, we give a future perspective on the HLA–peptide binding prediction method based on network analysis.
Affiliation(s)
- Heng Luo
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA; University of Arkansas at Little Rock/University of Arkansas for Medical Sciences Bioinformatics Graduate Program, Little Rock, AR, USA
- Hao Ye
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA
- Hui Wen Ng
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA
- Leming Shi
- Center for Pharmacogenomics, School of Pharmacy, Fudan University, Shanghai, China
- Weida Tong
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA
- Donna L Mendrick
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA
- Huixiao Hong
- National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA
|
27
|
Gutiérrez AH, Martin WD, Bailey-Kellogg C, Terry F, Moise L, De Groot AS. Development and validation of an epitope prediction tool for swine (PigMatrix) based on the pocket profile method. BMC Bioinformatics 2015; 16:290. [PMID: 26370412 PMCID: PMC4570239 DOI: 10.1186/s12859-015-0724-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 06/29/2015] [Accepted: 08/26/2015] [Indexed: 12/14/2022] Open
Abstract
Background T cell epitope prediction tools and associated vaccine design algorithms have accelerated the development of vaccines for humans. Predictive tools for swine and other food animals are not as well developed, primarily because the data required to develop the tools are lacking. Here, we overcome a lack of T cell epitope data to construct swine epitope predictors by systematically leveraging available human information. Applying the “pocket profile method”, we use sequence and structural similarities in the binding pockets of human and swine major histocompatibility complex proteins to infer Swine Leukocyte Antigen (SLA) peptide binding preferences. We developed epitope-prediction matrices (PigMatrices), for three SLA class I alleles (SLA-1*0401, 2*0401 and 3*0401) and one class II allele (SLA-DRB1*0201), based on the binding preferences of the best-matched Human Leukocyte Antigen (HLA) pocket for each SLA pocket. The contact residues involved in the binding pockets were defined for class I based on crystal structures of either SLA (SLA-specific contacts, Ssc) or HLA supertype alleles (HLA contacts, Hc); for class II, only Hc was possible. Different substitution matrices were evaluated (PAM and BLOSUM) for scoring pocket similarity and identifying the best human match. The accuracy of the PigMatrices was compared to available online swine epitope prediction tools such as PickPocket and NetMHCpan. Results PigMatrices that used Ssc to define the pocket sequences and PAM30 to score pocket similarity demonstrated the best predictive performance and were able to accurately separate binders from random peptides. For SLA-1*0401 and 2*0401, PigMatrix achieved area under the receiver operating characteristic curves (AUC) of 0.78 and 0.73, respectively, which were equivalent or better than PickPocket (0.76 and 0.54) and NetMHCpan version 2.4 (0.41 and 0.51) and version 2.8 (0.72 and 0.71). 
In addition, we developed the first predictive SLA class II matrix, obtaining an AUC of 0.73 for existing SLA-DRB1*0201 epitopes. Notably, PigMatrix achieved this level of predictive power without training on SLA binding data. Conclusion Overall, the pocket profile method combined with binding preferences from HLA binding data shows significant promise for developing T cell epitope prediction tools for pigs. When combined with existing vaccine design algorithms, PigMatrix will be useful for developing genome-derived vaccines for a range of pig pathogens for which no effective vaccines currently exist (e.g. porcine reproductive and respiratory syndrome, influenza and porcine epidemic diarrhea).
Affiliation(s)
- Andres H Gutiérrez
- Institute for Immunology and Informatics, CMB/CELS, University of Rhode Island, Providence, RI, 02903, USA.
- Leonard Moise
- Institute for Immunology and Informatics, CMB/CELS, University of Rhode Island, Providence, RI, 02903, USA; EpiVax, Inc., Providence, RI, 02860, USA.
- Anne S De Groot
- Institute for Immunology and Informatics, CMB/CELS, University of Rhode Island, Providence, RI, 02903, USA; EpiVax, Inc., Providence, RI, 02860, USA.
|
28
|
Zhang W, Niu Y, Zou H, Luo L, Liu Q, Wu W. Accurate prediction of immunogenic T-cell epitopes from epitope sequences using the genetic algorithm-based ensemble learning. PLoS One 2015; 10:e0128194. [PMID: 26020952 PMCID: PMC4447411 DOI: 10.1371/journal.pone.0128194] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/30/2014] [Accepted: 04/24/2015] [Indexed: 11/19/2022] Open
Abstract
Background T-cell epitopes play an important role in the T-cell immune response and are critical components in epitope-based vaccine design. Immunogenicity is the ability to trigger an immune response. The accurate prediction of immunogenic T-cell epitopes is significant for designing useful vaccines and understanding the immune system. Methods In this paper, we attempt to differentiate immunogenic epitopes from non-immunogenic epitopes based on their primary structures. First, we explore a variety of sequence-derived features and analyze their relationship with epitope immunogenicity. To effectively utilize various features, a genetic algorithm (GA)-based ensemble method is proposed to determine the optimal feature subset and develop a high-accuracy ensemble model. In the GA optimization, a chromosome represents a feature subset in the search space. For each feature subset, the selected features are utilized to construct the base predictors, and an ensemble model is developed by taking the average of outputs from base predictors. The objective of the GA is to search for the optimal feature subset, which leads to the ensemble model with the best cross-validation AUC (area under the ROC curve) on the training set. Results Two datasets named ‘IMMA2’ and ‘PAAQD’ are adopted as the benchmark datasets. Compared with the state-of-the-art methods POPI, POPISK, PAAQD and our previous method, the GA-based ensemble method performs much better, achieving an AUC score of 0.846 on the IMMA2 dataset and an AUC score of 0.829 on the PAAQD dataset. The statistical analysis demonstrates that the performance improvements of the GA-based ensemble method are statistically significant. Conclusions The proposed method is a promising tool for predicting immunogenic epitopes. The source codes and datasets are available in S1 File.
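The two quantities this abstract relies on, the averaged ensemble output and the AUC used as the GA objective, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy labels and the two "base predictor" score lists are assumptions made up for illustration, and the GA search over feature subsets is left out.

```python
from typing import List, Sequence


def ensemble_average(base_outputs: Sequence[Sequence[float]]) -> List[float]:
    """Average the scores of several base predictors, peptide by peptide."""
    n_models = len(base_outputs)
    return [sum(column) / n_models for column in zip(*base_outputs)]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (made-up values): 1 = immunogenic, 0 = non-immunogenic.
labels = [1, 1, 0, 0]
base_outputs = [
    [0.9, 0.4, 0.5, 0.1],  # hypothetical base predictor A
    [0.7, 0.6, 0.3, 0.2],  # hypothetical base predictor B
]
ensemble_scores = ensemble_average(base_outputs)
ensemble_auc = roc_auc(labels, ensemble_scores)
```

In a GA setting of the kind described, a fitness function would call something like `roc_auc` on cross-validated ensemble scores for each candidate feature subset; here the toy ensemble happens to rank both positives above both negatives even though predictor A alone does not.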
Affiliation(s)
- Wen Zhang
- School of Computer, Wuhan University, Wuhan, 430072, China
- Research Institute of Shenzhen, Wuhan University, Shenzhen, 518057, China
- Yanqing Niu
- School of Mathematics and Statistics, South-central University for Nationalities, Wuhan, 430074, China
- Hua Zou
- School of Computer, Wuhan University, Wuhan, 430072, China
- Longqiang Luo
- School of Mathematics and Statistics, Wuhan University, Wuhan, 430072, China
- Qianchao Liu
- School of Computer, Wuhan University, Wuhan, 430072, China
- Weijian Wu
- School of Computer, Wuhan University, Wuhan, 430072, China
|
29
|
Jain S, Gitter A, Bar-Joseph Z. Multitask learning of signaling and regulatory networks with application to studying human response to flu. PLoS Comput Biol 2014; 10:e1003943. [PMID: 25522349 PMCID: PMC4270428 DOI: 10.1371/journal.pcbi.1003943] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2014] [Accepted: 09/28/2014] [Indexed: 01/04/2023] Open
Abstract
Reconstructing regulatory and signaling response networks is one of the major goals of systems biology. While several successful methods have been suggested for this task, some integrating large and diverse datasets, these methods have so far been applied to reconstruct a single response network at a time, even when studying and modeling related conditions. To improve network reconstruction we developed MT-SDREM, a multi-task learning method which jointly models networks for several related conditions. In MT-SDREM, parameters are jointly constrained across the networks while still allowing for condition-specific pathways and regulation. We formulate the multi-task learning problem and discuss methods for optimizing the joint target function. We applied MT-SDREM to reconstruct dynamic human response networks for three flu strains: H1N1, H5N1 and H3N2. Our multi-task learning method was able to identify known and novel factors and genes, improving upon prior methods that model each condition independently. The MT-SDREM networks were also better at identifying proteins whose removal affects viral load indicating that joint learning can still lead to accurate, condition-specific, networks. Supporting website with MT-SDREM implementation: http://sb.cs.cmu.edu/mtsdrem To understand why some flu strains are more virulent than others, researchers attempt to profile and model the molecular human response to these strains and identify similarities and differences between the resulting models. So far, the modeling and analysis part has been done independently for each strain and the results contrasted in a post-processing step. Here we present a new method, termed MT-SDREM, that simultaneously models the response to all strains allowing us to identify both, the core response elements that are shared among the strains, and factors that are uniquely activated or repressed by individual strains. 
We applied this method to study the human response to three flu strains: H1N1, H3N2 and H5N1. As we show, the method was able to correctly identify several common and known factors regulating immune response to such strains and also identified unique factors for each of the strains. The models reconstructed by the simultaneous analysis method improved upon those generated by methods that model each strain response separately. Our joint models can be used to identify strain specific treatments as well as treatments that are likely to be effective against all three strains.
Affiliation(s)
- Siddhartha Jain
- Computer Science Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Anthony Gitter
- Microsoft Research, Cambridge, Massachusetts, United States of America
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Ziv Bar-Joseph
- Lane Center for Computational Biology and Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
|
30
|
Carrasco Pro S, Zimic M, Nielsen M. Improved pan-specific MHC class I peptide-binding predictions using a novel representation of the MHC-binding cleft environment. Tissue Antigens 2014; 83:94-100. [PMID: 24447175 DOI: 10.1111/tan.12292] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2013] [Accepted: 12/16/2013] [Indexed: 11/28/2022]
Abstract
Major histocompatibility complex (MHC) molecules play a key role in cell-mediated immune responses, presenting bound peptides for recognition by cells of the immune system. Several in silico methods have been developed to predict the binding affinity of a given peptide to a specific MHC molecule. One of the current state-of-the-art methods for MHC class I is NetMHCpan, which has as a core ingredient the representation of the MHC class I molecule by a pseudo-sequence of the binding cleft amino acid environment. New and large MHC-peptide-binding data sets are constantly being made available, and new structures of MHC class I molecules with a bound peptide have also been published. In order to test whether the NetMHCpan method can be improved by integrating this novel information, we created new pseudo-sequence definitions for the MHC-binding cleft environment from sequence and structural analyses of different MHC data sets including human leukocyte antigen (HLA), non-human primates (chimpanzee, macaque and gorilla) and other animal alleles (cattle, mouse and swine). From these constructs, we showed that by focusing on MHC sequence positions found to be polymorphic across the MHC molecules used to train the method, the NetMHCpan method achieved a significant increase in predictive performance, in particular for non-human MHCs. This study hence showed that an improved performance of MHC-binding methods can be achieved not only by the accumulation of more MHC-peptide-binding data but also by a refined definition of the MHC-binding environment including information from non-human species.
Affiliation(s)
- S Carrasco Pro
- Laboratorio de Bioinformática y Biología Molecular, Laboratorios de Investigación y Desarrollo, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia, Lima, Peru
|
31
|
Lu YF, Sheng H, Zhang Y, Li ZY. Computational prediction of cleavage using proteasomal in vitro digestion and MHC I ligand data. J Zhejiang Univ Sci B 2014; 14:816-28. [PMID: 24009202 DOI: 10.1631/jzus.b1200299] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/02/2023]
Abstract
Proteasomes are responsible for the production of the majority of cytotoxic T lymphocyte (CTL) epitopes. Hence, it is important to identify correctly which peptides will be generated by proteasomes from an unknown protein. However, the pool of proteasome cleavage data used in the prediction algorithms, whether from major histocompatibility complex (MHC) I ligand or in vitro digestion data, is not identical to in vivo proteasomal digestion products. Therefore, the accuracy and reliability of these models still need to be improved. In this paper, three types of proteasomal cleavage data, constitutive proteasome (cCP), immunoproteasome (iCP) in vitro cleavage, and MHC I ligand data, were used for training cleave-site predictive methods based on the kernel-function stabilized matrix method (KSMM). The predictive accuracies of the KSMM+pair coefficients were 75.0%, 72.3%, and 83.1% for cCP, iCP, and MHC I ligand data, respectively, which were comparable to the results from support vector machine (SVM). The three proteasomal cleavage methods were combined in turn with MHC I-peptide binding predictions to model MHC I-peptide processing and the presentation pathway. These integrations markedly improved MHC I peptide identification, increasing area under the receiver operator characteristics (ROC) curve (AUC) values from 0.82 to 0.91. The results suggested that both MHC I ligand and proteasomal in vitro degradation data can give an exact simulation of in vivo processed digestion. The information extracted from cCP and iCP in vitro cleavage data demonstrated that both cCP and iCP are selective in their usage of peptide bonds for cleavage.
Affiliation(s)
- Yu-feng Lu
- School of Mathematical Sciences, Dalian University of Technology, Dalian 116023, China; College of Science, Hebei University of Science and Technology, Shijiazhuang 050018, China; School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
|
32
|
Multi-population genomic prediction using a multi-task Bayesian learning model. BMC Genet 2014; 15:53. [PMID: 24884927 PMCID: PMC4024655 DOI: 10.1186/1471-2156-15-53] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2013] [Accepted: 04/28/2014] [Indexed: 01/05/2023] Open
Abstract
BACKGROUND Genomic prediction in multiple populations can be viewed as a multi-task learning problem where tasks are to derive prediction equations for each population and multi-task learning property can be improved by sharing information across populations. The goal of this study was to develop a multi-task Bayesian learning model for multi-population genomic prediction with a strategy to effectively share information across populations. Simulation studies and real data from Holstein and Ayrshire dairy breeds with phenotypes on five milk production traits were used to evaluate the proposed multi-task Bayesian learning model and compare with a single-task model and a simple data pooling method. RESULTS A multi-task Bayesian learning model was proposed for multi-population genomic prediction. Information was shared across populations through a common set of latent indicator variables while SNP effects were allowed to vary in different populations. Both simulation studies and real data analysis showed the effectiveness of the multi-task model in improving genomic prediction accuracy for the smaller Ayrshire breed. Simulation studies suggested that the multi-task model was most effective when the number of QTL was small (n = 20), with an increase of accuracy by up to 0.09 when QTL effects were lowly correlated between two populations (ρ = 0.2), and up to 0.16 when QTL effects were highly correlated (ρ = 0.8). When QTL genotypes were included for training and validation, the improvements were 0.16 and 0.22, respectively, for scenarios of the low and high correlation of QTL effects between two populations. When the number of QTL was large (n = 200), improvement was small with a maximum of 0.02 when QTL genotypes were not included for genomic prediction. Reduction in accuracy was observed for the simple pooling method when the number of QTL was small and correlation of QTL effects between the two populations was low. 
For the real data, the multi-task model achieved an increase of accuracy between 0 and 0.07 in the Ayrshire validation set when 28,206 SNPs were used, while the simple data pooling method resulted in a reduction of accuracy for all traits except for protein percentage. When 246,668 SNPs were used, the accuracy achieved from the multi-task model increased by 0 to 0.03, while using the pooling method resulted in a reduction of accuracy by 0.01 to 0.09. In the Holstein population, the three methods had similar performance. CONCLUSIONS Results in this study suggest that the proposed multi-task Bayesian learning model for multi-population genomic prediction is effective and has the potential to improve the accuracy of genomic prediction.
|
33
|
Widmer C, Kloft M, Lou X, Rätsch G. Regularization-Based Multitask Learning With Applications to Genome Biology and Biological Imaging. Künstliche Intelligenz 2014. [DOI: 10.1007/s13218-013-0283-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|
34
|
Hajisharifi Z, Piryaiee M, Mohammad Beigi M, Behbahani M, Mohabatkar H. Predicting anticancer peptides with Chou′s pseudo amino acid composition and investigating their mutagenicity via Ames test. J Theor Biol 2014; 341:34-40. [DOI: 10.1016/j.jtbi.2013.08.037] [Citation(s) in RCA: 169] [Impact Index Per Article: 16.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2013] [Revised: 08/28/2013] [Accepted: 08/31/2013] [Indexed: 12/27/2022]
|
35
|
Karpenko LI, Bazhan SI, Antonets DV, Belyakov IM. Novel approaches in polyepitope T-cell vaccine development against HIV-1. Expert Rev Vaccines 2013; 13:155-73. [PMID: 24308576 DOI: 10.1586/14760584.2014.861748] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
The RV144 clinical trial was modestly effective in preventing HIV infection. New alternative approaches are needed to design improved HIV-1 vaccines and their delivery strategies. One of these approaches is the construction of synthetic polyepitope HIV-1 immunogens using protective T- and B-cell epitopes that can induce broadly neutralizing antibodies and responses of cytotoxic (CD8(+) CTL) and helper (CD4(+) Th) T-lymphocytes. This approach seems promising for designing a new generation of vaccines against HIV-1: in theory it makes it possible to cope with HIV-1 antigenic variability, focuses immune responses on protective determinants and allows compounds that can induce autoantibodies or antibodies enhancing HIV-1 infectivity to be excluded from the vaccine. Herein, the authors focus on the construction and rational design of polyepitope T-cell HIV-1 immunogens and their delivery, including: advantages and disadvantages of existing T-cell epitope prediction methods; features of the organization of polyepitope immunogens that can generate high-level CD8(+) and CD4(+) T-lymphocyte responses; strategies to optimize efficient processing, presentation and immunogenicity of polyepitope constructs; original software to design polyepitope immunogens; and delivery vectors as well as mucosal strategies of vaccination. This new knowledge may bring us one step closer to developing an effective T-cell vaccine against HIV-1, other chronic viral infections and cancer.
Affiliation(s)
- Larisa I Karpenko
- State Research Center of Virology and Biotechnology "Vector", Koltsovo, Novosibirsk region, 630559, Russia
|
36
|
Guo L, Luo C, Zhu S. MHC2SKpan: a novel kernel based approach for pan-specific MHC class II peptide binding prediction. BMC Genomics 2013; 14 Suppl 5:S11. [PMID: 24564280 PMCID: PMC3852073 DOI: 10.1186/1471-2164-14-s5-s11] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Computational methods for the prediction of Major Histocompatibility Complex (MHC) class II binding peptides play an important role in facilitating the understanding of immune recognition and the process of epitope discovery. To develop an effective computational method, we need to consider two important characteristics of the problem: (1) the length of binding peptides is highly flexible; and (2) MHC molecules are extremely polymorphic and for the vast majority of them there are no sufficient training data. METHODS We develop a novel string kernel MHC2SK (MHC-II String Kernel) method to measure the similarities among peptides with variable lengths. By considering the distinct features of MHC-II peptide binding prediction problem, MHC2SK differs significantly from the recently developed kernel based method, GS (Generic String) kernel, in the way of computing similarities. Furthermore, we extend MHC2SK to MHC2SKpan for pan-specific MHC-II peptide binding prediction by leveraging the binding data of various MHC molecules. RESULTS MHC2SK outperformed GS in allele specific prediction using a benchmark dataset, which demonstrates the effectiveness of MHC2SK. Furthermore, we evaluated the performance of MHC2SKpan using various benchmark data sets from several different perspectives: Leave-one-allele-out (LOO), 5-fold cross validation as well as independent data testing. MHC2SKpan has achieved comparable performance with NetMHCIIpan-2.0 and outperformed NetMHCIIpan-1.0, TEPITOPEpan and MultiRTA, being statistically significant. MHC2SKpan can be freely accessed at http://datamining-iip.fudan.edu.cn/service/MHC2SKpan/index.html.
|
37
|
Koch CP, Pillong M, Hiss JA, Schneider G. Computational Resources for MHC Ligand Identification. Mol Inform 2013; 32:326-36. [PMID: 27481589 DOI: 10.1002/minf.201300042] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2012] [Accepted: 04/04/2013] [Indexed: 01/16/2023]
Abstract
Advances in the high-throughput determination of functional modulators of major histocompatibility complex (MHC) and improved computational predictions of MHC ligands have rendered the rational design of immunomodulatory peptides feasible. Proteome-derived peptides and 'reverse vaccinology' by computational means will play a driving role in future vaccine design. Here we review the molecular mechanisms of the MHC mediated immune response, present the computational approaches that have emerged in this area of biotechnology, and provide an overview of publicly available computational resources for predicting and designing new peptidic MHC ligands.
Affiliation(s)
- Christian P Koch
- ETH Zürich, Department of Chemistry and Applied Biosciences, Institute of Pharmaceutical Sciences, Wolfgang-Pauli-Str. 10, 8093 Zürich, Switzerland
- Max Pillong
- ETH Zürich, Department of Chemistry and Applied Biosciences, Institute of Pharmaceutical Sciences, Wolfgang-Pauli-Str. 10, 8093 Zürich, Switzerland
- Jan A Hiss
- ETH Zürich, Department of Chemistry and Applied Biosciences, Institute of Pharmaceutical Sciences, Wolfgang-Pauli-Str. 10, 8093 Zürich, Switzerland
- Gisbert Schneider
- ETH Zürich, Department of Chemistry and Applied Biosciences, Institute of Pharmaceutical Sciences, Wolfgang-Pauli-Str. 10, 8093 Zürich, Switzerland.
|
38
|
Learning a peptide-protein binding affinity predictor with kernel ridge regression. BMC Bioinformatics 2013; 14:82. [PMID: 23497081 PMCID: PMC3651388 DOI: 10.1186/1471-2105-14-82] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2012] [Accepted: 02/21/2013] [Indexed: 02/01/2023] Open
Abstract
BACKGROUND The cellular function of a vast majority of proteins is performed through physical interactions with other biomolecules, which, most of the time, are other proteins. Peptides represent templates of choice for mimicking a secondary structure in order to modulate protein-protein interaction. They are thus an interesting class of therapeutics since they also display strong activity, high selectivity, low toxicity and few drug-drug interactions. Furthermore, predicting peptides that would bind to specific MHC alleles would be of tremendous benefit to improve vaccine based therapy and possibly generate antibodies with greater affinity. Modern computational methods have the potential to accelerate and lower the cost of drug and vaccine discovery by selecting potential compounds for testing in silico prior to biological validation. RESULTS We propose a specialized string kernel for small bio-molecules, peptides and pseudo-sequences of binding interfaces. The kernel incorporates physico-chemical properties of amino acids and elegantly generalizes eight kernels, including the Oligo, the Weighted Degree, the Blended Spectrum, and the Radial Basis Function. We provide a low complexity dynamic programming algorithm for the exact computation of the kernel and a linear time algorithm for its approximation. Combined with kernel ridge regression and SupCK, a novel binding pocket kernel, the proposed kernel yields biologically relevant and good prediction accuracy on the PepX database. For the first time, a machine learning predictor is capable of predicting the binding affinity of any peptide to any protein with reasonable accuracy. The method was also applied to both single-target and pan-specific Major Histocompatibility Complex class II benchmark datasets and three Quantitative Structure Affinity Model benchmark datasets. 
CONCLUSION On all benchmarks, our method significantly (p-value ≤ 0.057) outperforms the current state-of-the-art methods at predicting peptide-protein binding affinities. The proposed approach is flexible and can be applied to predict any quantitative biological activity. Moreover, generating reliable peptide-protein binding affinities will also improve system biology modelling of interaction pathways. Lastly, the method should be of value to a large segment of the research community with the potential to accelerate the discovery of peptide-based drugs and facilitate vaccine development. The proposed kernel is freely available at http://graal.ift.ulaval.ca/downloads/gs-kernel/.
|
39
|
Binkowski TA, Marino SR, Joachimiak A. Predicting HLA class I non-permissive amino acid residues substitutions. PLoS One 2012; 7:e41710. [PMID: 22905104 PMCID: PMC3414483 DOI: 10.1371/journal.pone.0041710] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2012] [Accepted: 06/27/2012] [Indexed: 12/20/2022] Open
Abstract
Prediction of peptide binding to human leukocyte antigen (HLA) molecules is essential to a wide range of clinical entities from vaccine design to stem cell transplant compatibility. Here we present a new structure-based methodology that applies robust computational tools to model peptide-HLA (p-HLA) binding interactions. The method leverages the structural conservation observed in p-HLA complexes to significantly reduce the search space and calculate the system’s binding free energy. This approach is benchmarked against existing p-HLA complexes and the prediction performance is measured against a library of experimentally validated peptides. The effect on binding activity across a large set of high-affinity peptides is used to investigate amino acid mismatches reported as high-risk factors in hematopoietic stem cell transplantation.
Affiliation(s)
- T Andrew Binkowski
- Biosciences Division, Argonne National Laboratory, Midwest Center for Structural Genomics, Argonne, Illinois, United States of America
|
40
|
An in silico chimeric multi subunit vaccine targeting virulence factors of enterotoxigenic Escherichia coli (ETEC) with its bacterial inbuilt adjuvant. J Microbiol Methods 2012; 90:36-45. [DOI: 10.1016/j.mimet.2012.04.001] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/05/2012] [Revised: 04/04/2012] [Accepted: 04/08/2012] [Indexed: 01/25/2023]
|
41
|
Tung CW, Ziehm M, Kämper A, Kohlbacher O, Ho SY. POPISK: T-cell reactivity prediction using support vector machines and string kernels. BMC Bioinformatics 2011; 12:446. [PMID: 22085524 PMCID: PMC3228774 DOI: 10.1186/1471-2105-12-446] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2011] [Accepted: 11/15/2011] [Indexed: 02/03/2023] Open
Abstract
Background Accurate prediction of peptide immunogenicity and characterization of relation between peptide sequences and peptide immunogenicity will be greatly helpful for vaccine designs and understanding of the immune system. In contrast to the prediction of antigen processing and presentation pathway, the prediction of subsequent T-cell reactivity is a much harder topic. Previous studies of identifying T-cell receptor (TCR) recognition positions were based on small-scale analyses using only a few peptides and concluded different recognition positions such as positions 4, 6 and 8 of peptides with length 9. Large-scale analyses are necessary to better characterize the effect of peptide sequence variations on T-cell reactivity and design predictors of a peptide's T-cell reactivity (and thus immunogenicity). The identification and characterization of important positions influencing T-cell reactivity will provide insights into the underlying mechanism of immunogenicity. Results This work establishes a large dataset by collecting immunogenicity data from three major immunology databases. In order to consider the effect of MHC restriction, peptides are classified by their associated MHC alleles. Subsequently, a computational method (named POPISK) using support vector machine with a weighted degree string kernel is proposed to predict T-cell reactivity and identify important recognition positions. POPISK yields a mean 10-fold cross-validation accuracy of 68% in predicting T-cell reactivity of HLA-A2-binding peptides. POPISK is capable of predicting immunogenicity with scores that can also correctly predict the change in T-cell reactivity related to point mutations in epitopes reported in previous studies using crystal structures. Thorough analyses of the prediction results identify the important positions 4, 6, 8 and 9, and yield insights into the molecular basis for TCR recognition. 
Finally, we relate this finding to physicochemical properties and structural features of the MHC-peptide-TCR interaction. Conclusions A computational method POPISK is proposed to predict immunogenicity with scores which are useful for predicting immunogenicity changes made by single-residue modifications. The web server of POPISK is freely available at http://iclab.life.nctu.edu.tw/POPISK.
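A weighted degree string kernel of the kind named in this abstract can be sketched as follows. This is an illustrative sketch only: the abstract does not state the maximum substring length or the weights POPISK uses, so `max_degree=3` and the weighting β_d = 2(D − d + 1) / (D(D + 1)), a common choice for this kernel family, are assumptions, as is the function name.

```python
def weighted_degree_kernel(x: str, y: str, max_degree: int = 3) -> float:
    """Similarity of two equal-length peptides from position-matched substrings.

    For every substring length d = 1..max_degree, count the positions where
    both peptides carry the same d-mer, and weight that count by beta_d.
    """
    if len(x) != len(y):
        raise ValueError("the weighted degree kernel needs equal-length sequences")
    if max_degree < 1:
        raise ValueError("max_degree must be at least 1")
    length = len(x)
    total = 0.0
    for d in range(1, max_degree + 1):
        # Assumed weighting: shorter matches get larger weights; weights sum to 1.
        beta = 2.0 * (max_degree - d + 1) / (max_degree * (max_degree + 1))
        matches = sum(
            1 for i in range(length - d + 1) if x[i:i + d] == y[i:i + d]
        )
        total += beta * matches
    return total


# Two made-up 9-mers differing at a single position.
similarity = weighted_degree_kernel("GILGFVFTL", "GILGAVFTL")
```

Because matches are only counted at identical positions, the kernel is position-aware, which fits the abstract's interest in which peptide positions matter for T-cell reactivity. A kernel matrix built from such values could then be passed to any SVM implementation that accepts precomputed kernels.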
Affiliation(s)
- Chun-Wei Tung
- School of Pharmacy, Kaohsiung Medical University, Kaohsiung 807, Taiwan
|
42
|
Huang JC, Jojic N. Modeling major histocompatibility complex binding by nonparametric averaging of multiple predictors and sequence encodings. J Immunol Methods 2011; 374:35-42. [DOI: 10.1016/j.jim.2010.10.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2010] [Revised: 09/27/2010] [Accepted: 10/04/2010] [Indexed: 11/16/2022]
|
43
|
Karosiene E, Lundegaard C, Lund O, Nielsen M. NetMHCcons: a consensus method for the major histocompatibility complex class I predictions. Immunogenetics 2011; 64:177-86. [PMID: 22009319 DOI: 10.1007/s00251-011-0579-8] [Citation(s) in RCA: 246] [Impact Index Per Article: 18.9] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/02/2011] [Accepted: 09/28/2011] [Indexed: 12/01/2022]
Abstract
A key role in cell-mediated immunity is dedicated to the major histocompatibility complex (MHC) molecules that bind peptides for presentation on the cell surface. Several in silico methods capable of predicting peptide binding to MHC class I have been developed. The accuracy of these methods depends on the data available characterizing the binding specificity of the MHC molecules. It has, moreover, been demonstrated that consensus methods defined as combinations of two or more different methods led to improved prediction accuracy. This plethora of methods makes it very difficult for the non-expert user to choose the most suitable method for predicting binding to a given MHC molecule. In this study, we have therefore made an in-depth analysis of combinations of three state-of-the-art MHC-peptide binding prediction methods (NetMHC, NetMHCpan and PickPocket). We demonstrate that a simple combination of NetMHC and NetMHCpan gives the highest performance when the allele in question is included in the training and is characterized by at least 50 data points with at least ten binders. Otherwise, NetMHCpan is the best predictor. When an allele has not been characterized, the performance depends on the distance to the training data. NetMHCpan has the highest performance when close neighbours are present in the training set, while the combination of NetMHCpan and PickPocket outperforms either of the two methods for alleles with more remote neighbours. The final method, NetMHCcons, is publicly available at www.cbs.dtu.dk/services/NetMHCcons , and allows the user in an automatic manner to obtain the most accurate predictions for any given MHC molecule.
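The selection rule summarized in this abstract can be written down as a small decision function. This is a sketch of the rule as the abstract states it, not the NetMHCcons code: the function name and arguments are hypothetical, and because the abstract does not say how the distance to the training data is measured, the "close neighbour" decision is passed in as a plain boolean.

```python
def choose_method(
    in_training: bool,
    n_data_points: int = 0,
    n_binders: int = 0,
    has_close_neighbour: bool = False,
) -> str:
    """Pick the predictor (or combination) suggested for a given MHC allele."""
    if in_training:
        # Well-characterized allele: at least 50 data points, at least 10 binders.
        if n_data_points >= 50 and n_binders >= 10:
            return "NetMHC + NetMHCpan"
        return "NetMHCpan"
    # Allele without binding data: the choice depends on how close the
    # nearest training alleles are (distance measure not given in the abstract).
    if has_close_neighbour:
        return "NetMHCpan"
    return "NetMHCpan + PickPocket"


# Example calls with made-up allele statistics.
well_characterized = choose_method(True, n_data_points=120, n_binders=25)
remote_allele = choose_method(False, has_close_neighbour=False)
```

Keeping the rule in one function makes the two thresholds from the abstract (50 data points, ten binders) easy to spot and to adjust if they are revisited.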
Affiliation(s)
- Edita Karosiene
- Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Building 208, Kemitorvet, Lyngby, 2800, Denmark.
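The selection rule described in the NetMHCcons abstract above can be sketched roughly as follows. This is only an illustrative reading of the abstract, not the authors' implementation: the function name `choose_predictors`, its parameters, and in particular `close_threshold` (the abstract gives no numeric cut-off between "close" and "remote" neighbours) are hypothetical. The function only reports which methods the abstract says perform best; how their outputs are combined is not modelled.

```python
from typing import Optional, Tuple


def choose_predictors(
    in_training: bool,
    n_points: int = 0,
    n_binders: int = 0,
    neighbour_distance: Optional[float] = None,
    close_threshold: Optional[float] = None,
) -> Tuple[str, ...]:
    """Return the method(s) the abstract reports as best for an allele.

    All argument names are hypothetical. `close_threshold` is an assumed
    cut-off separating "close" from "remote" neighbours; the abstract
    does not state its value, so the caller must supply one.
    """
    if in_training:
        # Well-characterized allele: at least 50 data points, at least 10 binders.
        if n_points >= 50 and n_binders >= 10:
            return ("NetMHC", "NetMHCpan")
        # Allele is in the training data but sparsely characterized.
        return ("NetMHCpan",)

    # Uncharacterized allele: the choice depends on distance to the training data.
    if neighbour_distance is None or close_threshold is None:
        raise ValueError("distance and threshold are required for unseen alleles")
    if neighbour_distance <= close_threshold:
        return ("NetMHCpan",)
    return ("NetMHCpan", "PickPocket")
```

For example, `choose_predictors(True, n_points=120, n_binders=30)` selects the NetMHC and NetMHCpan combination, while an unseen allele with a distance above the caller-supplied threshold selects NetMHCpan together with PickPocket.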
44
Mordelet F, Vert JP. ProDiGe: Prioritization Of Disease Genes with multitask machine learning from positive and unlabeled examples. BMC Bioinformatics 2011; 12:389. [PMID: 21977986 PMCID: PMC3215680 DOI: 10.1186/1471-2105-12-389] [Citation(s) in RCA: 95] [Impact Index Per Article: 7.3] [Received: 06/23/2011] [Accepted: 10/06/2011] [Indexed: 01/22/2023] Open
Abstract
BACKGROUND Elucidating the genetic basis of human diseases is a central goal of genetics and molecular biology. While traditional linkage analysis and modern high-throughput techniques often provide long lists of tens or hundreds of disease gene candidates, the identification of disease genes among the candidates remains time-consuming and expensive. Efficient computational methods are therefore needed to prioritize genes within the list of candidates, by exploiting the wealth of information available about the genes in various databases. RESULTS We propose ProDiGe, a novel algorithm for Prioritization of Disease Genes. ProDiGe implements a novel machine learning strategy based on learning from positive and unlabeled examples, which allows to integrate various sources of information about the genes, to share information about known disease genes across diseases, and to perform genome-wide searches for new disease genes. Experiments on real data show that ProDiGe outperforms state-of-the-art methods for the prioritization of genes in human diseases. CONCLUSIONS ProDiGe implements a new machine learning paradigm for gene prioritization, which could help the identification of new disease genes. It is freely available at http://cbio.ensmp.fr/prodige.
Affiliation(s)
- Fantine Mordelet
- Centre for Computational Biology, Mines ParisTech, Fontainebleau, F-77300 France
45
Zhang L, Udaka K, Mamitsuka H, Zhu S. Toward more accurate pan-specific MHC-peptide binding prediction: a review of current methods and tools. Brief Bioinform 2011; 13:350-64. [PMID: 21949215 DOI: 10.1093/bib/bbr060] [Citation(s) in RCA: 100] [Impact Index Per Article: 7.7] [Indexed: 11/13/2022] Open
Abstract
Binding of short antigenic peptides to major histocompatibility complex (MHC) molecules is a core step in adaptive immune response. Precise identification of MHC-restricted peptides is of great significance for understanding the mechanism of immune response and promoting the discovery of immunogenic epitopes. However, due to the extremely high MHC polymorphism and huge cost of biochemical experiments, there is no experimentally measured binding data for most MHC molecules. To address the problem of predicting peptides binding to these MHC molecules, recently computational approaches, called pan-specific methods, have received keen interest. Pan-specific methods make use of experimentally obtained binding data of multiple alleles, by which binding peptides (binders) of not only these alleles but also those alleles with no known binders can be predicted. To investigate the possibility of further improvement in performance and usability of pan-specific methods, this article extensively reviews existing pan-specific methods and their web servers. We first present a general framework of pan-specific methods. Then, the strategies and performance as well as utilities of web servers are compared. Finally, we discuss the future direction to improve pan-specific methods for MHC-peptide binding prediction.
Affiliation(s)
- Lianming Zhang
- School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai 200433, China
46
Abstract
Vaccine informatics is an emerging research area that focuses on development and applications of bioinformatics methods that can be used to facilitate every aspect of the preclinical, clinical, and postlicensure vaccine enterprises. Many immunoinformatics algorithms and resources have been developed to predict T- and B-cell immune epitopes for epitope vaccine development and protective immunity analysis. Vaccine protein candidates are predictable in silico from genome sequences using reverse vaccinology. Systematic transcriptomics and proteomics gene expression analyses facilitate rational vaccine design and identification of gene responses that are correlates of protection in vivo. Mathematical simulations have been used to model host-pathogen interactions and improve vaccine production and vaccination protocols. Computational methods have also been used for development of immunization registries or immunization information systems, assessment of vaccine safety and efficacy, and immunization modeling. Computational literature mining and databases effectively process, mine, and store large amounts of vaccine literature and data. Vaccine Ontology (VO) has been initiated to integrate various vaccine data and support automated reasoning.
47
Rao HB, Zhu F, Yang GB, Li ZR, Chen YZ. Update of PROFEAT: a web server for computing structural and physicochemical features of proteins and peptides from amino acid sequence. Nucleic Acids Res 2011; 39:W385-90. [PMID: 21609959 PMCID: PMC3125735 DOI: 10.1093/nar/gkr284] [Citation(s) in RCA: 105] [Impact Index Per Article: 8.1] [Indexed: 12/04/2022] Open
Abstract
Sequence-derived structural and physicochemical features have been extensively used for analyzing and predicting structural, functional, expression and interaction profiles of proteins and peptides. PROFEAT has been developed as a web server for computing commonly used features of proteins and peptides from amino acid sequence. To facilitate more extensive studies of protein and peptides, numerous improvements and updates have been made to PROFEAT. We added new functions for computing descriptors of protein–protein and protein–small molecule interactions, segment descriptors for local properties of protein sequences, topological descriptors for peptide sequences and small molecule structures. We also added new feature groups for proteins and peptides (pseudo-amino acid composition, amphiphilic pseudo-amino acid composition, total amino acid properties and atomic-level topological descriptors) as well as for small molecules (atomic-level topological descriptors). Overall, PROFEAT computes 11 feature groups of descriptors for proteins and peptides, and a feature group of more than 400 descriptors for small molecules plus the derived features for protein–protein and protein–small molecule interactions. Our computational algorithms have been extensively tested and used in a number of published works for predicting proteins of specific structural or functional classes, protein–protein interactions, peptides of specific functions and quantitative structure activity relationships of small molecules. PROFEAT is accessible free of charge at http://bidd.cz3.nus.edu.sg/cgi-bin/prof/protein/profnew.cgi.
Affiliation(s)
- H B Rao
- College of Chemistry, Sichuan University, Chengdu, 610064, PR China
48
Xu Q, Pan SJ, Xue HH, Yang Q. Multitask learning for protein subcellular location prediction. IEEE/ACM Trans Comput Biol Bioinform 2011; 8:748-759. [PMID: 20421687 DOI: 10.1109/tcbb.2010.22] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Indexed: 05/29/2023]
Abstract
Protein subcellular localization is concerned with predicting the location of a protein within a cell using computational methods. The location information can indicate key functionalities of proteins. Thus, accurate prediction of subcellular localizations of proteins can help the prediction of protein functions and genome annotations, as well as the identification of drug targets. Machine learning methods such as Support Vector Machines (SVMs) have been used in the past for the problem of protein subcellular localization, but have been shown to suffer from a lack of annotated training data in each species under study. To overcome this data sparsity problem, we observe that because some of the organisms may be related to each other, there may be some commonalities across different organisms that can be discovered and used to help boost the data in each localization task. In this paper, we formulate protein subcellular localization problem as one of multitask learning across different organisms. We adapt and compare two specializations of the multitask learning algorithms on 20 different organisms. Our experimental results show that multitask learning performs much better than the traditional single-task methods. Among the different multitask learning methods, we found that the multitask kernels and supertype kernels under multitask learning that share parameters perform slightly better than multitask learning by sharing latent features. The most significant improvement in terms of localization accuracy is about 25 percent. We find that if the organisms are very different or are remotely related from a biological point of view, then jointly training the multiple models cannot lead to significant improvement. However, if they are closely related biologically, the multitask learning can do much better than individual learning.
Affiliation(s)
- Qian Xu
- Bioengineering Program, Hong Kong University of Science and Technology, Clearwater Bay, Kowloon, Hong Kong.
49
Hertz T, Nolan D, James I, John M, Gaudieri S, Phillips E, Huang JC, Riadi G, Mallal S, Jojic N. Mapping the landscape of host-pathogen coevolution: HLA class I binding and its relationship with evolutionary conservation in human and viral proteins. J Virol 2011; 85:1310-21. [PMID: 21084470 PMCID: PMC3020499 DOI: 10.1128/jvi.01966-10] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.0] [Received: 09/16/2010] [Accepted: 11/09/2010] [Indexed: 12/24/2022] Open
Abstract
The high diversity of HLA binding preferences has been driven by the sequence diversity of short segments of relevant pathogenic proteins presented by HLA molecules to the immune system. To identify possible commonalities in HLA binding preferences, we quantify these using a novel measure termed "targeting efficiency," which captures the correlation between HLA-peptide binding affinities and the conservation of the targeted proteomic regions. Analysis of targeting efficiencies for 95 HLA class I alleles over thousands of human proteins and 52 human viruses indicates that HLA molecules preferentially target conserved regions in these proteomes, although the arboviral Flaviviridae are a notable exception where nonconserved regions are preferentially targeted by most alleles. HLA-A alleles and several HLA-B alleles that have maintained close sequence identity with chimpanzee homologues target conserved human proteins and DNA viruses such as Herpesviridae and Adenoviridae most efficiently, while all HLA-B alleles studied efficiently target RNA viruses. These patterns of host and pathogen specialization are both consistent with coevolutionary selection and functionally relevant in specific cases; for example, preferential HLA targeting of conserved proteomic regions is associated with improved outcomes in HIV infection and with protection against dengue hemorrhagic fever. Efficiency analysis provides a novel perspective on the coevolutionary relationship between HLA class I molecular diversity, self-derived peptides that shape T-cell immunity through ontogeny, and the broad range of viruses that subsequently engage with the adaptive immune response.
Affiliation(s)
- Tomer Hertz, David Nolan, Ian James, Mina John, Silvana Gaudieri, Elizabeth Phillips, Jim C. Huang, Gonzalo Riadi, Simon Mallal, Nebojsa Jojic
- Microsoft Research, One Microsoft Way, Redmond, Washington 98052, Institute for Immunology and Infectious Diseases, Royal Perth Hospital and Murdoch University, Murdoch 6150, Western Australia, Australia, School of Anatomy and Human Biology, Centre for Forensic Science, University of Western Australia, Australia, Fundación Ciencia para la Vida, Avenida Zañartu 1482, Ñuñoa, Santiago, Chile
50
Shao X, Tan CSH, Voss C, Li SSC, Deng N, Bader GD. A regression framework incorporating quantitative and negative interaction data improves quantitative prediction of PDZ domain-peptide interaction from primary sequence. Bioinformatics 2010; 27:383-90. [PMID: 21127034 PMCID: PMC3031032 DOI: 10.1093/bioinformatics/btq657] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Indexed: 11/20/2022]
Abstract
Motivation: Predicting protein interactions involving peptide recognition domains is essential for understanding the many important biological processes they mediate. It is important to consider the binding strength of these interactions to help us construct more biologically relevant protein interaction networks that consider cellular context and competition between potential binders. Results: We developed a novel regression framework that considers both positive (quantitative) and negative (qualitative) interaction data available for mouse PDZ domains to quantitatively predict interactions between PDZ domains, a large peptide recognition domain family, and their peptide ligands using primary sequence information. First, we show that it is possible to learn from existing quantitative and negative interaction data to infer the relative binding strength of interactions involving previously unseen PDZ domains and/or peptides given their primary sequence. Performance was measured using cross-validated hold out testing and testing with previously unseen PDZ domain–peptide interactions. Second, we find that incorporating negative data improves quantitative interaction prediction. Third, we show that sequence similarity is an important prediction performance determinant, which suggests that experimentally collecting additional quantitative interaction data for underrepresented PDZ domain subfamilies will improve prediction. Availability and Implementation: The Matlab code for our SemiSVR predictor and all data used here are available at http://baderlab.org/Data/PDZAffinity. Contact: gary.bader@utoronto.ca; dengnaiyang@cau.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xiaojian Shao
- Department of Applied Mathematics, College of Science, China Agricultural University, Beijing, 100083, China