1
Hu T, Li K, Ma C, Zhou N, Chen Q, Qi C. Improved classification of soil As contamination at continental scale: Resolving class imbalances using machine learning approach. Chemosphere 2024; 363:142697. [PMID: 38925515 DOI: 10.1016/j.chemosphere.2024.142697] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2023] [Revised: 06/11/2024] [Accepted: 06/23/2024] [Indexed: 06/28/2024]
Abstract
The identification of arsenic (As)-contaminated areas is an important prerequisite for soil management and reclamation. Although previous studies have attempted to identify soil As contamination via machine learning (ML) methods combined with soil spectroscopy, they have ignored the rarity of As-contaminated soil samples, leading to an imbalanced learning problem. A novel ML framework was thus designed herein to solve the imbalance issue in identifying soil As contamination from soil visible and near-infrared spectra. Spectral preprocessing, imbalanced dataset resampling, and model comparisons were combined in the ML framework, and the optimal combination was selected based on the recall. In addition, Bayesian optimization was used to tune the model hyperparameters. The optimized model achieved recall, area under the curve, and balanced accuracy values of 0.83, 0.88, and 0.79, respectively, on the testing set. The recall was further improved to 0.87 with the threshold adjustment, indicating the model's excellent performance and generalization capability in classifying As-contaminated soil samples. The optimal model was applied to a global soil spectral dataset to predict areas at a high risk of soil As contamination on a global scale. The ML framework established in this study represents a milestone in the classification of soil As contamination and can serve as a valuable reference for contamination management in soil science.
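The effect of threshold adjustment on recall, as reported in this abstract, can be illustrated with a minimal sketch. The labels, scores, and thresholds below are hypothetical toy values and are not taken from the study; the function name is also an assumption made for illustration.

```python
def recall_and_balanced_accuracy(y_true, scores, threshold):
    """Return (recall, balanced accuracy) for binary labels at a score threshold."""
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    recall = tp / (tp + fn)            # sensitivity on the rare (contaminated) class
    specificity = tn / (tn + fp)       # accuracy on the majority class
    return recall, (recall + specificity) / 2


# Hypothetical imbalanced toy data: 3 contaminated samples, 7 clean samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.45, 0.3, 0.2, 0.2, 0.1, 0.1]

default = recall_and_balanced_accuracy(y_true, scores, 0.5)
lowered = recall_and_balanced_accuracy(y_true, scores, 0.35)
print(default, lowered)
```

Lowering the threshold trades some specificity for recall, which is usually the preferred direction when missing a contaminated sample is the costlier error.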
Affiliation(s)
- Tao Hu
  - School of Resources and Safety Engineering, Central South University, Changsha, 410083, China
- Kechao Li
  - School of Resources and Safety Engineering, Central South University, Changsha, 410083, China
- Chundi Ma
  - School of Resources and Safety Engineering, Central South University, Changsha, 410083, China
- Nana Zhou
  - School of Resources and Safety Engineering, Central South University, Changsha, 410083, China
- Qiusong Chen
  - School of Resources and Safety Engineering, Central South University, Changsha, 410083, China
- Chongchong Qi
  - School of Resources and Safety Engineering, Central South University, Changsha, 410083, China; School of Metallurgy and Environment, Central South University, Changsha, 410083, China; Fankou Lead-Zinc Mine, NONFEMET, Shaoguan, 511100, China.
2
Yang Y, Wei Z, Cia G, Song X, Pucci F, Rooman M, Xue F, Hou Q. MHCII-peptide presentation: an assessment of the state-of-the-art prediction methods. Front Immunol 2024; 15:1293706. [PMID: 38646540 PMCID: PMC11027168 DOI: 10.3389/fimmu.2024.1293706] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 02/19/2024] [Indexed: 04/23/2024] [Open Access]
Abstract
Major histocompatibility complex Class II (MHCII) proteins initiate and regulate immune responses by presentation of antigenic peptides to CD4+ T-cells and self-restriction. The interactions between MHCII and peptides determine the specificity of the immune response and are crucial in immunotherapy and cancer vaccine design. With the ever-increasing amount of MHCII-peptide binding data available, many computational approaches have been developed for MHCII-peptide interaction prediction over the last decade. There is thus an urgent need to provide an up-to-date overview and assessment of these newly developed computational methods. To benchmark the prediction performance of these methods, we constructed an independent dataset containing binding and non-binding peptides to 20 human MHCII protein allotypes from the Immune Epitope Database, covering DP, DR and DQ alleles. After collecting 11 known predictors up to January 2022, we evaluated those available through a webserver or standalone packages on this independent dataset. The benchmarking results show that MixMHC2pred and NetMHCIIpan-4.1 achieve the best performance among all predictors. In general, newly developed methods perform better than older ones due to the rapid expansion of data on which they are trained and the development of deep learning algorithms. Our manuscript not only draws a full picture of the state of the art of MHCII-peptide binding prediction, but also guides researchers in the choice among the different predictors. More importantly, it will inspire biomedical researchers in both academia and industry toward future developments in this field.
Affiliation(s)
- Yaqing Yang
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
  - National Institute of Health Data Science of China, Shandong University, Jinan, China
- Zhonghui Wei
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
  - National Institute of Health Data Science of China, Shandong University, Jinan, China
- Gabriel Cia
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Brussels, Belgium
- Xixi Song
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
  - National Institute of Health Data Science of China, Shandong University, Jinan, China
- Fabrizio Pucci
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Brussels, Belgium
- Marianne Rooman
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Brussels, Belgium
- Fuzhong Xue
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
  - National Institute of Health Data Science of China, Shandong University, Jinan, China
- Qingzhen Hou
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, China
  - National Institute of Health Data Science of China, Shandong University, Jinan, China
3
Zawawi A, Forman R, Smith H, Mair I, Jibril M, Albaqshi MH, Brass A, Derrick JP, Else KJ. In silico design of a T-cell epitope vaccine candidate for parasitic helminth infection. PLoS Pathog 2020; 16:e1008243. [PMID: 32203551 PMCID: PMC7117776 DOI: 10.1371/journal.ppat.1008243] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 11/25/2019] [Revised: 04/02/2020] [Accepted: 02/20/2020] [Indexed: 11/20/2022] [Open Access]
Abstract
Trichuris trichiura is a parasite that infects 500 million people worldwide, leading to colitis, growth retardation and Trichuris dysentery syndrome. There are no licensed vaccines available to prevent Trichuris infection and current treatments are of limited efficacy. Trichuris infections are linked to poverty, reducing children's educational performance and the economic productivity of adults. We employed a systematic, multi-stage process to identify a candidate vaccine against trichuriasis based on the incorporation of selected T-cell epitopes into virus-like particles. We conducted a systematic review to identify the most appropriate in silico prediction tools to predict major histocompatibility complex class II (MHC-II) molecule T-cell epitopes. These tools were used to identify candidate MHC-II epitopes from predicted ORFs in the Trichuris genome, selected using inclusion and exclusion criteria. Selected epitopes were incorporated into Hepatitis B core antigen virus-like particles (VLPs). Bone marrow-derived dendritic cells and bone marrow-derived macrophages responded in vitro to VLPs irrespective of whether the VLP also included T-cell epitopes. The VLPs were internalized and co-localized in the antigen presenting cell lysosomes. Upon challenge infection, mice vaccinated with the VLPs+T-cell epitopes showed a significantly reduced worm burden, and mounted Trichuris-specific IgM and IgG2c antibody responses. The protection of mice by VLPs+T-cell epitopes was characterised by the production of mesenteric lymph node (MLN)-derived Th2 cytokines and goblet cell hyperplasia. Collectively, our data establish that in silico genome-based CD4+ T-cell epitope prediction, combined with VLP delivery, offers a promising pipeline for the development of an effective, safe and affordable helminth vaccine.
Affiliation(s)
- Ayat Zawawi
  - Lydia Becker Institute of Immunology and Inflammation, School of Biological Sciences, Faculty of Biology, Medicine, and Health, University of Manchester, Manchester Academic Health Science Centre, Manchester, United Kingdom
- Ruth Forman
  - Lydia Becker Institute of Immunology and Inflammation, School of Biological Sciences, Faculty of Biology, Medicine, and Health, University of Manchester, Manchester Academic Health Science Centre, Manchester, United Kingdom
- Hannah Smith
  - Lydia Becker Institute of Immunology and Inflammation, School of Biological Sciences, Faculty of Biology, Medicine, and Health, University of Manchester, Manchester Academic Health Science Centre, Manchester, United Kingdom
- Iris Mair
  - Lydia Becker Institute of Immunology and Inflammation, School of Biological Sciences, Faculty of Biology, Medicine, and Health, University of Manchester, Manchester Academic Health Science Centre, Manchester, United Kingdom
- Murtala Jibril
  - Lydia Becker Institute of Immunology and Inflammation, School of Biological Sciences, Faculty of Biology, Medicine, and Health, University of Manchester, Manchester Academic Health Science Centre, Manchester, United Kingdom
- Munirah H. Albaqshi
  - Lydia Becker Institute of Immunology and Inflammation, School of Biological Sciences, Faculty of Biology, Medicine, and Health, University of Manchester, Manchester Academic Health Science Centre, Manchester, United Kingdom
- Andrew Brass
  - Faculty of Biology, Medicine and Health, Division of Informatics, Imaging and Data Sciences, The University of Manchester, Manchester, United Kingdom
- Jeremy P. Derrick
  - Lydia Becker Institute of Immunology and Inflammation, School of Biological Sciences, Faculty of Biology, Medicine, and Health, University of Manchester, Manchester Academic Health Science Centre, Manchester, United Kingdom
- Kathryn J. Else
  - Lydia Becker Institute of Immunology and Inflammation, School of Biological Sciences, Faculty of Biology, Medicine, and Health, University of Manchester, Manchester Academic Health Science Centre, Manchester, United Kingdom
4
Moghram BA, Nabil E, Badr A. Ab-initio conformational epitope structure prediction using genetic algorithm and SVM for vaccine design. Comput Methods Programs Biomed 2018; 153:161-170. [PMID: 29157448 DOI: 10.1016/j.cmpb.2017.10.011] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 03/30/2017] [Revised: 09/24/2017] [Accepted: 10/10/2017] [Indexed: 06/07/2023]
Abstract
BACKGROUND AND OBJECTIVE: T-cell epitope structure identification is a significant and challenging immunoinformatic problem within epitope-based vaccine design. Epitopes or antigenic peptides are a set of amino acids that bind with the Major Histocompatibility Complex (MHC) molecules. The resulting complexes are presented by Antigen Presenting Cells to be inspected by T-cells. MHC-molecule-binding epitopes are responsible for triggering the immune response to antigens. The epitope's three-dimensional (3D) molecular structure (i.e., tertiary structure) reflects its proper function. Therefore, the identification of the structure of MHC class-II epitopes is a significant step towards epitope-based vaccine design and understanding of the immune system. METHODS: In this paper, we propose a new technique using a Genetic Algorithm for Predicting the Epitope Structure (GAPES), to predict the structure of MHC class-II epitopes based on their sequence. The proposed Elitist-based genetic algorithm for predicting the epitope's tertiary structure is based on the Ab-Initio Empirical Conformational Energy Program for Peptides (ECEPP) Force Field Model. The developed secondary structure prediction technique relies on the Ramachandran Plot. We used two alignment algorithms: the ROSS alignment and TM-Score alignment. We applied four different alignment approaches to calculate the similarity scores of the dataset under test. We utilized the support vector machine (SVM) classifier to evaluate the prediction performance. RESULTS: The prediction accuracy and the Area Under Receiver Operating Characteristic (ROC) Curve (AUC) were calculated as measures of performance. The calculations are performed on twelve similarity-reduced datasets of the Immune Epitope Database (IEDB) and a large dataset of peptide-binding affinities to HLA-DRB1*0101. The results showed that GAPES was reliable and very accurate. We achieved an average prediction accuracy of 93.50% and an average AUC of 0.974 on the IEDB datasets. Also, we achieved an accuracy of 95.125% and an AUC of 0.987 on the HLA-DRB1*0101 allele of the Wang benchmark dataset. CONCLUSIONS: The results indicate that the proposed prediction technique "GAPES" is a promising technique that will help researchers and scientists to predict the protein structure and it will assist them in the intelligent design of new epitope-based vaccines.
Affiliation(s)
- Basem Ameen Moghram
  - Department of Computer Science, Faculty of Computers and Information, Cairo University, Cairo, 12613, Egypt.
- Emad Nabil
  - Department of Computer Science, Faculty of Computers and Information, Cairo University, Cairo, 12613, Egypt.
- Amr Badr
  - Department of Computer Science, Faculty of Computers and Information, Cairo University, Cairo, 12613, Egypt.
5
Subramaniyan V, Venkatachalam R, Srinivasan P, Palani M. In silico prediction of monovalent and chimeric tetravalent vaccines for prevention and treatment of dengue fever. J Biomed Res 2017; 32:222. [PMID: 29497025 PMCID: PMC6265401 DOI: 10.7555/jbr.31.20160109] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/01/2016] [Accepted: 10/27/2017] [Indexed: 11/22/2022] [Open Access]
Abstract
A reverse vaccinology method was used to predict a monovalent peptide vaccine candidate to produce antibodies for therapeutic purposes and to predict a tetravalent vaccine candidate to act as a common vaccine covering all the dengue virus serotypes. Envelope (E)-proteins of DENV-1-4 serotypes were used for vaccine prediction using NCBI, Uniprot/Swissprot, Swiss-prot viewer, VaxiJen V2.0, TMHMM, BCPREDS, Propred-1, Propred and MHC Pred. E-proteins of DENV-1-4 serotypes were identified as antigens from which T cell epitopes, through B cell epitopes, were predicted to act as peptide vaccine candidates. Each selected T cell epitope of E-protein was confirmed to act as a vaccine and to induce complementary antibody against a particular serotype of dengue virus. A chimeric tetravalent vaccine was formed by the conjugation of four vaccines, one from each of the four dengue serotypes, to act as a common vaccine candidate for all four dengue serotypes. It can be justifiably concluded that the monovalent 9-mer T cell epitope for each DENV serotype can be used to produce specific antibody against dengue virus, and a chimeric common tetravalent vaccine candidate to yield a comparative vaccine to cover any of the four dengue virus serotypes. This vaccine is expected to be highly immunogenic in preventing dengue fever.
Affiliation(s)
- Vijayakumar Subramaniyan
  - Computational Phytochemistry Laboratory P.G. and Research Department of Botany and Microbiology, A.V.V.M. Sri Pushpam College (Autonomous), Poondi, Thanjavur district, Tamil Nadu 613503, India
- Ramesh Venkatachalam
  - Computational Phytochemistry Laboratory P.G. and Research Department of Botany and Microbiology, A.V.V.M. Sri Pushpam College (Autonomous), Poondi, Thanjavur district, Tamil Nadu 613503, India
- Prabhu Srinivasan
  - Computational Phytochemistry Laboratory P.G. and Research Department of Botany and Microbiology, A.V.V.M. Sri Pushpam College (Autonomous), Poondi, Thanjavur district, Tamil Nadu 613503, India
- Manogar Palani
  - Computational Phytochemistry Laboratory P.G. and Research Department of Botany and Microbiology, A.V.V.M. Sri Pushpam College (Autonomous), Poondi, Thanjavur district, Tamil Nadu 613503, India
6
Characterization and expression of MHC class II alpha and II beta genes in mangrove red snapper (Lutjanus argentimaculatus). Mol Immunol 2015; 68:373-81. [DOI: 10.1016/j.molimm.2015.09.018] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 12/22/2014] [Revised: 09/13/2015] [Accepted: 09/22/2015] [Indexed: 01/25/2023]
7
Computational modelling approaches to vaccinology. Pharmacol Res 2015; 92:40-5. [DOI: 10.1016/j.phrs.2014.08.006] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 07/29/2014] [Revised: 08/04/2014] [Accepted: 08/18/2014] [Indexed: 01/22/2023]
8
Eng LP, Tan TW, Tong JC. Building MHC class II epitope predictor using machine learning approaches. Methods Mol Biol 2015; 1268:67-73. [PMID: 25555721 DOI: 10.1007/978-1-4939-2285-7_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
Identification of T-cell epitopes binding to MHC class II molecules is an important step in epitope-based vaccine development. This process has since been accelerated with the use of bioinformatics tools to aid in the prediction of peptide binding to MHC class II molecules and also to systematically scan for candidate peptides in antigenic proteins. Many prediction software tools have been developed over the years using various methods and algorithms, and they are becoming increasingly sophisticated. Here, we illustrate the use of machine learning algorithms to train on MHC class II peptide data represented by feature vectors describing their amino acid physicochemical properties. The developed prediction model can then be used to predict new peptide data.
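As a rough illustration of representing a peptide by amino acid physicochemical properties, the sketch below maps each residue to its Kyte-Doolittle hydropathy value. This is an assumption made for illustration: the abstract does not state which properties the chapter uses, and the example peptide and function name are hypothetical.

```python
# Kyte-Doolittle hydropathy index, used here as one example physicochemical property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_peptide(peptide):
    """Map a peptide string to a per-residue hydropathy feature vector."""
    return [KYTE_DOOLITTLE[residue] for residue in peptide.upper()]


# Hypothetical 9-residue peptide, chosen only to show the encoding.
features = encode_peptide("ACDEFGHIK")
print(features)
```

A fixed-length input such as a 9-residue window gives every peptide a vector of the same size, which most standard classifiers require; further properties could be appended per residue in the same way.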
Affiliation(s)
- Loan Ping Eng
  - Department of Biochemistry, National University of Singapore, 14 Medical Drive #14-01T, Singapore, Singapore, 117599
9
Eng CLP, Tong JC, Tan TW. Predicting host tropism of influenza A virus proteins using random forest. BMC Med Genomics 2014; 7 Suppl 3:S1. [PMID: 25521718 PMCID: PMC4290784 DOI: 10.1186/1755-8794-7-s3-s1] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Indexed: 12/21/2022] [Open Access]
Abstract
Background: The majority of influenza A viruses reside and circulate among animal populations, seldom infecting humans due to host range restriction. Yet when some avian strains do acquire the ability to overcome the species barrier, they might become adapted to humans, replicating efficiently and causing disease, leading to a potential pandemic. With the huge influenza A virus reservoir in wild birds, it is a cause for concern when a new influenza strain emerges with the ability to cross the host species barrier, as shown in light of the recent H7N9 outbreak in China. Several influenza proteins have been shown to be major determinants in host tropism. Further understanding and determining host tropism would be important in identifying zoonotic influenza virus strains capable of crossing the species barrier and infecting humans. Results: In this study, computational models for 11 influenza proteins have been constructed using the machine learning algorithm random forest for prediction of host tropism. The prediction models were trained on influenza protein sequences isolated from both avian and human samples, which were transformed into amino acid physicochemical properties feature vectors. The results were highly accurate prediction models (ACC>96.57; AUC>0.980; MCC>0.916) capable of determining host tropism of individual influenza proteins. In addition, features from all 11 proteins were used to construct a combined model to predict host tropism of influenza virus strains. This would help assess a novel influenza strain's host range capability. Conclusions: From the prediction models constructed, all achieved high prediction performance, indicating clear distinctions in both avian and human proteins. When used together as a host tropism prediction system, zoonotic strains could potentially be identified based on different protein prediction results. Understanding and predicting host tropism of influenza proteins lay an important foundation for future work in constructing computational models capable of directly predicting interspecies transmission of influenza viruses. The models are available for prediction at http://fluleap.bic.nus.edu.sg.
10
Kim Y, Sidney J, Buus S, Sette A, Nielsen M, Peters B. Dataset size and composition impact the reliability of performance benchmarks for peptide-MHC binding predictions. BMC Bioinformatics 2014; 15:241. [PMID: 25017736 PMCID: PMC4111843 DOI: 10.1186/1471-2105-15-241] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.3] [Received: 04/21/2014] [Accepted: 07/08/2014] [Indexed: 11/23/2022] [Open Access]
Abstract
Background: It is important to accurately determine the performance of peptide:MHC binding predictions, as this enables users to compare and choose between different prediction methods and provides estimates of the expected error rate. Two common approaches to determine prediction performance are cross-validation, in which all available data are iteratively split into training and testing data, and the use of blind sets generated separately from the data used to construct the predictive method. In the present study, we have compared cross-validated prediction performances generated on our last benchmark dataset from 2009 with prediction performances generated on data subsequently added to the Immune Epitope Database (IEDB) which served as a blind set. Results: We found that cross-validated performances systematically overestimated performance on the blind set. This was found not to be due to the presence of similar peptides in the cross-validation dataset. Rather, we found that small size and low sequence/affinity diversity of either training or blind datasets were associated with large differences in cross-validated vs. blind prediction performances. We use these findings to derive quantitative rules of how large and diverse datasets need to be to provide generalizable performance estimates. Conclusion: It has long been known that cross-validated prediction performance estimates often overestimate performance on independently generated blind set data. We here identify and quantify the specific factors contributing to this effect for MHC-I binding predictions. An increasing number of peptides for which MHC binding affinities are measured experimentally have been selected based on binding predictions and thus are less diverse than historic datasets sampling the entire sequence and affinity space, making them more difficult benchmark data sets. This has to be taken into account when comparing performance metrics between different benchmarks, and when deriving error estimates for predictions based on benchmark performance.
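The cross-validation procedure contrasted with blind sets in this abstract can be sketched as follows. This is a minimal illustration using contiguous folds over sample indices; the function name and the sample counts are hypothetical, and the study's own splitting scheme is not described in the abstract.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


# Hypothetical benchmark of 10 peptides, split into 3 folds.
folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

A blind set, by contrast, never enters this loop: it is collected separately (here, data added to the database later) and scored once with the final model.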
Affiliation(s)
- Bjoern Peters
  - La Jolla Institute for Allergy & Immunology, 9420 Athena Circle, La Jolla, CA 92037, USA.
11
Gowthaman U, Agrewala JN. In silico methods for predicting T-cell epitopes: Dr Jekyll or Mr Hyde? Expert Rev Proteomics 2014; 6:527-37. [DOI: 10.1586/epr.09.71] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Indexed: 11/08/2022]
12
Berti F, Adamo R. Recent mechanistic insights on glycoconjugate vaccines and future perspectives. ACS Chem Biol 2013; 8:1653-63. [PMID: 23841819 DOI: 10.1021/cb400423g] [Citation(s) in RCA: 97] [Impact Index Per Article: 8.8] [Indexed: 12/22/2022]
Abstract
Vaccination is a key strategy for the control of various infectious diseases. Many pathogens, such as Streptococcus pneumoniae, Haemophilus influenzae type b (Hib), and Neisseria meningitidis, produce on their surfaces dense and complex glycan structures, which represent an optimal target for eliciting carbohydrate-specific antibodies able to confer protection against those bacteria. Glycoconjugates nowadays represent an important class of efficacious and safe commercial vaccines. It has been known for a long time that covalent linkage of poorly immunogenic carbohydrates to protein is fundamental to provide T cell epitopes for eliciting a memory response of the immune system against the saccharide. However, while the traditional mechanism of action of glycoconjugates has considered peptides generated from the carrier protein to be responsible for T cell help recruitment, only recently has evidence of the active involvement of the carbohydrate part in determining the T cell help been shown. In addition, zwitterionic polysaccharides have been proven to activate the adaptive immune system without further conjugation to protein. Progress in this interface area between chemistry and biology, in combination with novel synthetic and biosynthetic methods for the preparation of glycoconjugates, is opening new perspectives to clarify their mechanism of action and give new insights for the design of improved carbohydrate-based vaccines.
Affiliation(s)
- Francesco Berti
  - Novartis Vaccines and Diagnostics, Research Center, Via Fiorentina 1, 53100 Siena, Italy
- Roberto Adamo
  - Novartis Vaccines and Diagnostics, Research Center, Via Fiorentina 1, 53100 Siena, Italy
13
Koch CP, Perna AM, Pillong M, Todoroff NK, Wrede P, Folkers G, Hiss JA, Schneider G. Scrutinizing MHC-I binding peptides and their limits of variation. PLoS Comput Biol 2013; 9:e1003088. [PMID: 23754940 PMCID: PMC3674988 DOI: 10.1371/journal.pcbi.1003088] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 12/22/2012] [Accepted: 04/23/2013] [Indexed: 12/20/2022] [Open Access]
Abstract
Designed peptides that bind to major histocompatibility protein I (MHC-I) allomorphs bear the promise of representing epitopes that stimulate a desired immune response. A rigorous bioinformatical exploration of sequence patterns hidden in peptides that bind to the mouse MHC-I allomorph H-2Kb is presented. We exemplify and validate these motif findings by systematically dissecting the epitope SIINFEKL and analyzing the resulting fragments for their binding potential to H-2Kb in a thermal denaturation assay. The results demonstrate that only fragments exclusively retaining the carboxy- or amino-terminus of the reference peptide exhibit significant binding potential, with the N-terminal pentapeptide SIINF as shortest ligand. This study demonstrates that sophisticated machine-learning algorithms excel at extracting fine-grained patterns from peptide sequence data and predicting MHC-I binding peptides, thereby considerably extending existing linear prediction models and providing a fresh view on the computer-based molecular design of future synthetic vaccines. The server for prediction is available at http://modlab-cadd.ethz.ch (SLiDER tool, MHC-I version 2012). Future success in vaccine development will critically depend on identifying potent epitopes with reduced side effects. Among such candidate molecules, immunogenic peptides binding to major histocompatibility protein I (MHC-I) represent a preferred class of biomolecules for vaccine design. Computational models assist in the selection of the best candidate peptides by providing a mathematical rationale for antigen recognition by MHC-I. Here we present a machine-learning model that was trained on recognizing features of known MHC-I binding and non-binding peptide sequences with sustained accuracy. We were able to biochemically validate the computational predictions in a direct binding assay measuring complex formation between synthesized candidate peptides and MHC-I. Strong correspondence between the predictions and the experimentally determined binding potential corroborates the machine-learning model as viable for future antigen design. Thus, our study provides a concept for rapidly finding innovative MHC-I binding peptides with limited experimental effort.
Affiliation(s)
- Christian P. Koch
  - ETH Zürich, Department of Chemistry and Applied Biosciences, Institute of Pharmaceutical Sciences, Zürich, Switzerland
- Anna M. Perna
  - ETH Zürich, Department of Chemistry and Applied Biosciences, Institute of Pharmaceutical Sciences, Zürich, Switzerland
- Max Pillong
  - ETH Zürich, Department of Chemistry and Applied Biosciences, Institute of Pharmaceutical Sciences, Zürich, Switzerland
- Nickolay K. Todoroff
  - ETH Zürich, Department of Chemistry and Applied Biosciences, Institute of Pharmaceutical Sciences, Zürich, Switzerland
- Paul Wrede
  - Charite-Universitätsmedizin Berlin, Molekularbiologie und Bioinformatik, Berlin, Germany
- Gerd Folkers
  - ETH Zürich, Department of Chemistry and Applied Biosciences, Institute of Pharmaceutical Sciences, Zürich, Switzerland
- Jan A. Hiss
  - ETH Zürich, Department of Chemistry and Applied Biosciences, Institute of Pharmaceutical Sciences, Zürich, Switzerland
- Gisbert Schneider
  - ETH Zürich, Department of Chemistry and Applied Biosciences, Institute of Pharmaceutical Sciences, Zürich, Switzerland
14
Velez Rueda AJ, Mistchenko AS, Viegas M. Phylogenetic and phylodynamic analyses of human metapneumovirus in Buenos Aires (Argentina) for a three-year period (2009-2011). PLoS One 2013; 8:e63070. [PMID: 23646177 PMCID: PMC3639999 DOI: 10.1371/journal.pone.0063070] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 11/29/2012] [Accepted: 03/27/2013] [Indexed: 12/26/2022] [Open Access]
Abstract
Human metapneumovirus, which belongs to the Paramyxoviridae family and has been classified as a member of the Pneumovirus genus, is genetically and clinically similar to other family members such as human respiratory syncytial virus. A total of 1146 nasopharyngeal aspirates from pediatric patients with moderate and severe acute lower respiratory tract infections, hospitalized at the Ricardo Gutierrez Children's Hospital (Buenos Aires, Argentina), were tested by real time RT-PCR for human metapneumovirus. Results showed that 168 (14.65%) were positive. Thirty-six of these 168 samples were randomly selected to characterize positive cases molecularly. The phylogenetic analysis of the sequences of the G and F genes showed that genotypes A2 and B2 cocirculated during 2009 and 2010 and that only genotype A2 circulated in 2011 in Argentina. Genotype A2 prevailed during the study period, a fact supported by a higher effective population size (Neτ) and higher diversity as compared to that of genotype B2 (10.9% (SE 1.3%) vs. 1.7% (SE 0.4%), respectively). The phylogeographic analysis of the G protein gene sequences showed that this virus has no geographical restrictions and can travel globally harbored in hosts. The selection pressure analysis of the F protein showed that although this protein has regions with polymorphisms, it has vast structural and functional constraints. In addition, the predicted B-linear epitopes and the sites recognized by previously described monoclonal antibodies were conserved in all Argentine sequences. This identifies the F protein as a potential target for future humanized antibodies or vaccines.
Affiliation(s)
- Ana Julia Velez Rueda
  - Laboratorio de Virología, Hospital de Niños “Dr. Ricardo Gutiérrez”, Ciudad Autónoma de Buenos Aires, Argentina
  - Comisión de Investigaciones Científicas (CIC), La Plata, Provincia de Buenos Aires, Argentina
- Alicia Susana Mistchenko
  - Laboratorio de Virología, Hospital de Niños “Dr. Ricardo Gutiérrez”, Ciudad Autónoma de Buenos Aires, Argentina
  - Comisión de Investigaciones Científicas (CIC), La Plata, Provincia de Buenos Aires, Argentina
- Mariana Viegas
  - Laboratorio de Virología, Hospital de Niños “Dr. Ricardo Gutiérrez”, Ciudad Autónoma de Buenos Aires, Argentina
  - Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Ciudad Autónoma de Buenos Aires, Argentina
15
Bordner AJ. Structure-based prediction of Major Histocompatibility Complex (MHC) epitopes. Methods Mol Biol 2013; 1061:323-43. [PMID: 23963947 DOI: 10.1007/978-1-62703-589-7_20] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2023]
Abstract
Because of the enormous diversity of both MHC proteins and peptide epitopes, computational epitope prediction methods are needed in order to supplement limited experimental data. These prediction methods are useful for guiding experiments and have many potential biomedical applications. Unlike popular sequence-based methods, structure-based epitope prediction methods can predict epitopes for multiple MHC types with highly distinct peptide binding propensities. In this chapter, we describe in detail our previously developed structure-based epitope prediction methods for both class I and class II MHC proteins. We also discuss the relative advantages and disadvantages of sequence-based versus structure-based methods and how to evaluate prediction performance.
16
Flower DR, Perrie Y. Identification of Candidate Vaccine Antigens In Silico. IMMUNOMIC DISCOVERY OF ADJUVANTS AND CANDIDATE SUBUNIT VACCINES 2013. [PMCID: PMC7120937 DOI: 10.1007/978-1-4614-5070-2_3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/30/2022]
Abstract
The identification of immunogenic whole-protein antigens is fundamental to the successful discovery of candidate subunit vaccines and their rapid, effective, and efficient transformation into clinically useful, commercially successful vaccine formulations. In the wider context of the experimental discovery of vaccine antigens, with particular reference to reverse vaccinology, this chapter adumbrates the principal computational approaches currently deployed in the hunt for novel antigens: genome-level prediction of antigens, antigen identification through the use of protein sequence alignment-based approaches, antigen detection through the use of subcellular location prediction, and the use of alignment-independent approaches to antigen discovery. Reference is also made to the recent emergence of various expert systems for protein antigen identification.
Affiliation(s)
- Darren R. Flower
  - Aston Pharmacy School, School of Life and Health Sciences, University of Aston, Aston Triangle, Birmingham, B4 7ET United Kingdom
- Yvonne Perrie
  - Aston Pharmacy School, School of Life and Health Sciences, Aston University, Aston Triangle, Birmingham, B4 7ET United Kingdom
17
A rationally engineered anti-HIV peptide fusion inhibitor with greatly reduced immunogenicity. Antimicrob Agents Chemother 2012; 57:679-88. [PMID: 23147734 DOI: 10.1128/aac.01152-12] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Indexed: 12/19/2022] Open
Abstract
Peptides derived from the C-terminal heptad repeat 2 (HR2) region of the HIV-1 gp41 envelope glycoprotein, so-called C peptides, are very efficient HIV-1 fusion inhibitors. We previously developed innovative gene therapeutic approaches aiming at the direct in vivo production of C peptides from genetically modified host cells and found that T cells expressing membrane-anchored or secreted C peptides are protected from HIV-1 infection. However, an unwanted immune response against such antiviral peptides may significantly impair clinical efficacy and pose safety risks to patients. To overcome this problem, we engineered a novel C peptide, V2o, with greatly reduced immunogenicity and excellent antiviral activity. V2o is based on the chimeric C peptide C46-EHO, which is derived from the HR2 regions of HIV-2(EHO) and HIV-1(HxB2) and has broad anti-HIV and anti-simian immunodeficiency virus activity. Antibody and major histocompatibility complex class I epitopes within the C46-EHO peptide sequence were identified by in silico and in vitro analyses. Using rational design, we removed these epitopes by amino acid substitutions and thus minimized antigenicity and immunogenicity considerably. At the same time, the antiviral activity of the deimmunized peptide V2o was preserved or even enhanced compared to that of the parental C46-EHO peptide. Thus, V2o is an ideal candidate, especially for those novel therapeutic approaches for HIV infection that involve direct in vivo production of antiviral C peptides.
18
Bendtsen C. Prediction of human major histocompatibility complex class II binding peptides: a frequent case of publication bias? Artif Intell Med 2012; 55:209. [PMID: 22633493 DOI: 10.1016/j.artmed.2012.05.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2012] [Revised: 05/02/2012] [Accepted: 05/02/2012] [Indexed: 10/28/2022]
19
Zhang L, Udaka K, Mamitsuka H, Zhu S. Toward more accurate pan-specific MHC-peptide binding prediction: a review of current methods and tools. Brief Bioinform 2011; 13:350-64. [PMID: 21949215 DOI: 10.1093/bib/bbr060] [Citation(s) in RCA: 100] [Impact Index Per Article: 7.7] [Indexed: 11/13/2022] Open
Abstract
Binding of short antigenic peptides to major histocompatibility complex (MHC) molecules is a core step in the adaptive immune response. Precise identification of MHC-restricted peptides is of great significance for understanding the mechanism of immune response and promoting the discovery of immunogenic epitopes. However, due to the extremely high MHC polymorphism and the huge cost of biochemical experiments, there are no experimentally measured binding data for most MHC molecules. To address the problem of predicting peptides binding to these MHC molecules, computational approaches called pan-specific methods have recently received keen interest. Pan-specific methods make use of experimentally obtained binding data of multiple alleles, by which binding peptides (binders) of not only these alleles but also those alleles with no known binders can be predicted. To investigate the possibility of further improvement in performance and usability of pan-specific methods, this article extensively reviews existing pan-specific methods and their web servers. We first present a general framework of pan-specific methods. Then, the strategies and performance as well as utilities of web servers are compared. Finally, we discuss the future direction to improve pan-specific methods for MHC-peptide binding prediction.
Affiliation(s)
- Lianming Zhang
  - School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai 200433, China
20
Knapp B, Giczi V, Ribarics R, Schreiner W. PeptX: using genetic algorithms to optimize peptides for MHC binding. BMC Bioinformatics 2011; 12:241. [PMID: 21679477 PMCID: PMC3225262 DOI: 10.1186/1471-2105-12-241] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Received: 01/13/2011] [Accepted: 06/17/2011] [Indexed: 11/18/2022] Open
Abstract
Background: The binding between the major histocompatibility complex and the presented peptide is an indispensable prerequisite for the adaptive immune response. There is a plethora of different in silico techniques for the prediction of the peptide binding affinity to major histocompatibility complexes. Most studies screen a set of peptides for promising candidates to predict possible T cell epitopes. In this study we ask the reverse question: which peptides have the highest binding affinities to a given major histocompatibility complex according to certain in silico scoring functions? Results: Since a full screening of all possible peptides is not feasible in reasonable runtime, we introduce a heuristic approach. We developed a framework for Genetic Algorithms to optimize peptides for the binding to major histocompatibility complexes. In an extensive benchmark we tested various operator combinations. We found that (1) selection operators have a strong influence on the convergence of the population while recombination operators have minor influence and (2) five different binding prediction methods lead to five different sets of "optimal" peptides for the same major histocompatibility complex. The consensus peptides were experimentally verified as high affinity binders. Conclusion: We provide a generalized framework to calculate sets of high affinity binders based on different previously published scoring functions in reasonable runtime. Furthermore, we give insight into the different behaviours of operators and scoring functions of the Genetic Algorithm.
Affiliation(s)
- Bernhard Knapp
  - Center for Medical Statistics, Informatics and Intelligent Systems, Department for Biosimulation and Bioinformatics, Medical University of Vienna, Austria.
21
Wang P, Sidney J, Kim Y, Sette A, Lund O, Nielsen M, Peters B. Peptide binding predictions for HLA DR, DP and DQ molecules. BMC Bioinformatics 2010; 11:568. [PMID: 21092157 PMCID: PMC2998531 DOI: 10.1186/1471-2105-11-568] [Citation(s) in RCA: 486] [Impact Index Per Article: 34.7] [Received: 07/01/2010] [Accepted: 11/22/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: MHC class II binding predictions are widely used to identify epitope candidates in infectious agents, allergens, cancer and autoantigens. The vast majority of prediction algorithms for human MHC class II to date have targeted HLA molecules encoded in the DR locus. This reflects a significant gap in knowledge as HLA DP and DQ molecules are presumably equally important, and have only been studied less because they are more difficult to handle experimentally. RESULTS: In this study, we aimed to narrow this gap by providing a large scale dataset of over 17,000 HLA-peptide binding affinities for a set of 11 HLA DP and DQ alleles. We also expanded our dataset for HLA DR alleles resulting in a total of 40,000 MHC class II binding affinities covering 26 allelic variants. Utilizing this dataset, we generated prediction tools utilizing several machine learning algorithms and evaluated their performance. CONCLUSION: We found that 1) prediction methodologies developed for HLA DR molecules perform equally well for DP or DQ molecules. 2) Prediction performances were significantly increased compared to previous reports due to the larger amounts of training data available. 3) The presence of homologous peptides between training and testing datasets should be avoided to give real-world estimates of prediction performance metrics, but the relative ranking of different predictors is largely unaffected by the presence of homologous peptides, and predictors intended for end-user applications should include all training data for maximum performance. 4) The recently developed NN-align prediction method significantly outperformed all other algorithms, including a naïve consensus based on all prediction methods. A new consensus method dropping the comparably weak ARB prediction method could outperform the NN-align method, but further research into how to best combine MHC class II binding predictions is required.
Affiliation(s)
- Peng Wang
  - La Jolla Institute for Allergy and Immunology, La Jolla, USA
- John Sidney
  - La Jolla Institute for Allergy and Immunology, La Jolla, USA
- Yohan Kim
  - La Jolla Institute for Allergy and Immunology, La Jolla, USA
- Ole Lund
  - Center for Biological Sequence Analysis, Department for Systems Biology, Technical University of Denmark, Lyngby, Denmark
- Morten Nielsen
  - Center for Biological Sequence Analysis, Department for Systems Biology, Technical University of Denmark, Lyngby, Denmark
- Bjoern Peters
  - La Jolla Institute for Allergy and Immunology, La Jolla, USA
22
Flower DR, Macdonald IK, Ramakrishnan K, Davies MN, Doytchinova IA. Computer aided selection of candidate vaccine antigens. Immunome Res 2010; 6 Suppl 2:S1. [PMID: 21067543 PMCID: PMC2981880 DOI: 10.1186/1745-7580-6-s2-s1] [Citation(s) in RCA: 70] [Impact Index Per Article: 5.0] [Indexed: 02/05/2023] Open
Abstract
Immunoinformatics is an emergent branch of informatics science that long ago pullulated from the tree of knowledge that is bioinformatics. It is a discipline which applies informatic techniques to problems of the immune system. To a great extent, immunoinformatics is typified by epitope prediction methods. It has found disappointingly limited use in the design and discovery of new vaccines, which is an area where proper computational support is generally lacking. Most extant vaccines are not based around isolated epitopes but rather correspond to chemically-treated or attenuated whole pathogens, to individual proteins extracted from whole pathogens, or to complex carbohydrates. In this chapter we attempt to review what progress there has been in an as-yet-underexplored area of immunoinformatics: the computational discovery of whole protein antigens. The effective development of antigen prediction methods would significantly reduce the laboratory resource required to identify pathogenic proteins as candidate subunit vaccines. We begin our review by placing antigen prediction firmly into context, exploring the role of reverse vaccinology in the design and discovery of vaccines. We also highlight several competing yet ultimately complementary methodological approaches: sub-cellular location prediction, identifying antigens using sequence similarity, and the use of sophisticated statistical approaches for predicting the probability of antigen characteristics. We end by exploring how a systems immunomics approach to the prediction of immunogenicity would prove helpful in the prediction of antigens.
Affiliation(s)
- Darren R Flower
  - School of Life and Health Sciences, University of Aston, Aston Triangle, Birmingham, B4 7ET, UK.
23
Flower DR, Phadwal K, Macdonald IK, Coveney PV, Davies MN, Wan S. T-cell epitope prediction and immune complex simulation using molecular dynamics: state of the art and persisting challenges. Immunome Res 2010; 6 Suppl 2:S4. [PMID: 21067546 PMCID: PMC2981876 DOI: 10.1186/1745-7580-6-s2-s4] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Indexed: 11/25/2022] Open
Abstract
Atomistic Molecular Dynamics provides powerful and flexible tools for the prediction and analysis of molecular and macromolecular systems. Specifically, it provides a means by which we can measure theoretically that which cannot be measured experimentally: the dynamic time-evolution of complex systems comprising atoms and molecules. It is particularly suitable for the simulation and analysis of the otherwise inaccessible details of MHC-peptide interaction and, on a larger scale, the simulation of the immune synapse. Progress has been relatively tentative, yet the emergence of truly high-performance computing and the development of coarse-grained simulation now offer us the hope of accurately predicting thermodynamic parameters and of simulating not merely a handful of proteins but larger, longer simulations comprising thousands of protein molecules and the cellular scale structures they form. We exemplify this within the context of immunoinformatics.
Affiliation(s)
- Darren R Flower
  - Life and Health Sciences, Aston University, Aston Triangle, Birmingham B4 7ET, UK
- Kanchan Phadwal
  - Oxford Biomedical Research Centre, The John Radcliffe Hospital, Room 4503, Corridor 4b, Level 4, Oxford, OX 3 9DU, UK
- Isabel K Macdonald
  - OncImmune Limited, Clinical Sciences Building, Nottingham City Hospital, Hucknall Rd. Nottingham, NG5 1PB, UK
- Peter V Coveney
  - Centre for Computational Science, Chemistry Department, University College of London, 20 Gordon Street, WC1H 0AJ, London, UK
- Matthew N Davies
  - SGDP, Institute of Psychiatry, King's College London, De Crespigny Park, London, SE5 8AF, UK
- Shunzhou Wan
  - Centre for Computational Science, Chemistry Department, University College of London, 20 Gordon Street, WC1H 0AJ, London, UK
24
Bremel RD, Homan EJ. An integrated approach to epitope analysis I: Dimensional reduction, visualization and prediction of MHC binding using amino acid principal components and regression approaches. Immunome Res 2010; 6:7. [PMID: 21044289 PMCID: PMC2990731 DOI: 10.1186/1745-7580-6-7] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Received: 05/19/2010] [Accepted: 11/02/2010] [Indexed: 11/30/2022] Open
Abstract
Background: Operation of the immune system is multivariate. Reduction of the dimensionality is essential to facilitate understanding of this complex biological system. One multi-dimensional facet of the immune system is the binding of epitopes to the MHC-I and MHC-II molecules by diverse populations of individuals. Prediction of such epitope binding is critical and several immunoinformatic strategies utilizing amino acid substitution matrices have been designed to develop predictive algorithms. Contemporaneously, computational and statistical tools have evolved to handle multivariate and megavariate analysis, but these have not been systematically deployed in prediction of MHC binding. Partial least squares analysis, principal component analysis, and associated regression techniques have become the norm in handling complex datasets in many fields. Over two decades ago Wold and colleagues showed that principal components of amino acids could be used to predict peptide binding to cellular receptors. We have applied this observation to the analysis of MHC binding, and to derivation of predictive methods applicable on a whole proteome scale. Results: We show that amino acid principal components and partial least squares approaches can be utilized to visualize the underlying physicochemical properties of the MHC binding domain by using commercially available software. We further show the application of amino acid principal components to develop both linear partial least squares and non-linear neural network regression prediction algorithms for MHC-I and MHC-II molecules. Several visualization options for the output aid in understanding the underlying physicochemical properties, enable confirmation of earlier work on the relative importance of certain peptide residues to MHC binding, and also provide new insights into differences among MHC molecules. We compared both the linear and non-linear MHC binding prediction tools to several predictive tools currently available on the Internet. Conclusions: As opposed to the highly constrained user-interaction paradigms of web-server approaches, local computational approaches enable interactive analysis and visualization of complex multidimensional data using robust mathematical tools. Our work shows that prediction tools such as these can be constructed on the widely available JMP® platform, can operate in a spreadsheet environment on a desktop computer, and are capable of handling proteome-scale analysis with high throughput.
Affiliation(s)
- Robert D Bremel
  - ioGenetics LLC, 3591 Anderson Street, Madison, WI 53704, USA.
25
MHC Class II Binding Prediction-A Little Help from a Friend. J Biomed Biotechnol 2010; 2010:705821. [PMID: 20508817 PMCID: PMC2875769 DOI: 10.1155/2010/705821] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Received: 09/27/2009] [Revised: 01/20/2010] [Accepted: 02/22/2010] [Indexed: 11/18/2022] Open
Abstract
Vaccines are the greatest single instrument of prophylaxis against infectious diseases, with immeasurable benefits to human wellbeing. The accurate and reliable prediction of peptide-MHC binding is fundamental to the robust identification of T-cell epitopes and thus the successful design of peptide- and protein-based vaccines. The prediction of MHC class II peptide binding has hitherto proved recalcitrant and refractory. Here we illustrate the utility of existing computational tools for in silico prediction of peptides binding to class II MHCs. Most of the methods tested in the present study detect more than half of the true binders in the top 5% of all possible nonamers generated from one protein. This number increases in the top 10% and 15% and then does not change significantly. For the top 15% the identified binders approach 86%. In terms of lab work this means 85% less expenditure on materials, labour and time. We show that while existing caveats are well founded, use of computational models of class II binding can nonetheless offer viable help to the work of the immunologist and vaccinologist.
26
Bordner AJ, Mittelmann HD. Prediction of the binding affinities of peptides to class II MHC using a regularized thermodynamic model. BMC Bioinformatics 2010; 11:41. [PMID: 20089173 PMCID: PMC2828437 DOI: 10.1186/1471-2105-11-41] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Received: 10/22/2009] [Accepted: 01/20/2010] [Indexed: 12/25/2022] Open
Abstract
Background: The binding of peptide fragments of extracellular peptides to class II MHC is a crucial event in the adaptive immune response. Each MHC allotype generally binds a distinct subset of peptides and the enormous number of possible peptide epitopes prevents their complete experimental characterization. Computational methods can utilize the limited experimental data to predict the binding affinities of peptides to class II MHC. Results: We have developed the Regularized Thermodynamic Average, or RTA, method for predicting the affinities of peptides binding to class II MHC. RTA accounts for all possible peptide binding conformations using a thermodynamic average and includes a parameter constraint for regularization to improve accuracy on novel data. RTA was shown to achieve higher accuracy, as measured by AUC, than SMM-align on the same data for all 17 MHC allotypes examined. RTA also gave the highest accuracy on all but three allotypes when compared with results from 9 different prediction methods applied to the same data. In addition, the method correctly predicted the peptide binding register of 17 out of 18 peptide-MHC complexes. Finally, we found that suboptimal peptide binding registers, which are often ignored in other prediction methods, made significant contributions of at least 50% of the total binding energy for approximately 20% of the peptides. Conclusions: The RTA method accurately predicts peptide binding affinities to class II MHC and accounts for multiple peptide binding registers while reducing overfitting through regularization. The method has potential applications in vaccine design and in understanding autoimmune disorders. A web server implementing the RTA prediction method is available at http://bordnerlab.org/RTA/.
27
Dimitrov I, Garnev P, Flower DR, Doytchinova I. Peptide binding to the HLA-DRB1 supertype: a proteochemometrics analysis. Eur J Med Chem 2009; 45:236-43. [PMID: 19896246 DOI: 10.1016/j.ejmech.2009.09.049] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 06/20/2009] [Revised: 09/04/2009] [Accepted: 09/29/2009] [Indexed: 11/19/2022]
Abstract
A proteochemometrics approach was applied to a set of 2666 peptides binding to 12 HLA-DRB1 proteins. Sequences of both peptide and protein were described using three z-descriptors. Cross terms accounting for adjacent positions and for every second position in the peptides were included in the models, as well as cross terms for peptide/protein interactions. Models were derived based on combinations of different blocks of variables. These models had moderate goodness of fit, as expressed by r2, which ranged from 0.685 to 0.732; and good cross-validated predictive ability, as expressed by q2, which varied from 0.678 to 0.719. The external predictive ability was tested using a set of 356 HLA-DRB1 binders, which showed an r2(pred) in the range 0.364-0.530. Peptide and protein positions involved in the interactions were analyzed in terms of hydrophobicity, steric bulk and polarity.
Affiliation(s)
- Ivan Dimitrov
  - Faculty of Pharmacy, Medical University of Sofia, 2 Dunav st, 1000 Sofia, Bulgaria
28
Szabó TG, Palotai R, Antal P, Tokatly I, Tóthfalusi L, Lund O, Nagy G, Falus A, Buzás EI. Critical role of glycosylation in determining the length and structure of T cell epitopes. Immunome Res 2009; 5:4. [PMID: 19778434 PMCID: PMC2760507 DOI: 10.1186/1745-7580-5-4] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Received: 07/13/2009] [Accepted: 09/24/2009] [Indexed: 12/02/2022] Open
Abstract
Background: Using a combined in silico approach, we investigated the glycosylation of T cell epitopes and autoantigens. The present systems biology analysis was made possible by currently available databases (representing full proteomes, known human T cell epitopes and autoantigens) as well as glycosylation prediction tools. Results: We analyzed the probable glycosylation of human T cell epitope sequences extracted from the Immune Epitope Database. Our analysis suggests that in contrast to full length SwissProt entries, only a minimal portion of experimentally verified T cell epitopes is potentially N- or O-glycosylated (2.26% and 1.22%, respectively). Bayesian analysis of entries extracted from the Autoantigen Database suggests a correlation between N-glycosylation and autoantigenicity. The analysis of randomly generated sequences shows that glycosylation probability is also affected by peptide length. Our data suggest that the lack of peptide glycosylation, a feature that probably favors effective recognition by T cells, might have resulted in a selective advantage for short peptides to become T cell epitopes. The length of T cell epitopes is at the intersection of curves determining specificity and glycosylation probability. Thus, the range of length of naturally occurring T cell epitopes may ensure the maximum specificity with the minimal glycosylation probability. Conclusion: The findings of this bioinformatical approach shed light on fundamental factors that might have shaped adaptive immunity during evolution. Our data suggest that amino acid sequence-based hypo/non-glycosylation of certain segments of proteins might be substantial for determining T cell immunity/autoimmunity.
Affiliation(s)
- Tamás G Szabó
  - Department of Genetics, Cell- and Immunobiology, Semmelweis University, Nagyvárad tér 4, Budapest, Hungary.
29
Nielsen M, Lund O. NN-align. An artificial neural network-based alignment algorithm for MHC class II peptide binding prediction. BMC Bioinformatics 2009; 10:296. [PMID: 19765293 PMCID: PMC2753847 DOI: 10.1186/1471-2105-10-296] [Citation(s) in RCA: 380] [Impact Index Per Article: 25.3] [Received: 03/30/2009] [Accepted: 09/18/2009] [Indexed: 01/03/2023] Open
Abstract
BACKGROUND: The major histocompatibility complex (MHC) molecule plays a central role in controlling the adaptive immune response to infections. MHC class I molecules present peptides derived from intracellular proteins to cytotoxic T cells, whereas MHC class II molecules stimulate cellular and humoral immunity through presentation of extracellularly derived peptides to helper T cells. Identification of which peptides will bind a given MHC molecule is thus of great importance for the understanding of host-pathogen interactions, and large efforts have been placed in developing algorithms capable of predicting this binding event. RESULTS: Here, we present a novel artificial neural network-based method, NN-align, that allows for simultaneous identification of the MHC class II binding core and binding affinity. NN-align is trained using a novel training algorithm that allows for correction of bias in the training data due to redundant binding core representation. Incorporation of information about the residues flanking the peptide-binding core is shown to significantly improve the prediction accuracy. The method is evaluated on a large-scale benchmark consisting of six independent data sets covering 14 human MHC class II alleles, and is demonstrated to outperform other state-of-the-art MHC class II prediction methods. CONCLUSION: The NN-align method is competitive with the state-of-the-art MHC class II peptide binding prediction algorithms. The method is publicly available at http://www.cbs.dtu.dk/services/NetMHCII-2.0.
Affiliation(s)
- Morten Nielsen
  - Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, DK-2800 Lyngby, Denmark.
30
Flower DR. Advances in Predicting and Manipulating the Immunogenicity of Biotherapeutics and Vaccines. BioDrugs 2009; 23:231-40. [DOI: 10.2165/11317530-000000000-00000] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 12/26/2022]