Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Toseland CP, Clayton DJ, McSparron H, Hemsley SL, Blythe MJ, Paine K, Doytchinova IA, Guan P, Hattotuwagama CK, Flower DR. AntiJen: a quantitative immunology database integrating functional, thermodynamic, kinetic, biophysical, and cellular data. Immunome Res 2005;1:4. [PMID: 16305757 PMCID: PMC1289288 DOI: 10.1186/1745-7580-1-4] [Citation(s) in RCA: 141] [Impact Index Per Article: 7.4] [Received: 06/17/2005] [Accepted: 10/06/2005] [Indexed: 11/30/2022] Open

Cited by Other Article(s)

Ananya, Panchariya DC, Karthic A, Singh SP, Mani A, Chawade A, Kushwaha S. Vaccine design and development: Exploring the interface with computational biology and AI. Int Rev Immunol 2024:1-20. [PMID: 38982912 DOI: 10.1080/08830185.2024.2374546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2024] [Accepted: 06/26/2024] [Indexed: 07/11/2024]

Deng N, Sinha KM, Vilar E. MONET: a database for prediction of neoantigens derived from microsatellite loci. Front Immunol 2024;15:1394593. [PMID: 38835776 PMCID: PMC11148240 DOI: 10.3389/fimmu.2024.1394593] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Accepted: 05/03/2024] [Indexed: 06/06/2024] Open

Barra C, Nilsson JB, Saksager A, Carri I, Deleuran S, Garcia Alvarez HM, Høie MH, Li Y, Clifford JN, Wan YTR, Moreta LS, Nielsen M. In Silico Tools for Predicting Novel Epitopes. Methods Mol Biol 2024;2813:245-280. [PMID: 38888783 DOI: 10.1007/978-1-0716-3890-3_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/20/2024]

Kumar N, Bajiya N, Patiyal S, Raghava GPS. Multi-perspectives and challenges in identifying B-cell epitopes. Protein Sci 2023;32:e4785. [PMID: 37733481 PMCID: PMC10578127 DOI: 10.1002/pro.4785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Revised: 09/11/2023] [Accepted: 09/16/2023] [Indexed: 09/23/2023]

Zhang X, Wu J, Baeza J, Gu K, Zheng Y, Chen S, Zhou Z. DeepTAP: An RNN-based method of TAP-binding peptide prediction in the selection of tumor neoantigens. Comput Biol Med 2023;164:107247. [PMID: 37454505 DOI: 10.1016/j.compbiomed.2023.107247] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/20/2023] [Revised: 05/31/2023] [Accepted: 07/07/2023] [Indexed: 07/18/2023]

In silico Research at the Stages of Designing Modern Means for Prevention of Plague (by the Example of Subunit Vaccines). PROBLEMS OF PARTICULARLY DANGEROUS INFECTIONS 2022. [DOI: 10.21055/0370-1069-2022-3-6-13] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/16/2023]

Kumar S, Kumar GS, Maitra SS, Malý P, Bharadwaj S, Sharma P, Dwivedi VD. Viral informatics: bioinformatics-based solution for managing viral infections. Brief Bioinform 2022;23:6659740. [PMID: 35947964 DOI: 10.1093/bib/bbac326] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/27/2022] [Revised: 06/26/2022] [Accepted: 07/18/2022] [Indexed: 11/13/2022] Open

Wang Y, Tang H, Gao C, Ge M, Li Z, Dong Z, Zhao L. Flexibility-aware graph model for accurate epitope identification. Comput Biol Med 2022;149:106064. [DOI: 10.1016/j.compbiomed.2022.106064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2022] [Revised: 08/05/2022] [Accepted: 08/27/2022] [Indexed: 11/25/2022]

La Marca AF, Lopes RDS, Lotufo ADP, Bartholomeu DC, Minussi CR. BepFAMN: A Method for Linear B-Cell Epitope Predictions Based on Fuzzy-ARTMAP Artificial Neural Network. SENSORS 2022;22:s22114027. [PMID: 35684648 PMCID: PMC9185646 DOI: 10.3390/s22114027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2022] [Revised: 05/22/2022] [Accepted: 05/24/2022] [Indexed: 12/02/2022]

Abstract

The public health system is extremely dependent on the use of vaccines to immunize the population from a series of infectious and dangerous diseases, preventing the system from collapsing and millions of people dying every year. However, to develop these vaccines and effectively monitor these diseases, it is necessary to use accurate diagnostic methods capable of identifying highly immunogenic regions within a given pathogenic protein. Existing experimental methods are expensive, time-consuming, and require arduous laboratory work, as they require the screening of a large number of potential candidate epitopes, making the methods extremely laborious, especially for application to larger microorganisms. In the last decades, researchers have developed in silico prediction methods, based on machine learning, to identify these markers, to drastically reduce the list of potential candidate epitopes for experimental tests, and, consequently, to reduce the laborious task associated with their mapping. Despite these efforts, the tools and methods still have low accuracy, slow diagnosis, and offline training. Thus, we develop a method to predict B-cell linear epitopes which are based on a Fuzzy-ARTMAP neural network architecture, called BepFAMN (B Epitope Prediction Fuzzy ARTMAP Artificial Neural Network). This was trained using a linear averaging scheme on 15 properties that include an amino acid ratio scale and a set of 14 physicochemical scales. The database used was obtained from the IEDB website, from which the amino acid sequences with the annotations of their positive and negative epitopes were taken. To train and validate the knowledge models, five-fold cross-validation and competition techniques were used. The BepiPred-2.0 database, an independent database, was used for the tests. In our experiment, the validation dataset reached sensitivity = 91.50%, specificity = 91.49%, accuracy = 91.49%, MCC = 0.83, and an area under the curve (AUC) ROC of approximately 0.9289. 
The result in the testing dataset achieves a significant improvement, with sensitivity = 81.87%, specificity = 74.75%, accuracy = 78.27%, MCC = 0.56, and AUC = 0.7831. These achieved values demonstrate that BepFAMN outperforms all other linear B-cell epitope prediction tools currently used. In addition, the architecture provides mechanisms for online training, which allow the user to find a new B-cell linear epitope and to improve the model without needing to retrain it on the whole dataset. This fact contributes to a considerable reduction in the number of potential linear epitopes to be experimentally validated, reducing laboratory time and accelerating the development of diagnostic tests, vaccines, and immunotherapeutic approaches.
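
The metrics quoted in this abstract (sensitivity, specificity, accuracy, MCC) follow the usual confusion-matrix definitions. A minimal Python sketch of those definitions is given here for orientation; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the BepFAMN paper.

```python
from math import sqrt


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true-positive rate
    specificity = tn / (tn + fp)          # true-negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts chosen only to illustrate the formulas.
example = binary_metrics(tp=90, fp=10, tn=90, fn=10)
print(example)
```

With balanced classes and equal error rates, as in these made-up counts, the MCC equals sensitivity plus specificity minus one, which is roughly the relationship seen in the validation figures reported above.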

Biological databases and their application. Bioinformatics 2022. [DOI: 10.1016/b978-0-323-89775-4.00021-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022] Open

Dhusia K, Su Z, Wu Y. A structural-based machine learning method to classify binding affinities between TCR and peptide-MHC complexes. Mol Immunol 2021;139:76-86. [PMID: 34455212 PMCID: PMC10811653 DOI: 10.1016/j.molimm.2021.07.020] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/29/2021] [Revised: 07/13/2021] [Accepted: 07/25/2021] [Indexed: 11/27/2022]

Abstract

The activation of T cells is triggered by the interactions of T cell receptors (TCRs) with their epitopes, which are peptides presented by major histocompatibility complex (MHC) on the surfaces of antigen presenting cells (APC). While each TCR can only recognize a specific subset from a large repertoire of peptide-MHC (pMHC) complexes, peptides in this subset very often share little sequence similarity. This is known as the specificity and cross-reactivity of T cells, respectively. The binding affinities between different types of TCRs and pMHC are the major driving force to shape this specificity and cross-reactivity in T cell recognition. The binding affinities, furthermore, are determined by the sequence and structural properties at the interfaces between TCRs and pMHC. Fortunately, a wealth of data on binding and structures of TCR-pMHC interactions has become publicly accessible in online resources, which offers us the opportunity to develop a random forest classifier for predicting the binding affinities between TCR and pMHC based on the structure of their complexes. Specifically, the structure and sequence of a given complex were projected onto a high-dimensional feature space as the input of the classifier, which was then trained by a large-scale benchmark dataset. Based on the cross-validation results, we found that our machine learning model can predict if the binding affinity of a given TCR-pMHC complex is stronger or weaker than a predefined threshold with an overall accuracy of approximately 75%. The significance of our prediction was estimated by statistical analysis. Moreover, more than 60% of binding affinities in the ATLAS database can be successfully classified into groups within the range of 2 kcal/mol. Additionally, we show that TCR-pMHC complexes with strong binding affinity prefer hydrophobic interactions between amino acids with large aromatic rings instead of electrostatic interactions. 
Our results therefore provide insights to design engineered TCRs which enhance the specificity for their targeted epitopes. Taken together, this method can serve as a useful addition to a suite of existing approaches which study binding between TCR and pMHC.

Qiao X, Qu L, Guo Y, Hoshino T. Secondary Structure and Conformational Stability of the Antigen Residues Making Contact with Antibodies. J Phys Chem B 2021;125:11374-11385. [PMID: 34615354 DOI: 10.1021/acs.jpcb.1c05997] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/28/2022]

Abstract

Antibodies are crucial biomolecules that bring high therapeutic efficacy in medicine and accurate molecular detection in diagnosis. Many studies have been devoted to analyzing the antigen-antibody interaction from the importance of understanding the antibody recognition mechanism. However, most of the previous studies examined the characteristic of the antibody for interaction. It is also informative to clarify the significant antigen residues contributing to the binding. To characterize the molecular interaction of antigens, we computationally analyzed 350 antigen-antibody complex structures by molecular mechanics (MM) calculations and molecular dynamics (MD) simulations. Based on the MM calculations, the antigen residues contributing to the binding were extracted from all the 350 complexes. The extracted residues are located at the antigen-antibody interface and are responsible for making contact with the antibody. The appearances of the charged polar residues, Asp, Glu, Arg, and Lys, were noticeably large. In contrast, the populations of the hydrophobic residues, Leu, Val, and Ala, were relatively low. The appearance frequencies of the other amino acid residues were almost close to the abundance of general proteins of eukaryotes. The binding score indicated that the hydrophilic interaction was dominant at the antigen-antibody contact instead of the hydrophobic one. The positively charged residues, Arg and Lys, remarkably contributed to the binding compared to the negatively charged ones, Asp and Glu. Considerable contributions were also observed for the noncharged polar residues, Asn and Gln. The analysis of the secondary structures of the extracted antigen residues suggested that there was no marked difference in recognition by antibodies among helix, sheet, turn, and coil. A long helix of the antigen sometimes made contact with antibody complementarity-determining regions, and a large sheet also frequently covered the antibody heavy and light chains. 
The turn structure was the most frequently observed at the contact with antibody among the 350 complexes. Three typical complexes were selected for each of the four secondary structures. MD simulations were performed to examine the stability of the interfacial structures of the antigens for these 12 complex models. The alterations of secondary structures were monitored through the simulations. The structural fluctuations of the contact residues were low compared with the other domains of antigen molecules. No drastic conversion was observed for any model during the 100 ns simulation. The motions of the interfacial antigen residues were small compared to the other residues on the protein surface. Therefore, diverse molecular conformations are possible for antibody recognition as long as the target areas are polar, nonflexible, and protruding on the protein surface.

Ramchandani R, Hossenbaccus L, Ellis AK. Immunoregulatory T cell epitope peptides for the treatment of allergic disease. Immunotherapy 2021;13:1283-1291. [PMID: 34558985 DOI: 10.2217/imt-2021-0133] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/21/2022] Open

Rawal K, Sinha R, Abbasi BA, Chaudhary A, Nath SK, Kumari P, Preeti P, Saraf D, Singh S, Mishra K, Gupta P, Mishra A, Sharma T, Gupta S, Singh P, Sood S, Subramani P, Dubey AK, Strych U, Hotez PJ, Bottazzi ME. Identification of vaccine targets in pathogens and design of a vaccine using computational approaches. Sci Rep 2021;11:17626. [PMID: 34475453 PMCID: PMC8413327 DOI: 10.1038/s41598-021-96863-x] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 09/30/2020] [Accepted: 08/10/2021] [Indexed: 02/07/2023] Open

Affiliation(s)

Kamal Rawal Centre for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, India.
Robin Sinha Centre for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, India
Bilal Ahmed Abbasi Centre for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, India
Amit Chaudhary Centre for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, India
Swarsat Kaushik Nath Centre for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, India
Priya Kumari Centre for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, India
P Preeti Centre for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, India
Devansh Saraf Centre for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, India
Shachee Singh Centre for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, India
Kartik Mishra Centre for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, India
Pranjay Gupta Centre for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, India
Astha Mishra Centre for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, India
Trapti Sharma Centre for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, India
Srijanee Gupta Centre for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, India
Prashant Singh Centre for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, India
Shriya Sood Centre for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, India
Preeti Subramani Centre for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, India
Aman Kumar Dubey Centre for Computational Biology and Bioinformatics, Amity Institute of Biotechnology, Amity University Uttar Pradesh, Noida, India
Ulrich Strych Texas Children's Hospital Center for Vaccine Development, Departments of Pediatrics and Molecular Virology and Microbiology, National School of Tropical Medicine, Baylor College of Medicine, Houston, TX, USA
Peter J Hotez Texas Children's Hospital Center for Vaccine Development, Departments of Pediatrics and Molecular Virology and Microbiology, National School of Tropical Medicine, Baylor College of Medicine, Houston, TX, USA; Department of Biology, Baylor University, Waco, TX, USA
Maria Elena Bottazzi Texas Children's Hospital Center for Vaccine Development, Departments of Pediatrics and Molecular Virology and Microbiology, National School of Tropical Medicine, Baylor College of Medicine, Houston, TX, USA; Department of Biology, Baylor University, Waco, TX, USA

Jiang L, Yu H, Li J, Tang J, Guo Y, Guo F. Predicting MHC class I binder: existing approaches and a novel recurrent neural network solution. Brief Bioinform 2021;22:6299205. [PMID: 34131696 DOI: 10.1093/bib/bbab216] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/05/2021] [Revised: 05/14/2021] [Accepted: 05/17/2021] [Indexed: 01/04/2023] Open

Data curation to improve the pattern recognition performance of B-cell epitope prediction by support vector machine. PURE APPL CHEM 2021. [DOI: 10.1515/pac-2020-1107] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/15/2022]

Galanis KA, Nastou KC, Papandreou NC, Petichakis GN, Pigis DG, Iconomidou VA. Linear B-Cell Epitope Prediction for In Silico Vaccine Design: A Performance Review of Methods Available via Command-Line Interface. Int J Mol Sci 2021;22:3210. [PMID: 33809918 PMCID: PMC8004178 DOI: 10.3390/ijms22063210] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Received: 01/12/2021] [Revised: 03/15/2021] [Accepted: 03/19/2021] [Indexed: 12/17/2022] Open

Predicting Immunogenicity Risk in Biopharmaceuticals. Symmetry (Basel) 2021. [DOI: 10.3390/sym13030388] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 12/25/2022] Open

Anand R, Biswal S, Bhatt R, Tiwary BN. Computational perspectives revealed prospective vaccine candidates from five structural proteins of novel SARS corona virus 2019 (SARS-CoV-2). PeerJ 2020;8:e9855. [PMID: 33062414 PMCID: PMC7531350 DOI: 10.7717/peerj.9855] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/22/2020] [Accepted: 08/11/2020] [Indexed: 12/30/2022] Open

Abstract

Background

The present pandemic COVID-19 is caused by SARS-CoV-2, a single-stranded positive-sense RNA virus from the Coronaviridae family. Due to a lack of antiviral drugs, vaccines against the virus are urgently required.

Methods

In this study, validated computational approaches were used to identify peptide-based epitopes from six structural proteins having antigenic properties. The Net-CTL 1.2 tool was used for the prediction of CD8⁺ T-cell epitopes, while the robust tools Bepi-Pred 2 and LBtope were employed for the identification of linear B-cell epitopes. Docking studies of the identified epitopes were performed using HADDOCK 2.4 and the structures were visualized by Discovery Studio and LigPlot⁺. Antigenicity, immunogenicity, conservancy, population coverage and allergenicity of the predicted epitopes were determined by bioinformatics tools such as the VaxiJen v2.0 server, the Immune Epitope Database tools, AllerTOP v.2.0, AllergenFP 1.0 and ElliPro.

Results

The predicted T cell and linear B-cell epitopes were considered as prime vaccine targets if they passed the requisite parameters like antigenicity, immunogenicity, conservancy, non-allergenicity and broad range of population coverage. Among the predicted CD8⁺ T cell epitopes, potential vaccine targets from surface glycoprotein were YQPYRVVVL, PYRVVVLSF, GVYFASTEK, QLTPTWRVY, and those from ORF3a protein were LKKRWQLAL, HVTFFIYNK. Similarly, RFLYIIKLI, LTWICLLQF from membrane protein and three epitopes, viz. SPRWYFYYL, TWLTYTGAI, KTFPPTEPK, from nucleocapsid phosphoprotein were the superior vaccine targets observed in our study. The negative values of HADDOCK and Z scores obtained for the best cluster indicated the potential of the epitopes as suitable vaccine candidates. Analysis of the 3D and 2D interaction diagrams of best cluster produced by HADDOCK 2.4 displayed the binding interaction of leading T cell epitopes within the MHC-1 peptide binding clefts. On the other hand, among linear B cell epitopes the majority of potential vaccine targets were from nucleocapsid protein, viz. ⁵⁹⁻HGKEDLKFPRGQGVPINTNSSPDDQIGYYRRATRRIRGGDGKMKDLS⁻¹⁰⁵, ²²⁷⁻LNQLESKMSGKGQQQQGQTVTKKSAAEASKKPRQKRTATK⁻²⁶⁶, ³⁻DNGPQNQRNAPRITFGGP⁻²⁰, ²⁹⁻GERSGARSKQRRPQGL⁻⁴⁵. Two other prime vaccine targets, ³⁷⁰⁻NSASFSTFKCYGVSPTKLNDLCFTNV⁻³⁹⁵ and ²⁶⁰⁻AGAAAYYVGYLQPRT⁻²⁷⁴ were identified in the spike protein. The potential B-cell conformational epitopes were predicted on the basis of a higher protrusion index indicating greater solvent accessibility. These conformational epitopes were of various lengths and belonged to spike, ORF3a, membrane and nucleocapsid proteins.

Conclusions

Taken together, eleven T cell epitopes, seven B cell linear epitopes and ten B cell conformational epitopes were identified from five structural proteins of SARS-CoV-2 using advanced computational tools. These potential vaccine candidates may provide important timely directives for an effective vaccine against SARS-CoV-2.

Zhou WJ, Qu Z, Song CY, Sun Y, Lai AL, Luo MY, Ying YZ, Meng H, Liang Z, He YJ, Li YH, Liu J. NeoPeptide: an immunoinformatic database of T-cell-defined neoantigens. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2020;2019:5670755. [PMID: 31819989 PMCID: PMC6901387 DOI: 10.1093/database/baz128] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 12/18/2018] [Revised: 10/11/2019] [Accepted: 10/15/2019] [Indexed: 12/13/2022]

Graves J, Byerly J, Priego E, Makkapati N, Parish SV, Medellin B, Berrondo M. A Review of Deep Learning Methods for Antibodies. Antibodies (Basel) 2020;9:E12. [PMID: 32354020 PMCID: PMC7344881 DOI: 10.3390/antib9020012] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 04/01/2020] [Revised: 04/15/2020] [Accepted: 04/16/2020] [Indexed: 01/09/2023] Open

Ramana J, Mehla K. Immunoinformatics and Epitope Prediction. Methods Mol Biol 2020;2131:155-171. [PMID: 32162252 DOI: 10.1007/978-1-0716-0389-5_6] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 06/10/2023]

Quinzo MJ, Lafuente EM, Zuluaga P, Flower DR, Reche PA. Computational assembly of a human Cytomegalovirus vaccine upon experimental epitope legacy. BMC Bioinformatics 2019;20:476. [PMID: 31823715 PMCID: PMC6905002 DOI: 10.1186/s12859-019-3052-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/12/2019] [Accepted: 08/23/2019] [Indexed: 01/05/2023] Open

Karapetyan AR, Chaipan C, Winkelbach K, Wimberger S, Jeong JS, Joshi B, Stein RB, Underwood D, Castle JC, van Dijk M, Seibert V. TCR Fingerprinting and Off-Target Peptide Identification. Front Immunol 2019;10:2501. [PMID: 31695703 PMCID: PMC6817589 DOI: 10.3389/fimmu.2019.02501] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 05/15/2019] [Accepted: 10/07/2019] [Indexed: 01/06/2023] Open

Abstract

Adoptive T cell therapy using patient T cells redirected to recognize tumor-specific antigens by expressing genetically engineered high-affinity T-cell receptors (TCRs) has therapeutic potential for melanoma and other solid tumors. Clinical trials implementing genetically modified TCRs in melanoma patients have raised concerns regarding off-target toxicities resulting in lethal destruction of healthy tissue, highlighting the urgency of assessing which off-target peptides can be recognized by a TCR. As a model system we used the clinically efficacious NY-ESO-1-specific TCR C²⁵⁹, which recognizes the peptide epitope SLLMWITQC presented by HLA-A*02:01. We investigated which amino acids at each position enable a TCR interaction by sequentially replacing every amino acid position outside of anchor positions 2 and 9 with all 19 possible alternative amino acids, resulting in 134 peptides (133 altered peptides plus epitope peptide). Each peptide was individually evaluated using three different in vitro assays: binding of the NY-ESOᶜ²⁵⁹ TCR to the peptide, peptide-dependent activation of TCR-expressing cells, and killing of peptide-presenting target cells. To represent the TCR recognition kernel, we defined Position Weight Matrices (PWMs) for each assay by assigning normalized measurements to each of the 20 amino acids in each position. To predict potential off-target peptides, we applied a novel algorithm projecting the PWM-defined kernel into the human proteome, scoring NY-ESOᶜ²⁵⁹ TCR recognition of 336,921 predicted human HLA-A*02:01 binding 9-mer peptides. Of the 12 peptides with high predicted score, we confirmed 7 (including NY-ESO-1 antigen SLLMWITQC) strongly activate human primary NY-ESOᶜ²⁵⁹-expressing T cells. These off-target peptides include peptides with up to 7 amino acid changes (of 9 possible), which could not be predicted using the recognition motif as determined by alanine scans. 
Thus, this replacement scan assay determines the “TCR fingerprint” and, when coupled with the algorithm applied to the database of human 9-mer peptides binding to HLA-A*02:01, enables the identification of potential off-target antigens and the tissues where they are expressed. This platform enables both screening of multiple TCRs to identify the best candidate for clinical development and identification of TCR-specific cross-reactive peptide recognition and constitutes an improved methodology for the identification of potential off-target peptides presented on MHC class I molecules.
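
The replacement scan and PWM scoring described in this abstract can be illustrated with a short Python sketch. It enumerates the single-substitution peptides outside anchor positions 2 and 9 (7 positions × 19 alternatives = 133) and scores a 9-mer against a position weight matrix. The function names, the additive scoring rule, and the toy matrix are assumptions made for illustration; the abstract does not state how the authors combine per-position weights.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def substitution_scan(epitope: str, anchors=(2, 9)) -> list:
    """Return all single-residue variants at non-anchor positions (1-based)."""
    variants = []
    for pos in range(1, len(epitope) + 1):
        if pos in anchors:
            continue  # anchor positions are left unchanged
        for aa in AMINO_ACIDS:
            if aa != epitope[pos - 1]:
                variants.append(epitope[:pos - 1] + aa + epitope[pos:])
    return variants


def pwm_score(peptide: str, pwm: list) -> float:
    """Assumed additive score: sum of the weight of each residue at its position."""
    return sum(pwm[i].get(aa, 0.0) for i, aa in enumerate(peptide))


epitope = "SLLMWITQC"
variants = substitution_scan(epitope)
print(len(variants) + 1)  # altered peptides plus the epitope itself

# Toy PWM (hypothetical): weight 1.0 for the wild-type residue, 0.0 otherwise.
toy_pwm = [{aa: 1.0} for aa in epitope]
wild_type_score = pwm_score(epitope, toy_pwm)
variant_scores = [pwm_score(v, toy_pwm) for v in variants]
```

In a real application the matrix entries would be the normalized assay measurements for each amino acid at each position, and the same scoring function would be applied to every candidate 9-mer.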

Bahrami AA, Payandeh Z, Khalili S, Zakeri A, Bandehpour M. Immunoinformatics: In Silico Approaches and Computational Design of a Multi-epitope, Immunogenic Protein. Int Rev Immunol 2019;38:307-322. [PMID: 31478759 DOI: 10.1080/08830185.2019.1657426] [Citation(s) in RCA: 57] [Impact Index Per Article: 11.4] [Indexed: 10/26/2022]

Li Z, Miao Q, Yan F, Meng Y, Zhou P. Machine Learning in Quantitative Protein–peptide Affinity Prediction: Implications for Therapeutic Peptide Design. Curr Drug Metab 2019;20:170-176. [DOI: 10.2174/1389200219666181012151944] [Citation(s) in RCA: 66] [Impact Index Per Article: 13.2] [Received: 10/01/2017] [Revised: 11/07/2017] [Accepted: 08/20/2018] [Indexed: 01/03/2023]

Dingman R, Balu-Iyer SV. Immunogenicity of Protein Pharmaceuticals. J Pharm Sci 2019;108:1637-1654. [PMID: 30599169 PMCID: PMC6720129 DOI: 10.1016/j.xphs.2018.12.014] [Citation(s) in RCA: 72] [Impact Index Per Article: 14.4] [Received: 11/19/2018] [Revised: 12/19/2018] [Accepted: 12/20/2018] [Indexed: 02/07/2023]

Cretich M, Gori A, D'Annessa I, Chiari M, Colombo G. Peptides for Infectious Diseases: From Probe Design to Diagnostic Microarrays. Antibodies (Basel) 2019;8:E23. [PMID: 31544829 PMCID: PMC6640701 DOI: 10.3390/antib8010023] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 02/13/2019] [Revised: 02/28/2019] [Accepted: 03/04/2019] [Indexed: 01/03/2023] Open

Bioinformatics Applications in Advancing Animal Virus Research. RECENT ADVANCES IN ANIMAL VIROLOGY 2019. [PMCID: PMC7121192 DOI: 10.1007/978-981-13-9073-9_23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]

Computational B-cell epitope identification and production of neutralizing murine antibodies against Atroxlysin-I. Sci Rep 2018;8:14904. [PMID: 30297733 PMCID: PMC6175905 DOI: 10.1038/s41598-018-33298-x] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2018] [Accepted: 09/03/2018] [Indexed: 11/08/2022] Open

Usmani SS, Kumar R, Bhalla S, Kumar V, Raghava GPS. In Silico Tools and Databases for Designing Peptide-Based Vaccine and Drugs. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2018;112:221-263. [PMID: 29680238 DOI: 10.1016/bs.apcsb.2018.01.006] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]

Fundamentals and Methods for T- and B-Cell Epitope Prediction. J Immunol Res 2017;2017:2680160. [PMID: 29445754 PMCID: PMC5763123 DOI: 10.1155/2017/2680160] [Citation(s) in RCA: 284] [Impact Index Per Article: 40.6] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2017] [Revised: 11/22/2017] [Accepted: 11/27/2017] [Indexed: 12/25/2022] Open

Sher G, Zhi D, Zhang S. DRREP: deep ridge regressed epitope predictor. BMC Genomics 2017;18:676. [PMID: 28984193 PMCID: PMC5629616 DOI: 10.1186/s12864-017-4024-8] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Gourlay L, Peri C, Bolognesi M, Colombo G. Structure and Computation in Immunoreagent Design: From Diagnostics to Vaccines. Trends Biotechnol 2017;35:1208-1220. [PMID: 28739221 DOI: 10.1016/j.tibtech.2017.06.018] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2017] [Revised: 06/28/2017] [Accepted: 06/30/2017] [Indexed: 11/26/2022]

Vang YS, Xie X. HLA class I binding prediction via convolutional neural networks. Bioinformatics 2017;33:2658-2665. [DOI: 10.1093/bioinformatics/btx264] [Citation(s) in RCA: 64] [Impact Index Per Article: 9.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2016] [Accepted: 04/18/2017] [Indexed: 01/19/2023] Open

Borrman T, Cimons J, Cosiano M, Purcaro M, Pierce BG, Baker BM, Weng Z. ATLAS: A database linking binding affinities with structures for wild-type and mutant TCR-pMHC complexes. Proteins 2017;85:908-916. [PMID: 28160322 DOI: 10.1002/prot.25260] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2016] [Revised: 01/17/2017] [Accepted: 01/23/2017] [Indexed: 11/07/2022]

Potocnakova L, Bhide M, Pulzova LB. An Introduction to B-Cell Epitope Mapping and In Silico Epitope Prediction. J Immunol Res 2016;2016:6760830. [PMID: 28127568 PMCID: PMC5227168 DOI: 10.1155/2016/6760830] [Citation(s) in RCA: 198] [Impact Index Per Article: 24.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2016] [Revised: 11/21/2016] [Accepted: 12/13/2016] [Indexed: 01/09/2023] Open

Jandrlić DR. SVM and SVR-based MHC-binding prediction using a mathematical presentation of peptide sequences. Comput Biol Chem 2016;65:117-127. [PMID: 27816828 DOI: 10.1016/j.compbiolchem.2016.10.011] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2016] [Revised: 09/16/2016] [Accepted: 10/24/2016] [Indexed: 11/16/2022]

sNebula, a network-based algorithm to predict binding between human leukocyte antigens and peptides. Sci Rep 2016;6:32115. [PMID: 27558848 PMCID: PMC4997263 DOI: 10.1038/srep32115] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2016] [Accepted: 08/02/2016] [Indexed: 12/19/2022] Open

Hackl H, Charoentong P, Finotello F, Trajanoski Z. Computational genomics tools for dissecting tumour–immune cell interactions. Nat Rev Genet 2016;17:441-58. [DOI: 10.1038/nrg.2016.67] [Citation(s) in RCA: 188] [Impact Index Per Article: 23.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]

Wang S, Guo L, Liu D, Liu W, Wu Y. HLAsupE: an integrated database of HLA supertype-specific epitopes to aid in the development of vaccines with broad coverage of the human population. BMC Immunol 2016;17:17. [PMID: 27307005 PMCID: PMC4910211 DOI: 10.1186/s12865-016-0156-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2016] [Accepted: 06/08/2016] [Indexed: 01/24/2023] Open

Kozlova E, Viart B, de Avila R, Felicori L, Chavez-Olortegui C. Classification epitopes in groups based on their protein family. BMC Bioinformatics 2015;16 Suppl 19:S7. [PMID: 26696329 PMCID: PMC4686779 DOI: 10.1186/1471-2105-16-s19-s7] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023] Open

Abstract

Background

The humoral immune system response is based on the interaction between antibodies and antigens for the clearance of pathogens and foreign molecules. The interaction between these proteins occurs at specific positions known as antigenic determinants or B-cell epitopes. The experimental identification of epitopes is costly and time-consuming. Therefore, the use of in silico methods to help discover new epitopes is an appealing alternative, given the importance of biomedical applications such as vaccine design, disease diagnostics, anti-venoms and immunotherapeutics. However, prediction performance is not optimal, with accuracy of around 70%. Further research could increase our understanding of the biochemical and structural properties that characterize a B-cell epitope.

Results

We investigated whether linear epitopes from the same protein family share common properties. This hypothesis led us to analyze physico-chemical (PCP) and predicted secondary structure (PSS) features of a curated dataset of epitope sequences available in the literature belonging to two different groups of antigens (metalloproteinases and neurotoxins). Using data mining techniques, we discovered statistically significant parameters that allow us to distinguish neurotoxin epitopes from metalloproteinase epitopes, and both from random sequences. After five-fold cross-validation, we found that PCP-based models obtained area under the curve (AUC) values and accuracy above 0.9 for regression, decision tree and support vector machine.

Conclusions

We demonstrated that an antigen's family can be inferred from properties within a single group of linear epitopes (metalloproteinases or neurotoxins). We also discovered the characteristics that represent these two epitope groups, including their similarities to and differences from random peptides and their respective amino acid sequences. These findings open new perspectives for improving epitope prediction by considering the specific antigen's protein family. We expect that these findings will help to improve current computational mapping methods based on physico-chemical properties, given their potential application in epitope discovery.
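The evaluation named in the Results above (five-fold cross-validation scored by AUC) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the abstract does not give the authors' features, models or software, so a single numeric feature and a trivial sign-based scorer stand in for the PCP-based models, and the function names `auc`, `stratified_folds` and `cross_validated_auc` are hypothetical.

```python
import random

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_folds(labels, k, seed=0):
    """Deal the indices of each class round-robin into k folds after shuffling."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds

def cross_validated_auc(feature, labels, k=5, seed=0):
    """Mean held-out AUC over k folds for a toy one-feature scorer."""
    fold_aucs = []
    for test in stratified_folds(labels, k, seed):
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        pos = [feature[i] for i in train if labels[i] == 1]
        neg = [feature[i] for i in train if labels[i] == 0]
        # "Training" (placeholder model): learn only the direction of the feature.
        sign = 1.0 if sum(pos) / len(pos) >= sum(neg) / len(neg) else -1.0
        scores = [sign * feature[i] for i in test]
        fold_aucs.append(auc(scores, [labels[i] for i in test]))
    return sum(fold_aucs) / k

# Toy, perfectly separable data (illustrative only, not from the study).
toy_labels = [0] * 10 + [1] * 10
toy_feature = [float(i) for i in range(20)]
toy_result = cross_validated_auc(toy_feature, toy_labels, k=5)
```

Stratifying the folds keeps both classes present in every held-out fold, which the AUC computation requires; each class needs at least `k` members for this to hold.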

Backert L, Kohlbacher O. Immunoinformatics and epitope prediction in the age of genomic medicine. Genome Med 2015;7:119. [PMID: 26589500 PMCID: PMC4654883 DOI: 10.1186/s13073-015-0245-0] [Citation(s) in RCA: 131] [Impact Index Per Article: 14.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/04/2023] Open

Review on the identification and role of Toxoplasma gondii antigenic epitopes. Parasitol Res 2015;115:459-68. [PMID: 26581372 DOI: 10.1007/s00436-015-4824-1] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2015] [Accepted: 11/10/2015] [Indexed: 12/12/2022]

Molero-Abraham M, Glutting JP, Flower DR, Lafuente EM, Reche PA. EPIPOX: Immunoinformatic Characterization of the Shared T-Cell Epitome between Variola Virus and Related Pathogenic Orthopoxviruses. J Immunol Res 2015;2015:738020. [PMID: 26605344 PMCID: PMC4641182 DOI: 10.1155/2015/738020] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2015] [Revised: 09/08/2015] [Accepted: 10/01/2015] [Indexed: 11/26/2022] Open

Luo H, Ye H, Ng H, Shi L, Tong W, Mattes W, Mendrick D, Hong H. Understanding and predicting binding between human leukocyte antigens (HLAs) and peptides by network analysis. BMC Bioinformatics 2015;16 Suppl 13:S9. [PMID: 26424483 PMCID: PMC4597169 DOI: 10.1186/1471-2105-16-s13-s9] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/11/2023] Open

Abstract

BACKGROUND

As the major histocompatibility complex (MHC) of humans, the human leukocyte antigens (HLAs) are among the most polymorphic genes in humans. Patients carrying certain HLA alleles may develop adverse drug reactions (ADRs) after taking specific drugs. Peptides play an important role in HLA-related ADRs as they are the necessary co-binders of HLAs with drugs. A large amount of experimental data has been generated for understanding HLA-peptide binding. However, efficiently utilizing the data for understanding and accurately predicting HLA-peptide binding is challenging. Therefore, we developed a network analysis-based method to understand and predict HLA-peptide binding.

METHODS

Qualitative Class I HLA-peptide binding data were harvested and prepared from four major databases. An HLA-peptide binding network was constructed from this dataset and modules were identified by the fast greedy modularity optimization algorithm. To examine the significance of signals in the yielded models, the modularity was compared with the modularity values generated from 1,000 random networks. The peptides and HLAs in the modules were characterized by similarity analysis. The neighbor-edges based and unbiased leverage algorithm (Nebula) was developed for predicting HLA-peptide binding. Leave-one-out (LOO) validations and two-fold cross-validations were conducted to evaluate the performance of Nebula using the constructed HLA-peptide binding network.

RESULTS

Nine modules were identified from analyzing the HLA-peptide binding network, with a modularity higher than that of all the random networks. Peptide length and functional side chains of amino acids at certain positions of the peptides were different among the modules. HLA sequences were module dependent to some extent. Nebula achieved an overall prediction accuracy of 0.816 in the LOO validations and an average accuracy of 0.795 in the two-fold cross-validations, and outperformed the method reported in the literature.

CONCLUSIONS

Network analysis is a useful approach for analyzing large and sparse datasets such as the HLA-peptide binding dataset. The modules identified from the network analysis clustered peptides and HLAs with similar sequences and properties of amino acids. Nebula performed well in the predictions of HLA-peptide binding. We demonstrated that network analysis coupled with Nebula is an efficient approach to understand and predict HLA-peptide binding interactions and thus, could further our understanding of ADRs.
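The modularity comparison described in the Methods above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it computes the standard Newman modularity of a fixed partition rather than running the fast greedy optimization, it assumes an undirected, unweighted network, and the helper names `modularity` and `fraction_at_least` are hypothetical. The abstract does not say how the 1,000 random networks were generated, so their modularity values are left as an input.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Newman modularity Q of an undirected, unweighted graph for a fixed partition."""
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints in the community
    degree_sum = defaultdict(int)  # summed degree of the community's nodes
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

def fraction_at_least(observed, null_values):
    """Fraction of null-model modularity values that reach the observed value."""
    return sum(q >= observed for q in null_values) / len(null_values)

# Toy graph: two triangles joined by one bridge edge (illustrative only).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_modules = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_module = {node: "a" for node in range(6)}
q_split = modularity(edges, two_modules)   # the two triangles as separate modules
q_merged = modularity(edges, one_module)   # everything in a single module
```

In a comparison of this kind, `fraction_at_least` would be called with the observed network's modularity and the list of modularity values obtained from the random networks; a fraction of zero corresponds to the observed value exceeding every random one.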

Databases for T-cell epitopes. Methods Mol Biol 2015;1184:123-34. [PMID: 25048121 DOI: 10.1007/978-1-4939-1115-8_6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/11/2023]

Bunkute E, Cummins C, Crofts FJ, Bunce G, Nabney IT, Flower DR. PIP-DB: the Protein Isoelectric Point database. Bioinformatics 2014;31:295-6. [PMID: 25252779 DOI: 10.1093/bioinformatics/btu637] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022] Open

Khalili S, Jahangiri A, Borna H, Ahmadi Zanoos K, Amani J. Computational vaccinology and epitope vaccine design by immunoinformatics. Acta Microbiol Immunol Hung 2014;61:285-307. [PMID: 25261943 DOI: 10.1556/amicr.61.2014.3.4] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/10/2023]

Kim Y, Sidney J, Buus S, Sette A, Nielsen M, Peters B. Dataset size and composition impact the reliability of performance benchmarks for peptide-MHC binding predictions. BMC Bioinformatics 2014;15:241. [PMID: 25017736 PMCID: PMC4111843 DOI: 10.1186/1471-2105-15-241] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2014] [Accepted: 07/08/2014] [Indexed: 11/23/2022] Open

Abstract

Background

It is important to accurately determine the performance of peptide:MHC binding predictions, as this enables users to compare and choose between different prediction methods and provides estimates of the expected error rate. Two common approaches to determine prediction performance are cross-validation, in which all available data are iteratively split into training and testing data, and the use of blind sets generated separately from the data used to construct the predictive method. In the present study, we have compared cross-validated prediction performances generated on our last benchmark dataset from 2009 with prediction performances generated on data subsequently added to the Immune Epitope Database (IEDB) which served as a blind set.

Results

We found that cross-validated performances systematically overestimated performance on the blind set. This was found not to be due to the presence of similar peptides in the cross-validation dataset. Rather, we found that small size and low sequence/affinity diversity of either training or blind datasets were associated with large differences in cross-validated vs. blind prediction performances. We use these findings to derive quantitative rules of how large and diverse datasets need to be to provide generalizable performance estimates.

Conclusion

It has long been known that cross-validated prediction performance estimates often overestimate performance on independently generated blind set data. Here we identify and quantify the specific factors contributing to this effect for MHC-I binding predictions. An increasing number of peptides for which MHC binding affinities are measured experimentally have been selected based on binding predictions and are thus less diverse than historic datasets sampling the entire sequence and affinity space, making them more difficult benchmark datasets. This has to be taken into account when comparing performance metrics between different benchmarks, and when deriving error estimates for predictions based on benchmark performance.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-241) contains supplementary material, which is available to authorized users.
